Forum OpenACS Q&A: some advice on a search...

Collapse
Posted by David Kuczek on
I could need some advice on a search...

We have one table where users can insert job_offers. Additionally we
have a table that holds information on the job_locations. Every
job_offer can have multiple job_locations.

job_offers has the following rows:

job_id, title, ...., location, postal_code, n_locations

job_locations has the following rows:

location_id, ..., location, postal_code

offers_locations_map has the following rows:

offers_locations_id, location_id, job_id

When a company inserts a job, a trigger counts how many locations were
selected for that job and inserts an integer into n_locations...

If only one location got selected (most usual) our server additionally
inserts into job_offers the postal_code and the location in order to
make the search quicker.

Now, I want to search all job_offers where postal_code > x for
example. The search should only search the job_locations table via the
offers_locations_map when n_locations > 1.
How do I do this? Does this make sense?

Collapse
Posted by Alex Sokoloff on
I think you might want to rethink your design since you end up, as you are no doubt aware, trying to search for the same information in one of 2 places, depending on the number of locations per job. How does that "make the search quicker" compared to doing the obvious: joining the three tables and searching on postal_code in job_locations?

If only the rare job has more than one location, it's possible your design might be faster (anyone know?). If that's the case, you can get your data with a union statement.

Collapse
Posted by David Kuczek on
I am everything but a sql guru, but the only problem that I had with joining the tables was that if some user selected i.e. 10 locations for a job, the rows from job_offers like title etc. would be searched 10 times in case somebody wanted to check out which job had postal_code > x and title = 'whatever'...

Am I wrong here?

Can I avoid postgres to look for the title in all 10 rows (only one would be necessary) while still checking out all rows concerning postal_code?

Collapse
Posted by Alex Sokoloff on

Well, I don't pass myself off as a sql guru either... But how about:

select distinct jo.job_id, jo.title
from job_offers jo, job_locations jl, offer_location_map olm
where jo.job_id = olm.job_id
and olm.location_id = jl.location_id
and jl.postal_code > :bind_variable

This won't work if you want to attach list of locations to each row. In that case you'll need to write a pgplsql function or do some processing of rows in tcl.

Collapse
Posted by David Kuczek on
If I had one job in my job_offers table and that job would have ten office_locations, your query would search all ten postal_codes of offers_locations but only one row of job_offers?

Didn't you forget: and lower(title) like lower('%$title%')?

Is there a call to see what exactly postgres is proceding with this search?

What do you mean with: "This won't work if you want to attach list of locations to each row."

What would be the difference if I used group by instead of distinct?

Thanks

Collapse
Posted by Gilbert Wong on

First, you don't need a offers_location_id in the offers_location_map since each job_id and location_id combination is unique. You can define that table as:

create table offers_locations_map (
location_id integer references job_locations(location_id),
job_id integer references job_offers(job_id),
primary key (location_id,job_id)
);

Another thing you might want to do is to count the job locations in the TCL code instead of running triggers in the DB.

Regarding the search, I think Alex has the right idea. Because you are doing a join, if you do a condition like:

select
...
where jo.job_id = olm.job_id
and olm.location_id = jl.location_id
and jl.postal_code > :bind_variable
and lower(jo.title)=lower('%title%')

I would think that the database would be smart enough only search the jo table once for that condition and throw out those rows that don't match.

Collapse
Posted by Alex Sokoloff on
And, since we're worried about performance, the clause:

and lower(jo.title)=lower('%title%')

...will trigger a full table scan. To get a search that uses an index, you need to:

1) create a trigger that populates a column lower_title in job_offers with lower(title)

2) create an index on lower_title

3) restrict the search to beginning of title or full title, and use a clause like:

and jo.lower_title = lower('${title_variable}%')

To be able to search - quickly - for a word anywhere within the title, you'll need to use Neophytos' openfts wizardry which is still very new (or intermedia with Oracle).

Collapse
Posted by Dan Wickstrom on
You can also create a functional index to avoid messing around with an extra field and a trigger.
Collapse
Posted by Alex Sokoloff on
Those don't seem to work so well in Oracle... maybe they do in Postgres?
Collapse
Posted by Dan Wickstrom on
I thought we were talking about postgresql not oracle.  As far as I know, functional indices work well in postgresql.
Collapse
Posted by David Kuczek on
Thanks for your help,

could you point me to a functional index inside openacs? I only found the description and a very little example on google...

Collapse
Posted by Dan Wickstrom on
They're hard to find, but just searching for indices that use the keywords upper and lower, I'm able to find a few.  One using upper is located at ./ticket-tracker/sql/postgresql/category-create.sql:94.  I also found one at ./bookmarks/sql/postgresql/bookmarks-create.sql:96, and I found a comment in the code that indicated that functional indices don't work on functions with more than one argument.  However, I don't think that is a true statement.  IIRC, the only limitation is that you can't have a constant arguement.  In the case where you need a constant arg, then you need to create a wrapper function for indexing.  An example where this might be useful would be when creating a functional index using substr.