Forum OpenACS Q&A: some advice on a search...

Posted by David Kuczek on 10/11/01 07:07 PM

I could need some advice on a search...

We have one table where users can insert job_offers. Additionally we
have a table that holds information on the job_locations. Every
job_offer can have multiple job_locations.

job_offers has the following rows:

job_id, title, ...., location, postal_code, n_locations

job_locations has the following rows:

location_id, ..., location, postal_code

offers_locations_map has the following rows:

offers_locations_id, location_id, job_id

When a company inserts a job, a trigger counts how many locations were
selected for that job and inserts an integer into n_locations...

If only one location got selected (most usual) our server additionally
inserts into job_offers the postal_code and the location in order to
make the search quicker.

Now, I want to search all job_offers where postal_code > x for
example. The search should only search the job_locations table via the
offers_locations_map when n_locations > 1.
How do I do this? Does this make sense?

2: Response to some advice on a search... (response to 1)

Posted by Alex Sokoloff on 10/11/01 08:18 PM

I think you might want to rethink your design since you end up, as you are no doubt aware, trying to search for the same information in one of 2 places, depending on the number of locations per job. How does that "make the search quicker" compared to doing the obvious: joining the three tables and searching on postal_code in job_locations?

If only the rare job has more than one location, it's possible your design might be faster (anyone know?). If that's the case, you can get your data with a union statement.

3: Response to some advice on a search... (response to 1)

Posted by David Kuczek on 10/11/01 09:22 PM

I am everything but a sql guru, but the only problem that I had with joining the tables was that if some user selected i.e. 10 locations for a job, the rows from job_offers like title etc. would be searched 10 times in case somebody wanted to check out which job had postal_code > x and title = 'whatever'...

Am I wrong here?

Can I avoid postgres to look for the title in all 10 rows (only one would be necessary) while still checking out all rows concerning postal_code?

4: Response to some advice on a search... (response to 1)

Posted by Alex Sokoloff on 10/12/01 12:47 AM

Well, I don't pass myself off as a sql guru either... But how about:

select distinct jo.job_id, jo.title
from job_offers jo, job_locations jl, offer_location_map olm
where jo.job_id = olm.job_id
and olm.location_id = jl.location_id
and jl.postal_code > :bind_variable

This won't work if you want to attach list of locations to each row. In that case you'll need to write a pgplsql function or do some processing of rows in tcl.

5: Response to some advice on a search... (response to 1)

Posted by David Kuczek on 10/12/01 01:30 AM

If I had one job in my job_offers table and that job would have ten office_locations, your query would search all ten postal_codes of offers_locations but only one row of job_offers?

Didn't you forget: and lower(title) like lower('%$title%')?

Is there a call to see what exactly postgres is proceding with this search?

What do you mean with: "This won't work if you want to attach list of locations to each row."

What would be the difference if I used group by instead of distinct?

Thanks

6: Response to some advice on a search... (response to 1)

Posted by Gilbert Wong on 10/12/01 01:58 AM

First, you don't need a offers_location_id in the offers_location_map since each job_id and location_id combination is unique. You can define that table as:

create table offers_locations_map (
location_id integer references job_locations(location_id),
job_id integer references job_offers(job_id),
primary key (location_id,job_id)
);

Another thing you might want to do is to count the job locations in the TCL code instead of running triggers in the DB.

Regarding the search, I think Alex has the right idea. Because you are doing a join, if you do a condition like:

select
...
where jo.job_id = olm.job_id
and olm.location_id = jl.location_id
and jl.postal_code > :bind_variable
and lower(jo.title)=lower('%title%')

I would think that the database would be smart enough only search the jo table once for that condition and throw out those rows that don't match.

7: Response to some advice on a search... (response to 1)

Posted by Alex Sokoloff on 10/12/01 03:26 AM

And, since we're worried about performance, the clause:

and lower(jo.title)=lower('%title%')

...will trigger a full table scan. To get a search that uses an index, you need to:

1) create a trigger that populates a column lower_title in job_offers with lower(title)

2) create an index on lower_title

3) restrict the search to beginning of title or full title, and use a clause like:

and jo.lower_title = lower('${title_variable}%')

To be able to search - quickly - for a word anywhere within the title, you'll need to use Neophytos' openfts wizardry which is still very new (or intermedia with Oracle).

8: Response to some advice on a search... (response to 1)

Posted by Dan Wickstrom on 10/12/01 03:32 AM

You can also create a functional index to avoid messing around with an extra field and a trigger.

9: Response to some advice on a search... (response to 1)

Posted by Alex Sokoloff on 10/12/01 03:47 AM

Those don't seem to work so well in Oracle... maybe they do in Postgres?

10: Response to some advice on a search... (response to 1)

Posted by Dan Wickstrom on 10/12/01 03:58 AM

I thought we were talking about postgresql not oracle. As far as I know, functional indices work well in postgresql.

11: Response to some advice on a search... (response to 1)

Posted by David Kuczek on 10/12/01 09:10 AM

Thanks for your help,

could you point me to a functional index inside openacs? I only found the description and a very little example on google...

12: Response to some advice on a search... (response to 1)

Posted by Dan Wickstrom on 10/12/01 01:45 PM

They're hard to find, but just searching for indices that use the keywords upper and lower, I'm able to find a few. One using upper is located at ./ticket-tracker/sql/postgresql/category-create.sql:94. I also found one at ./bookmarks/sql/postgresql/bookmarks-create.sql:96, and I found a comment in the code that indicated that functional indices don't work on functions with more than one argument. However, I don't think that is a true statement. IIRC, the only limitation is that you can't have a constant arguement. In the case where you need a constant arg, then you need to create a wrapper function for indexing. An example where this might be useful would be when creating a functional index using substr.