Forum OpenACS Development: Oracle SE vs. EE
makes very minor use of it - primarily bitmapped indexes.
Part of me says "let them eat EE" (to paraphrase Marie Antoinette).
If users want Oracle, make 'em pay the 3x-4x premium Oracle charges
for EE. If that scares them, well, more will be motivated to try the
PostgreSQL version of OpenACS 4.
That's my nasty side speaking.
My practical side tells me that we really want to give some sort
of official support for SE and to make installation easy, while
at the same time make it obvious that they'll get some additional
functionality with EE.
One question I have is whether or not the (ex | current) aD'ers here
have any empirical evidence that the use of bitmapped indexes
substantioally increase performance for ACTUAL WORKING ACS 4 SITES?
If I understand the implementation correctly, there's a space/time
tradeoff involved. I wonder, though, if in practice overall ACS 4
throughput is improved substantially.
If we do support EE extensions, we need to figure out a mechanism to
enable or disable use. I'm not thrilled with the idea of breaking out
a set of EE vs. SE datamodel files, on top of the PG vs. Oracle
Thoughts and ideas (and hard data) are welcome ...
If someone has EE maybe a note in the application that says, "you may get more performance with bitmapped indexes. Then again you may not and in any case the permissioning is so slow you won't know the difference anyway!"
There really aren't that many EE-specific features used in ACS 4/Tcl. Back in February I posted a list of tweaks to the ACS 4.1 datamodel that we needed to get it to load cleanly into SE. Both Don and Bruno Mattarllo asked in that thread if anyone at aD had ever done any testing. My guess is aD's silence means that no one ever did. Had aD just not up and walked away from ACS 4 Tcl I suspect there would have been more EE specific features used going forward. (Which would have been their perogative since they said all along you needed EE...)
- I read somewhere, and I think I verified it once for myself, that
the SE version doesn't support online rebuilds of an interMedia
index, meaning that your searching goes offline whie it's
rebuilding. Of course, the results from interMedia are so
whacked the users might not notice! :)
Having said that, I strongly believe that OpenACS should support
the SE version. Tags could be put in the SQL files telling the EE
achievers where to edit to take advantage of the extra features,
just like there used to be comments indicating the right indexes
to move to a separate tablespace if you had one lying around.
Once upon a time (pre-IPO fever) Philip used to say very
emphatically that the ACS should always work with the SE
version. That message got left behind somewhere along the
road to fame and fortune...
Don got bboard searches working on the OpenACS using standard db table search (is that right?)
Another approach might be for OpenACS to incorporate an open source search engine, using perhaps htdig or swish.
The argument appears to be that since your content is mostly in the db, the db is the best place for searches. The counter argument is who cares? Nothing is available that can't be found via a link, so IF a search engine can be found that can search selected portions of your site, and/or get urls with appropriate query parameters, then just use that. (In much the same way, the ACS doesn't reimplement mail, it just delegates the task to qmail.)
What is the OpenACS 4 (5) approach to searching?
interMedia, for Postgres support. And I believe that that solution
should support both Postgres and Oracle if at all possible.
However, for those who use Oracle, interMedia is there and for
one reason or another they might want to use it. IMHO OpenACS
should support it as fully as possible; maybe someone will
eventually learn how to really use it and teach the rest of us.
This does not mean I advocate being EE-centric, however; I just
don't think interMedia should be ignored or dropped. I really
have no idea how much of the lameness I see in it is due to
interMedia itself and how much is due to the ACS's limited use of
it; I suspect there is plenty of blame to go around. :)
datamodel, or is it something that can be controlled outside the
datamodel. Like setting a predicate within a table or calling an API
function to turn it on or ???
If it doesn't have to be buried within the datamodel, then it's not an
issue. People who want the dynamic update can use EE and set it up
to do immediate indexing outside the OpenACS install. We would, of
course, want to document this.
I guess the question needs to be more narrowly defined...are there EE
features that have to be enabled during install time (i.e. bitmap
indexes) that are so valueable that we need to support them?
Obviously EE supports things like hot failover and the like that
aren't part of SE. Setting up things like this are outside the scope of
the OpenACS boostrap installer's realm, though, so IMO aren't
pertinent. If Intermedia indexing can be configured outside the
initial install I think it, too, falls into this class and we can
safely ignore it. Other than documenting how to turn on immediate
indexing, of course.
I think it's a question of resources and it's a matter of practicality.
I would like to see OpenACS resources building an OpenACS 4 that will be best of breed on PostgreSQL or SAP. That's why I suggest looking at seeing what an opensource search engine solution might do for us.
If we put our efforts towards an opensource solution, we at first be taking a step backwards, but soon, we'll be further ahead than the current Intermedia solution. But if we can rely on Intermedia and it's current one keyword at a time search, we may never have the best of breed search solution that can differentiate an OpenACS site from a ColdFusion/Php-nuke/IIS site.
Which is not to say I don't want Intermedia to work with OpenACS using Oracle. Just that I think that Intermedia support should be more a goal more an outcome of the Classic ACS 3.4 project, than a design goal of an OpenACS 4 design.
AD was bound to always use the newest technologies of Oracle for some reason, which includes using all those cool EE features. It's a shame they did not get to use more .
Honestly, if I remember my conversation with the AD German oracle crack correctly, using EE is not necessary, at least in performance, for the stuff we do with the ACS. It might be more convenient, but not necessary. Therefore I would focus [on the Oracle side] on SE and if someone is interested in doing an EE port (for whatever reason), he should go ahead.
And try to keep the search engine issue seperate from the database. If someone likes a special search engine (e.g. Excalibur or Intermedia), he should be able to use it regardless of the DB (though I'm not sure if you could get and run Intermedia without Oracle). Therefore, getting a search engine connector would be a cool thing to have. But hey, who am I to talk, if I've got not even the slightest clue how to do this
Materialized views were also being investigated last I checked, and partitioning looks to me like an EE feature that is worth exploring.
Does silence when asked about bitmapped indexes indicate a lack of testing? Nope, it indicates that the person who knew the answer to the question didn't read that message, and that there wasn't anyone assigned to go get answers from other people at aD if they didn't personally know the answer. There really are simple logical explanations for this sort of thing: Question didn't get answered != no answer exists. I *believe* the answer is that the bitmapped indexes were added due to lab performance testing, and that they aren't critical to performance but do help a bit, so they should be kept around if you happen to have EE (and, since aD has EE, that's what is used). That's third-hand knowledge, though, which is why I have the "i believe" weasel words up there.
As for better Intermedia indexing, it's not hard to modify im_convert to do stemming or fuzzy matches and stuff. I did it for a client at aD. It slows down the queries by a noticeable amount, but it's probably worth it unless search traffic is very high. I think a lot of the problem with site wide search is that it's hard to tell how good or bad a match is. Unfortunately this is a feature of Intermedia - it's hard to interpret the numbers it returns except as a relative weighting within the current search, so one could assume the first hit was 100% and normalize to that, but one can't tell how good the first hit is (i.e. "Sorry user, there were no great matches, try modifying your search" vs. "Hey user, here are 100 links that may have something to do with your search, and this one at the top is the best match, but I won't tell you how good a match it is").
If somebody is feeling motivated, 2 things that would be good are:
1) trying to figure out how to answer "where do these results stop being relevant?"
2) modifying the search results page to show relevant excerpts from the page rather than page summaries.
Those two things were on my back burner a few months ago, but they never made it off the back burner, unfortunately.
<blockquote><i>I *believe* the answer is that the bitmapped
indexes were added due to lab performance testing, and that they aren't
critical to performance but do help a bit, so they should be kept around if you
happen to have EE (and, since aD has EE, that's what is used). </i></blockquote>
Look - we *know* that at least *some* aD people *believe* that bitmapped indexes are worth using. I knew that. You don't really know whether or not any empirical data exists. Telling me this doesn't help me - I'm already uncertain as to whether or not empirical data exists.
You do say one thing interesting - that *if* the empirical data exists then you *believe* that it "helps a bit" but "isn't crucial".
That's interesting, but doesn't tell me anything I can use. The fact that you work for aD doesn't make your *opinion* any more informed than mine, or anyone else's here.
I've heard opinions on the subject before, that's why I specifically asked for "empirical evidence that the use of bitmapped indexes substantioally
increase performance for ACTUAL WORKING ACS 4 SITES? "
If I want an opinion rather than empirical data from real sites I'll ask you, OK? Silence on aD's part doesn't prove that no testing was done, you are right about that. However ... it doesn't prove that testing *was* done, either, and given aD's track record I'm of the *opinion* that no one here, at least none of the non-aD types, would be surprised to find out that there was no realistic testing whatsover.
BTW an ex-aD source who doesn't really want to post in public says that he believes that no empirical testing was done...
Didn't really mean to end up on this tangent, so the short version is: I really like the ACS, I want to use ACS/OpenACS (depending on what I decide my personal tolerance for Oracle is), I may do some work on it later, but in the meantime I'm drawing on past experience and piping up when I feel like I might be useful or have something to say (i.e. I have worked with Intermedia/SWS a bit and am interested in the search engine problem)
In that document, however I recall reading that materialized views are not available in SE.
b-tree indexes. I've left the bitmap indexes in as comments.
While bitmap indexes perhaps make sense for objects and their
friends, I certainly don't see why they're needed for apm packages!
How many packages are there in a full installation? A few dozen?
The APM spends all if its time parsing .info files with a Tcl
parser, not extracting rows from the package, package version, and
package files tables! A b-tree index is certainly sufficient, and I
bet that for some of the queries Oracle selects a sequential scan
anyway (indexed scans can actually be slower on small tables than a
The online rebuild is currently specified in a PL/SQL procedure
which is part of the Site Wide Search package. It is now in an
"alter index" statement; it used to be part of a "create index"
statement in 3.x.
I think all you need to do is remove the word "online"
(search-tables.sql) and add a comment telling the user to put it
back if they have EE.
Oh, and make sure that some mention is made of all these
comments in the installation docs, and perhaps use an
identifying string in each one that people can grep for.
to problems. Has that changed in the past few months?
Maybe it should have a "-- enable if you have EE ... if you dare!"
comment on it.
There were serious problems w/ 8.1.5. There may have been some problems with 18.104.22.168. It looks like they were mostly sorted out with 22.214.171.124 (or .3? Never could get clear on exactly what the patch # was, have heard it named as both). Many of the problems are discussed in the ASJ article. I added a comment that the right way to do it seems to have ended up being a dbms job. I modified the docs a bit a long time ago, but unfortunately it looks like they still recommend ad-context-server, which we stopped using a while back in favor of a dbms job. I'm not sure if the problem was ever really "online", I think it was more that rebuilding the same thing twice at once used to cause problems.
Also, there was a bug in some versions of ACS 3.3 whereby multiple sites on the same box would overwrite each others jobs upon running the sws create script. Phong and Mike Bryzek fixed it somewhere around 3.4. Don't remember exact version #s.
End result? Syncing via dbms_job with online seems to work on Oracle 126.96.36.199, and it all seems to transparently quietly and happily work in ACS4 for the lucky people on Solaris boxes, and it seems to be possible to get going for people on Linux but I never learned the details (think you just have to comment out the INSO stuff since the filters aren't available on Linuxm, or replace their ctxhx conversion program with your own)
As we get closer to having an OpenACS 4 release it would be great if you or someone else with extensive Intermedia experience would review the existing ACS 4 docs, fold in whatever information can be gleaned else where (i.e. the ASJ notes), etc. The OpenACS team has historically been PG-centric for obvious reasons, so could use lots of help with Oracle-specific issues. Ken Kennedy, who is helping with the OpenACS 4 porting effort, is a professional Oracle DBA and has offered to help with Oracle stuff, though I don't know if he's had much experience with InterMedia.
So whatever help I can be I'm happy to offer, but don't expect miracles...*grin*
That's really pretty cool, you know ... Oracle guru at work, fiendisly slavering over a Postgres system at home ...