Forum OpenACS Development: Oracle SE vs. EE

Collapse
Posted by Don Baccus on
Currently OpenACS 4 requires Oracle Enterprise Edition.  It really
makes very minor use of it - primarily bitmapped indexes.

Part of me says "let them eat EE" (to paraphrase Marie Antoinette).
If users want Oracle, make 'em pay the 3x-4x premium Oracle charges
for EE.  If that scares them, well, more will be motivated to try the
PostgreSQL version of OpenACS 4.

That's my nasty side speaking.

My practical side tells me that we really want to give some sort
of official support for SE and to make installation easy, while
at the same time make it obvious that they'll get some additional
functionality with EE.

One question I have is whether or not the (ex | current) aD'ers here
have any empirical evidence that the use of bitmapped indexes
substantioally increase performance for ACTUAL WORKING ACS 4 SITES?

If I understand the implementation correctly, there's a space/time
tradeoff involved.  I wonder, though, if in practice overall ACS 4
throughput is improved substantially.

If we do support EE extensions, we need to figure out a mechanism to
enable or disable use.  I'm not thrilled with the idea of breaking out
a set of EE vs. SE datamodel files, on top of the PG vs. Oracle
split.

Thoughts and ideas (and hard data) are welcome ...

Collapse
Posted by Jon Griffin on
My immediate reaction is dump EE.

If someone has EE maybe a note in the application that says, "you may get more performance with bitmapped indexes. Then again you may not and in any case the permissioning is so slow you won't know the difference anyway!"

Collapse
Posted by Michael A. Cleverly on
I agree with Jon. Let's make the target SE. Anyone who can afford $100K versus $20K can afford to create bitmap indicies on their own where needed.

There really aren't that many EE-specific features used in ACS 4/Tcl. Back in February I posted a list of http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg%5fid=000Yx2&topic%5fid=ACS%20Design&topic= (tweaks to the ACS 4.1 datamodel) that we needed to get it to load cleanly into SE. Both Don and Bruno Mattarllo asked in that thread if anyone at aD had ever done any testing. My guess is aD's silence means that no one ever did. Had aD just not up and walked away from ACS 4 Tcl I suspect there would have been more EE specific features used going forward. (Which would have been their perogative since they said all along you needed EE...)

Collapse
Posted by Jerry Asher on
If you target SE, then performance will be good in SE.  If you target EE (as aD did), then you're almost certainly going to be screwed in SE and perhaps equally screwed in lesser mortal dbs (pg, sap, etc.)
Collapse
Posted by Janine Ohmer on
I believe there is one other difference of some slight significance
- I read somewhere, and I think I verified it once for myself, that
the SE version doesn't support online rebuilds of an interMedia
index, meaning that your searching goes offline whie it's
rebuilding.  Of course, the results from interMedia are so
whacked the users might not notice! :)

Having said that, I strongly believe that OpenACS should support
the SE version.  Tags could be put in the SQL files telling the  EE
achievers where to edit to take advantage of the extra features,
just like there used to be comments indicating the right indexes
to move to a separate tablespace if you had one lying around.

Once upon a time (pre-IPO fever) Philip used to say very
emphatically that the ACS should always work with the SE
version.  That message got left behind somewhere along the
road to fame and fortune...

Collapse
Posted by Jerry Asher on
Intermedia is incredibly powerful, and yet, aD ACS classic could barely use it, with searches limited mostly to a single ORing of keywords.  It didn't use the stemming, the dictionaries, the phrase stuff at all.

Don got bboard searches working on the OpenACS using standard db table search (is that right?)

Another approach might be for OpenACS to incorporate an open source search engine, using perhaps htdig or swish.

The argument appears to be that since your content is mostly in the db, the db is the best place for searches.  The counter argument is who cares? Nothing is available that can't be found via a link, so IF a search engine can be found that can search selected portions of your site, and/or get urls with appropriate query parameters, then just use that.  (In much the same way, the ACS doesn't reimplement mail, it just delegates the task to qmail.)

What is the OpenACS 4 (5) approach to searching?

Collapse
Posted by Janine Ohmer on
Obviously there needs to be some searching solution other than
interMedia, for Postgres support.  And I believe that that solution
should support both Postgres and Oracle if at all possible.

However, for those who use Oracle, interMedia is there and for
one reason or another they might want to use it.  IMHO OpenACS
should support it as fully as possible;  maybe someone will
eventually learn how to really use it and teach the rest of us.

This does not mean I advocate being EE-centric, however;  I just
don't think interMedia should be ignored or dropped.  I really
have no idea how much of the lameness I see in it is due to
interMedia itself and how much is due to the ACS's limited use of
it;  I suspect there is plenty of blame to go around. :)

Collapse
Posted by Don Baccus on
Well...is the dynamic update of Intermedia indexes controlled by the
datamodel, or is it something that can be controlled outside the
datamodel.  Like setting a predicate within a table or calling an API
function to turn it on or ???

If it doesn't have to be buried within the datamodel, then it's not an
issue.  People who want the dynamic update can use EE and set it up
to do immediate indexing outside the OpenACS install.  We would, of
course, want to document this.

I guess the question needs to be more narrowly defined...are there EE
features that have to be enabled during install time (i.e. bitmap
indexes) that are so valueable that we need to support them?

Obviously EE supports things like hot failover and the like that
aren't part of SE.  Setting up things like this are outside the scope of
the OpenACS boostrap installer's realm, though, so IMO aren't
pertinent.  If Intermedia indexing can be configured outside the
initial install I think it, too, falls into this class and we can
safely ignore it.  Other than documenting how to turn on immediate
indexing, of course.

Collapse
Posted by Jerry Asher on
Oracle EE is a good thing.  A client that can pay for it qualifies as a client that can pay bills.  Nevertheless, I'm drawn to OpenACS for it's openness, it's goal of running well on open source solutions.

I think it's a question of resources and it's a matter of practicality.

I would like to see OpenACS resources building an OpenACS 4 that will be best of breed on PostgreSQL or SAP.  That's why I suggest looking at seeing what an opensource search engine solution might do for us.

If we put our efforts towards an opensource solution, we at first be taking a step backwards, but soon, we'll be further ahead than the current Intermedia solution.  But if we can rely on Intermedia and it's current one keyword at a time search, we may never have the best of breed search solution that can differentiate an OpenACS site from a ColdFusion/Php-nuke/IIS site.

Which is not to say I don't want Intermedia to work with OpenACS using Oracle.  Just that I think that Intermedia support should be more a goal more an outcome of the Classic ACS 3.4 project, than a design goal of an OpenACS 4 design.

Collapse
Posted by Malte Sussdorff on
I second Jerry's opinion (both on the client AND on focusing on OpenSource).

AD was bound to always use the newest technologies of Oracle for some reason, which includes using all those cool EE features. It's a shame they did not get to use more 😉.

Honestly, if I remember my conversation with the AD German oracle crack correctly, using EE is not necessary, at least in performance, for the stuff we do with the ACS. It might be more convenient, but not necessary. Therefore I would focus [on the Oracle side] on SE and if someone is interested in doing an EE port (for whatever reason), he should go ahead.

And try to keep the search engine issue seperate from the database. If someone likes a special search engine (e.g. Excalibur or Intermedia), he should be able to use it regardless of the DB (though I'm not sure if you could get and run Intermedia without Oracle). Therefore, getting a search engine connector would be a cool thing to have. But hey, who am I to talk, if I've got not even the slightest clue how to do this 😉

Collapse
Posted by David Eison on
Janine is right, Intermedia's "rebuild online" isn't part of SE on 8.1.6.  Haven't been able to find a comparison for 8.1.7 but no reason to believe it's changed.  Heard a rumor that Intermedia indexes would automatically be kept up to date in 8.1.7, but can't find any documentation to support that and don't have 8.1.7 to play with.  Phong or M.Bryzek would probably know.

Materialized views were also being investigated last I checked, and partitioning looks to me like an EE feature that is worth exploring.

Does silence when asked about bitmapped indexes indicate a lack of testing?  Nope, it indicates that the person who knew the answer to the question didn't read that message, and that there wasn't anyone assigned to go get answers from other people at aD if they didn't personally know the answer.  There really are simple logical explanations for this sort of thing: Question didn't get answered != no answer exists.  I *believe* the answer is that the bitmapped indexes were added due to lab performance testing, and that they aren't critical to performance but do help a bit, so they should be kept around if you happen to have EE (and, since aD has EE, that's what is used).  That's third-hand knowledge, though, which is why I have the "i believe" weasel words up there.

As for better Intermedia indexing, it's not hard to modify im_convert to do stemming or fuzzy matches and stuff.  I did it for a client at aD.  It slows down the queries by a noticeable amount, but it's probably worth it unless search traffic is very high.  I think a lot of the problem with site wide search is that it's hard to tell how good or bad a match is.  Unfortunately this is a feature of Intermedia - it's hard to interpret the numbers it returns except as a relative weighting within the current search, so one could assume the first hit was 100% and normalize to that, but one can't tell how good the first hit is (i.e. "Sorry user, there were no great matches, try modifying your search" vs. "Hey user, here are 100 links that may have something to do with your search, and this one at the top is the best match, but I won't tell you how good a match it is").

If somebody is feeling motivated, 2 things that would be good are:
1) trying to figure out how to answer "where do these results stop being relevant?"
2) modifying the search results page to show relevant excerpts from the page rather than page summaries.

Those two things were on my back burner a few months ago, but they never made it off the back burner, unfortunately.

Collapse
Posted by David Eison on
Oh, and while I have no problem with other search engines being investigated, I would love for whatever solution is come up with to still be available at the SQL level (i.e. it should return IDs that you can do something intelligent with, not just document names).  There is obviously so much more you can do by having "where" and "join" readily available that it would be a shame to throw that out (people may not have done many interesting things with that capability to date, but it's a toolkit so powerful and useful options should be kept open)
Collapse
Posted by Don Baccus on
Gee, I ask a question and I get a lecture followed by a non-answer:
I *believe* the answer is that the bitmapped indexes were added due to lab performance testing, and that they aren't critical to performance but do help a bit, so they should be kept around if you happen to have EE (and, since aD has EE, that's what is used).
Look - we *know* that at least *some* aD people *believe* that bitmapped indexes are worth using. I knew that. You don't really know whether or not any empirical data exists. Telling me this doesn't help me - I'm already uncertain as to whether or not empirical data exists.

You do say one thing interesting - that *if* the empirical data exists then you *believe* that it "helps a bit" but "isn't crucial".

That's interesting, but doesn't tell me anything I can use. The fact that you work for aD doesn't make your *opinion* any more informed than mine, or anyone else's here. I've heard opinions on the subject before, that's why I specifically asked for "empirical evidence that the use of bitmapped indexes substantioally increase performance for ACTUAL WORKING ACS 4 SITES? "

If I want an opinion rather than empirical data from real sites I'll ask you, OK? Silence on aD's part doesn't prove that no testing was done, you are right about that. However ... it doesn't prove that testing *was* done, either, and given aD's track record I'm of the *opinion* that no one here, at least none of the non-aD types, would be surprised to find out that there was no realistic testing whatsover.

Collapse
Posted by David Eison on
I do not work for aD, and do not wish for my ramblings on here to be interpreted as anything coming from aD, since they are not.
Collapse
Posted by Don Baccus on
OK, I was wrong about that ... I'll stand on the rest of my comment, though.

BTW an ex-aD source who doesn't really want to post in public says that he believes that no empirical testing was done...

Collapse
Posted by Michael A. Cleverly on
David, didn't you use to work for aD?
Collapse
Posted by David Eison on
Yes, until a little over a month ago.  So I do not wish to be seen as an aD source; right now I'm a member of the openacs.org community, providing personal ramblings with a background in ACS/Java 3.4 and 4.0/4.1.

Didn't really mean to end up on this tangent, so the short version is: I really like the ACS, I want to use ACS/OpenACS (depending on what I decide my personal tolerance for Oracle is), I may do some work on it later, but in the meantime I'm drawing on past experience and piping up when I feel like I might be useful or have something to say (i.e. I have worked with Intermedia/SWS a bit and am interested in the search engine problem)

Collapse
Posted by C. R. Oldham on
There is a document on Oracle's site that describes exactly what is different between SE and EE.  I can't remember where it is or what it's called.

In that document, however I recall reading that materialized views are not available in SE.

Collapse
Posted by Don Baccus on
OK, I've removed the bitmap indexes and replaced them with default
b-tree indexes.  I've left the bitmap indexes in as comments.

While bitmap indexes perhaps make sense for objects and their
friends, I certainly don't see why they're needed for apm packages!

How many packages are there in a full installation?  A few dozen?
The APM spends all if its time parsing .info files with a Tcl
parser, not extracting rows from the package, package version, and
package files tables!  A b-tree index is certainly sufficient, and I
bet that for some of the queries Oracle selects a sequential scan
anyway (indexed scans can actually be slower on small tables than a
sequential scan).

Collapse
Posted by Janine Ohmer on
Wow, this thread took off while I wasn't looking! )

The online rebuild is currently specified in a PL/SQL procedure
which is part of the Site Wide Search package.  It is now in an
"alter index" statement;  it used to be part of a "create index"
statement in 3.x.

I think all you need to do is remove the word "online"
(search-tables.sql) and add a comment telling the user to put it
back if they have EE.

Oh, and make sure that some mention is made of all these
comments in the installation docs, and perhaps use an
identifying string in each one that people can grep for.

Collapse
Posted by Don Baccus on
Last I heard, though, "online" wasn't being used at arsdigita.com due
to problems.  Has that changed in the past few months?

Maybe it should have a "-- enable if you have EE ... if you dare!"
comment on it.

Collapse
Posted by David Eison on
I tried to sort out the "state of the art" several months ago, and it was like this:
There were serious problems w/ 8.1.5.  There may have been some problems with 8.1.6.0.  It looks like they were mostly sorted out with 8.1.6.2 (or .3?  Never could get clear on exactly what the patch # was, have heard it named as both).  Many of the problems are discussed in the ASJ article.  I added a comment that the right way to do it seems to have ended up being a dbms job.  I modified the docs a bit a long time ago, but unfortunately it looks like they still recommend ad-context-server, which we stopped using a while back in favor of a dbms job.  I'm not sure if the problem was ever really "online", I think it was more that rebuilding the same thing twice at once used to cause problems.

Also, there was a bug in some versions of ACS 3.3 whereby multiple sites on the same box would overwrite each others jobs upon running the sws create script.  Phong and Mike Bryzek fixed it somewhere around 3.4.  Don't remember exact version #s.

End result?  Syncing via dbms_job with online seems to work on Oracle 8.1.6.2, and it all seems to transparently quietly and happily work in ACS4 for the lucky people on Solaris boxes, and it seems to be possible to get going for people on Linux but I never learned the details (think you just have to comment out the INSO stuff since the filters aren't available on Linuxm, or replace their ctxhx conversion program with your own)

Collapse
Posted by Don Baccus on
This is helpful, thanks, David.  The INSO filters are available for Linus with Oracle 8.1.7.  I've not checked the Oracle site recently, as of a couple of months ago only 8.1.7 EE was available (uncompressed, grrr).

As we get closer to having an OpenACS 4 release it would be great if you or someone else with extensive Intermedia experience would review the existing ACS 4 docs, fold in whatever information can be gleaned else where (i.e. the ASJ notes), etc.  The OpenACS team has historically been PG-centric for obvious reasons, so could use lots of  help with Oracle-specific issues.  Ken Kennedy, who is helping with the OpenACS 4 porting effort, is a professional Oracle DBA and has offered to help with Oracle stuff, though I don't know if he's had much experience with InterMedia.

Collapse
Posted by Ken Kennedy on
I've got a minor amount of experience with InterMedia, but I'm no guru. We have some in-house interest in using it more, though, so I may end up learning.... I don't presently have a Linux box with Oracle installed (postgres only on my machines at home, and we use Solaris and HP-UX at work), but it sounds like that isn't as much of an issue now.

So whatever help I can be I'm happy to offer, but don't expect miracles...*grin*

Collapse
Posted by Don Baccus on
Well...you're probably best placed to help us figure out ways to write code that works effectively for both platforms, Ken (as we've discussed in e-mail in regard to date expressions).

That's really pretty cool,  you know ... Oracle guru at work, fiendisly slavering over a Postgres system at home ...

Collapse
Posted by Ken Kennedy on
Oracle guru...ha. *grin* You make me laugh...

But I play one on TV!