Forum OpenACS Q&A: Performing queries on a cached query result?

1: Performing queries on a cached query result?

Posted by James Thornton on 03/13/03 08:45 PM

Last summer, I worked with ColdFusion on a project, and it enabled you to cache a query result and then perform queries on the cached query.

I am working on a OpenACS PG site where I need to randomize the order of a huge query result (essentially pull an entire table into memory), cache it in memory so it remains constant throughout the day, and query the data for specific subsets.

At the moment we are not pulling the entire table or randomizing the order of the query data; instead, we are caching the results of small queries with util_memoize.

Is there a way to do this with ns_sets?

Thanks.

2: Re: Performing queries on a cached query result? (response to 1)

Posted by David Walker on 03/13/03 10:13 PM

You could probably do it with ns_sets if you only need to pull data based on a single column. (I haven't tried it)

Your best bet is probably to build a temp table and query that.

I had an idea that it might be pretty cool to embed a lightweight sql engine (PicoSQL is one I ran across) into an AOLServer module that connects to tables in memory for use with reasonably complex information caching such as query results.

3: Re: Performing queries on a cached query result? (response to 2)

Posted by Andrew Piskorski on 03/14/03 06:09 PM

You definitely do not want to use ns_set for large amounts of data. If you need to lookup any values in that ns_set by key name, the performance will suck very, very badly. Remember, time to find a single key by name in an ns_set scales linearly with ns_set size... Use a Tcl array or nsv instead.

Extending AOLserver and/or the Tcl Thread Extension's nsv support to include in memory relational tables, with full ACId guarantees (no D for Durability of course as it's all in memory only), would be very nice. Such a feature would have been very convenient in some of the projects I've done. I don't really care whether it has a SQL query interface or not, just as long as it has the requisite power.

But, I've never yet had a probject where I really needed such functionality, I could always get by just with plain old AOLserver nsvs, which is what I did. It merely would have been very, very nice to be able to conveniently and efficiently model that in-memory information relationally.

I hadn't heard of picoSQL before, that's interesting. It claims to be multi-user, but is it thread safe? Can it be conveniently used in-memory only rather than writing to disk?

Last I looked, SQLite only supports a single write user and so can't be used effectively from multiple threads. Metakit says it does not "offer true concurrent access", but I think Jean-Claude Wippler plans to make it do so eventually. It's also unclear how easily either of those could be used as in-memory databases. The SLIB Portable Scheme Library has it's own Relational Database implementation that might be useful to look at for anyone building this sort of thing.

4: Re: Performing queries on a cached query result? (response to 1)

Posted by Talli Somekh on 03/15/03 05:38 AM

James, it kind of sounds like you are looking for an OLAP solution, no? Check out Mondrian an open source OLAP tool. It's in Java, so you might be able to use nsJava to solve your needs.

Just a guess.

talli

5: Re: Performing queries on a cached query result? (response to 1)

Posted by James Thornton on 04/20/03 07:42 AM

Just found eXtremeDB: http://www.mcobject.com/softwaredocs.htm -- "eXtremeDB is an in-memory database system (IMDS), developed specifically for use in embedded applications such as industrial control and industrial test equipment software, telecommunications equipment software, etc."

There is a shared-memory version, and it is said to be thread-safe.

Speed: "With a 206-Mhz StrongARM processor, eXtremeDB performed 100,000 hash index lookup operations in 0.79 seconds."

FAQ: http://www.mcobject.com/faq.htm

6: Re: Performing queries on a cached query result? (response to 1)

Posted by Jean-Claude Wippler on 04/18/04 05:18 PM

Saw a reference to Metakit here. If you are looking for an in-memory DB for use from Tcl, MK supports this: open a db with the "-shared" flag and it becomes available in every Tcl interp. Also, in-memory DB's are supported by simply leaving off the filename ("mk::file open db").

Details at http://www.equi4.com/metakit.html

7: Metakit, SQLite, Ratcl, etc. (response to 6)

Posted by Andrew Piskorski on 04/18/04 08:03 PM

Jean-Claude, thanks for chiming in. I hadn't realized that Metakit could be used in-memory now. Could you give us a brief update on how well Metakit would scale for concurrent access by multiple threads within the same process?

Btw, SQLite can also be used either on-disk or in-memory now, and there's been discussion on the SQLite list lately about concurrency improvements. It sounds like they have a workable design for an MVCC model with table level locking for writers, but it's not yet clear whether it will ever be implemented or not.

See also the Ratcl thread, about Jean-Claude's new prototype package for relational algebra in Tcl (eventually other languages as well). It might ultimately be quite useful as a simple in-memory RDBMS, probably with read/write mutex locking per table/view.

With any sort of in-memory database, I figure table level locking is probably good enough, even if it's strictly pessimistic mutex locking, rather than the preferable MVCC model where locks block only other writers, never readers.

Note that table locking is probably always less scalable than simple Nsv key/value pairs, which you can always split up btween more and more mutex buckets. And of course nsv/tsv is simple and just the right thing in many cases - but when it's not, having a real in-memory RDBMS database would be very, very handy.

A long time ago, someone also pointed out Konstantin Knizhnik's FastDB and GigaBASE, which do have a sort of MVCC with one database-wide lock for writers, but IMNSHO are hardly "relational" at all, so I have trouble even imagining when or why I would ever want to use them. (No joins, unions, or foreign key references at all; instead each row is a C++ object and that object is allowed to have references - including dangling refereces - to other objects in other tables, so you are free do query that table, then try to do more queries chasing all the references. Yuck!)

I mentioned the Scheme SLIB relational database above, way back when. If I remember correctly, it is strictly one reader/write only, so is unlikely to be useful, even as an example, for the type of concurrent web-oriented applications we tend to talk about here.

8: Re: Performing queries on a cached query result? (response to 5)

Posted by Andrew Piskorski on 04/18/04 09:25 PM

eXtremeDB, mentioned above, is commercial of course. You have to fill out a registration form in order to read any Docs, so I did not bother. But their top level blurb says:

"eXtremeDB stores data in the exact form in which it is used by the application - no mapping a C data element to a relational representation, for example"

And their white paper appears to be a misguided attempt to tout the supposed superiority of their one record at a time "navigational API" over SQL-like set-based APIs. Therefore, yuck, this product is Not Interesting.