Forum OpenACS Q&A: Response to How would one handle 12,000 db backed request per second?

12,000 hits in a second.  You say each query involves three tables, presumably a join?  How many rows are going to be returned?

We're talking about 83 MICROSECONDS per query, dude (one second divided by 12,000).  On my Celeron 500 laptop, PG executes relatively simple queries returning one or a few rows in the ten-to-forty-millisecond range.  Oracle does the same.

That's two orders of magnitude too slow for what you want to do.  Even a 16-way Sun monster's going to leave you about an order of magnitude short.
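Here's the back-of-the-envelope arithmetic, in case you want to check it yourself (the 10 and 40 ms latencies are just my rough laptop numbers from above, not benchmarks):

    # Time-budget arithmetic for 12,000 queries/second.
    # The query latencies below are the rough laptop figures quoted
    # above, not benchmark results.
    target_qps = 12000
    budget_us = 1e6 / target_qps        # microseconds allowed per query
    print("budget per query: %.0f microseconds" % budget_us)       # ~83

    for latency_ms in (10, 40):         # observed simple-query latency
        shortfall = (latency_ms * 1000.0) / budget_us
        print("%d ms per query is about %.0fx too slow" % (latency_ms, shortfall))
    # 10 ms -> ~120x short, 40 ms -> ~480x short: roughly two orders
    # of magnitude, as claimed above.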

And frequent updating will hose you.  Updating requires disk writes, because the REDO log must be physically, not logically, written.  Then every few minutes the RDBMS will checkpoint to disk, causing a filesystem sync or (in the case of Oracle on raw devices) buffers being flushed physically to disk.
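To put rough numbers on the update problem, here's a sketch assuming a disk that needs 5-10 ms per physical write (ordinary rotational-disk latency) and one log flush per committed transaction:

    # Why synchronous REDO-log writes cap update throughput.
    # Assumptions: ~5-10 ms per physical disk write (seek plus
    # rotational latency), one log flush per commit.
    for write_ms in (5, 10):
        commits_per_sec = 1000.0 / write_ms
        print("%d ms per log write -> ~%d commits/second" % (write_ms, commits_per_sec))
    # That's ~100-200 commits/second per log device, against a
    # 12,000/second target.  Group commit batches transactions per
    # flush, but you're still bounded by physical write latency.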

At the risk of offending you, something at this scale requires an expert: someone who already knows the answers, or who can tell you whether it is even possible with any combination of general-purpose RDBMS software and hardware.  If it is possible, a novice is unlikely to get it right.

And in this context I'd say that everyone who's posted here (not just you) is a rank novice.  This is way off the map of our experience.

I'm going to tell you something that I don't know absolutely for certain, but which I think would be a fairly safe bet if I could find someone to take me up on it:

I bet that there are no general-purpose RDBMS installations out there processing transactions at this pace.  Banks have their own transaction-processing systems, SABRE (used by airlines) is a custom-built database system, and so on.  I think you're up in the realm of custom-built solutions.

I know there's at least one SQL-based but non-RDBMS system out there designed to handle massive hit rates of financial queries for the stock market.  I forget the name; it's designed for data mining (i.e., querying, not updating or inserting new data).  They build a massive data structure in RAM that essentially partitions the data both by row and by column.  The approach is very specialized and not appropriate for general RDBMS activity.  You don't do updates in this context, at least not frequent ones with ACID protection, etc.
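To give a flavor of what I mean by partitioning by row and by column, here's a toy sketch of my own.  It illustrates the general idea only; it is not that product's actual design:

    # Toy sketch: an in-memory, column-partitioned store for read-only
    # queries.  My own illustration of the general idea, not any real
    # product's design.  No updates, no ACID -- fast scans are the point.
    class ColumnStore:
        def __init__(self, columns):
            # One list per column.  A real system would also hash-
            # partition rows across segments or machines (the "by row"
            # half of the idea).
            self.cols = dict((name, []) for name in columns)
            self.n_rows = 0

        def insert(self, row):
            # Bulk load only; no logging, no transactions.
            for name in self.cols:
                self.cols[name].append(row[name])
            self.n_rows += 1

        def scan(self, column, predicate):
            # Touch only the one column the query needs; the other
            # columns stay cold.  This is where the speed comes from.
            values = self.cols[column]
            return [i for i in range(self.n_rows) if predicate(values[i])]

    store = ColumnStore(["symbol", "price", "volume"])
    store.insert({"symbol": "ORCL", "price": 14.2, "volume": 1000})
    store.insert({"symbol": "SUNW", "price": 9.8, "volume": 500})
    print(store.scan("price", lambda p: p > 10.0))    # -> [0]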

This is the kind of solution space you're talking about if you're really serious about servicing that many queries in a second.