Forum OpenACS Q&A: Building a marketplace: NYSE, NASDAQ, CBOT
A client would like to build such an exchange that can handle offers to sell, offers to purchase as well as programmed trades (buy X if the price becomes less than Y, sell X if the price becomes greater than Z). This exchange is expected to meet a peak demand of approximately 500 transactions per second.
I would love to find out if this level of transaction can be (has been) handled with Oracle or PG, or if folks have approached this with specialized databases.
If this has been implemented with relational models, what do those models look like? How has the business logic been implemented? And how does one efficiently handle programmed trades? (I am blinded by a past experience which would suggest avoiding the relational world if possible, and tossing the whole thing into one giant Rete net.)
Any references? And if you have extensive experience building such markets, send me an email, I may have a job or contract for you in a few weeks.
And I could swear there was a reference here. If not here, then probably the PG hackers list.
Essentially the technique seemed to be to organize data not by tables but by denormalization into in-RAM structures based on knowledge of the types of queries that would be requested.
Not at all general purpose in the way an RDBMS is meant to be, and certainly not a replacement paradigm, but rather targetted for explicit needs, in this case fast queries against huge amounts of financial/market data.
This doesn't help much but at least perhaps will whet your appetite for searching further, both here and elsewhere.
Nice to hear from you, Jerry ... sounds like you're involved with some fairly heavy shit!
Most likely, you want to read up on TP (transaction processing) monitors which monitor the state of transactions across more than one database. They are usually pretty pricy. Your programmed trades sound like triggers implemented on every update to the price of the item.
Alternatively, put a big mainframe on the task with lots of fast disk.
there has been two discussions on high scalability maybe you find something valueable there...
When I was working with production systems, and hooking them into legacy systems and RDBM systems in realtime what struck me was the following.
Consider that you have an application with tons and tons of data. It may have a relatively simple data model. What is not so simple is the business logic, which if you could, could be modeled as many many many triggers. If A and B then update C. If C and not D then delete E.
Well that's a production system, but implemented in an RDBMS, and it's a damn shame the two communities could never speak to one another. Basically these productions systems have all the power of SQL and the SQL is embedded into rules. Another word for a rule would be a trigger. So a rete net has three components: a huge collection of data, a huge collection of IF->THEN rules, and an agenda. The agenda keeps a list of rules that have completely satisfied their IF clause and are ready to fire their THEN clause.
What a rete net does is take those hundreds and thousands of triggers and it precompiles, optimizes, and joins all the queries. So you have this huge network of condition nodes, A, ~A, B, C<2, D>Z, that appear in various rules. Whenever a new piece of data (object, record) enters the system, it trickles through this network. Sometimes that piece of data will find that it completes the match of one of these precomputed queries. When that happens, the "then" portion of that rule is added to the agenda. (There may be various ways to prioritize that agenda, depending perhaps on how complex the rule was, or how salient the system or developer thought the rule was.)
I believe a terrific application for it, might be in building a trading system with many, many different sorts of programmed trades. Trades that could apply depending on various arbitrage, option, or other sorts of conditions.
The problem with these systems is that they have always positioned themselves as better/different from an RDBMS, which is silly since folks that can buy Oracle can buy anything, and since all of these systems have to get their data from somewhere, and store their data back.
I have always wanted to see a data blade/plug in version of one of these things into a popular db: Oracle, DB/2, or PG. Anyway, it's cool stuff, and has usually turned into a way to poor your VC money down the hole slowly.
You have this huge network of condition fragments, and when a new piece of data (a record) comes in, it trickles through that network. Sometimes that piece of data finds that it and some data that has already entered the system completely satisifies the IF clause of a rule. When that happens, the THEN clause is added to the agenda. The THEN clause can access the specific records that matched the IF clause. Sometimes a piece of data results in many many rule matches being added to the agenda. Sometimes a piece of data doesn't satisfy a complete rule, and when that happens, it remains in the network in the queries that it helps partially match.
- Derivative Contracts
- Trading Packages
It would be nice to have this kind of Rete-algorithm capability in an RDBMS like Oracle or PostgreSQL. I don't know how triggers are implemented in Oracle exactly, but they definitely are not as sophisticated as rule-based systsms which use the Rete algorithm. I ran into this back when I was implementing an online bond trading site (which worked, but never even remotely approached the size of the NYSE!).
I forget the exact error, but Oracle does not allow recursive trigger execution. So, if you have programmed trades in the system, and you execute one order which changes system information (like "volume of this security traded today"), which could then cause another trade to execute - you have a problem. One can imagine ways to program around this limitation in Oracle, but a more general solution for such scenarios may call for something like the "coupling modes" introduced by HiPAC. At the time, our approach was simply to Not Do That; we avoided adding any features that would cause recursive trigger execution.
- Quotes: The market maker would enter new ask and offer prices, the quotes would then hit the ticker.
- Orders: Were "standing" orders for executions. If a quote triggered an order, it would send it to the execution processor.
- Executions: Could be received from the floor, from broker offices, or from our automated system. The trade was recorded to a "log" type file, and then it was broadcast to the ticker wire.
- Clearing: Handled offline at end off day. We literally made a tape to walk over to an IBM mainframe on the other side of the data center.
The quotation and trading system didn't have to hold a lot of information in memory: just the current states for all stocks. When a new quote or trade hit, it was logged and posted to a ticker. Oddly enough, we posted to SIAC (NY) and our own wire, then had code to filter out the trades when they came back over the NY wire. There were business rules about how far a quote could move (e.g. no more than a 1/4 point at a time on a downtick after an uptick, but maybe more after another downtick), and what constituted a valid trade.
The ordering system was handled by another group, so I have less experience with it. As far as my system was concerned, I fed it a ticker, and it sent back trades just like a broker terminal. But you can see how you eliminate some "networks" of triggers. The ordering system just responded to one message at a time per process thread as fast as messages were broadcast. There were some rules about how you could trade against a quote for a certain length of time (seconds), so you attached the quote to the trade order (and then the execution). That gave the system some flexibility in executing trades in a volatile market. Also, other entities could run their own automated ordering system if they liked, and they just posted trades to the execution system.
Clearing was scary, with big heavy IBM iron doing dark and mysterious things by people who wrote code in Cobol or JCL. They had impact on margin calls, and if things did not clear promptly or properly all sorts of expensive things could happen. Apparently PHLX nearly went bankrupt overnight (before my time there) because of a fouled tape. The exchange was financially responsible for losses it caused, and that night, if things hadn't recovered, it would have been millions.
CLIPS was intended was as a toolkit for building expert systems, and unfortunately, nothing I saw in the docs or FAQ mentioned integration with RDBMSs like Postgres. (There has definitely been research work into that sort of thing elsewhere, though.) It is also unclear how applicable CLIPS would be to non expert system production systems, like Jerry's large scale automated online exchange that started off this thread. Question 24 in the CLIPS FAQ, "Can CLIPS be used for real time applications?", implies that it may not scale to such demanding service loads, but mentions only the underlying Rete algorithm itself as a possibly bottleneck, so perhaps updating CLIPS to use newer algorithms like LEAPS would address any such problems.
Some details that probably won't help you on your current project but seemed interesting at the time. We were going to build two completely redundant sites hooked up with many and redundant T1s and VPNs -- the T1s were expected to have the least latency, but the VPNs were expected to be the most reliable as they are routable and supposed self healing.
IBM has a product like this for their Z servers, but it only works over about a 40 mile distance -- uh, that's good, they literally sell that as letting you have your main system in Manhattan and a backup in New Jersey. But it was too close for our needs (earthquake, hurricane, wars can still get to both sites.) So we had a client requirement to separate the sites by at least a thousand miles and an international border.
Over the course of about six months, while splashing their names all over the papers, Worldcom and Qwest's T1 prices dropped at least 70%. Soon they were so low I just wanted to buy them a case at a time and hand them out as office prices. T1s stretching across an ocean at a couple hundred a month.
We had tons of servers to offload and partition the problem (for instance we had two redundant machines whose only job was to be the syslog servers for everyone -- we felt that would make it harder for any potential hacker to eliminate his tracks and the cost for that is basically zero) but the system was actually fairly simple and not all that expensive until we started having SANs and the likes tossed at us to get the disk performance we thought we would need in order to satisfy the marketing projections.
Ignoring all the other servers and HSRP routers in the picture, with two completely redundant sites each with two database servers, and with one database server designated primary and the others receiving constant updates and keeping everything in sync, and with our entire database able to be kept in memory at one time, we eventually decided to throw away the disk drives. Our model could fit into memory, we didn't feel we would have a problem with locks, and we replicated the requests to the hot stand by machines. And it turned out that more than one productized real time database engine does the same thing. Run it without ACID on disk and it ticks over at 50K transactions a second on good but not great hardware and then run it with ACID and disks and it comes down to a few hundred transactions a second.
It was a real choice project, and when it was over, I'd never want another (well not for a couple of weeks.)
So we tossed out the SAN and then went the RAIDs and basically we were planning on building a specialized application engine that kept sync locally with his buddy and kept sync remotely over the VPN and T1s. And by paying enough for high availability hardware we felt our replication streams would do the job for us and give us the five nines we sough. And sticking on our business hats, we realized that well, in our domain, if worse came to worse during a failover and someone's transaction was lost (and we didn't see how that could be), that that was actually a cost that occurred infrequently enough to be covered in customer support's budget. I don't think that's a total copout, I actually think that given the right domain, it's a reasonable line item just as shrinkage is.
Now our lead enginer, he was brilliant and outstanding in every way and he was a good man too. Humanitarian man, man of wit, of humor. We worshiped the man, like a god, and followed every order however ridiculous.
October 9th, 0430 hours, sector 13420, head 15: I watched our server crawl along the edge of a straight razor. That's my dream. That's my nightmare. Crawling, slithering, along the edge of a straight razor, and surviving.So I'd like to say this worked and the continent did tip over, but what I think happened is that someone realized that our methods, marketing and engineering had become unsound and very obviously insane.
They, we, were out there operating without any decent restraint. Totally beyond the pale of any acceptable engineering and business conduct. And he was still in the field commanding his troops.
And so the project was terminated.
RADIO "PBR Street Gang, this is Almighty, over... This is Almighty, standing by, over. This is Almighty, how do you copy, over..."The horror. The horror..."
What about using solid state disks? You know, those devises that look like disks to the OS but are actually battery backed RAM. Wouldn't that have let you keep the Durability part of ACID, and yet maintain the much-faster-than-disks transaction rates?
So did the design end up using Rete nets in any way?
What real time database were you using, and how did you like it?
We designed this with fixed capacities in place (large capacity, but unchanging without restarting the application.) Like a reservoir, it has a fixed capacity or else it overflows. It was required to have a huge capacity, but it was fixed, and knowing that number let us determine how large to create and pre-allocate various tables that an RDBMS would not demand you create or pre-allocate.
We had A, C, I in memory and hard-coded in the code, and the D was to bubble up from the rest of the system we would put in place to meet the required system uptime. I believe this is analogous to how many OODBM systems can work for you.
In a non real-time basis, as system demand permitted, sweeps through our object model were to stream into a traditional database to permit analysis and reporting and things like that and to clean up memory.