Forum OpenACS Q&A: Response to Article on ACS x Zope

[AL]
Just thought I'd post the item below, a detailed technical explanation of ZODB write performance issues from someone who, unlike me, actually KNOWS. I believe it confirms my previously unsubstantiated strong belief that an RDBM approach (whether ACS or Z SQL) is essential for certain write-intensive applications (including e-commerce, where it is also essential for other reasons such as audit trails and interfacing with internal systems via ODBC).

The article also supports my belief that "Z Patterns" is highly relevant both in the Zope ZODB/Z SQL context and in the context of data abstraction for ports of an RDBM based application platform such as ACS between different SQL dialects. I can't imagine how Z Patterns could possibly provide a useful isolation of application issues from fundamentally different persistent storage mechanisms such as ZODB, and a *specific* SQL RDBM (and also LDAP etc) without *also* being useful for managing multiple SQL dialects. (Note the reference to easy porting from ZODB to "Oracle or Sybase").

This of course does not resolve whether "Z Patterns" is ready for prime time yet. Whether it is or not, a project such as OpenACS, where supporting multiple SQL dialects is a *fundamental* issue, could only benefit from whatever effort it takes to study Z Patterns *very* closely, if only in the expectation of learning from their mistakes.

BTW the references to "relational storage" below are about use of ZODB stored in an RDBM instead of a flat file or simple dbm - often confused with the entirely different approach of using SQL to a separate RDBM for persistence without ZODB (the article is advocating the latter and explaining how Z Patterns allows easy common application development for either or both).

Seeya, Albert

[reply context snipped]

[Ty Sarna mailto:tsarna@endicor.com ]

Unfortunately, this doesn't deal with cases where the conflicting state
is contained in many objects (see note by PJE in the ZODB Wiki).

Also, there is a whole other area of difficulty for high-write-volume
ZODBs, which is the amount of IO that needs to be done. First, by
nature ZODB can't rewrite a single attribute of an object; it has to
rewrite the entire thing.
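
A rough illustration of the whole-object rewrite, using only the standard pickle module rather than ZODB itself. The dict below is a hypothetical stand-in for one object's state, and the field names are made up for the example:

```python
import pickle

# Hypothetical state of one stored object: a small counter plus a
# large attribute that never changes.
state = {"visits": 0, "notes": "x" * 10_000}

def record_size(obj_state):
    # A store that keeps one pickle per object has to write this many
    # bytes every time the object is saved.
    return len(pickle.dumps(obj_state))

before = record_size(state)
state["visits"] += 1           # change a single small attribute
after = record_size(state)

# The record that must be rewritten is still ~10K, even though only
# a few bytes of real information changed.
print(before, after)
```

The point is only that the unit of writing is the whole record, so a tiny change to a large object costs a large write.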

Indexing is also a bear from an IO perspective. First, BTrees currently
keep a count at each level, so every change to a BTree changes a node at
each level of the BTree. For a ZCatalog, there are a lot of BTrees
(something like 2n+4 for n indexes, I think -- don't quote me on that,
it's been a while), and each one changes (last I looked, every index was
updated even if the value indexed in a particular one hadn't changed.
This may have been improved since). Not only is this bad from a hotspot
point of view (always a conflict on the root node of the tree), but you
end up doing a *lot* of IO. During my experiments that led to
BerkeleyStorage, I was watching the Data.fs grow by 47K per transaction
for adding indexed objects of ~1K in size. Watching this with
tranalyzer, this turns out to be 1K of object, and 46K of updated BTree
pages :). Note that BerkeleyStorage only prevents the file from growing
that much -- it still has to do all that IO (in fact, it has to do ~2-3
times that much IO, due to the nature of BerkeleyDB). A relational
storage would have similar issues. For amount of IO done, FileStorage
is about as efficient as you can possibly be -- it's just that it trades
that off against space reclamation.
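
To make the "count at each level" point concrete, here is a toy model. It is a fixed-shape tree with a per-node count, not the real Zope BTrees code; the Node class, build() and insert() are names invented for this sketch. It just records which nodes one insert modifies:

```python
class Node:
    def __init__(self, children=()):
        self.children = list(children)  # empty for a leaf bucket
        self.keys = []                  # only leaves hold keys
        self.count = 0                  # number of keys at or below this node

def build(depth, fanout):
    # Build a full tree with `depth` interior levels above the leaves.
    if depth == 0:
        return Node()
    return Node(build(depth - 1, fanout) for _ in range(fanout))

def insert(root, key, depth, fanout):
    """Insert an integer key; return every node that was modified."""
    root.count += 1
    dirty = [root]
    node = root
    for level in range(depth):
        # Route by successive base-`fanout` digits so key order is kept.
        index = (key // fanout ** (depth - 1 - level)) % fanout
        node = node.children[index]
        node.count += 1                 # the per-level count changes too
        dirty.append(node)
    node.keys.append(key)
    return dirty

DEPTH, FANOUT = 3, 4
root = build(DEPTH, FANOUT)
dirty_sizes = []
touched_roots = 0
for key in (0, 21, 63):
    dirty = insert(root, key, DEPTH, FANOUT)
    dirty_sizes.append(len(dirty))
    touched_roots += dirty[0] is root

print(dirty_sizes, touched_roots)
```

Every insert dirties one node per level, and the root is in every dirty set, which is the hotspot described above. Multiply that by the number of trees in a catalog and the quoted 47K written for ~1K of object stops looking surprising.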

Also, with any kind of Berkeley or Relational storage, there is a second
hidden IO and storage penalty: you're storing a btree inside a btree. In
other words, the lower-level DB uses btrees to store your objects,
including interior nodes of the higher-level ZODB btree. Every interior
node of the ZODB Btree needs a leaf node (and supporting interior nodes)
in the DB's btrees. So you get taxed twice, on both I/O and storage
space used.

Not to discourage anyone from using ZODB, necessarily. There are a lot
of things it's fantastic for, and without a doubt ZODB is getting better
at handling higher write ratios. Over time there will be more and
more applications that previously would have required an external SQL or
other kind of database that can be done in ZODB instead. However, there
will also IMHO always be applications that ZODB just isn't as suitable
for. You have to think long and hard before committing to one or the
other. And then there's the worry of what happens if you chose wrong.

We were faced with exactly these issues, and the extremes of them, to
boot. We have a *large*, *very* high write ratio, heavily indexed
application based on ZPublisher/DTML that we'd like to port
to/replace with something Zope based. Yet we might need to make another
instance of this same type of application used by only a few people with
a small amount of data -- it would really suck to have to have
another instance of the same expensive database system to support a
minuscule amount of data, because everything was coded only with SQL in
mind.

This is what led ultimately to ZPatterns -- you can write applications
and not have to decide up front on ZODB or SQL. And you can change your
mind later. (Seen that TV commercial? Suddenly your online store is
selling a zillion items per month instead of the 1000 you planned for.
Oops!) You can even decide on an instance by instance basis. You
configure with ZODB for a small department or client, and Oracle or
Sybase for a huge one -- and the small guy doesn't have to pay for the
DB license and DBA! Since then, we've discovered a number of other
benefits to the model.
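
The general idea of deferring the storage decision can be sketched as follows. This is NOT the ZPatterns API; DictStore, SqlStore and rename_product are hypothetical names, a plain dict stands in for an object store, and in-memory SQLite stands in for a relational database:

```python
import sqlite3

class DictStore:
    """Stand-in for an object-database backend (hypothetical)."""
    def __init__(self):
        self._items = {}
    def put(self, sku, name):
        self._items[sku] = name
    def get(self, sku):
        return self._items.get(sku)

class SqlStore:
    """Stand-in for a relational backend, using in-memory SQLite."""
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT)")
    def put(self, sku, name):
        self._db.execute(
            "INSERT OR REPLACE INTO products (sku, name) VALUES (?, ?)",
            (sku, name))
        self._db.commit()
    def get(self, sku):
        row = self._db.execute(
            "SELECT name FROM products WHERE sku = ?", (sku,)).fetchone()
        return row[0] if row else None

def rename_product(store, sku, new_name):
    # Application logic: it knows only put/get, not what is behind them.
    old_name = store.get(sku)
    store.put(sku, new_name)
    return old_name

results = []
for store in (DictStore(), SqlStore()):   # chosen per instance
    store.put("sku-1", "widget")
    old_name = rename_product(store, "sku-1", "gadget")
    results.append((old_name, store.get("sku-1")))

print(results)
```

The application function runs unchanged against either backend, so the choice can be made per deployment and revisited later.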

Hmmm... I didn't intend to write a ZPatterns advertisement when I
started, honest! But this seems to have turned into one nonetheless :^)
