Forum OpenACS Q&A: Response to Article on ACS x Zope - DB abstraction

Collapse
Posted by Albert Langer on
[BA]
Everyone keeps talking about DB abstraction like it's not big deal, as if it
were something we are simply choosing not to work on.

[AL]
Not everyone. In the first part of the message you are responding to I said:

"The real issue is the data abstraction layer. There's a LOT more to porting
between different RDBMs than just the drivers."

[BA]
Let's put it very simply: Oracle has outer joins. Postgres does not. If you
have some sort of DB abstraction where you supposedly have nothing to change
when you switch from Oracle to Postgres, it means one of three things:

your DB abstraction never uses outer joins, so you're less efficient on
Oracle, and in fact you're stuck with the lowest common denominator of SQL.

your Postgres query breaks because it doesn't know outer joins.

you've written a complete SQL replacement that encompasses all the concepts of
SQL (tables, joins, references, aggregates) that gets automagically transformed
to the correct underlying RDBMS-specific version of SQL.

[AL]
Let's add some further complications:

a) Postgresql is not just an "almost SQL92" RDBM but also an ORDBM which has inheritance, can create some amazing writeable views using "Rules" and can dynamically load data types and functions, aggregates, indexes and operators that can move a lot of stuff normally done in middleware right into the database engine where it can be done more efficiently. Actually using that stuff could eliminate a lot of middleware business rules and result in a far simpler set of queries and updates much closer to the application domain model than is possible using Oracle. Actually using those features would require a significant (although simplifying) restructure of the application itself. This would break easy upgrading to new releases of ACS classic.

b) Anyone optimizing a general web ecommerce package for a particular site will not be able to do much using just DBA techniques with indexing and tablespaces because there may not be many disk spindles to play with available. Small sites that cannot follow Oracle recommendations for more than a dozen disk spindles will still need to do some efficiency optimization based on the actual characteristics of their sites (actual local ratios of browsing, products, categories, subcategories, sessions, users, lookups, discounts, purchases etc). This will require local SQL tuning using "Explain" AFTER the tuning done by the module developers.

Thus when going for optimum efficiency, there is NO WAY to avoid using different SQL code, not just for different RDBMs, but even for different sites using the same RDBM.

On the other hand, many simple queries can even use "lowest common denominator SQL" independently of particular dialects, without much impact on efficiency. (Especially if they are really working on "views" created in DDL and stored procedures or rules optimized for each RDBM).

The issue then is what can a data abstraction layer do to make the maintenance of multiple versions of SQL DDL and DML as painless as possible for SQL developers and make it as invisible as possible to non-SQL developers working on the UI etc. It certainly cannot cannot completely eliminate the need for multiple SQL versions. But perhaps it could dramatically simplify the development and maintenance of those versions and reuse of their (considerable) commonalities?

[BA]
I can't say for sure that I know how Zope handles it, but my bet is on the
first solution: that you're left with common-denominator SQL power. Sure, it's
functional, but I'd be interested to see the actual queries performed and the
resulting performance.

[AL]
I don't fully understand how Zope handles it yet either, but I can confirm that you would lose that bet.

As I mentioned a week ago in this thread:

"2. For access to RDBMs databases via SQL, Zope provides pretty much exactly the
sort of data abstraction facilities that Ben was talking about in:

https://openacs.org/doc/sql-abstraction.html
"

The simple version just provides convenience for plugging parameters into SQL dialect specific templates. See the Z SQL documentation:

http://www.zope.org/Documentation/Guides/ZSQL

This is however a big advance on what ACS will be able to do, even with the proposed sql-abstraction, because people needing to edit SQL scripts and people needing to edit UI interfaces can be easily given fine grained permissions to do so from their web browsers without being able to screw everything else up easily.

The next level currently requires some vOOdOO to understand "ZPatterns". See especially the section on "Dynamic Data Objects" in the contents map at:

http://www.zope.org/Members/pje/Wikis/ZPatterns/HomePage/map

Suffice to say that it is NOT one of the three options that are possible without "very OO database OO" but a fourth option made possible by OO. Yes, you still need RDBM specific (or site specific) SQL for optimization, because there is NO WAY that can be avoided. But it hides this behind umpteen layers of framework code so that it is completely insulated from the application domain queries and updates and UI templates written by non-SQL developers. This makes life much harder for people implementing the framework (ie core Zope developers), just as life is hard for people writing the internals of Postgresql. But it makes life much easier for people customizing UIs, people customizing queries and updates based on application domain models and for people optimizing RDBMs engine access, because they can all keep out of each other's way. (If it's the same bunny doing them all, at least they can think about just one thing at a time).

[BA]
There is a reason for using an RDBMS, and that is the efficient storage and
retrieval of data. Adding abstraction on top of that can only make things less
efficient. It's not an easy compromise to make to simply give up efficiency for
compatibility.

[AL]
If "abstraction" meant least common denominator SQL, that would be true. If it just means that you are passing through multiple layers of OO framework on the way to the RDBM specific SQL, the overhead is insignificant compared with the benefits in portability and ease of development. (Especially when the framework is running in a single process and not being marshalled and unmarshalled continually like COM).

[BA]
OpenACS is trying to see how best to approach it, but the answer
is far from simple.

[AL] The answer is so far from simple that only open source projects could tackle it 😉

Data abstraction isn't an OpenACS problem, OpenACS is just encountering that common problem.

OpenACS (and ACS) people know far more about RDBMs than most Zopers ever will. (The discussion of "Indexing Agents" in the reference above strikes even me with my very limited RDBM understanding as somewhat naive).

Zopers and other OO experts know far more about abstraction than most RDBM experts ever will.

I think you should get together on this one. It's HARD.