Forum OpenACS Development: Functions in .xql files...

Posted by Jeff Davis on
As a follow-up to the IRC discussion of functions which prevent creating generic queries, here is a rough count of the top ten functions in -postgresql.xql files:
  140 acs_permission__permission_p
  119 acs_object__name
   37 content_item__get_title
   31 site_node__url
   27 content_item__get_path
   23 person__name
   23 content_item__get_live_revision
   23 content_item__get_latest_revision
   19 content_item__set_live_revision
   18 ttracker_option__option_name
acs_permission__permission_p and Dirk's favourite acs_object__name are the biggest culprits in the way of a more generic future. We should also consider removing the site_node__url calls since, as Peter pointed out, it's an expensive call, especially when you consider it's always cached in memory anyway.

Something to think about I guess.

Posted by Don Baccus on
The content_item ones fall into the class of methods that follow from thinking that

content_item__get_live_revision(item_id)

is somehow more "abstract" than

content_item.live_revision

(i.e. referencing the column directly)

The queries would all run faster if code just referenced the columns directly.  They're never going away, no meaningful information is being hidden, etc etc.
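
For illustration, a minimal sketch of the two styles in an OpenACS query (cr_items and its live_revision column are the real content repository names; the statement names are made up, and item_id is assumed to be set in the caller):

    # function-call style: needs separate Oracle and PG .xql entries
    set live_revision [db_string get_live_revision {
        select content_item__get_live_revision(:item_id)
    }]

    # direct column reference: identical on Oracle and PG, no PL call
    set live_revision [db_string get_live_revision {
        select live_revision from cr_items where item_id = :item_id
    }]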

Now, in C++ one can hide the underlying variable from the user and control access via "get/set_" functions.  If you don't provide a "set_foo" function and "foo" is private, you can guarantee that only your class implementation can change the value of foo.

So there really is some value in this approach in C++ (and certain other languages).

But SQL doesn't support "private" columns in a table so we're really getting nothing but slower queries and needless Oracle/PG specific .xql entries out of this approach.

That's just my somewhat strong opinion, how do others feel?

Posted by Tilmann Singer on
I totally agree. Until now I suspected that content_item__get_live_revision did something more, e.g. checking the publish_status and returning null if it's not live, but I just looked it up and it doesn't - it just retrieves the live_revision column value. It's fair enough after all to assume that an item is live as soon as it has an entry in live_revision.
Posted by Don Baccus on
There's a function named "get_best_revision" or something like that. It returns the live revision, or the latest revision if there is no live revision. Ignoring the fact that actually using it in a UI page would probably confuse the hell out of an editor, it is at least an example of a function that does something!
Posted by Janine Ohmer on
"That's just my somewhat strong opinion, how do others feel?"

Well, following a call stack down 5 layers just to eventually return one lousy integer used to drive me crazy in my C++ coding days, too. IMHO one of the things required to be a really good OO programmer is knowing when to break the rules, because sometimes all they do is get in the way. You can hope that the compiler is optimizing out all those extra function calls, but you don't really *know* that it is...

So that's my long-winded way of saying yes, I agree - gratuitous use of abstraction is worse than none at all.

Posted by Jun Yamog on
Hi,

Those db API calls are really expensive, although there are some queries where they make sense. If the result set is only one row and I need to use them in the select clause, I use them.

Anyway, the way I think we can do abstraction is on the Tcl level. For example, in the CMS I am working on there is a Tcl proc "get_item -revision live". This provides abstraction for developers who have less knowledge of the CR; more advanced developers can go and query the data model directly. Another advantage of abstraction, whether in Tcl or SQL, is that we can fix optimization problems centrally. Say the "get_item" Tcl proc had a performance problem: when I fix its query, things get fixed everywhere, without hunting down queries with grep.
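
A rough sketch of what such a proc might look like (the proc name and -revision switch are Jun's; the body here is just an illustration against the standard cr_items/cr_revisions tables):

    ad_proc get_item {
        -item_id:required
        {-revision "live"}
    } {
        Return the title of an item's live or latest revision.
    } {
        set rev_col [expr {$revision eq "live" ? "live_revision" : "latest_revision"}]
        return [db_string get_item_title "
            select r.title
              from cr_items i, cr_revisions r
             where i.item_id = :item_id
               and r.revision_id = i.$rev_col
        "]
    }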

CCM also tries to provide this abstraction at the Java level; for example, it's possible to have "ContentItem item = ContentItem.getLive(itemID);"

Posted by Don Baccus on
Jun's saying something I was arguing a long time ago when Richard Li and friends were convinced I was an asshole and an idiot (they were half right :)  More Tcl API abstraction is what is wanted and needed, no doubt about it.  Tcl is the level we work at when we write applications on top of the datamodel.

Janine ... C++ includes the "inline" function modifier so the kind of efficiency hit one sees in SQL need not be there.  So one can choose to be religious without paying for it in cycles.  Not true in our space (I'm just saying that the approach overall makes more sense in C++ and no sense at all in our world, but "more sense" doesn't necessarily mean "sensible" :)  If you force people to do it in C++ it doesn't cost them execution speed due to "inline" being available (unless they're too stupid to use it) and does leave the door open to later more tightly controlling who changes the data.

But in our SQL world you get no control and just lose efficiency.  No gain, just pain.  Hey, that's not the prescription for building a stronger body!  Damn!

Posted by Janine Ohmer on
"Janine ... C++ includes the 'inline' function modifier so the kind of efficiency hit one sees in SQL need not be there."

Hmm... I don't recall seeing it used. This was code written by longtime C programmers who were rewriting everything in C++ mainly because it was the buzzword of the day, and they did everything pretty much the way the docs for beginners said to.

However, even if "inline" had been used it would have been just as annoying to debug. I run into this in OACS, too - going from function to function where all each one does is return the value of another function. Bleah! :)

Posted by Jun Yamog on
Hi Don,

Sorry if I did not understand your comment. Anyway, which direction would OACS go: higher (more Tcl) or lower (less Tcl)?

There are some inefficiencies with the direction I am going, more Tcl. For example, there are places where I had to call two Tcl procs where normally I could just run a single SQL query with a join. But I decided not to optimize, since it's not a bottleneck page. It also seems more readable, as it's clearer what I am trying to do, rather than writing a query that some junior-level developer will not figure out at once.

I am not yet fully convinced by all this abstraction, although I am inclined toward it. Maybe it's because I have experimented with it, or maybe due to my CCM experience, or both. I am still open to where OpenACS will go. It's really fascinating using two toolkits that came from the same ACS heritage.

I think all this abstraction thingy was born as a side effect of the more complex data model. I sure miss the 3.x days when you could just query simple tables :)

Posted by Peter Marklund on

I agree with Don here and feel we need to make more calls on the Tcl level. We should make the Tcl API our primary means of abstracting away underlying implementation and encapsulating business logic.

Using a Tcl API becomes exceedingly important as we use more and more caching and Tcl callbacks. Bypass the Tcl API, and callbacks and caching will not work.

One of the most powerful patterns in software development that I have come across is the Observer pattern (with a publisher, a subscriber, events, and callbacks). By making the OpenACS kernel publish a set of callbacks we in effect give applications the power to customize the behaviour of the kernel without forking it.

We recently added callbacks related to installation, instantiation, and mounting of packages that are already proving very useful. I could imagine great value in having other types of callbacks such as before-request, after-request, after-user-instantiate, after-group-instantiate, after-user-register etc. etc.
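
In pseudo-Tcl the pattern boils down to something like this (these proc names and the nsv array are made up purely for illustration; this is not the actual kernel API):

    # register interest in an event
    proc callback_subscribe { event proc_name } {
        nsv_lappend callbacks $event $proc_name
    }

    # fire an event: the kernel publishes, subscribed packages' procs run
    proc callback_publish { event args } {
        if { [nsv_exists callbacks $event] } {
            foreach proc_name [nsv_get callbacks $event] {
                eval $proc_name $args
            }
        }
    }

    # e.g. a package could hook user registration without forking the kernel
    callback_subscribe after-user-register my_package::welcome_email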

Using callbacks can help us achieve upgradability, something that I feel strongly about these days, and I wrote a little comment about it in my Blog this morning.

Posted by Don Baccus on
Yes, I'm a fan of callbacks, too - as long as one doesn't go too hog wild.  Nothing to excess ...

As far as caching ... I really need to send you the caching db_* API :)  You can cache queries as easily as caching Tcl API calls.  The major problem in each case is exactly the same:

keeping the cache synched with the db.

It's the interdependencies between tables and stuff like that which makes this difficult.  Caching on a query string is just as easy as caching on a Tcl API call signature and just as likely to be wrong when an underlying table changes ...
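
Something along these lines, for instance (util_memoize is the existing cache-by-script proc; the wrapper name is made up, and it only works for queries whose values are baked into the statement rather than bound from the caller's frame):

    # hypothetical cached variant of db_string: the cache key is the script
    # itself, so identical queries share an entry -- and, as noted above,
    # that entry silently goes stale when an underlying table changes.
    proc db_string_cached { statement_name sql {max_age 600} } {
        return [util_memoize [list db_string $statement_name $sql] $max_age]
    }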

Which brings me to Jun's question.  I don't favor breaking a logical join into multiple Tcl calls.  That just won't scale (which you recognize by pointing out that your example didn't need to).

No ... I'm thinking about Tcl API that's high-level enough to do useful work for you.  For instance to manage the insertion of content repository data in an easy-to-use way that doesn't require you to micromanage the CR.  The upload file Tcl API I wrote for the CR doesn't work for types you've extended yourself, but it is an example of useful CR Tcl API for a simple cr_revision.

The new Tcl-based service contract definitions that hide the SQL stuff from the client are a great example of what I'm talking about.  Make the API writer wrestle with all the PG/Oracle differences and free the application package author from that thankless task.

ad_form's ability to manage object_ids for the caller is another (though much less ambitious) example of hiding db complexity from the client.
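
Roughly, the :key element lets ad_form allocate and carry the id so the page author never touches the sequence. A simplified sketch (the "note" form and notes table here are invented for illustration):

    ad_form -name note -form {
        {note_id:key}
        {title:text {label "Title"}}
    } -new_data {
        db_dml insert_note {
            insert into notes (note_id, title) values (:note_id, :title)
        }
    } -after_submit {
        ad_returnredirect "."
        ad_script_abort
    }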

The CMS's ability to generate forms from content types is another example of a high-level API that should be really useful once someone figures out how to make it easy to use (and to integrate with ad_form?  Oh boy!)

We need to think about high-level API that makes it easy to cobble together packages quickly.

Posted by Jun Yamog on
Hi Don,

Thanks, I got a better understanding of what you want.

Regarding the proc to upload a file: well, it's already in a Tcl proc, and you are right that this is a good place to use Tcl, since I can detect the mime type and insert a new one if needed, the whole upload is in a transaction, etc. So the programmer calls a Tcl proc and passes in the title, upload file name, etc. It's actually close to the file-storage upload thingy, but instead of using PL/SQL and Tcl, I used Tcl and simple queries and inserts.
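
Roughly the shape described above (the proc name is illustrative, not an actual API; cr_mime_types is the real CR table, and ns_guesstype is the AOLserver helper that guesses a mime type from a file name):

    ad_proc my_upload_file {
        -title:required
        -upload_file:required
    } {
        Register the file's mime type if needed, then create the content
        item and revision rows, all inside one transaction.
    } {
        set mime_type [ns_guesstype $upload_file]
        db_transaction {
            if { ![db_0or1row mime_exists {
                select 1 as known_p from cr_mime_types where mime_type = :mime_type
            }] } {
                db_dml insert_mime_type {
                    insert into cr_mime_types (label, mime_type)
                    values (:mime_type, :mime_type)
                }
            }
            # ... simple inserts into cr_items / cr_revisions go here ...
        }
    }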

The Tcl-based service contract is something that will be a good next thing to do (although I am still a long way from that). Right now it's on a primitive SC. The packages are bound by a CR folder id. Or maybe not bound, I can't quite explain it.

The CMS's ability to create the form depending on the content type is really great. Deds also did the same for his classifieds module, although neither made use of ad_form, the standard form builder. I think it could be possible using ad_form -extend. There should be a standard way of storing which widget to use for a field; I think some packages already store the widget type (I forget where the table is). Something like "foo_field:text(auto)" in ad_form might be helpful: an auto widget would look in the db to see which widget foo_field is supposed to use. Or something like that. At least the design should be easy to use, and also easy to override in case it really needs to be.
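
ad_form -extend can already add elements to a declared form at runtime, so the widget lookup could happen in Tcl before extending it, something like this (the content_type_widgets table and its columns are invented here just to show the idea; content_type is assumed to be set to the item's type):

    # build one extra form element per stored attribute, using whatever
    # widget the content type says that attribute should get
    db_foreach get_widgets {
        select attribute_name, widget
          from content_type_widgets
         where content_type = :content_type
    } {
        ad_form -extend -name item \
            -form [list [list "${attribute_name}:text($widget)" [list label $attribute_name]]]
    }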