Forum OpenACS Improvement Proposals (TIPs): TIP #71 (Rejected): Add Jon Griffin's new Paginator Procs to Core Release

*Proposal*

a) Add Jon Griffin's new paginator code to future releases of OpenACS. Information available here:
http://jongriffin.com/static/openacs/paginate/

b) Add support for this new paginator into List Builder.

*Reason*

a)
  1) It works really well.
  2) The current paginator does not.
  3) Is a lightweight alternative to list builder for large data sets where simple pagination and navigation is all that is required.

b)
  1) Pagination in List Builder at present does not work well.
  2) Supporting this within List Builder would help to avoid divergence in 'look and feel' and general display code structure which would increase workload later.

*Implementation*

Simply copy:

tools-procs.tcl
tools-procs-oracle.xql
tools-procs-postgresql.xql
tools-procs.xql

into ../acs-tcl/tcl/

...so that the procs and queries are sourced at server start-up.

Just to clarify: The section under Implementation relates to part a). I am not clever enough to suggest how to do b). Perhaps Lars could point me in the right direction (hint hint!).
This might be a good solution to our pagination problem of Forums.
Just to add my 2 cents. I don't have much time at all for doing this, and I think I have some newer code that I merged with Brad Duell's.

Once someone gets it working with the existing code I can make sure all the parts are there.

The most important thing is to get my query procs which do all the work in the DB and I can do paginated sorts of 20K pages in under one sec on both PG and Oracle.

The rest of the stuff, while nice if it is easy to add, can be added later or just a switch.

Again, the DB procs for paginating results sets in the DB will make your paginations much quicker.

Just for clarity: you are suggesting to put this on HEAD or oacs-5-2, right?

Also, you are volunteering to do the work, right?

Jon: Where's the code you've merged with Brad's, is that the current set of files on your site?

Jade: I'm going to look into this with forums starting tomorrow, if all goes well (I'm busy today).

Everyone: why are e-mail responses broken on this box?  I remember seeing some comments regarding qmail earlier but ignored them ...

One of OpenACS's recurring issues is that it offers several different approaches to doing the same thing. In some anchitectures this is considered a strength, "there's more than one way," but I think that this diversity hurts OpenACS. In practice it has lead to a division of coding, bug fixing, and documentation efforts, as well as a palimpsest of many different coding techniques embedded in the Core packages, so that there is rarely more than one "standard" example of any particular technique. We've made progress merging email, photo album, bugs, and other duplicate packages. I see a big benefit in having a single standard way to do lists that includes sorting, filtering, paginating, bulk operations, and hooks for single-record insert/delete/update/view. Does this proposal take us further away from that?

How can we provide examples in the core packages of correct usage of the technique

Who will write documentation showing developers when to use multirow vs pagination vs list-template?

In the pagination approach, how does one sort, filter, or do bulk operations? Who will add Jon's documentation of his procs to the standard docs?

Joel,

To be complete the new pagination procs should replace the existing ones in acs-templating and work with list-builder.

"There should be one -- and pereferrably only one -- obvious way to do it." --Tim Peters, The Zen of Python

Pagination in list-builder should be fixed, perhaps by integrating what makes Jon's pagination so good. Otherwise we'll fall under the confusing "use list-builder because it's nice, but use this other (jon's pagination) way if you have a large data set".

-Roberto

The proposal is in two parts for that reason. The intention is to propose that we incorporate Jon's new work by making it accessible via list-builder. In the meantime though, anyone who has used these procs for a specific application will have their site break every time they do an 'Update from Repository' because the procs are not in ../acs-tcl/tcl/

I am proposing a two stage process.

I'm very leery of two-stage processes where step one goes away from our long-term direction and step two doesn't have anybody signed up to do the work or a clear plan on how to do the work.
Jon's paginator are superior to list-builder paginator, then should replace the paginator method implemented on the list-builder, as simple as this, it's a matter of need for scalability (we have used those procs on production with excellent results).
I vote to approve. I would like some clarification of who will do the work, and when it's planned to be completed, however. I think this is an important addition.
Fools rush in where angels fear to tread......

It is not a lack of willingness that makes me hesitate to volunteer to do the integration, purely a lack of knowledge! I will agree do the work, but I have no idea at all how long it will take me. I have to learn how to actually use list builder first, then work out how the pagination works and then try to modify it in line with Jon's code (which looks elegant and simple at first sight).

But 'in at the deep end' is often the best way!

It needs to be done since the current situation is no good for anyone.

Regards
Richard

I approve an offer myself to integrate it to list-builder (during december) for HEAD.
RE: breaking existing code...

When merging Jon's code, why not rename it (or place it in a namespace) such that there will be no conflict between the existing code (not in the core) and the new core code? At the same time deprecate the non-core code so that developers are aware that it will be removed at some point. In the mean time, no code should break.

I think the moment we move Jon's code into the core, the naming will change from tools:: anyway, if we keep our naming conventions.

With the willingness of Rocal and Richard guaranteed to work on it for 5.2, I approve (for 5.2).

I have been studying the list-builder source code and trying to gain a basic understanding of what happens. I think Roberto is correct when he says that we need to extract what is good about Jon's paginator and incorporate that into list builder. Jon also mentioned that the important bit is the db procs from his work.

However, looking throught the list builder it seems to me that      its job is to poke around in the variable scope of the calling tcl and the display template to facilitate user modification of the underlying query. It seems from the comments that the intention was always to paginate in the db and so I wonder whether our supposed problem stems from a misuse of list builder rather than anything wrong with the code. If so we need to establish whether list builder builds queries using the postgres 'offset' and 'limit' keywords and then progress accordingly.

If it does then we should try to find where the performance issues or bugs stem from.

If it does not we need to change the paginator procs to support this way of doing it.

So in essence I gues I am now recommending voting against my own proposal! Sorry.

I will do some more work and then propose something more suitable.

Regards
Richard

Richard ... while openacs.org was down for upgrade this weekend I worked on pagination for the forums package, using list builder, and now understand it thoroughly.

I also understand how to make it paginate very quickly, and think I can hack it up to do so without having to modify any of the client code.

I think this will make use of Jon's pagination code unnecessary, i.e. we'll want to recommend use of list builder and deprecate use of the existing paginator.

The reason it's so slow now is that it uses the paginator but inefficiently.  Rather than go into endless details I think I'll just fix it.

I would actually like to investigate writing a version of list builder that combines the multirow functionality afterwards. The current mechanism - call a list builder proc which then sets global structures for magic tcl procs you call embedded in .xql files to modify your query in ways that fit the needs of the listtemplate tag - seems like a gawdawful kludge to me.  By combining the functionality it could, among other things, do reasonably efficient pagination using the display query itself wrapped appropriately in LIMIT/OFFSET rather than require a separate prepare-for-pagination query as is necessary now.

Don,

I knew that you were working on this (secretly hoped you would post here!!).

I can't tell you how relieved I am to hear that you have worked through this issue because it would have taken me many days and a lot of experimentation before I fully understood it. I had concluded that there seemed to be a lot of surplus munching going on and was thinking along the lines of proposing to do what you have just suggested.

From my reading so far, even if you specified all the page parameters correctly, the paginator was still asking postgres to run through the entire unfiltered dataset (actually more than once) before arranging the required page.

It seems to me that the best way to paginate data is to ask postgres to do it (as in Jon's code). We don't need to import Jon's code to achieve this but as you say, we would need to make sure that the result of the user clicking on a url is a query that returns only the data required for the selected page, and in the correct order - and nothing more.

I have used 'Limit' and 'Offset' for paginating a 3000 row dataset and it is a really fast way to restrict the results. So any means of exposing this functionality through list builder gets my vote.

If there is anything that I can do to help please let me know.

Regards
Richard

The paginator, which listbuilder is using, requires you to first fill a cache of row ids for the entire data structure being displayed.  You then query display information using specific row ids from that cache.

Filling the cache is very slow if you have a large datastructure, for instance 5,600 threads in a forum like we do in the openacs Q&A forum.  The query itself's not too bad, it's pulling the rows out of the rowset and stuffing them into an nsv cache variable that's taking most of the time.

The design philosphy behind the paginator seems to be "pay a fairly steep up-front cost so accessing every page afterwards costs roughly the same".

This isn't good.  Normally people are going to reference more recent threads, blog entries, bugs sorted by some criteria, etc.  LIMIT/OFFSET and Oracle ROWNUM tricks are faster for early entries in the rowset than those at the end, but that's OK given common usage patterns.  Besides both Oracle and PG implement these constructs quite well, we don't really care if accessing the very first thread in the openacs Q&A forum takes a couple of tenths of a second longer than accessing the most recent one.

To put it bluntly the paginator's a bit evil and I'll just rewrite list builder to not use it, except perhaps the bits that generate that nice navigation bar.

So Don, now that you've taken this on and figured out what's going on with the current code, what pagination do your recommend OpenACS use going forward? The old crappy List Builder pagination? The improved-by-Don but still Evil pagination? Something basically based on Jon Griffin's paginator procs? Or what?

I'm actually rather astonished that this thread seems to have generated so much debate. The List Builder's built-in pagination sucks, this has been well known for a long time. Lars wrote all of List Builder, and if I remember correctly, he himself has said in these Forums that its built-in pagination sucks and should be fixed.

Jon Griffin's paginations procs are not "new", they've been around for at least a year now (maybe two). Several people have used them (I have not), and everyone who has seems to agree that they are a major improvement over the existing lousy List Builder pagination.

Joel, your heart is in the right place, but you're putting up straw men. If the toolkit is broken, it needs to be fixed - and AFAICT everyone who's examined the issue seems to agree that the List Builder pagination is definitely broken. If there is insufficient, contradictory, or just plain misleading documentation, then that is also a breakage that needs to be fixed - but that second breakage is in no way a justification for stopping someone from fixing the first breakage.

By all means, change the proc names or do whatever else is necessary or desirable to integrate Jon's pagination implementation smoothly into OpenACS - but do it! Now, if Don has figured out an even better solution, way cool, but please, somebody pick one of the improvements available and actually put it into the toolkit.

REJECT:

list builder pagination will be efficient and we don't want another pagination method because people should use listbuilder

Since Don is willing to take this on and he's looked closely at it, I vote to reject this particular way of implementing it, and let Don take care of it. Also, since this is pretty much a bug fix in my opinion, I don't think Don's fix requires a TIP. This one probably didn't either, but it probably saved you some work!
I vote to reject, because the openacs.org upgrade drove Don to rewrite pagination in listbuilder, which renders this tip unnecessary.
Well, Andrew, any time you find a place where you think the toolkit's broken, feel free to offer to fix it rather than just bitch about it, OK? List builder pagination can be made efficient, and the paradigm's very easy to work with, I see no reason to add yet another way to do it. REJECT
Reject. Don's fixing it. See preview few posts.
I've commited my improvements to the list builder's pagination code. Essentially I've sped it up by creating a separate paginator instance for each PAGE GROUP rather than one for the ENTIRE LIST. This means that on openacs.org's Q&A forum, rather than filling a cache of 5500 entries whenever the cache is flushed, we're only filling 330, using LIMIT+OFFSET/ROWNUM tricks. We then get the benefit of having cached the keys for the current page group until there's another post in the case of forums (not all clients of the list builder enable caching, of course). I've also gotten rid of the Tcl-based sorting of the keys returned for use in the page display query's IN (keys) clause, a minor speedup. There's probably still a lot of room for improvement, i.e. I can make lighter-weight list builder specific code to execute and cache the pagination query, etc. I'll poke at it periodically. Currently the OpenACS Q&A forum takes about 1.6 seconds to display if the cache is flushed. It took about 18 seconds using the existing list builder. When the cache is available it takes about 750 milliseconds. Much of that seems to be running the listtemplate tag as the shorter the number of rows, the faster it goes. Notice that performance of this can be impacted by changing the number of rows per page, and page groups per navigation widget, so I'll probably add these as parameters to forums at some point. You get quicker page displays by shrinking those numbers. And of course I haven't added the performance improvements to forums itself that I have in mind ... those will help, too.
This is very good news indeed. Had you had a chance to test your pagination enhancements on bug-tracker as well, as there might be additional challanges due to the filters.
I'm sure I speak for lots of people when I say - Thank you Don. 😊
Don, Were you successful on the Oracle side as well? --cro
Yes, this works for Oracle as well as Postgres, I have both on my server at home for testing/development.

Malte, I'll try to look at the bugtracker in the next few days, still plan on concentrating on forums through the next couple of days.