Forum .LRN Q&A: performance optimization for dotLRN portlets

This post is basically a cross link to the work being done on optimizing display of object name and url for Categories.

https://openacs.org/forums/message-view?message_id=85086

When an approach is implemented for Categories it's likely that the same design pattern can be used to rewrite dotLRN portlets queries so they don't have to get a url for each object they display.

Here are some typical dotLRN portlet queries from SloanSpace. Not horrible, but when there a many portlets on a page it can get slow.

305 ms  	  main  	dbqd.news-portlet.www.news-portlet.select_news_items: select nsdb0 
            select news_items_approved.package_id,
                   acs_object.name(apm_package.parent_id(news_items_approved.package_id)) as parent_name,
                   (select site_node.url(site_nodes.node_id)
                    from site_nodes
                    where site_nodes.object_id = news_items_approved.package_id) as url,
                   news_items_approved.item_id,
                   news_items_approved.publish_title,
                   news_items_approved.publish_date
            from news_items_approved
            where news_items_approved.publish_date < sysdate
            and (news_items_approved.archive_date >= sysdate or news_items_approved.archive_date is null)
            and news_items_approved.package_id in (0, 32240, 32607, 33345, 543285, 187532, 184299, 1801180, 176155, 175668, 185255, 2336625, 2353505, 2398529, 502752, 69585, 40628, 31871, 2449158, 2451203, 36745, 2275302, 79549, 2478821, 2549759, 43319, 66539, 110607, 193287, 193807, 39467, 497256, 2448337, 3416372, 3414701, 37137, 2919495, 2911709, 3016932, 2973468, 2950109, 2871629, 3060733, 3117011, 3152632, 2689331, 108331, 3235837, 3225610, 68898, 2080663, 73641, 143938, 70331, 53851)
            order by parent_name,
                     news_items_approved.publish_date desc,
                     news_items_approved.publish_title			

111 ms  	  main  	dbqd.faq-portlet.www.faq-portlet.select_faqs: select nsdb0 
            select acs_objects.context_id as package_id,
                   acs_object.name(apm_package.parent_id(acs_objects.context_id)) as parent_name,
                   (select site_node.url(site_nodes.node_id)
                    from site_nodes
                    where site_nodes.object_id = acs_objects.context_id) as url,
                   faqs.faq_id,
                   faqs.faq_name
            from faqs,
                 acs_objects
            where faqs.faq_id = acs_objects.object_id
	    and faqs.disabled_p <> 't'
            and acs_objects.context_id in (0, 32219, 32588, 33320, 543276, 187514, 184284, 1801171, 176133, 175646, 185234, 2336616, 2353496, 2398520, 502743, 69576, 40601, 31848, 2449149, 2451194, 36723, 2275293, 79526, 2478812, 2549750, 43296, 66516, 110515, 193260, 193788, 39444, 497247, 2448328, 3416363, 3414692, 37114, 2919486, 2911654, 3016923, 2973459, 2950100, 2871583, 3060724, 3117002, 3152623, 2689322, 108241, 3235828, 3225601, 68889, 2080654, 73543, 143922, 70248, 53749)
            order by lower(faq_name)
        
 121 ms  	  main  	dbqd.forums-portlet.www.forums-portlet.select_forums: select nsdb0 
	    select forums_forums.package_id,
            acs_object.name(apm_package.parent_id(forums_forums.package_id)) as parent_name,
            (select site_node.url(site_nodes.node_id)
            from site_nodes
            where site_nodes.object_id = forums_forums.package_id) as url,
            forums_forums.forum_id,
            forums_forums.name,
            case when last_modified > (sysdate - 1) then 't' else 'f' end as new_p
            from forums_forums_enabled forums_forums,
            acs_objects
            where acs_objects.object_id = forums_forums.forum_id and 
            forums_forums.package_id in (0,32154,32531,33256,543253,187468,184226,1801148,175931,175598,185190,2336593,2353473,2398497,502720,69553,40534,31785,2449126,2451171,36660,2275270,79268,2478789,2549727,43229,66447,110175,193100,193701,39387,497224,2448305,3416336,3414669,37049,2919463,2911706,3016900,2973436,2950077,2871626,3060701,3116979,3152600,2689299,108044,3235805,3225578,68866,2080631,73273,143687,70003,53473)
             order by parent_name,
             forums_forums.name
Collapse
Posted by Timo Hentschel on
I'm currently working on the categorization package which i will commit tommorrow in a first version. This is a cross-package since it has to enable users to categorize all kinds of objects. When displaying a list of all objects mapped to a particular category, i had to face two main problems:

a) getting the name of an object FAST (i refrain from using acs_object.name since it's not efficient when displaying a list of objects)
b) showing a link to the object

I solved a) by introducing a new subtype 'named_objects' of acs_objects. Every object-type in the system of displayable objects should now be derived from named_objects, no longer from acs_objects. More so, each package then should make sure that it stores the name and package_id in the new named_objects table (i will post details in a new thread). Then, i can simply join with the named_objects table whenever i need to get the name of an object. (drawback: objects can have only one name, so it's not multilingual, but acs_object.name wasn't either).

I solved b) by adding a .vuh page under /www/o/. It will accept an object_id, query the named_objects table to find out the package_id, get the package_url from the cache and then invoke a service-contract (which has to be implemented for every displable object-type) which will return the local url in the package to display that object.

To sum it up, with these modifications long overdue we can speed up those kind of queries where we need to get an object_name or a url considerably.

Collapse
Posted by Dirk Gomez on
Caroline,

I don't like the fact that there are hard-coded values in every SQL query you posted.

If you don't use bind variables every single statement needs to be parsed - which may lead to a huge scalability issue. Parsing a query is quite costly in Oracle because a lot of things are being done. The Cost-Based Optimizer tries to figure the most efficient way of performing a query and traverses a tree of possible execution plans, needs to check permissioning etc. There's a cut-off somewhere, but it is entirely likely that a prominent %age of time of each above-mentioned query is taken for parsing alone.

(It also has an effect on concurrency, read this for some gory details:
http://asktom.oracle.com/pls/ask/f?p=4950:8:43470802654141231::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:2444907911913,)

I suggest that you switch to using transaction-temporary tables or look at this http://asktom.oracle.com/pls/ask/f?p=4950:8:43470802654141231::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:210612357425,

Collapse
Posted by Tilmann Singer on
I don't know where these hardcoded values are coming from in these particular cases, but I suspect they could easily be replaced with where exists (select 1 from ...) which should be much more efficient.
Collapse
Posted by Caroline Meeks on
The examples I listed were cut and pasted from developer support. They are actual queries, with timing, from a random user accessing his MySpace on SloanSpace this morning.

The numbers you see are a list of package_ids supplied by the portal package that indicate which communities this particular user is a member of.

The code that generates the query for the news portlet for example can be viewed at

http://cvs.openacs.org/cvs/news-portlet/www/news-portlet-oracle.xql?rev=1.7&content-type=text/x-cvsweb-markup&cvsroot=dotlrn

The snippet that generates the long list of numbers is:

and news_items_approved.package_id in ([join $list_of_package_ids ", "])
This is pretty typical of the dotLRN portlets.
Collapse
Posted by Dirk Gomez on
The variable list_of_package_ids - where does it come from?

- Is it some user-dependent input?

- Is it needed in every portlet content query? Or only once per page?

On a side note: is null clauses usually lead to indexes not being used because null values are not stored in a b*tree index. That might be another performance-detrimental query clause.

I think you should take out certain parts of a particularly expensive query and play around a fair bit to get a feeling what exactly causes sub-optimal performance.

As you have production-size data, your results would really matter. I am tinkering with an empty system 🤔

Collapse
Posted by Dirk Gomez on
I should have read your posting more carefully.

How is the "in list" created in the code? It may be possible to move the code/query that does this into the respective portlet query.

Collapse
Posted by Caroline Meeks on
The package list is set by portals. The portlet accesses it as follows:
set list_of_package_ids $config(package_id)
I'm not sure if the work being done on New Portals will affect this.

https://openacs.org/forums/message-view?message_id=92143

Collapse
Posted by Don Baccus on
These lists for the where clauses are due to the fact that the portlets are aggregating content from all the news instances for all groups the user belongs to, etc.  I haven't dug into these queries or the surrounding portlet code to see if things can be organized in a better way.

At the very least we could use db_map to replace these "in" clauses with "= :forum_id" in the case where the portlet is being used to show a single forum's contents, which will often be the case within a single class (i.e. one without subgroups with subgroup forums.)