Forum OpenACS Q&A: Big performance problem or very large numbers :-) ?

After initiallizing our installation in preproduction
environment for testing we are having BIG performance problems.
Our environment is:

Software environment:
--------------------
Aolserver 4.01
OpenaACS 5.1 (from CVS)
DotLearn (2.0.3)
Postgres 7.4.3 (the same with previous 7.4.x and Oracle 9i with the same results)
Linux Debian stable kernel 2.4.20
Files in file system (not in DB)

Hardware:
----------
Opteron Dual for AOLServer (2 Gb RAM)
Opteron Dual for Postgres (2 Gb RAM)

Our data are like this magnitude:
---------------------------------
dotlrn users:    ---    50.000
classes:        ---    6.000
acs_objects      --- 1.000.000
cas_permissions  ---  700.000
acs_object_paths --- 2.600.000

Before load the data the same processes has a good, reasonably time cost.
After the load of data, all the system has slowed down very much.

-The login page costs 3 - 5 seconds.
-The administration pages are impossible to navigate.
-Many packages have the same problem like administration
but in minor scale.

We think that the main problems arises from the permissions subsystem (mainly we are
detected bad performance in acs_object__name and acs_permission__permission_p or too
many checking instructions, there are many rows to access).

Are there any way to avoid this problem?
If not possible, how could we minimize that big numbers?
Are the subsites a solution?
Minimizing the package number?
Cut the permissions check in administration pages?

Anybody have similar numbers in his installation?

If anybody would like to test our installation, tell us to give him an user/pw.

Any pointer will be welcomed.

PS: We are too detecting that in some case swapping the predicate "exists (subselect)"
with "in (subselect)" is many times better. ????

Collapse
Posted by Dave Bauer on
See this recent thread: https://openacs.org/forums/message-view?message_id=190427

If that does not help, please post some details on which pages and queries as slow. Install developer support to get the database query timings and page request times.

yes, probably a problem with site nodes & the curriculum bar, substitute the acs-tcl/tcl/site-nodes-procs.tcl file with the one at HEAD and try again!
Collapse
Posted by Don Baccus on
PS: We are too detecting that in some case swapping the predicate "exists (subselect)" with "in (subselect)" is many times better. ????

This is an interesting observation. This is often true in Oracle.

But ... in versions of Postgres older than 7.4, the in (subselect) construct was unoptimized. The subselect would be fully evaluated, and then searched linearly to check if the in was satisifed. As you can imagine, this was very very slow for subselects returning many rows. exists was orders of magnitude faster.

AFAIK you're the first person to compare the execution speed of the new optimization in 7.4 with exists on a large set of real data. Can you post some actual numbers for us to look at?

Unfortunately we're still supporting 7.3 (which does not have the optimization) so will be in no hurry to switch from exists to the subselect form but when PG 7.5 comes out, we'll drop 7.3 support and may want to go rewrite some of the more important queries.

Collapse
Posted by Dario Roig on
More problems of performance in our environment (I work with Agustin Lopez).

To show the homework portlet slow 15 seconds, with developer support in on we have discovered that the "order by content_item__get_path .." the following select slow 10 seconds:

"select
  o.object_id,
  coalesce(f.label, fs_tree.name) as name,
  fs_tree.live_revision as version_id,
  fs_tree.content_type,
  r.content_length, r.title,
  fs_tree.parent_id as folder_id,
  lpad(' ',(tree_level(fs_tree.tree_sortkey) - 1), ' ') as spaces,
  rels.related_object_id as homework_file_id,
  p.first_names || ' ' || p.last_name as file_owner_name,
  o.creation_user
from
  (select cr_items.*
  from cr_items, cr_items as cr_items2
  where cr_items2.item_id in (5672685)
    and cr_items.tree_sortkey between
      cr_items2.tree_sortkey and tree_right(cr_items2.tree_sortkey)
    and tree_level(cr_items.tree_sortkey) > '2'
    and tree_level(cr_items.tree_sortkey) <= '2' + 1) fs_tree
  join acs_objects o on (o.object_id = fs_tree.item_id)
  left join cr_folders f on (f.folder_id = fs_tree.item_id)
  left join persons p on (p.person_id = o.creation_user)
  left join cr_revisions r on (r.revision_id = fs_tree.latest_revision)
  left join cr_item_rels rels on
    (rels.item_id = o.object_id and rels.relation_tag = 'homework_correction')
where not exists (select 1
                  from cr_item_rels
                  where related_object_id = o.object_id
                    and relation_tag = 'homework_correction')
and (o.creation_user = '3647094' or f.folder_id is not null)
order by content_item__get_path(fs_tree.item_id, null)
;"

Some idea to solve it?

Hello, Don!

Ok.

Here is the difference between two selects called
from admin/site-map with Postgres 7.4.3

This is the original from CVS, using exists
in one subselect
----------------------------------------------
select package_id,
      package_key,
      pretty_name as package_pretty_name,
      apm_package_type__num_parameters(package_key) as parameter_count,
      node_id, url, parent_url, name, root_p, mylevel, object_id,
      directory_p, parent_id, n_children,
      p.instance_name as object_name,
      acs_permission__permission_p(object_id, '3901770', 'admin') as object_admin_p
from apm_packages p join apm_package_types using (package_key) right outer join
    (select n.node_id,
            site_node__url(n.node_id) as url,
            site_node__url(n.parent_id) as parent_url,
            n.name,
            case when exists (select 1 from site_nodes where parent_id = n.node_id) then 1 else 0 end as n_children,
            case when n.node_id = (select site_node__node_id('/', null)) then 1 else 0 end as root_p,
            (tree_level(n.tree_sortkey) - (select tree_level(n2.tree_sortkey) from site_nodes n2 where n2.node_id = (select coalesce('436', site_node__node_id('/', null))))) as mylevel,
            n.object_id,
            n.directory_p,
            n.parent_id
      from site_nodes n, site_nodes n2
      where (n.object_id is null
            or exists (
                select 1 from acs_object_party_privilege_map ppm
                where ppm.object_id = n.object_id
                  and ppm.party_id = '3901770'
                  and ppm.privilege = 'read'))
      and n2.node_id = (select coalesce('436', site_node__node_id('/', null)))
      and n.tree_sortkey between n2.tree_sortkey and tree_right(n2.tree_sortkey)
      and (n.parent_id is null or n.parent_id in (436))) site_map
on site_map.object_id = p.package_id
order by url
;

This is modified by me, only replacing "exists"
by "1 in" inthe same subselect:
----------------------------------------------
select package_id,
      package_key,
      pretty_name as package_pretty_name,
      apm_package_type__num_parameters(package_key) as parameter_count,
      node_id, url, parent_url, name, root_p, mylevel, object_id,
      directory_p, parent_id, n_children,
      p.instance_name as object_name,
      acs_permission__permission_p(object_id, '3901770', 'admin') as object_admin_p
from apm_packages p join apm_package_types using (package_key) right outer join
    (select n.node_id,
            site_node__url(n.node_id) as url,
            site_node__url(n.parent_id) as parent_url,
            n.name,
            case when exists (select 1 from site_nodes where parent_id = n.node_id) then 1 else 0 end as n_children,
            case when n.node_id = (select site_node__node_id('/', null)) then 1 else 0 end as root_p,
            (tree_level(n.tree_sortkey) - (select tree_level(n2.tree_sortkey) from site_nodes n2 where n2.node_id = (select coalesce('436', site_node__node_id('/', null))))) as mylevel,
            n.object_id,
            n.directory_p,
            n.parent_id
      from site_nodes n, site_nodes n2
      where (n.object_id is null
            or 1 in (
                select 1 from acs_object_party_privilege_map ppm
                where ppm.object_id = n.object_id
                  and ppm.party_id = '3901770'
                  and ppm.privilege = 'read'))
      and n2.node_id = (select coalesce('436', site_node__node_id('/', null)))
      and n.tree_sortkey between n2.tree_sortkey and tree_right(n2.tree_sortkey)
      and (n.parent_id is null or n.parent_id in (436))) site_map
on site_map.object_id = p.package_id
order by url
;

Regards,
Agustin

Collapse
Posted by Jeff Davis on
Could you send the timings and explain analyze output for the two versions of this query? I spend a lot of time tuning this particular query for pg7.3, although I did not try "in" instead of "exists" for the subquery since I didn't think it would be faster...
Collapse
Posted by Andrew Piskorski on
Jose, you mention that on your site, the OpenACS login page is taking 3 - 5 seconds, and many admin pages are much worse. That makes me suspicious that your AOLserver thread settings might be wrong. Did you check that? What do you have for maxthreads, minthreads, and threadtimeout in your AOLserver config file?

From the additional info Dario posted (one query which itself is very slow), I think bad AOLserver thread settings couldn't possibly be causing all of your problems, but they could be a contributing factor, so you definitely want to double check them, if you haven't already.

Collapse
Posted by Don Baccus on
Jeff, the "in" form would be much, much slower for 7.3, the "in" optimization, as I mentioned above, didn't come 'til 7.4.

Jose, I second Jeff's request, though, could you post times and query plans (output of EXPLAIN SELECT ...) here, please?

Collapse
Posted by Don Baccus on
For the homework portlet query, could you please: 1. Try it without the "order by..." clause - you seem to indicate that this is the problem. How much time does the query take with the "order by"? How much without? 2. Could you post the EXPLAIN output for that query? 3. Here's the relevant query in content_item__new, could you test this, tell us how slow it is, and post the EXPLAIN output for it, after substituting in the proper parameter values, please?
select i2.name, tree_level(i2.tree_sortkey) as tree_level
from cr_items i1, cr_items i2
where i2.parent_id <> 0
  and i1.item_id = get_path__item_id
  and i1.tree_sortkey between i2.tree_sortkey and  tree_right(i2.tree_sortkey)
order by tree_level
Collapse
Posted by Peter Alberer on
I do not know much about the other thread settings, but for sure the StackSize parameter is important. In the default config.tcl file the value for this parameter is much to small...
Collapse
Posted by Jarkko Laine on
I think it would be hugely important to set the thread settings correctly by default. I don't think there's anything in the install docs about them and I learned about their importance the hard way last fall, when one site I administrate went down on a crucial time with only about 1 hit/second load.

Could someone who knows approximately what would be reasonable values for thread settings either change the default config file or file a bug and a patch for it?