Forum OpenACS Development: OpenACS 4 file storage slow query

Collapse
Posted by Jun Yamog on
Hi,

I am currently using file storage for one of our sites.  Each time you
view a folder via /www/index things are very slow.  The problem comes
from the view fs_folders_and_files which is a union of fs_folders and
fs_files view.  The bottleneck is at fs_files view when it tries to
get the url of a file.  During the contruction of the url a function
file_storage__get_package_id is called. This function I believe is
what is causing the bottleneck.

You can either optimize this function of just use the package_id
column in cr_folders.  ETP makes use of this column and I wonder why
file storage does not make use of this.  Why would you have to
recompute the package_id of a folder when you can just store it in the
first place.  I also doubt that there is use of changing the
package_id since there is no feature in file storage than can move
folders across packages.

I would like to ask your opinion about this.  I think we should just
drop the function and just use the package_id column in cr_folders.
We may retain it for compatiblity, but I still think that storing the
value rather than computing it is better.  What do you think?  Am I
missing something here?  Although the migration script will have to
populate cr_folders.package_id.

Collapse
Posted by Yonatan Feldman on
the view fs_folders_and_files no longer exists, it is now called fs_objects
and is a join of fs_folders, fs_files, and fs_simple_objects. the views no
longer call the function get_package_id and are therefore much faster.

it looks like file-storage isn't making use of the package_id column in
cr_folders. i think it is a good idea to start using it. i would say go for it,
but make sure you are working on the latest code.
Collapse
Posted by Jowell Sabino on
Jun, I think the bottleneck is the call to acs_permission__permission_p (see the query fs::get_folder_contents in file-storage/tcl/file-storage-procs-{postgresql,oracle}.xql).

As for why package_id column in cr_folders is not used, there are various reasons. One is historical: the package_id column is a relatively recent addition, which did not exist in the original aD datamodel and therefore was not in the file-storage code (and port). The second is the way file-storage uses CR: a file-storage instance is an entire tree, identified by its root folder. Not all file-storage folders belong to separate packages (unlike the way ETP uses CR, for example), with the sole exception being the root folder. Since you only need to know the package_id for each root folder, this relationship is mapped in fs_root_folders table and all package_id entries in cr_folders are set to null (i.e., just not use or be aware that it is there). It should be easy to modify the code if you want to store package_id in cr_folders (perhaps to improve performance, but I doubt it). It seems to me though that using the fs_root_folders mapping is a cleaner way to do it (data normality, too?). How to use CR properly certainly qualifies as one area where the community should identify "best practices".

Finally, for file-storage to clean up (and in the future, archive smartly) the CR tree properly whenever APM deletes the file-storage instance, I put a trigger on fs_root_folders that gets activated whenever package_id is deleted. Yes, this could have been done for the package_id column in cr_folders, but when I did the port package_id was not part of cr_folders yet, and even if it was I still wouldn't do it because I feel file-storage should not be putting triggers on the CR datamodel (package isolation). Luke Pond pointed out that it would be nice if CR could do this (clean up/archive) for file-storage and ETP, but right now, not yet.

Collapse
Posted by Andrew Grumet on
For tuning the permission_p part of the query, please see this thread: https://openacs.org/bboard/q-and-a- fetch-msg.tcl?msg_id=0003n5
Collapse
Posted by Jun Yamog on
Hi Yon,

When was this fs_objects implemented?  I am using beta 1 code.  Is this code in the CVS HEAD?

Hi Jowell,

Nope its not the acs_permission__permission_p call, this was my first hunch so I removed it and still there was a performance problem.  I investigated it to the source of the problem.  So am pretty sure its the file_storage__get_package_id because removing makes the query a whole lot faster.

Yes ETP works differently from file storage but I think using the package_id on cr_folders will be ok even when the package_id column will not be unique.  It will be faster to lookup the value rather than compute it.

Anyway thanks for giving me your insights about this.  I will check out the CVS head and see if I can upgrade the data model.  If the upgrade of file-storage is too much of a problem I may have to change my current file-storage by using the package_id column.

Collapse
Posted by Jun Yamog on
Hi Jowell and Yon,

I have checkout the head of the CVS.  It seems that there are major changes in the data model and there is no upgrade scripts.  Is the code on the head more or less stable?  Will there be any major code movements on it?

I am inclined to change the current file-storage in beta 1-2 to use the package_id.  But if the new file storage comes out then my data model may not be upgraded properly.  Or maybe it will be upgraded since the change is very minor, its just the usage of a column and a change in a view.  Do you think I should minor fork my file-storage's data model?  The chances that it will not be upgraded properly is slim but I can't tell yet since I haven't seen the upgrade script yet.

Collapse
Posted by Jun Yamog on
Hi,

I got back on this issue again today.  On further investigation it seems only the fs_files, fs_folders makes use of the plsql function file_storage__get_package_id.  They just use it to make the "url" column.  Now the url column of the views fs_files, fs_folders, fs_folders_and_files does not seem to be used.

So I set the url columns to be nulls and recreate the views and its  faster now.  I am not talking about milliseconds faster but just looking at a page load.  I don't need some fancy benchmark to tell me that it is now faster since it was so slow before.  I will now have our testers test it.

Since Yon's new file-storage may not be in for 4.5 don't you think that this qualifies for a bug fix for 4.5 final?

Collapse
Posted by Jun Yamog on
Hi,
<p>
I have uncovered another performance problem in file-storage.  Although the code is 4.5 beta.  Here is the offending query.
<pre>
    select  r.title,
            r.revision_id as version_id,
            person__name(o.creation_user) as author,
            r.mime_type as type,
            to_char(o.last_modified,'YYYY-MM-DD HH24:MI') as last_modified,
            r.description,
            acs_permission__permission_p(r.revision_id,'2426','admin') as admin_p,
            acs_permission__permission_p(r.revision_id,'2426','delete') as delete_p,
            r.content_length as content_size
    from  acs_objects o, cr_revisions r, cr_items i
    where  o.object_id = r.revision_id
    and    acs_permission__permission_p(r.revision_id, '2426', 'read') = 't'
    and    r.item_id = i.item_id
    and    r.item_id = '7416'
    and r.revision_id = i.live_revision
</pre>

A possible fix is this

<pre>
    select  r.title,
            r.revision_id as version_id,
            person__name(o.creation_user) as author,
            r.mime_type as type,
            to_char(o.last_modified,'YYYY-MM-DD HH24:MI') as last_modified,
            r.description,
                acs_permission__permission_p(r.revision_id, '2426', 'read') as read_p,
acs_permission__permission_p(r.revision_id,'2426','admin') as admin_p,
            acs_permission__permission_p(r.revision_id,'2426','delete') as delete_p,
            r.content_length as content_size
    from  acs_objects o, cr_revisions r, cr_items i
    where  o.object_id = r.revision_id
    and    r.item_id = i.item_id
    and    r.item_id = '7416'
    and r.revision_id = i.live_revision
</pre>
<p>
And just check the perm on tcl level.  The 2nd query is significantly faster.  And I only need to count my fingers to justify the speed no fancy explain stuff.  In fact if I run explain is just gives the same plan.  Odd.  But I am sure they are very different.
<p>
I haven't taken a look at the latest file-storage code but any opinions about this?

Collapse
Posted by Jun Yamog on
Hi,

I just check the head of openacs-4 tree.  Unfortunately it still exists.  I have done my patch on my production server and things are fast again.  All we need is just to use the 2nd query and on file.adp check if read_p eq "t".  I am sure its faster to just not display a row rather than having a properly written query.

It seems that few people are concerned about this performance problem. Am I the only suffering from this or are our server just too crappy old.  I will be putting in a bug fix on the SDM, I will leave it to the package maintainer to decide.

Collapse
Posted by Tilmann Singer on
A few things that I don't understand about this query (I did not try it in file-storage, just from looking at it here):

Isn't the and r.item_id = i.item_id redundant? The query will at most select one row anyway, because of the

and    r.item_id = '7416'
and r.revision_id = i.live_revision
Does it bring any secret performance benefits, or is it just oversight?

Jun, the reason why the query is slow is propably because it executes the acs_permission__permission_p for much more rows than necessary. If the optimizer would choose to first narrow the rows using the other conditions in the where clause then it would be much faster, because the stored procedure would be executed only once at maximum.

My only suggestion for improving this is to extract most of the query into a subquery and wrap the query part with acs_permission__permission_p around it. Maybe there are other ways that I don't know of.

Throwing away rows at the tcl level is bad practice and I would vote against adding something like that to the toolkit, although in this particular case it is only one row that gets thrown away so not a big deal. Also note that it might work fast on oracle as it is.

Collapse
Posted by Jun Yamog on
Hi Til,
<p>
That where clause is made by the tcl file, I believe that it is not redundant.  Or it is actually redundant in the case of this query where its getting the live version, when it gets all the version the "and r.revision_id = i.live_revision" is not present. Also I am not sure if using the db_api of content_revision__is_live is faster.
<p>
Yes I agree with you that using tcl to filter is not good.  As I have mentioned I have decided to just go for a working query rather than a properly written query.  Is your idea of subquery is something like this?
<pre>
select * from (
select  r.title,
            r.revision_id as version_id,
            person__name(o.creation_user) as author,
            r.mime_type as type,
            to_char(o.last_modified,'YYYY-MM-DD HH24:MI') as last_modified,
            r.description,
acs_permission__permission_p(r.revision_id,'2426','admin') as admin_p,
            acs_permission__permission_p(r.revision_id,'2426','delete') as delete_p,
            r.content_length as content_size
    from  acs_objects o, cr_revisions r, cr_items i
    where  o.object_id = r.revision_id
    and    r.item_id = i.item_id
    and    r.item_id = '7416'
    and r.revision_id = i.live_revision ) as version where
                acs_permission__permission_p(version_id, '2426', 'read') = 't';
</pre>
I tried it... unfortunately is has almost the same response time.  For some odd reason putting it on where clause makes it slow.  This is really puzzling the 2 queries that I posted above has the same explain output.  Anyway I leave it to the file-storage maintainer to accept or reject the patch.  More opinion please... again I am using PG 7.1.3.  Thanks.
Collapse
Posted by Dan Wickstrom on
It's easier to understand if you post the plan.

I'm surprised that the subquery change didn't help since the subquery will get stored in a temp table, and the sequence scan over the temp table which calls permission_p shouldn't be that expensive since the temp table should only hold one row.  Possibly the planner is doing something funny.

Just for grins try moving the permission_p(... 'read') call into the subqueries target list and alias it to read_p.  Then modify the outer query where clause to test on read_p:

select * from (...) version where version.read_p = 't';