Forum OpenACS Q&A: File Storage Module Extremely Slow

Collapse
Posted by Robin Felix on
After advertising the availability of the file storage module to my
users and subsequent loading of a few (only 10 MB or so) files, the
response time has become incredibly slow.  When clicking on a file in
the hierarchical display (/file-storage/group?group_id=14), it can
take anywhere from 2 to 5 minutes for a response!

The server shows two postmaster processes ongoing during that
interval taking up almost 50% CPU each, and the log shows numerous
queries ongoing.

Before going into gruesome detail on the troubleshooting, is this a
known problem?

Could Ben's new-file-storage module finesse this?

Collapse
Posted by Ben Adida on
Yes, this is a known problem, due probably to the BLOB hack
and the somewhat inefficient code of file-storage. Don't blame
the BLOB hack, though, it is a great fix. It's just not efficient for
large single BLOBs (lots of copying  of data going on).

Yes, the new-file-storage is massively faster. The most
inefficient part is the hierarchy calculation, and that will be
improved, too.

Collapse
Posted by Robin Felix on
Is new-file-storage available?  You've mentioned it, Ben, but I don't see it anywhere in the tree.  Given my current problem, it would be worth a try even in beta form.
Collapse
Posted by Don Baccus on
The key with new-file-storage is that it will store the file where files should be stored - in the host operating system's file system, rather than the RDBMS.  This is why it's so much faster.

The BLOB hack is CPU intensive, as it encodes/decodes the binary data in order to stuff it into Postgres.  This expands the size of the data  by 4/3.  Also, in order to avoid tying up the connection the BLOB is spooled to a file before it is returned to the user.  aD has removed this from the Oracle driver, presumably because of the additional overhead - file storage ain't blindingly fast in the Oracle world, either.  If I thought we'd be stuck with my BLOB hack much longer I'd remove it from the PG driver, too (PG 7.1 will have reasonable large objects, and of course we won't use them anymore for file storage anyway).

Collapse
Posted by Phil McTimoney on
I managed to speed things up considerably by removing the reference to fsf2 in the query below, taken from one_file.tcl in the file-storage module. It is only referenced in the 'from' section of the query, causing a cartesian join.

select count ( fsv.version_id ) as n_versions,
       fsf1.file_title,
       fsf1.folder_p,
       fsf1.parent_id,
       fsvl.url,
       u.first_names || ' ' || u.last_name as owner_name,
       user_has_row_permission_p (3, 'read', fsvl.version_id, 'FS_VERSIONS' ) as public_read_p
from   fs_files fsf1,
       fs_files fsf2,
       fs_versions fsv,
       fs_versions_latest fsvl,
       users u
where  fsf1.file_id = 82
and    fsvl.file_id = 82
and    fsf1.file_id = fsvl.file_id
and    user_has_row_permission_p (3, 'read', fsvl.version_id, 'FS_VERSIONS' ) = 't'
and    fsf1.owner_id = u.user_id
group by fsf1.file_title,
       fsf1.folder_p,
       fsf1.parent_id,
       fsvl.url,
       u.first_names,
       u.last_name,
       user_has_row_permission_p (3, 'read', fsvl.version_id, 'FS_VERSIONS' )
Collapse
Posted by Phil McTimoney on
BTW...is the "new file storage module" available anywhere??  I'm still not overly-thrilled with the current version.
Collapse
Posted by Ben Adida on
given the rising demand, I will include new-file-storage either as
part of 3.2.5 or as a separate module for download
Collapse
Posted by Robin Felix on
Phil, the speed increase is incredible!  At first glance it looks like you're just removing a COUNT of all stored files, but I'll have to dig deeper when I have time.

Any idea what effect this will have elsewhere?  Are we courting danger or DB corruption here?

Collapse
Posted by Phil McTimoney on
Robin,

I'm happy that the change will have no knock-on problems.  The table I eliminated wasn't used anywhere else in the query.  If you do that, it just creates a lot of needless work for the server.

Collapse
Posted by Don Baccus on
I've just committed this change to the CVS tree, thanks.

The history is that the original Oracle query, stripped to the bit relevant to this bug, did this:

select fsf2.parent_title from fs_files fsf1, fs_files fsf2
where fsf1.parent_id = fsf2.file_id(+)
The outer join is used to return NULL as the file parent's title if the file is at the root-level of the hierarchy.

Since pre-7.1 PG doesn't have outer joins, this query was split into two:

select fsf1.parent_id from fs_files fsf1, fs_files fsf2
where fsf1.file_id = $file_id
followed by:
set parent_title [database_to_tcl_string_or_null $db "
   select parent_title from fs_files where file_id = $parent_id"]
Whichever of us did it forgot to remove the second reference to fs_files when removing the outer join, resulting in a cross-join on fsf1, fsf2.

Oops!

Collapse
Posted by Caroline Meeks on
I have just completed a custom installation of new-file-storage.  My advice is to remove calls to general permissions within large queries.  These get slow very fast as the number of files, groups and users increase.

I rewrote the pages (one-folder.tcl was the worst offender) with specific checks based on my clients business model. Thus the code is quite custom and not sutible to be rolled back in, but if anyone else is doing a new-file-storage implementation and wants to take a look, email me and I'll send you a tarball.

Other minor additions to new-file-storage that someone might want:

Different graphics for jpg and pdf files (the only types of files my clients uses)
JPG are sent to a separate download page for a nice layout.
"Print this page" from the jpg-download page redirects to a page custom formated for printing.
"Download" zips the jpg then downloads it.

Again, nothing general enough to roll back in but nice ideas if you are doing a custom implementation.

Collapse
Posted by Caroline Meeks on
I have also found fs_order_files to be very slow and I only have about 350 files. I did a quick fix by only updating rows where the sort key or depth actually change. Time went from 11 seconds to 1 for the test cases I ran.

ns_log warning "entering fs_order at [ns_time]"
    set ordering_query "select file_id, fs_level_gen(file_id,1) as the_level, s
ort_key as old_sort_key, depth as old_depth
                          from fs_files
                         order by fs_connect_by(file_id), file_title"

    set order_depth_list [database_to_tcl_list_list $db $ordering_query]
    set sort_key 1
    set changed_count 0

    foreach id_depth $order_depth_list {
        set id [lindex $id_depth 0]
        set depth [expr [lindex $id_depth 1]-1]
        set old_sort_key [lindex $id_depth 2]
        set old_depth [lindex $id_depth 3]
        if {$old_sort_key != $sort_key || $old_depth != $depth} {
            incr changed_count
            ns_db dml $db "update fs_files set sort_key=$sort_key, depth=$depth
 where file_id=$id"
        }
        incr sort_key
    }
    ns_log warning "Done with fs_order at [ns_time] changed $changed_count files out of a total of $sort_key"
Collapse
Posted by Andrew Grumet on
At the risk of posting out-of-path, I'll mention a recent experience with a production site that runs on classic ACS4. We have about 100,000 acs_objects in our system and this caused the main file-storage page to become quite slow. As expected, the cost was due to the row permission query ("does the user have permission to read this directory and file?"). Our optimization was to force the permission query to run after reducing the number of rows with the other parts of the WHERE clase. I've posted further detail to the OpenACS4 design forum. The thread is here: https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0003n5&topic_id=12&topic=OpenACS%204%2e0%20Design.
Collapse
Posted by Don Baccus on
Andrew ... I'm curious, have you experimented with the quicker permissions check function that the unified LASD provided us?

Minimizing the calls to permission_p is undoubtably important, of course, but I think we've sped up the permissions checking considerably from ACS 4 classic.  A best-of-both-worlds scenario with file-storage, I should hope.

We should also just get rid of the file sort key in the postgres version.  We already have the tree_sortkey and it's sufficient and it's already being maintained by triggers on the content repository. fs_order_files could be a NOP for the PG version.

Collapse
Posted by Andrew Grumet on
Minimizing the calls to permission_p is undoubtably important, of course, but I think we've sped up the permissions checking considerably from ACS 4 classic.
Yep, we rolled the improved version in a month or two ago and saw some nice improvements. Thanks to the folks who did the tuning!
Collapse
Posted by Caroline Meeks on
Glad to hear you guys don't have this problem in 4x!

I check to see if this bug was in ACES or ACS3.5 and it turns out that Eric Cholankeril wrote a PLSQL funtion (see packages/file-storage/file-storage.sql) that does the entire sort inside the database which is probably a better solution then my quick fix.  If I find my quicky solution doesn't give me the performance I need I'll translate it for Postgres.