Forum OpenACS CMS: Re: content repository physical folders keeps growing

Posted by Gustaf Neumann on
Dear Antonio,

i've added an additional helper proc "cr_check_orphaned_files" with an optional "-delete" flag to address this problem.

The test might return false-positive on non-orphaned files but that is quite unlikely. The bigger problem is that the lookup on large repositories can be prohibitive slow, even as scheduled procedure. On sites like, this query is fine. However, on one of our systems we have e.g. 2.5 mio files in the cr. On this system a single lookup whether a file is references from cr_revisions is already very slow:

   select count(*) from cr_revisions where content = '/16/04/28/160430';
   Total runtime: 1904.185 ms
Multiplying 1.9 secs by 2.5 mio entries gives 54+ days (sql time)!

The same slow lookup happens as well in the sql query cr_delete_scheduled_files.fetch_paths we discussed above. Some time ago, i've fixed this on one of our systems by adding the following index

   create index cr_revisions_content_idx on cr_revisions (substring(content for 100));
This helps a lot:
   select count(*) from cr_revisions where substring(content, 1, 100) = substring('/16/04/28/160430', 1, 100);
   Total runtime: 0.062 ms
With this the accumulated SQL times boil down to 155 secs. In this 2.5 minutes there will be quite some stress on the database, so i am not sure, whether one wants to run this every day. Also, the tcl time to compute the 2.5 mio entries using the tcl-lib function is very slow (i've just measured 22 minutes). .... i will commit something better (faster, better configurable) soon.