Forum OpenACS Q&A: Hosting a searchable mail archive

Posted by Robin Felix on
I've run a highly-customized OpenACS 3 system for several years and am fairly comfortable with its operation and programming.  Since January, I have set up one operational and another public but experimental OpenACS 5 site, but I am just beginning to understand what goes on beneath the bonnet of the various existing applications and services, certainly not enough to attempt anything but minor changes.  I say this to put my question into the context of my experience.

I need to take the email list archives of a membership organization and make them available in searchable form to members only. OpenACS is ideal for the controlled-access requirement, paired with Hypermail, which can take the thousands of messages and structure them. Hypermail is especially useful because historical emails are still being tracked down from members, and its ability to add material incrementally and reindex is ideal.

My first cut at this was to divide the archive into years, process a test year with Hypermail, create hundreds of individual HTML files plus index files, and upload a zip archive of the resulting directory into the File Storage module. However, this approach produced extremely slow retrieval times for the index file (~125 Mb), whether the files were stored in the database or the file system. This could be addressed by dividing the archive into quarters or months, but that increases the complexity of performing incremental updates and of maintaining a set of user-friendly links into the File Storage area directories. Another drawback of using File Storage is that it is cumbersome to delete hundreds of individual files.
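For anyone wanting to try the finer-grained split, here is a minimal sketch of dividing one mbox file into per-year or per-month files before handing each to Hypermail. It is written in Python with the standard `mailbox` module purely for illustration; the function names `period_key` and `split_mbox` and the `<period>.mbox` naming are assumptions, not part of Hypermail or OpenACS.

```python
import mailbox
import os
from collections import defaultdict
from email.utils import parsedate_to_datetime


def period_key(date_header, granularity="year"):
    """Return '2003' or '2003-07' for a Date header, or 'undated' if unusable."""
    if not date_header:
        return "undated"
    try:
        when = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad dates.
        return "undated"
    if granularity == "month":
        return f"{when.year:04d}-{when.month:02d}"
    return f"{when.year:04d}"


def split_mbox(src_path, dest_dir, granularity="year"):
    """Copy every message into dest_dir/<period>.mbox; return message counts."""
    os.makedirs(dest_dir, exist_ok=True)
    counts = defaultdict(int)
    outputs = {}  # period key -> open mailbox.mbox
    source = mailbox.mbox(src_path, create=False)
    try:
        for message in source:
            key = period_key(message["Date"], granularity)
            if key not in outputs:
                outputs[key] = mailbox.mbox(os.path.join(dest_dir, key + ".mbox"))
            outputs[key].add(message)
            counts[key] += 1
    finally:
        for box in outputs.values():
            box.close()  # close() also flushes pending writes
        source.close()
    return dict(counts)
```

Messages with a missing or unparseable Date header land in `undated.mbox` so nothing is silently dropped. Rerunning the split after new historical mail turns up regenerates only the period files, which keeps the per-period Hypermail runs independent of each other.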

A better approach would seem to be taking the files generated by Hypermail, copying them to the web server filesystem, and registering them with OpenACS so they can be indexed by OpenFTS.  In OpenACS 3, I would have accomplished this with the Static Pages application.  In OpenACS 5, though, I am confused by the various alternatives: Basic Content Delivery System, Basic Content Management System, BCMS UI, BCMS UI Wizard, and so on.  I've searched the Q&A and CMS forums for advice, but most seems targeted toward package developers rather than to those trying to use a vanilla system and avoid customization whenever possible.

Can anyone point out documentation or provide advice that would more directly address my problem, give me a better understanding of the content management options, or otherwise help me crack this nut?  TIA

Posted by Jeff Davis on
I think using static pages + hypermail is a pretty good low cost/safe/fast solution. I have worked some on importing mail to forums but it's not quite ready for prime time. Static pages works fine on OpenACS 5.
Posted by Robin Felix on
My understanding of Static Pages is that the indexed content pages must be mounted under the <myserver>/www filesystem.  This defeats the controls imposed by OACS groups -- once the URL is recovered, it allows access to the content without restriction.

I want to be able to restrict access to organization members, a group that changes from year to year, and I don't want others to be able to access the archives using a URL that bypasses OACS login security.

If there were a way to accomplish this with Static Pages, I would use it.  Otherwise, I was hoping that a more robust OACS 5 Content Management application would do a better job.

Posted by Randy O'Meara on
Robin,

I haven't done this, so I'm guessing that it will work. Somebody else pipe up if this is incorrect...

You can control access based on acs-subsite Application group membership.

My understanding is that you should be able to create a subsite (say, mlist), set its join policy to closed, and mount a static pages instance under that subsite (say mlist/archive). You can control membership in the groups "mlist Administrators" and "mlist Members", created by the subsite instantiation code. If you already have a member group, you may be able to assign subsite permissions to that group, though I'm not certain.

You then should be able to create a directory "<acs root>/www/mlist/archives" and place your static content there.

/R

Posted by Andrew Piskorski on
Robin, no, the Static Pages package is not limited to content under myserver/www/ (the web root); it will work just fine with pages under myserver/mypackage/www/ too. (It did the last time I used it for that, anyway.)

I don't see why you'd want to use File Storage. These old email archives aren't files that people are going to be uploading, downloading, modifying, etc., right?

If you use Static Pages, there might be a better way to do it, but worst case, you can always handle the access control with an AOLserver registered proc or filter. That wouldn't be too hard.

Besides the searching part, what you've got here is a permissions mapping problem. In general, one way you can always solve that, is make your own little OpenACS package to model the mapping in the RDBMS. Create one acs_object for every entity you need to set permissions on. E.g., if the finest permissions granularity you need is "email list name", then you just create one acs_object for each email list. And set the permissions the way you want on those acs_objects - e.g., only readable by members of such-and-such group.

Once you have your permissions modeled in the database, all you need is some Tcl code to enforce those permissions. Basically, that means writing some Tcl code to intercept requests for the URLs where the email archives actually live, figure out which acs_object maps to that URL, and do a permissions check on that acs_object. An AOLserver registered filter or registered proc is the most traditional way to do that. Which probably seems familiar from OpenACS 3.x. :)
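The intercept, map, and check steps just described can be sketched as below. This is only an illustrative model of the logic in Python, not OpenACS or AOLserver code: `ArchiveObject`, `ArchiveGate`, and the group names are hypothetical stand-ins, and a real installation would do the same work in a Tcl filter or registered proc that consults the OpenACS permission tables.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveObject:
    """Stands in for one acs_object, e.g. one email list."""
    name: str
    readers: set = field(default_factory=set)  # groups allowed to read


class ArchiveGate:
    """Maps URL prefixes to objects and answers 'may this user read it?'."""

    def __init__(self):
        self._by_prefix = {}  # normalized URL prefix -> ArchiveObject

    def register(self, url_prefix, obj):
        # Store prefixes with exactly one trailing slash so that
        # /mlist/archive does not also match /mlist/archive-old.
        self._by_prefix[url_prefix.rstrip("/") + "/"] = obj

    def object_for(self, url):
        """Return the object with the longest matching prefix, or None."""
        probe = url.rstrip("/") + "/"
        matches = [p for p in self._by_prefix if probe.startswith(p)]
        if not matches:
            return None
        return self._by_prefix[max(matches, key=len)]

    def may_read(self, url, user_groups):
        """True if the URL is unguarded or the user is in a reader group."""
        obj = self.object_for(url)
        if obj is None:
            return True  # not an archive URL, nothing to enforce here
        return bool(obj.readers & set(user_groups))


gate = ArchiveGate()
gate.register("/mlist/archive", ArchiveObject("mlist", {"mlist Members"}))
```

With that registration, `gate.may_read("/mlist/archive/2003/0001.html", {"mlist Members"})` is true, while the same URL requested with no group memberships is refused. The longest-prefix rule lets a sub-tree carry tighter permissions than its parent.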

There might be and probably are handier or better ways to use various OpenACS facilities to accomplish the same thing. .vuh files, the subsite stuff Randy mentioned above, etc. But it's all definitely doable.

Oh, and of course, every OpenACS package you create automatically comes with at least one acs_object you can set permissions on - the package instance itself. So in some cases, you might not need to write any data model at all, just mount, say, 5 instances of your package, one for each of your 5 email archives, set permissions on each package instance, and poof you're done. The ACS Request Processor automatically enforces permissions on mounted package instances, so the "write some Tcl code to enforce the permissions" part has already been done for you.

It's been a long time since I looked at Static Pages, but if I remember right, the simplest way to do this is probably to mount one Static Pages instance for each email archive sub-tree that you want to set permissions on, and just set permissions on each package instance. If that doesn't fit well with the content and access control you need, then probably fall back to writing your own little data model and code to enforce its permissions.

Posted by Tilmann Singer on
Another approach would be to use the edit-this-page package and programmatically create its content. The irc-logger package does that, see here for an example: https://openacs.org/irc/log/