Forum OpenACS Development: Extend Forums to support Email List archives

First some background to the itch I want to scratch: With the sole exceptions of the various versions of the OpenACS Forums/BBoard package and Google Groups, every web-based bulletin board software I've ever seen sucks. And as I've briefly mentioned elsewhere, every web-based interface to email list archives I've ever used also sucks, without exception.

To my mind this is a ridiculous state of affairs, and I propose to remedy it, like so:

  1. Extend the OpenACS Forums package to support importing mailing list archives and displaying them as read-only web Forums.

  2. Also add the ability to take arbitrary mailboxes of saved emails and import them as if they were a list archive. Distinguish and filter out duplicate emails.

  3. Add admin UI to control permissions, categorization, content editing, etc. of such imported email data. (For private non-list emails, on initial import default the permissions to those users who actually received the emails.)

  4. Extend the functionality of Forums search. In addition to plain full-text search, it should support searching for content by particular authors, within particular dates, in particular combinations of Forums, possibly by various metadata tags, etc.

    The above features should make OpenACS very useful for archiving and searching the bazillions of emails saved at most companies and other organizations. Important business or technical details are often contained only in emails, which are then mis-filed and effectively lost. Yet most employees have big folders of old work emails lying around; they just can't organize or search them effectively, and can't let anyone else do so either.

    But that's a solvable problem... Take all of those email folders, automatically suck them into your single centralized store, and type some queries into the "search" box.

  5. Add functionality to also export web-based Forums content to email archive and/or Usenet format.

    A very serious drawback of all current web-based forums is that they are 100% centralized: there is no easy built-in distribution or redundancy at all for all that data. This has always worried me...

    When ArsDigita went away, its years of BBoard content did not survive in any format amenable to importing into openacs.org or another OpenACS instance. We were lucky that any of that content survived as static HTML pages at Red Hat at all. If the OpenACS server ever melted down, would we lose content? I don't know, but I do know that we would be 100% dependent on retrieving a recent PostgreSQL backup, etc.

    It shouldn't be that way. I should be able to provide a read-only mirror of all the OpenACS Forums content, by just asking the openacs.org maintainers to flip one switch.
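The duplicate filtering in item 2 could be sketched as follows. This is a minimal illustration in Python, not existing OpenACS or Tcllib code: the function names `dedup_key` and `import_unique` are hypothetical, and keying on the Message-ID header (with a hash of the raw bytes as a fallback) is an assumption about what "duplicate" should mean. Keying on Message-ID rather than on the full text lets two copies of the same email match even when a list manager added headers to one of them.

```python
import email
import hashlib


def dedup_key(raw_bytes):
    """Return a key identifying one email.

    Assumption: two copies of an email share a Message-ID even if other
    headers differ. Messages without one fall back to a content hash.
    """
    msg = email.message_from_bytes(raw_bytes)
    message_id = msg["Message-ID"]
    if message_id:
        return str(message_id).strip()
    return "sha256:" + hashlib.sha256(raw_bytes).hexdigest()


def import_unique(raw_messages, seen=None):
    """Yield (key, parsed message) for each email not seen before.

    `seen` can be a set carried over between imports, so that several
    mailboxes can be loaded into the same archive without duplicates.
    """
    seen = set() if seen is None else seen
    for raw in raw_messages:
        key = dedup_key(raw)
        if key in seen:
            continue  # duplicate of an already imported email
        seen.add(key)
        yield key, email.message_from_bytes(raw)
```

Reading from an mbox file with the Python standard library would look like `import_unique(m.as_bytes() for m in mailbox.mbox(path))`, where `path` is a placeholder for the mailbox file.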

I haven't actually looked at the OpenACS Forums code at all, so I don't know how difficult this would be. I believe OpenACS and/or Tcllib also have various email handling functions which should help this work, but I'm not yet familiar with those either.

I would like to work on this, but probably won't have time until summer 2006. Anyone else also interested in this project?

Any comments or further thoughts?

Posted by Lurch . on
Not directly related to the above, but you asked for further thoughts... see

http://noanoa.ics.es.osaka-u.ac.jp/~k-choy/mlwiki-test/mlwiki.php

if nothing else, it's interesting...

Posted by Andrew Piskorski on
Links to any other projects with similar goals are definitely appreciated. But unfortunately, the guy doing MLwiki has given almost no info on just what exactly he wants to accomplish and how he's going about it - not very useful.

Btw, "Lurch", we strongly prefer real names in these forums, and registering with bogus "spam@" email addresses is seriously frowned upon.

Posted by Caroline Meeks on
Great project!

The system should also be able to deal with attachments and also make them searchable.

Does anyone know which IMS specification would be used for forums content? That would be worth looking at as you looked into export.

Andrew, I have projects in the pipeline who might require us to do some of the pieces of this vision. I'll keep you in the loop if/when the projects happen.

Posted by Malte Sussdorff on
As for 4): the mail tracking package already does part of that. At least, it allows you to store an e-mail, the recipient, the sending date, and the object_id (why the e-mail was sent in the first place). The resulting list is searchable and can be limited to a date range.

Which reminds me to change forums to call acs-mail-lite::send (or complex_send, which supports multiple mime types) with the "no_callback_p" switch, if it is using acs-mail-lite in the first place...

Posted by Daniël Mantione on
On freepascal.org we use Mailman. I'm currently using static pages to make my mailing list archives searchable. That's a killer feature for the website, but it doesn't work ideally, since the "presentation" around each e-mail is also considered by the search system. It affects both the search results (some words appear on all pages) and the summary that is shown after each result (which includes presentation).

So I'm quite interested in this feature. However, I would make a separate package, because:

  • Not all features of forums make sense for mailing lists
  • There is metadata in e-mails which can be put to good use (message-id, references). The database tables for forums have not been designed for this purpose.
  • User interface issues. E-mail might need a slightly different user interface than forums.
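To illustrate how the message-id and references metadata mentioned above could be put to use, here is a minimal Python sketch that groups emails into reply trees. It is an assumption-laden simplification, not a description of the Forums data model or of Mailman: the names `load`, `parent_id`, and `build_threads` are hypothetical, the parent is taken from In-Reply-To first and the last References entry second, and replies whose parent is missing from the archive are simply treated as thread roots.

```python
import email
from collections import defaultdict


def load(raw_texts):
    """Parse an iterable of raw email strings into message objects."""
    return [email.message_from_string(raw) for raw in raw_texts]


def parent_id(msg):
    """Return the Message-ID this email replies to, or None for a root.

    Assumption: In-Reply-To wins; otherwise use the last References entry.
    """
    in_reply_to = (msg["In-Reply-To"] or "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = (msg["References"] or "").split()
    return references[-1] if references else None


def build_threads(messages):
    """Map each parent Message-ID (None for roots) to its direct replies."""
    known = {m["Message-ID"].strip() for m in messages if m["Message-ID"]}
    children = defaultdict(list)
    for msg in messages:
        message_id = msg["Message-ID"]
        if not message_id:
            continue  # cannot place an email that has no Message-ID
        parent = parent_id(msg)
        if parent not in known:
            parent = None  # parent not in the archive: treat as a root
        children[parent].append(message_id.strip())
    return dict(children)
```

The resulting mapping has the same shape as a parent/child forum thread, which suggests (without proving) that a threaded forum display could be driven from these headers.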

Mailman stores all its e-mail archives in both HTML format and mbox format. The mbox format could be nicely used to build a user interface upon and to provide a site-wide search service contract.

Of course the situation could be very different in a different mailing list manager. What mailing list manager were you thinking about?

Posted by Andrew Piskorski on
Daniël, of course I would prefer to save all useful mail header metadata so that going from email file archive to database and back does not lose anything significant. That will need investigation.

Mailing list managers should be more or less irrelevant. AFAIK they all store any archived emails just like a client email reader does, in one of three or four different standard file formats. At least for the first pass that's all that matters. As the project actually gets going we'd want to look into that more, as it's likely that different mailing list manager programs insert additional headers into the emails which may contain useful metadata.

I have no opinion yet whether this new functionality should be added into the existing Forums package, constructed as a second package, or what. I definitely think that the existing Forums data model, code, and especially UI should be re-used as much as is possible and practical. Web-based forums, email lists, and Usenet groups are all merely differently flavored implementations of the same basic concept, the "discussion list", and the software design should reflect that. I seek to unify them.

Posted by Carl Robert Blesius on
I totally agree Andrew and am very interested in making it happen (actually in the process of writing a proposal right now and I would like to talk with you about it in person seeing we are in the same area).

For me it has to do with importing MASSIVE amounts of listserv archives (which are in mbox format) and making them categorizable and searchable (lists go back to the mid 1990s). This includes storing attachments (e.g. medical images) so they can be used and categorized.

Please contact me per email if you have time to meet soon (I will post my final proposal in the forums, but anyone else in the area is welcome)

Posted by Andrew Piskorski on
I recently stumbled across one little known web email list archive tool which doesn't totally suck, ezmlm-browse (search on google). It has various flaws, e.g., no full-text search, and its UI features seem confusingly inconsistent. But, shockingly, it actually has a link to see all messages by a particular user. OpenACS/ACS has had that obvious feature since 1996 or so, Google Groups (the former DejaNews) has it, and now ezmlm-browse. So that's (only) three I'm aware of.

Another feature a good mailing list archive should have is a button to say, "Please forward me a copy of this particular email." (Of course you must be required to login under SSL and have an authenticated email address in order to use that feature.) I've never yet seen an archive with that.

Posted by Andrew Piskorski on
Hah, I just now noticed that this very forum has a "Forward" link, which comes very close to my idea of how a "Forward me this email please" button should work. (The only obvious flaw I see is that it only lets you forward a single message, not a whole thread or selections from a thread. And it should fill in your own email address by default.)

Clearly a new feature that got rolled out in the recent openacs.org upgrade. Whoever implemented that, kudos to you, it's nice.

Posted by Andrew Piskorski on
There is also the issue of managing email lists and keeping spam off them. That's not really something I want to address in this project, but it is clearly related, and has substantial synergies with it and with OpenACS.

Donald Becker just recently raised this issue on the Beowulf list, and I made a few comments about it there.

12: Webmail and MUAs (response to 1)
Posted by Andrew Piskorski on
Possibly relevant is Hipp's (SQLite backed) Experimental Mail User Agent, and of course the data model and/or code from Jin Choi's old (Oracle backed) ACS Webmail package.
13: grepmail (response to 1)
Posted by Andrew Piskorski on
I'm told that grepmail is useful for regexp and date-based searching of email archive files, and various MUAs have interfaces to it. grepmail's implementation is a Perl script, about 2400 lines total (including comments, etc.).
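
For comparison, the kind of regexp and date-based search described here can be sketched in a few dozen lines of Python. This is not grepmail's implementation and does not reproduce its options. The names `message_date`, `body_text`, and `search` are hypothetical, and matching against the Subject plus the text/plain body parts is an assumption about what should be searched.

```python
import email
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Parse the Date header into an aware UTC datetime, or None."""
    try:
        parsed = parsedate_to_datetime(msg["Date"])
    except (TypeError, ValueError):
        return None  # missing or unparseable Date header
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC
    return parsed.astimezone(timezone.utc)


def body_text(msg):
    """Concatenate the decoded text/plain parts of an email."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the email
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def search(messages, pattern, author=None, since=None, until=None):
    """Yield emails matching a regexp, an author, and a date range.

    `since` and `until` must be timezone-aware datetimes. Emails with
    no usable date are skipped whenever a date bound is given.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for msg in messages:
        if author and author.lower() not in (msg["From"] or "").lower():
            continue
        date = message_date(msg)
        if (since or until) and date is None:
            continue
        if since and date < since:
            continue
        if until and date > until:
            continue
        text = (msg["Subject"] or "") + "\n" + body_text(msg)
        if regex.search(text):
            yield msg
```

A linear scan like this is fine for one mailbox file but would not scale to a large archive, which is one argument for loading the emails into a database with full-text search as proposed at the top of this thread.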
Posted by Caroline Meeks on
Has there been any progress on this project? I currently have a proposal to write that might use some of these features.

We may also need to deal with email attachments when someone replies by email. Has anyone done this?

Thanks

Posted by Andrew Piskorski on
Caroline, no, no progress that I'm aware of. Carl Robert Blesius and I talked about it back in March. He was going to write up a draft spec., and I was going to read through the current Forums codebase to get an idea of how suitable it might be as a base for this project. Unfortunately, I never did that.   :(   I haven't been in touch with Carl since then, and don't know what progress he made on his spec.
16: Zawinski's Intertwingle (response to 1)
Posted by Andrew Piskorski on
Also somewhat relevant are Jamie Zawinski's Intertwingle ideas, which I read about many years ago, but somehow forgot to mention before.