Forum OpenACS Q&A: Forum XML feed

Posted by Simon Carstensen on
With all the talk about logging the weekly chat, and the already existing CVS forum, perhaps creating an XML feed for the forums would be quite useful? rss-support could be used, although I have no idea how well it works as an XML export service...

The idea of being able to subscribe to recent posts from the OpenACS forums in my Radio News Aggregator alongside other news feeds seems pretty cool to me.

Any thoughts?
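For reference, the document such a feed would serve is quite small; a single recent post in an RSS 0.91 channel looks roughly like this (the channel metadata, titles, and URLs below are made up for illustration, not actual openacs.org paths):

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="0.91">
  <channel>
    <title>OpenACS Forums</title>
    <link>http://openacs.org/forums/</link>
    <description>Recent posts from the OpenACS forums</description>
    <language>en</language>
    <item>
      <title>Forum XML feed</title>
      <link>http://openacs.org/forums/message-view</link>
      <description>Perhaps creating an XML forum feed would be quite useful?</description>
    </item>
  </channel>
</rss>
```

A news aggregator just polls that document periodically, so anything that can emit it (rss-support or otherwise) would do.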


2: Re: Forum XML feed (response to 1)
Posted by Roberto Mello on
That's a neat idea. When will you have it done? 😃


3: Re: Forum XML feed (response to 2)
Posted by John Sequeira on
I wrote a Perl CGI script to convert the old site's web/db page from HTML to RSS (using LWP and XML::RSS). It was pretty simple, but of course it broke when the site was redesigned.

I know this isn't the 'right' way to do it, but it worked for me. It's your call whether revising this would be easier than implementing something server-side on OpenACS.


use strict;
use LWP::Simple;
use XML::RSS;
use HTTP::Date;
#use Data::Denter;
#use Carp;
#use File::Slurp;
use HTML::TokeParser;
use URI;

use constant IS_CGI => 1;
my $URL = '';
use constant COPYRIGHT => "";
use constant DESCRIPTION => "OpenACS BBoard";
use constant TITLE => "OpenACS BBoard";
use constant EMAIL => '';
use constant RSS_FILE => 'openacs.rss';
use constant DATE_FILE => 'openacs.dat';

# the third value returned by HEAD is the page's Last-Modified time
my ($content_type, $document_length, $current_build_time, $expires, $server) = &LWP::Simple::head($URL);
my %items;  # rss items title=>description
my @item_keys;

# prefer a local copy of the page; fall back to fetching it over HTTP
my $content = &read_file('index.html') || &get( $URL ) or die $!;

#Create a TokeParser object, using our downloaded HTML.
my $stream = HTML::TokeParser->new( \$content ) or die $!;

#For every li element, parse out title, link and description
while ( my $tag = $stream->get_tag("li") ) {

    my $permalink = $stream->get_tag("a");
    my $link = $permalink->[1]{href} || "--";
    last if $link =~ /category=Development/;
    my $url = URI->new_abs( $link, $URL );
    my $title = $stream->get_trimmed_text("/a");  # link text becomes the item title
    $url =~ s/&/&amp;/g;    # escape ampersands for XML

    $items{$title} = { title       => &clean_url( $title ),
                       description => $stream->get_trimmed_text('a') . " " . $stream->get_trimmed_text('/a'),
                       link        => $url };
    push @item_keys, $title;
}

#die Denter ( \%items );

# if the page hasn't changed since last run, serve the cached feed and stop
if (-e RSS_FILE) {
  if (! &homepage_changed ($current_build_time) ) {
    print "Content-type: text/xml\n\n" if IS_CGI;
    print &read_file(RSS_FILE);
    exit;
  }
}

my $rss = new XML::RSS (version => '0.91');
my $current_build_date = time2str($current_build_time);
my $link = $URL;
$link =~ s/&/&amp;/g;
$rss->channel(title          => TITLE,
              link           => $link,
              language       => 'en',
              description    => DESCRIPTION,
#             rating         => '(PICS-1.1 " r (SS~~000 1))',
              copyright      => COPYRIGHT,
              pubDate        => $current_build_date,
              lastBuildDate  => $current_build_date,
              managingEditor => EMAIL,
              webMaster      => EMAIL,
              );

for (@item_keys) {
  $rss->add_item( %{ $items{$_} } );
}

print "Content-type: text/xml\n\n" if IS_CGI;
print $rss->as_string;

#utility function for cleaning the feed text
sub clean {
    my $text = shift;
    $text =~ s/read more//gi;
    return $text;
}

#escape ampersands so the text is valid XML
sub clean_url {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    return $text;
}

#returns true if homepage has changed since last time we checked
sub homepage_changed {
    my $current_build_time = shift;

    my $date_file = DATE_FILE;
    my $last_build_time = &read_file($date_file) || '';

    if ( ($current_build_time ne $last_build_time) ||
         (! -e $date_file) ) {

        # remember this build time for the next run
        open OUTFILE, "> $date_file" or die $!;
        print OUTFILE $current_build_time;
        close OUTFILE;
        return 1;
    }

    return undef;
}

#utility function to slurp a file; returns undef if it can't be opened,
#so callers can fall back to fetching the page instead
sub read_file {
    my $file = shift;
    local $/;   # slurp mode
    open INFILE, "< $file" or return undef;
    my $text = <INFILE>;
    close INFILE;
    return $text;
}


=head1 NAME

takes lars' home page and converts it into an RSS file

=head1 DESCRIPTION

This script can be run in several ways, and can also receive its input in several ways:

If it's run from a directory containing index.html, it will use that file as its source.
If it can't locate index.html, it will grab the home page using LWP::Simple as its source.

Once it has the home page,  it will parse the news items and transform them into RSS.

It will print the resulting RSS file to STDOUT, and also save it to an RSS file (openacs.rss).

It will only reparse the home page if it determines that the content has changed (it figures this out by performing an HTTP HEAD request).

=head1 USAGE



Install it as a CGI and call http://yourserver/cgi-bin/




=head1 AUTHOR

John Sequeira