Forum OpenACS Q&A: New Package: News Aggregator

Posted by Simon Carstensen on
You can download the news-aggregator package at:

http://bcuni.furfly.net/news-aggregator-0.1d.apm

The package should sit well on top of an OpenACS 4.6 install.

The package features a basic news aggregator that reads a set of news sources on an hourly basis, finds the new items, and displays them in reverse-chronological order on a single page. Yup, pretty standard :)
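To make the "find the new items, show them newest first" step concrete, here is a minimal sketch in plain Tcl. It is not the package's actual code: the item format (link, title, unix timestamp) and the proc names merge_new_items and newest_first are assumptions made up for illustration.

```tcl
# Hypothetical item format: {link title unix_timestamp}.
# This is NOT the package's data model, just an illustration.

proc merge_new_items {seen_var fetched} {
    # seen_var names an array in the caller that remembers known links.
    upvar 1 $seen_var seen
    set fresh {}
    foreach item $fetched {
        set link [lindex $item 0]
        # Skip anything already stored on an earlier run.
        if {[info exists seen($link)]} { continue }
        set seen($link) 1
        lappend fresh $item
    }
    return $fresh
}

proc newest_first {items} {
    # Sort on the timestamp (list index 2), largest first.
    return [lsort -integer -decreasing -index 2 $items]
}

# Two simulated hourly runs; the second one repeats an item.
array set seen {}
set run1 [merge_new_items seen {{http://a/1 One 100} {http://a/2 Two 200}}]
set run2 [merge_new_items seen {{http://a/2 Two 200} {http://a/3 Three 300}}]
puts [newest_first [concat $run1 $run2]]
```

Keying on the link alone is the simplest choice here; it assumes every item's link is unique to that item.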

Future improvements:

  • actual rss validation: currently no validation is done; adding an invalid RSS feed is simply cancelled or, on a bad day, results in an error.
  • integration with notifications
  • keywords: specify a list of keywords of interest and have the aggregator display items holding these words at the top of the page and/or email them to you
  • RSS autodiscovery
  • xml-rpc post function: right now the POST function (which appears once you've specified a lars-blogger instance URL under the news-aggregator parameters) simply posts to entry-edit?content=@content@. An XML-RPC interface should be built, so that users can post to any blog (additionally, lars-blogger should have an XML-RPC interface, but that's a whole nother story :).
  • an admin interface
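
As a rough idea of how the keywords item in the list above might work, here is a small plain-Tcl sketch. Nothing like this exists in the package yet; the proc name promote_keyword_items and the item format (title, description) are assumptions for illustration only.

```tcl
# Hypothetical helper, not part of the package.
# Each item is assumed to be a list: {title description}.
proc promote_keyword_items {items keywords} {
    set hits {}
    set rest {}
    foreach item $items {
        # Case-insensitive substring match over title and description.
        set text [string tolower [join $item " "]]
        set matched 0
        foreach kw $keywords {
            if {[string first [string tolower $kw] $text] >= 0} {
                set matched 1
                break
            }
        }
        if {$matched} {
            lappend hits $item
        } else {
            lappend rest $item
        }
    }
    # Matching items first; each group keeps its original order.
    return [concat $hits $rest]
}

set items [list \
    [list "Weather" "Rain expected tomorrow"] \
    [list "OpenACS 4.6 released" "New version is out"] \
    [list "Cooking" "How to bake bread"]]
puts [promote_keyword_items $items {openacs}]
```

Because the two groups keep their order, a page that is already reverse-chronological stays that way within the matches and within the rest.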
And that's pretty much it for now. Feel free to check out the package as you please and post some feedback if you have any.

Right now the file lives at my page, but should perhaps be moved to the CVS repository at some point? /Simon

Posted by Don Baccus on
This is very cool.  Yes, we should have this in the repository at some point.

Does this work for both Oracle and PG?  That's a bottom-line consideration that's been ignored at times in the past, but one we really can't afford to ignore (except for the contrib section, which we'll be very loose with).

Add to your "future improvements" list - a portlet and dotLRN applet :)

Posted by Simon Carstensen on
Does this work for both Oracle and PG?

It only works for PG at the moment. Next thing, apart from a few bug fixes here and there and working on the improvements, will be an Oracle port.

Portlet and applet hereby added to the list :)

/Simon

Posted by Mark Aufflick on
The index page fails if no blogger URL parameter is set. The set post_url line just needs wrapping in an info exists check:

@@ -49,7 +49,9 @@
        set show_title_p 0
        set content "$item_description \[<a href=\"$link\">$title</a>\]"
    }
-    set post_url "${blogger_url}?[export_vars { content }]"
+      if {[info exists blogger_url]}  {
+              set post_url "${blogger_url}?[export_vars { content }]"
+      }
}

ad_return_template

Posted by Mark Aufflick on
Sorry - how rude of me! It is very cool :)
Posted by Mark Aufflick on
'nother bugfix for you:

--- news-aggregator-procs.tcl.bak      Wed Feb  5 04:34:25 2003
+++ news-aggregator-procs.tcl  Wed Feb  5 04:34:45 2003
@@ -260,7 +260,7 @@
            # need these to check against already added items
            # also we check whether link is an external or internal URL
            # if not, it might occur in other items, and we can't check against it
-          if { [exists_and_not_null link] && [na_check_link $link $source_link] } {
+          if { [exists_and_not_null link] && [na_check_link $link $feed_url] } {
                set identifier "link"
            } elseif { [exists_and_not_null description]  } {
                set identifier "description"

Posted by Simon Carstensen on
This bug has been causing me some headache. Perhaps someone can help me out?

The na_check_link proc checks whether a link is internal (i.e. a permalink) or external (i.e. points to a source). Take for example Evhead's feed (http://www.evhead.com/rss.xml): the link nodes point to external URLs, because the source he's commenting on is put in the link node. That's why I came up with the na_check_link proc. Later I noticed some duplicates from Doc Searls' and David Winer's blogs coming in. It turns out that all of Winer's link nodes point to URLs like http://scriptingnews.userland.com/backissues/2003/02/04#When:2:49:44PM. Hence they're detected as external URLs (since the domain name is different from that of the feed_url, which is http://scripting.com/rss.xml). Doc Searls' link nodes point to http://doc.weblogs.com, which is the correct URL, only the feed is placed at http://partners.userland.com/people/docSearls.xml. Again, his link nodes are detected as external links.
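
To show the misdetection in isolation, here is a minimal plain-Tcl sketch of a host comparison like the one described above. It is only a stand-in: the procs url_host and same_host_p are hypothetical names, and the real na_check_link may well work differently.

```tcl
# Hypothetical stand-in for the check described above;
# NOT the actual na_check_link implementation.

proc url_host {url} {
    # Pull the host out of scheme://host/path.
    # Returns the empty string if the URL doesn't look like that.
    if {[regexp -nocase {^[a-z][a-z0-9+.-]*://([^/:?#]+)} $url -> host]} {
        return [string tolower $host]
    }
    return ""
}

proc same_host_p {link feed_url} {
    # 1 if link and feed_url share a host (treated as internal), else 0.
    set link_host [url_host $link]
    return [expr {[string length $link_host] > 0 &&
                  [string equal $link_host [url_host $feed_url]]}]
}

# Winer: permalink host differs from the feed host, so it looks external.
puts [same_host_p "http://scriptingnews.userland.com/backissues/2003/02/04#When:2:49:44PM" "http://scripting.com/rss.xml"]  ;# 0
# Searls: the feed lives on another host than the weblog.
puts [same_host_p "http://doc.weblogs.com" "http://partners.userland.com/people/docSearls.xml"]  ;# 0
# Comparing against the website URL instead of the feed URL matches.
puts [same_host_p "http://doc.weblogs.com/2003/02/04" "http://doc.weblogs.com"]  ;# 1
```

Any check that compares hosts exactly will trip over feeds hosted on a different domain from the permalinks, which is what happens in the first two calls.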

I'm not sure how to solve this problem. As for Doc Searls, I worked around it by subscribing with the URL of the website (http://doc.weblogs.com) instead of the feed URL.

I have to make sure the link node doesn't point to an external source, since it's potentially used to check whether the item has already been added (so if Evan writes about a piece by Joi Ito, for example, and links to him, and I've subscribed to Joi Ito's feed, there's going to be a duplicate).

Any suggestions?

BTW, I'm not sure I get your bugfix, Mark. What was wrong with my code?

/Simon