Forum OpenACS Development: Re: Calling C library functions from an AOLserver filter.

Hi Andrew,

The problem to solve is the following:

I'm designing a tracking and auditing API to provide services to all the applications that need tracking and auditing. Each application is responsible for defining the data to be tracked, and for processing the tracked data to extract meaning from it.

One problem to solve is the bottleneck of saving the tracked data to a database, and the resulting database growth. A possible solution is to save the data in an alternative format, and leave in the database just enough metadata to locate that data. When an application A1 needs to audit the data, it first checks the metadata in order to load the necessary data into the database (through a function provided by the tracking and auditing API), and then performs its analysis on that data. The results of the analysis could be added back to the alternative format, enriching the data and making it accessible to other applications that don't need to know about A1.

The alternative format could be HDF, because it's very efficient at storing and managing huge amounts of information, is portable, and allows modeling complex relations.

One solution is to call the HDF API from the filter (which is in charge of tracking the data and generating the basic metadata).
Another solution is to use a kind of "ring buffer" (as the Data Turbine Initiative does): the filter sends the data to this "ring buffer", and the data from the buffer is saved to disk. Another process is in charge of transforming this data to the HDF format and loading into the database the metadata necessary to track it.
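
To make the second idea concrete, the in-memory buffer could look roughly like this (just a sketch in Tcl; the proc names, the nsv array, and the file path are all invented):

    # The filter calls this once per request with whatever data we track.
    proc track_record {record} {
        nsv_lappend tracking buffer $record
    }

    # Initialized at server startup; flushed every 10 seconds.
    nsv_set tracking buffer [list]
    ns_schedule_proc 10 flush_track_buffer

    proc flush_track_buffer {} {
        set records [nsv_get tracking buffer]
        if {[llength $records] == 0} { return }
        nsv_set tracking buffer [list]
        # NOTE: a real version needs a mutex here; records appended
        # between the nsv_get and the nsv_set above would be lost.
        set fd [open /var/log/tracking.dat a]
        foreach r $records { puts $fd $r }
        close $fd
    }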

Thanks for your help!

Regards,

Jorge.

Given the scale of what you are trying to do, here is what I would suggest:

1. First, use a 'trace' filter. These run after the connection has closed, they always run, and in theory you have the most information about what happened: essentially, you can record information during the request and handle it all at the end. Trace filters don't slow down serving the request. (See the sketch after this list.)

2. Dump your collected information to a log file of some type. You may be able to piggyback on the error log or the access log, or create a new file to accept the output.

3. Unfortunately, step 2 requires that there is some text representation for HDF. I bet there is one.

4. There are utilities that can split the error log into multiple files (multilog, for example), so using the error log facility is a good choice.

5. Occasionally process the output files with whatever tools are available for your HDF client/server.
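
For example, a minimal trace filter that writes one line per request to the error log might look like this (an untested sketch; the proc name and the TRACK prefix are just placeholders for downstream tools to grep for):

    # Run after each connection has closed; never blocks the response.
    ns_register_filter trace GET /* track_trace

    proc track_trace {why} {
        # One record per request; ns_log sends it to the error log.
        ns_log Notice "TRACK [ns_time] [ns_conn method] [ns_conn url] [ns_conn peeraddr]"
        return filter_ok
    }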

Using a text file as a buffer, you can start immediately instead of creating a module specific to AOLserver or Tcl, and you can test each part separately. You also get a transaction log, and you don't have to worry if your HDF software errors out, or if you change software.

Hi Tom,

First, thanks again for your help!

Your solution is close to my second alternative:

*****************
Another solution is to use a kind of "ring buffer" (as the Data Turbine Initiative does): the filter sends the data to this "ring buffer", and the data from the buffer is saved to disk. Another process is in charge of transforming this data to the HDF format and loading into the database the metadata necessary to track it.
*****************

My only doubt is about efficiency if the filter dumps the information directly to the text file, because that means an I/O operation on every request; with a kind of "ring buffer", the I/O could be delayed until a pre-configured amount of information has accumulated (for example, 100 connections).

I like the idea of using a trace filter!

Regards,

Jorge.

The text file is your buffer. With something like multilog you can have more than one file, so you only process the ones that are already closed. This separates writing to the file from reading it.
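
For example, if you pipe the server's log output through multilog, an invocation along these lines (the size and count are just illustrative) keeps a directory of rotated files:

    multilog t s4194304 n20 ./tracklog

Here t prepends a timestamp to each line, s4194304 caps each file at about 4MB, and n20 keeps at most 20 files in ./tracklog. The file being written is always named 'current'; rotated files are renamed to a timestamp, so anything not named 'current' is closed and safe for your converter to pick up and delete.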

Please note that I have no idea what data you are collecting; I only know that eventually everything starts as a string, and Tcl is very good at producing strings. AOLserver is even better at handling strings (including binary data).

But I did look at HDF, which I found out stands for Hierarchical Data Format; there appears to be a significant amount of up-front data modeling involved. If you continue to work on this, it would be very interesting to hear how you end up using HDF.

The details of applying this to AOLserver (how to use it in a filter) are simple by comparison, so continue to ask for help. A good starting point is simply writing data to a file, even from a Tcl shell, in a form that does what you want and can be consumed by whatever downstream tools you choose; after that, adding it to AOLserver is easy.
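
Something along these lines, straight from tclsh (the record fields are made up; use whatever your HDF-side tools need to parse):

    # Append one record per "request" as a Tcl list, one per line.
    set fd [open tracking.dat a]
    puts $fd [list [clock seconds] GET /somepage.tcl 127.0.0.1]
    close $fd

    # Reading it back later is just as simple:
    set fd [open tracking.dat r]
    while {[gets $fd line] >= 0} {
        foreach {time method url peer} $line break
        puts "$method $url at $time from $peer"
    }
    close $fd

Once that round trip works, the same puts call moves into a filter proc almost unchanged.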