Forum OpenACS Development: Re: Calling C library functions from an AolServer filter.

Posted by Gustaf Neumann on
From the documentation of HDF, I see that the library achieves thread safety by serializing API calls. One can achieve similar behavior via ns_proxy or libthread. This won't allow concurrent HDF calls, but should work otherwise.
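A minimal sketch of the ns_proxy approach, assuming a proxy pool named "hdf" configured with a single slave (so calls through it are serialized) and a hypothetical Tcl wrapper around HDF that exposes an hdf::write command — all names and paths here are illustrative, not real APIs:

```tcl
# Funnel all HDF calls through one ns_proxy slave so they are serialized
# even when many connection threads are active. The pool name "hdf", the
# library path, and the hdf::write command are placeholders.
set handle [ns_proxy get hdf]              ;# pool configured with one slave
ns_proxy eval $handle {load /usr/local/lib/libhdftcl.so}
ns_proxy eval $handle [list hdf::write /web/data/tracking.h5 $record]
ns_proxy release $handle
```

Building the script with [list] in the connection thread substitutes $record before the script is shipped to the slave, which has its own interpreter.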

One can load a C-implemented Tcl extension into AOLserver via the Tcl command load (or via package require, which uses load).
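For example, in a server startup file (paths and package names are placeholders):

```tcl
# Load a C-implemented Tcl extension by explicit path; "Myext" is the
# package init name compiled into the shared library (placeholder).
load /usr/local/lib/libmyext.so Myext

# Or, if the extension installs a pkgIndex.tcl somewhere on auto_path:
package require myext
```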

Posted by Jorge Couchet on
Well, it raises more doubts:

a) Regarding concurrent use of the HDF library: the only place the HDF library will be used is inside the filter. Each connection to AOLserver is managed by its own thread. How are filters managed?

b) Regarding the load command: I suppose I should use the load command in the Tcl proc that is attached to the filter. There is a performance penalty here (compared with an AOLserver module), right?



Posted by Tom Jackson on
Uggg, you have a long way to go. I don't think there is much anyone can do to direct you. Let me at least try:

1. C libraries can be wrapped into an AOLserver module.
2. AOLserver modules expose one or more Tcl commands.
3. You can use these Tcl commands anywhere in AOLserver.

4. Filters are part of an HTTP request pipeline
5. Filters can use any (number of) available Tcl commands/procs.
6. Filters return a signal to the request pipeline: filter_ok (continue), filter_break (skip the remaining filters at this point), or filter_return (stop processing the request).

7. There are three filter points: preauth, postauth, and trace.

8. Registered procedures (at most one per request) fire after the postauth filters, and only if filter_return was not returned by an earlier filter.

9. Trace filters always run.

10. None of this has any direct relationship to modules; modules usually just provide additional functionality.
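A sketch of how the points above are typically wired up in a startup file (the URL patterns and proc names are made up):

```tcl
# Filters at the three filter points, plus a registered proc.
ns_register_filter preauth  GET /* my::preauth_filter
ns_register_filter postauth GET /* my::postauth_filter
ns_register_filter trace    GET /* my::trace_filter    ;# always runs
ns_register_proc   GET /report my::report_page

proc my::postauth_filter {why} {
    # a filter signals the pipeline with its return value:
    # filter_ok, filter_break, or filter_return
    return filter_ok
}
```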

Anyway, if HDF isn't wrapped in some kind of AOLserver module and exposed via a Tcl API, it is pretty useless. Maybe you could load it, but you couldn't access any of its functionality without the wrapper.

Posted by Jorge Couchet on
Well, at least now I know that it isn't an easy task :-).

I'm clear on the AOLserver module part and the need to wrap the HDF library.

What I'm not clear on is the context in which the HTTP request pipeline is executed:

(1) Is it executed inside its own thread?
(2) How can I create and reserve a buffer (a kind of cache) available to all the HTTP request pipeline processes (but accessible only to the proc attached to my filter)?
(3) Is there any documentation (in addition to the AOLserver source code) available about items (1) and (2)?

Thanks again!


Posted by Andrew Piskorski on
Jorge, what are you trying to use HDF5 for anyway? (It's conceivable that what you want to do can be accomplished some other way.) Also, just what is it you want to accomplish in this AOLserver filter, and why?

There is apparently tclhdf, which provides limited read-only access to HDF from Tcl. Depending on what that already has, it might or might not help you get started on wrapping the HDF5 C libraries.

AOLserver filters probably run in each connection thread. The simplest way to check is probably to put an ns_log statement into your filter, and then look in your server log to see what the thread identifier is.
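Something like this (it has to run inside AOLserver; ns_thread getid is AOLserver's command for the current thread id):

```tcl
# Log which thread each request's filter runs in; comparing the ids
# across requests shows whether filters run in the connection threads.
proc check_thread_filter {why} {
    ns_log Notice "filter in thread [ns_thread getid], url [ns_conn url]"
    return filter_ok
}
ns_register_filter postauth GET /* check_thread_filter
```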

I don't know what you actually want with your "buffer" question; you'll need to be more specific. A buffer or cache of what, for what? What are you trying to do with it, and why do you want such a thing at all?

Posted by Jorge Couchet on
Hi Andrew,

The problem to resolve is the following:

I'm designing a tracking and auditing API in order to provide services to all the applications that need tracking and auditing. Each application is responsible for providing the definition of the data to be tracked, and for processing the tracked data to extract meaning from it.

One problem to resolve is the bottleneck created by saving the tracked data to a database, and the resulting database growth. A possible solution is to save the data in an alternative format and leave in the database just enough metadata to track it. When an application A1 needs to audit the data, it first checks the metadata in order to load the necessary data into the database (through a function provided by the tracking and auditing API), and then performs the analysis of this data. The results of the analysis could be added back to the alternative format, enriching the data and making it accessible to other applications that don't need to know about A1.

The alternative format could be HDF, because it is very efficient at storing and managing huge amounts of information, is portable, and allows complex relations to be modeled.

One solution is to call the HDF API from the filter (which is in charge of tracking the data and generating the basic metadata).
Another solution is to create a kind of "ring buffer" (as the Data Turbine Initiative manages): the filter sends the data to this "ring buffer", and the data from the buffer is saved to disk. Another process is in charge of transforming this data into the HDF format and loading into the database the metadata necessary to track it.

Thanks for your help!



Posted by Tom Jackson on
Given the scale of what you are trying to do, here is what I would suggest:

1. First, use a 'trace' filter. These run after the connection has closed, they always run, and in theory you have the most information about what happened: essentially you can record information during the request and handle it at the end. Trace filters don't interfere with the speed of serving the request.

2. Dump your collected information to a log file of some type. You may be able to just piggyback on the error log or the access log, or create a new file to accept output.

3. Unfortunately, step 2 requires that there is some text format for HDF. I bet there is one.

4. There are utilities that can split the error log into multiple files (multilog), so using the error.log facility is a good choice.

5. Occasionally process the output files with whatever tools are available for your HDF client/server.

Using a text file buffer, you can start immediately instead of creating a module specific to AOLserver or Tcl, and you can test each part separately. You also get a transaction log, and you don't have to worry if your HDF software errors out or if you change software.
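Steps 1 and 2 might look roughly like this — the file name and record fields are examples, and opening and closing the file on every request is the simplest thing that works, not the fastest:

```tcl
# A trace filter that appends one text line per request to its own log.
proc tracking::trace_filter {why} {
    set line "[ns_time]\t[ns_conn peeraddr]\t[ns_conn url]"
    set f [open /var/log/tracking.log a]
    puts $f $line
    close $f
    return filter_ok
}
ns_register_filter trace GET /* tracking::trace_filter
```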

Posted by Jorge Couchet on
Hi Tom,

First, thanks again for your help!!

Your solution is close to my second alternative:

Another solution is to create a kind of "ring buffer" (as the Data Turbine Initiative manages): the filter sends the data to this "ring buffer", and the data from the buffer is saved to disk. Another process is in charge of transforming this data into the HDF format and loading into the database the metadata necessary to track it.

My only doubt is about efficiency if the filter dumps the information directly to the text file, because that requires I/O on every request; with a kind of "ring buffer", the I/O operations could be delayed until a pre-configured amount of information has accumulated (for example, 100 connections).
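One way to get that batching inside AOLserver itself is an nsv shared array: each request appends a line, and the filter flushes to disk once a threshold is reached. This is only a sketch under assumptions — the check-and-flush pair should really be guarded with an ns_mutex, since individual nsv operations are atomic but the sequence is not, and all names here are illustrative:

```tcl
# Buffer lines in server-wide shared storage and flush every 100 requests.
proc tracking::buffer_line {line} {
    nsv_lappend tracking buf $line
    if {[llength [nsv_get tracking buf]] >= 100} {
        set lines [nsv_get tracking buf]   ;# guard this pair with ns_mutex
        nsv_set tracking buf {}
        set f [open /var/log/tracking.log a]
        puts $f [join $lines \n]
        close $f
    }
}
```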

I like the idea to use a trace filter!



Posted by Tom Jackson on
The text file is your buffer. With something like multilog you can have more than one, so you only process the ones that are closed. This provides a separation between writing to and reading from the file.

Please note that I have no idea what data you are collecting, only that eventually everything starts as a string, and Tcl is very good at producing strings. AOLserver is even better at handling strings (including binary data).

But I did look at HDF, which I found out stands for Hierarchical Data Format; there appears to be a significant amount of up-front data modeling. If you continue to work on this, it would be very interesting to hear from you on how you use HDF.

The details of applying this to AOLserver (how to use it in a filter) are simple by comparison, so continue to ask for help. A good starting point is simply writing data to a file, even from a Tcl shell, in a way that does what you want and can be used by whatever downstream tools you choose; after that, adding it to AOLserver is easy.