Forum OpenACS Development: Re: JODconverter vs. OpenOffice/LibreOffice Command Line

Posted by Brian Fenton on
Hi Frank

Sorry to wake up an old thread, but we have been looking into this anew: some of our clients have really been hammering the LibreOffice converter, and we need a fix. The obvious solution is to implement some form of queue management, and I will look into that later (maybe ns_proxy can help?), but right now I'm looking at tuning LibreOffice/PyODConverter.

Regarding the --convert-to flag: from what I've been reading, the key difference between it and PyODConverter/unoconv/JODConverter is that it has to start up and tear down a LibreOffice instance for each call, whereas the latter keep a LibreOffice server listening on a port and so avoid the startup/tear-down cost on every conversion. That is a strong performance reason NOT to use the --convert-to flag.
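To make the two modes concrete, here is a minimal Tcl sketch that only builds the two command lines. The proc names, the file paths and port 2002 are assumptions chosen for illustration; the soffice options themselves (--headless, --convert-to, --outdir, --accept, --norestore) are standard LibreOffice command-line parameters.

```tcl
# One-shot mode: every call pays for a full LibreOffice start and shutdown.
proc oneshot_convert_cmd {infile outdir} {
    return [list soffice --headless --convert-to pdf --outdir $outdir $infile]
}

# Listener mode: LibreOffice is started once and keeps accepting UNO
# connections on a socket. Port 2002 is an assumed value.
proc listener_cmd {{port 2002}} {
    return [list soffice --headless --norestore \
        "--accept=socket,host=127.0.0.1,port=$port;urp;"]
}

# Possible usage (not executed here):
#   exec {*}[oneshot_convert_cmd /tmp/invoice.odt /tmp/out]
#   exec {*}[listener_cmd] &
```

In listener mode the client tool (PyODConverter, unoconv or JODConverter) connects to that socket for each document, so only the conversion itself is paid per request.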

Our real problem is that LibreOffice becomes unresponsive once it receives multiple requests simultaneously, so we are looking into how best to set up some form of queue management. If we do it within OpenACS, we could perhaps have some monitoring available to us in the application, but if we find an existing tool that can do it externally to OpenACS, and that is easy to integrate, we may just go ahead with that.

I also came across some LibreOffice tuning suggestions here https://askubuntu.com/a/857183/171033

Any other suggestions welcome!
Brian

A simple form of queue management is NaviServer's job queue. If one creates such a queue with a single thread associated, jobs in this queue are serialized, and no two jobs can be executed in parallel:

#
# Create a job queue named "q1" with a single thread (do this only once)
#
ns_job create q1 1

#
# Add some jobs to this queue. Every job takes about 2
# seconds and then writes a message to the log file
# for demonstration purposes.
#
for {set i 1} {$i<10} {incr i} {
    ns_job queue -detached q1 {ns_sleep 2; ns_log notice hi}
}

Hi Brian,

Sorry, just saw the new posting right now...

We never faced the problem of parallel LibreOffice conversions before, and we don't have a solution for it. In typical ]po[ use scenarios, there is usually a single accountant in charge of generating printable invoices for customers. No other documents require conversion, usually.

My personal coding style for this type of structure would be to generate short Bash scripts from within OpenACS and to place them in a specific directory. Then use cron or something similar (polling about once per second) to check whether one of these scripts is still running and, if not, start the next one.

But the solution from Gustaf appears to be much more elegant...

However, I'm not clear about how to return the converted result to the user. This is asynchronous, and may take minutes or even hours in a congested system. This is beyond any HTTP timeout.

I guess that is something you'd have to fix at the application level. Maybe by sending the result to the user by email? Or by manual polling via a "check if ready" page? This problem would apply to both ns_job and Bash scripts...
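A rough sketch of the "check if ready" idea on top of a detached ns_job, keeping the state in an nsv array. The proc names, the nsv array name conv_status and the queue name q1 are assumptions for illustration; the queue is expected to have been created once with "ns_job create q1 1".

```tcl
# Queue one conversion in the background and remember its state.
proc conversion_submit {id infile outdir} {
    nsv_set conv_status $id pending
    # 2>@1 merges stderr into stdout, so warnings alone do not make exec fail
    set cmd [list exec soffice --headless --convert-to pdf \
                 --outdir $outdir $infile 2>@1]
    # Nobody waits for a detached job, so the job records its own outcome.
    set script [string map [list @CMD@ $cmd @ID@ [list $id]] {
        if {[catch {@CMD@} err]} {
            nsv_set conv_status @ID@ "error: $err"
        } else {
            nsv_set conv_status @ID@ done
        }
    }]
    ns_job queue -detached q1 $script
}

# Called by the "check if ready" page: unknown, pending, done or "error: ...".
proc conversion_status {id} {
    if {![nsv_exists conv_status $id]} {
        return unknown
    }
    return [nsv_get conv_status $id]
}
```

The page would call conversion_status and either offer the download link or show a "still converting" message. Like the queue itself, the nsv state is lost on a server restart.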

Cheers
Frank

The "-detached" flag in my ns_job example is for demo purposes; it causes asynchronous behavior (many things are added to the queue, and the jobs are executed one by one in the background). When ns_job is called without this flag, the caller can wait for the result via "ns_job wait" (until the job finishes or the configurable timeout is reached).
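For the synchronous case, a minimal sketch could look like the following. The proc name, the 60 second default and the soffice command line are assumptions; the queue q1 is the single-threaded queue created above with "ns_job create q1 1".

```tcl
# Run one conversion through the serialized queue and wait for it.
proc convert_serialized {infile outdir {timeout 60}} {
    # 2>@1 merges stderr into stdout, so warnings alone do not make exec fail
    set script [list exec soffice --headless --convert-to pdf \
                    --outdir $outdir $infile 2>@1]
    # Without -detached, "ns_job queue" returns a job id to wait on
    set jobId [ns_job queue q1 $script]
    # Blocks until the job has finished; to my knowledge an error is raised
    # if the timeout expires first or if the job script itself fails
    return [ns_job wait -timeout $timeout q1 $jobId]
}
```

The request thread still blocks while it waits, but LibreOffice only ever sees one conversion at a time, which is the point of the single-threaded queue.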

With "ns_job" one can run arbitrary commands, including arbitrary db-operations or "exec", so there is much flexibility. One feature possibly missing in ns_job is persistence across server restarts, but that might or might not be an issue for some applications. For persistence, there is the option of xowf's at-commands or rolling your own db-based command queue, similar to acs-mail-lite.

Thanks Gustaf and Frank. Very useful ideas, and ns_job looks great; it could be a quick win to at least stop the LibreOffice server from dying. I think we can live without persistence right now. Maybe in future we'll look at a db-based command queue.

Cheers!
Brian

Hi Brian,

While moving over to Dockerized versions of ]project-open[ I "stumbled" (after hours of looking) upon an unoconv image which runs nicely in its own container.

https://github.com/alphakevin/unoconv-server

Testing it with curl gave the results I wanted, and so far I have not run into the problem of multiple parallel conversions (especially as I can spawn multiple containers if needed).

We still need to change our code, though, to use ns_http or Chilkat (https://www.example-code.com/tcl/http_multipart_form_data.asp) with the container.
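As a stopgap until that port is done, the same curl call could be wrapped in Tcl. This is only a sketch: the proc names are hypothetical, and the base URL (port 4000) and the /convert/format/pdf path are assumptions based on my recollection of the unoconv-server README, so please check them against the image version you run.

```tcl
# Build the curl command that uploads a file as multipart form data.
proc unoconv_server_curl_cmd {infile outfile {base http://127.0.0.1:4000}} {
    # Assumed endpoint layout: <base>/convert/format/<target format>
    set url $base/convert/format/pdf
    return [list curl --silent --show-error --fail \
                --form file=@$infile --output $outfile $url]
}

# Run the upload; exec raises a Tcl error if curl exits non-zero
# (with --fail that includes HTTP error responses).
proc unoconv_server_convert {infile outfile} {
    exec {*}[unoconv_server_curl_cmd $infile $outfile]
    return $outfile
}
```

Since the container is a separate process, this can also be combined with a single-threaded ns_job queue per container if parallel requests turn out to be a problem there as well.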