Forum OpenACS Development: JODconverter vs. OpenOffice/LibreOffice Command Line

Hi!

I know that Quest and Cognovis both use JODconverter in order to convert some file formats back and forth. Brian told me once about this option and Malte actually wrote a package being used in ]project-open[ V4.0.

However, using the "ooffice" command line allows for exactly the same results, but:

- doesn't need an OpenOffice server running in the background consuming memory
- doesn't need an extra package to encapsulate the IPC and error conditions
- can run multiple conversions in parallel, while JODconverter needs to do queuing or similar(?)

The entire conversion is reduced to a single exec line:

exec ooffice --headless --convert-to pdf --outdir /tmp/ $odt

Where is my mistake? 😊

Frank

I second your choice. Running OpenOffice/LibreOffice from the command line is a very powerful way to convert from one format to another.

One thing I do, for example, is to create a print template as a flat ODT (.fodt), then use it as a lean and mean adp template in my page, to be filled with data. Once I have produced the fodt, I can convert it to whatever formats OpenOffice/LibreOffice supports. I have struggled a lot with prints in the past, and I find this just wonderful, as most of the stupid formatting is handled by the office suite itself!

I don't know JODconverter, but back in the days when direct calls were not possible I remember using unoconv in a somewhat similar way. Needless to say, requiring one piece of software is easier than requiring two.

Ciao

Hi Antonio,

How do you call libreoffice? If I try to call libreoffice directly using exec, I have a problem with the PATH missing.

[Java framework] Error in function createSettingsDocument (elements.cxx).
javaldx failed!
Warning: failed to read path from javaldx
while executing
"exec /usr/bin/libreoffice --headless --convert-to pdf --outdir /var/lib/aolserver/<LONGPATH> ..."
invoked from within
"ns_proxy eval $handle "exec $call""

Curious how you call it and if you might have a wrapper function for it. I tried:

exec /usr/bin/libreoffice --headless --convert-to $convert_to --outdir $outdir $oo_file

Solved my own problem using Google and looking for a PHP solution instead of a Tcl one, which taught me that the HOME environment variable was missing:

exec -- /usr/bin/env HOME=[acs_root_dir] /usr/bin/libreoffice --headless --convert-to $convert_to --outdir $outdir $oo_file

Yep, more or less the same thing I came up with. The problem is that LibreOffice needs to write stuff in its home directory, so HOME must be set to a directory you have write permission for.

Kudos for solving! 😊
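
The wrapper asked about above could look roughly like the following. This is only a sketch under assumptions: lo_convert_cmd and lo_output_path are hypothetical helper names (not part of OpenACS or ]po[), the binary is assumed to live at /usr/bin/libreoffice, and {*} needs Tcl 8.5 or newer. It builds the same command line as the HOME fix above and computes where LibreOffice is expected to put the result (source base name plus the target extension, inside --outdir).

```tcl
# Hypothetical helper: build the argument list for a headless LibreOffice
# conversion, with HOME pointing to a directory the server user can write to.
proc lo_convert_cmd {home convert_to outdir file} {
    return [list /usr/bin/env HOME=$home \
                /usr/bin/libreoffice --headless \
                --convert-to $convert_to --outdir $outdir $file]
}

# Hypothetical helper: path of the file LibreOffice is expected to write.
# --convert-to may carry a filter name after a colon (e.g. "pdf:writer_pdf_Export");
# only the part before the colon is used as the file extension.
proc lo_output_path {convert_to outdir file} {
    set ext [lindex [split $convert_to :] 0]
    return [file join $outdir [file rootname [file tail $file]].$ext]
}
```

It would be called as exec -- {*}[lo_convert_cmd [acs_root_dir] pdf $outdir $oo_file], ideally inside catch. Since Tcl's exec also reports an error when the program merely writes to stderr, checking afterwards that [lo_output_path pdf $outdir $oo_file] exists is probably a more reliable success test than the exec status alone.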

We've got the ooffice call in /packages/intranet-invoices/www/view.tcl, for example. That looks like this:

set result [im_exec bash -l -c "export HOME=~\$\{whoami\}; ooffice --headless --convert-to pdf --outdir /tmp/ $odt_zip"]

Cheers,
Frank

Well, over six years ago when this was developed, ooffice did not have that option; that's why jodconverter and later pyodconverter came into being.

If it works for you, perfect. I actually changed my intranet-openoffice package to use pyodconverter:

set status [catch {ns_proxy eval $handle "exec -- /usr/bin/python3 [acs_package_root_dir intranet-openoffice]/pyodconverter/main.py $oo_file $output_file" 5000} result]

I personally will keep the intranet-openoffice package, but maybe exchange the python call above with a direct exec on libreoffice.

Hi Malte,

Just out of curiosity: Why would you use pyodconverter and not ooffice directly?

Cheers,
Frank

Hi Frank,

are "lack of knowledge" and "never change a running system" good answers?

Other than that I second Brian in thanking you for pointing this out and look forward to experimenting with flat ODT.

Best wishes
Malte

Posted by Brian Fenton
Hi Frank

as Malte said, that option didn't exist at the time. We use Pyodconverter, and it offers some options that I'm not sure are available with the ooffice --convert-to command.

For example, scaling options and overriding settings such as FooterIsDynamicHeight.

But I must thank you for raising the subject, as it's great to have options.

Brian

Posted by Brian Fenton
Hi Frank

sorry to wake up an old thread, but we have been looking into this anew, as some of our clients have really been hammering the LibreOffice converter, and we're looking into solutions. The obvious solution is to implement some form of queue management, and I will look into that later (maybe ns_proxy can help?), but right now I'm looking at tuning LibreOffice/Pyodconverter.

Regarding the --convert-to flag, from what I've been reading, the key difference between it and Pyodconverter/Unoconv/JODconverter is that it has to start up and tear down a LibreOffice instance for each call, whereas the latter keep a server listening on a port, thus avoiding the cost and delay of startup and tear-down. So that is a strong performance reason NOT to use the --convert-to flag.

Our real problem is that LibreOffice becomes unresponsive once it receives multiple requests simultaneously, so we are looking into how best to set up some form of queue management. If we do it within OpenACS, we could perhaps have some monitoring available to us in the application, but if we find an existing tool that can do it externally to OpenACS, and that is easy to integrate, we may just go ahead with that.

I also came across some LibreOffice tuning suggestions here: https://askubuntu.com/a/857183/171033

Any other suggestions welcome!
Brian

A simple form of queue management is NaviServer's job queue. If one creates such a queue with a single thread associated, jobs in this queue are serialized, and no two jobs can be executed in parallel:

#
# Create a job queue named "q1" with a single thread (do this only once)
#
ns_job create q1 1

#
# Add some jobs to this queue. Every job takes about 2
# seconds and then writes a message to the log file
# for demonstration purposes.
#
for {set i 1} {$i<10} {incr i} {
    ns_job queue -detached q1 {ns_sleep 2; ns_log notice hi}
}

Hi Brian,

Sorry, just saw the new posting right now...

We never faced the problem of parallel LibreOffice conversions before, and we don't have a solution for it. In typical ]po[ use scenarios, there is usually a single accountant in charge of generating printable invoices for customers. No other documents require conversion, usually.

My personal coding style for this type of structure would be to generate short Bash scripts from within OpenACS and place them in a specific directory, then use cron or similar (once per second) to check whether one of these scripts is still running and, if not, start the next one.

But the solution from Gustaf appears to be much more elegant...

However, I'm not clear about how to return the converted result to the user. This is asynchronous, and may take minutes or even hours in a congested system. This is beyond any HTTP timeout.

I guess that is something you'd have to fix on the application level. Maybe by sending the result to the user via email? Maybe using manual polling via a "check if ready" page? This problem would apply to both ns_job and Bash scripts...

Cheers
Frank

The "-detached" flag in my ns_job example is for demo purposes; it causes asynchronous behavior (many things are added to the queue, and the jobs are executed one by one in the background). When ns_job is called without this flag, the caller can wait for the result (or until the configurable timeout is reached).

... with "ns_job" one can run arbitrary commands, including arbitrary db operations or "exec"... so there is much flexibility. One possible feature missing in ns_job is persistence across server restarts, but that might or might not be an issue for some applications. For persistence, there is the option of xowf's at-commands or rolling your own db-based command queue, similar to acs-mail-lite.
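
The synchronous variant described above can be sketched as follows. This is a minimal sketch, not a tested implementation: convert_serialized is a hypothetical proc name, the queue is assumed to have been created once beforehand with a single thread (ns_job create q1 1), and the 60-second default timeout is an arbitrary choice. As far as I understand the NaviServer documentation, "ns_job queue" without -detached returns a job ID, and "ns_job wait" blocks until that job has finished and returns its result, raising an error if the timeout is reached first.

```tcl
# Hypothetical helper: run a script through a single-threaded ns_job queue
# and wait for its result, so that conversions never run in parallel.
# Assumes the queue already exists (e.g. "ns_job create q1 1" at startup).
proc convert_serialized {queue script {timeout 60}} {
    # Without -detached, "ns_job queue" returns the ID of the queued job.
    set job_id [ns_job queue $queue $script]
    # Block until the job is done (or the timeout expires) and
    # hand the job's result back to the caller.
    return [ns_job wait -timeout $timeout $queue $job_id]
}
```

A request handler could then call something like convert_serialized q1 [list exec -- /usr/bin/env HOME=$home /usr/bin/libreoffice --headless --convert-to pdf --outdir $outdir $oo_file] inside catch. The connection thread stays blocked while it waits, so this only fits conversions that finish well within the HTTP timeout.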

Thanks Gustaf and Frank. Very useful ideas. ns_job looks great and could be a quick win to at least stop the LibreOffice server from dying. I think we can live without persistence right now. Maybe in the future we'll look at a db-based command queue.

Cheers!
Brian

Hi Brian,

while moving over to Dockerized versions of ]project-open[ I "stumbled" (after hours of looking) upon an unoconv image which runs nicely in its own container.

https://github.com/alphakevin/unoconv-server

Testing it with curl gave the results I wanted, and so far without running into the problem of multiple parallel conversions (especially as I could spawn multiple containers if needed).

We still need to change our code, though, to use ns_http or Chilkat (https://www.example-code.com/tcl/http_multipart_form_data.asp) to talk to the container.