Forum OpenACS Q&A: automatic file conversion via openoffice...

I recently read the following thread on the openoffice discuss list.
We have been thinking about a functionality to automatically convert
MS-Office documents to html/pdf, but didn't really have the time to do

I heard that openoffice wants to integrate conversion to pdf with
their next release. They already have conversion to html integrated. I
have never tested it, so I cannot comment on the quality. But it might
be a nice enhancement for dotLRN/dotWRK...

Posted by David Walker on
KDE 3.0 has a print-to-pdf feature that in my limited testing works
nicely.  Might be worth looking at as well.
Posted by Don Baccus on
I use OpenOffice 1.0 and the MS filters have been vastly improved in the last year.  I send folks Word docs and they never know they've been scammed by a piece of Open Source software :)  I've been reading Excel and Word docs sent me with no problem.  In my mind Sun has done a much better job achieving their goals with OO 1.0 than Netscape has with Mozilla (I'm running Mozilla 1.0RC1 and it's still annoyingly buggy, while OO 1.0 feels like professional software).

So if it is or becomes possible to use OpenOffice's filters in a straightforward way I think we'd have something good on our hands.  BTW OpenOffice generates real HTML so a Word->HTML conversion done by OO will be better than one done by Word - unless the Word doc has one of the increasingly rare things that makes OO choke.

KDE's print to PDF seems to work pretty well, though I've had it barf on some things.  But I'm not running 3.0; like all things KDE-ish, this utility has probably gotten better.

Posted by Tom Mizukami on
We have been using XML to perform this automatic conversion. If you can get your contributors to simply use an MS Office template, there are many tools that will give you XML that validates against the DocBook or Simplified DocBook DTD. From there, converting to HTML and PDF is easy.
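
As a rough illustration of the XML-to-HTML step, here is a minimal sketch using the XSLT support that ships with Java (javax.xml.transform). The class name DocbookToHtml and the toy stylesheet are assumptions made up for this example; a real setup would load a full DocBook stylesheet from a file instead, and PDF output would need a further step that is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DocbookToHtml {
    /** Applies an XSLT stylesheet to an XML document and returns the output as a string. */
    static String transform(Source xml, Source stylesheet) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesheet);
        StringWriter out = new StringWriter();
        transformer.transform(xml, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Toy stylesheet (hypothetical): turns the article title into an <h1>.
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html'/>"
          + "<xsl:template match='/article'>"
          + "<html><body><h1><xsl:value-of select='title'/></h1></body></html>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        // Tiny stand-in for a DocBook document.
        String xml = "<article><title>Hello</title></article>";

        String html = transform(new StreamSource(new StringReader(xml)),
                                new StreamSource(new StringReader(xslt)));
        System.out.println(html);
    }
}
```

To work on files rather than strings, the same method can be given new StreamSource(new File(...)) arguments.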
Posted by Roberto Mello on
Another alternative is to use antiword. It can dump Word documents to HTML, text, or PostScript (which can then be converted to PDF with ps2pdf).

Antiword probably won't do as good a job as OpenOffice on more complex Word documents, but it has done a pretty good job for me.
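
For anyone who wants to drive this from Java, the antiword-then-ps2pdf route could be sketched roughly as below. It assumes both tools are installed and on the PATH; the class name AntiwordToPdf, the a4 paper size, and the temporary file handling are my own choices for the example, not anything prescribed by either tool.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class AntiwordToPdf {
    /** Command that makes antiword write PostScript (a4 paper) to stdout. */
    static List<String> antiwordCommand(String docPath) {
        return Arrays.asList("antiword", "-p", "a4", docPath);
    }

    /** Command that turns a PostScript file into a PDF file. */
    static List<String> ps2pdfCommand(String psPath, String pdfPath) {
        return Arrays.asList("ps2pdf", psPath, pdfPath);
    }

    /** Runs a process and fails if it exits with a non-zero status. */
    static void run(ProcessBuilder pb) throws IOException, InterruptedException {
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        int exit = pb.start().waitFor();
        if (exit != 0) {
            throw new IOException(pb.command() + " exited with status " + exit);
        }
    }

    /** Converts a Word document to PDF via an intermediate PostScript file. */
    static void convert(String docPath, String pdfPath)
            throws IOException, InterruptedException {
        File ps = File.createTempFile("antiword", ".ps");
        try {
            // Step 1: antiword output is captured into the temporary .ps file.
            run(new ProcessBuilder(antiwordCommand(docPath)).redirectOutput(ps));
            // Step 2: ps2pdf reads that file and writes the PDF.
            run(new ProcessBuilder(ps2pdfCommand(ps.getPath(), pdfPath))
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT));
        } finally {
            ps.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        convert(args[0], args[1]);
    }
}
```

Going through a temporary file keeps the two steps easy to debug separately; a pipe between the two processes would avoid the file but is a bit more code.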

Posted by Ben Koot on
I came across Pdf programm. This project may fit in nicely with our globalisation issues, as it supports multiple languages. Just a thought.
Posted by zet ucu on
How to do it with Java:

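import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import officetools.OfficeFile;
FileInputStream fis = new FileInputStream(new File("test.doc")); // works with xls also
FileOutputStream fos = new FileOutputStream(new File("test.pdf"));
// "localhost" and "8100" are presumably the host and port of a running OpenOffice instance
OfficeFile f = new OfficeFile(fis,"localhost","8100", true);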

All possible conversions:
doc --> pdf, html, txt, rtf
xls --> pdf, html, csv
ppt --> pdf, swf
html --> pdf
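
If you want to check a requested conversion against the list above before attempting it, a lookup table along these lines might do. This is only a sketch: the class ConversionTable and the method isSupported are hypothetical names for this example and are not part of the officetools package.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ConversionTable {
    // Source format -> target formats, copied from the list above.
    static final Map<String, List<String>> TARGETS = new LinkedHashMap<>();
    static {
        TARGETS.put("doc", Arrays.asList("pdf", "html", "txt", "rtf"));
        TARGETS.put("xls", Arrays.asList("pdf", "html", "csv"));
        TARGETS.put("ppt", Arrays.asList("pdf", "swf"));
        TARGETS.put("html", Arrays.asList("pdf"));
    }

    /** Returns true if the list above names a conversion from one format to the other. */
    static boolean isSupported(String from, String to) {
        List<String> targets = TARGETS.get(from.toLowerCase(Locale.ROOT));
        return targets != null && targets.contains(to.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("doc", "pdf")); // true
        System.out.println(isSupported("xls", "rtf")); // false
    }
}
```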

Maybe useful:
HTML to PDF with PHP, Java or ASP