Forum OpenACS Q&A: server side document to .pdf conversion
uploaded to my file-storage (mostly .doc, .xls, .ppt etc.) into .pdf
whenever a user clicks on a button "convert this document to pdf"
Additionally I would like to add a password to every pdf that has
been created. The password will consist of the "user_id +
Does anyone have experience with such a conversion? What are the costs
concerning Adobe (I am looking for a low-budget solution)?
Actually I want to convert the document to .pdf first, and whenever a user wants to download that document, a password (his user_id + version_id) is added to the existing .pdf!
The user downloading the .pdf will have to type in his user_id + version_id in order to open the .pdf...
I don't know about .doc and other files, but if you have an XML one you could use FOP; it's simple to use and extremely low budget :)
Create a PDF Service with Samba
Use GhostScript to create a PDF document out of any PostScript printer job
It describes how to set up a Samba PDF file printer and how to use it with the lpr command (I don't know how lpr handles .doc, .xls etc.; a specific driver or converter might be needed to send .ps to the printer). I'm too new to Tcl and the ns API to tell how to run a command line from a web script, but it must be possible.
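For the PostScript-to-PDF step that the article relies on, here is a minimal sketch of calling Ghostscript from Java. It assumes the `gs` binary is on the PATH; the class and method names (`PsToPdf`, `buildCommand`, `convert`) are made up for illustration and come from neither the article nor OpenACS. It does not address getting a .doc into PostScript in the first place.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: wraps a Ghostscript run that turns a .ps file into a .pdf.
final class PsToPdf {

    /** Builds the Ghostscript argument list for one PostScript-to-PDF run. */
    static List<String> buildCommand(String psPath, String pdfPath) {
        return Arrays.asList(
            "gs",                        // assumption: Ghostscript is installed as "gs"
            "-dBATCH",                   // exit once the input has been processed
            "-dNOPAUSE",                 // do not wait for a keypress between pages
            "-sDEVICE=pdfwrite",         // use the PDF output device
            "-sOutputFile=" + pdfPath,   // where the PDF is written
            psPath);                     // the PostScript input file
    }

    /** Runs Ghostscript and returns its exit code (0 means success). */
    static int convert(String psPath, String pdfPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(psPath, pdfPath))
            .inheritIO()                 // show Ghostscript's messages on our console
            .start();
        return process.waitFor();
    }
}
```

Calling `PsToPdf.convert("job.ps", "job.pdf")` would then block until Ghostscript finishes; a web script would probably want to run it in the background instead.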
Adobe does have such a thing as the Acrobat Distiller Server: http://www.adobe.com/products/acrdis/main.html
but it is looking to convert a postscript file, not start with the raw document. And yes, it is pricey: $5,000 US for 100 registered users, and $10,000 for unlimited.
Getting into postscript is going to be the kicker. I don't know of anything that can take all those formats and export them into postscript.
Maybe you should make your users do that part, and upload the postscript document? From there it looks easy and cheap.
Generating a postscript file on windows is as easy as having a postscript printer driver installed (even if you don't have a printer for it) and then printing to a file!
Lastly (sorry to be a wet blanket--it really is a great idea), there are tons of job options to be set in Distiller, and sometimes you want different options depending on the contents of the file.
This is a tough problem--it seems like it would be easier just to convert the file to PDF before you upload it.
Doc2pdf is an email robot that converts Microsoft Office attachments (.doc, .ppt and .xls) to PDF files. All you need do is carbon-copy (CC) doc2pdf when you email a Microsoft Office document. Doc2pdf converts the attachment to a PDF file and sends the PDF file, as an attachment, in a reply to all recipients.
It's an email robot, but you could probably hack it to work serverside.
Unfortunately I have not been able to get OpenOffice's print-to-PDF option to work for as long as I have been using it. In the meantime, the absolutely excellent KDE printing architecture (see below) allows one to print to a PDF. So, we simply print to a Postscript (.ps) file, click on that to open it in KGhostView, and then click "Print" and choose "Print to PDF." Voila, you have a gorgeous PDF.
Prior to the KDE print architecture, we used createpdf.adobe.com . For $10/month you get to create an unlimited number of PDFs from many file formats, including Postscript (which all UNIX apps with printing capability can export). It also converts .doc and .xls files, so it's just as useful for those who use MS Office but would like to produce open, professional looking documents for export to your customers or business partners."
Adobe's createpdf.adobe.com has actually everything that I would need: You can upload a file in almost any format, choose the appropriate security options (in my case: password), and let them send it to you via email, link or download in browser...
Does anyone have experience with converting a document with KDE and KGhostView as described in the first part?
At the beginning the volume will not be that high (20/month), but I can't imagine myself doing all of this manually when the volume passes 100/month... So I am trying to find solutions for the time when the volume gets high!
downloads the converted version. You might have to learn some more about
rfc1867 (file uploads) first. You may have to program some authentication
headers or a login process into your script as well.
Since the Adobe site takes 15 minutes when you're a subscriber, you would
have to schedule a task to retrieve the completed pdf 15 minutes later and
figure out what to do with it next.
Adobe either sends you a link to a download page or directly attaches your new .pdf to their email...
This would be the best procedure:
1. A person uploads a .doc, .xls etc. to an OpenACS webservice (file-storage)
2. OpenACS connects to createpdf.adobe.com, sets the security settings (they are on the same page as the upload button etc.) and lets Adobe deliver the new .pdf to the OpenACS webservice's email address.
3. An email handler automatically gets the email, regexps the title of the attachment (i.e. document.doc?version_id=100) and inserts the new .pdf as a new version of our document into the database...
That would be perfect and a nice service for OpenACS 3.x and 4.x too...
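Step 3 of the procedure above, pulling the version_id back out of the attachment title, might look roughly like this. It is only a sketch: it assumes the title comes back exactly in the `document.doc?version_id=100` form used in the example, and the class and method names (`AttachmentTitle`, `versionId`) are hypothetical.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the email handler: extracts version_id from a title
// such as "document.doc?version_id=100".
final class AttachmentTitle {

    // Group 1: the file name, group 2: the numeric version_id.
    // The digit count is capped so that Long.parseLong cannot overflow.
    private static final Pattern TITLE =
        Pattern.compile("(.+)\\?version_id=(\\d{1,18})");

    /** Returns the version_id, or an empty value if the title has another shape. */
    static OptionalLong versionId(String title) {
        Matcher m = TITLE.matcher(title);
        if (!m.matches()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(2)));
    }
}
```

Returning an empty value rather than throwing lets the handler simply skip mails that did not come from the conversion service.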
When I was at my last job we solved the same problem using a combination of AbiWord/Ghostscript to perform .doc --> .pdf conversion. As several posters have mentioned, the difficult part is not .ps to .pdf - Ghostscript handles this nicely. Getting .docs into .ps is an utter nightmare, though. At our request, one of the developers of Abiword added a command-line "print-to-ps" function that works on *nix builds of Abiword - it's still there as far as I know.
In the end, we weren't particularly happy with Abiword import filters - they don't handle tables, among other things. Toward the end of my time there we were investigating OpenOffice as well:
I've done pretty exhaustive research in this area, and basically you're limited to:
1) a windows box running a closed solution - activepdf is one of the better known players, but you can check out several on pdfzone.com
2) one of the hacks I mentioned above - I don't know of any better solutions running on *nix.
The OpenOffice solution sounds and looks to be pretty superior... I heard that Sun will start *licensing* StarOffice from version 6.0 on! OpenOffice will remain open source though, I believe.
Do you know what kind of people are/were hacking around with an OpenOffice conversion solution? Which community could I bother with this?? I already talked to some developers at openoffice.org, but they all seem to work for Sun and told me that they are developing a closed source solution doing conversion. (What a great open source community... That was ~4 months ago, so things might have changed?!)
Here's a Java example program for batch conversion of text files from any supported OpenOffice format to any other supported OpenOffice format:
FileInputStream fis = new FileInputStream(new File("test.doc")); // works with xls also
FileOutputStream fos = new FileOutputStream(new File("test.pdf"));
OfficeFile f = new OfficeFile(fis,"localhost","8100", true);
All possible conversions:
doc --> pdf, html, txt, rtf
xls --> pdf, html, csv
ppt --> pdf, swf
html --> pdf
Maybe useful: http://dancrintea.ro/html-to-pdf/
HTML to PDF with PHP, Java or ASP
In your case you can make your life pretty easy and just get the openoffice wrapper procs using
svn co https://svn.cognovis.de:/projop/packages/intranet-contacts/tcl/oo-procs.tcl
In them you will find a procedure called contact::oo::convert_to_pdf_using_jooconverter and this is what you need to do the conversion. Everything else is "leftover" from tries we did when the jodconverter wasn't working as expected.
If you have further questions or need help, contact me please.
You need to have OpenOffice and unoconv installed for this.
unoconv is a command line utility that can convert any file format that OpenOffice can import to any file format that OpenOffice is capable of exporting. Some of the supported document formats are Open Document Format (.odt), MS Word (.doc), MS Office Open/MS OOXML (.xml), Portable Document Format (.pdf), HTML, XHTML, RTF, Docbook (.xml), and more.
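To tie this in with the Java snippets earlier in the thread, here is a rough sketch of calling unoconv from Java. It assumes `unoconv` is on the PATH and OpenOffice is installed; the `-f pdf` option selects the output format. The names `UnoconvPdf`, `buildCommand` and `convert` are just for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: asks unoconv to convert one document to PDF.
final class UnoconvPdf {

    /** Builds the argument list, e.g. unoconv -f pdf report.doc */
    static List<String> buildCommand(String inputPath) {
        return Arrays.asList("unoconv", "-f", "pdf", inputPath);
    }

    /** Runs unoconv and returns its exit code (0 means success). */
    static int convert(String inputPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(inputPath))
            .inheritIO()   // pass unoconv's messages through to our console
            .start();
        return process.waitFor();
    }
}
```

By default unoconv should write the result next to the input file with a .pdf extension, so `UnoconvPdf.convert("report.doc")` would be expected to leave a `report.pdf` in the same directory.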