Converting Binary Documents to HTML

The content repository uses the INSO libraries included with interMedia to convert binary files, such as Microsoft Word documents, to HTML. This document describes how to make that conversion part of the item creation or editing process, so that the content is always stored in the repository as HTML.

Note: Because temporary tables and LOB storage are used during the conversion process, the entire process described here must be performed within the context of a single transaction.

Creating the Revision

The first step is to create the revision that will be associated with the converted document, and obtain the corresponding ID. The content column for the revision must be initialized with an empty blob object:

revision_id := content_revision.new(item_id => :item_id,
                                    revision_id => :revision_id,
                                    data => empty_blob(),
                                    title => 'My Word Document',
                                    ...);

Uploading Binary Files

The next step is to upload the binary file into the temporary table cr_doc_filter. Any standard technique for uploading a binary file (an image, for example) may be used. The temporary table has only two columns: a BLOB that stores the document itself, and the revision ID.
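
As one illustration of such a technique, the following sketch loads a file from the database server's file system using a BFILE and the standard DBMS_LOB package. The column names (revision_id, content), the directory object UPLOAD_DIR, and the file name report.doc are assumptions made for this example and are not taken from the repository's data model. There is no commit in the block, because the row has to remain in the temporary table until the conversion has run.

```sql
declare
  -- Hypothetical directory object and file name.
  v_src  bfile := bfilename('UPLOAD_DIR', 'report.doc');
  v_dest blob;
begin
  -- Insert a row with an empty BLOB and fetch its locator for writing.
  -- Column names are assumed; check the cr_doc_filter definition.
  insert into cr_doc_filter (revision_id, content)
  values (:revision_id, empty_blob())
  returning content into v_dest;

  -- Copy the whole file into the BLOB.
  dbms_lob.fileopen(v_src, dbms_lob.file_readonly);
  dbms_lob.loadfromfile(v_dest, v_src, dbms_lob.getlength(v_src));
  dbms_lob.fileclose(v_src);
end;
/
```

Files that arrive through a web server are more commonly written to the BLOB by the database driver, but the insert-then-fill pattern is the same.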

Converting the Document

Once the revision has been created and the file has been uploaded, the file may be converted to HTML and written into the empty blob associated with the revision. This is done with the to_html procedure in the content_revision package:

begin
  content_revision.to_html(:revision_id);
end;
/

Once the transaction is committed, the uploaded document is automatically deleted from the cr_doc_filter table.
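
The three steps can be combined so that they always run in a single transaction. The procedure below is a minimal sketch under stated assumptions, not part of the content repository API: the name store_doc_as_html and its parameters are hypothetical, the cr_doc_filter column names (revision_id, content) are assumed, and it assumes that the parameters of content_revision.new not shown here have usable defaults.

```sql
create or replace procedure store_doc_as_html (
  p_item_id in integer,   -- item that receives the new revision
  p_title   in varchar2,  -- title for the revision
  p_doc     in blob       -- binary document, e.g. a Word file
) is
  v_revision_id integer;
begin
  -- 1. Create the revision with an empty BLOB as its content.
  v_revision_id := content_revision.new(
    item_id => p_item_id,
    data    => empty_blob(),
    title   => p_title
  );

  -- 2. Stage the binary document in the temporary table
  --    (column names are assumed).
  insert into cr_doc_filter (revision_id, content)
  values (v_revision_id, p_doc);

  -- 3. Convert the staged document to HTML and write it to the revision.
  content_revision.to_html(v_revision_id);

  -- 4. Commit; this also clears the staged row from cr_doc_filter.
  commit;
exception
  when others then
    -- Undo the partially created revision and re-raise the error.
    rollback;
    raise;
end store_doc_as_html;
/
```

Committing inside the procedure keeps the example short. Code that creates the revision as part of a larger unit of work would leave the commit and rollback to the caller instead.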


karlg@arsdigita.com
Last Modified: $‌Id: convert.html,v 1.2 2017/08/07 23:47:47 gustafn Exp $