Forum OpenACS Q&A: Re: Invalid Unicode character sequence found in pg index.

You want to convert to the encoding that your database is using.

'man iconv' explains how to use iconv, and 'iconv -l' lists the available encodings.
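Here is a minimal sketch of the conversion itself. It assumes pdftotext's default Latin1 output and a UTF-8 (UNICODE) database. The function name latin1_to_utf8 is hypothetical, and encoding names can be spelled differently between iconv implementations, so check them against 'iconv -l' first.

```shell
#!/bin/sh
# Hypothetical helper: re-encode stdin from LATIN1 (pdftotext's default
# output encoding) to UTF-8, the encoding of a UNICODE database.
latin1_to_utf8() {
    iconv -f LATIN1 -t UTF-8
}

# Demo: "caf" plus the single Latin1 byte for e-acute (octal 351)
# is written out as UTF-8, where e-acute takes two bytes.
printf 'caf\351\n' | latin1_to_utf8
```

On real files this would be something like 'latin1_to_utf8 < paper.txt > paper-utf8.txt'. The file names are placeholders.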

The pdftotext man page

http://linuxcommand.org/man_pages/pdftotext1.html

..notes that the default text output encoding is Latin1, but that you can pick another encoding with '-enc encoding-name'.

If the database is encoded as UNICODE (PostgreSQL's older name for UTF-8), try having pdftotext output UTF-8 instead, i.e. '-enc UTF-8'.
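A sketch of the pdftotext side follows. It assumes an xpdf- or poppler-based pdftotext that accepts 'UTF-8' as an '-enc' name. The wrapper pdf_to_utf8 and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: extract text from a PDF as UTF-8 instead of the
# default Latin1. Arguments: $1 = input PDF, $2 = output text file.
pdf_to_utf8() {
    # -enc selects the text output encoding.
    pdftotext -enc UTF-8 "$1" "$2"
}
```

Usage would look like 'pdf_to_utf8 paper.pdf paper.txt'.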

If pdftotext's output still contains gremlins, consider running it through iconv with the '-c' flag, which silently drops characters that cannot be converted to the '-t' (target) encoding.
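You can try the '-c' behaviour on a short string before using it on real files. This is a sketch with a hypothetical helper, strip_unconvertible. It assumes a GNU/Linux iconv, which may still exit with a non-zero status after dropping characters, so the helper ignores that status.

```shell
#!/bin/sh
# Hypothetical helper: convert stdin from encoding $1 to encoding $2.
# -c makes iconv silently drop characters that cannot be represented
# in the target encoding instead of stopping with an error.
strip_unconvertible() {
    iconv -c -f "$1" -t "$2"
    # Some iconv builds still exit non-zero after dropping characters,
    # even though the output is complete, so report success here.
    return 0
}

# "cafe" with an e-acute in UTF-8, converted to ASCII: the e-acute has
# no ASCII equivalent, so -c drops it from the output.
printf 'caf\303\251\n' | strip_unconvertible UTF-8 ASCII
```

For the PDF case, a LATIN1 database could use something like 'pdftotext -enc UTF-8 paper.pdf - | strip_unconvertible UTF-8 LATIN1 > paper.txt'. The file names are placeholders.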

Both iconv and pdftotext run from the shell, so you can try a few test cases to see what works.
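One useful test case is to check that the text is valid UTF-8 before loading it into a UNICODE database. This is a minimal sketch. It assumes an iconv that rejects malformed input with a non-zero exit status when '-c' is not given, which GNU/Linux iconv does. is_valid_utf8 is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: exit status 0 only if stdin is well-formed UTF-8.
# Without -c, iconv stops with a non-zero status at the first invalid
# byte sequence; the converted text itself is thrown away.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# A bare Latin1 byte (octal 351) is not valid UTF-8.
if printf 'caf\351!\n' | is_valid_utf8; then
    echo "valid UTF-8"
else
    echo "invalid UTF-8"
fi
```

On pdftotext output this would be run as 'is_valid_utf8 < paper.txt && echo ok'.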

cheers,

Torben