This is a repost of a question I put on StackExchange. Figured I'd increase my odds since there are lots of Postgres users here. And the data in question actually is for a couple of OpenACS sites. :)
From http://serverfault.com/questions/252904/character-set-issues-in-postgres-upgrade:
I am moving some databases from Postgres 7.4.8 to 8.4.5, on CentOS 5. In the old database the encoding is UNICODE. So I did a text pg_dump, created my new databases like so:
createdb --template template0 --encoding unicode testdb
and imported the data:
psql testdb < testdb.dmp >& testdb.out
Unfortunately, these errors are sprinkled throughout:
ERROR: invalid byte sequence for encoding "UTF8": 0x8e
I know that iconv is the tool most often recommended to fix this, but I'm not sure what the conversion is. I tried running it from Unicode to UTF8, since that is what Postgres reports for my new database, but that produced a file full of what looked like Chinese characters. I also tried UTF8 to UTF8; that chokes on the first of the offending characters with this:
iconv: illegal input sequence at position 874924
I'd like to know a) why am I having problems when I'm not changing the character set? And b) is there any way of fixing this short of editing the dump file by hand?