It looks like an encoding issue when the data is stored. If you haven't already, try encoding convertfrom utf-8 (or encoding convertto utf-8, depending on the direction you are converting). If that does not work, the following might help (off the top of my head):
binary scan $somebindata a* somevar
and, in the other direction:
binary format a* $somedata
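A minimal sketch of how these pieces fit together, assuming Tcl 8.x; the sample string and the variable names (original, utf8bytes, packed, unpacked, decoded) are hypothetical. One detail from the Tcl binary manual is worth knowing: binary format a* keeps only the low byte of each character, so any character above \u00ff gets mangled unless you run the string through encoding convertto first.

```tcl
# Hypothetical sample string with non-ASCII characters (e with acute, u with umlaut).
set original "caf\u00e9 \u00fcber"

# Before storing: turn the Tcl string into UTF-8 bytes.
set utf8bytes [encoding convertto utf-8 $original]

# binary format a* copies these bytes unchanged; it would only truncate
# characters above \u00ff, which a UTF-8 byte string never contains.
set packed [binary format a* $utf8bytes]

# binary scan a* extracts the bytes back into a variable.
binary scan $packed a* unpacked

# After reading: decode the UTF-8 bytes back into a Tcl string.
set decoded [encoding convertfrom utf-8 $unpacked]

# Dump the bytes as hex to see exactly what goes to storage.
binary scan $utf8bytes H* hex
puts $hex                             ;# prints 636166c3a920c3bc626572
puts [expr {$decoded eq $original}]   ;# prints 1
```

Running the same hex dump on the data you read back can help pinpoint the problem: if the é shows up as c383c2a9 instead of c3a9, the text was UTF-8 encoded twice somewhere along the way.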
That said, there is a good article on character sets and Unicode by Joel Spolsky: http://www.joelonsoftware.com/articles/Unicode.html