Forum OpenACS Q&A: spam module charset encoding issue when inserting to blob

for the press: my system: oacs4.5/oracle successfuly speaks iso-8859-2

..but in reality the charset probles did not go away everywhere I'm afraid.

I still have some issues here and there
One of them is with the spam module.
What is happening is that the message body both text and html do not display characters back correctly (but the message title is).

I don't have this problem in other modules like bboard or news where it works fine.

As I see the message body in all of those mentioned modules are stored in blob.

Question: Is the handling of data in spam module done any different then in news or bboard. From what I can see inserting into blob is done differently but I don't know if this causing the problem.

Hi Janus,

Did you have any succes in soving the issue on this posting?

I have a similar problem for a french mailing I need to send out. I'm using ars digita acs 4.2 for this site, but your problem seemed similar to mine.

8bit characters are encoded before or while they are inserted in to the db through the spam.new procedure, and pop-out in some weird (to me unkown) character encoding. Result is that we can't use any characters with accents at the moment. and the french don't seem to like that 😉

I've been so far unsuccesfull in solving the issue, since all looks proper to me. Did you find a solution?

hi Raoul,

There are two separate issues with this,
one is not diplaying correctly on the spam preview page,
and second not sending it correctly in the email body and title line.

First issue I let go since only admin will see it anyway.

Second issue you have to tell the mail client what charset you sending in and the system to encode it before sending to ns_sendmail. Spam module uses acs-mail package for sending emails.
As I see it the acs-mail package has not been modified to handle different charsets and it is sending out UTF-8.

Add lines setting headers "Conten-Type" and "Content-Encoding" in acs_mail_body_to_output_format in acs-mail/tcl/acs-mail-procs.tcl
and at the output of this proc $header_subject must be first treated with encoding convertto tcl fuction.
and in acs-mail-procs.tcl acs_mail_encode_content again specify "Content-Type"

let me know if you need more help.

ciao

Hi Janus,

Thanks for your pointers. All clear except for finding the right parameter for the tcl command convertto.

For starters, the header_subject is fine, thats not the one that needs convertion, since all accents and such remain intact. My concern is the body of the message.

I tried convertto on multiple locations (before creation new mail, before sending, within acs_mail_body_to_output_format, etc), all to no avail.

    set content [encoding convertto iso8859-1 $content]

Now I presume that iso8859-1 is what I should use, since I couldn't find a good explanation online. But unfortunatly it doesn't work.

The mail headers all look proper, but the body output is still wrong.

Any clue?

In procedure acs_mail_body_to_output_format
line 400 (after: set content [lindex $contents 1]) add:
ns_set put $headers "Content-Type" "$content_type; charset=iso-8859-1"
ns_set put $headers "Content-Encoding" "8bit"

in procedure acs_mail_encode_content
line 336 (after: append content "--$boundary ") add:
append content "Content-Type: $c_type; charset=iso-8859-1 "

hope it helps

Hi all,

I have the same problem in openacs 4.6.3 with the Spanish and Catalan languages.

After applying the Jay's changes in the acs-mail-procs.tcl file still doesn't work.

For example the text "cío caçar" appears "cC-o caC'ar" in the email.

Is it possible that the stringToBlob sqlj procedure also needs a charset conversion function?, because I have put a System.out.println function in there and the sqlplus shows 'cmo cagar'. Or maybe the charset configuration in my Ora8i, or...

Anyway, if some of you have some hint, please help me, I am getting mad with those LOB, CLOB, BLOB, whatever.

as always, thank you in advance.

Hi again,

There is a bug half related with this topic #822 Add text storage support to cr_import_content.

Maybe adding a CLOB storage type would be the perfect solution, but since I am not very comfortable with this kind of datatype I have done a workaround with varchar2(4000).

So if you want to send emails in iso-8859-2 but reducing the capacity of a 'text' type in your content repository to 4000 characters, here are the changes:

*** Attention! I have not test all the side effects outside the acs-mail and spam packages. UAYOR ***

  • add the 'text' column in your 'cr_revisions' table.
    >alter table CR_REVISIONS add text varchar2(4000)
  • Comment the 'text to lob' action at the beggining of your 'content_item.new' pl/sql function.
    --  if storage_type = 'text' then
    --     v_storage_type := 'lob';
    --  else
        v_storage_type := storage_type;
    --  end if;
  • Modify the 'content_revision.new' pl/sql function. This funtion is overloaded, change only te one with the 'text' parameter.
    function new (
      title         in cr_revisions.title%TYPE,
      description   in cr_revisions.description%TYPE default null,
      publish_date  in cr_revisions.publish_date%TYPE default sysdate,
      mime_type   	in cr_revisions.mime_type%TYPE default 'text/plain',
      nls_language 	in cr_revisions.nls_language%TYPE default null,
      text		    in varchar2 default null,
      item_id       in cr_items.item_id%TYPE,
      revision_id   in cr_revisions.revision_id%TYPE default null,
      creation_date	in acs_objects.creation_date%TYPE default sysdate,
      creation_user	in acs_objects.creation_user%TYPE default null,
      creation_ip	in acs_objects.creation_ip%TYPE default null
    ) return cr_revisions.revision_id%TYPE is
    
      v_revision_id integer;
      v_content_type acs_object_types.object_type%TYPE;
    
    begin
    
      v_content_type := content_item.get_content_type(item_id);
      
      v_revision_id := acs_object.new(
          object_id     => revision_id,
          object_type   => v_content_type,
          creation_date => creation_date,
          creation_user => creation_user,
          creation_ip   => creation_ip,
          context_id    => item_id
      );
      
      insert into cr_revisions (
        revision_id, title, description, mime_type, publish_date,
        nls_language, text, item_id
      ) values (
        v_revision_id, title, description, mime_type, publish_date,
        nls_language, text, item_id
      );
    
      return v_revision_id;
    
    end new;
  • Change the 'acs_mail_body_to_mime_get_content_simple' query in the 'packages/acs-mail/tcl/acs-mail-procs.xql' file.
    select content, text, mime_type as v_content_type
            from cr_revisions
            where revision_id = :revision_id
  • Change the 'acs_mail_encode_content' function in the 'packages/acs-mail/tcl/acs-mail-procs.tcl' file.
    [...]
                    if [db_0or1row acs_mail_body_to_mime_get_content_simple {
                            select content, text, mime_type as v_content_type
                            from cr_revisions
                            where revision_id = :revision_id
                    }] {
                            if [string equal $storage_type text] {
                                    ns_log "Notice" "acs-mail: encode: one part hit $content_item_id"
                                    # vinodk: no need for this, since we're checking
                                    #         storage_type
                                    #
                                    # We win!  Hopefully.  Check if there are 8bit characters/data.
                                    # HT NL CR SP-~  The full range of ASCII with spaces but no
                                    # control characters.
                                    if ![regexp "\[^\u0009\u000A\u000D\u0020-\u007E\]" $content] {
                                            ns_log "Notice" "acs-mail: encode: good code $content_item_id"
                                            # We're still okay.  Use it!
                                            return [list $v_content_type $text]
                                    }
                                    ns_log "Notice" "acs-mail: encode: bad code $content_item_id"
                            } else {
    [...]

I think I don't leave anything, hope it helps,

\-.-/juanjo