acs_mail_lite::email_type (public)

 acs_mail_lite::email_type [ -subject subject ] [ -from from ] \
    [ -headers headers ] [ -header_arr_name header_arr_name ] \
    [ -reply_too_fast_s reply_too_fast_s ] \
    [ -check_subject_p check_subject_p ]

Defined in packages/acs-mail-lite/tcl/email-inbound-procs.tcl

Scans email's subject, from and headers for actionable type.

Returns actionable type and saves same type in header_arr_name(aml_type), and saves some normalized header info to reduce redundant processing downstream. See code comments for details.

Actional types: 'auto_gen' 'auto_reply', 'bounce', 'in_reply_to' or empty string indicating 'other' type.

  • 'auto_reply' may be a Delivery Status Notification for example.
  • 'bounce' is a specific kind of Delivery Status Notification.
  • 'in_reply_to' is an email reporting to originate from local email, which needs to be tested further to see if OpenACS needs to act on it versus a reply to a system administrator email for example.
  • 'auto_gen' is an auto-generated email that does not qualify as 'auto_reply', 'bounce', or 'in_reply_to'
  • '' (Empty string) refers to email that the system does not recognize as a reply of any kind. If not a qualifying type, returns empty string.
Adds these index to headers array:
  • received_cs: the received time of email in tcl clock epoch time.
  • aml_type: the same value returned by this proc.

If additional headers not calculated, they have value of empty string.

If headers and header_arr_name provided, only header_arr_name will be used, if header_arr_name contains at least one value.

If check_subject_p is set 1, checks for common subjects identifying autoreplies. This is not recommended to rely on exclusively. This feature provides a framework for extending classification of emails for deployment routing purposes.

If array includes keys from 'ns_imap struct', such as internaldate.*, then adds header with epoch time quivilent to header index received_cs

Switches:
-subject
(optional)
of email
-from
(optional)
of email
-headers
(optional)
of email, a block of text containing all headers and values
-header_arr_name
(optional)
-reply_too_fast_s
(defaults to "10") (optional)
-check_subject_p
(defaults to "0") (optional)
Set to 1 to check email subject.

Partial Call Graph (max 5 caller/called nodes):
%3 test_acs_mail_lite_inbound_procs_check acs_mail_lite_inbound_procs_check (test acs-mail-lite) acs_mail_lite::email_type acs_mail_lite::email_type test_acs_mail_lite_inbound_procs_check->acs_mail_lite::email_type acs_mail_lite::imap_conn_set acs_mail_lite::imap_conn_set (private) acs_mail_lite::email_type->acs_mail_lite::imap_conn_set acs_mail_lite::inbound_filters acs_mail_lite::inbound_filters (private) acs_mail_lite::email_type->acs_mail_lite::inbound_filters acs_mail_lite::parse_email_address acs_mail_lite::parse_email_address (private) acs_mail_lite::email_type->acs_mail_lite::parse_email_address acs_mail_lite::sched_parameters acs_mail_lite::sched_parameters (public) acs_mail_lite::email_type->acs_mail_lite::sched_parameters ad_outgoing_sender ad_outgoing_sender (public) acs_mail_lite::email_type->ad_outgoing_sender acs_mail_lite::imap_check_incoming acs_mail_lite::imap_check_incoming (private) acs_mail_lite::imap_check_incoming->acs_mail_lite::email_type acs_mail_lite::maildir_check_incoming acs_mail_lite::maildir_check_incoming (private) acs_mail_lite::maildir_check_incoming->acs_mail_lite::email_type

Testcases:
acs_mail_lite_inbound_procs_check
Source code:
    set ag_p 0
    set an_p 0
    set ar_p 0
    set as_p 0
    set dsn_p 0
    set irt_idx -1
    set or_idx -1
    set pe_p 0
    set ts_p 0
    set reject_p 0
    # header cases:  {*auto-generated*} {*auto-replied*} {*auto-notified*}
    # from:
    # https://www.iana.org/assignments/auto-submitted-keywords/auto-submitted-keywords.xhtml
    # and RFC 3834 https://www.ietf.org/rfc/rfc3834.txt

    # Do NOT use x-auto-response-suppress
    # per: https://stackoverflow.com/questions/1027395/detecting-outlook-autoreply-out-of-office-emails

    # header cases:
    # {*x-autoresponder*} {*autoresponder*} {*autoreply*}
    # {*x-autorespond*} {*auto_reply*}
    # from:
    # https://github.com/jpmckinney/multi_mail/wiki/Detecting-autoresponders
    # redundant cases are removed from list.
    # auto reply = ar
    set ar_list [list  {auto-replied}  {auto-reply}  {autoreply}  {autoresponder}  {x-autorespond}  ]
    # These were in auto_reply, but are not specific to replies:
    #                     {auto-generated}
    #             {auto-notified}
    # See section on auto_gen types. (auto-submitted and the like)


    if { $header_arr_name ne "" } {
        upvar 1 $header_arr_name h_arr
    } else {
        array set h_arr [list ]
    }

    if { $headers ne "" && [array size h_arr] < 1 } {
        #  To remove subject from headers to search,
        #  in case topic uses a reserved word,
        #  we rebuild the semblence of array returned by ns_imap headers.
        #  Split strategy from qss_txt_table_stats
        set linebreaks "\n\r\f\v"
        set row_list [split $headers $linebreaks]
        foreach row $row_list {
            set c_idx [string first ":" $row]
            if { $c_idx > -1 } {
                set header [string trim [string range $row 0 $c_idx-1]]
                # following identifies multiline header content to ignore
                if { ![string match {*[;=,]*} $header] } {
                    # list of email headers at:
                    # https://www.cs.tut.fi/~jkorpela/headers.html
                    # Suggests this filter for untrusted input:
                    if { [regsub -all -- {[^a-zA-Z0-9\-]+} $header {} h2 ] } {
                        ns_log Warning "acs_mail_lite:email_type.864:  Unexpected header '${header}' changed to '${h2}'"
                        set header $h2
                    }
                    set value [string trim [string range $row $c_idx+1 end]]
                    # string match from proc ad_safe_eval
                    if { ![string match {*[\[;]*} $value ] } {
                        # 'append' is used instead of 'set' in
                        # the rare case that there's a glitch
                        # and there are two or more headers with same name.
                        # We want to examine all values of specific header.
                        append h_arr(${header}"${value} "
                        ns_log Dev "acs_mail_lite::email_type.984  header '${header}' value '${value}' from text header '${row}'"
                    }
                }
            }
        }
    }

    set reject_p [acs_mail_lite::inbound_filters -headers_arr_name h_arr]


    if { !$reject_p } {

        set hn_list [array names h_arr]
        ns_log Dev "acs_mail_lite::email_type.996 hn_list '${hn_list}'"
        # Following checks according to RFC 3834 section 3.1 Message header
        # https://tools.ietf.org/html/rfc3834


        # check for in-reply-to = irt
        set irt_idx [lsearch -glob -nocase $hn_list {in-reply-to}]
        # check for message_id = mi
        # This is a new message id, not message id of email replied to
        set mi_idx [lsearch -glob -nocase $hn_list {message-id}]

        # Also per RFC 5436 section 2.7.1 consider:
        # auto-submitted = as

        set as_idx [lsearch -glob -nocase $hn_list {auto-submitted}]
        if { $as_idx > 1 } {
            set as_p 1
            set as_h [lindex $hn_list $as_idx]
            set an_p [string match -nocase $h_arr(${as_h}) {auto-notified}]
            # also check for auto-generated
            set ag_p [string match -nocase $h_arr(${as_h}) {auto-generated}]
        }



        ns_log Dev "acs_mail_lite::email_type.1017 as_p ${as_p} an_p ${an_p} ag_p ${ag_p}"

        # If one of the headers contains {list-id} then email
        # is from a mailing list.

        set i 0
        set h [lindex $ar_list $i]
        while { $h ne "" && !$ar_p } {
            #set ar_p string match -nocase $h $hn

            set ar_idx [lsearch -glob $hn_list $h]
            if { $ar_idx > -1 } {
                set ar_p 1
            }

            incr i
            set h [lindex $ar_list $i]
        }

        ns_log Dev "acs_mail_lite::email_type.1039 ar_p ${ar_p}"


        # get 'from' header value possibly used in a couple checks
        set fr_idx [lsearch -glob -nocase $hn_list {from}]
        set from_email ""
        if { $fr_idx > -1 } {
            set fr_h [lindex $hn_list $fr_idx]
            set from [ns_quotehtml $h_arr(${fr_h})]
            set h_arr(aml_from) $from
            set from_email [string tolower  [acs_mail_lite::parse_email_address  -email $from]]
            set h_arr(aml_from_addrs) $from_email
            set at_idx [string last "@" $from ]
        } else {
            set at_idx -1
        }
        if { $at_idx > -1 } {
            # from_email is not empty string
            set from_host [string trim [string range $from $at_idx+1 end]]
            set party_id [party::get_by_email -email $from_email]
            if { $party_id ne "" } {
                set pe_p 1
            }
        } else {
            set from_host ""
            set party_id ""
        }




        if { !$ar_p
             && [info exists h_arr(internaldate.year)]
             && $from ne ""
         } {
            # Use the internal timestamp for additional filters
            set dti $h_arr(internaldate.year)
            append dti "-" [format "%02u" $h_arr(internaldate.month)]
            append dti "-" [format "%02u" $h_arr(internaldate.day)]
            append dti " " [format "%02u" $h_arr(internaldate.hours)]
            append dti ":" [format "%02u" $h_arr(internaldate.minutes)]
            append dti ":" [format "%02u" $h_arr(internaldate.seconds)] " "
            if { $h_arr(internaldate.zoccident) eq "0" } {
                # This is essentially iso8601 timezone formatting.
                append dti "+"
            } else {
                # Comment from panda-imap/src/c-client/mail.h:
                # /* nonzero if west of UTC */
                # See also discussion beginning with:
                # /* occidental *from Greenwich) timezones */
                # in panda-imap/src/c-client/mail.c
                append dti "-"
            }
            append dti [format "%02u" $h_arr(internaldate.zhours)]
            append dti [format "%02u" $h_arr(internaldate.zminutes)] "00"
            if { [catch {
                set dti_cs [clock scan $dti -format "%Y-%m-%e %H:%M:%S %z"]
            } err_txt ] } {
                set dti_cs ""
                ns_log Warning "acs_mail_lite::email_type.1102  clock scan '${dti}' -format %Y-%m-%d %H:%M:%S %z failed. Could not check ts_p case."
            }
            set h_arr(aml_received_cs) $dti_cs
            # Does response time indicate more likely by a machine?
            # Not by itself. Only if it is a reply of some kind.

            # Response is likely machine if it is fast.
            # If the difference between date and local time is less than 10s
            # and either from is "" or subject matches "return*to*sender"

            # More likely also from machine
            # if size is more than a few thousand characters in a short time.

            # This is meant to detect more general cases
            # of bounce/auto_reply detection related to misconfiguration
            # of a system.
            # This check is
            # intended to prevent flooding server and avoiding looping
            # that is not caught by standard MTA / smtp servers.
            # An MTA likely checks already for most floods and loops.
            # As well, this check providesy yet another
            # indicator to intervene in uniquely crafted attacks.

            if { $pe_p && $dti_cs ne "" } {
                # check multiple emails from same user

                nsv_lappend acs_mail_lite si_party_id_cs(${party_id}$dti_cs
                set max_ct [nsv_get acs_mail_lite si_max_ct_per_cycle]
                set cycle_s [nsv_get acs_mail_lite si_dur_per_cycle_s]
                set cs_list [nsv_get acs_mail_lite si_party_id_cs(${party_id})]
                set cs_list_len [llength $cs_list]
                if { $cs_list_len > $max_ct } {
                    set params_ul [acs_mail_lite::sched_parameters]
                    set lpri_pids [dict get $params_ul lpri_party_ids]
                    set lpri_pids_list [split $lpri_pids]
                    if { $party_id ni $lpri_pdis_list } {
                        # full check required
                        set start_cs [nsv_get acs_mail_lite si_start_t_cs]
                        set prev_start_cs [expr { $start_cs - $cycle_s } ]
                        set cs_list [lsort -integer -increasing -unique $cs_list]
                        set i 0
                        set is_stale_p 1
                        while { $is_stale_p && $i < $cs_list_len } {
                            set test_ts [lindex $cs_list $i]
                            if { $test_ts > $prev_start_cs } {
                                set is_stale_p 0
                            }
                            incr i
                        }
                        if { $is_stale_p } {
                            set cs2_list [list ]
                            # Really?
                            # We just added dti_cs to si_party_id_cs(party_id)
                            # This happens when scanning email is delayed some
                            ns_log Warning "acs_mail_lite::email_type.655  party_id '${party_id}' prev_start_cs '${prev_start_cs}' i '${i}'  cs_list_len '${cs_list_len}' cs_list '${cs_list}' cs2_list '${cs2_list}'"
                        } else {
                            set cs2_list [lrange $cs_list $i-1 end]
                            set cs2_list_len [llength $cs2_list]
                            if { $cs2_list_len > $max_ct } {
                                # si_max_ct_per_cycle reached for party_id

                                # Flag as low priority if over count for cycle
                                # That is, add party_id to
                                # acs_mail_lite::sched_parameters -lpri_party_ids
                                # if it is not already
                                # Already checked at beginning of this check
                                lappend lpri_pids_list $party_id
                                acs_mail_lite::sched_parameters  -lpri_party_ids $lpri_pids_list

                            }
                        }
                        nsv_set acs_mail_lite si_party_id_cs(${party_id}$cs2_list
                    }
                }
            }

            # RFC 822 header required: DATE
            set dt_idx [lsearch -glob -nocase $hn_list {date}]
            # If there is no date. Flag it.
            if { $dt_idx < 0 } {
                set ts_p 1
            } else {
                # Need to check received timestamp vs. when OpenACS
                # or a system hosted same as OpenACS sent it.

                set dt_h [lindex $hn_list $dt_idx]
                # Cannot use optional ns_imap parsedate here. May not exist.
                # RFC 5322 section 3.3: multiple spaces in date is acceptable
                # but not for tcl clock scan -format
                regsub -all -- { +} $h_arr(${dt_h}) { } dt_spaced
                # RFC 5322 section 3.3: obs-zone breaks clock scan format too
                set dt_spaced_tz_idx [string first " (" $dt_spaced]
                set dt_spaced [string trim [string range $dt_spaced 0 ${dt_spaced_tz_idx} ]]
                set dte_cs [clock scan $dt_spaced -format "%a, %d %b %G %H:%M:%S %z"]

                set diff 1000
                if { $dte_cs ne "" && $dti_cs ne "" } {
                    set diff [expr { abs( $dte_cs - $dti_cs ) } ]
                }
                # If too fast, set ts_p 1
                if { $diff < 11 } {
                    set ts_p 1
                }

                # check from host against acs_mail_lite's host
                # From: header must show same OpenACS domain for bounce
                # and subsequently verified not a user or system recognized
                # user/admin address.

                # Examples of unrecognized addresses include mailer-daemon@..
                set host [dict get [acs_mail_lite::imap_conn_set] host]
                if { $ts_p && [string -nocase "*${host}*" $from_host] } {
                    if { $from_email eq [ad_outgoing_sender] || !$pe_p } {
                        # This is a stray one.
                        set ag_p 1
                    }

                }

                # Another possibility is return-path "<>"
                # and Message ID unique-char-ref@bounce-domain

                # Examples might be a bounced email from
                # a nonstandard web form on site
                # or
                # a loop where 'from' is
                # a verified user or system recognized address
                # and reply is within 10 seconds
                # and a non-standard acs-mail-lite reply-to address


            }

        }

        # Delivery Status Notifications, see RFC 3464
        # https://tools.ietf.org/html/rfc3464
        # Note: original-envelope-id is not same as message-id.
        # original-recipient = or
        set or_idx [lsearch -glob -nocase $hn_list {original-recipient}]
        if { $or_idx < 0 } {
            # RFC 3461 4.2 uses original-recipient-address
            set or_idx [lsearch -glob  -nocase $hn_list {original-recipient-address}]
        }

        # action = ac (required for DSN)
        # per fc3464 s2.3.3
        set ac_idx [lsearch -glob -nocase $hn_list {action}]
        if { $ac_idx > -1 } {
            set ac_h [lindex $hn_list $ac_idx]
            set status_list [list failed  delayed  delivered  relayed  expanded ]
            # Should 'delivered' be removed from status_list?
            # No, just set ar_p 1 instead of dsn_p 1

            set s_i 0
            set status_p 0
            set stat [lindex $status_list $s_i]
            while { $stat ne "" && !$status_p } {
                # What if there are duplicate status values or added junk?
                # Catch it anyway by wrapping glob with asterisks
                if { [string match -nocase "*${stat}*" $h_arr(${ac_h})] } {
                    set status_p 1
                }
                ns_log Dev "acs_mail_lite::email_type.1070  status_p $status_p stat '${stat}' ac_h ${ac_h} h_arr(ac_h) '$h_arr(${ac_h})'"

                incr s_i
                set stat [lindex $status_list $s_i]
            }
            if { $status_p } {
                # status = st (required for DSN)
                # per fc3464 s2.3.4
                set st_idx [lsearch -glob -nocase $hn_list {status}]
                if { $st_idx > -1 } {
                    set st_h [lindex $hn_list $st_idx]
                    set dsn_p [string match {*[0-9][0-9][0-9]*}  $h_arr(${st_h}) ]
                    ns_log Dev "acs_mail_lite::email_type.1080  dsn_p ${dsn_p} st_h ${st_h} h_arr(st_h) '$h_arr(${st_h})'"
                    if { $st_idx eq 2 || !$dsn_p } {
                       set ar_p 1
                    }
                }
            }
        }

        ns_log Dev "acs_mail_lite::email_type.1089  ar_p ${ar_p} dsn_p ${dsn_p}"

        # if h_arr exists and..
        if { !$ar_p && $check_subject_p } {
            # catch nonstandard cases
            # subject flags

            # If 'from' not set. Set here.
            if { $from eq "" } {
                set fr_idx [lsearch -glob -nocase $hn_list {from}]
                if { $fr_idx > -1 } {
                    set from $h_arr(${from})
                }
            }
            # If 'subject' not set. Set here.
            if { $subject eq "" } {
                set fr_idx [lsearch -glob -nocase $hn_list {subject}]
                if { $fr_idx > -1 } {
                    set subject $h_arr(${subject})
                    set h_arr(aml_subject) [ns_quotehtml $subject]
                }
            }

            set ps1 [string match -nocase {*out of*office*} $subject]
            set ps2 [string match -nocase {*automated response*} $subject]
            set ps3 [string match -nocase {*autoreply*} $subject]
            set ps4 [string match {*NDN*} $subject]
            set ps5 [string match {*\[QuickML\] Error*} $subject]
            # RFC 3834 states to NOT rely on 'Auto: ' in subject for detection.
            #set ps6 \[string match {Auto: *} $subject\]

            # from flags = pf
            set pf1 [string match -nocase {*mailer*daemon*} $from]

            set ar_p [expr { $ps1 || $ps2 || $ps3 || $ps4 || $ps5 || $pf1 } ]
        }

    }
    ns_log Dev "acs_mail_lite::email_type.1127 ar_p ${ar_p}"


    # Return actionable types:
    # 'auto_gen', 'auto_reply', 'bounce', 'in_reply_to' or '' (other)

    #  a bounce also flags maybe auto_reply, in_reply_to, auto_gen
    # an auto_reply also flags maybe auto_reply, auto_gen, in_reply_to
    # an auto_gen does NOT include an 'in_reply_to'
    # an in_reply_to does NOT include 'auto_gen'.
    if { $dsn_p || $or_idx > -1 } {
        set type "bounce"
    } elseif$ar_p
               || ( $irt_idx > -1 && ( $ag_p || $as_p || $an_p || $ts_p ) )
           } {
        set type "auto_reply"
    } elseif$ag_p || $as_p || $an_p || $ts_p } {
        set type "auto_gen"
    } elseif$irt_idx > -1 } {
        set type "in_reply_to"
    } else {
        # other
        set type ""
    }
    if { $header_arr_name ne "" } {
        set h_arr(aml_type) $type
    }
    return $type
XQL Not present:
PostgreSQL, Oracle
Generic XQL file:
packages/acs-mail-lite/tcl/email-inbound-procs.xql

[ hide source ] | [ make this the default ]
Show another procedure: