acs_mail_lite::email_type (public)
acs_mail_lite::email_type [ -subject subject ] [ -from from ] \ [ -headers headers ] [ -header_arr_name header_arr_name ] \ [ -reply_too_fast_s reply_too_fast_s ] \ [ -check_subject_p check_subject_p ]
Defined in packages/acs-mail-lite/tcl/email-inbound-procs.tcl
Scans email's subject, from and headers for actionable type.
Returns actionable type and saves same type in header_arr_name(aml_type), and saves some normalized header info to reduce redundant processing downstream. See code comments for details.
Actional types: 'auto_gen' 'auto_reply', 'bounce', 'in_reply_to' or empty string indicating 'other' type.
Adds these index to headers array:
- 'auto_reply' may be a Delivery Status Notification for example.
- 'bounce' is a specific kind of Delivery Status Notification.
- 'in_reply_to' is an email reporting to originate from local email, which needs to be tested further to see if OpenACS needs to act on it versus a reply to a system administrator email for example.
- 'auto_gen' is an auto-generated email that does not qualify as 'auto_reply', 'bounce', or 'in_reply_to'
- '' (Empty string) refers to email that the system does not recognize as a reply of any kind. If not a qualifying type, returns empty string.
- received_cs: the received time of email in tcl clock epoch time.
- aml_type: the same value returned by this proc.
If additional headers not calculated, they have value of empty string.
If headers and header_arr_name provided, only header_arr_name will be used, if header_arr_name contains at least one value.
If check_subject_p is set 1, checks for common subjects identifying autoreplies. This is not recommended to rely on exclusively. This feature provides a framework for extending classification of emails for deployment routing purposes.
If array includes keys from 'ns_imap struct', such as internaldate.*, then adds header with epoch time quivilent to header index received_cs
- Switches:
- -subject
(optional)- of email
- -from
(optional)- of email
- -headers
(optional)- of email, a block of text containing all headers and values
- -header_arr_name
(optional)- -reply_too_fast_s
(defaults to"10"
) (optional)- -check_subject_p
(defaults to"0"
) (optional)- Set to 1 to check email subject.
- Partial Call Graph (max 5 caller/called nodes):
- Testcases:
- acs_mail_lite_inbound_procs_check
Source code: set ag_p 0 set an_p 0 set ar_p 0 set as_p 0 set dsn_p 0 set irt_idx -1 set or_idx -1 set pe_p 0 set ts_p 0 set reject_p 0 # header cases: {*auto-generated*} {*auto-replied*} {*auto-notified*} # from: # https://www.iana.org/assignments/auto-submitted-keywords/auto-submitted-keywords.xhtml # and RFC 3834 https://www.ietf.org/rfc/rfc3834.txt # Do NOT use x-auto-response-suppress # per: https://stackoverflow.com/questions/1027395/detecting-outlook-autoreply-out-of-office-emails # header cases: # {*x-autoresponder*} {*autoresponder*} {*autoreply*} # {*x-autorespond*} {*auto_reply*} # from: # https://github.com/jpmckinney/multi_mail/wiki/Detecting-autoresponders # redundant cases are removed from list. # auto reply = ar set ar_list [list {auto-replied} {auto-reply} {autoreply} {autoresponder} {x-autorespond} ] # These were in auto_reply, but are not specific to replies: # {auto-generated} # {auto-notified} # See section on auto_gen types. (auto-submitted and the like) if { $header_arr_name ne "" } { upvar 1 $header_arr_name h_arr } else { array set h_arr [list ] } if { $headers ne "" && [array size h_arr] < 1 } { # To remove subject from headers to search, # in case topic uses a reserved word, # we rebuild the semblence of array returned by ns_imap headers. # Split strategy from qss_txt_table_stats set linebreaks "\n\r\f\v" set row_list [split $headers $linebreaks] foreach row $row_list { set c_idx [string first ":" $row] if { $c_idx > -1 } { set header [string trim [string range $row 0 $c_idx-1]] # following identifies multiline header content to ignore if { ![string match {*[;=,]*} $header] } { # list of email headers at: # https://www.cs.tut.fi/~jkorpela/headers.html # Suggests this filter for untrusted input: if { [regsub -all -- {[^a-zA-Z0-9\-]+} $header {} h2 ] } { ns_log Warning "acs_mail_lite:email_type.864: Unexpected header '${header}' changed to '${h2}'" set header $h2 } set value [string trim [string range $row $c_idx+1 end]] # string match from proc ad_safe_eval if { ![string match {*[\[;]*} $value ] } { # 'append' is used instead of 'set' in # the rare case that there's a glitch # and there are two or more headers with same name. # We want to examine all values of specific header. append h_arr(${header}) "${value} " ns_log Dev "acs_mail_lite::email_type.984 header '${header}' value '${value}' from text header '${row}'" } } } } } set reject_p [acs_mail_lite::inbound_filters -headers_arr_name h_arr] if { !$reject_p } { set hn_list [array names h_arr] ns_log Dev "acs_mail_lite::email_type.996 hn_list '${hn_list}'" # Following checks according to RFC 3834 section 3.1 Message header # https://tools.ietf.org/html/rfc3834 # check for in-reply-to = irt set irt_idx [lsearch -glob -nocase $hn_list {in-reply-to}] # check for message_id = mi # This is a new message id, not message id of email replied to set mi_idx [lsearch -glob -nocase $hn_list {message-id}] # Also per RFC 5436 section 2.7.1 consider: # auto-submitted = as set as_idx [lsearch -glob -nocase $hn_list {auto-submitted}] if { $as_idx > 1 } { set as_p 1 set as_h [lindex $hn_list $as_idx] set an_p [string match -nocase $h_arr(${as_h}) {auto-notified}] # also check for auto-generated set ag_p [string match -nocase $h_arr(${as_h}) {auto-generated}] } ns_log Dev "acs_mail_lite::email_type.1017 as_p ${as_p} an_p ${an_p} ag_p ${ag_p}" # If one of the headers contains {list-id} then email # is from a mailing list. set i 0 set h [lindex $ar_list $i] while { $h ne "" && !$ar_p } { #set ar_p string match -nocase $h $hn set ar_idx [lsearch -glob $hn_list $h] if { $ar_idx > -1 } { set ar_p 1 } incr i set h [lindex $ar_list $i] } ns_log Dev "acs_mail_lite::email_type.1039 ar_p ${ar_p}" # get 'from' header value possibly used in a couple checks set fr_idx [lsearch -glob -nocase $hn_list {from}] set from_email "" if { $fr_idx > -1 } { set fr_h [lindex $hn_list $fr_idx] set from [ns_quotehtml $h_arr(${fr_h})] set h_arr(aml_from) $from set from_email [string tolower [acs_mail_lite::parse_email_address -email $from]] set h_arr(aml_from_addrs) $from_email set at_idx [string last "@" $from ] } else { set at_idx -1 } if { $at_idx > -1 } { # from_email is not empty string set from_host [string trim [string range $from $at_idx+1 end]] set party_id [party::get_by_email -email $from_email] if { $party_id ne "" } { set pe_p 1 } } else { set from_host "" set party_id "" } if { !$ar_p && [info exists h_arr(internaldate.year)] && $from ne "" } { # Use the internal timestamp for additional filters set dti $h_arr(internaldate.year) append dti "-" [format "%02u" $h_arr(internaldate.month)] append dti "-" [format "%02u" $h_arr(internaldate.day)] append dti " " [format "%02u" $h_arr(internaldate.hours)] append dti ":" [format "%02u" $h_arr(internaldate.minutes)] append dti ":" [format "%02u" $h_arr(internaldate.seconds)] " " if { $h_arr(internaldate.zoccident) eq "0" } { # This is essentially iso8601 timezone formatting. append dti "+" } else { # Comment from panda-imap/src/c-client/mail.h: # /* nonzero if west of UTC */ # See also discussion beginning with: # /* occidental *from Greenwich) timezones */ # in panda-imap/src/c-client/mail.c append dti "-" } append dti [format "%02u" $h_arr(internaldate.zhours)] append dti [format "%02u" $h_arr(internaldate.zminutes)] "00" if { [catch { set dti_cs [clock scan $dti -format "%Y-%m-%e %H:%M:%S %z"] } err_txt ] } { set dti_cs "" ns_log Warning "acs_mail_lite::email_type.1102 clock scan '${dti}' -format %Y-%m-%d %H:%M:%S %z failed. Could not check ts_p case." } set h_arr(aml_received_cs) $dti_cs # Does response time indicate more likely by a machine? # Not by itself. Only if it is a reply of some kind. # Response is likely machine if it is fast. # If the difference between date and local time is less than 10s # and either from is "" or subject matches "return*to*sender" # More likely also from machine # if size is more than a few thousand characters in a short time. # This is meant to detect more general cases # of bounce/auto_reply detection related to misconfiguration # of a system. # This check is # intended to prevent flooding server and avoiding looping # that is not caught by standard MTA / smtp servers. # An MTA likely checks already for most floods and loops. # As well, this check providesy yet another # indicator to intervene in uniquely crafted attacks. if { $pe_p && $dti_cs ne "" } { # check multiple emails from same user nsv_lappend acs_mail_lite si_party_id_cs(${party_id}) $dti_cs set max_ct [nsv_get acs_mail_lite si_max_ct_per_cycle] set cycle_s [nsv_get acs_mail_lite si_dur_per_cycle_s] set cs_list [nsv_get acs_mail_lite si_party_id_cs(${party_id})] set cs_list_len [llength $cs_list] if { $cs_list_len > $max_ct } { set params_ul [acs_mail_lite::sched_parameters] set lpri_pids [dict get $params_ul lpri_party_ids] set lpri_pids_list [split $lpri_pids] if { $party_id ni $lpri_pdis_list } { # full check required set start_cs [nsv_get acs_mail_lite si_start_t_cs] set prev_start_cs [expr { $start_cs - $cycle_s } ] set cs_list [lsort -integer -increasing -unique $cs_list] set i 0 set is_stale_p 1 while { $is_stale_p && $i < $cs_list_len } { set test_ts [lindex $cs_list $i] if { $test_ts > $prev_start_cs } { set is_stale_p 0 } incr i } if { $is_stale_p } { set cs2_list [list ] # Really? # We just added dti_cs to si_party_id_cs(party_id) # This happens when scanning email is delayed some ns_log Warning "acs_mail_lite::email_type.655 party_id '${party_id}' prev_start_cs '${prev_start_cs}' i '${i}' cs_list_len '${cs_list_len}' cs_list '${cs_list}' cs2_list '${cs2_list}'" } else { set cs2_list [lrange $cs_list $i-1 end] set cs2_list_len [llength $cs2_list] if { $cs2_list_len > $max_ct } { # si_max_ct_per_cycle reached for party_id # Flag as low priority if over count for cycle # That is, add party_id to # acs_mail_lite::sched_parameters -lpri_party_ids # if it is not already # Already checked at beginning of this check lappend lpri_pids_list $party_id acs_mail_lite::sched_parameters -lpri_party_ids $lpri_pids_list } } nsv_set acs_mail_lite si_party_id_cs(${party_id}) $cs2_list } } } # RFC 822 header required: DATE set dt_idx [lsearch -glob -nocase $hn_list {date}] # If there is no date. Flag it. if { $dt_idx < 0 } { set ts_p 1 } else { # Need to check received timestamp vs. when OpenACS # or a system hosted same as OpenACS sent it. set dt_h [lindex $hn_list $dt_idx] # Cannot use optional ns_imap parsedate here. May not exist. # RFC 5322 section 3.3: multiple spaces in date is acceptable # but not for tcl clock scan -format regsub -all -- { +} $h_arr(${dt_h}) { } dt_spaced # RFC 5322 section 3.3: obs-zone breaks clock scan format too set dt_spaced_tz_idx [string first " (" $dt_spaced] set dt_spaced [string trim [string range $dt_spaced 0 ${dt_spaced_tz_idx} ]] set dte_cs [clock scan $dt_spaced -format "%a, %d %b %G %H:%M:%S %z"] set diff 1000 if { $dte_cs ne "" && $dti_cs ne "" } { set diff [expr { abs( $dte_cs - $dti_cs ) } ] } # If too fast, set ts_p 1 if { $diff < 11 } { set ts_p 1 } # check from host against acs_mail_lite's host # From: header must show same OpenACS domain for bounce # and subsequently verified not a user or system recognized # user/admin address. # Examples of unrecognized addresses include mailer-daemon@.. set host [dict get [acs_mail_lite::imap_conn_set] host] if { $ts_p && [string -nocase "*${host}*" $from_host] } { if { $from_email eq [ad_outgoing_sender] || !$pe_p } { # This is a stray one. set ag_p 1 } } # Another possibility is return-path "<>" # and Message ID unique-char-ref@bounce-domain # Examples might be a bounced email from # a nonstandard web form on site # or # a loop where 'from' is # a verified user or system recognized address # and reply is within 10 seconds # and a non-standard acs-mail-lite reply-to address } } # Delivery Status Notifications, see RFC 3464 # https://tools.ietf.org/html/rfc3464 # Note: original-envelope-id is not same as message-id. # original-recipient = or set or_idx [lsearch -glob -nocase $hn_list {original-recipient}] if { $or_idx < 0 } { # RFC 3461 4.2 uses original-recipient-address set or_idx [lsearch -glob -nocase $hn_list {original-recipient-address}] } # action = ac (required for DSN) # per fc3464 s2.3.3 set ac_idx [lsearch -glob -nocase $hn_list {action}] if { $ac_idx > -1 } { set ac_h [lindex $hn_list $ac_idx] set status_list [list failed delayed delivered relayed expanded ] # Should 'delivered' be removed from status_list? # No, just set ar_p 1 instead of dsn_p 1 set s_i 0 set status_p 0 set stat [lindex $status_list $s_i] while { $stat ne "" && !$status_p } { # What if there are duplicate status values or added junk? # Catch it anyway by wrapping glob with asterisks if { [string match -nocase "*${stat}*" $h_arr(${ac_h})] } { set status_p 1 } ns_log Dev "acs_mail_lite::email_type.1070 status_p $status_p stat '${stat}' ac_h ${ac_h} h_arr(ac_h) '$h_arr(${ac_h})'" incr s_i set stat [lindex $status_list $s_i] } if { $status_p } { # status = st (required for DSN) # per fc3464 s2.3.4 set st_idx [lsearch -glob -nocase $hn_list {status}] if { $st_idx > -1 } { set st_h [lindex $hn_list $st_idx] set dsn_p [string match {*[0-9][0-9][0-9]*} $h_arr(${st_h}) ] ns_log Dev "acs_mail_lite::email_type.1080 dsn_p ${dsn_p} st_h ${st_h} h_arr(st_h) '$h_arr(${st_h})'" if { $st_idx eq 2 || !$dsn_p } { set ar_p 1 } } } } ns_log Dev "acs_mail_lite::email_type.1089 ar_p ${ar_p} dsn_p ${dsn_p}" # if h_arr exists and.. if { !$ar_p && $check_subject_p } { # catch nonstandard cases # subject flags # If 'from' not set. Set here. if { $from eq "" } { set fr_idx [lsearch -glob -nocase $hn_list {from}] if { $fr_idx > -1 } { set from $h_arr(${from}) } } # If 'subject' not set. Set here. if { $subject eq "" } { set fr_idx [lsearch -glob -nocase $hn_list {subject}] if { $fr_idx > -1 } { set subject $h_arr(${subject}) set h_arr(aml_subject) [ns_quotehtml $subject] } } set ps1 [string match -nocase {*out of*office*} $subject] set ps2 [string match -nocase {*automated response*} $subject] set ps3 [string match -nocase {*autoreply*} $subject] set ps4 [string match {*NDN*} $subject] set ps5 [string match {*\[QuickML\] Error*} $subject] # RFC 3834 states to NOT rely on 'Auto: ' in subject for detection. #set ps6 \[string match {Auto: *} $subject\] # from flags = pf set pf1 [string match -nocase {*mailer*daemon*} $from] set ar_p [expr { $ps1 || $ps2 || $ps3 || $ps4 || $ps5 || $pf1 } ] } } ns_log Dev "acs_mail_lite::email_type.1127 ar_p ${ar_p}" # Return actionable types: # 'auto_gen', 'auto_reply', 'bounce', 'in_reply_to' or '' (other) # a bounce also flags maybe auto_reply, in_reply_to, auto_gen # an auto_reply also flags maybe auto_reply, auto_gen, in_reply_to # an auto_gen does NOT include an 'in_reply_to' # an in_reply_to does NOT include 'auto_gen'. if { $dsn_p || $or_idx > -1 } { set type "bounce" } elseif { $ar_p || ( $irt_idx > -1 && ( $ag_p || $as_p || $an_p || $ts_p ) ) } { set type "auto_reply" } elseif { $ag_p || $as_p || $an_p || $ts_p } { set type "auto_gen" } elseif { $irt_idx > -1 } { set type "in_reply_to" } else { # other set type "" } if { $header_arr_name ne "" } { set h_arr(aml_type) $type } return $typeXQL Not present: PostgreSQL, Oracle Generic XQL file: packages/acs-mail-lite/tcl/email-inbound-procs.xql