Analysis, annotations, planning for adding IMAP feature
Notes in preparation for adding IMAP to legacy bounce MailDir paradigm
New procs
For IMAP, the start of each process should not assume that a connection does or does not exist. Check the connection using 'imap ping' before login. This should help recover from connection drop-outs caused by intermittent or one-time connection issues.
Each scheduled event should quit in time for the next process, so that the IMAP info being processed is always nearly up to date. This matters when a separate manual IMAP process is working in tandem and changing circumstances. Quitting in time is equally important because IMAP references emails by relative sequence. Two concurrent connections would likely have different and overlapping references. The overlapping references would likely cause issues, since each connection would process the duplicates as if they were not duplicates.
Variables useful while exploring new processes such as forecasting and scheduling:
- scan_in_active_p
- (Don't use. See si_active_cs.) Answers the question: Is a proc currently scanning replies?
- si_active_cs
- (Don't use. See si_actives_list.) The clock scan of the most recently started cycle. If a cycle's poll doesn't match, it should not process any more email.
- si_actives_list
- A list of start clock seconds of active imap_checking_incoming procs
- scan_incoming_configured_p
- Is set to 0 if there is an error trying to connect. Otherwise is set to 1 by acs_mail_lite::imap_check_incoming
- replies_est_next_start
- Approximate value of [clock seconds] at which the next scan is expected to begin
- duration_ms_list
- Tracks the duration, in ms, of processing each email in the most recent process, appended as a list. When a new process starts processing email, the list is reset to only include the last 100 emails. That way, there are always rolling statistics for forecasting process times.
- scan_in_est_dur_per_cycle_s
- Estimate of duration of current cycle
- scan_in_est_quit_cs
- When the current cycle should quit based on [clock seconds]
- scan_in_start_cs
- When the current cycle started scanning based on [clock seconds]
- cycle_start_cs
- When the current cycle started (pre IMAP authorization etc) based on [clock seconds]
- cycle_est_next_start_cs
- When the next cycle is to start (pre IMAP authorization etc) based on [clock seconds]
- parameter_val_changed_p
- If related parameters change, performance tuning is underway. Reset statistics.
- scan_in_est_dur_per_cycle_s_override
- If this value is set, use it instead of scan_in_est_dur_per_cycle_s
- accumulative_delay_cycles
- Number of cycles that have been skipped entirely due to an ongoing process.
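The rolling statistics described for duration_ms_list could be kept with two small helpers. This is only a sketch: the proc names trim_duration_ms_list and mean_duration_ms are hypothetical, and the window of 100 comes from the description above.

```tcl
# Hypothetical helper: keep only the most recent $keep durations (ms).
proc trim_duration_ms_list {duration_ms_list {keep 100}} {
    set n [llength $duration_ms_list]
    if {$n <= $keep} {
        return $duration_ms_list
    }
    return [lrange $duration_ms_list [expr {$n - $keep}] end]
}

# Hypothetical helper: mean duration in ms, or 0 for an empty list.
proc mean_duration_ms {duration_ms_list} {
    set n [llength $duration_ms_list]
    if {$n == 0} {
        return 0
    }
    set sum 0
    foreach d $duration_ms_list {
        set sum [expr {$sum + $d}]
    }
    return [expr {double($sum) / $n}]
}
```

A new process would call trim_duration_ms_list once at start, then lappend each email's duration as it goes.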
Check scan_incoming_active_p when running a new cycle. Also set replies_est_next_start to clock seconds for use with time calcs later in the cycle.
If already running, wait a second and check again, until 90% of the duration has elapsed.
If still running, log a message and quit in time for the next event.
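The wait-and-recheck step could look roughly like the following. It is a minimal sketch, not the package's implementation: wait_for_prior_scan and its arguments are hypothetical, and the "is a scan active" check is passed in as a command prefix so the sketch stays independent of how that flag is stored.

```tcl
# Hypothetical sketch: wait for a running scan to finish, but give up
# once 90% of the cycle duration has elapsed.
# active_cmd is a command prefix that returns 1 while a scan is active.
proc wait_for_prior_scan {active_cmd cycle_dur_s {poll_ms 1000}} {
    set start_ms [clock milliseconds]
    set limit_ms [expr {int($cycle_dur_s * 1000 * 0.9)}]
    while {[{*}$active_cmd]} {
        set elapsed_ms [expr {[clock milliseconds] - $start_ms}]
        if {$elapsed_ms >= $limit_ms} {
            # Still running: the caller should log a message and quit.
            return 0
        }
        after $poll_ms
    }
    # No scan is active; the caller may proceed.
    return 1
}
```

A return value of 0 corresponds to the "log a message and quit" branch; 1 means the new cycle can start scanning.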
Each scheduled procedure should also use as much time as it needs up to the cut-off at the next scheduled event. Ideally, it needs to forecast if it is going to go overtime with processing of the next email, and quit just before it does.
Use duration_ms_list to determine a time adjustment for quitting before the next cycle:
scan_in_est_dur_per_cycle_s + scan_replies_start_time = scan_in_est_quit_cs
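As a rough illustration of the quit-time arithmetic, the sketch below computes the quit time and then checks whether one more email of typical duration would still fit. The proc names are hypothetical, and using the mean of duration_ms_list as the forecast is an assumption; the notes do not fix the statistic.

```tcl
# Hypothetical helper: clock seconds at which the current cycle should quit.
proc est_quit_cs {scan_start_cs est_dur_per_cycle_s} {
    return [expr {$scan_start_cs + $est_dur_per_cycle_s}]
}

# Hypothetical helper: returns 1 if one more email of mean duration is
# expected to finish at or before quit_cs, else 0.
# Assumption: the mean of duration_ms_list is the forecast.
proc time_for_next_email_p {now_cs quit_cs duration_ms_list} {
    set n [llength $duration_ms_list]
    set est_ms 0
    if {$n > 0} {
        set sum 0
        foreach d $duration_ms_list {
            set sum [expr {$sum + $d}]
        }
        set est_ms [expr {double($sum) / $n}]
    }
    # Round up to whole seconds to stay on the safe side.
    set est_s [expr {int(ceil($est_ms / 1000.0))}]
    return [expr {$now_cs + $est_s <= $quit_cs}]
}
```

A more cautious variant could forecast with the maximum or a high percentile instead of the mean, at the cost of quitting earlier.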
And yet, predicting the duration of a future process is difficult. What if the email is 10MB and needs to be parsed, whereas all prior emails were less than 10kB? What if one of the callbacks converts a pdf into a png and annotates it for a web view and takes a few minutes? What if the next 5 emails have callbacks that each take 5 to 15 minutes to process while waiting on an external service?
The process needs to be split into at least two to handle all cases.
The first process collects incoming email and puts it into a system standard format with a minimal amount of effort sufficient for use by callbacks. The goal of this process is to keep up with incoming email and make all mail available to the system at the earliest possible moment.
The second process should render a prioritized queue of imported emails that have not been processed: first prioritizing new entries, perhaps re-prioritizing any callbacks that error or re-introducing a sample of prior errant callbacks, etc., then continuing to process the stack.
Using this paradigm, parallel processes could be invoked for the queue without significantly changing the paradigm.
To reduce overhead on low volume systems, these processes should be scheduled to minimize concurrent operation.
Priorities should offer 3 levels of performance. Colors designate priority to discern from other email priority schemes:
- High (abbrev: hpri, Fast Priority, a priority value 1 to mpri_min (default 999)): Allow concurrent processes. That is, when a new process starts, it can also process unprocessed cases. As the stack grows, processes run in parallel to reduce the stack, up to acs_mail_lite_ui.max_concurrent.
- Med (abbrev: mpri, Standard Priority, a priority value mpri_min to mpri_max (default 9999)): Process one at a time with casual overlap. (Try to) quit before the next process starts. It's okay if there is a little overlapping.
- Low (abbrev: lpri, Low Priority, a priority value over mpri_max): Process one at a time only. If a new cycle starts and the last is still running, wait for it to quit (or quit before next cycle).
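Mapping a priority value to one of the three levels could be sketched as below. The proc name priority_class is hypothetical. The notes leave the boundary at mpri_min ambiguous, so this sketch assumes values below mpri_min are High and values from mpri_min through mpri_max are Med.

```tcl
# Hypothetical helper: classify a priority value as hpri, mpri or lpri.
# Assumption: mpri_min itself belongs to the Med level.
proc priority_class {priority {mpri_min 999} {mpri_max 9999}} {
    if {$priority < $mpri_min} {
        return hpri
    }
    if {$priority <= $mpri_max} {
        return mpri
    }
    return lpri
}
```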
Priority is calculated based on timing and file size
set range [expr { $priority_max - $priority_min }]
set deviation_max [expr { $range / 2 }]
set midpoint [expr { $priority_min + $deviation_max }]
time_priority = $deviation_max * ( clock seconds of received datetime - scan_in_start_cs ) / ( 2 * scan_in_est_dur_per_cycle_s )
size_priority = $deviation_max * ( ( (size of email in characters) / (config.tcl's max_file_upload_mb * 1000000) ) - 0.5 )
set equation [expr { int( $midpoint + ($time_priority + $size_priority) / 2 ) }]
Average of time and file size priorities.
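The calculation above can be written as a single runnable proc. This is a sketch under assumptions: calc_priority and its argument names are hypothetical, all inputs are passed in explicitly rather than read from parameters, floating-point division is used throughout, and the result is not clamped to the priority range because the notes do not say whether it should be.

```tcl
# Hypothetical sketch of the priority calculation described above.
# received_cs          clock seconds of the email's received datetime
# scan_in_start_cs     clock seconds when the current scan started
# est_dur_per_cycle_s  scan_in_est_dur_per_cycle_s
# size_chars           size of the email in characters
# max_file_upload_mb   value from config.tcl
proc calc_priority {received_cs scan_in_start_cs est_dur_per_cycle_s
                    size_chars max_file_upload_mb priority_min priority_max} {
    set range [expr {$priority_max - $priority_min}]
    set deviation_max [expr {$range / 2.0}]
    set midpoint [expr {$priority_min + $deviation_max}]
    set time_priority [expr {$deviation_max * ($received_cs - $scan_in_start_cs)
                             / (2.0 * $est_dur_per_cycle_s)}]
    set size_priority [expr {$deviation_max
                             * ((double($size_chars) / ($max_file_upload_mb * 1000000.0)) - 0.5)}]
    # Average of the time and size priorities, offset from the midpoint.
    return [expr {int($midpoint + ($time_priority + $size_priority) / 2.0)}]
}
```

With these definitions, an email received at scan start whose size is half the upload limit lands exactly on the midpoint; smaller emails get a lower (faster) priority value.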
hpri_package_ids, lpri_package_ids, hpri_party_ids, lpri_party_ids, mpri_min, mpri_max, hpri_subject_glob and lpri_subject_glob are defined in acs_mail_lite_ui, so they can be tuned without restarting the server. P.S. Code should check if the user is banned before parsing any further.
A proc should be available to recalculate existing email priorities. This means more info needs to be added to table acs_mail_lite_from_external (including size_chars).
Import Cycle
This scheduling should be simple. Maybe check if a new process wants to take over. If so, quit.
Prioritized stack processing cycle
If the next cycle starts and the current cycle is still running, set scan_in_est_dur_per_cycle_s_override to the actual time the current cycle has to wait, including any prior cycle wait time, if the delays exceed one cycle (accumulative_delay_cycles).
From acs-tcl/tcl/test/ad-proc-test-procs.tcl
# This example gets list of implementations of a callback: (so they could be triggered one by one)
ad_proc -callback a_callback { -arg1 arg2 } { this is a test callback } -
set callback_procs [info commands ::callback::a_callback::*]
Each subsequent cycle moves toward renormalization by adjusting scan_in_est_dur_per_cycle_s_override toward the value of scan_in_est_dur_per_cycle_s by one replies_est_dur_per_cycle, with a minimum of scan_in_est_dur_per_cycle_s.
Changes are exponential to quickly adjust to changing dynamics.
For acs_mail_lite::scan_in,
Keep track of email flags while processing.
Mark /read when reading.
Mark /replied if replying.
When quitting the current scheduled event, don't log out if not all processes are done. Also, don't log out if imaptimeout is greater than the duration to cycle_est_next_start_cs. Stay logged in for the next cycle.
Delete processed messages when done with a cycle? No. What if a message is used by a callback with a delay in processing? Move processed emails into a designated folder named by the ProcessFolderName parameter. The designated folder may be Trash. If the parameter is empty, the default is the hostname of ad_url, i.e.: [util::split_location [ad_url] protoVar ProcessFolderName portVar]. If the folder does not exist, create it. ProcessFolderName only needs to be checked if the name has changed.
MailDir marks email as 'read' by moving from '/new' dir to '/cur' directory. ACS Mail Lite implementations should be consistent as much as possible, and so mark emails in IMAP as 'read' also.
Email attachments
Since messages are not immediately deleted, create a table of attachment url references. Remove attachments older than AttachmentLife parameter seconds. Set default to 30 days old (2592000 seconds). Unless ProcessFolderName is Trash, email attachments can be recovered by original email in ProcessFolderName. No. Once callbacks are processed, assume any transfer of attachments has occurred, so that processed email can be purged.
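Selecting attachments that have outlived AttachmentLife could be sketched as follows. The proc name expired_attachments and the flat list of url/created-time pairs are assumptions for illustration; the notes call for a database table of attachment url references, which this in-memory sketch does not model.

```tcl
# Hypothetical helper: return the urls of attachments older than
# attachment_life_s seconds.
# attachments is a flat list: url1 created_cs1 url2 created_cs2 ...
# The default of 2592000 seconds is 30 days.
proc expired_attachments {attachments now_cs {attachment_life_s 2592000}} {
    set expired [list]
    foreach {url created_cs} $attachments {
        if {$now_cs - $created_cs > $attachment_life_s} {
            lappend expired $url
        }
    }
    return $expired
}
```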