Forum OpenACS Development: Re: Problem with sending out the email-address-verification email for new users (needs a package; exploring.)

i've just now started my server from a fresh checkout from oacs-5-8, and tagged the following packages with openacs-5-8-compat:
oacs-5-8 acs-admin
oacs-5-8 acs-api-browser
oacs-5-8 acs-authentication
oacs-5-8 acs-automated-testing
oacs-5-8 acs-bootstrap-installer
oacs-5-8 acs-content-repository
oacs-5-8 acs-core-docs
oacs-5-8 acs-datetime
oacs-5-8 acs-developer-support
oacs-5-8 acs-events
oacs-5-8 acs-kernel
oacs-5-8 acs-lang
oacs-5-8 acs-mail-lite
oacs-5-8 acs-messaging
oacs-5-8 acs-reference
oacs-5-8 acs-service-contract
oacs-5-8 acs-subsite
oacs-5-8 acs-tcl
oacs-5-8 acs-templating
oacs-5-8 acs-translations
oacs-5-8 ajaxhelper
oacs-5-8 attachments
oacs-5-8 calendar
oacs-5-8 categories
oacs-5-8 faq
oacs-5-8 file-storage
oacs-5-8 general-comments
oacs-5-8 intermedia-driver
oacs-5-8 news
oacs-5-8 notifications
oacs-5-8 oacs-dav
oacs-5-8 openacs-default-theme
oacs-5-8 ref-countries
oacs-5-8 ref-language
oacs-5-8 ref-timezones
oacs-5-8 rss-support
oacs-5-8 search
oacs-5-8 tsearch2-driver
oacs-5-8 xotcl-core
oacs-5-8 xotcl-request-monitor
oacs-5-8 xowiki
To be on the safe side, i've bumped the version numbers of acs-tcl and acs-developer-support, such that install-from-repository will pick them up. tomorrow morning MEZ (CET) the .apm files will be regenerated with the files above.

i've also upgraded openacs.org to tcllib 1.15, to have the same versions there as well. Still, everything works as expected.

Maybe the following might help you: when openacs has an incorrect SMTPHost setup, the error message from smtp::sendmessage is rather crude:

error reading "sock10": connection refused
The easiest way to test your sendmail setup is probably to open ds/shell and try something like the following:
acs_mail_lite::send_immediately -from_addr jim@xx.com -to_addr jim@xxx.com -subject hi -body hi
Note that none of this explains why "catch" apparently does not work for you. You have not answered what tcl version you are using.
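
When testing from ds/shell, it can help to capture more than the bare message. The following is a minimal sketch, assuming Tcl 8.5 or later (for the options dictionary of catch). report_result is a hypothetical helper, not part of OpenACS, and the failing script at the end is only a stand-in so that the snippet also runs in a plain tclsh:

```tcl
# Hypothetical helper: run a script in the caller's scope and report
# status, message, errorCode and the stack trace instead of just the message.
proc report_result {script} {
    set status [catch {uplevel 1 $script} msg opts]
    if {$status == 0} {
        return "ok: $msg"
    }
    set report "failed (status $status): $msg"
    if {[dict exists $opts -errorcode]} {
        append report "\nerrorCode: [dict get $opts -errorcode]"
    }
    if {[dict exists $opts -errorinfo]} {
        append report "\nerrorInfo:\n[dict get $opts -errorinfo]"
    }
    return $report
}

# Stand-in for a failing mail call, so the sketch runs outside OpenACS.
puts [report_result {error "connection refused" "" {POSIX ECONNREFUSED}}]
```

Inside ds/shell, the stand-in script would be replaced by the acs_mail_lite::send_immediately call shown above.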

I tried running acs_mail_lite::send_immediately from the shell with parameters almost exactly as you suggested, and got this in the return box:

The exact line I used from the shell is near the bottom of this, except I obfuscated the actual "to" addr.

Back in a few hours...

ERROR:

    while executing
"ad_raise notfound"
    (procedure "rp_serve_abstract_file" line 32)
    invoked from within
"rp_serve_abstract_file "$root/$extra_url""
    ("uplevel" body line 2)
    invoked from within
"uplevel $code"
    invoked from within
"smtp::sendmessage ::mime::25 -originator mailto:bounce-2536-F6B959E1B3FE54F3CB1FBEDFAD213955D3697DF6-376@jam.sessionsnet.org -header {From mailto:jim@xx.com} -heade..."
    ("eval" body line 1)
    invoked from within
"eval $cmd_string"
    (procedure "acs_mail_lite::smtp" line 30)
    invoked from within
"acs_mail_lite::smtp -multi_token $tokens  -headers $headers_list  -originator $originator"
    (procedure "acs_mail_lite::send_immediately" line 155)
    invoked from within
"acs_mail_lite::send_immediately -from_addr mailto:jim@xx.com -to_addr mailto:dev1@obfus.foo -subject "ds/shell test" -body "ds/shell test""
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 [string map {"\\\r\n" " "} $script]"

tcl version on my system is 8.5.13

strange. That's the version to use. When you type "catch {ds_init}" in the ds/shell, you should see either a "0" or "1" in the reply box, but not an error message. Can you confirm that?

Since you have neither confirmed nor denied that you used "install/upgrade from directory", i still assume that you installed that way. The .apm files are rebuilt by now. have you upgraded from repository? same results?

The smtp client implementation of tcllib 1.15 starts more or less with a
catch {package require Trf 2.0}
What do you get in the message box of ds/shell when you type this command in? Btw, i checked just now: openacs.org has no trf installed.

Well I actually built and installed the latest Trf, so just the package require itself would return 2.1.4 and so the catch returns 0.
Yes, as mentioned before, I did try that with the ds I had installed (that did not have ds_init) and from both the tclsh I built for naviserv and from ds/shell, catch {ds_init} returned 1.

Since then, I upgraded ds, and now ds_init is present, but I haven't tried the same test as above (which should return 0, yes?)

No, I didn't use install/upgrade from directory, I used install/upgrade from repo.

-Jim

yes, "catch {ds_init}" should return 0. what puzzles me most is that you wrote that you got an error from "catch {ds_init}", and that by removing this line, the error disappeared. from your last posting, i get the impression that catch works... very weird.

anyhow, do an "upgrade from repository", maybe you get some more updates this way.
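
The 0/1 results discussed in this exchange can be reproduced in any tclsh. A small sketch, using a package that is always present (Tcl itself) and a made-up package name that is assumed not to exist:

```tcl
# catch returns 0 when the script succeeds and 1 when it raises an error;
# the error itself does not propagate to the caller.
set ok   [catch {package require Tcl} version]
set fail [catch {package require no_such_package_xyz} message] ;# hypothetical name

puts "present package: $ok ($version)"
puts "missing package: $fail ($message)"
```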

Yaknow, I have a thought...

I've been pondering why the comment "If you use this, I will kill you" was placed on the commentary to ad_raise... in the year 2000...

all ad_raise does is...

return -code error -errorcode [list "AD" "EXCEPTION" $exception] $value

Could this cause something else to seem to be the line an error occurred on?
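
To see what such a return statement does to the caller, here is a minimal sketch for a plain tclsh (8.5 or later). my_raise and serve_file are hypothetical stand-ins that only mirror the statement quoted above; the empty default for value is an assumption:

```tcl
# Hypothetical stand-in that mirrors the return statement quoted above.
proc my_raise {exception {value ""}} {
    return -code error -errorcode [list AD EXCEPTION $exception] $value
}

# Hypothetical caller, playing the role of the procedure in the trace.
proc serve_file {} {
    my_raise notfound
}

set status [catch {serve_file} msg opts]
set code   [dict get $opts -errorcode]

puts "status:    $status"
puts "message:   '$msg'"
puts "errorCode: $code"
```

The error message itself is empty here; the exception name travels only in errorCode. That may be why the trace earlier in this thread shows "ERROR:" followed by nothing.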

Heya Gustaf,

What I realized about our exchange while I was looking at things was, at the time I was changing things faster than I was telling you, and I know from the past this can get confusing and makes it hard to know what to suggest. Sorry for that, I'll try to keep you more informed next time you're helping me to look at things.

Along those same lines, I have no idea how catch seemed to be failing. I tested catch on random strings (worked, returned 1), on catch {ds_init} (which works: before upgrading ds it returned 1, after it returned 0), and on catch {package require Trf}, which returns 1 before Trf is installed and 0 afterwards. As mentioned below, an interaction between return -code error, ad_raise and ad_try occasionally causes information about errors to be lost, and so it also causes messages to become uninformative, but I'm getting ahead of myself.

I continued to look at the mail situation, and mostly the error I was getting announced itself as "Error, ad_raise notfound". When I wanted to test the mail sending in a tclsh shell, in order to look at the situation completely free of openacs, this required me to stop using acs_mail_lite::send_immediately and to start using smtp::sendmessage, which is not in openacs (it's in tcllib).

When I was building up the call (it required some other stuff, like headers and a mime part to send), I did it in ds/shell, and consequently this showed me reasonably informative errors, which allowed me to fix the problems as they came up, until there was one point where it again showed "Error: ad_raise notfound". When I moved the test (with the setup) to run on the tclsh that was built for naviserv, I actually got an informative result, which I'll get to momentarily.

What I discovered about ad_raise is that it does one simple return statement; it is meant to raise an exceptional condition, to be caught by ad_try. It does something like return -code error -errorcode [list this is an exception $exception_name], and my belief is that when smtp::sendmessage also returns an error code, it gets caught by the request processor as if it were one of these exceptions. Partly because of that, some details about the actual error are either lost or just not reported properly. I don't have complete details yet on exactly what happens.
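
One general way error details can get lost in Tcl, independent of what OpenACS actually does, is a wrapper that catches an error and re-raises only the message. A minimal sketch (Tcl 8.5 or later); send_stub and both wrappers are hypothetical names, and the stub is not the real smtp::sendmessage:

```tcl
# Stand-in for a failing low-level call (not the real smtp::sendmessage).
proc send_stub {} {
    error "421: service not available" "" {SMTP 421}
}

# Re-raises only the message: errorCode falls back to NONE and the
# original stack trace is not carried along.
proc lossy_wrapper {} {
    if {[catch {send_stub} msg]} {
        return -code error $msg
    }
}

# Re-raises with the captured options: errorCode and trace survive.
proc faithful_wrapper {} {
    if {[catch {send_stub} msg opts]} {
        return -options $opts $msg
    }
}

catch {lossy_wrapper}    m1 o1
catch {faithful_wrapper} m2 o2
puts "lossy:    [dict get $o1 -errorcode]"
puts "faithful: [dict get $o2 -errorcode]"
```

Both wrappers pass on the same message; only the second keeps the errorCode that a caller would need to tell one kind of error from another.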

Lastly, when I try to run smtp::sendmessage in the tclsh shell, the precise code I'm running is:

package require mime

set part [mime::initialize -canonical text/plain -string hi]

package require smtp

smtp::sendmessage \
    $part \
    -originator jim@jam.sessionsnet.org \
    -recipients dev1@jam.sessionsnet.org \
    -header {message-id <9123@jam.sessionsnet.org>} \
    -header {date {Thu Apr  3 01:42:29 PDT 2014}}

And as I'm typing each separate command, I can observe that all but the last return no errors. The error returned by the last one (the smtp::sendmessage command) is:

421: 4.3.0 collect: Cannot write ./dfs338xUr4013746 (bfcommit, uid=0, gid=104): No such file or directory

This message is an improvement in how informative it is, but I have no idea what parts of it mean. I see it's running as root, I don't know what "collect" is, and I have no idea how a write to a file in . can fail with file/dir not found.
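
Part of the message can be decoded mechanically. The leading 421 is the SMTP reply code and 4.3.0 is an enhanced status code; a first digit of 4 marks a transient failure, and the middle 3 refers to the mail system's own status. The sketch below splits such a string into its parts. parse_smtp_error is a hypothetical helper, and the "NNN: X.Y.Z text" shape is assumed from the single message quoted above:

```tcl
# Split an SMTP error string of the shape "NNN: X.Y.Z text" into its parts.
# Returns a dict; the class wording follows the first digit of the reply code.
proc parse_smtp_error {line} {
    set pattern {^(\d{3}):?\s+(?:(\d\.\d{1,3}\.\d{1,3})\s+)?(.*)$}
    if {![regexp $pattern $line -> reply enhanced text]} {
        return [dict create reply "" enhanced "" class unknown text $line]
    }
    switch -- [string index $reply 0] {
        4       { set class "transient failure (retry may succeed)" }
        5       { set class "permanent failure" }
        default { set class "not an error reply" }
    }
    return [dict create reply $reply enhanced $enhanced class $class text $text]
}

set parsed [parse_smtp_error {421: 4.3.0 collect: Cannot write ./dfs338xUr4013746 (bfcommit, uid=0, gid=104): No such file or directory}]
dict for {key value} $parsed {
    puts [format "%-9s %s" "${key}:" $value]
}
```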

-Jim

Hi Jim,

I think the catch not catching is due to flooding a sequence of catches with errors. I've been searching for a reference about this without success.

I believe I ran into the problem once due to a permissions issue, where I inadvertently changed the permission of a file that nsd had previously checked permission on and was accessing or writing to but subsequently the OS denied. nsd then spun with high CPU and diagnosing was difficult because catch didn't work as expected.

So, if Gustaf hasn't identified the exact issue, I do believe he is on to a central cause, namely that a file permission has changed for nsd, perhaps a lib file.

cheers,

actually, the error messages for SMTP errors "4.3.0 collect: Cannot write ..." hint at a permission problem of the mail delivery system on SMTPHost, not a permission problem with nsd. in this particular case, try chmod 1777 /var/spool/mqueue (whether the recommended value of 1777 is a perfect value can be discussed, but if the problem persists, one can at least potentially rule out permission problems). Another possible cause might be that sendmail runs under wrong permissions.
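
Before changing permissions, the current state of the queue directory can be inspected from a tclsh on the mail host. A rough sketch; dir_report is a hypothetical helper, and it only reports what the current process sees, which says nothing definite about the user the mail server runs as:

```tcl
# Report basic facts about a directory as seen by the *current* process.
proc dir_report {dir} {
    set report [dict create path $dir exists [file exists $dir]]
    if {[file isdirectory $dir]} {
        dict set report writable [file writable $dir]
        # -permissions is only available on Unix-like platforms
        if {![catch {file attributes $dir -permissions} perms]} {
            dict set report permissions $perms
        }
    }
    return $report
}

puts [dir_report /var/spool/mqueue]
```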

Maybe the package parameter SMTPHost should point to a different mailhost with a correct sendmail/postfix/... setup. maybe jim is trying this on a new instance (fresh linux/bsd/..., fresh openacs database with default SMTPHost, etc).

nevertheless, the error feedback from openacs (acs-mail-lite, maybe acs-tcl, templating involved) should be improved.

I've committed a change that prevents potential errors from smtp::sendmessage from being swallowed silently by higher calling levels. bumped as well the version numbers, such that tomorrow one can get this change via "install from repository". I hope this improves the situation.

Gustaf, I have another suggestion for a commit to acs-mail-lite, and it depends on your read of the tcllib smtp::sendmessage. The question being: if one provides username and password, does smtp::sendmessage know to use authenticated smtp, and (main point is) if one does -not- provide these, does it use the original unauthenticated protocol?

If so... I have a suggestion, and I'll post it a bit later, meanwhile I'm going to test a coupla more times.

-Jim

I finally got success by providing -servers {a.smart.host} to smtp::sendmessage. This solution completely bypasses using the virtual server machine (aka localhost), and instead uses a machine nearby.

Initial test on my changes to acs_mail_lite::send_immediately shows it needs more work. On that now, results coming soon.

-Jim

After setting the smarthost parameter, acs_mail_lite::send_immediately works too, and with my to-be-proposed changes. One more test...

-Jim

I wanted to change how it's decided whether to use smtp auth or not, so I added code that either adds the smtp password and username, or does not add them, depending on whether the user and password are set in the acs-mail-lite parameters.

The diff:

--- cut here ---
diff -Naur /home/mu-new/openacs-5.8.0/packages/acs-mail-lite//tcl/acs-mail-lite-procs.tcl acs-mail-lite//tcl/acs-mail-lite-procs.tcl
--- /home/mu-new/openacs-5.8.0/packages/acs-mail-lite//tcl/acs-mail-lite-procs.tcl      2013-08-29 02:53:44.000000000 +0400
+++ acs-mail-lite//tcl/acs-mail-lite-procs.tcl  2014-04-04 14:38:06.000000000 +0400
@@ -141,7 +141,16 @@
        foreach header $headers {
            append cmd_string " -header {$header}"
        }
-        append cmd_string " -servers $smtp -ports $smtpport -username $smtpuser -password $smtppassword"
+        append cmd_string " -servers $smtp -ports $smtpport"
+
+      set smtppass_p [expr {$smtppassword ne ""}]
+      set smtpuser_p [expr {$smtpuser ne ""}]
+
+      # change the condition as you like: right now, both user and pass must be set to use auth.
+      if { $smtpuser_p && $smtppass_p } {
+          append cmd_string " -username $smtpuser -password $smtppassword"
+      }
+
        ns_log Debug "send cmd_string: $cmd_string"
        eval $cmd_string
    }
--- cut here ---
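
The same condition can also be expressed by building the options as a Tcl list rather than appending to a string that is later passed to eval, which keeps values containing spaces intact. This is only a sketch of the idea, not the acs-mail-lite code; build_smtp_options is a hypothetical helper:

```tcl
# Hypothetical helper: build the option list for the send call as a Tcl list.
# Credentials are appended only when both are non-empty, as in the diff above.
proc build_smtp_options {server port {user ""} {password ""}} {
    set options [list -servers [list $server] -ports [list $port]]
    if {$user ne "" && $password ne ""} {
        lappend options -username $user -password $password
    }
    return $options
}

puts [build_smtp_options a.smart.host 25]
puts [build_smtp_options a.smart.host 587 jim {pass word}]
```

With Tcl 8.5 or later, such a list could be handed to the send command via argument expansion, for example smtp::sendmessage $token {*}$options, assuming the tcllib smtp package is loaded.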

-Jim

When looking a little closer, I noticed a problem:

When the user fills out the registration form and clicks OK, the system sends the registration email, which the new user receives. But, (when the user clicks OK on the reg form) it still shows a blank page entitled Account Closed. Everything else seems to be working, but the UI makes it seem it got stuck or something's wrong.

When the user receives the verification email, it contains the link, which verifies the user properly.

There's two cases... one, if the verification email used send_immediately, I guess it's not reporting an error to the caller.

If the verification email is queued, two things: one, how is the user informed that s/he needs to check their email? and two, how would the system know whether an error occurred in sending the mail?

I altered my copy of acs_mail_lite::send_immediately, maybe something I did caused this problem. I'll attach my code to the next message so you can see and comment.

-Jim

To clarify further, what should happen after the new user clicks OK on the registration form, is the user should be told to expect a verification email in the next few minutes, and the system is not showing a web page with that message.

Can anyone confirm that in openacs-5.8 it's having the user wait for the verification email?

-Jim

It seems like both sendmail and exim4 are running on the machine... I'll look into that.

Sendmail (according to ps aux) is running as root, while exim is running as userid 109. So one could figure, sendmail can't be the permission problem... still, sendmail has pieces and maybe they run as different users. I dunno, maybe I'll replace it all with qmail (which I might be able to get working with webmail if I even want to do that), or with exim (which has an easy setup).

One thing we knew a few days ago, was the email problem existed completely outside openacs, and we found this out when I tried sending from a tclsh.

One thing, I used the unix tool "mail" to send a mail, and that worked.

Anyway, still exploring. I'm also going to look into one of Torbin's suggestions, that is to try a smarthost other than localhost, and configure acs-mail-lite accordingly.

-Jim