Forum OpenACS Development: Re: Problem with sending out the email-address-verification email for new users (needs a package; exploring.)

Heya Gustaf,

Looking back over our exchange, I realized that I was changing things faster than I was telling you about them. I know from past experience that this gets confusing and makes it hard to know what to suggest. Sorry about that; I'll try to keep you better informed the next time you're helping me look at something.

Along those same lines, I have no idea why catch seemed to be failing. I tested catch on random strings (it worked and returned 1); on catch {ds_init}, which returned 1 before upgrading ds and 0 afterwards; and on catch {package require Trf}, which returned 1 before Trf was installed and 0 afterwards. As I describe below, an interaction between return -code error, ad_raise, and ad_try occasionally causes information about errors to be lost, which also makes the resulting messages uninformative, but I'm getting ahead of myself.

I continued to look at the mail situation. Mostly, the error I was getting announced itself as "Error, ad_raise notfound". To test sending mail in a tclsh shell, completely free of OpenACS, I had to stop using acs_mail_lite::send_immediately and use smtp::sendmessage instead, which is not part of OpenACS (it's in tcllib).

I built up the call (it needed some other pieces, such as headers and a MIME part to send) in ds/shell, which showed me reasonably informative errors and let me fix problems as they came up, until at one point it again showed "Error: ad_raise notfound". When I moved the test (with its setup) to the tclsh built for NaviServer, I got an informative result, which I'll come to in a moment.

What I discovered about ad_raise is that it consists of a single return statement; it is meant to raise an exceptional condition to be caught by ad_try. It does something like return -code error -errorcode [list this is an exception $exception_name]. My belief is that when smtp::sendmessage also returns an error code, the request processor treats it as if it were one of these exceptions, and partly because of that, some details about the actual error are either lost or not reported properly. I don't yet have complete details on exactly what happens.
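The mechanism described above can be illustrated in plain Tcl 8.5+. This is a minimal sketch, not OpenACS code: my_raise is a hypothetical stand-in for ad_raise, and the {AD EXCEPTION name} errorcode layout is an assumption about what ad_raise uses. It shows that an exception raised this way carries its name only in -errorcode (usually with an empty message), so a handler that reports only the exception name, or treats every error as such an exception, hides the real error text.

```tcl
# Hypothetical stand-in for ad_raise (not the OpenACS source): raise a Tcl
# error whose errorCode names the exception; the message is often empty.
proc my_raise {exception_name {value ""}} {
    return -code error -errorcode [list AD EXCEPTION $exception_name] $value
}

# Run a script and report what a handler can actually see: the result when
# no error is raised, the exception name for my_raise-style errors, or the
# errorCode and message for any other error.
proc classify {script} {
    set rc [catch $script msg opts]
    if {$rc != 1} {
        # no error raised
        return [list ok $msg]
    }
    set ec [dict get $opts -errorcode]
    if {[lrange $ec 0 1] eq {AD EXCEPTION}} {
        # only the exception name survives; the message may be empty
        return [list exception [lindex $ec 2] $msg]
    }
    # an ordinary error keeps its own errorCode and message
    return [list error $ec $msg]
}

puts [classify {my_raise notfound}]
puts [classify {expr {1 / 0}}]
puts [classify {set x 42}]
```

Checking the errorcode prefix, rather than assuming every error is an exception, is what keeps the real message from an ordinary error visible.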

Lastly, here is the exact code I run when I try smtp::sendmessage in the tclsh shell:

package require mime

set part [mime::initialize -canonical text/plain -string hi]

package require smtp

smtp::sendmessage \
    $part \
    -originator mailto:jim@jam.sessionsnet.org \
    -recipients mailto:dev1@jam.sessionsnet.org \
    -header {message-id mailto:9123@jam.sessionsnet.org} \
    -header {date {Thu Apr  3 01:42:29 PDT 2014}}

As I type each command separately, I can see that all but the last one run without errors. The error returned by the last one (the smtp::sendmessage command) is:

421: 4.3.0 collect: Cannot write ./dfs338xUr4013746 (bfcommit, uid=0, gid=104): No such file or directory

This message is much more informative, although I don't understand all of it. I can see it's running as root (uid=0), but I don't know what "collect" is, and I have no idea how a write to a file in the current directory (.) can fail with "No such file or directory".
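One way to read a reply like the one above: under RFC 5321, the three-digit code 421 means "service not available, closing transmission channel", and a leading 4 marks a transient failure. The x.y.z part is an RFC 3463 enhanced status code, where 4.3.0 means "other or undefined mail system status". The rest is free text from the receiving mail server. The sketch below splits such a string into those parts; parse_smtp_error is a hypothetical helper (not part of tcllib or OpenACS), and the optional colon after the code is an assumption based on the message format shown above.

```tcl
# Hypothetical helper: split an SMTP error string such as
#   "421: 4.3.0 collect: Cannot write ..."
# into its basic reply code, optional enhanced status code and free text.
# Returns an empty string if the text does not start with a reply code.
proc parse_smtp_error {text} {
    # 3-digit code, optional colon, optional x.y.z enhanced code, rest of text
    set re {^([2-5][0-9]{2}):?\s+(?:([2-5]\.[0-9]{1,3}\.[0-9]{1,3})\s+)?(.*)$}
    if {![regexp $re $text -> code enhanced rest]} {
        return ""
    }
    # RFC 5321: 4yz replies are transient failures, 5yz are permanent
    set transient [expr {[string index $code 0] eq "4"}]
    return [dict create code $code enhanced $enhanced \
                transient $transient text $rest]
}

set reply {421: 4.3.0 collect: Cannot write ./dfs338xUr4013746 (bfcommit, uid=0, gid=104): No such file or directory}
set parsed [parse_smtp_error $reply]
puts "code=[dict get $parsed code] enhanced=[dict get $parsed enhanced]"
puts "transient=[dict get $parsed transient] text=[dict get $parsed text]"
```

A transient (4xx) code like this one points at the receiving server's own state rather than at the message or the sender.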

-Jim

Hi Jim,

I think catch failing to catch is due to a sequence of catches being flooded with errors. I've been searching for a reference on this, without success.

I believe I ran into this problem once because of a permissions issue: I inadvertently changed the permissions of a file that nsd had already checked and was reading or writing, and the OS subsequently denied access. nsd then spun at high CPU, and diagnosing it was difficult because catch didn't work as expected.

So, if Gustaf hasn't identified the exact issue, I do believe he is on to a central cause, namely that a file permission has changed for nsd, perhaps a lib file.

cheers,

Actually, SMTP error messages like "4.3.0 collect: Cannot write ..." hint at a permission problem in the mail delivery system on SMTPHost, not a permission problem with nsd. In this particular case, try chmod 1777 /var/spool/mqueue (whether 1777, as recommended, is the ideal value is debatable, but if the problem persists afterwards, one can at least rule out permission problems). Another possible cause is that sendmail runs with the wrong permissions.

Maybe the package parameter SMTPHost should point to a different mail host with a correctly configured sendmail/postfix/... setup. Perhaps Jim is trying this on a new instance (fresh Linux/BSD/..., fresh OpenACS database with the default SMTPHost, etc.).

Nevertheless, the error feedback from OpenACS (acs-mail-lite, maybe acs-tcl, with templating involved) should be improved.

I've committed a change that prevents errors from smtp::sendmessage from being silently swallowed at higher calling levels. I also bumped the version numbers, so that from tomorrow one can get this change via "install from repository". I hope this improves the situation.

Gustaf, I have another suggestion for a commit to acs-mail-lite, and it depends on your reading of tcllib's smtp::sendmessage. The question is: if one provides a username and password, does smtp::sendmessage know to use authenticated SMTP, and (the main point) if one does not provide them, does it use the original unauthenticated protocol?

If so, I have a suggestion, which I'll post a bit later; meanwhile, I'm going to test a couple more times.

-Jim

I finally succeeded by passing -servers {a.smart.host} to smtp::sendmessage. This bypasses the mail setup on the virtual server machine (aka localhost) entirely and uses a nearby machine instead.

Initial tests of my changes to acs_mail_lite::send_immediately show that it needs more work. I'm on it now; results coming soon.

-Jim

After setting the smart host parameter, acs_mail_lite::send_immediately works too, including with my to-be-proposed changes. One more test...

-Jim

I wanted to change how acs-mail-lite decides whether to use SMTP auth, so I added code that passes the SMTP username and password only when both are set in the acs-mail-lite parameters.

The diff:

--- cut here ---
diff -Naur /home/mu-new/openacs-5.8.0/packages/acs-mail-lite//tcl/acs-mail-lite-procs.tcl acs-mail-lite//tcl/acs-mail-lite-procs.tcl
--- /home/mu-new/openacs-5.8.0/packages/acs-mail-lite//tcl/acs-mail-lite-procs.tcl      2013-08-29 02:53:44.000000000 +0400
+++ acs-mail-lite//tcl/acs-mail-lite-procs.tcl  2014-04-04 14:38:06.000000000 +0400
@@ -141,7 +141,16 @@
        foreach header $headers {
            append cmd_string " -header {$header}"
        }
-        append cmd_string " -servers $smtp -ports $smtpport -username $smtpuser -password $smtppassword"
+        append cmd_string " -servers $smtp -ports $smtpport"
+
+      set smtppass_p [expr {$smtppassword ne ""}]
+      set smtpuser_p [expr {$smtpuser ne ""}]
+
+      # change the condition as you like: right now, both user and pass must be set to use auth.
+      if { $smtpuser_p && $smtppass_p } {
+          append cmd_string " -username $smtpuser -password $smtppassword"
+      }
+
        ns_log Debug "send cmd_string: $cmd_string"
        eval $cmd_string
    }
--- cut here ---
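The diff builds a command string and runs it with eval, which can break if a header value or the password contains spaces, braces, or other characters that are special to Tcl. A minimal sketch of the same decision (use auth only if both the user and the password are non-empty) instead builds a proper option list, assuming Tcl 8.5+ for {*}; build_smtp_args is a hypothetical helper name, not existing acs-mail-lite code, and the other options acs-mail-lite passes are omitted.

```tcl
# Hypothetical helper (not part of acs-mail-lite): build the option list for
# smtp::sendmessage, adding -username/-password only when both are set.
proc build_smtp_args {headers smtp smtpport smtpuser smtppassword} {
    set opts {}
    foreach header $headers {
        # each header is a {name value} pair, passed as a single argument
        lappend opts -header $header
    }
    lappend opts -servers $smtp -ports $smtpport
    # same condition as in the diff: both user and password must be set
    if {$smtpuser ne "" && $smtppassword ne ""} {
        lappend opts -username $smtpuser -password $smtppassword
    }
    return $opts
}

# Usage (needs tcllib and a MIME token in $part, so only shown here):
#   set opts [build_smtp_args $headers $smtp $smtpport $smtpuser $smtppassword]
#   smtp::sendmessage $part {*}$opts
puts [build_smtp_args {{Subject hi}} localhost 25 "" ""]
puts [build_smtp_args {} mail.example.org 587 jim {pass word}]
```

Because each value is a separate list element, a password such as "pass word" reaches smtp::sendmessage as one argument instead of being split by eval.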

-Jim

Looking a little closer, I noticed a problem:

When the user fills out the registration form and clicks OK, the system sends the registration email, which the new user receives. But after the user clicks OK, the browser still shows a blank page titled "Account Closed". Everything else seems to work, but the UI makes it look as if something got stuck or went wrong.

When the user receives the verification email, it contains the link, which verifies the user properly.

There are two cases. First, if the verification email was sent with send_immediately, I guess it isn't reporting an error to the caller.

Second, if the verification email is queued, two questions arise: how is the user informed that they need to check their email, and how would the system know whether an error occurred while sending the mail?

I altered my copy of acs_mail_lite::send_immediately, so maybe something I did caused this problem. I'll attach my code to the next message so you can take a look and comment.

-Jim

To clarify: after the new user clicks OK on the registration form, they should be told to expect a verification email within the next few minutes, but the system does not show a page with that message.

Can anyone confirm that in OpenACS 5.8 the user is told to wait for the verification email?

-Jim

It seems that both sendmail and exim4 are running on the machine; I'll look into that.

According to ps aux, sendmail is running as root, while exim is running as user ID 109. So one might figure sendmail can't be the source of the permission problem; still, sendmail has several components, and maybe they run as different users. I don't know; maybe I'll replace it all with qmail (which I might be able to get working with webmail, if I even want that) or with exim (which is easy to set up).

One thing we learned a few days ago is that the email problem exists completely outside OpenACS; we found this out when I tried sending from a tclsh.

Also, I used the Unix tool "mail" to send a message, and that worked.

Anyway, I'm still exploring. I'm also going to look into one of Torbin's suggestions: try a smart host other than localhost and configure acs-mail-lite accordingly.

-Jim