Forum OpenACS Development: letsencrypt error on renew

Collapse
Posted by Raul Rodriguez on
Running NaviServer/4.99.16
Ran letsencrypt about 2 months ago with no issues.
Ran today and got an error:
key "key-change" not known in dictionary
    while executing
"dict get ${:apiURLs} $kind"
    (procedure ":URL" line 2)
    invoked from within
":URL key-change"
    (procedure ":getAPIurls" line 17)
    invoked from within
":getAPIurls $config"
    (procedure "getCertificate" line 39)
    invoked from within
"$c getCertificate"
    ("uplevel" body line 762)
    invoked from within
"uplevel {
    #
# letsencrypt.tcl --
#
#   A small Let's Encrypt client for NaviServer implemented in Tcl.
#   To use it, set enabled to 1 and drop it..."
    (procedure "code::tcl::/var/www/mysite//packages/acs-subsite/www/admin/l..." line 2)
    invoked from within
"code::tcl::$__adp_stub"
    ("uplevel" body line 12)
    invoked from within
"uplevel {

        if { [file exists $__adp_stub.tcl] } {

            # ensure that data source preparation procedure exists and is up-to-date
      ..."
    (procedure "adp_prepare" line 2)
    invoked from within
"adp_prepare"
    invoked from within
"template::adp_parse $themed_template {}"
    (procedure "adp_parse_ad_conn_file" line 14)
    invoked from within
"$handler"
    ("uplevel" body line 2)
    invoked from within
"uplevel $code"
    invoked from within
"ad_try {
                $handler
            } ad_script_abort val {
                # do nothing
            }"
    invoked from within
"rp_serve_concrete_file [ad_conn file]"
    (procedure "::nsf::procs::rp_serve_abstract_file" line 60)
    invoked from within
"rp_serve_abstract_file "$root/$extra_url""
    ("uplevel" body line 2)
    invoked from within
"uplevel $code"
    invoked from within
"ad_try {
                rp_serve_abstract_file "$root/$extra_url"
                set ::tcl_url2file([ad_conn url]) [ad_conn file]
                se..."
Collapse
Posted by Gustaf Neumann on

I just ran the the letsencrypt script on openacs.org to renew its certificate, and everything worked fine.

When you run the letsencrypt request, you get a log displayed, starting with something along these lines:

Obtaining a certificate from Let's Encrypt using the Production API:

Let's Encrypt URLs (production API):

   https://acme-v02.api.letsencrypt.org/acme/key-change
   https://acme-v02.api.letsencrypt.org/acme/new-nonce
   https://acme-v02.api.letsencrypt.org/acme/new-order
   https://acme-v02.api.letsencrypt.org/acme/new-acct
   https://acme-v02.api.letsencrypt.org/acme/revoke-cert

Reuse existing account registration at Let's Encrypt (/usr/local/ns/modules/nsssl/letsencrypt-production-account.key)
parseAccountKey /usr/local/ns/modules/nsssl/letsencrypt-production-account.key
...

These are the interface URLs from letsencrypt. You error message indicates, that these URLs (actually the "key-change" URL) were not set, so i would suspect some earlier error message hinting the cause of the problem.

Can it be, that someone blocks requests from your site to letsencrypt ? I had a few weeks a problem with OCSP stapling on openacs.org, since the a firewall suddenly blocked the outgoing traffic to letsencrypt, and since a web client requested a stapled certificate, and the cache expired, NaviServer had to issue a request for verification to letsencrypt, which was blocked (due to some blacklist that the firewall used).

all the best -g

Collapse
Posted by Raul Rodriguez on
I am not seeing a firewall blacklist situation.
I tried to renew on another server running NaviServer/4.99.18 and got the same error. Could it have something to do with acme-v01 instead of acs-v02 in the URLs?:

[24/Apr/2021:14:38:22][644.7fa1f0ff9700][-conn:mysite:2:3115-] Notice: update_object_doc  ::letsencrypt::Client ...
[24/Apr/2021:14:38:22][644.7fa1f0ff9700][-conn:mysite:2:3115-] Notice: letsencrypt: <html lang="en"><head><title>NaviServer Let's Encrypt client</title></head><body>
[24/Apr/2021:14:38:22][644.7fa1f0ff9700][-conn:mysite:2:3115-] Notice: letsencrypt: <h3>Obtaining a certificate from Let's Encrypt using  the Production API:</h3>
[24/Apr/2021:14:38:22][644.7fa1f0ff9700][-conn:mysite:2:3115-] Notice: SockConnect: target host <acme-v01.api.letsencrypt.org> has associated multiple IP addresses <172.65.32.248 2606:4700:60:0:f
53d:5624:85c7:3a2c>
[24/Apr/2021:14:38:22][644.7fa1f0ff9700][-conn:mysite:2:3115-] Notice: async connect to 172.65.32.248 on sock 30 returned EINPROGRESS
[24/Apr/2021:14:38:22][644.7fa1f0ff9700][-conn:mysite:2:3115-] Error: key "key-change" not known in dictionary
    while executing
"dict get ${:apiURLs} $kind"
    (procedure ":URL" line 2)

Collapse
Posted by Gustaf Neumann on
You should probably update naviserver+lestencrypt. The ACME v1 interface is being phased out in multiple steps, what you are seeing is probably a consequence of this.

Starting with version 0.5 of the letsencrypt interface (released Dec 2019) support ACME v2.

Collapse
Posted by Raul Rodriguez on
I updated naviserver+letsencrypt. I am now getting this error:
[28/Apr/2021:20:15:26][694.7fcbf574b700][-conn:mysite:default:0:362-] Error: can't connect to mysite.com port 80: operation now in progress
:        while executing
:    "ns_http run -timeout 5.0 $wellknown_url"
:        (procedure ":authorizeDomain" line 43)
:        invoked from within
:    ":authorizeDomain $auth_url [dict get $id value]"
:        (procedure "getCertificate" line 111)
:        invoked from within
:    "$c getCertificate"
:        ("uplevel" body line 879)
:        invoked from within
:    "uplevel {
:        #
:    # letsencrypt.tcl --
:    #
:    #   A small Let's Encrypt client for NaviServer implemented in Tcl,
:    #   supporting the ACME v2 interface of let..."
:        (procedure "code::tcl::/var/www/mysite//packages/acs-subsite/www/admin/l..." line 2)
:        invoked from within
:    "code::tcl::$__adp_stub"
:        ("uplevel" body line 12)
:        invoked from within
:    "uplevel {
Collapse
Posted by Gustaf Neumann on
Dear Raul,

have you edited the error message? i wonder because of the use of "mysite.com". The error message means that the server could not open a connection to the site, sometimes related with DNS problems. If you see this, and you can open the connection to the mentioned URL by other means, it helps sometimes to run the script again.

Collapse
Posted by Raul Rodriguez on
Setup:

Router
WAN setup with Static External IP
Port Forwarding 80,443 to Internal server with internal ip listening on 8070/8073

OPENACS (5.9.1)
ACS Tcl Library Instance Parameters: SupressHttpPort 1
Main Instance Parameters: RestrictToSSL * acs-admin/*

Live server Running NaviServer 4.99.16 setup as above, as of yesterday, old letscenrypt worked with ACME v1 interface.

Upgrade:
Cloned live server
Replaced Live Server with Clone
Updated clone with NaviServer 4.99.21 and latest letsencrypt
Replaced NaviServer 4.99.16 openacs-config.tcl with NaviServer 4.99.21 openacs-config.tcl

Got error:
[30/Apr/2021:19:44:13][867.7f097c429700][-conn:jmspcs:default:3:2516-] Error: GET /, PeerAddress: 108.178.141.75
:    key "host" not known in dictionary
:        while executing
:    "dict get $li host"
:        (procedure "security::configured_driver_info" line 17)
:        invoked from within
:    "security::configured_driver_info"
:        (procedure "security::locations" line 7)
:        invoked from within
:    "security::locations"
:        (procedure "util::external_url_p" line 11)
:        invoked from within
:    "util::external_url_p $target_url"
:        (procedure "::nsf::procs::ad_returnredirect" line 19)
:        invoked from within
:    "ad_returnredirect $secure_url"
:        (procedure "::nsf::procs::security::redirect_to_secure" line 6)
:        invoked from within
:    "security::redirect_to_secure -script_abort=false [ad_return_url -qualified]"
:        (procedure "rp_filter_aolserver" line 94)
:        invoked from within
:    "rp_filter_aolserver $why"
:        (procedure "rp_filter" line 1)
:        invoked from within
:    "rp_filter preauth"
:    (context: filter proc)

Replaced 'host [dict get $li host]' with 'host [lindex [dict get $i address] 0]' in /var/www/jmspcs/packages/acs-tcl/tcl/security-procs.tcl

ad_proc -private security::configured_driver_info {} {

        Return a list of dicts containing type, driver, location and port
        of all configured drivers

        @see util_driver_info

    } {
       	set defaultport {http 80 https 433}
	set result {}
        foreach i [ns_driver info] {
            set type     [dict get $i type]
	    set location [dict get $i location]
            set proto    [dict get $i protocol]
            set li       [ns_parseurl $location]

            if {[dict exists $li port]} {
                set port [dict get $li port]
                set suffix ":$port"
            } else {
                set port [dict get $defaultport $proto]
		set suffix ""
            }
            lappend result [list \
                                proto $proto \
                                driver [dict get $i module] \
                                host [lindex [dict get $i address] 0] \
                                location $location port $port suffix $suffix]
        }
        return $result
    }

letsencrypt failed with:
[28/Apr/2021:20:15:26][694.7fcbf574b700][-conn:jmspcs:default:0:362-] Error: can't connect to jmspcs.com port 80: operation now in progress
:        while executing
:    "ns_http run -timeout 5.0 $wellknown_url"
:        (procedure ":authorizeDomain" line 43)
:        invoked from within
:    ":authorizeDomain $auth_url [dict get $id value]"
:        (procedure "getCertificate" line 111)
:        invoked from within
:    "$c getCertificate"
:        ("uplevel" body line 879)
:        invoked from within
:    "uplevel {
:        #
:    # letsencrypt.tcl --
:    #
:    #   A small Let's Encrypt client for NaviServer implemented in Tcl,
:    #   supporting the ACME v2 interface of let..."
:        (procedure "code::tcl::/var/www/jmspcs//packages/acs-subsite/www/admin/l..." line 2)

Updated OPENACS and removed the '*' that restricted the entire site to SSL via:
Main Instance Parameters: RestrictToSSL acs-admin/*

letsencrypt then failed on:
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Notice: letsencrypt: wellknown_url http://jmspcs.com/.well-known/acme-challenge/SL4skwMzi3Au5E1pleV9yKYZ444l4Lf7CSGMUJrrq7U returned status 302 time 0:16872 headers d15 body {!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4.01//EN">
:    html
:    head
:    title Redirection /title
:    /head
:    body
:    h2 Redirection /h2
:   a href="https://jmspcs.com/.well-known/acme-challenge/SL4skwMzi3Au5E1pleV9yKYZ444l4Lf7CSGMUJrrq7U" The requested URL has moved here. /a
:    p align='right' small  i NaviServer/4.99.21 on http://192.168.0.127:8070 /i  /small /p
:    
:    /body /html
:    }
[15/May/2021:17:02:45][1611.7fd861924700][-conn:jmspcs:default:1:52-] Dev: config: ns/server/jmspcs/acs/acs-lang:SiteWideLocale value= (string)
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Debug(ns:driver): NsWriterQueue: size 40 bufs 0x7fd8621241e0 (3) flags 1030170 stream 000040 chan (nil) fd -1 thread 2
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Debug(ns:driver): NsWriterQueue: file is too small(40 < 1024)
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Notice: letsencrypt: challenge can not retrieved from server: http://jmspcs.com/.well-known/acme-challenge/SL4skwMzi3Au5E1pleV9yKYZ444l4Lf7CSGMUJrrq7U
:    
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Debug(ns:driver): NsWriterQueue: size 108 bufs 0x7fd8621241e0 (3) flags 1030170 stream 000040 chan (nil) fd -1 thread 2
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Debug(ns:driver): NsWriterQueue: file is too small(108 < 1024)
[15/May/2021:17:02:45][1611.7fd862125700][-conn:jmspcs:default:0:50-] Notice: letsencrypt: 
:                            Validation of domain www.jmspcs.com failed (final status invalid).
:                            p Please restart the procedure at a href="https://www.jmspcs.com/admin/letsencrypt" https://www.jmspcs.com/admin/letsencrypt /a
:                        

Updated Router:
Port Forwarding 80,443 to Internal server with internal ip listening on 80/443
Updated server systemctl and config.tcl to use ports 80/443

After this, letsencrypt worked. It seems that the previous version of naviserver/letsencrypt handled the redirect and ports other than 80/443. Is there something else that can be done to get letsencyrpt to work without changing ports in config and keeping RestrictToSSL to the entire site?
Collapse
Posted by Gustaf Neumann on
The following was due to some older version of OpenACS installed

: key "host" not known in dictionary
: while executing
: "dict get $li host"
: (procedure "security::configured_driver_info" line 17)

I would recommend to either upgrade or to replace security::configured_driver_info with the actual version [1] or better to upgrade to the version in the oacs-5-10 branch.

Port Forwarding 80,443 to Internal server with internal ip listening on 8070/8073

If I understand correctly, this forwards to different ports on the same host.... and probably, 8070 and 8073 and accessible from the outside. ... and for this reason, you have set

ACS Tcl Library Instance Parameters: SupressHttpPort 1

I see here two main problem areas: (a) letsencrypt.tcl is a pure NaviServer module, it does not know about OpenACS and its package parameters and (b) letsencrypt accesses the server via plain HTTP ("/YOUR_DOMAIN/.well-known/acme-challenge/TOKEN") to retrieve the challenge, ... and does not accept redirects.

So, when OpenACS forces redirects to https, then there is a problem. I wonder, how this worked before on your setup. Maybe the latter point has changed between ACME v1 and v2.

I see two options: implement a high-priority filter in the NaviServer letsencrypt module, that allow temporarily plain HTTP access to .well-known/acme-challenge, or to modify the OpenACS filter, to let these requests through without https redirects. Not sure yet, what's better.

Would you also agree with the analysis with your understanding from your setup?

Many thanks for the good feedback, and congratulations to get it working.

-gn

[1] https://openacs.org/api-doc/proc-view?proc=security::configured_driver_info&source_p=1

Collapse
Posted by Gustaf Neumann on
There are some updates to letsencrypt in the repositories:

The background operation "check_expired_certificates" in OpenACS 5.10 checks for expiring certificates and sends warnings before the expiry. This functionality is available since while. New is the feature that this function performs automated certificate renewal for letsencrypt certificates.

This change reduce maintenance effort by automating certificate renewal. When the NaviServer letsencrypt module is installed and configured, the background operation check_expired_certificates will automatically update the certificates when these expire soon (as defined by the "ExpireCertificateWarningPeriod" parameter of acs-admin). When a recent version of NaviServer is used that supports certificate re-fetch on SIGHUP, the new certificates are automatically updated without a server restart.

Prerequisites:

  • Recent version of letsencrypt NaviServer module installed (0.6) and configured
  • Recent version of NaviServer (currently Bitbucket tip) for automated certificate reloading

A sample configuration is in [1]. When the recent letsencrypt module is not installed, "check_expired_certificates" sends expiration warnings as usual. Therefore, the change does not harm for sites using certificates from different sources.

This new functionality was used for latest certificate renewal on openacs.org and on lehrbuch-wirtschaftsinformatik.org.

Hopefully, someone will find this useful
-g

[1] https://bitbucket.org/naviserver/naviserver/src/master/openacs-config.tcl