Forum OpenACS Development: Issues with site-wide SSL configuration

1: Issues with site-wide SSL configuration

Posted by Jose Mendez on 03/25/19 08:47 PM

We have an old ACS installation (pre-OpenACS) that we have been running for years with an SSL certificate that we use to secure the login page and other HTTPS restricted applications. We recently tried to restrict the entire site to HTTPS but ran into issues.

We introduced the configuration change over a weekend when traffic is low and the site seemed to be working fine. As soon as traffic increased the following Monday, the site became slow and eventually unresponsive. Load on the server was non existent and there were no unusual messages reported in the logs.

Before applying this change in production, we had introduced it in our development environment months ahead and we never noticed the issue. We have tried to reproduce the behavior using some load testing tools but we have been unsuccessful.

We have done additional troubleshooting in production and noticed that when HTTPS request became unresponsive, HTTP request were served properly. We also logged some CLOSE_WAIT events which made us suspicious of the nsopenssl driver.

For now we are thinking of upgrading OpenSSL and nsopenssl driver to their latest stable releases to see if it helps. Has anybody run into similar issues that may have some insight?

This is what we are currently running:

CentOS Linux 7
AOLServer 4.5.1
Tcl 8.5
OpenSSL 1.0.2.k-fips
nsopenssl ??? (We think it is 3.0beta26 but we are not 100% sure)
ACS (Kernel) 4.2

2: Re: Issues with site-wide SSL configuration (response to 1)

Posted by Gustaf Neumann on 03/26/19 09:37 AM

Wow, this is a methusalem! ACS was released 2001, OpenACS 4.5 was in 2002, this is quite a while.

You might consider the following options:
1) experiment with the various nsopenssl updates [1,2] or try to contact scott goodwin
2) upgrade from AOLserver to NaviServer, and use current OpenACS sources as a running documentation, what changes are needed (when i added the first NaviServer support to OpenACS more than 10 years ago, i remember, it was not much work)
3) use a reverse proxy such as nginx [2] for ssl/tls offloading

If you not want to invest effort in this application, (3) is probably the easiest step.

[1] https://panoptic.com/wiki/aolserver/Nsopenssl
[2] https://github.com/ahelsley/nsopenssl
[3] https://www.nginx.com/

3: Re: Issues with site-wide SSL configuration (response to 2)

Posted by Jose Mendez on 03/26/19 04:04 PM

Gustaf, Thanks for your reply. Indeed, this is quite an old installation. I believe it was instantiated in 2001.

Option (1) is where we are today and we plan to run some tests this week. If the issues persist, I think we will try option (3) next as we want to get this done as soon as possible.

We had also considered moving this to NaviServer but were afraid it would be an insurmountable task. It's encouraging to hear that it is not too much work 😊

5: Re: Issues with site-wide SSL configuration (response to 3)

Posted by Gustaf Neumann on 03/27/19 09:02 AM

As far i remember, i spent more time on bringing NaviServer's https driver to its current state (A+ rating) than adding changes to OpenACS to support NaviServer. Current OpenACS works with AOLserver and NaviServer, essentially based on the result of [ns_info name]. Many of the newer changes are not needed to for functioning, but for improved performance.

The link [1] shows the changes in packages/acs-* in chronological order (starting 13 years ago) to give you a glimpse what's needed.

-g

[1] http://fisheye.openacs.org/search/OpenACS/?comment=naviserver&contents=&addedText=&deletedText=&filename=%2Fopenacs-4%2Fpackages%2Facs*%2F**&branch=&tag=&fromdate=&todate=&datesortorder=ASCENDING&groupby=changeset&col=path&col=revision&col=author&col=date&col=csid&col=comment&refresh=y

6: Re: Issues with site-wide SSL configuration (response to 5)

Posted by Jose Mendez on 03/27/19 04:36 PM

Thanks, Gustaf. This is an excellent resource.

4: Re: Issues with site-wide SSL configuration (response to 1)

Posted by Andrew Piskorski on 03/27/19 03:29 AM

I got a recent NaviServer working with an OpenACS 5.2.3 (2006-06-25) codebase, and it wasn't bad. It basically amounted to backporting the OpenACS NaviServer support, as Gustaf said, plus commenting out certain "deprecated syntax" warnings in NaviServer itself. ACS 4.2 is truly ancient, but likely not much harder.

7: Re: Issues with site-wide SSL configuration (response to 4)

Posted by Jose Mendez on 03/27/19 04:41 PM

It sounds like it would be a worthwhile effort to attempt the upgrade to Naviserver. Thanks.

8: Re: Issues with site-wide SSL configuration (response to 1)

Posted by Jose Mendez on 04/05/19 12:28 AM

Sad to report that the upgrade to OpenSSL (v1.0.2.r) and recompiling of the nsopenssl driver did not resolve our issues. After testing it on our development environment we did the upgrade in Production with the same results.

Initially we only updated the software without enabling the site-wide configuration. We ran it for a day and our HTTPS-restricted applications showed no crippling problems but did notice some connections would be in a CLOSE_WAIT state for some time and would eventually go away.

This morning we forced HTTPS site-wide, and as the activity on the site increased we started to see intermittent delays until the site became unresponsive. netstat showed a large number of connections in a CLOSE_WAIT state. After some time, we received a "fatal signal 11" error and the site crashed. The crash is not something we had experienced before. We suspect this is because we had not left it running in this state for as long as we did today.

After rolling back the site-wide configuration, we restarted the service and noticed that HTTPS connections would still hang from time to time. In our original configuration non HTTPS-restricted URLs are redirected back to HTTP and these redirects would also take a long time. This is something we had not experienced before. It took restarting the server to alleviate this condition.

At this point, we have ran out of ideas on what to look for and troubleshoot. We are going to start testing the NGINX solution and hope we won't run into similar issues. I welcome any other tips or troubleshooting ideas in the meantime.

9: Re: Issues with site-wide SSL configuration (response to 8)

Posted by Gustaf Neumann on 04/05/19 09:03 AM

You can get a stable configuration with both other approaches.

- We run a large site behind nginx with up 5mio page views per day, all requests via HTTPS, backend currently via HTTP (change to HTTPS on the backend is on the agenda). When running behind a proxy, also the backend requires some changes. There are as well more changes involved for secure cookie handling for secure cases, in case you need this. There is a small wiki page [1] for current OpenACS (which is certainly not applicable for your methusalem), which might point to some problem areas.
- with NaviServer+nsssl, the reverse proxy issues won't arise, but you have to handle the porting steps mentioned above in advance. E.g. OpenACS.org runs with current OpenACS+NaviServer. This is a low traffic site, between 100k and 300k requests per day (the same nsd handles openacs.org, dotlrn.org and soon fisheye.openacs.org).

Both approaches run very stable, we every experiences the issues you are mentioning.

all the best
-gn
[1] https://openacs.org/xowiki/running_behind_a_proxy

10: Re: Issues with site-wide SSL configuration (response to 9)

Posted by Jose Mendez on 04/08/19 04:44 PM

Thanks Gustaf.

We got NGINX running in our test environment. Listening on port 80 and 443 with the AOLServer back-end listening on 8080. Installation and setup was straight forward.

The one thing we noticed was that our code uses [ad_conn location] for navigation links and we were getting the backend port (8080) in our links. To address this, we pulled some procs from a newer version of ACS (util::split_location and ns_parseurl) and added code in ad_conn to handle getting the location without the port number.

I noticed that in the version of ACS where we pulled the procs from, that this would be handled by a utility proc (util_current_location) and not in ad_conn directly. Should we be concerned about adding this to ad_conn?

Best

-Jose

11: Re: Issues with site-wide SSL configuration (response to 10)

Posted by Maurizio Martignano on 04/08/19 05:08 PM

Hello Jose,
I believe you can fix this issue by setting the following parameter

proxy_set_header X-Forwarded-For $remote_addr;

in the server section of your Nginx configuration file.
Try to google how to use the 'X-Forwarded-For' parameter in some Nginx configuration examples.

Hope it helps,
Maurizio

12: Re: Issues with site-wide SSL configuration (response to 10)

Posted by Gustaf Neumann on 04/09/19 02:38 PM

This is what i referred to with "When running behind a proxy, also the backend requires some changes". Your old version of ACS has to be made aware that it is not directly communicating with the peer, but with a proxy. So it will think that the other side can directly reply to its own address, or that the peer is the proxy server, which will be noted as well in the access.log containing just the IP addresses of the proxy instead of the IP address of the real peer.

Current OpenACS has the commands (see [1]) for handling such cases:
- [ad_conn peeraddr]
- [ad_conn behind_proxy_p]
- [ad_conn behind_secure_proxy_p]

Check out the source code of current OpenACS, when something special have to be done in the reverse proxy cases, some of these might not be relevant for you.

-gn
[1] https://openacs.org/xowiki/running_behind_a_proxy

13: Re: Issues with site-wide SSL configuration (response to 1)

Posted by Jose Mendez on 05/09/19 02:26 AM

Over the weekend we implemented NGINX in production to handle SSL and happy to report that it appears to have resolved the issues we had. It took some doing as we had to retrofit some procedures to have the site work properly behind NGINX. The site has remained stable since the change and it seems to be a bit snappier.

Thank you Gustaf for the pointers.