Forum OpenACS Development: Issues with site-wide SSL configuration
We introduced the configuration change over a weekend when traffic is low and the site seemed to be working fine. As soon as traffic increased the following Monday, the site became slow and eventually unresponsive. Load on the server was non existent and there were no unusual messages reported in the logs.
Before applying this change in production, we had introduced it in our development environment months ahead and we never noticed the issue. We have tried to reproduce the behavior using some load testing tools but we have been unsuccessful.
We have done additional troubleshooting in production and noticed that when HTTPS request became unresponsive, HTTP request were served properly. We also logged some CLOSE_WAIT events which made us suspicious of the nsopenssl driver.
For now we are thinking of upgrading OpenSSL and nsopenssl driver to their latest stable releases to see if it helps. Has anybody run into similar issues that may have some insight?
This is what we are currently running:
CentOS Linux 7
nsopenssl ??? (We think it is 3.0beta26 but we are not 100% sure)
ACS (Kernel) 4.2
You might consider the following options:
1) experiment with the various nsopenssl updates [1,2] or try to contact scott goodwin
2) upgrade from AOLserver to NaviServer, and use current OpenACS sources as a running documentation, what changes are needed (when i added the first NaviServer support to OpenACS more than 10 years ago, i remember, it was not much work)
3) use a reverse proxy such as nginx  for ssl/tls offloading
If you not want to invest effort in this application, (3) is probably the easiest step.
Option (1) is where we are today and we plan to run some tests this week. If the issues persist, I think we will try option (3) next as we want to get this done as soon as possible.
We had also considered moving this to NaviServer but were afraid it would be an insurmountable task. It's encouraging to hear that it is not too much work
The link  shows the changes in packages/acs-* in chronological order (starting 13 years ago) to give you a glimpse what's needed.
Initially we only updated the software without enabling the site-wide configuration. We ran it for a day and our HTTPS-restricted applications showed no crippling problems but did notice some connections would be in a CLOSE_WAIT state for some time and would eventually go away.
This morning we forced HTTPS site-wide, and as the activity on the site increased we started to see intermittent delays until the site became unresponsive. netstat showed a large number of connections in a CLOSE_WAIT state. After some time, we received a "fatal signal 11" error and the site crashed. The crash is not something we had experienced before. We suspect this is because we had not left it running in this state for as long as we did today.
After rolling back the site-wide configuration, we restarted the service and noticed that HTTPS connections would still hang from time to time. In our original configuration non HTTPS-restricted URLs are redirected back to HTTP and these redirects would also take a long time. This is something we had not experienced before. It took restarting the server to alleviate this condition.
At this point, we have ran out of ideas on what to look for and troubleshoot. We are going to start testing the NGINX solution and hope we won't run into similar issues. I welcome any other tips or troubleshooting ideas in the meantime.
- We run a large site behind nginx with up 5mio page views per day, all requests via HTTPS, backend currently via HTTP (change to HTTPS on the backend is on the agenda). When running behind a proxy, also the backend requires some changes. There are as well more changes involved for secure cookie handling for secure cases, in case you need this. There is a small wiki page  for current OpenACS (which is certainly not applicable for your methusalem), which might point to some problem areas.
- with NaviServer+nsssl, the reverse proxy issues won't arise, but you have to handle the porting steps mentioned above in advance. E.g. OpenACS.org runs with current OpenACS+NaviServer. This is a low traffic site, between 100k and 300k requests per day (the same nsd handles openacs.org, dotlrn.org and soon fisheye.openacs.org).
Both approaches run very stable, we every experiences the issues you are mentioning.
all the best
We got NGINX running in our test environment. Listening on port 80 and 443 with the AOLServer back-end listening on 8080. Installation and setup was straight forward.
The one thing we noticed was that our code uses [ad_conn location] for navigation links and we were getting the backend port (8080) in our links. To address this, we pulled some procs from a newer version of ACS (util::split_location and ns_parseurl) and added code in ad_conn to handle getting the location without the port number.
I noticed that in the version of ACS where we pulled the procs from, that this would be handled by a utility proc (util_current_location) and not in ad_conn directly. Should we be concerned about adding this to ad_conn?
I believe you can fix this issue by setting the following parameter
proxy_set_header X-Forwarded-For $remote_addr;
in the server section of your Nginx configuration file.
Try to google how to use the 'X-Forwarded-For' parameter in some Nginx configuration examples.
Hope it helps,
Current OpenACS has the commands (see ) for handling such cases:
- [ad_conn peeraddr]
- [ad_conn behind_proxy_p]
- [ad_conn behind_secure_proxy_p]
Check out the source code of current OpenACS, when something special have to be done in the reverse proxy cases, some of these might not be relevant for you.
Thank you Gustaf for the pointers.