Forum OpenACS Q&A: Naviserver upgrade issue on RHEL 7.9

Hi all,

Upgrading an older installation running on RHEL 7.9 from OpenACS 5.8 to 5.10 and in the process upgrading Naviserver/Tcl. Using the latest install-ns script, we were able to build Naviserver 4.99.31 with the defaults for dependencies (Tcl 8.6.16, etc.).

When trying to start the server, even with the included simple config, the output looks like this:

$ sudo /usr/local/ns/bin/nsd -u nsadmin -g nsadmin -f -t /usr/local/ns/conf/simple-config.tcl
[-main:conf-] Notice: OpenSSL 1.0.2k-fips 26 Jan 2017 initialized (pid 23963)
[-main:conf-] Notice: initialized locale en_US.UTF-8 from environment variable LANG
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:conf-] Notice: nsmain: NaviServer/4.99.31 (tar-4.99.31) starting
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:conf-] Notice: nsmain: security info: uid=1002, euid=1002, gid=1002, egid=1002
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:conf-] Notice: nsmain: Tcl version: 8.6.16
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:conf-] Notice: nsmain: max files: soft limit 4096, hard limit 4096
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:conf-] Warning: nsmain: current limit of maximum number of files > FD_SETSIZE (1024), select() calls should not be used
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: init server default: using zlib version 1.2.7
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: pool default: queueLength 90 low water 9 high water 72
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: nsd/init.tcl[default]: booting virtual server: Tcl system encoding: "utf-8"
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: modload: loading module nslog from file nslog
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: nslog: opened '/usr/local/ns/logs/access.log'
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: modload: loading module nssock from file nssock
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: nssock:0: enable 0 spooler thread(s)
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Notice: nssock:0: enable 0 writer thread(s)
[30/Apr/2025:13:23:36][23963.7f58bdf0b980][-main:default-] Fatal: received fatal signal 11
Aborted

strace and gdb have not helped turn up any clues.

Any suggestions?

Thanks.

Posted by Gustaf Neumann on

Hi Michael,

My first guess is that this comes from a binary mismatch (C-based components compiled against a different Tcl version).

Compile with debugging enabled in a fresh build directory, for example with the following command:

sudo with_debug_flags=1 build_dir=/usr/local/ns-src \
     bash install-ns.sh build

If you still see a crash, run nsd under gdb and show me the backtrace. If there is still a problem after that, I will try to set up a VM with RHEL 7.9 somewhere.
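
One way to look for the kind of binary mismatch suspected above is to list which Tcl shared libraries the installed binaries resolve to. The following is only a sketch: the function name tcl_libs_in_ldd_output is hypothetical, and the install paths in the usage comment are assumptions based on the /usr/local/ns prefix seen in this thread.

```shell
# Reads `ldd` output on stdin and prints each distinct Tcl library
# (name and resolved path) that appears in it.
tcl_libs_in_ldd_output() {
    awk '/libtcl/ { print $1, $3 }' | sort -u
}

# Usage (paths are assumptions, adjust to the actual installation):
#   for f in /usr/local/ns/bin/nsd /usr/local/ns/bin/*.so /usr/local/ns/lib/*.so; do
#       ldd "$f"
#   done | tcl_libs_in_ldd_output
```

More than one distinct line in the output would suggest that components were built against different Tcl installations. A single line does not prove the build is consistent, since stale object files in a reused build directory can cause a mismatch that ldd does not show.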

all the best
-g

Posted by Michael Steigman on
Thanks, Gustaf. Clean build directory remedied the issue.

We're in the process of slowly upgrading our project to RHEL 9. The build was the first step.

We're currently running the same Nginx proxy settings as those in front of our stage/prod server, and the same config.tcl. However, with this new build and OpenACS 5.10 imported into our code base, page requests for / result in "too many redirects" errors. We're looking at cookie settings and anything else that could be in play here. Do you have any suggestions? Are any adjustments necessary to config.tcl or on the NS side?

For example, from a different machine, a curl command like

curl -i -L https://mydomain.com/acs-admin/

results in a stream of 302s to

Location: https://mydomain.com/acs-admin/?
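
To confirm that the 302s form a true loop rather than a long chain, the response headers can be scanned for a Location value that repeats. This is a minimal sketch, assuming curl is available; the function name first_repeated_location is hypothetical, and the limit of 10 redirects is an arbitrary choice.

```shell
# Reads HTTP response headers on stdin and prints the first Location
# value that occurs a second time. Prints nothing if no value repeats.
first_repeated_location() {
    tr -d '\r' | awk 'tolower($1) == "location:" {
        if (seen[$2]++) { print $2; exit }
    }'
}

# Usage: dump the headers of every hop to stdout and discard the bodies.
#   curl -s -o /dev/null -D - -L --max-redirs 10 https://mydomain.com/acs-admin/ \
#       | first_repeated_location
```

If the function prints a URL, the server is redirecting to a target that redirects back to itself, which points at the application or proxy configuration rather than at the client.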

Thanks.

Posted by Gustaf Neumann on

Glad that the clean install helped with the original problem!

Concerning the redirects: Maybe the following can shed light on this.

Add the following code to the end of packages/acs-tcl/tcl/utilities-procs.tcl, effectively redefining ad_returnredirect to be verbose.

ad_proc -public ad_returnredirect {
    {-message {}}
    {-html:boolean}
    {-allow_complete_url:boolean}
    target_url
} {
    Write the HTTP response required to get the browser to redirect to
    a different page, to the current connection. This does not cause
    execution of the current page, including serving an ADP file, to
    stop. If you want to stop execution of the page, you should call
    ad_script_abort immediately following this call.

    <p>

    This proc is a replacement for ns_returnredirect, but improved in
    two important respects:
    <ul>
    <li>
    When the supplied target_url isn't complete, (e.g. /foo/bar.tcl or
    foo.tcl) the prepended location part is constructed by looking at
    the HTTP 1.1 Host header.
    </li>
    <li>
    If a URL relative to the current directory is supplied
    (e.g. foo.tcl) it prepends location and directory.
    </li>
    </ul>

    @param message A message to display to the user. See
                   util_user_message.

    @param html Set this flag if your message contains HTML. If
                specified, you're responsible for proper quoting of
                everything in your message. Otherwise, we quote it for
                you.

    @param allow_complete_url By default we disallow redirecting to
                              URLs outside the current host. This is
                              based on the currently set host header
                              or the hostname in the config file if
                              there is no host header. Set
                              allow_complete_url if you are
                              redirecting to a known safe external web
                              site. This prevents redirecting to a
                              site by URL query hacking.

    @see util_user_message
    @see ad_script_abort
} {
    ad_log warning "ad_returnredirect allow_complete_url $allow_complete_url_p target_url <$target_url>"
    if {$message ne ""} {
        #
        # Leave a hint, that we do not want to be consumed on the
        # current page.
        #
        set ::__skip_util_get_user_messages 1
        util_user_message -message $message -html=$html_p
    }

    if { [util_complete_url_p $target_url] } {
        ns_log notice "ad_returnredirect is complete <$target_url>"
        # http://myserver.com/foo/bar.tcl style - just pass to ns_returnredirect
        # check if the hostname matches the current host
        if {[util::external_url_p $target_url] && !$allow_complete_url_p} {
            error "Redirection to external hosts is not allowed."
        }
        set url $target_url
    } elseif { [util_absolute_path_p $target_url] } {
        #
        # The URL is an absolute path such as: /foo/bar.tcl
        #
        set url [expr {[::acs::icanuse "relative redirects"] ? "" : [util_current_location]}]
        append url $target_url
        ns_log notice "ad_returnredirect path is absolute, updated URL <$url>"
    } else {
        #
        # URL is relative to current directory.
        #
        set url [expr {[::acs::icanuse "relative redirects"] ? "" : [util_current_location]}]
        append url [ad_urlencode_folder_path [util_current_directory]]
        if {$target_url ne "."} {
            append url $target_url
        }
        ns_log notice "ad_returnredirect path is relative, updated URL <$url>"
    }

    # Sanitize URL to avoid potential injection attack
    regsub -all -- {[\r\n]} $url "" url

    ns_log notice "ad_returnredirect final redirect to <$url>"
    ns_returnredirect $url
}
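
Once the verbose version is active, the "final redirect to" notices it writes can be tallied to see which target keeps recurring. This is a sketch only: the function name summarize_redirect_targets is hypothetical, and the log path in the usage comment is a placeholder for wherever the server's error log is written.

```shell
# Reads server log lines on stdin, extracts the URL from each
# "ad_returnredirect final redirect to <...>" notice, and prints the
# distinct targets with their counts, most frequent first.
summarize_redirect_targets() {
    sed -n 's/.*ad_returnredirect final redirect to <\(.*\)>.*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage (placeholder path):
#   summarize_redirect_targets < /path/to/error.log
```

A single target with a count that grows on every request is a strong hint that the loop originates in the code path that issues that redirect.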

I can't rule out that NaviServer 4.99.31 contributes to the problem. To try NaviServer 5, rebuild with

sudo with_debug_flags=1 version_ns=GIT build_dir=/usr/local/ns5-src \
     bash install-ns.sh build

all the best
-g

Posted by Michael Steigman on
Thanks for the suggestions. In trying to build NS from git, I ran into this error:

tls.c: In function ‘Ns_TLS_CtxClientCreate’:
tls.c:1309:9: error: unknown type name ‘SSL_verify_cb’
SSL_verify_cb verifyCB = NULL;
^

A little searching led me to install openssl11 and openssl11-devel to pick up this new type name. However, I haven't been able to instruct NS to use the newer version. I tried exporting OPENSSL_CFLAGS, OPENSSL_LIBS, CPPFLAGS and LDFLAGS along with modifying the script with --with-openssl=/usr.

Any ideas on how to move past this?

Posted by Michael Steigman on
I was able to move past that issue with SSL. I installed the optional openssl11 packages, then modified src/naviserver/include/Makefile.global, changing the following lines to reference the newer version:

OPENSSL_LIBS = -L/usr/lib64/openssl11 -lssl -lcrypto
CFLAGS += -I/usr/include/openssl11 -I/usr/include
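
Before rebuilding, it can help to check that the include directory passed via -I really provides the SSL_verify_cb type that tls.c needs. This is a rough sketch: the function name has_ssl_verify_cb is hypothetical, and it assumes the headers sit in an openssl/ subdirectory of the include path, as the -I/usr/include/openssl11 flag above implies.

```shell
# Succeeds (exit status 0) if <include_dir>/openssl/ssl.h mentions
# SSL_verify_cb, fails otherwise (including when the header is missing).
has_ssl_verify_cb() {
    grep -q 'SSL_verify_cb' "$1/openssl/ssl.h" 2>/dev/null
}

# Usage:
#   has_ssl_verify_cb /usr/include/openssl11 && echo "type available"
#   has_ssl_verify_cb /usr/include           || echo "type missing"
```

If the first check fails, the compiler is likely still picking up the older system headers, and the "unknown type name" error would be expected to recur.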

I am in the process of working through some Tcl errors, but the redirects no longer seem to occur under NS5.

I will follow up with any other questions as I come across them. Thanks.

Posted by Gustaf Neumann on
New Insight: Infinite Redirection Caused by Root‑Node Permission Drop

We’ve identified a likely root cause of the infinite redirection issue: when the read permission on the top‑level site node (/) is unintentionally removed, anonymous visitors get stuck in a redirect loop.

What Happened

  1. Navigate to the “/” site‑map permissions form.
  2. Click Confirm Permission Settings without making any changes.
  3. A bug prevented direct (read‑only) permissions from being resubmitted, so they were dropped.
  4. As a result, anonymous users see: “The page isn’t redirecting properly”

This issue can occur when the read permissions are removed from the top-level site-node entry (/). These permissions were erroneously dropped when the “/” site‑map permissions form was submitted without any changes; a bug removed the permissions in this situation.

Huge thanks to Khy H for reporting this bug, which is already present in the OpenACS 5.10.1 release!

The problem is fixed in the main and oacs-5.10 branches, and the fix is tagged with openacs-5-10-compat. Users upgrading the acs-subsite package from the repository will automatically receive the patch (starting tomorrow, after the nightly rebuild of the repository archives).

See full details at:
https://openacs.org/bugtracker/openacs/bug?bug_number=3477