Forum OpenACS Q&A: naviserver cluster running inside of docker questions

We are continuing to test the NaviServer cluster using the cachingmode=none option and would like to restart all servers in the cluster after an upgrade. After an upgrade, the acs-admin/server-restart page is called, which restarts only the current server; we would like this URL to restart all of the servers in the cluster.

We are testing on the release/5.10 branch of OpenACS with a NaviServer 4.99.23 build.

Since we are running inside Docker containers on a Docker network, we do not know the IP addresses of the containers in advance and cannot enter them into the ClusterAuthorizedIP kernel parameter. An IP range pattern could be used, but it would open access too widely, since Docker assigns IPs from a larger range than we are comfortable with.

We have run into the following issues:

1. The ::acs::clusterwide command does not appear to call the other servers in the cluster.

The broadcast method in acs-cache-procs.tcl runs the following code:
foreach server [::acs::Cluster info instances] {
      ns_log notice "CALLING ===> $server message $args"
      $server message {*}$args
}

However, whenever [::acs::Cluster info instances] is called, it returns nothing, so the body of the foreach loop is never entered. I have verified that the server instances are created during boot (and I can loop over them at that point), but once the server is fully up, [::acs::Cluster info instances] returns an empty list. I have tried running the command from acs-admin/server-restart.tcl and from the shell, with the same result. Any ideas or insights you may have on this would be greatly appreciated.

2. Since we are inside a Docker network and cannot know the IPs of the servers in the cluster beforehand, I have written some code in cluster-init.tcl and server-cluster-procs.tcl that runs nslookup on the Docker network names to get their IP addresses. There are some timing issues with this approach, so I was wondering whether you have any opinions or suggestions on a better way to do it.
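One direction that may be worth testing, as a minimal sketch only: NaviServer's built-in ns_addrbyhost command resolves a hostname through the system resolver, and inside a user-defined Docker network that resolver is normally Docker's embedded DNS server, so the external nslookup binary might not be needed. The proc name docker_host_to_ip_builtin is hypothetical, the snippet only runs inside a NaviServer interpreter, and it has not been verified against the 4.99.23 build used here.

```tcl
# Hypothetical variant of the lookup (not part of OpenACS).
# ns_addrbyhost and ns_log are NaviServer built-ins.
proc docker_host_to_ip_builtin {docker_host} {
    # ns_addrbyhost raises an error when the name cannot be resolved,
    # so catch it and keep the "0 means failure" convention.
    if {[catch {ns_addrbyhost $docker_host} ip]} {
        ns_log warning "could not resolve $docker_host: $ip"
        return 0
    }
    return $ip
}
```

The timing problem at container start-up would remain, so a retry loop around the call would probably still be needed.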

cluster-init.tcl
#
# Check if cluster is enabled, and if so, set up the cluster objects
#
if {[server_cluster_enabled_p]} {
    set myConfig [server_cluster_my_config]
    set cluster_do_url [::acs::Cluster eval {set :url}]

    #
    # Iterate over all servers in the cluster and add Cluster objects
    # for the ones, which are different from the current host (the
    # peer hosts).
    #
    foreach hostport [server_cluster_all_hosts] {
        set config [server_cluster_get_config $hostport]
        dict with config {

            # If inside Docker get the IP of the host and put into allowed_host list
            if {[info exists ::env(NS_INSIDE_DOCKER)] && $::env(NS_INSIDE_DOCKER) eq "true"} {
                set ip [docker_host_to_ip $host]
                if { $ip eq "0" } {
                    ns_log error "FAILED to find ip for $host !!"           
                } else {
                    ::acs::Cluster eval [subst {
                        set :allowed_host($ip) 1
                    }]
                }
            }

            if {$host in [dict get $myConfig host]
                && $port in [dict get $myConfig port]
            } {
                ns_log notice "Cluster: server $host $port is no cluster peer"
                continue
            }
            ns_log notice "===> Cluster: server $host $port is a cluster peer $cluster_do_url"
            ::acs::Cluster create CS_${host}_${port} \
                -host $host \
                -port $port \
                -url $cluster_do_url
        }
    }

    set info [::acs::Cluster info instances]
    ns_log notice "=====> CLUSTER INFO = $info !!"
    foreach server [::acs::Cluster info instances] {
        ns_log notice "==> SERVER ====> $server"
    }

    if {![info exists ::env(NS_INSIDE_DOCKER)] || $::env(NS_INSIDE_DOCKER) ne "true"} {
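        # NS_INSIDE_DOCKER may be unset in this branch, so it must not be
        # dereferenced in the log message.
        ns_log notice "==>> not inside Docker, using ClusterAuthorizedIP"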
        foreach ip [parameter::get -package_id $::acs::kernel_id -parameter ClusterAuthorizedIP] {
            ns_log notice "==> AuthorizedIP = $ip"
            if {[string first * $ip] > -1} {
                ns_log notice "==> ALLOWED_HOST_PATTERN=$ip"
                ::acs::Cluster eval [subst {
                    lappend :allowed_host_patterns $ip
                }]
            } else {
                ns_log notice "===> Allowing Cluster IP=$ip"
                ::acs::Cluster eval [subst {
                    set :allowed_host($ip) 1
                }]
            }
        }
    }

    set url [::acs::Cluster eval {set :url}]

    #
    # TODO: The following test does not work yet, since
    # "::xo::db::sql::site_node" is not yet defined. This requires
    # more refactoring from xo* to the main infrastructure.
    #
    if {0} {
        # Check, if the filter url mirrors a site node. If so,
        # the cluster mechanism will not work, if the site node
        # requires a login. Clustering will only work if the
        # root node is freely accessible.

        array set node [site_node::get -url $url]
        if {$node(url) ne "/"} {
            ns_log notice "***\n*** WARNING: there appears a package mounted on" \
                "$url\n***Cluster configuration will not work" \
                "since there is a conflict with the filter with the same name! (n)"
        }
    }

    #ns_register_filter trace GET $url ::acs::Cluster
    ns_register_filter preauth GET $url ::acs::Cluster
    #ad_register_filter -priority 900 preauth GET $url ::acs::Cluster
}

We start all of our NaviServer instances at the same time on boot with docker-compose, so they come up together and their networks are defined at the same time as well. I added the following code because I know this can cause a timing issue. I have seen a lookup fail once and then succeed on the retry, but for the most part it works without retries.

Your thoughts and insights would be greatly appreciated. Is there a better way to implement this for a Docker environment?

server-cluster-procs.tcl

ad_proc docker_host_to_ip {
    {-retry_cnt 3}
    {-ms_sleep_on_retry 1000}
    docker_host
} {

    Use nslookup to resolve a Docker hostname to an IP address.

    @return the IPv4 address, or 0 if the cluster is disabled or the lookup fails

} {
    if { ![server_cluster_enabled_p] } {
        return 0
    }

    # One initial attempt plus $retry_cnt retries.
    set max_tries [expr {$retry_cnt + 1}]
    for {set cnt 1} {$cnt <= $max_tries} {incr cnt} {
        # Note: nslookup must be present inside the docker image.
        # Note: 127.0.0.11 is always the docker resolver
        # Build the command as a list so the hostname is never re-parsed.
        set cmd [list nslookup $docker_host 127.0.0.11]

        if { [catch {set nslookup_info [exec {*}$cmd]} errmsg] } {
            ns_log notice " (try $cnt of $max_tries) executing $cmd: $errmsg"
        } else {
            # The greedy leading .* makes this pick the last "Address: " line.
            set match [regsub {.*Address: ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*} $nslookup_info {\1} ip]
            if {$match} {
                ns_log notice "Matched IP=$ip for HOST=$docker_host"
                return $ip
            }
            ns_log notice " (try $cnt of $max_tries) no nslookup match found for ($docker_host) nslookup output=$nslookup_info"
        }
        # It is possible that this particular docker network is just not up yet in the docker boot-up process.
        # Sleep before retrying (not after the last attempt). One second should be more than enough.
        if {$cnt < $max_tries} {
            after $ms_sleep_on_retry
        }
    }

    return 0
}
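
The address extraction in docker_host_to_ip can also be isolated in a small pure-Tcl helper, which makes it easy to test against captured nslookup output without a running server. The following is only an illustrative sketch: the proc name nslookup_last_ipv4 is hypothetical, and it assumes that the answer appears on an "Address:" line and that any line carrying the resolver's own address should be ignored. The output format may differ between nslookup implementations.

```tcl
# Hypothetical helper (not part of OpenACS): return the last IPv4 address
# found after an "Address:" label in nslookup output, skipping the
# resolver's own address. Returns "" when no such address is found.
proc nslookup_last_ipv4 {nslookup_output {resolver 127.0.0.11}} {
    set re {Address:\s*(\d{1,3}(?:\.\d{1,3}){3})}
    set result ""
    # regexp -all -inline returns pairs: full match, captured address.
    foreach {full ip} [regexp -all -inline -- $re $nslookup_output] {
        if {$ip ne $resolver} {
            set result $ip
        }
    }
    return $result
}
```

Returning an empty string when only the resolver address is present lets the caller treat that case as a failed lookup and retry.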

Also, I found a small issue in cluster-init.tcl concerning the allowed_host_patterns member variable: the leading colon is missing, so the lappend writes to a local variable inside the eval rather than to the instance variable. It should be 'lappend :allowed_host_patterns $ip'.

            ::acs::Cluster eval [subst {
    ==>         lappend allowed_host_patterns $ip
            }]

Thanks for your assistance, Marty

We are required (for security reasons) to run https port 443 for all communication between processes.

When trying to run the cluster behind nginx using https port 443 I ran into a problem where the code would only support http port 80. Here is my solution to get around this issue.

This is on the release/5.10 branch of OpenACS with a NaviServer 4.99.23 build.

server-cluster-procs.tcl -> server_cluster_my_config

ad_proc -private server_cluster_my_config {} {
    Return the address and port of this server, preferring the
    HTTPS driver (nsssl) and falling back to HTTP (nssock).
} {
    # Prefer the HTTPS driver.
    set driver_section [ns_driversection -driver nsssl]
    set my_ips   [ns_config $driver_section address]
    set my_ports [ns_config -int $driver_section port]

    # Fall back to the plain HTTP driver if nsssl is not configured.
    if {$my_ips eq "" || $my_ports eq ""} {
        set driver_section [ns_driversection -driver nssock]
        set my_ips   [ns_config $driver_section address]
        set my_ports [ns_config -int $driver_section port]
    }

    return [list host $my_ips port $my_ports]
}

If there is a better solution, please let me know. We appreciate your knowledge and assistance ;) Marty