Forum OpenACS Q&A: Server won't serve anything once .tcl libraries loaded at startup.

Sorry to have to ask this but I have now spent about two weeks trying
to tie this down and am really stuck. Please help someone.

I had a full prototype installation working with virtual hosting and
multiple ACSs with PostgreSQL 7.1 under Mandrake Linux 7.1 which had
been installed with OS security set to LOW.

Confident that I now fully understood the configuration processes I
hosed the machine and installed Mandrake 8.2 with security option
HIGH. I then installed AOLserver ad33.13, PostgreSQL 7.2, and all
other components just as before.

I have since been unable to serve a single .tcl page from ACS 3.2.5.
I have tested everything I can think of. I have removed the virtual
hosting and focused on just one AOLserver process for a single ACS. I
have recompiled AOLserver without the nsvhr patches, and I have
recompiled PostgreSQL without the ODBC drivers that I was hoping to use.

I have also trawled the forums and tried everything I can find in
relation to the browser error message - "the Document contained no
data". No luck.

All of the other drivers appear to be loading on startup (log file
attached below), the database responds to queries in psql, the server
log reports that the database connection is ok and the .tcl libraries
are reported as loaded ok also.

But here is the really confusing bit.......

If I comment out the line :

    ns_param library "/web/${server}/tcl"

so that the libraries are not parsed into memory at startup, the
AOLServer will successfully serve a test static index.html file
from /web/${server}/www. However as soon as I uncomment this line to
enable the ACS libraries I can't even get the static page to serve.
Netscape just reports "the Document contained no data, Please contact
your administrator". (But I am the Administrator - so what now??!)

I am left wondering the following :

1) Should I not be using PostgreSQL v7.2? (But I cannot see why that
should be a problem because the data is there and the server log
shows that queries are running successfully).

2) Is there any possibility that the OS security settings could have
an effect on the preauth filtering .tcl procs? Is there anything that
I have missed setup wise that would lead the ACS .tcl filters to
refuse to allow even the serving of the default index.html file?

Can anyone help with suggestions of other things to try?

I tried to include my server1.tcl file and a copy of the server log
below, but the formatting is lost in the upload and I didn't want
to clutter up the forum. If anyone is willing to take a look I will
e-mail them. I have been using the sample ad.tcl renamed to
ad_server1.tcl. Once again, sorry to bother everyone with this; I am
really stumped.

Thank you very much, Richard.

PS. I have also tried :  set hostname localhost
but no luck there either.

You can use <pre> tags to keep the formatting of your server log. Are there any error messages after a restart with your library in?
I have put the files on the following URL.

Thanks very much for replying, hope you will see what I have missed!

NSD Config :  http://freespace.virgin.net/richard_s.hamilton/server1.tcl.htm

Server Log File :  http://freespace.virgin.net/richard_s.hamilton/server1.htm

Regards
Richard

Your config does:

set hostname            [ns_info hostname]

If you have ForceHostP set in your parameters/ file, it will try to redirect all requests to this host.  This could be your problem if you were messing with virtual servers.

If you type http://192.168.100.2:8000/index.html (based on your server log output), can you see the static page?
If I comment out the line :
ns_param library "/web/${server}/tcl"
so that the libraries are not parsed into memory at startup, the AOLServer will successfully serve a test static index.html file from /web/${server}/www. However as soon as I uncomment this line to enable the ACS libraries I can't even get the static page to serve. Netscape just reports "the Document contained no data, Please contact your administrator". (But I am the Administrator - so what now??!)
In 3.2.5, don't forget the default parameter in parameters/ad.tcl:
#precedence for file extensions, e.g., "tcl,adp,html" means "serve 
# a .tcl file if available, else an .adp file if available, else an 
# .html file if available, else the first file available in alphabetical 
# order". Comma-separated 
ns_param ExtensionPrecedence tcl,adp,html,jpg,gif
This parameter controls which pages get served first. So if you only typed http://192.168.100.2:8000/, it will ignore your static index.html and serve index.tcl instead (if it exists). By itself, this does not explain why AOLserver is not serving anything, unless you modified the default www/index.tcl and forgot to include the ns_write calls that actually output something to the connection. Something to check...
I have not modified any of the .tcl files and have also tried explicitly requesting pages as a test:
http://192.168.100.2:8000/index.html
http://192.168.100.2:8000/monitor.tcl
http://192.168.100.2:8000/index.tcl

None of these specific page requests produces any result at all once the .tcl libraries have been loaded.

What output do you get with the following?
telnet 192.168.100.2 8000(enter) 
GET /(enter) 
(enter) 
(enter) 
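
For anyone repeating this test without typing into telnet by hand, here is a rough sketch of the same request as a script. It assumes bash (the /dev/tcp redirection is a bash feature) and reuses the address and port from this thread; the function names build_request and raw_get are made up for illustration.

```shell
# Build a minimal HTTP/1.0 request for the given path.
# Note the space between the method and the path, and the blank
# line (second CRLF) that marks the end of the request.
build_request() {
    printf 'GET %s HTTP/1.0\r\n\r\n' "$1"
}

# Send the request to host:port and print whatever comes back.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
raw_get() {
    local host=$1 port=$2 path=${3:-/}
    {
        build_request "$path" >&3   # write the request to the socket
        cat <&3                     # read the response until the server closes
    } 3<>"/dev/tcp/$host/$port"
}

# Example (address and port taken from the thread):
# raw_get 192.168.100.2 8000 /index.html
```

If the server closes the connection without sending anything, raw_get simply prints nothing, which matches the "Document contained no data" symptom.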
David,

Result of test here:

http://freespace.virgin.net/richard_s.hamilton/telnet_test.htm

Is this telnet service a default service provided by AOLserver or should I have configured it before carrying out this test? I have not enabled the general telnet server daemon on the machine for security reasons.

I do not at this stage have the experience to understand why the browser request should be deemed invalid by AOLserver. Would very much welcome an explanation.

Regards & thanks for your help so far.

Richard

This tells us that your server is listening on port 8000.
The bad request is because you typed "GET/" instead of "GET /".
This isn't really a telnet service. We are using telnet as a plain TCP client to check the web service.
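
To illustrate the point about the missing space, here is a simplified check of a request line: a method, one space, a path starting with "/", and optionally one space plus an HTTP version. It is only a sketch of the general shape and is not how AOLserver itself parses requests; valid_request_line is a hypothetical name.

```shell
# Return success if the argument looks like "METHOD /path" or
# "METHOD /path HTTP/x.y". Illustration only, not AOLserver's parser.
valid_request_line() {
    printf '%s\n' "$1" | grep -Eq '^[A-Z]+ /[^ ]*( HTTP/[0-9]\.[0-9])?$'
}

for line in 'GET /' 'GET / HTTP/1.0' 'GET/'; do
    if valid_request_line "$line"; then
        echo "ok:  '$line'"
    else
        echo "bad: '$line'"
    fi
done
```

The first two lines are reported as ok and 'GET/' as bad, because without the space there is nothing separating the method from the path.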

You can fix this by changing ForceHostP to 0 or false (in /web/${server}/parameters/${server}.tcl), as Jonathan mentioned, or by setting hostname to 192.168.100.2.
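
Before restarting, it can help to confirm what the parameter is actually set to. The following is a small sketch that prints every uncommented ns_param line for a given name; show_param is a hypothetical helper, and the example path is an assumption (server1 substituted into the path pattern above), so point it at whichever file in your parameters/ directory is really loaded.

```shell
# Print (with line numbers) each non-comment line that sets the named
# ns_param in the given config file(s).
show_param() {
    local name=$1
    shift
    grep -n -E "^[[:space:]]*ns_param[[:space:]]+${name}([[:space:]]|\$)" "$@"
}

# Example (path is an assumption):
# show_param ForceHostP /web/server1/parameters/server1.tcl
```

If the same parameter shows up more than once, it is worth checking which occurrence takes effect before assuming the change was picked up.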
Have changed hostname to 192.168.100.2 and have also changed ForceHostP to 0. Still no luck.

I did try GET / before but the response was simply :

"Connection closed by foreign host."

I then tried GET/

After changing ForceHostP I tried: telnet 192.168.100.2 8000
GET /

The response is still the same:

"Connection closed by foreign host."

My only thought at this stage is why the connection is closed by foreign host when the connection is actually on localhost. Is this relevant?

Regards
Richard

Foreign host is whatever host is on the other end of your
connection.

Check your server's log files for the answer at
/usr/local/aolserver/log/server-???.log or wherever it might be.
Your server is probably generating some error in a preauth filter.
(I created a patch for AOLserver 4.0 to return an error to the user
in this case instead of just dying.  We'll see what they do with
it.)
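
A quick way to pull just the error entries out of a long server log is sketched below. It assumes the line format that appears elsewhere in this thread, [date][pid.tid][thread] Error: message; log_errors is a hypothetical name.

```shell
# Print only the Error entries from an AOLserver-style log file.
# Assumed line format: [date][pid.tid][thread] Error: message
log_errors() {
    grep -E '^\[[^]]*\]\[[^]]*\]\[[^]]*\] Error:' "$1"
}

# Example (log path as given later in this thread):
# log_errors /usr/local/aolserver/log/server1.log
```

Tcl errors are often followed by stack trace lines that do not start with a bracketed timestamp, so once an entry is found it is worth reading the surrounding lines in the full log as well.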

If that is the case, what can I do about it? This problem did not arise on a previous installation. Could it be related to the OS security settings? i.e. If I re-install Linux with LOW security settings could that fix the problem? Could it be related to the use of Postgres 7.2 instead of 7.1? Is it likely to happen with ACS4.0 also?

Richard

Could it be related to the OS security settings?

Unlikely. Insofar as Mandrake is a Red Hat Linux derivative, by HIGH security settings they probably mean that most ports on the system are blocked.

I am not a Mandrake user. If Mandrake has the lokkit utility as Red Hat does, you can use it to inspect your "security settings". On Red Hat Linux, it is installed at /usr/sbin/lokkit.

Alternatively, you can go for the real thing and inspect your firewall settings directly with ipchains (/sbin/ipchains on Red Hat) like so:

bash$ su - -c "ipchains -L"
Password: 
Chain input (policy ACCEPT):
target     prot opt     source                destination           ports
ACCEPT     udp  ------  someplace.redhat.com  anywhere              domain ->   1025:65535
ACCEPT     tcp  -y----  anywhere              anywhere              any ->   netbios-ns
ACCEPT     tcp  -y----  anywhere              anywhere              any ->   netbios-dgm
ACCEPT     tcp  -y----  anywhere              anywhere              any ->   netbios-ssn
ACCEPT     tcp  -y----  anywhere              anywhere              any ->   1025
ACCEPT     tcp  -y----  anywhere              anywhere              any ->   ssh
ACCEPT     tcp  -y----  anywhere              anywhere              any ->   http
ACCEPT     udp  ------  anywhere              anywhere              bootps:bootpc ->   bootps:bootpc
ACCEPT     udp  ------  anywhere              anywhere              bootps:bootpc ->   bootps:bootpc
ACCEPT     all  ------  anywhere              anywhere              n/a
REJECT     tcp  -y----  anywhere              anywhere              any ->   0:1023
REJECT     tcp  -y----  anywhere              anywhere              any ->   nfs
REJECT     udp  ------  anywhere              anywhere              any ->   0:1023
REJECT     udp  ------  anywhere              anywhere              any ->   nfs
REJECT     tcp  -y----  anywhere              anywhere              any ->   x11:6009
REJECT     tcp  -y----  anywhere              anywhere              any ->   xfs
Chain forward (policy ACCEPT):
Chain output (policy ACCEPT):

These are slightly modified MEDIUM security settings in Red Hat. For example, this box accepts connections to ports 22 (ssh) and 80 (http), and rejects connections to all ports in the 0-1023 range that do not have an explicit ACCEPT policy. Your setup should be similar.

In any case, if you say you can serve static pages when server is running on port 8000, that means your box IS accepting connections on that port. When you try telnetting into the port on which your webserver is listening, does it look like this:

bash$ telnet www.yahoo.com 80
Trying 64.58.76.224...
Connected to www.yahoo.com.
Escape character is '^]'.

If telnet says, Connected to localhost, then your security settings are fine. You shouldn't waste your time reinstalling the system with MEDIUM security settings.

It is not related to the OS security settings.  We verified with
telnet that we can connect to the web server.  There is no way to
know what to do to fix it until we know what is wrong.  And the
answer to that is in your server's log files.
Here is a section of access.log from /usr/local/aolserver/server1/modules/nslog

192.168.100.2 - - [12/May/2002:00:03:10 +0100] "GET / HTTP/1.0" 304 0 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:00:09:58 +0100] "GET / HTTP/1.0" 304 0 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:00:12:03 +0100] "GET / HTTP/1.0" 304 0 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:00:18:56 +0100] "GET / HTTP/1.0" 302 302 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:00:20:48 +0100] "GET /register.tcl HTTP/1.0" 302 314 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:00:25:19 +0100] "GET /doc/audit.html HTTP/1.0" 302 316 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:11:32:14 +0100] "GET / HTTP/1.0" 302 302 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.2 - - [12/May/2002:11:32:23 +0100] "GET /monitor.tcl HTTP/1.0" 302 313 "" "Mozilla/4.78 [en] (X11; U; Linux 2.4.8-26mdk i686)"

192.168.100.3 - - [12/May/2002:22:40:21 +0100] "GET / HTTP/1.1" 302 302 "" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows NT 5.0)"

192.168.100.3 - - [12/May/2002:22:40:21 +0100] "GET / HTTP/1.1" 302 302 "" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows NT 5.0)"

[Thousands of re-tries]

192.168.100.2 - - [12/May/2002:22:45:12 +0100] "GET /monitor.tcl HTTP/1.1" 302 313 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:12 +0100] "GET /monitor.tcl HTTP/1.1" 302 313 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:12 +0100] "GET /register.tcl HTTP/1.1" 302 314 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:13 +0100] "GET /monitor.tcl HTTP/1.1" 302 313 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:13 +0100] "GET /register.tcl HTTP/1.1" 302 314 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:13 +0100] "GET /monitor.tcl HTTP/1.1" 302 313 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:13 +0100] "GET /register.tcl HTTP/1.1" 302 314 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:13 +0100] "GET /monitor.tcl HTTP/1.1" 302 313 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

192.168.100.2 - - [12/May/2002:22:45:13 +0100] "GET /register.tcl HTTP/1.1" 302 314 "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914"

[Thousands of re-tries - 70 per second]

You will notice that I have tried connecting from an NT5.0 box 192.168.100.3 also.

The server logs each request but returns no data at all.
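
With thousands of entries, a tally by path and status code is easier to read than the raw log. In the log format above, splitting each line on whitespace puts the request path in field 7 and the HTTP status in field 9. Status 302 is a redirect and 304 is "Not Modified", so a long run of 302s for the same few paths may point to a redirect loop rather than to pages being served. The sketch below assumes that format; status_by_path is a hypothetical name.

```shell
# Count requests per (path, status) pair in an access log, most
# frequent first. Assumes field 7 is the path and field 9 the status.
status_by_path() {
    awk '{ count[$7 " " $9]++ }
         END { for (k in count) print count[k], k }' "$1" |
        sort -k1,1nr -k2
}

# Example (access.log location as given in this post):
# status_by_path /usr/local/aolserver/server1/modules/nslog/access.log
```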

Thank you for persisting with this everyone. Much appreciated.

Regards
Richard

Did you look at your server's error log too? ~nsadmin/log/xxx-error.log
Based on your access log files, it appears that the ForceHostP / set hostname xxx change should have worked.

The log files I was interested in are at /usr/local/aolserver/log/server-???.log, but they probably don't have anything interesting in them based on your access log files. Make sure you kill all existing AOLserver instances with "killall nsd" and try again. Perhaps you either didn't restart the server after changing the config files or, if you did, an old instance is still running somewhere.
I'm sorry, I misunderstood which log you were referring to because I had posted a link to the /usr/local/aolserver/log/server1.log file earlier in this thread. The link is:

Server Log File : http://freespace.virgin.net/richard_s.hamilton/server1.htm

I have restarted the machine completely after each alteration of config to avoid any danger of old instances running. A keepalive thread prevents me stopping the last instance of the server using kill -9 xxxx so a system restart is the only way I know to ensure that AOLserver restarts with modified config.

I hope that the error is obvious in the error log because I am completely baffled by this - it worked under Mandrake 7.1 with PostgreSQL 7.1.

Regards & Thanks

Richard

[12/May/2002:22:23:04][1290.2051][-sched-] Error: dns: gethostbyname failed: temporary error - try again
You still don't have your network set up. It has been a long while since I used 3.x, so this is strange.
I thought that error was just because the server is not on an internet connection, so the http://info.webcrawler.com/mak/projects/robots/active/all.txt link is not accessible. If I am mistaken, please tell me. I believe that the network is set up because I can SSH in from other machines and can also connect over http:// from them to view the static test page (as long as I do not enable the .tcl procs).

Regards
Richard

I could be wrong, but I believe that gethostbyname uses your /etc/hosts or /etc/host.conf to resolve a hostname, specifically yours.
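
One way to check whether the machine's own name has an entry in the hosts file is sketched below. It only reads a hosts-format file and does not consult DNS or any other resolver source, so treat it as a partial check; hosts_lookup is a hypothetical name.

```shell
# Print the address(es) that a hosts-format file maps to the given name.
# Comments are ignored; the name may appear in any column after the address.
hosts_lookup() {
    local name=$1 file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; break } }
    ' "$file"
}

# Example: does this machine's own hostname have an entry?
# hosts_lookup "$(hostname)"
```

No output means the name is not listed in that file, in which case the lookup would have to be answered by DNS or fail.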
I have tried everything that Patrick suggested above with no success
but I am very interested in the comment about the
/usr/local/aolserver/modules/tcl files. I have copied these into the
/web/server1/tcl directory (specified to source the .tcl files) and they
still do not load according to the error.log. Also in my previous
installation (which worked) these files must have been loaded by default.
Why would they not load by default this time? Is there an entry or selection
somewhere that tells AOLserver where to find the modules? Am I missing this
entry in my server1.tcl config file?

Regards
Richard

Have tested *.adp pages and found that they serve ok. So the problem is restricted to *.tcl pages. What on earth have I done wrong?

Richard

OK, thank you everyone for your input. The advice given here eliminated most of the possibilities and as a result I have now resolved the issue. For anyone who is interested here is the reason for the strange behaviour:

As noted above I had used PostgreSQL version 7.2 and AOLserver version 33.13. I had compiled the postgres.so driver against these respective libraries, most notably pointing the driver makefile to libpq.so.2.2 instead of libpq.so.2.0.

The previous successful installation had been set up using PostgreSQL v7.1.3, which ships with libpq.so.2.1. Pointing the makefile to this library had worked fine.

I came to the conclusion that the preauth filters which are loaded at server startup time must be hitting the database and failing ahead of any page serve but there were no entries in the error log to indicate that postgres was returning no data. The server error log reported that the db connections were all working fine and that the driver was functioning normally.

The server access logs were correctly indicating the page requests but the browser was being sent no data. The only clue was an error in the lastvisit cookie process which reported an invalid token length for a data type. This looked like a data type definition fault which made me suspicious of the postgres.so driver.

I therefore compiled Postgres_7.1.3 and re-compiled the postgres.so driver against the 7.1.3 libraries. ACS WORKED.
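
For anyone wanting to check which libpq a compiled driver is actually linked against, a rough sketch follows. It filters the output of ldd for libpq entries; libpq_from_ldd is a hypothetical name, and the driver path in the example is an assumption, so adjust it to wherever postgres.so lives on your system.

```shell
# Read ldd output on stdin and print the libpq library name together
# with the path it resolves to.
libpq_from_ldd() {
    awk '$1 ~ /^libpq\.so/ { print $1, $3 }'
}

# Example (driver path is an assumption):
# ldd /usr/local/aolserver/bin/postgres.so | libpq_from_ldd
```

The path printed is often a symlink; following it (for example with readlink -f) shows which versioned libpq file is really being loaded.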

It seems that one or more of the following statements is true:

1) OpenACS_3.2.5 is not compatible at present with the latest version of PostgreSQL

and/or

2) The AOLserver driver version 2.0.1 for postgres is not compatible with PostgreSQL_7.2

and/or

3) The library libpq.so.2.2 (that ships with PostgreSQL_7.2) is not backward compatible with libpq.so.2.0 and libpq.so.2.1, resulting in the need for a revised AOLserver driver beyond version 2.0.1

The bottom line is that for those considering upgrading to PostgreSQL_7.2 at the moment my advice is - don't. It cost me a month.

Thanks to everyone for your ideas and help - very much appreciated.

Regards
Richard