Forum OpenACS Q&A: Long-running AOLServer is forgetful?
But the real kicker was that when I checked the .tcl source code (in both the "tcl" and "www" directories), it was all 100 percent OK. That is, my code was correct, but somehow things were getting lost between the Moreover, restarting the server solved the problem.
So I'm really confused, and even a bit worried. Why would AOLServer (and/or OpenACS) be keeping around old versions of my .tcl files? Why would the old ones suddenly come to life? Why should restarting the server fix this? (Does it have something to do with ADP caching?)
Has anyone else seen this problem? It was quite disturbing to see this happen, and I'm beginning to wonder what I should do to avoid it in the future.
But I've never seen behavior like this. Usually if something changes, Aolserver recaches the .tcl bytecode, I believe.
What are your maxthreads, minthreads, and threadtimeout settings on that server?
Did you, maybe, turn on PerformanceModeP sometime during that week? That might well kill all file watching for any new threads created. Actually, if changing the PerformanceModeP flag (to either on or off) is obeyed at anything other than server start time, it would probably give you weird and nasty results - old threads would have the latest watched files, new threads would have older Tcl procs without the watched files.
If I remember correctly, the watched files stuff never changes the contents of the AOLserver master thread (or whatever it's called) at all, as that thread is written to only during AOLserver startup. When a new thread is spawned, AOLserver copies the master thread to the new thread. The OpenACS file watching stuff roughly does two things: One, it sources all the watched files into all existing threads. Two, after each new AOLserver thread is spawned, it sources the watched files into those threads too. Naturally these extra steps make thread spawning less efficient.
I'd never purposely leave a server running for a week with watched files, btw. I believe the whole file watching business was intended strictly as a development tool. If it's a Development server, I would simply restart the server every time I finish testing something and commit the code to CVS. A Production server probably shouldn't ever use watched files at all.
Oh yeah, and from the mailing list, it sounds like there's some last minute work going into AOLserver 4.0 to give us lazy loading of Tcl procs there. They're doing it in order to speed up start time of AOLservers with really large amounts of Tcl code, but it may end up giving us better tools for implementing stuff like the OpenACS file watching as well.
Andrew's description of how it works is incorrect. Each thread will reload the watched files when they next serve a request, with the call to the apm proc located in the request processor. The mtime for the file at watch time is cached for all threads in an nsv array, then each thread does a "file mtime" and compares it to the cached mtime. If it differs, the file is sourced into the current interpreter. The code looks like it might be susceptible to race conditions but as Andrew points out it's really meant for development. That why setting performance mode "true" turns off the check.
Also ... if you're making changes to /www .tcl files the caching is being done by AOLserver, with mtime also being used to determine when to recompile the file. You don't need to "watch" because AOLserver checks each time the page is served (/tcl libarary files are different because they're never served, they just consist of procs called from served pages)
So ISTM that something else is the culprit here. Are there any proxy servers in the loop? What are the browser cache settings? etc etc
(a) I didn't do anything with the .tcl files after turning on the watch. But at some point, the watching began to ignore chnages to the files, and kept the in-memory version. It's almost as if watching was turned off in practice, but kept on as far as the system flags were concerned.
(b) It sounds reasonable that watches should only be active for a short period of time. The performance issues with watches were always obvious to me, but on a development machine, I didn't really care. Indeed, with a lot of changes going on, I found it very useful and helpful to keep a file watched for long periods of time -- days or weeks. It might be nice to warn people against keeping watches active for more than a short period of time.
In other words: Yeah, I shouldn't have kept the watching on for so long, and I'll avoid doing that in the future. But I feel like hackers should get a warning about keeping watches active for too long, because there does seem to be some sort of corruption/caching issue here, and I would hate for someone else to encounter the same problem in the future.
But actually, I think Don was also saying that he suspects the file watches may not have been the problem...
The really bizarre thing is that later that day, the file reverted back to it's previous form again, and I had to go through the above again. I was definitely hearing Twilight Zone music that afternoon....
I'm not sure if this is related or not, but sometimes I have to empty my browser cache and then kill and restart the browser. It may have to do with the mtime on the file plus the cache's behavior. I essentially had the same problem: an error kept appearing.
So question: did the log also report the error?
Also debugging a request should be easier than it is. I started using index.vuh files, and it seems that once the rp hits that file, it stops logging. It actually reports the wrong file (it is the adp template it thinks it is going to use, but if the tcl file uses ad_return_template with something else, the rp has no clue).
Watching is only relevant for .tcl library files.
However, the templating system does its own caching of compiled .adp files, in that it compiles the .adp file into a proc in a certain namespace, which is the reused.
I'm actually not too familiar with the details, but I do recall having been bitten by an .adp page not being up-to-date wrt the .adp in the file system.
Reuven, can you help us by clarifying whether this was a library /tcl/*-procs.tcl file, a www/*.tcl file, or a www/*.adp file?
Yes, in my case the error did show up in the log.
I'm pretty sure that the problem was with a .tcl file in the tcl/ directory. I looked through my mail archive from that project, and I don't see a 100 percent answer on this topic.
But the fact that I checked to see if the file was being watched, and the proc names that I mention in my internal e-mail messages are in .tcl files lead me to believe that it was in the tcl/ directory.
But it makes me feel *really* good to know that I'm not the only one scratching his head and experiencing some weird behavior here. As someone else wrote, it was definitely Twighlight Zone-ish to modify a file and not see the results reflected on the Web page.
I modified the -html_p parameter to forum::message::new, renaming it -format. There are no traces of html_p left anywhere that matters (I'm on Oracle):
[janine@pacad2 forums]$ fgrep -rl html_p *
However, every time I restart the server, it reverts back to thinking that the parameter is called -html_p, and when I call it with -format I get an error. I turn on watching for the forums package and go into tcl/messages-post.tcl and add a comment to the ::new function. Voila, everything works again until the next restart, even if I remove the comment.
I've even verified that it thinks I should be using -html_p by changing the name back in message-post.tcl and verifying that the error goes away. This is truly bizarre.
Hm, I haven't paid any attention yet to how the new Tcl service contracts work. Is it possible that some service contract type thing is reading the PL/SQL function definitions with the "html_p" switch and dynamically defining Tcl procs that match? I kind of doubt it, but...
make sure to do a case-insensitive grep:
fgrep -ril html_p *
Is the ADP parser (and/or the OpenACS templating system) so stupid that it tries to load backup files? Can this possibly be the case? (I'll try to look through the source code to find out, but someone else here is probably quite an expert on the subject.)
If it really is the emacs backups, which causes problems with watching files, then for a quick fix just check your new packages into your own CVS repository. Emacs will not generate the backups when it notices it is working in a file under CVS control.