Just thought I'd give an update on this, for future reference!
The solution mentioned in the earlier thread sort of works, in that it raises the kernel-wide max open-file limit. Summary:
echo "24576" > /proc/sys/fs/file-max
echo "98304" > /proc/sys/fs/inode-max
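For anyone trying this later: you can check the current value before poking a new one in, and (assuming a sysctl-style setup, which Red Hat has) make the change survive a reboot via /etc/sysctl.conf. A rough sketch:

```shell
# Check the current kernel-wide open-file limit (read-only, safe to run)
cat /proc/sys/fs/file-max

# To persist the new value across reboots (assuming /etc/sysctl.conf is
# read at boot), add the line below and reload with `sysctl -p`:
#   fs.file-max = 24576
```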
What bugged me was that my server had opened some 8192 files (the default file-max). Increasing the file-max helped, but after a couple of days, the same error came up again -- nsd had opened even more files this time, and it hit the new limit.
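One thing that would have tipped me off sooner: the kernel reports how many file handles are actually allocated, so you can watch the number creep up instead of waiting for the error to come back. A quick check (Linux /proc, so I'd expect this on any 2.x kernel):

```shell
# file-nr reports three numbers: allocated handles, free handles,
# and the current maximum (should match file-max)
cat /proc/sys/fs/file-nr
```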
A little more searching (had to compensate for starting this thread without searching first :) brought me to this:
http://freshmeat.net/projects/lsof/
Lsof is a Unix-specific diagnostic tool. Its name stands for LiSt
Open Files, and it does just that. It lists information about any
files that are open by processes currently running on the system. It
can also list communications open by each process.
(Quote from Freshmeat)
Using lsof (which lives in /usr/sbin on my Red Hat 7.2 server) I was able to quickly trace the offending Tcl scripts (mine :) and solve *most* of the problem. Still working on the rest.
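For the record, the kind of incantation I used -- the awk/sort pipeline is just a generic way to see which command holds the most open files, and the /proc line does the same count for a single process without needing lsof installed at all (below I count my own shell's descriptors; substitute your server's real pid):

```shell
# Which command has the most files open? First lsof column is the command name:
#   lsof | awk '{print $1}' | sort | uniq -c | sort -rn | head
# Same idea for one process straight from /proc, no lsof needed --
# each entry in /proc/<pid>/fd is one open descriptor:
ls /proc/$$/fd | wc -l
```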
Perhaps this can be useful to 1) help anyone else who hits this problem in the future -- push up the file limit AND fix whatever's making your server open 9000 files at a time; and 2) buy myself some pardon for opening my mouth too soon :)