I had a look at my server log tonight and it was full of lines like
[10/Feb/2002:00:00:02][8770.4101][-nssock-] Warning: serv: no free
connections, dropping this one, total so far: 0
There were hundreds and hundreds of these. So I checked the access
log and saw I was getting hit 10 times per second by one IP address.
I went to the page in question and added this right after it gets the
user_id:
if {$user_id == "foo"} {
    return
}
but this STILL wasn't clearing it out fast enough to avoid dropping
connections. (Remember, regular moderately heavy site traffic was
still going on at this time.) nstelemetry, when it loaded, confirmed
several times across several server restarts that yes, 90% of the
threads were tied up with this one IP's requests. So I have a bunch
of questions:
- First, "total so far" was always 0. Bug?
- Second, when I'd try to access a page, it would just take FOREVER
to come up, and apparently the same was going on with the other
users; nobody said they got errors, just "really bad lag." So is nsd
silently retrying? To me "dropped" means "Sayonara, see you later,"
so this is a bit confusing.
- Third, and most important, why didn't doing "return" clear out
the conn quickly enough? I'd think it should be able to handle this
kind of minimal page hundreds if not thousands of times per second
and here it was crippled by maybe 10 per second. (I also tried
ns_returning something, but figured that would have to wait for his
modem to handle the data, so just returning would be better.) After a
cursory check it looks like the security procs are memoizing where
they should, so there shouldn't be any db activity going on just to
look up the user session after the first hit. ns_tcl_abort is pretty
broken
or I'd have tried that too. (I haven't added any filters that hit
the db, either, and my referer_log is small so he wasn't generating
an insert to that either.) maxconnections/maxthreads were at
whatever the default is.
I thought for sure someone had set up a very antisocial script, but
the guy swears he was just clicking on the link as fast as he could
before the browser could refresh. (He eventually stopped before I
finished reading the Cisco docs on adding a packet filter. :) If one
guy could be that stupid I'm sure someone else will be, and I'm
concerned that apparently my nsd/acs setup isn't robust enough to
handle it.
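
To make the concern concrete, here is a minimal sketch of a per-IP
throttle that could replace the hard-coded user_id check above. It is
plain Tcl and only illustrates the counting logic. The names
throttle_hits and throttle_allow_p, and the limits of 5 hits per
1-second window, are made up for this example and are not part of
AOLserver or ACS. It also assumes a single interpreter; since nsd
runs one interpreter per thread, a real version would presumably have
to keep the counters in shared storage (nsv) instead of a global
array.

```tcl
# Hypothetical helper, not an AOLserver or ACS proc.
# Returns 1 if a hit from $ip at time $now (in seconds) is within the
# limit, 0 if that IP has exceeded $max_hits in the current window.
proc throttle_allow_p {ip now {max_hits 5} {window 1}} {
    # throttle_hits($ip) holds a two-element list: {window_start count}
    global throttle_hits

    # First hit ever seen from this IP: open a window and allow it.
    if {![info exists throttle_hits($ip)]} {
        set throttle_hits($ip) [list $now 1]
        return 1
    }

    # Unpack the stored pair (idiom that also works on older Tcl).
    foreach {start count} $throttle_hits($ip) break

    # The previous window has expired: start a fresh one.
    if {$now - $start >= $window} {
        set throttle_hits($ip) [list $now 1]
        return 1
    }

    # Same window: count the hit and compare against the limit.
    incr count
    set throttle_hits($ip) [list $start $count]
    return [expr {$count <= $max_hits}]
}
```

On the page itself the call would look something like
throttle_allow_p [ns_conn peeraddr] [clock seconds], returning early
when it gives 0. Doing the same check in a filter registered with
ns_register_filter might reject the request before the page script
runs at all, but I have not tested whether either placement frees the
connection thread any faster than the plain "return" did.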