Forum OpenACS Development: Blocking access to login based on repetitive hits by IP address?

My web page stats (of my personal site) show something quite interesting.

/register/ receives over 4000 hits a month, but I have less than 20 registered users in over 1 year.

And my site allows anonymous comments, so it's not because people are being thrown to /register/ and leaving.

So I'm assuming that people are hitting my registration page with automated username/password robots trying to brute force their way in.

My proposed solution is to detect > x hits to /register/ from an individual IP address within y minutes, and block that IP for z hours.

Has anyone implemented anything like this in OpenACS? It wouldn't be too hard, but I would need to investigate the performance hit of logging via db, memory, disk file, etc.
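
For illustration, here is a minimal sketch of such a check as an AOLserver preauth filter, using in-memory nsv arrays (the "memory" option above); the array name, the thresholds, and the omitted unblock/expiry logic are all hypothetical:

  # Hypothetical sketch: count hits to /register/ per peer address in an
  # nsv array and refuse service above a threshold. A real version would
  # also expire old entries and unblock addresses after z hours.
  ns_register_filter preauth GET /register/* throttle_register

  proc throttle_register {why} {
    set ip  [ns_conn peeraddr]
    set now [ns_time]
    set count 0; set start $now
    if {[nsv_exists register_hits $ip]} {
      foreach {count start} [nsv_get register_hits $ip] break
      # restart the window after y minutes (here: 600 seconds)
      if {$now - $start > 600} {set count 0; set start $now}
    }
    nsv_set register_hits $ip [list [incr count] $start]
    if {$count > 20} {
      # more than x hits within the window: refuse with a 403
      ns_returnforbidden
      return filter_return
    }
    return filter_ok
  }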

I suspect you're getting 4000 hits because your robots.txt does not exclude /register.
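
For reference, the robots.txt entry to exclude it is just:

  User-agent: *
  Disallow: /register/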

Feel free to offer this as a patch for OpenACS. None of us should be getting this behavior.

If we do implement something on this, I think Google Mail has the best interface approach. I'll send you a GMail invite if you want to check it out.

I agree fully with Jade, but just in case you want to do this blocking anyway for some reason: don't do it in OACS. Chances are that you're running some flavor of unix, and most unices have packet-filtering capabilities (netfilter/iptables on linux, ipfw on bsd, SunScreen Lite for solaris). A simple script can tail the fw log or access log and add/delete fw rules based on what it sees. There are some existing implementations of this concept as well (mostly netfilter/iptables based) - freshmeat/google is your friend.
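
For illustration, on linux/netfilter the rules such a script would add and later delete look like this (the address is a placeholder):

  # block an offending address on the http port...
  iptables -I INPUT -s 203.0.113.7 -p tcp --dport 80 -j DROP
  # ...and remove the rule again after the timeout
  iptables -D INPUT -s 203.0.113.7 -p tcp --dport 80 -j DROP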

Good answers.

In this case I do have control over the firewall, but in many cases the application administrator will not (like in multi-hosted environments) so I think both are beneficial.

Jade - what part of GMail's interface do you mean? I haven't seen it yet.

Some robots just don't respect robots.txt...

On my personal site, which has had 1386 hits inside /register/ since last October, 924 are from msnbot, 154 from FAST-WebCrawler, 126 from "http://www.almaden.ibm.com/cs/crawler", and another 21 from miscellaneous robot-like UAs... and /register/ is in the robots.txt... at least google seems to be doing the right thing...

jeez... you'd think at least MSN would be capable of respecting standards on the web... :)

hi there,

i have developed a small throttle and monitoring
package, which we use permanently on our server. it is
written in XOTcl and uses Zoran's libthread package.
A controlling thread is created that receives
information about requests (begin and end of request).
when the server is under high load and a user
issues too many requests within a time window,
the user is throttled. If he/she continues to
be eager, the user is kicked out (e.g. a
short error reply is sent back). In addition,
we keep a lot of statistics, such as
graphs of active users, views per second or
hour, avg response time per minute, hour, etc.

on our system we have up to 3.5 million hits per day,
around 15 dynamic views per second (sustained
avg over an hour, not counting images/css files).
The original need for the package was to cope
with users who like to mirror the whole content
of our site, especially when traffic is high.
Such "attacks" brought the system to a halt. Now
the problem is gone.

If there is interest, we can remove the site-specific
stuff and make it available...

-gustaf

  # This is a simple request-throttle application that
  # avoids simple DOS attacks on an AOLserver.
  # The user (request key) can be specified via ipAddr or some other key,
  # such as an authenticated user.
  # Parameters:
  #  - timeoutMs: time window to keep statistics for a user
  #  - startThrottle: if the user issues more requests than this,
  #    he is throttled
  #  - toMuch: if the user issues more requests than this,
  #    he is kicked out
  #
  # The throttler is defined as a class to make it extensible,
  # e.g. to define different kinds of throttling policies for
  # different kinds of request keys. Note that the throttle thread itself
  # does not block; only the request thread blocks if necessary.

  Class ThrottleStat -parameter { type user_id timestamp ip_adress url }

  Class Throttle -parameter {
    {activeUserMinutes 10}
    {timeoutMs 2000}
    {startThrottle 3}
    {toMuch 7}
....
}

I think this looks quite interesting. You say it's a package...as in OpenACS singleton service? This would be a useful addition.
By all means, Gustaf. I'm looking forward to both a valuable package and the first XOTcl contribution.

/Bart

That rocks! It's also a great example of the flexibility of the aolserver infrastructure. I'm looking forward to seeing it.
Please provide this. After all, it will allow us to have a look at XOTcl and its benefits for OpenACS. Thanks in advance!
11: Re: XOTcl (response to 10)
Posted by Mark Aufflick on
I hadn't particularly investigated XOTcl - there are, after all, a number of object extensions to Tcl, none of which have set the world on fire.

At the moment, nearly all my client work has been developing Object Oriented Perl - some systems over 50,000 lines of code. I have to say that after that, going back to Tcl REALLY HURTS.

XOTcl goes some way towards providing better software engineering options. Given that one of my biggest gripes with Tcl is the lack of advanced data structures and the necessity for upvars, XOTcl would let me design object libraries that act as data structures for me. While not nearly as cool as Perl, if we could implement, for example, the entire Gang of Four set of objects & methods, then OpenACS would be able to present itself as a far more serious and mature development environment.
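
To make the data-structure point concrete, here is a minimal sketch of the kind of class I mean (plain XOTcl; the Stack class is a hypothetical example, not code from any package):

  package require XOTcl
  namespace import ::xotcl::*

  # a stack as an object: the state lives in an instance variable,
  # so no upvar juggling is needed at the call sites
  Class Stack
  Stack instproc init {} {
    my instvar things
    set things {}
  }
  Stack instproc push {thing} {
    my instvar things
    lappend things $thing
    return $thing
  }
  Stack instproc pop {} {
    my instvar things
    set top [lindex $things end]
    set things [lrange $things 0 end-1]
    return $top
  }

  Stack s
  s push a
  s push b
  s pop   ;# returns b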

Any comments or flames?

Collapse
12: Re: XOTcl (response to 11)
Posted by Ola Hansson on
Mark,

Could you elaborate on why going back to Tcl, after having used Perl, really hurts? Is it the object orientation versus procedural style, or something else?

As for "upvar" ... it isn't much worse than pointers in "C", is it? And I bet that you use pointers all the time when you're doing stuff with C.

So "object orientation" is far more mature than procedural now? That's funny, I would have thought it was the other way around. 😉

Having said that, I think there's no doubt that OOP almost always provides a better way to write large programs than procedural programming does. But I question whether it is all that constructive, at this stage - given the established and well-proven structure OpenACS has today - to introduce a new obstacle to learning how existing packages work and how to develop new ones.

Putting an example package in contrib that folks can look at for inspiration if they want to write custom code in OO style is one thing (Herr Neumann, please do!) ... It is another thing to endorse XOTcl and divert scarce OACS developer resources to rewriting existing packages. That would have to be a fork of this toolkit, IMHO.

<blockquote>Randy O'Meara:
You say it's a package...as in OpenACS singleton service
</blockquote>

it uses 2 "nonstandard" C-based extensions
  * thread 2.6
  * xotcl

when these are installed, we have
  * 2 (xotcl) files for .../modules/tcl/
    (one high-level interface for threads, one for
    throttling + statistics)
  * 3 files or so for presenting the statistics

in a first step i can put together a tar file + README;
it should not be hard to make an APM package out of this.

we have some statistics included that depend on our
e-learning packages (like exercises per minute), which
must be stripped out; i should be able to do this
over the weekend and make it available...

<blockquote> Ola Hansson:
Having said that, I think there's no doubt that OOP almost
always provides a better way to write large programs than
procedural programming does. But I am questioning if it is
all that constructive to at this stage - given the
established and well-proven structure OpenACS has today -
introduce a new obstacle towards learning how existing
packages work and how to develop new ones.
</blockquote>

This is certainly a very important concern.
an OO language can be used in various ways;
given the large corpus of existing code, moving
the core structure to OO is not realistic in
the near future and would most probably
lead to a different project (Neophytos did
something like this quite a while ago for a
site he is operating).

However, using OO to improve the reusability and
flexibility of the tcl structures and functions
is worthwhile. after all, xotcl is tcl with a
few more predefined commands, so it integrates
smoothly. e.g. xotcl can improve caching in
many ways.
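
As an illustration of the caching point, here is a sketch of a memoizing mixin (the Cache and Fib classes are hypothetical examples, not code from the package):

  package require XOTcl
  namespace import ::xotcl::*

  # a memoizing mixin: it intercepts "compute", caches results per
  # argument, and delegates to the shadowed method via [next] on a miss
  Class Cache
  Cache instproc compute {x} {
    my instvar memo
    if {![info exists memo($x)]} {
      set memo($x) [next]
    }
    return $memo($x)
  }

  # a stand-in class with an expensive recursive method
  Class Fib
  Fib instproc compute {n} {
    if {$n < 2} {return $n}
    expr {[my compute [expr {$n - 1}]] + [my compute [expr {$n - 2}]]}
  }

  Fib f -mixin Cache
  f compute 30   ;# intermediate results are memoized via the mixin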

-gustaf

here is the first shot of the package.
  http://media.wu-wien.ac.at/download/throttle+stats0.1.tar.gz

I have removed the site-specific content, changed
the place where the throttle check happens
(we had it in the request processor, moved it
into a filter - still seems to work), and made it
usable from plain AS and OACS. Hope that i
introduced no glitches...

Contents:

  ./README
  ./COPYRIGHT

  # user interface
  ./pages/stat.tcl
  ./pages/stat.adp
  ./pages/stat-details.adp
  ./pages/stat-details.tcl
  ./pages/backcolor.gif

  # main packages
  ./modules
  ./modules/tcl
  ./modules/tcl/throttle_mod.xotcl
  ./modules/tcl/thread_mod.xotcl

This distro should really be split into two pieces,
thread-mod and throttle+stats, in the near future.

-gustaf neumann

Gustaf,

As I began a code audit and subsequently started to research XOTcl (to which I've never been exposed), I uncovered the fact that you are a primary creator of XOTcl. So, I expect that your contribution will serve as an excellent intro to XOTcl! Thank you so much for sharing.

/R

16: throttle+stats (response to 14)
Posted by Andrew Piskorski on
Gustaf, your stats and throttling package definitely does sound interesting. Is it actually dependent on OpenACS in any way though? I suspect that it can be used on any AOLserver site which has XOTcl and the Tcl Thread Extension installed for AOLserver, whether OpenACS is installed or not - is that correct?

Hm, I see a bit of code in pages/stat.tcl that uses the OpenACS templating system if it is present, but that's optional - it will still work fine in plain AOLserver?

17: Re: throttle+stats (response to 16)
Posted by Andrew Piskorski on
Oh my, I didn't read very carefully: Gustaf already said above that his throttle+stats package works both with OpenACS and without it. Sorry.
<blockquote> Randy O'Meara
So, I expect that your contribution will serve as an excellent intro to XOTcl!
</blockquote>

well, there is a tutorial on www.xotcl.org, which will be
more suitable than this. This is working code that we have
had in production for at least two years. I took your comment
as a reminder to clean it up some more (made some improvements just now).

there is now an updated version on
http://media.wu-wien.ac.at/download/throttle+stats0.2.tar.gz

changes against 0.1: fixed a few typos, put back some
statistics that i had ripped out before, and made some
corrections for idle sites (when there are no requests
for a couple of minutes).

The term "views per second" might not correct
for other uses (we direct requests for static pages
to a different server), since it counts all monitored
requests.

As a side-note: one can query the number of online
users in this package via "throttleThread do Users total".
This call is about twice as fast on our system as
whos_online::num_users, at least when there are a few
hundred users online.

best regards
-gustaf

Hello Gustaf,

you wrote:
"there is now an updated version on
http://media.wu-wien.ac.at/download/throttle+stats0.2.tar.gz "

It seems that the package is not available anymore:

" invocation error: Object 'download/throttle' unknown

Status Code 404: Not Found
Resource Name: download/throttle+stats0.2.tar.gz"

Be well,
Koyan

Hi Gustaf,
you say "on our system we have up to 3.5 mio hits per day,
around 15 dynamic views per second (sustained
avg over an hour, not counting images/css files)."

This is very impressive! You must have some interesting war stories to tell. Can you provide us with some information on your setup, hardware, failover, etc.? I'm sure many of us would like to hear how such a high-performance site is set up.

Regards and many thanks.
Brian

There is a slide-set from the .LRN-Meeting in Heidelberg at

http://nm.wu-wien.ac.at/research/publications/learn-heidelberg-1.pdf
http://nm.wu-wien.ac.at/research/publications/learn-heidelberg-2.pdf

In part 1, towards the end, you see our setup. The load is
distributed over three dual Pentium-4 processor servers,
which are
  (1) reverse proxy pound + aolserver for static requests
  (2) aolserver for dynamic requests
  (3) database server running PostgreSQL

Server (1) is handling the load easily; (2) is the
bottleneck of the configuration. These servers use
a common RAID system. We have a fallback configuration
for each system, but we did not attempt automatic
switching through a heartbeat etc.; switching is quite
easy through the reverse proxy. The system is running very
robustly, e.g. server 2 currently has an uptime of >400
days.

The performance tweaking was done by Peter Alberer and was
achieved through caching in various places
and stripping down some packages
(e.g. using static portal pages for courses and classes).

All performance figures are from the described configuration.
For the next month we expect between 4 and 5 million requests
per day, and we hope to switch back to the dynamic portlets
in many places. We would not be able to handle this load
easily with the current configuration; furthermore, it is
quite hard to switch the configuration during the term.

Fortunately, we got a hardware grant and bought two
eight-processor xeon systems (2.7GHz MP) for
servers (2) and (3). On friday we switched to
these new servers; from the SPECint rate, we should
be able to get nearly three times the throughput
of the old system. Currently we have only 1.2
million requests per day (the term is just starting); it is
too early to have a feeling for the real-world
performance of the new servers...

-gustaf

Thanks Gustaf. Again it's very impressive. Congratulations!
Posted by Andrew Piskorski on
Indeed, that is quite impressive!

Gustaf, your slides mention that you're using AOLserver 4.1. 4.1 is still in alpha, so I'm curious why you're using it rather than 4.0.x - could you tell us more about that? You said your rate limiter runs in the AOLserver serving dynamic content, so is it because of performance improvements there in 4.1?

Posted by Gustaf Neumann on
We simply got 4.1 (from cvs?) and it happens to work nicely.
I have no idea whether there is a performance difference
compared to the 4.0.* versions.  Here is the version we are using:

from ns.h:
*      $Header: /cvsroot/aolserver/aolserver/include/ns.h,v 1.58 2004/03/10 04:45:04 dossy Exp $
*/
#define NS_PATCH_LEVEL          "4.1.0"

The throttle module can throttle all kinds of requests. We
did not want to catch cases where a user requests
an HTML page including a couple of images, but cases
where multiple HTML pages are requested frequently within
a time period from a user. In "throttle check", we simply do
  ...
  if {[string match image/* [ns_guesstype $url]]} {
    return 0
  }
  ...
On our site, practically all HTML page requests
are quite costly dynamic requests. Without the
throttling code, we had users who tried to copy
the whole site content with IE or other tools.
Our users did that in particular when the server was
quite busy. This eager copying had the effect
of a DOS attack, bringing the server to its knees.
The blocking code simply returns a short error
message to the requestor telling him to slow down...

Does this answer your question?
-gustaf

Eight-way SMP boxes are horrendously expensive. Wouldn't you have been better off buying only one of those for the RDBMS, and using several dual CPU servers to run the AOLservers for the dynamic pages?

There are many aspects to this.

* first of all, we applied for a national hardware
grant, and it looks as if our impressive numbers
and some vision helped us get it. However, the grant
was purely a server-hardware grant and we are obliged
to spend the money for this purpose.

* Secondly, we got a very good deal on the machine,
much better than i expected (i can't talk about
prices here). There would have been
a significant price jump between the 2.7 GHz and 3 GHz
processors. Another, much more significant
price jump would have been heading towards itanium
machines. We got the machine without an OS, put FC2
on it without any problems, and that was it.

Concerning just one big iron, there are a couple of
aspects:

  - in our current setup (up to last week),
    the biggest bottleneck
    was the "dynamic" aolserver, followed by the
    database server (with earlier versions of
    postgres, it was the other way around).
    so, using two n-way machines helps us
    immediately without restructuring the
    apps, thinking about flushing distributed
    caches, etc. With this setup we are simply
    on the safe side.

  - Many of the dynamic requests depend on the
    content repository and therefore on the file
    system. I did not run tests, but i would
    not be surprised to run into problems when
    different machines hammer around in the
    same file system (e.g. shared via NFS).
    We are frequently thinking about further
    distribution and more redundant backend
    servers (managed by pound); we will go for it
    when necessary.

  - the two 8-way machines can be combined
    into one 16-way SMP machine. So we have the
    option to switch to one big database machine
    in the future.

Altogether, we are not only worried about
performance, but also about reliability, robustness
and maintainability. The new machines can be maintained
over the web (rebooted etc.), they are highly redundant
(they can have hot-swap memory, but we did not go
for that), and they are nicely engineered.

Our system is seen as central infrastructure of a large and
important university. As it is with infrastructures
and utilities, people expect it to work 7x24 (but
we have no personnel for ensuring that). While many
(most?) learning management systems mostly provide print
materials (slides, handouts) in electronic form,
we have mostly interactive materials providing immediate
feedback. So the students really prepare for their
exams over the system; they rely on it. If our system
worked unreliably, many people would be immediately
upset. If this happened at a bad moment, we would most
likely make it into the newspapers. So, spending more money
on robustness seems worthwhile.

-gustaf

Oops sorry, it is available again from the mentioned place.
-gustaf

This is a short follow-up, in case someone is interested
in how we are doing with our new configuration (two 8-proc
machines with 2.7GHz Xeon MPs for database and dynamic pages
+ one dual pentium 2.8 GHz server for pound and static pages).

During the last weeks we have collected some experience
with our configuration:

- number of concurrent users (with .LRN)
    old system: ~500
    new system: ~1000
  these are actual numbers; concurrent users
  denote users who performed views in a 10-minute window.
  The server handles a thousand users with a moderate load.
  At first we ran into some problems, since with the
  standard FC2 configuration we were not able to run
  more than 400 concurrent pound
  threads (stack space was running out). This meant that the
  maximum number of concurrent transmissions was limited
  to that number. We lowered the stack size per thread to
  fix the problem quickly (the default value in FC2
  is quite large; see the sketch after this list).

- sustained rate of page views/sec
    old system: 15
    new system: 33
  These figures are the average over one hour;
  in certain seconds we had values above
  200 (see the snapshot in the link below).
  The number of hits is roughly 4 times this
  value (including images & css files).

- the average response time for page views is in the
  range between 0.2 and 0.4 seconds (including downloads).
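
The stack-size fix mentioned in the first item amounts to
something like this in the shell that starts pound (the 512 KB
value is a hypothetical example; the point is only that it is
far below the FC2 default of several MB per thread):

  # lower the per-thread stack size (in KB) before starting pound,
  # so that far more than 400 threads fit into the address space
  ulimit -s 512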

With that load, most processors are still more than 60%
idle. So i believe we could handle a much higher load,
maybe even come close to doubling these load figures
(with os and aolserver configuration tuning).
The difference between a fast system and a
horrendously slow system can be very small: if one
resource runs out, requests take longer and will
immediately pile up at such request rates (at 33
views per second, a resource that stalls for only a
few seconds already leaves about a hundred requests queued).

These performance gains correlate highly with
the SPECint2000rate value, which seems to be quite
a good measure for choosing configurations for
oacs and dotlrn.

we currently have more than 500 .LRN classes and
more than 16000 users; 5000 users visit the system
per day. We are still quite "lucky" that only
1000 use it at the same time.

The link below is a sample snapshot from live
monitoring using throttle+stats

http://media.wu-wien.ac.at/download/stat-2004-11-23a.htm

-gustaf
PS: To get a better understanding of
system bottlenecks, we are now using
hotsanic, which i can certainly recommend.
It produces quite nice charts for various
system figures (memory, tcp connections, load,
cpu usage, number of processes, traffic, ...)
for e.g. the last hour, day,
week, month, year. i added a quick hack
to monitor some figures from the throttle+stats
package as well (response time, views per minute,
concurrent users)

29: HotSaNIC (response to 28)
Posted by Andrew Piskorski on
HotSaNIC, hm.