Forum OpenACS Q&A: Open ACS handling heavy traffic

Posted by roland dunn on
Wonder if people could possibly help. We will hopefully be building a
website for a client, for a promotional campaign, and it will therefore
be backed by fairly heavy media coverage. Consequently we estimate
that at its peak it will be seeing 90 simultaneous unique
users/minute.

We will have control over platform and hosting, so naturally would
want to go with a Linux platform. Database probably Oracle as we've
used that before. We've used ACS TCL 4.2 for previous websites, but
none at that level of performance. I was wondering whether anyone has
implemented a production site with this level of traffic, and how
either Open ACS or ACS TCL performed and scaled?

The main aim of this email is to find out whether, in practice, either
of the ACS implementations has performed and scaled under this level
of traffic.

Detailing which OpenACS or ACS Tcl version you used (e.g. 3 or 4 in
either case) would be most helpful, as would any advice re: machine
configuration. We'd plan to use one small webserver and a large Oracle
DB together with failover machines, but any thoughts on this would be
most welcome.

Posted by Jonathan Ellis on
Users per minute doesn't help much unless you say how many page views each user is expected to generate and how individualized each page is. 90 page views per minute (1.5 views/s) isn't a whole lot even with moderately heavy db queries.

You might find the ASJ article on scaling helpful. It doesn't say which ACS version it deals with, although given the problems with aD 4.x permissioning that are fixed in OpenACS (see also this thread), I doubt it was 4.x.

Posted by Stephen van Egmond on
Look into Squid, and think very carefully about which page elements can be cached.  If the home page and the most-trafficked 50% of pages can be cached, even if only for five minutes, that will save you a huge amount of load.
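For instance, a page can declare itself cacheable for five minutes so that a front-end proxy like Squid will serve it from cache. A minimal sketch, assuming stock AOLserver and a $page_content variable holding the rendered page:

ns_set put [ns_conn outputheaders] Cache-Control "public, max-age=300"
ns_return 200 text/html $page_content

Anything per-user (login toolbars, personalized boxes) has to be left out of pages served this way, or fetched separately.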
Posted by Don Baccus on
I'm working on scalability issues with permissions as we speak, and I think it is safe to say that OpenACS 4 will outperform ACS 4 Classic in this regard.  The LAUSD (Jon Griffin's employer) has already provided speed-ups for acs_permission.permission_p itself, and I'm poking around in the use of acs_party_privilege_map, which is very slow in PostgreSQL and apparently not all that swift in Oracle either.

So this will remove one bottleneck.  What other bottlenecks will we uncover?  Hard to say.  As Jonathan mentioned, more information on the actual number of pages to be served is needed in order to give you feedback, and as was mentioned later caching can help a lot, too.  You can cache directly in AOLserver and perhaps have more fine-grained control over when to expunge cached pages than through the use of Squid or similar tools (only your code knows when a dynamic page changes, after all).

One thing we are working on is the provision of automated testing tools, and this includes load testing using scripts that "know" about the structure and perhaps the usage profiles of a site.  This will give users the ability to drive development sites with input that reflects reality more accurately (by posting to bboards, etc.) than is possible with a simpler tool like (say) apache bench.
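Even something crude helps in the meantime. A bare-bones driver sketch in plain Tcl (the URL is a placeholder; a real script would log in and post to bboards rather than just fetch):

package require http

# Hammer one URL and report per-request latency.
for {set i 0} {$i < 100} {incr i} {
    set start [clock clicks -milliseconds]
    set token [http::geturl "http://dev.yourserver.test/" -timeout 10000]
    http::cleanup $token
    puts "request $i: [expr {[clock clicks -milliseconds] - $start}] ms"
}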

I realize none of this answers your question directly.  In part that's because I think the scalability issue for [Open]ACS 4 has yet to be answered - I'm outlining the steps we're taking to boost performance and to enhance our ability to test performance (and correctness) later.

One point of reference for OpenACS 3.2 ... this site right here withstood a slashdotting when running PostgreSQL 6.5 on a single dual-processor PII 400 box.

Posted by David Walker on
Does anyone have any data on how much traffic a slashdotting might
produce?

I'd like bandwidth numbers, even though I know those will vary
greatly depending on the number of images, the type of content
provided, and how well the site draws in users.  Maybe both normal
and slashdotted bandwidth numbers would be good.

Posted by Todd Gillespie on
David -- the rule of thumb I have heard is that a slashdotting triples your hits for a couple of days, but in huge misshapen bursts rather than any kind of well-formed load. There's some sample bias here, in that a site featured in its own article on slashdot usually has substantial traffic already, a trend that has become more common as slashdot has evolved in the direction of a big circle jerk with its usual linked sites. Numbers for a smaller site are harder to find, and it is increasingly unlikely that such a site will be slashdotted until it has grown substantially and has been putting out neat stuff for some time...

As an aside, I think 2 years ago a prof somewhere contacted a number of /.ed sites and graphed the common pattern of hits following the /. post -- and when his paper was /.ed, his site matched his numbers. If you grovel around in the /. search pages you might find it.

Posted by Rodger Donaldson on
Slashdotting?  I saw some detail when I got /.-ed.  The traffic wasn't as great as you'd think - 3-5 pages per second requested.

The main bottleneck on my 486-100 Apache/mod_perl server (backed by a DB on a P166) was an untuned Linux 2.0 kernel.  Switching to 2.2 with a bit of tuning allowed the system to cope with the load admirably (it collapsed under 2.0).  After an hour or so of proving it could handle the load adequately, I pulled the web server down, since I pay per meg for traffic.  Basically, unless you're doing something complex on your pages, or have made the mistake of using a high-priced application server, grunt isn't likely to be a problem.

Lesson: find out how to tune the Linux kernel - there are a number of articles on tuning the TCP/IP parameters to speed up closing sessions to avoid your system dying from over-allocating resources.  The default settings for most *ix IP stacks are based on numbers that were fine for 300bps links in 1982, but aren't realistic for modern infrastructure.

Other lesson: Bandwidth costs.  About $200 for that episode...

The biggest problem, though, is that a /. mention appears to attract not merely hordes of visitors, but hordes of script kiddies.  Not only were there acres of DoS attacks against my server, but upstream network infrastructure and servers on related subnets all saw a huge volume of attacks.

As far as sizing for 1.5 pages per second, think of it this way:

Each page must be ready in about 0.67 seconds.  That means if your average page generation time is 1 second, you're in trouble.  If it's 0.1 second, you're on easy street.  Log some page generation times (request arriving in AOLserver, page generation finishing).  That should give you an idea of whether you're on target in terms of server grunt.

Do this on your test systems.  You may be surprised - unless you're doing some horribly complex (or poorly written) things, even the lowest end Intel servers money can buy today should be more than adequate for your needs.
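One cheap way to collect those numbers - a sketch only, with filter behavior as per the AOLserver docs and the proc names being mine - is a pair of filters stamping the clock around each request:

ns_register_filter preauth GET /* start_page_timer
ns_register_filter trace   GET /* log_page_timer

proc start_page_timer { why } {
    global page_start_ms
    set page_start_ms [clock clicks -milliseconds]
    return filter_ok
}

proc log_page_timer { why } {
    global page_start_ms
    ns_log Notice "[ns_conn url] took [expr {[clock clicks -milliseconds] - $page_start_ms}] ms"
    return filter_ok
}

Both filters run in the connection's thread, so a plain global is good enough on a test box.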

Consider your bandwidth requirements.  Multiply out your page sizes (including graphics and the like) by the number of requests, and add 20% or so for protocol overhead, and you've got an idea if your bandwidth is adequate.
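For example: if your pages average 45KB with graphics, 1.5 requests/second times 45KB times 1.2 comes to roughly 81KB/second, or about 650Kbits/second of sustained outbound traffic.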

Consider that your audience probably has low-bandwidth modems.  That means delivery can be an issue, because most testing of pages is conducted on high-speed networks.  Say you're delivering 100KB pages over a 100Mb link to a client on a 10Mb link; you're taking a fraction of a second to get the finished page (one thread) and graphics (a thread each) to the client.

OTOH, when delivering to a client on a 56Kb modem, that 100KB takes tens of seconds to trickle down, tying up AOLserver threads (and, especially, their memory) for all that time.

One approach to the problem is to have a big, complex web server (AOLserver with all the add-ons) for serving dynamic content, which is often the smaller part of the page size, and a stripped-down http daemon (AOLserver with no add-ons, or thttpd) for serving large, static files.  That reduces the footprint of the threads/processes which are doing nothing but trickling data to a modem (and don't need DB drivers and the like bloating them).

Posted by Jerry Asher on
You can probably consider 100 requests per second to be an upper limit. That's assuming that everyone who comes to slashdot decides to hit your link (it's that faked shot of Bill and the goatsex guy you photoshopped up).

100 requests per second is what slashdot appears to serve at its best. This was mentioned by joe random user in a story there yesterday, and it is also what I estimated back in August in a message to Roberto:

Hi Roberto,

Thought you might be interested...

According to CmdrTaco (http://slashdot.org/~CmdrTaco/journal/), the new slashdot:

"We topped out over 55 pages/second for several minutes (Live Slashdot does only around half that much)."

Also, we know what slashdot's hardware was, from http://slashdot.org/articles/00/05/18/1427203.shtml:

  • 5 load balanced Web servers dedicated to pages
  • 3 load balanced Web servers dedicated to images
  • 1 SQL server
  • 1 NFS Server
So the old slashdot, at 27.5 pages/second per webserver times 5 webservers, needed to serve about 137.5 pages per second.

So those are 137.5 big pages (the homepage is right now about 45K). Outrageous BOTEC: if all pages are 45K, then 137.5 pages/second is about 6MBytes/second of throughput. Big, but not too big. Apache Bench tells me that I can do 98 45KByte pages per second on one machine (a PIII 450) with AOLserver (with the system claiming to be 38% idle). What do they need 5 web servers for?

So for every user, how many db transactions are there *on average* to present a page? I think at most one, and it's a read. Assuming every user is logged in, slashcode needs to get the preferences (which content boxes (slashboxes?) the user wants) and then find content to fill those boxes. But in general, filling a content box should be a cached operation that changes only on the rare writes. I.e., I look up your user id from the cookie (3872), determine that you're the real Bruce Perens, determine you want these 20 boxes, and then look in the cache to see what's in those boxes and push the bytes out. Even the ordering and the various thresholds can probably be cached: sort this box highest-rated first, showing all messages; now cache it again ordered by date, showing messages scored 2 or higher. You need some way to post a comment, commit it to the db, and flush/fix up the cache.

I do think it's interesting to say that OpenACS is not a good fit for a Slashdot site. OpenACS as it is today is better suited to sites that need many more than one db transaction per page, and where there are significant chunks of writes, and not just reads. (In that sense, MySQL may not be such a bad fit for Slashdot.) Another way of saying this is that OpenACS is for "communities" of collaborating peers, not just mobs of lurkers.

Anyway, I don't know where I'm going with this....

And I still don't.
Posted by Don Baccus on
Remember that MySQL doesn't implement subselects and a variety of other things that simplify building One Really Big Query to build page contents (vs. A Bunch Of Little Queries).  MySQL is also reputed to suck  on joins of several tables.

So Slashdot may need to process a lot more MySQL queries than Jerry has in mind, each with its parsing and optimizing overhead.  I don't know this for sure, but it is something to keep in mind.

Slashdot's pages are quite complex, with the ads that are served, and the various summary boxes (heavily cached I'm sure) that run in the margins.

As to whether or not OpenACS is reasonable for a slashdot-volume site: with all that hardware, a highly customized version of OpenACS 3.2.5 could handle it, I'm sure.  "Highly customized" would include having the AOLserver front ends cache the front page and Q&A threads for several seconds (or more?) just as slashdot does now.  The toolkit isn't set up to do this out of the box, but ns_cache and util_memoize would make it quite simple to do so.
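To give a flavor of it - a sketch only, with render_front_page standing in for whatever expensive code actually builds the page - the ns_cache version is just:

# Re-render the front page at most once every 30 seconds.
ns_cache create front_pages -timeout 30

proc cached_front_page {} {
    return [ns_cache eval front_pages index {
        render_front_page
    }]
}

util_memoize can wrap the same call with a maximum age in seconds for much the same effect.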

[Open]ACS 4 is another matter altogether with the request processor and more complex permissions model.  Thus our need to do more performance testing once we feel like we've made sufficient progress in our speed-up efforts.  Until we see some real numbers I'm not crawlin' out on no limb, uh-uh.

One interesting datapoint I do have: I can create about four bboard posts a second on my PIII 500 with its 5400 RPM IDE drive (the one that causes a "kernel panic" in DMA mode, so it is running in high-overhead PIO mode).  A 15K SCSI disk would quite easily double this, I think, given how ugly (but free, and big for its age) that disk is.  A fast dual-processor system would probably double throughput again, getting us up to 16 bboard posts inserted a second.  It's not hard to imagine the combination getting above 20 bboard posts inserted a second, either.  This is OpenACS 4.  It hints at reasonable performance - but it is no more than a hint, and I'm well aware of it...

Posted by Don Baccus on
Oh, a bit more on the above ... I've got query logging turned on, and the query log and all the PG files (including the WAL (REDO) log) are on the same disk.  On the other hand, this is all being done in one huge transaction, so the WAL's not being fsync'd after each bboard insert, and I may be overestimating the performance in a more general context.  However, the WAL blocks are being written, the PG datafiles are written at each checkpoint, and Linux itself writes the file cache every so often, so there really is quite a bit of disk I/O going on.

(I'm using the present tense because I'm busy importing the openacs.org forums into my OpenACS 4 test platform so I can do some performance testing on the speed-up stuff I've been working on recently).

Also, regarding Slashdotting: AFAIK Ben did not track the slashdot volume.  However, the trigger was a reference to the infamous "Why not MySQL?" paper and the resulting comments, so undoubtedly a lot of Slashdotters paid a visit.  After all, they're mostly rabid MySQL fans who get deeply offended whenever anyone criticizes their favorite Open Source database (ignoring the fact that it wasn't Open Source at the time - but we don't want to fan the flames, do we? :)

Ben used it as a work machine at the time and did report that load barely topped "1" on the modest dual processor system.

Posted by Jerry Asher on
Yes, that was most of my point. Not an OpenACS vs slash or pg vs mysql post, but mainly that, at first glance, /. appears to be a site with only moderate demands.  As I said, I suspect the vast majority of the db accesses are reads, not writes, and I doubt they accumulate four new posts each second.  To me, that's a problem domain that should be reasonably well solved with caching.

I come not to bury slashdot, but to praise slashdot and learn from slashdot -- I am confident they have better sysadmins and engineers than I, but I am curious what takes this at-first-glance "moderate demand" site and turns it into one that requires all the hardware they have thrown at it, and one that appears firmly mired in one-nine reliability :(.

Is it really just that MySQL sucks harder than the vacuum 150 miles above us all?  Does it seem reasonable that a site needs that much hardware for a "mere" 100 requests per second?  What else is going on?  Is it Apache?  Wrong hardware choices?  If my assumptions are wrong, that's cool too.  Tell me where I am off, and I can learn something more about web development.

Posted by Don Baccus on
Well, I just downloaded the code for the heck of it.  It's got a very simple datamodel (about 450 lines in the PostgreSQL case).  They supply PG, Oracle and MySQL datamodels.

They abstract out SQL differences by wrapping every query in a function, with all the functions for each RDBMS placed in the directory DB/[your RDBMS here].  There aren't very many queries in the system, either.  And, as I suspected, they're all very simple.  You see functions that call other functions to get values to use in the query, leading to the "execute lots of little queries" model of execution.  Also, they don't seem to wrap inserts, updates and deletes in transactions even when using PG or Oracle, though I can't be sure (in other words, I've spent all of 15 minutes reading the code).

They template everything, with the templates dragged from the database - and of course this, like everything else, is heavily cached.

OK, time for someone else to spend 15 minutes reading the code in order to add to our understanding :)

Posted by Patrick Giagnocavo on
David asks about Slashdotting... one of the sites on my servers got Slashdotted, and did not break a sweat in terms of CPU usage; however, this was a static HTML site, with a few large files to download, which is not really typical of what an OpenACS site that got /.'ed would see.

My only mistake was that the parameter governing how many pages one httpd child serves before it exits and a new one starts (Apache's MaxRequestsPerChild, there to guard against memory leakage) was set too low.  It had been set to 30, but with hits coming in at 30 hits/second, a new httpd child process was being started every second.

I saw very high traffic: 2 GB of data went out in just a few hours.

Posted by Mat Kovach on
Several years ago, when the SlashCode 0.2 code came out, a friend and I basically rewrote it in C with Postgres (6.5 at the time) over a weekend, and ran it for about a month.  It was my first real experience with MySQL.  I haven't looked at the code lately, but if that release was the base, there were numerous areas where the SQL could be improved.  I still think Slashdot threw hardware at a performance problem, but I don't know enough about it to say.

I've been slashdotted, and maxed out at about 6 hits/second over about 3 hours to the specific page that was mentioned.  The increase on the rest of the site was not that substantial.  We basically ended up creating a geocities site for the images, but the P133 with 128MB of memory and Linux 2.2 handled it well.  The only real problem was that the Apache/JServ (at the time) combo got a little slow getting the pages going (all pages were .jsp's).  We ended up having to tune the kernel a bit and do a quick reboot.

I did have an OpenACS 3.2.5 development site running, and (unknown to me) it was featured in an E! documentary and received 5 hits/sec for 2 hours without a problem while the show was on TV.  I did move the site back to the pre-OpenACS code during the replays, but that was mainly because the site wasn't finished.

Hope that helps you a bit.

Posted by MaineBob OConnor on

What is the best way to time my dynamic Tcl pages so I can clean up any time-consuming selects and other code?

I could do an
ns_log Notice "Start time..."
at the beginning of the page and again just before
ns_return 200 text/html $page_content
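Fleshed out, that idea would look something like this (just a sketch, using Tcl's own millisecond clock so there's no extra db hit):

set t0 [clock clicks -milliseconds]

# ... all the selects and page-building code ...

ns_log Notice "page built in [expr {[clock clicks -milliseconds] - $t0}] ms"
ns_return 200 text/html $page_content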

Doing a
set now [database_to_tcl_string $db "select [db_sysdate] from dual"]
may in itself add a bit of time to the test results... so,

Suggestions and code snippets would be helpful.

TIA
-Bob

Posted by Daryl Biberdorf on

A big advantage of Oracle (and perhaps Postgres, but I'm not qualified to speak about it) over MySQL is that it scales better. Oracle is huge and somewhat bloated, but it is really intended to hold up even when stressed. I could be wrong on this, but it doesn't look like MySQL even has a library cache (see here and here), which databases like Oracle use to cache parsed queries and their execution plans. A typical web application is going to generate many, many of the same queries (with different values plugged into the WHERE clause) over and over again. In an application using Oracle bind variables (and shame on you if you aren't), the overhead associated with parsing queries is simply not there. Tweak the Oracle shared pool size so that it will hold all the needed queries in cache (and use bind variables), and parsing overhead goes away: the queries will only be parsed the first time they're executed.

For applications without much load, the raw speed of its "lightweightness" makes MySQL look like a screamer compared to big, bloated Oracle. But increase the load, and Oracle simply ramps up while MySQL struggles under its own lack of sophistication. I believe similar results were shown by the big PostgreSQL versus MySQL web benchmark that was done a year or two ago: under small loads MySQL won, but as traffic increased PostgreSQL coped much better. Based on that, it looks like Slashdot needs a lot of hardware to support software that's not as fancy as it should be.

Posted by Daryl Biberdorf on
Oh yeah...I'll retract my statements about MySQL's not having a library cache if someone can point me to documents to the contrary.
Posted by Don Baccus on
Unfortunately Postgres doesn't do query caching nor does it have bind variables.  However, there has been some work in this area and I think it will get incorporated at some point.

Of course PL/pgSQL only generates query plans the first time a function is run within a particular backend's context, and with much of OpenACS 4's logic tied up in PL/[pg]SQL we do get the benefit of that.

Caching query plans doesn't help when the basic queries suck, though (says Don as he's slowly making progress getting bboard to scale).

Posted by Jonathan Ellis on
In my experience, you have to write much hairier queries than the ACS uses before the parse time becomes appreciable, in any case -- whether in Postgres or Oracle.  Or Sybase, for that matter.  I'm sure there are applications where this is not the case, but the performance advantage is just negligible here.

Posted by Stephen van Egmond on
In our experience, the biggest threat to Postgres's scalability is that it uses one process per connection.  That is a big problem if you're using Apache 1.x, and not a big problem if you're using Apache 2.x or AOLserver.

Put simply, each connection gets a process.  So if you have 50 connections hammering away, you have 50 processes contending for attention, and the context switches really, really suck up time.  In the case of Apache 1.x, each forked process holds its own connection, so 5 servers times 30 processes equals 150 processes on the db server.  Aaagh.

AOLserver and (supposedly) Apache 2 deal with this via connection pooling. With pgsql it is a humongous deal.

Oracle, MySQL and Sybase don't fork processes for connections.  Sybase in particular uses select() to accept packets all in one thread.  Scary, but very effective.

Posted by Don Baccus on
Yes, it is a huge deal with PG.  Setting up and tearing down an Oracle connection isn't all that cheap either, as it logs you into the system before it will execute your single query (if you're not managing persistent connections).

Server-managed pooled persistent database connections are something  AOLserver's had right since 1995...
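For reference, a pool is just a few lines in the server config (illustrative names and sizes; the params shown are the standard ones for a db driver):

ns_section "ns/db/pools"
ns_param   main "Main Pool"

ns_section "ns/db/pool/main"
ns_param   driver      postgres
ns_param   connections 10          ;# persistent backends held open
ns_param   datasource  "localhost::openacs"
ns_param   user        acsuser
ns_param   verbose     off

AOLserver opens those backends once and recycles them across requests, so per-request connect/teardown simply never happens.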

Posted by Daryl Biberdorf on

Jonathan, I respectfully disagree that the parse time is negligible, if you're really concerned about scalability. Sure, 1/10 of a second is nothing when doing interactive queries. But if hundreds of users are descending on your site, you need all the help you can get. I think Tom Kyte's example (given here) does a nice job of showing this. I took his example and ran it on our own development database server:

SQL> -- the wrong way (no bind variables)
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.20
SQL> set serveroutput on size 1000000
SQL> declare
  2      type rc is ref cursor;
  3      l_rc rc;
  4      l_dummy all_objects.object_name%type;
  5      l_start number default dbms_utility.get_time;
  6  begin
  7      for i in 1 .. 1000
  8      loop
  9          open l_rc for
 10          'select object_name
 11             from all_objects
 12            where object_id = ' || i;
 13          fetch l_rc into l_dummy;
 14          close l_rc;
 15      end loop;
 16      dbms_output.put_line
 17      ( round( (dbms_utility.get_time-l_start)/100, 2 ) ||
 18        ' seconds...' );
 19  end;
 20  /
11 seconds...

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.27
SQL> -- the right way (bind variables)
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.50
SQL> set serveroutput on size 1000000
SQL> declare
  2      type rc is ref cursor;
  3      l_rc rc;
  4      l_dummy all_objects.object_name%type;
  5      l_start number default dbms_utility.get_time;
  6  begin
  7      for i in 1 .. 1000
  8      loop
  9          open l_rc for
 10          'select object_name
 11             from all_objects
 12            where object_id = :x'
 13          using i;
 14          fetch l_rc into l_dummy;
 15          close l_rc;
 16      end loop;
 17      dbms_output.put_line
 18      ( round( (dbms_utility.get_time-l_start)/100, 2 ) ||
 19        ' seconds...' );
 20  end;
 21  /
1.06 seconds...

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.32

With PostgreSQL, you'll have to use techniques other than taking advantage of a library cache to get your speed. But if you're using Oracle, getting rid of those parse operations WILL give you better, measurable performance.

Posted by Stephen . on
"Oracle, MySQL and Sybase don't fork processes for connections."

Oracle forks a process to handle each connection by default, unless you configure it with the MTS option. MySQL starts a new thread for each connection, which under Linux has about the same overhead as starting a new process.

Here's what Oracle has to say about bind variables, hard/soft parsing etc.

The following example shows the results of some tests on a simple OLTP application:

    Test                                 #Users Supported
    No Parsing all statements                        270
    Soft Parsing all statements                      150
    Hard Parsing all statements                       60
    Re-Connecting for each Transaction                30

http://download-west.oracle.com/otndoc/oracle9i/901_doc/server.901/a87504/ch1.htm#25635

Posted by Don Baccus on
To add more fuel to the fire: the time required for the PG optimizer to create a plan is not linear in the number of tables being joined, but grows with the number of ways those tables can be joined (i.e. via hash, merge, or nested loop joins, using sequential or index scans, in differing orders using differing keys, etc. etc.).  A six-way join, for instance, already has hundreds of possible join orders before you even count join methods.

The parser itself should be fast ...

The experiments run by the guy who started work on caching query plans showed about a 30% improvement on simple queries.  On more complex queries, parsing/optimization overhead is going to depend a lot on the actual query.  If you join a bunch of tables but only return a few rows, creating the plan will take more time than executing it, and caching the plan will be a big win.  On the other hand, if you're doing a Cartesian join of ten huge tables, the optimizer will have few choices and run quickly, while the output will be so huge you'll probably run out of swap space before it completes :)

As far as Oracle goes, I'm sure everyone here is aware of the fact that [Open]ACS 4 uses bind variables throughout.  AFAIK, though, no one has done any measuring on an active site to see how much memory Oracle should be allowed to use for query caching in order to see the benefits without eating too much into the shared buffer space used to cache database blocks.
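For anyone who hasn't seen it, a db API query is written with the bind variable in place, e.g.

set name [db_string get_name {
    select first_names from persons where person_id = :user_id
}]

:user_id is picked up from the local Tcl variable and shipped to Oracle as a true bind variable, so the statement text is identical on every call and the parsed plan can be reused.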

Posted by Stephen . on
Someone asked me in email, "What's the no-parse option?"

The Oracle driver implements bind variables; it does not, however, parse each query the first time it comes across it, retrieve a statement handle, and reuse that handle with bind variables for subsequent invocations. Oracle's data above suggests this would be almost twice as fast as bind variables alone, depending on the query of course.

Looking through the Postgres docs, I thought it might be possible to fake bind-variable/pre-parsed-statement support... Queries in PL/pgSQL *are* cached (mentioned above), and the client-side C API supports executing a procedure with arguments (as opposed to dumping the text to execute a procedure into a select statement). Unfortunately, procedures in Postgres cannot return relations (yet).

Posted by Roberto Mello on

Functions in PostgreSQL 7.2 (in its 3rd beta right now) can return a cursor, which is equivalent to returning a relation. That should make working with Oracle <-> PostgreSQL a lot easier.

7.2 also supports using the Oracle-style %TYPE in the declaration of parameters - another win when working with Oracle.

Tom Lane did a nice job clarifying some obscure points of PL/pgSQL in the (now much better formatted) 7.2 documentation for PG's procedural languages: http://candle.pha.pa.us/main/writings/pgsql/sgml/plpgsql.html

Posted by roland dunn on
Well, thanks to everyone for all the responses - very interesting and helpful. I wonder if I could pose a few more questions?

I'm thinking of using a couple of AOLservers running OpenACS 3.2.5, one Oracle DB server, and one mirror/failover/clustered backup Oracle DB server.

A few questions then:

In https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=0003L3&topic_id=11&topic=OpenACS Jerry Asher writes "I plan to address the five nines with: load balancers and clustered aolservers (thanks for the BigIP recommendation)". Does anyone know more about this BigIP recommendation, or how to load balance a couple of AOLservers?

I'm guessing that one should use some load-balancing device, something like a Cisco CSS 11050 Content Services Switch. What I'm also not sure of is how to deal with cookies and ACS - presumably a user who gets a cookie from webserver A has to be redirected back to webserver A rather than webserver B on their return. Am I right? Is this straightforward with a device like the Cisco one above? How do people recommend such a thing be dealt with?

Re: the Oracle DB servers - we'd like to set them up so that if one of the Oracle DBs dies, the other kicks in instantly. I've been told that the best way to do this is Oracle clustering - anyone have any thoughts on this? Is such a thing possible using Oracle Enterprise 8.1.7 on Linux? I'd also been told that Oracle clustering isn't possible on Linux. Anyone any thoughts?

Posted by Stephen van Egmond on
"What I'm also not sure of is how to deal with cookies and ACS - presumably a user who gets a cookie from webserver A has to be redirected back to webserver A rather than webserver B on their return. Am I right? Is this straightforward with a device like the Cisco one above? How do people recommend such a thing be dealt with?"

I'm the one who suggested Big/IP. I see them on eBay for several thousand $US, just so you know what you're up against.

They do connection-based load balancing. The short story is that you give Big/IP's external interface the IP of your www URL. You configure the Big/IP to do load balancing among a number of servers; it will rotate each incoming connection among the servers you've identified, following some algorithm or other for 'rotation'.

You can do this per-port, per-IP. So say you want to route port 80 to machines A, B, C, and port 443 to beefier machines D, E, F, and G. No problem. Need to ssh into your farm? Route port 22 somewhere else.

etc. You get the idea, I hope.

There are free + libre solutions for Linux that do the same thing, but they're also huge hassles. If you price the Big/IP hardware against the time you'd spend configuring a Linux load-balancing and failover system, the Big/IP ends up more or less worth it. Particularly if you want to add failover (by buying a second Big/IP), it gets even more worth it.

To directly answer your question: this is not an issue with cookies. The browsers all think they're connecting to the same machine. Unless, of course, you're keeping client state on the web server (tied to cookies); in that case you have a severe architectural problem and you'll have to find another form of load balancing. My bank appears to have this problem, since every time I do web banking I get sent to webbankingXX.tdaccess.com for random values of XX.

Posted by Michel Henry de Generet on
I saw one answer claiming that multiple processes are slower than threads because of context switching. I have to say this is not true: at least on Solaris, and surely on other UNIX systems, threads and processes are treated the same from the scheduler's point of view. The real differences are that threads share all their memory and are faster to start.

What makes the big difference is the synchronization mechanism used: mutexes, semaphores, critical regions, or the atomic swap instruction available on some processors (very fast on i86 or UltraSPARC).

We should not forget that the thread model is usually easier to program and should scale better on multiprocessor systems.

Posted by Michael Feldstein on
Roland, if you're going to use OpenACS 3.2.5, you have to use
PostgreSQL. OpenACS only supports Oracle from version 4.x
forward. If you want to use Oracle with some flavor of ACS 3.x, it
would have to be ACS Classic.
Posted by roland dunn on
Thanks for the reply Michael.

OK - so what's the best, most stable and efficient version of ACS to use with Oracle at the moment - OpenACS or ACS Classic?

Posted by Ciaran De Buitlear on
Funnily enough, a client has just asked me "will a Cisco CSS 11050 Content Services Switch work with the new web site?". We're building the site with OpenACS 4.5. Has anyone actually used this gizmo with AOLserver, OpenACS, or the ACS? If so, were there any problems? In any case, I guess it will be of limited use in our setup, which is a booking system - low on graphics, high on database load.