Forum OpenACS Q&A: Performance of the server, how can we help?

This is a question mainly for Mike, but maybe someone else can answer it as well.

It seems that the server is reaching its limits. I heard that there are volunteers to move OpenACS to a new site.

Can you give us a rundown on what is happening, a timeframe (or at least an order of things happening in the next couple of months) and pointers on how more people in the community could help?

Posted by bill kellerman on
as an aside...  maybe it's just me, but i see slowdown in only two places:  confirmation of a forum post and displaying search results.  everything else seems snappy?
Posted by Tilmann Singer on
A trigger is fired on each forum post which walks up the hierarchy of messages to find the parent and schedule it for the search indexer - it might be inefficient. If you have time, investigate it in openacs.org-dev. We definitely need something better for the toolkit forums search (which openacs.org will benefit from, of course).
Posted by Jonathan Ellis on
submitting a thread notify request is pretty slow too.
Posted by Mike Sisk on
As I see it, there are two problems with the site/machine:

1. The machine has a hardware problem of some sort. After a reboot it takes about 5 minutes for the machine to get past the SCSI BIOS disk spin-up. It could be a bad SCSI terminator or a sticky disk in the array. Furthermore, when I tried to update the kernel with the newest RPM from Red Hat (this machine is running a Dell-specific version of Red Hat 7.1) it went into a kernel panic and locked up. Rather than debug the problem at the time (since we had just driven down to NYC to fetch the machine from Ben and returned to Boston) I just put the old kernel in place and started the site.

2. There is a performance problem with some code somewhere. This machine is a 1-GHz PIII with 1.5 GB of RAM and two 18-GB SCSI3 disks on a hardware RAID 1 controller. It doesn't get enough traffic to bog it down the way it has been. We're running much busier sites on less hardware.

I've provided a new machine to move the site to while we fix the current machine and get it updated. A team of volunteers led by Dave Bauer is working to move the site, but I'm afraid I've probably been the bottleneck -- I've been busy and haven't had as much time to work on this as I'd like. But it's getting there. Dave will have a better idea of the current status.

Once we get the site moved to a new machine I'll upgrade the old one with a second CPU and new, bigger disks and debug whatever hardware problems the machine might be having.

Posted by Mike Sisk on
When I posted the above message the load on the machine spiked from 1.21 to 1.61 and there were 111 connections to port 80, about half of which looked like they belonged to a search engine. As I'm looking at it now, the load is stuck above 1.00 and there are 5 zombie processes. It's using 722 MB of memory.

The machine averages just over 256 kbit/sec of bandwidth over a month -- about 70 GB, but that includes the nightly backups, too. It's busy, but not excessively so.
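
For a rough sense of scale, a monthly transfer volume can be converted into an average rate. The sketch below assumes a 30-day month and decimal gigabytes; the function name is made up for illustration and is not part of any tool mentioned in this thread.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def average_kbit_per_sec(gigabytes_per_month: float) -> float:
    """Convert a monthly transfer volume (decimal GB) to an average rate in kbit/sec."""
    bits = gigabytes_per_month * 1e9 * 8  # GB -> bytes -> bits
    return bits / SECONDS_PER_MONTH / 1e3  # bits/sec -> kbit/sec


if __name__ == "__main__":
    # 70 GB is the monthly figure quoted in the post above.
    print(f"{average_kbit_per_sec(70):.0f} kbit/sec")  # prints "216 kbit/sec"
```

Under those assumptions, about 70 GB per month corresponds to an average of roughly 216 kbit/sec.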

Posted by Randy O'Meara on
I am grateful to whoever fixed the forums submission problem. I'm now seeing a 2-3 second response when submitting a post. Compared to well over a minute, it's lightning-fast!
Posted by Mark Aufflick on
ditto - it's like magic :)
Posted by Don Baccus on
Does someone want to explain what was done to speed this up?  Was it server-specific, specific to the openacs.org code, or something generic in the PG version of forums?
Posted by Dave Bauer on
forums_message__root_message_id is defined as iscachable in newer versions of Forums. The version on OpenACS.org was not. I replaced the function with the newer one.

Using the function to calculate the root message is approximately 30% slower than doing a join on the forums messages table and using tree_sortkey to calculate the root message. It's about 1000% faster using the newer version of the function.
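
To illustrate the two ways of finding a root message mentioned above, here is a minimal Python sketch, not the actual Forums code. The table contents, the function names and the "/"-separated sort key are all assumptions made for the example; the real tree_sortkey encoding in OpenACS is different, but the idea of reading the root from the leading part of the key is the same.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the forums messages table:
# message_id -> (parent_id, sort_key)
MESSAGES: Dict[int, Tuple[Optional[int], str]] = {
    1: (None, "0001"),
    2: (1, "0001/0001"),
    3: (2, "0001/0001/0001"),
    4: (None, "0002"),
}

# Plays the role of an index on the sort-key column.
ID_BY_SORT_KEY: Dict[str, int] = {key: mid for mid, (_, key) in MESSAGES.items()}


def root_by_walking(message_id: int) -> int:
    """Follow parent pointers one step at a time until a message has no parent."""
    current = message_id
    while MESSAGES[current][0] is not None:
        current = MESSAGES[current][0]
    return current


def root_by_sort_key(message_id: int) -> int:
    """Take the first segment of the sort key and look up the message that owns it."""
    root_key = MESSAGES[message_id][1].split("/", 1)[0]
    return ID_BY_SORT_KEY[root_key]


if __name__ == "__main__":
    print(root_by_walking(3), root_by_sort_key(3))  # both print 1
```

The walking version needs one lookup per level of the thread, while the sort-key version needs a single lookup regardless of depth.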

Posted by Jeff Davis on
I think it was the secret server gnomes since I didn't do it and I don't think Dave did either (and I don't see anyone else logged in long enough to really fix anything). Maybe it was some sort of Red Hat up2date thing that fixed it?
Posted by Jeff Davis on
Duh. Read your mail before you post. Thanks Dave!
Posted by Don Baccus on
Geez ... sometimes the simplest things help so much :)  Good catch, Dave.  A good audit project for a bored hacker would be to go through PL/pgSQL functions making sure those that can be declared isstrict and iscachable are so declared.

More granularity's been added, too, I think in PG 7.3.  Another reason to revisit PL/pgSQL functions.

I did an audit of this sort on acs-kernel when isstrict and iscachable were first implemented in PG, and there were some noticeable speed improvements as a result.
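
A first pass at such an audit could be scripted. The following is only a rough sketch under stated assumptions: it scans SQL text with regular expressions for CREATE FUNCTION statements and reports which of isstrict and iscachable are absent from a trailing WITH (...) clause. The function names in the sample are hypothetical, and the matching is a heuristic (a "with (" inside a function body would confuse it). It only lists candidates; whether a function can safely be declared strict or cachable still needs a human to read it.

```python
import re
from typing import List, Tuple

CREATE_RE = re.compile(r"create\s+(?:or\s+replace\s+)?function\s+([\w.]+)", re.IGNORECASE)
WITH_RE = re.compile(r"with\s*\(([^)]*)\)", re.IGNORECASE)
ATTRIBUTES = ("isstrict", "iscachable")


def audit(sql: str) -> List[Tuple[str, List[str]]]:
    """Return (function_name, missing_attributes) for each function missing any attribute."""
    matches = list(CREATE_RE.finditer(sql))
    report = []
    for i, match in enumerate(matches):
        # Heuristic: a definition runs until the next CREATE FUNCTION or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sql)
        chunk = sql[match.end():end]
        declared = set()
        for with_clause in WITH_RE.finditer(chunk):
            declared.update(a.strip().lower() for a in with_clause.group(1).split(","))
        missing = [a for a in ATTRIBUTES if a not in declared]
        if missing:
            report.append((match.group(1), missing))
    return report


# Hypothetical sample input; the names and bodies are invented for the example.
SAMPLE = """
create function example__lookup (integer) returns integer as '
begin
    return $1;
end;' language 'plpgsql';

create function example__double (integer) returns integer as '
begin
    return $1 * 2;
end;' language 'plpgsql' with (isstrict, iscachable);
"""

if __name__ == "__main__":
    for name, missing in audit(SAMPLE):
        print(f"{name}: missing {', '.join(missing)}")
```

On the sample, only example__lookup is reported, with both attributes missing.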

Posted by Lars Pind on
How about the new object creation slowness experienced and discussed previously. Could that be related?

Do people know what I'm talking about, even?

/Lars

Posted by Don Baccus on
Yeah, I know what you're talking about, but apparently the root message id calculation was far, far worse, because it's only taking a couple of seconds to insert a message now vs. a minute or two.
Posted by Randy O'Meara on
Thank you, Dave.

Thanks for taking the time and responsibility to fix this problem. I don't think I'm exaggerating when I say that you just helped *every* user of this site and probably will cause uncounted new members to *stay* once they find us.

And, you didn't even stand up and take a bow...

Posted by Dave Bauer on
Thanks go out to Dirk Gomez for helping me analyze the query plan and Jeff Davis for thinking of checking if the function was defined isstrict and iscachable. Sorry I forgot you guys when I responded.
Posted by Randy O'Meara on
OK. I've added Jeff and Dirk to my official list of HEROES.
Posted by Jun Yamog on
Thanks to Dave, Jeff and Dirk.