Forum .LRN Q&A: Appreciate help with dotLRN performance

1: Appreciate help with dotLRN performance

Posted by Shankar Venkatagiri on 10/13/03 05:15 PM

I have been running a course at the Indian Institute of Management, Bangalore, using dotLRN, and its discussion forums have seen plenty of postings. The configuration is as follows:

AMD Opteron 1.4 GHz
120 GB hard disk space
1 GB high speed RAM
SuSE Linux Enterprise Edition
PostgreSQL database

After a month or so, I conducted a dotLRN-based survey. Since then, I have noticed a drastic slowdown in performance. Looking at the top processes on the server, it's postgres and nsd that seem to be having dotLRN for lunch. Could this be due to memory leaks? Any clues?

Shankar

2: Re: Appreciate help with dotLRN performance (response to 1)

Posted by Jun Yamog on 10/14/03 05:12 AM

Have you tried to vacuum analyze postgres?

3: Re: Appreciate help with dotLRN performance (response to 2)

Posted by Shankar Venkatagiri on 10/16/03 05:27 PM

Thanks for the pointer. I went ahead and vacuumed the database (vacuumdb -a -f -v), but I didn't see any improvement even after this. More specifically, dotLRN drags when I try to load the Class Home page, which is a set of portlets. Any help?

Shankar

4: Re: Appreciate help with dotLRN performance (response to 1)

Posted by Rocael Hernández Rizzardini on 10/16/03 06:28 PM

Hmm... this is a problem in survey: some of its queries call a PL/pgSQL function in the WHERE clause, which keeps PostgreSQL from using the indexes, so it basically doesn't scale. Dave Bauer fixed it in a project, but I'm not sure whether he finally committed the fix...
Any comments, Dave?

5: Re: Appreciate help with dotLRN performance (response to 1)

Posted by Dave Bauer on 10/16/03 06:40 PM

Yes, the PL/SQL functions in the WHERE clauses of those queries have been removed on HEAD/5.0.
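Rocael's point, that a function applied in the WHERE clause can stop the planner from using an index, can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, and a hypothetical responses table rather than the real survey schema (which the thread doesn't show), so treat it as an analogy: the same lookup searches the index when it compares the raw column, and falls back to a full table scan once the column is wrapped in lower().

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

# Hypothetical table standing in for survey responses (not the real dotLRN schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, respondent TEXT, answer TEXT)")
conn.execute("CREATE INDEX responses_respondent_idx ON responses (respondent)")
conn.executemany("INSERT INTO responses (respondent, answer) VALUES (?, ?)",
                 [(f"user{i}", "yes") for i in range(1000)])

# Comparing the indexed column directly: the planner can SEARCH the index.
plain = query_plan(conn, "SELECT * FROM responses WHERE respondent = ?", ("user42",))
# Wrapping the column in a function hides it from the index: a full table SCAN.
wrapped = query_plan(conn, "SELECT * FROM responses WHERE lower(respondent) = ?", ("user42",))

print("plain:  ", plain)
print("wrapped:", wrapped)
```

Dave's fix follows the same principle: take the function call out of the WHERE clause so the planner can use the index. In PostgreSQL, EXPLAIN on the query shows the corresponding difference as an Index Scan versus a Seq Scan.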

6: Re: Appreciate help with dotLRN performance (response to 3)

Posted by Roberto Mello on 10/16/03 06:55 PM

You forgot to analyze the database. That's the -z flag for vacuumdb. The -z flag is more important than the -f (full) flag for performance. I usually vacuum analyze my databases several times a day, but only vacuum full once a day, depending on how much DML activity the database sees.
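Roberto's routine boils down to two vacuumdb invocations. This minimal Python sketch only assembles (and, if asked, runs) the command lines; it assumes vacuumdb is on the PATH and that the connecting user may vacuum every database. The flags are the ones used in this thread: -a (all databases), -z (analyze), -f (full), -v (verbose).

```python
import shlex
import subprocess
from typing import List

def vacuumdb_command(analyze: bool = True, full: bool = False, verbose: bool = False) -> List[str]:
    """Build a vacuumdb invocation that processes every database (-a)."""
    cmd = ["vacuumdb", "-a"]
    if analyze:
        cmd.append("-z")  # refresh planner statistics: what matters most for query speed
    if full:
        cmd.append("-f")  # rewrite tables to reclaim space: slower, locks tables while running
    if verbose:
        cmd.append("-v")
    return cmd

def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it, raising if vacuumdb fails."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(vacuumdb_command())            # frequent: vacuum + analyze, several times a day
    run(vacuumdb_command(full=True))   # infrequent: full vacuum + analyze, e.g. nightly
```

For comparison, the command Shankar ran earlier corresponds to vacuumdb_command(analyze=False, full=True, verbose=True), which is exactly the combination that skips the statistics update.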

-Roberto

7: Re: Appreciate help with dotLRN performance (response to 6)

Posted by Shankar Venkatagiri on 10/17/03 06:25 PM

Thanks for the tip. I did go ahead and analyze the db. Not sure I understand all of the output, but will have someone here look at it.

What I am positive about is that when I load the Class Home (set of portlets) the nsd processes take up a huge chunk of memory running whatever script they run. Apologies for the ignorance here:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1748 shikshan  16   0 62528  61M  2336 S     2.9  6.1   0:13 nsd
 1747 shikshan  15   0 62528  61M  2336 S     0.5  6.1   0:14 nsd

Also, the cache goes up significantly when this happens. Any help will be welcomed.

Shankar

8: Re: Appreciate help with dotLRN performance (response to 5)

Posted by Shankar Venkatagiri on 10/22/03 11:31 AM

Hi Dave:

Can you please suggest an easy way for me to update this query? It should do us a world of good.

Shankar

9: Re: Appreciate help with dotLRN performance (response to 7)

Posted by Andrew Piskorski on 10/24/03 06:01 PM

A resident set size of 61 MB for your AOLserver is a "huge chunk of memory"? I don't think so. In your top output above, note that nsd is only taking 6% of your memory. That's not large, that's trivially small.
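Andrew's point is simple arithmetic: %MEM is the resident set size divided by the machine's total memory. A quick Python check, assuming the 1 GB of RAM from the original post and taking top's 61M RSS at face value:

```python
def percent_of_memory(rss_kb: float, total_kb: float) -> float:
    """Resident set size as a percentage of total memory, as in top's %MEM column."""
    return 100.0 * rss_kb / total_kb

TOTAL_KB = 1024 * 1024   # assumption: the 1 GB of RAM listed in the original post
NSD_RSS_KB = 61 * 1024   # top's RSS column shows 61M for each nsd entry

share = percent_of_memory(NSD_RSS_KB, TOTAL_KB)
print(f"nsd resident memory: {share:.1f}% of RAM, leaving {100 - share:.1f}% for everything else")
```

The result, about 6%, agrees with top's 6.1% to within rounding; top divides by the memory the kernel reports, which is typically a little below the installed amount.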

10: Re: Appreciate help with dotLRN performance (response to 9)

Posted by Shankar Venkatagiri on 10/27/03 07:42 AM

Thanks for the clarification. What I notice is that these processes don't "quit". Here's a sampler:

shikshan 1789 0.6 4.7 54276 48848 ? S 11:36 0:12 [nsd]
shikshan 1790 0.0 4.7 54276 48848 ? S 11:36 0:00 [nsd]
shikshan 1791 0.0 4.7 54276 48848 ? S 11:36 0:00 [nsd]
shikshan 1792 0.0 4.7 54276 48848 ? S 11:36 0:00 [nsd]
shikshan 1797 0.0 4.7 54276 48848 ? S 11:36 0:01 [nsd]
shikshan 1798 0.0 4.7 54276 48848 ? S 11:36 0:00 [nsd]
shikshan 1799 0.0 4.7 54276 48848 ? S 11:36 0:01 [nsd]
shikshan 1800 0.0 4.7 54276 48848 ? S 11:36 0:01 [nsd]
shikshan 1801 0.0 4.7 54276 48848 ? S 11:36 0:00 [nsd]
shikshan 1802 0.0 4.7 54276 48848 ? S 11:36 0:00 [nsd]

Any clues? Also, does using Apache instead of AOLServer improve my situation?

Thanks in advance -
Shankar

11: Re: Appreciate help with dotLRN performance (response to 10)

Posted by Jeff Davis on 10/27/03 08:17 AM

AOLserver is multithreaded so what you are seeing is multiple threads in one process, not processes that fail to exit.

Using Apache might improve your situation immensely, but I doubt it will do so if you intend to run OpenACS.

12: Re: Appreciate help with dotLRN performance (response to 11)

Posted by Shankar Venkatagiri on 10/27/03 10:15 AM

Thanks for the pointer, Jeff. I reported the server's response to ps almost 10 minutes after my last interaction with dotLRN. The same processes linger on even now, five hours after I last posted the previous message. Could this indicate un-exiting processes?

I will go ahead and test dotLRN out with apache. I do not, however, seem to understand the distinction between OpenACS and dotLRN. My bad!

Shankar

13: Re: Appreciate help with dotLRN performance (response to 12)

Posted by Jeff Davis on 10/27/03 11:12 AM

Neither dotLRN nor OpenACS will work under Apache (well, you might be able to fight with mod_aolserver for a few months and get it to run acceptably, but I certainly would not recommend it). I don't really think of dotLRN as being separate from OpenACS; rather, it is a particular install of OpenACS.

You also don't seem to understand the difference between a thread and a process. AOLserver is multithreaded: it creates threads within the server process to handle requests, and typically those threads do not go away until the server process exits. ps on Linux has an annoying habit (not sure whether you would call it a feature or a bug) of displaying threads as if they were processes. These threads don't take any memory beyond what the server already uses; note how they are all listed with the same size and were all created at the same time. That's because it's all just the same server process.
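The distinction Jeff describes can be seen with a small Python sketch (an illustration only, not AOLserver code): it starts several threads inside one process and records the process ID and the OS-level thread ID each one observes. All threads report the same PID but distinct thread IDs. On current Linux systems (NPTL threads), ps shows such a process as a single line unless asked for threads, for example with ps -L; the per-thread listings in this thread most likely come from the older LinuxThreads implementation, where each thread had its own PID.

```python
import os
import threading

def observe_threads(n: int = 4):
    """Start n threads; return the (pid, native thread id) pair each one sees."""
    # The barrier keeps every thread alive until all have recorded their IDs,
    # so the OS cannot reuse a thread ID within one run.
    barrier = threading.Barrier(n)
    lock = threading.Lock()
    seen = []

    def worker():
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

if __name__ == "__main__":
    samples = observe_threads()
    pids = {pid for pid, _ in samples}
    tids = {tid for _, tid in samples}
    print(f"{len(samples)} threads, {len(pids)} process id(s), {len(tids)} thread id(s)")
```

Requires Python 3.8 or later for threading.get_native_id().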

14: Re: Appreciate help with dotLRN performance (response to 1)

Posted by Shankar Venkatagiri on 10/27/03 01:04 PM

Thanks for the clarification. Looks like the problem is elsewhere.

Shankar

15: Re: Appreciate help with dotLRN performance (response to 1)

Posted by Roberto Mello on 10/27/03 03:27 PM

Using the tree-view option of ps will help. It'll show the threads of the main process. Try "ps fax":

  922 ?        S      0:02 /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolserver/lbn.
  923 ?        S      0:01  \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolserver/
  924 ?        S      0:00      \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolser
  925 ?        S      0:11      \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolser
  933 ?        S      0:06      \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolser
 1311 ?        S      0:00      \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolser
28309 ?        S      0:01      \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolser
28311 ?        S      0:01      \_ /usr/local/lib/aolserver/bin/nsd -u nsadmin -g www-data -t /usr/local/stow/aolserver_local/etc/aolser...

-Roberto
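The indentation in ps fax comes from each process's parent PID. A minimal Python sketch of that rendering, using hypothetical (pid, ppid, command) rows that mirror the shape of Roberto's listing (the parent PIDs are assumed, since ps fax does not print them):

```python
from collections import defaultdict

def render_tree(procs):
    """Render (pid, ppid, command) rows as an indented forest, similar to ps fax."""
    known = {pid for pid, _, _ in procs}
    children = defaultdict(list)
    for pid, ppid, cmd in procs:
        children[ppid].append((pid, cmd))

    lines = []

    def walk(pid, cmd, depth):
        # Children get a "\_" marker, indented four columns per extra level.
        marker = ("    " * (depth - 1) + " \\_ ") if depth else ""
        lines.append(f"{pid:>6} {marker}{cmd}")
        for child_pid, child_cmd in sorted(children[pid]):
            walk(child_pid, child_cmd, depth + 1)

    # Roots are rows whose parent is not in the list (e.g. init).
    for pid, cmd in sorted((pid, cmd) for pid, ppid, cmd in procs if ppid not in known):
        walk(pid, cmd, 0)
    return lines

# Hypothetical parent/child relationships shaped like the listing above.
sample = [
    (922, 1, "nsd"),
    (923, 922, "nsd"),
    (924, 923, "nsd"),
    (925, 923, "nsd"),
]
for line in render_tree(sample):
    print(line)
```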

16: Re: Appreciate help with dotLRN performance (response to 1)

Posted by Shankar Venkatagiri on 10/27/03 08:12 PM

I found the sucker: it's the survey indeed. I copied the survey's contents and then deleted it, and now I have sweet dotLRN back up and running wonderfully, no longer consuming 99% of the CPU cycles as before. This suggests the survey module's interaction with the rest of the database needs a serious rethink. The fault is definitely with the population of the DB.

Shankar

17: Re: Appreciate help with dotLRN performance (response to 16)

Posted by Andrew Piskorski on 10/28/03 10:21 PM

So, just what query or page in the survey module do you say was causing the trouble? This might be quite easy to track down if you have the Developer Support package installed and turned on.

As it stands, you've basically said "something somewhere in the survey package sometimes takes a lot more CPU than I think it should", which is not especially useful as a bug report.
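Developer Support does this bookkeeping inside OpenACS. As a rough illustration of the idea only (Python with an in-memory SQLite database, hypothetical table and query names), timing each query a page runs and sorting by duration shows which one to look at first:

```python
import sqlite3
import time

def time_queries(conn, queries):
    """Run each named query and return (name, seconds) pairs, slowest first."""
    timings = []
    for name, sql in queries.items():
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch every row so the full cost is measured
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)

# Hypothetical data standing in for a busy survey.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, respondent TEXT)")
conn.executemany("INSERT INTO responses (respondent) VALUES (?)",
                 [(f"user{i % 500}",) for i in range(20000)])

# Hypothetical queries standing in for the ones a portlet page might run.
report = time_queries(conn, {
    "count_all": "SELECT count(*) FROM responses",
    "self_join": "SELECT count(*) FROM responses a "
                 "JOIN responses b ON a.respondent = b.respondent",
})
for name, seconds in report:
    print(f"{name:>10}: {seconds * 1000:.1f} ms")
```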