Forum .LRN Q&A: Help Needed in Setting up .LRN to Scale
With this post I wanted to solicit any scaling experience that might have been gathered from other big installations of .LRN. We are of course looking at tuning the OpenACS datamodel and Tcl code if necessary, but we're also hoping that there is room for improvement in the OS/Oracle/AOLserver configuration. The Heidelberg setup and configuration currently more or less reflects the OpenACS documentation.
Thanks in advance!
Does 'top' on the webserver machines show AOLserver chewing a bunch of CPU? If so, looking into caching things could be a win.
Are the webserver machines I/O bound? If so, they might be swapping because too much is being cached.
Similarly, on the db machine, see whether it is CPU or I/O bound. If both sets of machines seem to be relatively idle, then there may be locking issues. Poking around the Oracle data dictionary can show some of that stuff.
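The triage above can be condensed into a quick rule of thumb. A sketch, with made-up sample numbers and a Linux-style us/sy/wa/id column layout (Solaris's vmstat labels differ; the thresholds are illustrative, not canonical):

```shell
# Classify a box from a vmstat-style CPU line: user%, system%,
# I/O-wait%, idle% (sample values invented for illustration).
cat <<'EOF' > /tmp/vmstat_sample.txt
us sy wa id
85 10  2  3
EOF
awk 'NR == 2 {
    if ($1 + $2 > 70)   print "CPU-bound: look into caching"
    else if ($3 > 30)   print "I/O-bound: check swapping and disk layout"
    else                print "mostly idle: suspect lock contention"
}' /tmp/vmstat_sample.txt
```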
You'll find the docs for this in $ORACLE_HOME/rdbms/admin/spdoc.txt
Post the init.ora and the machine's parameters.
Maybe you'll find something helpful in this thread:
https://openacs.org/forums/message-view?message_id=156292
Cato
My take on the economics of the situation: getting a Linux box running AOLserver should cost around 1000 EUR plus 5 hours of setup work, and would give you immediate gratification. Trying to tune OpenACS is most likely more costly *and* more time-consuming. The additional cost of system administration should be minimal, but shouldn't be neglected, I agree.
Here's one thing that makes a big difference. We're regularly analyzing our
tables using the acs-monitoring package. Also, just after doing a new import, I
always analyze the entire schema using
SQL> exec dbms_stats.gather_schema_stats('DBUSER',cascade => true);
Cheers,
Andrew
We have had good experiences with the following setup:
- One server running a reverse proxy (we are using pound)
- One server with an aolserver instance (no openacs) serving "static images" (served from the file system not the content rep)
- One server with the openacs installation
- One db server (pg)
- The proxy uses the url to divide the requests between the image server and the openacs server
All servers are running Linux (RH8). We are currently using only one OpenACS server, but of course that could be several machines as well; the proxy could do the load balancing. Unfortunately, the more servers you have, the more work you will have handling failover problems :)
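The URL-based split described above could be sketched along these lines in a pound configuration (this uses the newer Pound 2.x block syntax; the addresses, ports, and URL pattern are placeholders, so adjust for your version and layout):

```
ListenHTTP
    Address 0.0.0.0
    Port    80

    # Image requests go to the static AOLserver instance.
    Service
        URL "^/images/.*"
        BackEnd
            Address 10.0.0.2   # image server (placeholder)
            Port    8000
        End
    End

    # Everything else goes to the OpenACS server.
    Service
        BackEnd
            Address 10.0.0.3   # OpenACS server (placeholder)
            Port    8000
        End
    End
End
```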
As far as OpenACS itself is concerned, I think we (Vienna Univ of Business Admin) will have the same performance problems in a few weeks. I think a good place to start enhancing dotLRN performance is the portal system. For all community portals (dotlrn_class, dotlrn_class_instance, dotlrn_club, dotlrn_department) except the user portal, I have ripped out the real portal system and created a "static" version where all portlets are called directly with given parameters (i.e. an unchangeable portal layout). The next important thing will be to enhance the real portal system and to cache portlet content.
Now that I've got the rewritten portal system working within OpenACS, my next two goals will be:
1. reintegration with .LRN (of course!)
2. Maximum caching of portal information. In particular, parameters to portlets very, very rarely change, and all the database operations to set up the render call can be cached more-or-less permanently. Since the parameters should only be changed through portal package API calls, caching can be controlled with 100% accuracy unless someone goes out of their way to break the rules, in which case, screw 'em.
"Next important thing will be to enhance the real portal system and to cache portlet content."
Coming up with a useful scheme for this might be tricky, unless we just want to say "cache content for five minutes" or something like that. The problem with any portlet content caching approach implemented by the portal package itself is that portlet content won't match the content you get when you visit the package itself ... very confusing.
On the other hand if we implemented per-application package caching and if portlets share application code properly then they can both be made to render the same content, maybe even coherently if caching's implemented intelligently :)
But one thing is for sure ... portal-level stuff (determining layout, parameters to pass to portlets, etc) can be made to run with zero db hits (after the cache is filled by visitors, of course) without much trouble at all. I've been looking into it ...
</blockquote>
When looking in the CVS repository I found two different portal packages. Is the rewritten portal system the "portal" package in the /contrib directory? I currently use the new-portal package from the dotlrn repository. Is this the current solution?
<blockquote>2. Maximum caching of portal information. In particular parameters to portlets very, very rarely change and all the database operations to set the render call up can be cached more-or-less permanently
</blockquote>
What I found difficult to deal with are portlets that use ns_query... to directly get some kind of user input (like the calendar list view). Do you have any ideas on how to get caching to work with those portlets?
At the moment it does not work with .LRN. Also, its caching is probably less effective than new-portal's at the moment, since I didn't address this problem while reorganizing and rewriting big chunks of it (mostly because, if folks approve my TIP, I'd prefer to use the caching db_* API rather than util_memoize to do the caching).
Mostly I find OF's use of ns_sets for passing stuff around annoying but beyond that haven't looked into what it would take to make specific portlets cache their content. As I said above they really need to coordinate with caching versions of the underlying application if we're going to provide the user consistent views of application content. That's not a short-term fix, obviously, and short-term we may need to kludge things optionally ...
Of course one can ease the pain by minimizing the number of portlets per portal page. Ideal would be one, then you'd have the equivalent of application pages rather than portalled pages! :) OK, I'm being silly, but perhaps this helps make clear that when it comes to content it is really the application's responsibility if we're to present consistent views of content? Portal pages are bad performance-wise because rendering one's the equivalent of rendering index pages for several non-caching applications all at once.
I'm on a one-week Easter vacation now and won't have time to follow up on this thread before the 13th of April. I have reassigned this scalability task to Joel Aufrecht, so you can expect status updates from him on how the work progresses.
The machine on which we experience the performance problems:
hardware:
- a Sun Fire 280R
- 2048 MB of RAM
- 2 x 36 GB disks, 1 x 200 GB RAID
- 1 Fast Ethernet adapter with 1 additional virtual interface
software:
- Solaris 2.8
- WebCT (uses about 128 MB of memory, minimal CPU)
- dotLRN 2.0.1
- Oracle 8i server & db
michael
The biggest question is your disk IO. Just what is that "1 * 200 gb raid" exactly? If it is really a RAID 10 array with 4 or 8 disks or something like that, you might be fine. But I don't think that's what you have, and if my assumptions about your hardware are correct, it would probably only cost a few thousand dollars to buy a brand new and much faster Linux box.
Which isn't to say that your scaling problems are hardware related, they might not be. But when good server hardware is so cheap, it doesn't make sense to even try to run a large, high traffic site on a slow machine.
Because the box has only 2 GB of RAM, and .LRN isn't the only thing running on it, there is over 1 GB of swap in use. It appears that Oracle's SGA resides at least partly in swap (looking at iostat to see lots of swap activity while queries are run in sqlplus). This, of course, just kills performance.
To make matters worse, most everything is installed out on the disk array, so all of the log files, both nsd and Oracle, are being written to over a single SCSI channel.
My recommendation is to first get some more RAM, at *least* bring it up to 4 GB, and then if possible split things up across multiple disks, either by moving the log files to an internal disk or attaching a second disk array.
I'm pretty sure things will be fine after that, but if not, we'll continue looking into it at that point. I've tried doing a little query tuning but it's a lost cause right now; nothing I do makes any difference.
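Splitting the log I/O off the array can be as simple as moving the directory and leaving a symlink behind, so nsd and Oracle keep writing to the paths they were configured with. A sketch with placeholder paths (kept under /tmp here so it is safe to try):

```shell
# Relocate a log directory to another disk and symlink the old path
# back, so writers keep using the path they know about.
rm -rf /tmp/demo               # clean slate for the demo
OLD=/tmp/demo/web/log          # directory on the overloaded array
NEW=/tmp/demo/disk2/log        # target on a separate spindle/channel
mkdir -p "$OLD" "$(dirname "$NEW")"
echo "entry" > "$OLD/access.log"
mv "$OLD" "$NEW"
ln -s "$NEW" "$OLD"
cat "$OLD/access.log"          # still readable through the old path
```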
Does the installation not scale adequately or not perform adequately?
And how many people are accessing .LRN during regular times and during peak times?
How big is the Oracle SGA?
What exactly doesn't scale or perform? The whole system, some pages?
I have advised them to get the system up to at least 4 GB of RAM and see how we do then. More tuning may be needed at that point, but right now it's impossible to tell.
Al, you may be right that the system is underpowered, but furfly has a pretty busy Oracle-based ACS site on a dual Pentium with excellent performance, so you never know. Each installation seems to be different as far as how much load it puts on a system. I have mentioned to Lars that the system might not scale, but I think it's ok to take a wait and see approach for now.
What is eating up all the RAM?
2 GB is a paltry amount of RAM for a dual processor Sparc anyway; I would want to see 4 GB in that box even if we were only running one site on it.
Also, shut down WebCT for a short test to see if it makes a significant difference.
Can you also post your config.tcl without any sensitive data?
Also, can you post your authentication, kernel and main site parameter settings under /acs-admin/
How many authorities exist for your installation? Does it make a difference if you deactivate your URZ Heidelberg or Extern authority?
Have you ever tried PostgreSQL instead? My installation with 22,000 users is quicker.
We did some testing over the past few days, with the following results:
Intensive testing and monitoring of the computer named "athena"
gave the following results:
a) The I/O usage of the RAID subsystem was about 6 MB/sec for
writing and 18 MB/sec for reading - not really high.
b) Running programs like iostat, vmstat and top has shown that
the highest data rate was caused by the TSM backup process.
In all other cases the data rate was less than 10 percent of
the values mentioned above.
c) Memory usage is about 90 percent; nearly no swap activity
has been observed.
About 1 GB of memory is used by Oracle; the other application,
WebCT, uses about 200 MB. (For comparison: our very active Oracle
server was recently upgraded from 1 GB to 2 GB and performs
pretty well!!!)
d) CPU usage is below 1 percent - the highest CPU usage was caused
by the testing and monitoring programs mentioned above.
So we conclude that we do not have a performance problem caused by
hardware bottlenecks or by the other application, WebCT.
It looks like we need some tuning of dotLRN and/or Oracle.
---------------------------------------
Based on the system specs Mat sent I think that if we cannot add RAM to this box then we may actually need to reduce the amount of space allocated to Oracle. I have set up statspack and taken a very quick snapshot of loading my own My Space page (and whatever else happened to go on during that time). This is not a very large sample but when I did this for Sloanspace it did help us pinpoint problems. One thing it hopefully will tell me is whether we have excess memory and can cut it back.
To be clear, I don't think this is the whole problem but it is certainly a contributing factor. In my opinion we need to get problems like this cleared up before we start tuning the application.
I'm going to go off now and study the report, which may take some time.
---------------------------------------
Because there is so little data in the report, I can't tell a whole lot about what our performance issues might be. But one thing is clear - we've got too much memory allocated to Oracle. The current size of the shared pool is 250,270,105 bytes, and at the moment I took the snapshot we were using 40% of it. That number is supposed to be between 75% and 85% for optimal performance. That, combined with our memory shortage, points to this being a number we should definitely change.
The number of bytes actually in use was 100,108,042, which is 75% of 133,477,389. Unless I hear any objections, I'll shut down the site and Oracle and change the shared pool size to that number. It may not be enough of a change to make much difference, considering we have almost 2 GB of swap being used, but it's the right thing to do in any case.
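The sizing arithmetic above, spelled out as pure arithmetic on the numbers from the statspack snapshot: the in-use figure is 40% of the current pool, and the new pool is sized so that figure becomes 75% utilization.

```shell
# in-use bytes = 40% of the current 250,270,105-byte shared pool;
# target size = in-use bytes / 0.75 so utilization lands at 75%.
awk 'BEGIN {
    current = 250270105
    in_use  = current * 0.40
    target  = in_use / 0.75
    printf "in use: %d bytes, new shared pool: %d bytes\n", in_use, target
}'
```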
This is not necessarily the only change we'll want to make to the Oracle configuration, but the site needs to run a bit so I can take another snapshot with some better numbers in it. I think that the sort_area is probably too small, and the db_block_buffers might be too large, but I don't want to change them without some data to back it up. However, I think that even when all the tuning is done, we're still going to need more RAM for this system.
After I make the change to the shared pool size, the next step will be to start looking at the application. I am assuming that you want me to do this, and not just stick to Oracle tuning - let me know if that is not right.
I will wait about 15 minutes for objections and then make this change.
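For reference, the change itself amounts to a one-line edit in init.ora (the parameter name is standard; the instance has to be bounced for it to take effect):

```
# init.ora -- shrink the shared pool so in-use bytes sit at ~75%
shared_pool_size = 133477389
```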
---------------------------------------
Ok, change has been made. Some stats:
With both Oracle and nsd shut down:
Memory: 2048M real, 1283M free, 675M swap in use, 4912M swap free
With Oracle running and nsd shut down:
Memory: 2048M real, 337M free, 1638M swap in use, 3948M swap free
With both running, after nsd had finished initializing:
Memory: 2048M real, 266M free, 1721M swap in use, 3865M swap free
So basically, there is a limit to what we can do here because the system is still using swap even with everything we are running on the box turned off! That might clear up with a reboot, but I expect it would happen again over time.
I will revisit this issue when I have more statspack data to work with but I think it's clear we aren't going to win this one without more RAM. Time to look at the application and see if there's anything we can do there.
---------------------------------------
I have examined several queries in detail, but no silver bullet has been found so far. The only thing that jumps out at me is that it has been a while since tables were last analyzed:
SQL> select last_analyzed from user_tables where table_name = 'ACS_OBJECT_TYPES';
LAST_ANALY
----------
2004-02-10
It would be a good idea to do this weekly, if not more often.
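A sketch of automating this weekly; the crontab schedule, script path, and credential handling below are placeholders (it assumes sqlplus is on the PATH and reuses the DBUSER call posted earlier in the thread):

```
# crontab entry: regather optimizer statistics every Sunday at 04:00
# 0 4 * * 0 /usr/local/bin/analyze-dotlrn.sh

# analyze-dotlrn.sh
sqlplus -s dbuser/password <<'EOF'
exec dbms_stats.gather_schema_stats('DBUSER', cascade => true);
exit
EOF
```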
#1 - the dotlrn_users query in /dotlrn/admin/users
This query is *horribly* slow and does three full table scans. Unfortunately, none of my usual tricks worked to eliminate the scans.
#2 - the call to dotlrn_community_admin_p is the culprit here. Again, I was unable (so far) to make it run any faster.
However.... I have not given up, and I will continue working on this on Monday (possibly some on Saturday if I have time). It took a while to hit pay dirt on Sloanspace too; unfortunately (or fortunately, depending on your point of view) this installation doesn't have the Oracle misconfiguration that turned out to be responsible for a lot of our troubles on Sloanspace.
---------------------------------------
I have been thinking about this all weekend, and I kept coming back to the fact that the system is not heavily loaded, yet performance is poor. A situation that can be helped by tuning queries generally exhibits other signs of stress - high system load and Oracle processes using lots of CPU time. Not so here.
I asked Mike to take a look; he ran various OS tools looking at performance while I loaded the /dotlrn/admin/users page over and over. Mike believes he has found a potential problem. Here is what he wrote up for me, and I will comment further after:
"This looks like a disk I/O based performance problem.
The device to pay attention to is sd30 -- an external SCSI-attached disk array.
iostat shows that a large amount of disk I/O results when the page is loaded; kps is total traffic in kilobytes per second, tps is total transactions per second, and serv is service time (disk seek time) in milliseconds.
The disk service time is fine which tells us the disk array is not overloaded and the time to seek from the disk is reasonably speedy.
The ratio between the kps and tps tells us about file sizes -- in this case it looks like a lot of large files are being transferred when the page loads.
This looks to be a case where disk I/O bandwidth isn't sufficient for the query; multiple spindles are needed and the load should be divided between multiple disks (for example, sd30 has both /web and /ora8 which means the same disk is being hit to read from Oracle, write web access logs and transaction logs, as well as reading the html)."
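Mike's kps/tps rule can be checked with a one-liner. The sd30 numbers below are hypothetical, but with data of this shape the average transfer size is simply kps divided by tps:

```shell
# Hypothetical iostat-style sample for the sd30 array; kps = KB/s,
# tps = transfers/s. Average transfer size = kps / tps: large values
# suggest big sequential reads (e.g. full table scans), small values
# suggest random I/O.
cat <<'EOF' > /tmp/iostat_sample.txt
device  kps   tps  serv
sd30    18432 144  12
EOF
awk 'NR > 1 { printf "%s avg transfer: %.0f KB\n", $1, $2 / $3 }' /tmp/iostat_sample.txt
```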
Mike didn't see any signs of swapping going on during our tests.
Here's my version: a lot of data is going back and forth between the system and that disk array. Data gets read from Oracle tables, and intermediate results get written to the temporary tablespace. Redo, rollback and archive logs are written to. The nsd error and access logs are also written to. It appears that there is just so much data going through that one connection to the disk array that we're experiencing a traffic jam.
Now, it seems a bit odd to me that Oracle is doing this much disk access... I would have expected it and nsd to both keep this data in memory, especially as I reload the same page over and over again. I don't know off the top of my head how to tell how much of the database Oracle has got in memory; that will be tomorrow's research project, along with looking at another statspack report.
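One way to approach tomorrow's research question, assuming DBA access via sqlplus (these are the standard v$ views; exact component names in v$sgastat vary a bit by Oracle version, so treat this as a sketch):

```sql
-- How the SGA is carved up (sizes in MB):
select name, round(bytes / 1024 / 1024) as mb
  from v$sgastat
 order by bytes desc;

-- Classic buffer cache hit ratio; values well below ~0.90 suggest
-- the cache is too small (or is being paged out to swap, as suspected).
select 1 - (phy.value / (cur.value + con.value)) as cache_hit_ratio
  from v$sysstat phy, v$sysstat cur, v$sysstat con
 where phy.name = 'physical reads'
   and cur.name = 'db block gets'
   and con.name = 'consistent gets';
```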
I'm not sure what to recommend as a course of action to fix this, assuming we end up agreeing that this is the problem, because I don't know what our options are. Do we have any other systems available which might be more suitable?
---------------------------------------
One thing that bothers me about this forming hypothesis is that we don't see any swap activity during page loads. It seems that we should, if we're going to blame the site's slowness on a disk I/O bottleneck. So I took the query from the /dotlrn/admin/users page and ran it in sqlplus, running iostat at the same time to monitor disk activity. This time I saw *lots* of disk activity on the swap device.
So what does this tell us? For one thing, I think it confirms the theory that the memory Oracle is using resides in the swap partition and not in RAM. That's a guaranteed performance killer, so we definitely have to fix that. It also tells us that some caching is happening somewhere, because when I load that page and the same query executes, there is very little swap activity. Unfortunately this doesn't explain why the page load is so slow anyway... the cache may also be out in the swap partition but that doesn't fully explain it.
At this point I believe that if we could bump the RAM in this system up to at least 4 GB it would help considerably. Mike also feels that there is too much disk activity going to one place - all those log files (nsd and Oracle) should be split up between at least two disks, preferably on separate channels.
In my opinion, it doesn't make sense to continue tuning queries or looking at the finer points of the Oracle installation until the hardware is adequate to support the site; as I saw on Friday, those efforts are unlikely to result in any improvement.
---------------------------------------
Matthias, I'm not sure I understand the question, so let me just state clearly what I think we need to do.
First, if we are going to remain on this system we need more RAM. The system needs to have at least 4 GB (total) just to stop it from using any swap space, and it would be better if we had an extra GB or two (meaning 5 or 6 total) to have room for growth. If we have enough RAM, then everything that is supposed to be loaded into RAM, like Oracle's working area, will be and performance will be much improved.
At that point it's possible that things will be running well enough that the external disk array will no longer be a problem. If it is still a problem, then we will need either access to a second external array, so we can split up the log files, or (even better) an internal disk added to the system.
At this time there is no need for a high performance system, just a few more resources allocated to this one.
---------------------------------------
Here are the results of my experiment. I took snapshots via the top command at each step.
before:
Memory: 2048M real, 301M free, 1563M swap in use, 4023M swap free
nsd shut down:
Memory: 2048M real, 441M free, 1422M swap in use, 4165M swap free
Oracle shut down:
Memory: 2048M real, 1372M free, 459M swap in use, 5130M swap free
WebCT shut down:
Memory: 2048M real, 1413M free, 383M swap in use, 5207M swap free
At this point nothing is running but Solaris, so this is a baseline state. It's possible that a bit more memory would be available if we could reboot, but this looks pretty normal to me.
on the way back up:
Oracle started up:
Memory: 2048M real, 457M free, 1339M swap in use, 4248M swap free
nsd started up:
Memory: 2048M real, 430M free, 1363M swap in use, 4225M swap free
after site has come up all the way and a few pages loaded:
Memory: 2048M real, 275M free, 1480M swap in use, 4107M swap free
WebCT started up:
Memory: 2048M real, 266M free, 1512M swap in use, 4075M swap free
Conclusion:
Oracle grabbed 1915M of RAM, considerably more than was available, so even when it was the only thing running it caused the system to go into swap. It is the major resource hog here. WebCT used a small amount of memory so, at least as far as RAM goes, its presence is not making a significant difference to system performance.
As you might expect, the site ran no faster with WebCT shut down, because the system was basically just as far into swap as it was when I started.
I am still convinced that adding RAM (at least 2 GB) is the most important thing we can do to improve the situation.
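The ~1915M figure can be reproduced from the snapshots above: Oracle's footprint is the drop in free RAM plus the growth in swap-in-use, measured against the everything-stopped baseline (top's rounding accounts for the small discrepancy):

```shell
# Baseline (everything shut down) vs. Oracle started, from the top
# snapshots above; all figures in MB.
awk 'BEGIN {
    free_base = 1413; swap_base = 383    # baseline
    free_ora  = 457;  swap_ora  = 1339   # Oracle running
    printf "Oracle footprint: ~%dM\n", (free_base - free_ora) + (swap_ora - swap_base)
}'
```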
---------------------------------------
I forgot to mention one thing - I can make Oracle require less RAM, but I probably can't get it down small enough. And even if I could, it would only work for a short time; Oracle performs best when it is able to load the entire data set into RAM, and if it has a minimal amount of space to work with it will lose the ability to do that as your users add content. So performance would fall off quickly at some point in the not-too-distant future. It is really better to fix this properly now.
---------------------------------------
Ok, that's the trail so far. Comments?
Again, can you kindly post the following info:
What is the request info from developer support (ds) for simply logging into the system?
Does the above make a significant difference if webct is shut down?
How are the config.tcl settings?
What are the authentication, kernel and main site parameter settings under /acs-admin/?
How many authorities exist for your installation? Does it make a difference if you deactivate your URZ Heidelberg or Extern authority?
In particular, we do not currently have PermissionCacheP turned on, but in that thread you reported having some trouble with it. Since this is a production system, I don't want to turn it on if users might encounter errors. Is it working cleanly now?
Request Information
Main Site : Developer Support : Request Information
Parameters
Request Start Time:
2004-04-08 17:59:39
Request Completion Time:
2004-04-08 17:59:41
Request Duration:
2215 ms
IP:
18.170.5.196
Method:
GET
URL:
/dotlrn/index
Query:
(empty)
Request Processor
+49.4 ms: Applied transformation from /web/product/www / dotlrn/index -> ? - 7.5 ms
+63.3 ms: Served file /web/product/packages/dotlrn/www/index.adp with adp_parse_ad_conn_file - 2146.0 ms
+2211.2 ms: Applied GET filter: (for /dotlrn/index ds_trace_filter) - 10.2 ms
returned filter_ok
show RP debugging information
Comments
rp_handler: trying rp_serve_abstract_file /web/product/www / dotlrn/index
rp_handler: not found
rp_handler: trying rp_serve_abstract_file /web/product/packages/dotlrn/www / index
Headers
Host:
athena2.uni-heidelberg.de
Accept:
*/*
Accept-Language:
en
Pragma:
no-cache
Connection:
Keep-Alive
Referer:
http://athena2.uni-heidelberg.de/register/?blocale=en%5fUS&return%5furl=%2fdotlrn%2findex
User-Agent:
Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)
UA-OS:
MacOS
UA-CPU:
PPC
Cookie:
ad_session_id=45199%2c110325%2c1%20%7b31%2010812456245179%20E459FB42B0DD8685A1F56F45645651CB19A532BA6BBDC8%7d; ad_user_login=110325%2c145636226%2cC4D30F674%20%7b531%201081468779%2071A84256603EB939973A0131D86AFB873DBA547E82B%7d
Extension:
Security/Remote-Passphrase
Output Headers
Expires:
Thu, 08 Apr 2004 15:59:41 GMT
Pragma:
no-cache
Cache-Control:
no-cache
Content-Type:
text/html; charset=utf-8
MIME-Version:
1.0
Date:
Thu, 08 Apr 2004 15:59:41 GMT
Server:
AOLserver/3.3.1+ad13
Content-Length:
18207
Connection:
close
Database Requests
Duration
Pool
Command
1 ms
pool2
gethandle (returned nsdb0)
4 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
3 ms
pool2
dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0
select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = :user_id)
4 ms
pool2
dbqd.dotlrn.tcl.community-procs.dotlrn_community::get_all_communities_by_user.select_communities_by_user: select nsdb0
select dotlrn_communities_full.*
from dotlrn_communities_full,
dotlrn_member_rels_approved
where dotlrn_communities_full.community_id = dotlrn_member_rels_approved.community_id
and dotlrn_member_rels_approved.user_id = :user_id
1 ms
pool2
getrow nsdb0
6 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
3 ms
pool2
dbqd.dotlrn.tcl.dotlrn-procs.dotlrn::get_portal_id_not_cached.select_user_portal_id: 0or1row nsdb0
select portal_id
from dotlrn_users
where user_id = :user_id
4 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
4 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
4 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::render.portal_select: 0or1row nsdb0
select portals.name,
portals.portal_id,
portals.theme_id,
portal_layouts.layout_id,
portal_layouts.filename as layout_filename,
portal_pages.page_id
from portals,
portal_pages,
portal_layouts
where portal_pages.sort_key = :sort_key
and portal_pages.portal_id = :portal_id
and portal_pages.portal_id = portals.portal_id
and portal_pages.layout_id = portal_layouts.layout_id
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::render.element_select: select nsdb0
select portal_element_map.element_id,
portal_element_map.region,
portal_element_map.sort_key
from portal_element_map,
portal_pages
where portal_pages.portal_id = :portal_id
and portal_element_map.page_id = :page_id
and portal_element_map.page_id = portal_pages.page_id
and portal_element_map.state != 'hidden'
order by portal_element_map.region,
portal_element_map.sort_key
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
9 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::evaluate_element.element_select: 0or1row nsdb0
select pem.element_id,
pem.datasource_id,
pem.state,
pet.filename as filename,
pet.resource_dir as resource_dir,
pem.pretty_name as pretty_name,
pd.name as ds_name
from portal_element_map pem,
portal_element_themes pet,
portal_datasources pd
where pet.theme_id = :theme_id
and pem.element_id = :element_id
and pem.datasource_id = pd.datasource_id
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::element_params_not_cached.params_select: select nsdb0
select key,
value
from portal_element_parameters
where element_id = :element_id
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
6 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
5 ms
pool2
dbqd.dotlrn.www.dotlrn-main-portlet.select_communities: select nsdb0
select dotlrn_communities_all.*,
dotlrn_community.url(dotlrn_communities_all.community_id) as url,
decode(dotlrn_communities_all.community_type, 'dotlrn_community', 'dotlrn_community',
'dotlrn_club', 'dotlrn_club',
'dotlrn_class_instance') as simple_community_type,
decode(dotlrn_community_admin_p(dotlrn_communities_all.community_id, dotlrn_member_rels_approved.user_id),'f',0,1) as admin_p,
tree.tree_level(dotlrn_communities_all.tree_sortkey) as tree_level,
nvl((select tree.tree_level(dotlrn_community_types.tree_sortkey)
from dotlrn_community_types
where dotlrn_community_types.community_type = dotlrn_communities_all.community_type), 0) as community_type_level
from dotlrn_communities_all,
dotlrn_member_rels_approved
where dotlrn_communities_all.community_id = dotlrn_member_rels_approved.community_id
and dotlrn_member_rels_approved.user_id = :user_id
order by dotlrn_communities_all.tree_sortkey
1 ms
pool2
getrow nsdb0
4 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::evaluate_element.element_select: 0or1row nsdb0
select pem.element_id,
pem.datasource_id,
pem.state,
pet.filename as filename,
pet.resource_dir as resource_dir,
pem.pretty_name as pretty_name,
pd.name as ds_name
from portal_element_map pem,
portal_element_themes pet,
portal_datasources pd
where pet.theme_id = :theme_id
and pem.element_id = :element_id
and pem.datasource_id = pd.datasource_id
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::element_params_not_cached.params_select: select nsdb0
select key,
value
from portal_element_parameters
where element_id = :element_id
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
6 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
4 ms
pool2
dbqd.forums-portlet.www.forums-portlet.select_forums: select nsdb0
select forums_forums.package_id,
acs_object.name(apm_package.parent_id(forums_forums.package_id)) as parent_name,
(select site_node.url(site_nodes.node_id)
from site_nodes
where site_nodes.object_id = forums_forums.package_id) as url,
forums_forums.forum_id,
forums_forums.name,
case when last_modified > (sysdate - 1) then 't' else 'f' end as new_p
from forums_forums_enabled forums_forums,
acs_objects
where acs_objects.object_id = forums_forums.forum_id and
forums_forums.package_id in (0)
order by parent_name,
forums_forums.name
1 ms
pool2
getrow nsdb0
4 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::evaluate_element.element_select: 0or1row nsdb0
select pem.element_id,
pem.datasource_id,
pem.state,
pet.filename as filename,
pet.resource_dir as resource_dir,
pem.pretty_name as pretty_name,
pd.name as ds_name
from portal_element_map pem,
portal_element_themes pet,
portal_datasources pd
where pet.theme_id = :theme_id
and pem.element_id = :element_id
and pem.datasource_id = pd.datasource_id
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::element_params_not_cached.params_select: select nsdb0
select key,
value
from portal_element_parameters
where element_id = :element_id
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
4 ms
pool2
dbqd.faq-portlet.www.faq-portlet.select_faqs: select nsdb0
select acs_objects.context_id as package_id,
acs_object.name(apm_package.parent_id(acs_objects.context_id)) as parent_name,
(select site_node.url(site_nodes.node_id)
from site_nodes
where site_nodes.object_id = acs_objects.context_id) as url,
faqs.faq_id,
faqs.faq_name
from faqs,
acs_objects
where faqs.faq_id = acs_objects.object_id
and faqs.disabled_p <> 't'
and acs_objects.context_id in (0)
order by lower(faq_name)
1 ms
pool2
getrow nsdb0
4 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::evaluate_element.element_select: 0or1row nsdb0
select pem.element_id,
pem.datasource_id,
pem.state,
pet.filename as filename,
pet.resource_dir as resource_dir,
pem.pretty_name as pretty_name,
pd.name as ds_name
from portal_element_map pem,
portal_element_themes pet,
portal_datasources pd
where pet.theme_id = :theme_id
and pem.element_id = :element_id
and pem.datasource_id = pd.datasource_id
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::element_params_not_cached.params_select: select nsdb0
select key,
value
from portal_element_parameters
where element_id = :element_id
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
10 ms
pool2
dbqd.news-portlet.www.news-portlet.select_news_items: select nsdb0
select news_items_approved.package_id,
acs_object.name(apm_package.parent_id(news_items_approved.package_id)) as parent_name,
(select site_node.url(site_nodes.node_id)
from site_nodes
where site_nodes.object_id = news_items_approved.package_id) as url,
news_items_approved.item_id,
news_items_approved.publish_title,
to_char(news_items_approved.publish_date, 'YYYY-MM-DD HH24:MI:SS') as publish_date_ansi
from news_items_approved
where news_items_approved.publish_date < sysdate
and (news_items_approved.archive_date >= sysdate or news_items_approved.archive_date is null)
and news_items_approved.package_id in (0)
order by parent_name,
news_items_approved.publish_date desc,
news_items_approved.publish_title
2 ms
pool2
getrow nsdb0
5 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::evaluate_element.element_select: 0or1row nsdb0
select pem.element_id,
pem.datasource_id,
pem.state,
pet.filename as filename,
pet.resource_dir as resource_dir,
pem.pretty_name as pretty_name,
pd.name as ds_name
from portal_element_map pem,
portal_element_themes pet,
portal_datasources pd
where pet.theme_id = :theme_id
and pem.element_id = :element_id
and pem.datasource_id = pd.datasource_id
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::element_params_not_cached.params_select: select nsdb0
select key,
value
from portal_element_parameters
where element_id = :element_id
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
7 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
3 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
6 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
73 ms
pool2
dbqd.calendar.www.view-one-day-display.select_day_items: select nsdb0
select nvl(e.name, a.name) as name,
nvl(e.status_summary, a.status_summary) as status_summary,
e.event_id as item_id,
(select type from cal_item_types where item_type_id= ci.item_type_id) as item_type,
cals.calendar_id,
cals.calendar_name
from acs_activities a,
acs_events e,
timespans s,
time_intervals t,
cal_items ci,
calendars cals
where e.timespan_id = s.timespan_id
and s.interval_id = t.interval_id
and e.activity_id = a.activity_id
and start_date between
to_date(:current_date_system,:ansi_date_format) and
(to_date(:current_date_system,:ansi_date_format) + (24 - 1/3600)/24)
and ci.cal_item_id = e.event_id
and to_char(start_date, 'HH24:MI') = '00:00'
and to_char(end_date, 'HH24:MI') = '00:00'
and cals.calendar_id = ci.on_which_calendar
and e.event_id = ci.cal_item_id
and on_which_calendar in (110394) and (cals.private_p='f' or (cals.private_p='t' and cals.owner_id= :user_id))
1 ms
pool2
getrow nsdb0
205 ms
pool2
dbqd.calendar.www.view-one-day-display.select_day_items_with_time: select nsdb0
select to_char(start_date, :ansi_date_format) as ansi_start_date,
to_char(end_date, :ansi_date_format) as ansi_end_date,
nvl(e.name, a.name) as name,
nvl(e.status_summary, a.status_summary) as status_summary,
e.event_id as item_id,
(select type from cal_item_types where item_type_id= ci.item_type_id) as item_type,
cals.calendar_id,
cals.calendar_name
from acs_activities a,
acs_events e,
timespans s,
time_intervals t,
cal_items ci,
calendars cals
where e.timespan_id = s.timespan_id
and s.interval_id = t.interval_id
and e.activity_id = a.activity_id
and start_date between
to_date(:current_date_system,:ansi_date_format) and
(to_date(:current_date_system,:ansi_date_format) + (:end_display_hour - 1/3600)/:end_display_hour)
and ci.cal_item_id = e.event_id
and (to_char(start_date, 'HH24:MI') <> '00:00' or
to_char(end_date, 'HH24:MI') <> '00:00')
and cals.calendar_id = ci.on_which_calendar
and e.event_id = ci.cal_item_id
and on_which_calendar in (110394) and (cals.private_p='f' or (cals.private_p='t' and cals.owner_id= :user_id))
order by to_char(start_date,'HH24')
1 ms
pool2
getrow nsdb0
3 ms
pool2
dbqd.acs-lang.tcl.locale-procs.lang::user::timezone_no_cache.select_user_timezone: 0or1row nsdb0
select timezone
from user_preferences
where user_id = :user_id
4 ms
pool2
dbqd.calendar.www.view-one-day-display.select_day_info: 0or1row nsdb0
select to_char(to_date(:current_date, 'yyyy-mm-dd'), 'Day, DD Month YYYY')
as day_of_the_week,
to_char((to_date(:current_date, 'yyyy-mm-dd') - 1), 'yyyy-mm-dd')
as yesterday,
to_char((to_date(:current_date, 'yyyy-mm-dd') + 1), 'yyyy-mm-dd')
as tomorrow
from dual
4 ms
pool2
dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0
select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = :user_id)
3 ms
pool2
dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0
select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = :user_id)
3 ms
pool2
dbqd.new-portal.tcl.portal-procs.portal::navbar.list_page_nums_select: select nsdb0
select pretty_name,
sort_key as page_num
from portal_pages
where portal_id = :portal_id
order by sort_key
1 ms
pool2
getrow nsdb0
130 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
1 ms
pool2
getrow nsdb0
53 ms
pool2
dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0
select 1
from dual
where 't' = acs_permission.permission_p(:object_id, :party_id, :privilege)
5 ms
pool2
dbqd.curriculum.tcl.misc-procs.curriculum::enabled_elements.element_ns_set_list: select nsdb0
select cee.element_id,
cc.curriculum_id,
cc.name as curriculum_name,
cee.url,
cee.external_p,
cee.name
from (select curriculum_id
from cu_curriculums
where package_id = :package_id
MINUS
select curriculum_id
from cu_user_curriculum_map
where user_id = :user_id
and package_id = :package_id) desired,
workflow_cases cas,
workflow_case_fsm cfsm,
cu_curriculums cc,
cu_elements_enabled cee
where cc.package_id = :package_id
and desired.curriculum_id = cc.curriculum_id
and cc.curriculum_id = cee.curriculum_id
and cas.object_id = cc.curriculum_id
and cfsm.case_id = cas.case_id
and cfsm.current_state = :state_id
order by cc.sort_key,
cee.sort_key
1 ms
pool2
getrow nsdb0
1 ms
pool2
releasehandle nsdb0
704 ms
(total)
Developer Information
3 database commands totalling 19 ms
page served in 203 ms
dotlrn@uni-hd.de
-- Does the above make a significant difference if webct is shut down?
No, Carl tried this a few days ago and saw no difference at all.
-- How are the config.tcl settings?
ns_log notice "nsd.tcl: starting to read config file..."
######################################################################
#
# Instance-specific settings
# These default settings will only work in limited circumstances
# Two servers with default settings cannot run on the same host
#
######################################################################
#---------------------------------------------------------------------
# change to 80 and 443 for production use
set httpport 80
set httpsport 443
# The hostname and address should be set to actual values.
#set hostname [ns_info hostname]
set hostname athena2.uni-heidelberg.de
#set address [ns_info address]
set address 129.206.100.143
set server "product"
set servername "Athena - dotLRN UNI HD"
set serverroot "/web/${server}"
#---------------------------------------------------------------------
# which database do you want? postgres or oracle
set database oracle
set db_name $server
if { $database == "oracle" } {
set db_password "itsgonenow"
} else {
set db_host localhost
set db_port ""
set db_user $server
}
#---------------------------------------------------------------------
# if debug is false, all debugging will be turned off
set debug false
set homedir /usr/local/aolserver
set bindir [file dirname [ns_info nsd]]
#---------------------------------------------------------------------
# which modules should be loaded? Missing modules break the server, so
# don't uncomment modules unless they have been installed.
ns_section ns/server/${server}/modules
ns_param nssock ${bindir}/nssock.so
ns_param nslog ${bindir}/nslog.so
ns_param nssha1 ${bindir}/nssha1.so
ns_param nscache ${bindir}/nscache.so
ns_param nsrewrite ${bindir}/nsrewrite.so
#---------------------------------------------------------------------
# nsopenssl will fail unless the cert files are present as specified
# later in this file, so it's disabled by default
ns_param nsopenssl ${bindir}/nsopenssl.so
# Full Text Search
#ns_param nsfts ${bindir}/nsfts.so
# PAM authentication
ns_param nspam ${bindir}/nspam.so
# LDAP authentication
#ns_param nsldap ${bindir}/nsldap.so
# These modules aren't used in standard OpenACS installs
#ns_param nsperm ${bindir}/nsperm.so
#ns_param nscgi ${bindir}/nscgi.so
#ns_param nsjava ${bindir}/libnsjava.so
if { [ns_info version] >= 4 } {
# Required for AOLserver 4.x
ns_param nsdb ${bindir}/nsdb.so
} else {
# Required for AOLserver 3.x
ns_param libtdom ${bindir}/libtdom.so
}
#---------------------------------------------------------------------
#
# Rollout email support
#
# These procs help manage differing email behavior on
# dev/staging/production.
#
#---------------------------------------------------------------------
ns_section ns/server/${server}/acs/acs-rollout-support
# EmailDeliveryMode can be:
# default: Email messages are sent in the usual manner.
# log: Email messages are written to the server's error log.
# redirect: Email messages are redirected to the addresses specified
# by the EmailRedirectTo parameter. If this list is absent
# or empty, email messages are written to the server's error log.
# filter: Email messages are sent in the usual manner if the
# recipient appears in the EmailAllow parameter, otherwise they
# are logged.
#ns_param EmailDeliveryMode redirect
#ns_param EmailRedirectTo somenerd@yourdomain.test,othernerd@yourdomain.test
#ns_param EmailAllow somenerd@yourdomain.test,othernerd@yourdomain.test
######################################################################
#
# End of instance-specific settings
#
# Nothing below this point need be changed in a default install.
#
######################################################################
#---------------------------------------------------------------------
#
# AOLserver's directories. Autoconfigurable.
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Where are your pages going to live ?
#
set pageroot ${serverroot}/www
set directoryfile index.tcl,index.adp,index.html,index.htm
#---------------------------------------------------------------------
# Global server parameters
#---------------------------------------------------------------------
ns_section ns/parameters
ns_param serverlog ${serverroot}/log/error.log
ns_param home $homedir
ns_param maxkeepalive 0
ns_param logroll on
ns_param maxbackup 5
ns_param debug $debug
ns_param HackContentType 1
ns_param URLCharset utf-8
ns_param OutputCharset utf-8
ns_param HttpOpenCharset utf-8
ns_param DefaultCharset utf-8
#---------------------------------------------------------------------
# Thread library (nsthread) parameters
#---------------------------------------------------------------------
ns_section ns/threads
ns_param mutexmeter true ;# measure lock contention
# The per-thread stack size must be a multiple of 8k for AOLserver to run under MacOS X
ns_param stacksize [expr 128 * 8192]
#
# MIME types.
#
# Note: AOLserver already has an exhaustive list of MIME types, but in
# case something is missing you can add it here.
#
ns_section ns/mimetypes
ns_param Default text/plain
ns_param NoExtension text/plain
ns_param .pcd image/x-photo-cd
ns_param .prc application/x-pilot
ns_param .xls application/vnd.ms-excel
ns_param .doc application/vnd.ms-word
#
# Tcl Configuration
#
ns_section ns/server/${server}/tcl
ns_param library ${serverroot}/tcl
ns_param autoclose on
ns_param debug $debug
#---------------------------------------------------------------------
#
# Server-level configuration
#
# There is only one server in AOLserver, but this is helpful when multiple
# servers share the same configuration file. This file assumes that only
# one server is in use so it is set at the top in the "server" Tcl variable
# Other host-specific values are set up above as Tcl variables, too.
#
#---------------------------------------------------------------------
ns_section ns/servers
ns_param $server $servername
#
# Server parameters
#
ns_section ns/server/${server}
ns_param directoryfile $directoryfile
ns_param pageroot $pageroot
ns_param maxconnections 5
ns_param maxdropped 0
ns_param maxthreads 5
ns_param minthreads 5
ns_param threadtimeout 120
ns_param globalstats false ;# Enable built-in statistics
ns_param urlstats false ;# Enable URL statistics
ns_param maxurlstats 1000 ;# Max number of URL's to do stats on
#ns_param directoryadp $pageroot/dirlist.adp ;# Choose one or the other
#ns_param directoryproc _ns_dirlist ;# ...but not both!
#ns_param directorylisting fancy ;# Can be simple or fancy
#
# Special HTTP pages
#
ns_param NotFoundResponse "/global/file-not-found.html"
ns_param ServerBusyResponse "/global/busy.html"
ns_param ServerInternalErrorResponse "/global/error.html"
#---------------------------------------------------------------------
#
# ADP (AOLserver Dynamic Page) configuration
#
#---------------------------------------------------------------------
ns_section ns/server/${server}/adp
ns_param map /*.adp ;# Extensions to parse as ADP's
#ns_param map "/*.html" ;# Any extension can be mapped
ns_param enableexpire false ;# Set "Expires: now" on all ADP's
ns_param enabledebug $debug ;# Allow Tclpro debugging with "?debug"
ns_param defaultparser fancy
ns_section ns/server/${server}/adp/parsers
ns_param fancy ".adp"
#---------------------------------------------------------------------
#
# Socket driver module (HTTP) -- nssock
#
#---------------------------------------------------------------------
ns_section ns/server/${server}/module/nssock
ns_param timeout 120
ns_param address $address
ns_param hostname $hostname
ns_param port $httpport
#---------------------------------------------------------------------
#
# OpenSSL
#
#---------------------------------------------------------------------
ns_section "ns/server/${server}/module/nsopenssl"
ns_param ModuleDir ${serverroot}/etc/certs
# NSD-driven connections:
ns_param ServerPort $httpsport
ns_param ServerHostname $hostname
ns_param ServerAddress $address
ns_param ServerCertFile certfile.pem
#ns_param ServerCertFile athena2.pem
ns_param ServerKeyFile keyfile.pem
ns_param ServerProtocols "SSLv2, SSLv3, TLSv1"
ns_param ServerCipherSuite "ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP"
ns_param ServerSessionCache false
ns_param ServerSessionCacheID 1
ns_param ServerSessionCacheSize 512
ns_param ServerSessionCacheTimeout 300
#ns_param ServerPeerVerify true
ns_param ServerPeerVerify false
ns_param ServerPeerVerifyDepth 3
ns_param ServerCADir ca
ns_param ServerCAFile ca.pem
ns_param ServerTrace false
# For listening and accepting SSL connections via Tcl/C API:
ns_param SockServerCertFile certfile.pem
#ns_param SockServerCertFile athena2.pem
ns_param SockServerKeyFile keyfile.pem
ns_param SockServerProtocols "SSLv2, SSLv3, TLSv1"
ns_param SockServerCipherSuite "ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP"
ns_param SockServerSessionCache false
ns_param SockServerSessionCacheID 2
ns_param SockServerSessionCacheSize 512
ns_param SockServerSessionCacheTimeout 300
#ns_param SockServerPeerVerify true
ns_param SockServerPeerVerify false
ns_param SockServerPeerVerifyDepth 3
ns_param SockServerCADir internal_ca
ns_param SockServerCAFile internal_ca.pem
ns_param SockServerTrace false
# Outgoing SSL connections
ns_param SockClientCertFile certfile.pem
#ns_param SockClientCertFile athena2.pem
ns_param SockClientKeyFile keyfile.pem
ns_param SockClientProtocols "SSLv2, SSLv3, TLSv1"
ns_param SockClientCipherSuite "ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP"
ns_param SockClientSessionCache false
ns_param SockClientSessionCacheID 3
ns_param SockClientSessionCacheSize 512
ns_param SockClientSessionCacheTimeout 300
ns_param SockClientPeerVerify true
ns_param SockClientPeerVerify false
ns_param SockServerPeerVerifyDepth 3
ns_param SockClientCADir ca
ns_param SockClientCAFile ca.pem
ns_param SockClientTrace false
# OpenSSL library support:
#ns_param RandomFile /some/file
ns_param SeedBytes 1024
#---------------------------------------------------------------------
#
# Database drivers
# The database driver is specified here.
# Make sure you have the driver compiled and put it in {aolserverdir}/bin
#
#---------------------------------------------------------------------
ns_section "ns/db/drivers"
if { $database == "oracle" } {
ns_param ora8 ${bindir}/ora8.so
} else {
ns_param postgres ${bindir}/nspostgres.so ;# Load PostgreSQL driver
}
#
# Database Pools: This is how AOLserver ``talks'' to the RDBMS. You need
# three for OpenACS: main, log, subquery. Make sure to replace ``yourdb''
# and ``yourpassword'' with the actual values for your db name and the
# password for it, if needed.
# AOLserver can have different pools connecting to different databases
# and even different database servers.
#
ns_section ns/db/pools
ns_param pool1 "Pool 1"
ns_param pool2 "Pool 2"
ns_param pool3 "Pool 3"
ns_section ns/db/pool/pool1
ns_param maxidle 1000000000
ns_param maxopen 1000000000
ns_param connections 5
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}
ns_section ns/db/pool/pool2
ns_param maxidle 1000000000
ns_param maxopen 1000000000
ns_param connections 5
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}
ns_section ns/db/pool/pool3
ns_param maxidle 1000000000
ns_param maxopen 1000000000
ns_param connections 5
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}
ns_section ns/server/${server}/db
ns_param pools "*"
ns_param defaultpool pool1
ns_section ns/server/${server}/redirects
ns_param 404 "global/file-not-found.html"
ns_param 403 "global/forbidden.html"
#---------------------------------------------------------------------
#
# Access log -- nslog
#
#---------------------------------------------------------------------
ns_section ns/server/${server}/module/nslog
ns_param debug false
ns_param dev false
ns_param enablehostnamelookup false
ns_param file ${serverroot}/log/${server}.log
ns_param logcombined true
ns_param extendedheaders COOKIE
#ns_param logrefer false
#ns_param loguseragent false
ns_param maxbackup 1000
ns_param rollday *
ns_param rollfmt %Y-%m-%d-%H:%M
ns_param rollhour 0
ns_param rollonsignal true
ns_param rolllog true
#---------------------------------------------------------------------
#
# nsjava - aolserver module that embeds a java virtual machine. Needed to
# support webmail. See http://nsjava.sourceforge.net for further
# details. This may need to be updated for OpenACS4 webmail
#
#---------------------------------------------------------------------
ns_section ns/server/${server}/module/nsjava
ns_param enablejava off ;# Set to on to enable nsjava.
ns_param verbosejvm off ;# Same as command line -debug.
ns_param loglevel Notice
ns_param destroyjvm off ;# Destroy jvm on shutdown.
ns_param disablejitcompiler off
ns_param classpath /usr/local/jdk/jdk118_v1/lib/classes.zip:${bindir}/nsjava.jar:${pageroot}/webmail/java/activation.jar:${pageroot}/webmail/java/mail.jar:${pageroot}/webmail/java
#---------------------------------------------------------------------
#
# CGI interface -- nscgi, if you have legacy stuff. Tcl or ADP files inside
# AOLserver are vastly superior to CGIs. I haven't tested these params but they
# should be right.
#
#---------------------------------------------------------------------
#ns_section "ns/server/${server}/module/nscgi"
# ns_param map "GET /cgi-bin/ /web/$server/cgi-bin"
# ns_param map "POST /cgi-bin/ /web/$server/cgi-bin"
# ns_param Interps CGIinterps
#ns_section "ns/interps/CGIinterps"
# ns_param .pl "/usr/bin/perl"
#---------------------------------------------------------------------
#
# PAM authentication
#
#---------------------------------------------------------------------
ns_section ns/server/${server}/module/nspam
ns_param PamDomain "aolserver"
ns_log notice "nsd.tcl: finished reading config file."
-- What are the authentication, kernel and main site parameter settings under /acs-admin/?
Are there particular values you're interested in? I tried to copy/paste the pages but the values in the edit boxes don't copy. I won't type them all up unless you really want to see them all...
How many authorities exist for your installation? Does it make a difference if you deactivate your URZ Heidelberg or Extern authority?
I haven't tried this; I don't know much about external authentication and it sounds like something I probably can't do during the day. But since virtually every page in the site is slow, this is an unlikely culprit, isn't it?
1. sample request info:
It says 2215 ms for request duration, 2146.0 ms for the /dotlrn/www/index.adp page with a total of 704 ms for database stuff.
Mannheim:
49 database commands totalling 677 ms
page served in 1091 ms
Question: The database stuff seems equivalent. So where is the 1-second loss in Heidelberg?
2. config.tcl
This shouldn't be the problem, since as far as I know you don't have many active connections at present anyway:
Heidelberg:
ns_param maxconnections 5
ns_param maxdropped 0
ns_param maxthreads 5
ns_param minthreads 5
Mannheim:
ns_param maxconnections 100
ns_param maxdropped 0
ns_param maxthreads 50
ns_param minthreads 50
ns_param threadtimeout 3600
Database settings:
Heidelberg (3 x)
ns_param connections 5
Mannheim (3 x)
ns_param connections 10
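For reference, a config.tcl fragment along the lines of the Mannheim settings would look like the sketch below. The values are illustrative, modeled on the numbers quoted above, not a tested recommendation; they need to fit the available RAM.

```tcl
# Hypothetical settings modeled on the Mannheim config; tune to available RAM.
# In ns/server/${server}:
ns_param   maxconnections    100
ns_param   maxdropped        0
ns_param   maxthreads        50
ns_param   minthreads        50
ns_param   threadtimeout     3600
# and in each ns/db/pool/* section:
ns_param   connections       10
```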
3. authentication, kernel and main site parameter settings
I don't know yet. I just wanted to compare all of them. Can you save the HTML pages and send them to me? I will convert them and post them for you (leaving out the Heidelberg-specific info), if you like.
4. authorities
I have seen performance change when multiple authorities were enabled. Just give it a try.
+57.1 ms: Applied transformation from /web/product/www / dotlrn/index -> ? - 7.7 ms
+71.2 ms: Served file /web/product/packages/dotlrn/www/index.adp with adp_parse_ad_conn_file - 2937.4 ms
+3010.5 ms: Applied GET filter: (for /dotlrn/index ds_trace_filter) - 10.5 ms
returned filter_ok
So it's even slower this time, but the time spent in the database is still only 651 ms, 53 ms *less* than last time even though the overall time is roughly 800 ms longer.
The difference is all in serving the file, but there's nothing here to tell us why or how it took so much longer. I imagine that's where the difference is between our numbers and yours, too.
Hmm. Will have to think about this. Of course, if my hypothesis is correct and the system is out of RAM, then *everything* will be running somewhat slowly, so it's still possible that this is the problem, that nsd is just puttering along.
As far as the connections go, ideally those numbers would be increased but I don't think this system can handle any more connections, so I think it best to leave that as-is until we figure out the problem.
I will get back to you on the other questions.
It contains not quite four hours of data so it's not exactly a definitive report, but it's a start.
I only looked it over briefly and didn't make it all the way through (I have a meeting to go to), but the only thing that jumped out at me is that the memory usage in the shared pool is very low. It was 39% when I checked it last week, so I cut the shared pool down by almost half, and utilization has only made it up to 40%. Obviously some more trimming could be done here (80% utilization is ideal), but we're only talking about roughly 130 MB at this point, so it's not enough to make a huge difference.
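As a rough sketch of how shared pool utilization can be checked between statspack snapshots, the standard v$sgastat view reports the free memory directly (run as a DBA user; utilization is 1 minus free/total):

```sql
-- Free memory currently unused in the shared pool.
select pool, name, round(bytes/1024/1024, 1) as mb
  from v$sgastat
 where pool = 'shared pool'
   and name = 'free memory';
```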
I'm not paying a huge amount of attention to the lists of slow SQL because the application is very nearly uniformly slow. We know that performance of .LRN sites is not always this bad, so I think we should be looking for more global problems and not getting bogged down in tuning individual queries (yet).
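For hunting global problems rather than individual queries, the instance-wide wait events are one place to look. A minimal sketch (on 8i/9i the times are in centiseconds; the idle events such as "SQL*Net message from client", "rdbms ipc message", and "pmon timer" should be ignored):

```sql
-- Top wait events since instance startup, heaviest first.
select event, total_waits, time_waited
  from v$system_event
 order by time_waited desc;
```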
If anyone spots anything I missed or has a different interpretation, please let me know!
I cut down the size of the Oracle SGA radically, from roughly 930 MB to about 46 MB. I did this with no regard at all for formulas; I just grabbed the numbers off of one of my Linux boxes, which runs a fairly busy Oracle site.
This is not an entirely fair test, because Solaris systems don't fully recover from having gone into swap without a reboot. But I did improve the memory situation; after nsd has been running a while, things look like this:
Memory: 2048M real, 1041M free, 730M swap in use, 4855M swap free
There's still too much swap in use for my taste, but as I said, that's not going away without rebooting the system.
The good news, which is also the bad news, is that this did not change the site performance one iota. It's no worse than it was, but it's no faster either. So we reclaimed a bunch of memory (though perhaps a bit too drastically), but that didn't help either. Keeping in mind that a reboot still might help us out, it looked like time to move on to other ideas.
I still think that this is a system or database problem, not a site problem, simply because the performance is so uniformly bad. So instead of profiling the application, I took a known slow query (from /dotlrn/admin/users) and ran it in sqlplus, while running a variation on iostat at the same time. I got these results (edited to remove data we don't care about):
athena:/> iostat -xMne 1 60
                        extended device statistics       ---- errors ---
 r/s    w/s  Mr/s Mw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 1.4    2.0   0.1  0.0  0.2  0.1   56.5   19.4   0   2  40   0   0  40 c3t0d0
 0.0    3.0   0.0  0.0  0.0  0.0    0.0    7.8   0   1  40   0   0  40 c3t0d0
 0.0   26.0   0.0  0.2  0.0  0.3    0.0   10.2   0  26  40   0   0  40 c3t0d0
 0.0  107.0   0.0  0.8  0.0  1.0    0.0    9.2   0  98  40   0   0  40 c3t0d0
 0.0  125.0   0.0  1.0  0.0  1.0    0.0    7.7   0  97  40   0   0  40 c3t0d0
 0.0  144.0   0.0  1.1  0.0  1.0    0.0    6.9   0  95  40   0   0  40 c3t0d0
 0.0  139.0   0.0  1.1  0.0  0.9    0.0    6.4   0  90  40   0   0  40 c3t0d0
 0.0  141.0   0.0  1.1  0.0  1.0    0.0    6.7   0  95  40   0   0  40 c3t0d0
 0.0  134.0   0.0  1.0  0.0  1.0    0.0    7.2   0  92  40   0   0  40 c3t0d0
 0.0  149.0   0.0  1.2  0.0  1.0    0.0    6.5   0  97  40   0   0  40 c3t0d0
 0.0  144.0   0.0  1.1  0.0  1.0    0.0    6.8   0  97  40   0   0  40 c3t0d0
 0.0  140.0   0.0  1.1  0.0  1.0    0.0    7.4   0  96  40   0   0  40 c3t0d0
 0.0  147.0   0.0  1.1  0.0  1.0    0.0    6.6   0  97  40   0   0  40 c3t0d0
 0.0  156.0   0.0  1.2  0.0  1.0    0.0    6.2   0  97  40   0   0  40 c3t0d0
 0.0  136.0   0.0  1.1  0.0  1.0    0.0    7.3   0  96  40   0   0  40 c3t0d0
 0.0  108.0   0.0  0.8  0.0  1.0    0.0    9.1   0  98  40   0   0  40 c3t0d0
 0.0   92.0   0.0  0.7  0.0  0.9    0.0    9.4   0  87  40   0   0  40 c3t0d0
 0.0   45.0   0.0  0.4  0.0  0.6    0.0   14.4   0  37  40   0   0  40 c3t0d0
 0.0  108.0   0.0  0.8  0.0  1.0    0.0    9.1   0  98  40   0   0  40 c3t0d0
 0.0  112.0   0.0  0.9  0.0  1.0    0.0    8.7   0  98  40   0   0  40 c3t0d0
 0.0  106.0   0.0  0.8  0.0  1.0    0.0    9.7   0  98  40   0   0  40 c3t0d0
 0.0  108.0   0.0  0.8  0.0  1.0    0.0    8.9   0  97  40   0   0  40 c3t0d0
 0.0  109.0   0.0  0.9  0.0  1.0    0.0    9.0   0  98  40   0   0  40 c3t0d0
 0.0  111.0   0.0  0.9  0.0  1.0    0.0    9.3   0  98  40   0   0  40 c3t0d0
 0.0   44.0   0.0  0.3  0.0  0.4    0.0    8.8   0  39  40   0   0  40 c3t0d0
 0.0    0.0   0.0  0.0  0.0  0.0    0.0    0.0   0   0  40   0   0  40 c3t0d0
 0.0    3.0   0.0  0.0  0.0  0.0    0.0   12.3   0   2  40   0   0  40 c3t0d0
 0.0    0.0   0.0  0.0  0.0  0.0    0.0    0.0   0   0  40   0   0  40 c3t0d0

This device is the external disk array, which has both Oracle and /web on it.
There are a few interesting things to note here.
One is that there is basically no data being read from the array (r/s). This is good, because it means that all the data used for this query came from memory (we hope it didn't come from swap :).
Another is that a fair number of disk writes are happening (w/s). This is because there are a lot of log files being written to - redo, rollback, archive, trace files, web server logs... and they are all on this one RAID array.
The last interesting thing to note is that there are 40 software errors (s/w) being reported by the disk array. That's 40 total, probably since the last reboot, which is not a whole lot but it's 40 more than there should be. This is probably not important, but it might hint at a problem with one of the disks in the array.
The next thing to try here would be to start moving log and data files that are written to frequently to the internal disk, except it doesn't have a whole lot of room and I'm not sure I want to start doing that to a production system if I don't have to.... I'm going to see if we can get this recreated on another system, without the disk array, and see what happens.
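If moving frequently written files ever does become necessary, the redo logs are usually the first candidates, and they can be relocated one member at a time while the instance keeps running. A hedged sketch (the paths are made up for illustration; a member can only be dropped once its group is inactive, so switch logs first):

```sql
-- Add a member on the internal disk, force a log switch so the group
-- becomes inactive, then drop the member on the busy array.
-- Paths are hypothetical.
alter database add logfile member '/internal/oradata/redo01b.log' to group 1;
alter system switch logfile;
alter database drop logfile member '/u02/oradata/redo01a.log';
```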
I'm also going to give Oracle some of its SGA back; a very short statspack snapshot shows we're now using 95% of the shared pool, which is now too high.
The saga continues....
Initially -- not knowing how the array is configured or its state -- I thought that slow disk writes and contention might be contributing to the problem. RAID 5 arrays are slow on writes, and degraded arrays (those with failed or missing drives) are too, but neither looks to be a factor here.
It's a perplexing problem -- we're running several busy Oracle sites, one with a database in excess of 10GB and over 15 million hits per month on less hardware with much better performance.
But you may well be wasting your time. One, that problem is reasonably likely to be peculiar to your particular installation on that particular Sun box. Two, even if you fix the problem, and especially given that you're running a bunch of other software on the same low-end Sun box, that box is likely to still be far too wimpy to handle the loads you expect.
So why don't you just buy a dual Opteron or dual Xeon Linux box with 4 to 8 GB of RAM and a bunch of fast RAID disks, set that up, and do whatever further debugging and tuning you need to there? Maybe the mysterious performance problem will not reappear, which would be a nice bonus. If it does reappear on the new machine, that also would tell you something and might help your debugging. But the main point is that the few anecdotal reports we have from current high-volume dotLRN users seem to say that your current shared Sun box is unlikely to meet your needs, and that you're going to need a new machine anyway.
Of course, I suspect Mike and Janine know that, so what's the deal? No money in the hardware budget currently to buy a Linux box? Do you really think that shared Sun 280R will meet your client's needs? Or what?
Time constraints? Getting a new machine up and running is going to take quite some time, of course. (Even longer if the customer has bureaucratic purchasing rules.) But Furfly already has various other Oracle installations up and working, right? So how about setting up a Heidelberg Dev site on an entirely different machine, using known-good hardware and a known-good Oracle instance? If the mysterious problem re-appears there as well, then you know with about 99% certainty that it's not the hardware or Oracle - the mysterious problem has got to be in your site's OpenACS, dotLRN, or AOLserver code or configuration.
It does indeed look like this has to do with this particular setup and nothing to do with .LRN/OpenACS. We just did a comparison on a MUCH SMALLER Sun box (with cobwebs and all) and .LRN was faster than what we are experiencing now.
We are moving forward and will report when we find out the exact problem for posterity.
Carl
P.S. Mike wrote, "we're running several busy Oracle sites, one with a database in excess of 10GB and over 15 million hits per month on less hardware" and I am sure we can do the same thing with a .LRN site, it is just a question of some of the .LRN users cooperating on making it happen with gradual improvements over time.
maxconnections, maxthreads, and minthreads all set to 5 also seems low, but if this is just for the Dev server and you plan to bump those up for Production then that's probably ok for now.
The 2.2 or 2.9 s shown above for the login page is mostly meaningless, as the time is all in adp_parse_ad_conn_file, which is normal on the very first hit of that page for the thread. The real question is: how often does hitting that page give you the slow 2 s adp_parse_ad_conn_file time? Overall, it should be a very low percentage of the time.
Normally, after you restart the server, it should run adp_parse_ad_conn_file once per page per thread, only, and then never again. But if your AOLserver is constantly creating and destroying new threads (because it's misconfigured), then adp_parse_ad_conn_file could be sucking up lots and lots of time - much more than just the 2 s per hit you saw on the login page, some pages can take 10 or 20 s or more, especially with slow Sparc CPUs.
That's all AOLserver Tuning 101 of course, but it is an easy mistake to make. From painful experience, I am very suspicious of your 120 s threadtimeout. I suspect that all 5 of your AOLserver threads are being killed and restarted every two minutes, which is an absolute performance killer - you really want to be sure you've ruled that out.
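Ruling that out comes down to a few lines in the AOLserver config file. A sketch, with AOLserver 3.x/4.x parameter names (the values are illustrative, not a recommendation, and `${servername}` is assumed to be set earlier in the config):

```tcl
ns_section "ns/server/${servername}"
    # Keep a stable pool of threads so pages aren't constantly
    # re-parsed by adp_parse_ad_conn_file in freshly created threads.
    ns_param minthreads    20
    ns_param maxthreads    20
    # A short threadtimeout (e.g. 120 s) can kill and recreate idle
    # threads every couple of minutes; a much longer value avoids that.
    ns_param threadtimeout 3600
```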
My feeling - after installing OpenACS and dotLRN more than a dozen times - is that the problem is in OpenACS itself.
So one suggestion, just to check whether I am wrong: please install on the same box a clean instance of OpenACS on another port, using the same files on the machine but a new database service1 without any users batch-synched.
If I am right, this instance will run as fast as lightning. And if it does, there was a misconfiguration in the OpenACS params. So please post your kernel, main site, and authorities params for a quick check (maybe you can simply make screenshots to save time).
Greetings,
Nima
As Al mentions, Janine's wearing her MIT, not Furfly, hat on this one and presumably Mike took a look and chimed in as a personal favor.
This is very curious. If it takes more than two weeks to solve a bunch of us will probably spend our nights and mornings huddled around the box trying to figure out what's going on (since we'll be in Heidelberg).
I've tried cutting down Oracle's SGA so that the system is no longer using swap. Didn't help. I've tried moving the temp datafile, which was causing a lot of disk activity on the external RAID array, to the internal drive. I've tweaked the net8 files, and looked at things every which way. Found a few things not quite right, but nothing to fix the problem.
One major issue I still need to resolve is that Oracle is version 8.1.7.0. I'm going to install the 8.1.7.4 patchset as soon as someone in Germany can make the installation CD available, and I'm crossing my fingers that it will help.
I took the query from /dotlrn/admin/users and tweaked it every which way. What I found was that it's slow if it has to transfer rows from the database, even if it's only to do an order by. If there's no ordering and it's only returning a count then it's lightning fast, even with the two permission calls. Does this ring a bell for anyone?
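One way to pin that down from sqlplus is to compare the two variants under autotrace, which shows whether the extra cost appears as sorts, consistent gets, or SQL*Net round trips. This is only a sketch: `dotlrn_users` and `last_name` are stand-ins for whatever view and column the /dotlrn/admin/users query actually uses.

```sql
SET AUTOTRACE TRACEONLY STATISTICS
SET TIMING ON

-- Fast case: only a single count travels over the wire.
SELECT count(*) FROM dotlrn_users;

-- Slow case: every row is fetched (and sorted) before transfer.
SELECT * FROM dotlrn_users ORDER BY last_name;
```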
I have no doubt that we will need to tweak the application to get this working, but until I can get decent performance out of sqlplus it seems rather pointless to try.
Yet the only page load times I've seen mentioned above are 2.2 and 3.0 s, with 2.1 and 2.9 s (respectively) of that time taken up in adp_parse_ad_conn_file. But adp_parse_ad_conn_file should be a one-time-per-page-per-thread overhead only, so taking that out, we should be left with about a 0.1 s or so page load time - quite respectable!
So I guess those two examples must not be representative of the overall problem? In which case, just what does the overall performance problem look like from the users' point of view - just how slow are these pages really?
Janine, please turn on timed_statistics (you'll need to bounce the instance; about 1% overhead) and don't worry about frequent snapshots. Each snapshot takes only a few blocks to be written - this just pales in light of the expected load. If you put perfstat into its own tablespace you could measure it, and you would NEVER worry about this again.
With timed_statistics=on, all the empty columns will be filled with values and we will see on which wait events Oracle is losing time.
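For completeness, enabling it looks roughly like this (a sketch; on 8i the parameter can also be flipped on the fly as SYSDBA, in which case no bounce is needed):

```sql
-- In init.ora (takes effect after bouncing the instance):
--   timed_statistics = true

-- Or dynamically, connected as SYSDBA:
ALTER SYSTEM SET timed_statistics = TRUE;
```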
If you think it is the RAID or the Oracle instance that is slacking, do this: create a few files (large, mid-sized, small ones) and copy them around from a script. Measure the performance. Do the same with a few tables and a few access paths (table scan, index access, small commits) and measure the performance. What is the result?
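The file-copy half of that test can be as simple as timing `dd` against the suspect filesystem. A sketch (GNU dd syntax; on Solaris use `bs=1024k` instead of `bs=1M` - and note this measures the filesystem cache too unless you sync or use files much larger than RAM):

```shell
#!/bin/sh
# Rough sequential-write check: write a 10 MB file on the
# filesystem under test and time it. Point TESTDIR at the RAID
# array's mount point; it defaults to /tmp so the sketch runs as-is.
TESTDIR=${TESTDIR:-/tmp}
time dd if=/dev/zero of="$TESTDIR/ddtest.$$" bs=1M count=10 2>/dev/null
sync
ls -l "$TESTDIR/ddtest.$$"
rm -f "$TESTDIR/ddtest.$$"
```

Comparing the same run on the internal disk versus the array should show quickly whether the array itself is the bottleneck.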
A question to the Heidelberg people: is this the same machine that serves the WebCT production system? Were/Are you happy with its performance?
Don't ignore the SQL results!
Look how many acs-service-contracts queries are in there. It almost looks like a denial-of-service attack: one query is executed 84,000 times in about 4 hours. Another one just checks whether there is one service contract - probably just to be able to gracefully tell the user "service-contract foo$$%$%bar doesn't exist".
What about aggressive caching for service contracts (and replacing the package for the next release - I don't like it anyway, because I think it is a complexifying replacement for Tcl namespaces that is ALSO expensive)?
What is this query about: select dotlrn_communities_all.*, dotlrn_community.url(...)? It is *extremely* expensive.
The next query in the list ordered by gets is also extremely expensive: what does it do?
Number is probably using cc_users.
Can you at least try to cache the service-contracts and then take snapshots during the day when there is at least some activity on the system?
Just to mention one difference.
How are they expensive? At startup time procs are built for each live method for each contract implementation and these are called directly when you invoke a method.
So there's a little startup time but not much else.
The main problem with service contract is that they're hard to change once they're defined...
We will make sure to get any changes that were made on our server back into the source and into the documentation as soon as they have been peer-reviewed.
Thanks for all the help everyone (special thanks to Janine/Sloan/Collaboraid/Dirk).
Will post the details soon here.
Did you come from 3.3, or one of the 3.5 series?
--cro
just a short report regarding current performance status at University of Heidelberg.
On 21st October we moved AOLserver from our Sun, which at that point hosted both the webserver and Oracle, to a Linux box, so our current infrastructure is:
- Front-End:
SuSe Linux 9.0
Processor: Athlon 1.8 GHz
Memory: 1.5 GB
Network: Ethernet, 100Mb/s
DB-Client: Oracle
AOLServer: 4.0.8 (nsopenssl v3_0beta23 / tcl. 8.4.4)
OpenSSL: OpenSSL 0.9.7d 17 Mar 2004
MTA: Postfix 2.0.14
- DB-Server (as mentioned above):
Solaris 2.8 on Sun Fire 280r
Memory: 2 GB
Disks: 2 * 36 GB, 1 * 200 GB RAID
Network: Ethernet, 100Mb/s
DB-Server: Oracle 8.1i (patched)
Additional software running until end of year: webct
- Some facts about our dotLRN-installation:
40282 ACS-Users
2580 dotLRN-Users
22 class instances (current semester, about 40 total), 10 communities / 72 subcommunities
162641 ACS-objects, 114108 ACS-permissions and 11129 fs-objects
- Performance relevant AOLServer configuration parameters:
maxthreads 25
minthreads 20
(Changed minthreads != maxthreads, because there still seem to be some memory leakage issues)
threadtimeout 3600
stacksize 512 KB
db-pool connections: first 20, second 10, third 5
keepalivetimeout 5
maxkeepalive 100
maxconnections 100
Some objective measurements of page response times before and after migration (calculated by measure-resonstimes.sh):
- /dotlrn
before: 2894 ms
after: 841 ms (internal measure by developer support: 80-100 ms)
- /dotlrn/calendar/cal-item-new
before: 2380 ms
after: 633 ms
- /dotlrn/manage-memberships:
before: 3597 ms
after: 1834 ms
- /dotlrn/classes/3520praktischeinformatik/3520urztest/3520urztest/
before: 7575 ms for class start page and 4589 ms for class file-storage
after: 3093 ms for class start page and 1335 ms for class file-storage
(1694 ms for class start page after removing subgroup- and homework-portlet!)
Altogether, each page became about twice as fast, although there might still be enough things to tune...
(E.g.: for very large query result sets, AOLserver seems to need much more - seemingly exponential - time to render the template via the templating system, compared to the very fast output of manual ns_write commands.)
But whatever information you can give us would be useful.
How much space are you setting up for ns_cache? Have you monitored performance to make sure it's large enough to be caching everything?
Yes, to be honest, with "exponential" I really went over the top.
This kind of behavior was observed when we ran AOLserver 3.3 on Sun Solaris. There we often had, regardless of specific pages/nodes, statistics like less than 800 ms for db queries but more than 10000 ms total time for the request processor.
To avoid any expression of severity :), I noticed the following performance facts:
- Requesting pages not belonging to dotLRN, or at least not portal-rendered by it, runs faster, i.e. db query time and request processor delivery time get very close to each other.
This makes sense, because rendering portals requires "some" extra steps.
- Especially the member portlets of dotLRN (sub)groups show some weird statistics (ok, about 200 members in this example):
25 database commands totalling 607 ms
page served in 12360 ms
Although the statistics for the subcommunity's member administration page, which additionally contains the supercommunity's users not yet included in the subcommunity, look like:
20 database commands totalling 5333 ms
page served in 9805 ms
I wonder if this behavior may be caused by nested loops, so it would be no problem of the templating system itself (maybe I should do a diff against the dotLRN 2.1 queries... upgrading to 2.1 soon :).
Regarding the templating system, I made a simple performance comparison by just displaying some information for a set of dotLRN users. For the first check I used ns_write output, and for the second one the templating system with a multirow. Results (manually measured by clock):
- 2500 Users:
a) ns_write: 7 seconds total (including db query)
b) template: 10 seconds total
- 5000 Users:
a) ns_write: 20 seconds
b) template: 30 seconds
- 7500 Users:
a) ns_write: 42 seconds
b) template: 62 seconds
Maybe I have to consider that some extra seconds arise because the templating system first completely builds the HTML result before sending it back to the browser, so ns_write has a little head start.
Don, you mentioned the space set up for ns_cache. Do you mean the kernel parameter Memoize-MaxSize? It was 200000 and I set it to 300000, not knowing whether this is a reasonable value.
ns_cache stats says:
Cache Max Current Entries Flushes Hits Misses Hit Rate
util_memoize 300000 299932 2229 5685 2222425 77774 96%
secret_tokens 32768 4080 102 0 2326 102 95%
nsfp:product 5120000 1364032 61 0 10550 61 99%
ns:dnshost 100 0 0 0 0 0 0%
ns:dnsaddr 100 1 1 0 7 1 87%
Is this MaxSize parameter limited by the configured stacksize (right now 512 KB), or is it independent?
What about nsv_buckets? Are those performance relevant?
nsv:7:product 17 3046879 44932 1.47468934605
nsv:6:product 18 1743471 221 0.0126758632636
nsv:5:product 19 33300250 120067 0.360558854663
nsv:4:product 20 455283 15 0.00329465409427
nsv:3:product 21 5668477 29275 0.516452655625
nsv:2:product 22 1415574 1310 0.0925419653088
nsv:1:product 23 763659 63 0.00824975545368
nsv:0:product 24 306966 6 0.00195461386603
ns:cache:util_memoize 81 2337795 4334 0.185388368099
Don't know, if mutex locks still use them...
Thanks for your answers & help, and sorry for this exaggerated, not quite true performance severity statement (dreaming of O(log(n)) 😊).
Martin
P.S.: Just one O(n^2) left: Our logger installation:
Only 184 entries, but about 100 seconds to display the index page... but that's really a problem of logger itself.
Your first order of business is to determine where and how AOLserver is spending its time, and so far I don't think you've done that. You posted your AOLserver thread settings above, good.
Now, find a particularly slow page. Hit it, and look at the Developer Support data. Very Important: Note whether or not this was the first time this thread served this page. Developer Support currently doesn't tell you this directly, so this isn't quite as simple as it could be, but by looking at the Developer Support info and/or the AOLserver log, you should be able to figure it out.
Now hit the same page again, and get it to run in a Thread which has served this same page before. Compare the Developer Support numbers between the 1st-time-in-this-thread and Nth-time hits on that page. This is key.
For all hits other than the 1st hit per thread per page, the page should be fast. If it is not, that is interesting and we want to know why. If only the 1st hit per thread per page is slow, and nearly all the time is being taken up in adp_parse_ad_conn_file, then that's normal.
That's why Don was asking you about ns_cache, etc. above. If your cache isn't big enough, presumably the cached compiled Templating System pages might get thrown out, and then you'd end up running the (expensive) adp_parse_ad_conn_file stuff over and over again, many times per page per thread - not good.
Yes, nsv_buckets can certainly be performance relevant, but it's very unlikely that your slow pages are being caused by mutex contention for the nsv buckets. If you want to check, make sure you have "ns_param mutexmeter 1" in "ns_section ns/threads" in your AOLserver config file, then use the AOLserver nstelemetry page to check for lock contention.
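Spelled out as a config-file fragment (section and parameter names exactly as given above; AOLserver syntax):

```tcl
ns_section "ns/threads"
    # Turn on mutex metering so the nstelemetry page can
    # report per-lock contention counts.
    ns_param mutexmeter 1
```

After restarting the server, nstelemetry should then show lock and contention counters per mutex.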