Forum OpenACS Development: TCL Parse Optimization

Posted by Rafael Pastor Vargas on 10/23/08 08:37 PM

Hi all, we are profiling .LRN at UNED in order to optimize it, and we've detected that the Tcl parser for ADP pages is a little slow. Has anyone here been working on this?

Regards,
Rafael

PS: Maybe this is a question for the OpenACS forums, but it is of interest to the .LRN community, so I am posting it here...

2: Re: TCL Parse Optimization (response to 1)

Posted by Dave Bauer on 10/23/08 08:47 PM

Hi,

Can you give a little more detail on exactly what you found to be slow and how you tested it?

Thanks!

4: Re: TCL Parse Optimization (response to 2)

Posted by Marcos Serrano on 11/11/08 11:15 AM

Here, at Innova-UNED, we have been testing some uses of the parser and collected the results.

First of all, we needed to know whether the problems come from our own development on top of dotLRN or from the dotLRN code itself. To test this we used the test servers and several other internal instances.

These are the differences between the machines:

--
dotaLF-1 (this is the codename of the Innova development)

Debian Etch 4.0
aolserver: v4.0.10
TCL: v8.4.12
XOTcl: v1.6.1
libthread: v2.6.5
RAM: 16G
RAM bus speed: 667MHz
2 quad-core Xeon CPUs, 2.33 GHz
CPU bus speed: 1333MHz

This instance has an Oracle database with nearly 130,000 users, and a 62G content-repository.

dotaLF-2

Debian Etch 4.0
aolserver: v4.0.10
TCL: v8.4.12
XOTcl: v1.5.6
libthread: v2.6.5
RAM: 32G
RAM bus speed: 400MHz
4 dual-core Xeon CPUs, 3.4 GHz
CPU bus speed: 800MHz

This instance has an Oracle database with nearly 130,000 users, and a 62G content-repository.

dotLRN-1

Debian Sarge 3.1
aolserver: v4.0.10
TCL: v8.4.9
XOTcl: v1.5.3
libthread: v2.6.1
RAM: 1.2G
RAM bus speed: 133MHz
2 Pentium III CPUs, 1.3 GHz
CPU bus speed: 133MHz

This instance has an Oracle database with 9 users, and a 2.2M content-repository.

We started with some web requests with the developer-support tool activated and checked how much time the parser and the database took.

The URLs tested were like these:
/
/dotlrn/
/dotlrn/communities
/dotlrn/clubs/innova/forums/message-view?message%5fid=10399941
/dotlrn/?page_num=1
/dotlrn/?page_num=2
/dotlrn/clubs/pruebasdeinnova/one-community?page_num=0
/dotlrn/clubs/pruebasdeinnova/one-community?page_num=1
/dotlrn/clubs/pruebasdeinnova/one-community?page_num=2
/dotlrn/clubs/pruebasdeinnova/one-community-admin
/dotlrn/clubs/pruebasdeinnova/community-edit
/dotlrn/clubs/pruebasdeinnova/one-community-portal-configure
/dotlrn/clubs/pruebasdeinnova/member-email
/dotlrn/clubs/pruebasdeinnova/members
/dotlrn/clubs/pruebasdeinnova/uforums/admin/permissions?object_id=16647427

These requests were repeated several times with an external script.

Here are some of the results:

dotaLF-1
286 web requests:
Total time average: 554s
Parser time average: 289s
Database time average: 138s

dotaLF-2
286 web requests:
Total time average: 1100s
Parser time average: 646s
Database time average: 271s

dotLRN-1
170 web requests:
Total time average: 509s
Parser time average: 482s
Database time average: 70s

Besides these times, we have noticed slow behavior when using the platform, with requests taking 3-4s when only 20 users appeared in the request-monitor (logged in during the last 10 minutes). Are there any other tools we could use to test performance in dotLRN?

Interestingly, parse time is 50-60% of the total time on the dotaLF instances. That's something we have to analyze properly.

We are still missing tests on a clean dotLRN instance with a huge database, say 100,000 users and a 50-100G content-repository. Could anyone post tests like these from their own instances?

Has anyone tested the parser so far on big dotLRN instances? We would like to know whether the problem is platform scalability or something else we haven't found.
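
As a rough check of that percentage, the shares can be recomputed from the averages posted above. This is only a sketch in Python: the numbers are copied from the results as reported, and the names `results`, `share` and `summarize` are made up for illustration.

```python
# Averages copied from the results above (units as reported in the post).
results = {
    "dotaLF-1": {"total": 554, "parser": 289, "database": 138},
    "dotaLF-2": {"total": 1100, "parser": 646, "database": 271},
    "dotLRN-1": {"total": 509, "parser": 482, "database": 70},
}


def share(part, total):
    """Return `part` as a percentage of `total`, rounded to one decimal."""
    return round(100.0 * part / total, 1)


def summarize(data):
    """Compute the parser and database share of total time per instance."""
    summary = {}
    for name, t in data.items():
        summary[name] = {
            "parser_pct": share(t["parser"], t["total"]),
            "database_pct": share(t["database"], t["total"]),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(results).items():
        print(name, row)
```

With these inputs the parser share comes out at 52.2% for dotaLF-1 and 58.7% for dotaLF-2, and at 94.7% for the small dotLRN-1 instance.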

6: Re: TCL Parse Optimization (response to 4)

Posted by Gustaf Neumann on 11/12/08 09:13 AM

Marcos, how did you measure "parser time"? The developer support toolbar lists e.g. the page_serve_time and db-time, where the first is essentially measured in [clock clicks -milliseconds] since request start. This time includes every computation triggered by the request (time in Tcl and in aolserver). So, I don't see how the parse time can be seen by looking at the developer support. Obviously, you did something to get the aggregated values. What did you measure exactly?

7: Re: TCL Parse Optimization (response to 6)

Posted by Marcos Serrano on 11/12/08 10:02 AM

The times we took came from the developer-support request info (/ds/request-info?request=XXX). The ones below are an example.

"Parser time" is the time shown in the request processor part:

--
Served file /web/dev/packages/dotlrn/www/index.adp with adp_parse_ad_conn_file - 1319.0 ms
--

Total time, in the parameter section:

--
Request Duration: 1519 ms
--

And the "Database time" is the total duration of the database operations, at the end of the page:

--
725 Total Duration (ms)
--

If those are not the parameters we should be looking at, is there another tool we can use to find out where the bottleneck could be?

We are worried mainly because the difference between the "total time" and the "parser time" is too big.

8: Re: TCL Parse Optimization (response to 7)

Posted by Gustaf Neumann on 11/12/08 10:53 AM

The name of the proc is misleading, since it really executes the tcl/adp pair. Especially for /dotlrn the time depends on many factors, e.g. on which portlets are on the start page, and how many memberships a user has. Another important spot is the master templates (if you are concerned about speed, try to avoid db-requests in the master templates).

On our production system, we currently have 763 active users, and I get the following output for my dotlrn start page
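
The figures quoted in the previous post are consistent with this. A small arithmetic sketch in Python (the helper `min_overlap` is made up for illustration, and it assumes all three numbers are wall-clock times within one request, with nothing running in parallel):

```python
def min_overlap(total, part_a, part_b):
    """Smallest amount by which two timed parts must overlap when both
    happen inside one span of length `total` (no parallelism assumed)."""
    return max(0.0, part_a + part_b - total)


if __name__ == "__main__":
    # Request Duration, adp_parse_ad_conn_file time and database Total
    # Duration, all in ms, as quoted from /ds/request-info above.
    print(min_overlap(1519, 1319.0, 725))
```

For that request, 1319.0 ms + 725 ms exceeds the 1519 ms request duration by 525 ms, so under these assumptions at least 525 ms of database time is counted inside the adp_parse_ad_conn_file figure.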

+0.1 ms: Served file /var/www/production/packages/dotlrn/www/index.adp with adp_parse_ad_conn_file - 1.7 ms

from the line you are looking at. We have essentially a 5.3.0 kernel with many local changes and optimizations (last time I counted, there were 15000 differences). This line is certainly not a good starting point for optimizations.

My recommendation is to look at the most frequently used pages on your system (request-monitor/stat-details) and optimize the top 10%. For these pages, try to reduce the SQL queries (get rid of unneeded features, use caching) and optimize these.
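
One way to pick those pages from a request log, sketched in Python. This is only an illustration: the hit list is invented, `top_fraction` is a made-up helper, and "top 10%" is read here as the top tenth of distinct URLs by request count.

```python
import math
from collections import Counter


def top_fraction(urls, fraction=0.10):
    """Return the most requested URLs: the top `fraction` of distinct
    URLs (at least one), as (url, hits) pairs, most frequent first."""
    counts = Counter(urls)
    keep = max(1, math.ceil(len(counts) * fraction))
    return counts.most_common(keep)


if __name__ == "__main__":
    # Invented sample data, not taken from any real log.
    hits = ["/dotlrn/"] * 5 + ["/"] * 3 + ["/dotlrn/communities"]
    for url, n in top_fraction(hits):
        print(url, n)
```

On a real system the counts would come from request-monitor/stat-details rather than from a list like this.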

3: Re: TCL Parse Optimization (response to 1)

Posted by Dave Bauer on 10/23/08 08:48 PM

I moved this to OpenACS Development. That is the proper forum for this. I realized we had a "move to other forum" feature :)

5: Re: TCL Parse Optimization (response to 1)

Posted by Dave Bauer on 11/11/08 04:41 PM

It's not clear exactly what developer support is measuring.

When an ADP is parsed, the entire ADP is converted to a Tcl script. In addition, the Tcl scripts for each Tcl/ADP pair must be interpreted. I am not sure at which point includable pages are interpreted as well.

Testing on different machines and with different database sizes should not affect the speed of parsing the ADP.
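
To make the "template is converted to a script" idea concrete, here is a toy version in Python. It is not the OpenACS implementation: it handles only @name@ variable references, ignores all ADP tags, and the function names are made up.

```python
import re


def template_to_source(template):
    """Turn text like 'Hello @name@' into Python source for render(values)."""
    # re.split with one capture group yields literal, name, literal, ...
    parts = re.split(r"@(\w+)@", template)
    lines = ["def render(values):", "    out = []"]
    for i, part in enumerate(parts):
        if i % 2 == 0:
            if part:  # literal text between variable references
                lines.append(f"    out.append({part!r})")
        else:  # a variable name captured from @name@
            lines.append(f"    out.append(str(values[{part!r}]))")
    lines.append("    return ''.join(out)")
    return "\n".join(lines)


def load(template):
    """Generate the script once and return the resulting render function."""
    namespace = {}
    exec(template_to_source(template), namespace)
    return namespace["render"]


if __name__ == "__main__":
    render = load("Hello @name@!")
    print(render({"name": "UNED"}))
```

The conversion step runs once per template, while the generated function runs on every request, which is why the two costs are worth measuring separately.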

9: Re: TCL Parse Optimization (response to 1)

Posted by Marcos Serrano on 11/19/08 10:04 AM

Gustaf, Dave, thank you for your answers and advice. We are making progress in detecting some bottlenecks using them.

I hope we can come back to the forum soon with some of the results, so they can help other people with the same problems.

10: Re: TCL Parse Optimization (response to 9)

Posted by Don Baccus on 11/24/08 03:28 PM

The template processing code caches the compiled byte-code for each template file it encounters.

One problem with .LRN's "new-portal" package - used heavily in .LRN - is that it dynamically generates a template string, which is then parsed. The result of this is not cached ... the portlet code it includes should be cached, but the execution of the wrapper and I believe the master (which in .LRN is a bit complex) won't be.
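
A sketch of what caching a dynamically generated string could look like, written in Python with the built-in compile() standing in for the Tcl byte-code compiler. This is not how new-portal or the templating code works; the class and function names are made up, and it assumes the generated string is identical across requests, which may not hold.

```python
import hashlib


class CompileCache:
    """Cache compiled code, keyed by a digest of the source text."""

    def __init__(self):
        self._cache = {}
        self.compilations = 0  # number of real compile() calls

    def get(self, source):
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        code = self._cache.get(key)
        if code is None:
            # Cache miss: compile once and remember the result.
            code = compile(source, "<template>", "eval")
            self._cache[key] = code
            self.compilations += 1
        return code


def render(cache, source, variables):
    """Evaluate the (possibly cached) compiled expression."""
    return eval(cache.get(source), {}, dict(variables))


if __name__ == "__main__":
    cache = CompileCache()
    wrapper = "'<div>' + body + '</div>'"  # stands in for a generated template
    print(render(cache, wrapper, {"body": "portlet A"}))
    print(render(cache, wrapper, {"body": "portlet B"}))
    print(cache.compilations)
```

Keying on the content rather than on a file name means the same generated wrapper is compiled only once, at the cost of hashing the string on every request.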

Just more detail to consider.