Forum OpenACS Q&A: Re: Lazy site node caching

4: Re: Lazy site node caching (response to 1)
Posted by Malte Sussdorff on
The lazy site node caching is now committed to HEAD (acs-tcl). It has been tested on a SINGLE PostgreSQL-driven website only: no cluster, no Oracle. So I would not be surprised to find problems.

I will roll it out slowly to my other test sites and start working on a cluster soon. But if Agustin could test this as well, I would be delighted.

It should get rid of your need to synchronize the site node cache between the cluster nodes EXCEPT when renaming or deleting a site_node. But for that you could probably write a quick script that calls all other servers and removes the specific site_node from the array.
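To illustrate the point about synchronization, here is a minimal standalone sketch in Python (not OpenACS code). It assumes each cluster member fills its own cache on first lookup from a shared database, so newly created nodes need no synchronization, while a delete has to be pushed to the peers. The names Member, flush and delete_node, and the node id 2200, are hypothetical.

```python
class Member:
    """One cluster member with its own lazily filled site node cache."""

    def __init__(self, db):
        self.db = db        # shared dict url -> node_id, stands in for site_nodes
        self.cache = {}     # local cache, filled on first lookup
        self.peers = []     # other members to notify on delete

    def get_node_id(self, url):
        """Return the node_id for url, loading it from the database on a miss."""
        if url not in self.cache:
            node_id = self.db.get(url)
            if node_id is None:
                return None             # misses are not cached
            self.cache[url] = node_id
        return self.cache[url]

    def flush(self, url):
        """Drop one entry from the local cache (no error if absent)."""
        self.cache.pop(url, None)

    def delete_node(self, url):
        """Delete in the database, then flush locally and on every peer."""
        self.db.pop(url, None)
        self.flush(url)
        for peer in self.peers:
            peer.flush(url)


db = {"/dotlrn/": 2120}
a, b = Member(db), Member(db)
a.peers, b.peers = [b], [a]

db["/dotlrn/attach/"] = 2200                 # node created through member a
print(b.get_node_id("/dotlrn/attach/"))      # b finds it lazily, no sync needed
a.delete_node("/dotlrn/attach/")             # delete must reach b's cache too
print(b.get_node_id("/dotlrn/attach/"))
```

Without the loop over peers in delete_node, member b would keep serving the stale entry, which is the rename/delete case the post singles out.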

Posted by Jose Agustin Lopez Bueno on
Hi, Malte!

We are testing it and have hit a first problem. The function
dotlrn::is_package_mounted is not detecting the attach node mounted below the dotlrn package during the OpenACS startup process.
Here is the error log:

[13/Mar/2007:12:56:41][14966.16384][-main-] Notice: Loading packages/dotlrn/tcl/dotlrn-init.tcl...
[13/Mar/2007:12:56:41][14966.16384][-main-] Notice: dotlrn-init: starting...
[13/Mar/2007:12:56:41][14966.16384][-main-] Notice: dotlrn-init: attachments being automounted at /dotlrn/attach
[13/Mar/2007:12:56:41][14966.16384][-main-] Notice: dotlrn::mount_package: object_type apm_package url /dotlrn/ object_id 2121 instance_name dotLRN
package_type apm_application package_id 2121 name dotlrn node_id 2120 has_children_p 1 directory_p t package_key dotlrn pattern_p t parent_id 498
NOTICE: adding missing FROM-clause entry for table "acs_object_id_seq"
CONTEXT: PL/pgSQL function "acs_object__new" line 17 at SQL statement
PL/pgSQL function "site_node__new" line 23 at assignment
[13/Mar/2007:12:56:41][14966.16384][-main-] Error: Ns_PgExec: result status: 7 message: ERROR: duplicate key violates unique constraint "site_nodes_un"
CONTEXT: SQL statement "INSERT INTO site_nodes (node_id, parent_id, name, object_id, directory_p, pattern_p) values ( $1 , $2 , $3 , $4 , $5 , $6 )"
PL/pgSQL function "site_node__new" line 35 at SQL statement

[13/Mar/2007:12:56:41][14966.16384][-main-] Error: dbinit: error(pizarradb.uv.es:5433:openacsdb_5_2_desa,ERROR: duplicate key violates unique constraint
"site_nodes_un"
CONTEXT: SQL statement "INSERT INTO site_nodes (node_id, parent_id, name, object_id, directory_p, pattern_p) values ( $1 , $2 , $3 , $4 , $5 , $6 )"
PL/pgSQL function "site_node__new" line 35 at SQL statement
): '

select site_node__new(NULL,'2120','attach',NULL,'t','t',NULL,NULL)
...

Regards,
Agustín

Posted by Jose Agustin Lopez Bueno on
Another point.

When we create a community it is faster,
but we cannot access that group until the server
is restarted.

(We are running the tests in a cluster with only
one member.)

Agustín

Posted by Jose Agustin Lopez Bueno on
This patch to acs-tcl/tcl/site-nodes-procs.tcl
(line 551), in the function site_node::get_from_url,
resolves the new-community problem (the new community
does not show up without a server restart):

if {[catch {nsv_get site_nodes "${new_url}/"} result] == 0} {
    set node_id ""
} else {
    if {$new_node(has_children_p) && [lsearch $acs_subsite_dir_list $name] == -1} {
        set node_id [db_string node_id "select node_id from site_nodes where parent_id = :parent_id and name=:name" -default ""]
        ns_log Debug "Loading from the database $test_url $name $parent_id"
    } else {
        set node_id ""
    }
}
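The idea behind this patch, asking the database for a child node that the cache does not know yet, can be modelled with a small standalone Python sketch. It is a simplified model and not the actual site_node::get_from_url logic. The names lookup_child, closest_node and db_children, and all node ids except 2120, are hypothetical.

```python
def lookup_child(parent_id, name, cache, db_children):
    """Find the child `name` of `parent_id`, asking the database on a cache miss."""
    key = (parent_id, name)
    if key in cache:
        return cache[key]
    # Stands in for: select node_id from site_nodes
    #                where parent_id = :parent_id and name = :name
    node_id = db_children.get(key)
    if node_id is not None:
        cache[key] = node_id        # only hits are cached
    return node_id


def closest_node(url, root_id, cache, db_children):
    """Walk the URL segment by segment and return the deepest node found."""
    node_id = root_id
    for name in [part for part in url.split("/") if part]:
        child = lookup_child(node_id, name, cache, db_children)
        if child is None:
            break
        node_id = child
    return node_id


db_children = {(1, "dotlrn"): 2120, (2120, "clubs"): 3000}
cache = {}
print(closest_node("/dotlrn/clubs/new-club/", 1, cache, db_children))

db_children[(3000, "new-club")] = 4000      # a community is created later
print(closest_node("/dotlrn/clubs/new-club/", 1, cache, db_children))
```

Because a miss is never cached in this model, the second call finds the new community without a restart.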

Posted by Malte Sussdorff on
Simple answer: .LRN sucks. More correct answer:

.LRN makes use of the NSV array directly. I fixed the issues during startup, but the fact remains that .LRN has rewritten site nodes in dotlrn/tcl/site-nodes-procs. These procedures should not be in the .LRN package in the first place. But they are there, and I don't have the time (nor does the client have the budget, as he isn't using .LRN) to fix this at the moment.

Here is the fix for .LRN initialization:

===================================================================
RCS file: /cvsroot/openacs-4/packages/dotlrn/tcl/applets-procs.tcl,v
retrieving revision 1.20
diff -r1.20 applets-procs.tcl
38c38
< if {[nsv_exists site_nodes "[get_url]/"]} {
---
> if {[site_node::get_node_id -url "[get_url]/"] ne ""} {
Index: tcl/dotlrn-procs.tcl
===================================================================
RCS file: /cvsroot/openacs-4/packages/dotlrn/tcl/dotlrn-procs.tcl,v
retrieving revision 1.75
diff -r1.75 dotlrn-procs.tcl
108d107
< FIXME: refactor
110,122c109,115
< set dotlrn_ancestor_p 0
< set package_list [nsv_array get site_nodes "[get_url]/${package_key}*"]
<
< for {set i 1} {$i < [llength $package_list]} {incr i 2} {
< array set package_info [lindex $package_list $i]
<
< if {[site_node_closest_ancestor_package -default 0 -url $package_info(url) [package_key]] != 0} {
< set dotlrn_ancestor_p 1
< break
< }
< }
<
< return $dotlrn_ancestor_p
---
> set site_node [site_node::get_node_id -url "[get_url]/${package_key}/"]
> if {$site_node eq ""} {
> return 0
> } else {
> return 1
> }
Posted by Jose Agustin Lopez Bueno on
Ok.

Your code works. At the moment everything works except:

-Some portlets display this message (example):
Error in include template "/var/lib/aolserver/oacs_5_2/packages/lorsm/lib/user-lorsm": can't read "url_by_node_id(17673735)": no such element in array

-xotcl:
site_node::get "must pass in either url or node_id"
while executing
"error "site_node::get \"must pass in either url or node_id\"""
(procedure "site_node::get" line 4)
invoked from within
"site_node::get -url $mount_url"
(procedure "::Generic::package_id_from_package_key" line 4)
invoked from within
"::Generic::package_id_from_package_key xotcl-request-monitor"

I would like to resolve these problems before testing the performance in our real cluster.

Any pointers?
Agustín

Posted by Malte Sussdorff on
package_id_from_package_key makes a round trip to the site node cache to figure out the package_id. Not sure why, but that's how it is. You can easily circumvent this either by changing the way xotcl requests the package_id, or by running site_node::update_cache when the request processor is initialized, passing the node_id of the node whose object is the package_id of the request processor.

I don't have a checkout of LORSM from 5.2, but in HEAD I could not find anything which could cause this behaviour. If you could post more of the error (not only this message but the whole text from the error log), then I might be able to help more.

In general there should be no direct nsv call to the site node. In HEAD it is done using site_node::get_url_from_object_id and this should work just fine (at least the procedure should be unable to fail), but maybe you have a culprit there.
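Why direct array access breaks under lazy caching can be shown with a small standalone Python sketch. It assumes an accessor that loads a missing entry on demand; it is not the implementation of site_node::get_url_from_object_id. The class SiteNodeCache, the loader function and the URL are hypothetical; the node id is the one from the LORSM error message.

```python
class SiteNodeCache:
    """Lazily filled map node_id -> url with an accessor that loads on demand."""

    def __init__(self, load_url):
        self._load_url = load_url       # hypothetical loader, stands in for a DB query
        self.url_by_node_id = {}        # starts empty under lazy caching

    def get_url(self, node_id):
        """Return the url for node_id, loading and caching it on first use."""
        if node_id not in self.url_by_node_id:
            self.url_by_node_id[node_id] = self._load_url(node_id)
        return self.url_by_node_id[node_id]


def load_url_from_db(node_id):
    # Hypothetical data standing in for the site_nodes table.
    return {17673735: "/dotlrn/some-course/lorsm/"}[node_id]


nodes = SiteNodeCache(load_url_from_db)

try:
    nodes.url_by_node_id[17673735]      # direct read of an entry not loaded yet
except KeyError:
    print("no such element in array")

print(nodes.get_url(17673735))          # the accessor loads the entry first
```

Code written against an eagerly filled array reads entries that a lazy cache has not loaded yet, so it has to go through the accessor.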

Hi Malte!

We are now running your code in our production cluster,
with some small mods. The speed has increased a lot.
Thanks!

We have around 400 concurrent connections.
If anybody wants to know what our system looks like:

http://aulavirtual.uv.es/ficheros/view/imagenes%5C/CLUSTER_AulaVirtual_pub.gif

NOTES:

Another patch (line 360, site-nodes-procs.tcl):
db_foreach $query_name {} {
    if {$parent_id eq ""} {
        # url of root node
        set url "/"
    } else {
        # append directory to url of parent node
        if { [info exists url_by_node_id($parent_id)] } {
            set url $url_by_node_id($parent_id)
        } else {
            set url [db_string snid {select site_node__url(:parent_id)} -default ""]
        }
        append url $name
        if { $directory_p eq "t" } { append url "/" }
    }
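For readers who do not have the surrounding procedure at hand, the fallback in this excerpt can be mirrored in a standalone Python sketch: use the parent's URL from the in-memory map if it is there, otherwise ask the database. The function names and the node ids other than 2120 are hypothetical.

```python
def node_url(node, url_by_node_id, parent_url_from_db):
    """Build a node's url from its parent's url, with a database fallback."""
    parent_id = node["parent_id"]
    if parent_id is None:
        return "/"                                  # url of root node
    if parent_id in url_by_node_id:
        url = url_by_node_id[parent_id]             # parent already loaded
    else:
        url = parent_url_from_db(parent_id)         # stands in for site_node__url
    url += node["name"]
    if node["directory_p"]:
        url += "/"
    return url


def parent_url_from_db(parent_id):
    # Hypothetical table contents; returns "" when nothing is found,
    # like the -default "" in the excerpt.
    return {3000: "/dotlrn/clubs/"}.get(parent_id, "")


url_by_node_id = {1: "/", 2120: "/dotlrn/"}

attach = {"parent_id": 2120, "name": "attach", "directory_p": True}
club = {"parent_id": 3000, "name": "new-club", "directory_p": True}

print(node_url(attach, url_by_node_id, parent_url_from_db))   # parent is in the map
print(node_url(club, url_by_node_id, parent_url_from_db))     # parent comes from the db
```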

Regards,
Agustin

This is excellent news, Agustin. If you have the time, maybe you can patch the file in CVS HEAD directly and upload it? That would be great.

.LRN honchos: is it okay to include the patch above in 5.3, or should I only commit it to the HEAD version of .LRN?

Posted by Gustaf Neumann on
Changed the lookup of package_id from package key in xotcl-core (CVS HEAD) so that it no longer needs the site nodes.