Forum OpenACS Q&A: error in rp_handler: can't read "tcl_url2file

Hi

We have a strange issue and I'm not doing very well in resolving it. Now of course I realise that we're running a very old version of OpenACS, but I think this area of the requst processor hasn't fundamentally changed much.

Firstly, I'd like some feedback on whether the following is actually a problem. We're seeing a lot of these in the error log:

error in rp_handler: can't read "tcl_url2file(/forms/admin/question-add)": no such element in array

Is this actually a problem? If it is, what is the cause? Looking at rp_handler, I can see that it's this line that's triggering the error, but I don't understand why it would fail:
set file $tcl_url2file([ad_conn url])

Secondly, the behaviour we are seeing is a form, that when submitted, seems to submit, but then it does not actually advance to the next page. This particular page is using the ACS Templating "wizard" functionality (which I'm not familiar with, I must admit). What is interesting is that this is only happening on one installation, and only intermittently. None of our other installations have reported the issue.

I enabled ds logging and this was the output:
Request Processor

+002.0 ms: rp_filter: setting up request: GET /forms/admin/question-add wizard_stepwizard5=2&wizard_name=wizard5&survey_id=10047281§ion_no=1&page_no=1&jumpto_id=9433&show_db_fields_p=f&show_logic_fields_p=t&show_alias_fields_p=f&show_trigger_fields_p=t&question_no=3&grouped_question_no_id=9433&after=3&mode=beside&presentation_type=label&required_p=f&active_p=t&n_responses=&multiple_response_p=f&wizard_visitedstepwizard5=
+107.0 ms: rp_filter: return filter_ok
+108.0 ms: Invoking preauth filter rp_dummy_proc
+108.0 ms: Applied GET filter: (for /forms/admin/question-add rp_dummy_proc) - 0.0 ms
returned filter_ok
+108.0 ms: Done invoking preauth filter rp_dummy_proc (returning filter_ok)
+109.0 ms: rp_handler: handling request: GET /forms/admin/question-add?wizard_stepwizard5=2&wizard_name=wizard5&survey_id=10047281§ion_no=1&page_no=1&jumpto_id=9433&show_db_fields_p=f&show_logic_fields_p=t&show_alias_fields_p=f&show_trigger_fields_p=t&question_no=3&grouped_question_no_id=9433&after=3&mode=beside&presentation_type=label&required_p=f&active_p=t&n_responses=&multiple_response_p=f&wizard_visitedstepwizard5=
+109.0 ms: error in rp_handler: can't read "tcl_url2file(/forms/admin/question-add)": no such element in array
+109.0 ms: Trying rp_serve_abstract_file /web/aimsprotoff/www/forms/admin/question-add
+109.0 ms: File /web/aimsprotoff/www/forms/admin/question-add: Not found
+109.0 ms: Applied transformation from /web/aimsprotoff/www / forms/admin/question-add -> ? - 1.0 ms
+109.0 ms: Trying rp_serve_abstract_file /web/aimsprotoff/packages/smart-forms/www/admin/question-add
+110.0 ms: Served file /web/aimsprotoff/packages/smart-forms/www/admin/question-add.adp with adp_parse_ad_conn_file - 365.0 ms
+475.0 ms: Invoking trace filter ds_trace_filter
+475.0 ms: Applied GET filter: (for /forms/admin/question-add ds_trace_filter) - 1.0 ms
returned filter_ok
+477.0 ms: Done invoking trace filter ds_trace_filter (returning filter_ok)

Any suggestions?
Brian

Collapse
Posted by Gustaf Neumann on
I have never seen this error, but that can be a question of configuration options. What I read from the the code, the variable tcl_url2file() is used to cache the mapping from a URL of the requestor to a file only when PerformanceModeP is set (ignoring virtual host mappings, etc.).

My suggestion is to check on the installations where the issue shows up, whether this is true. If so, turn PerformanceModeP (parameter of ACS Kernel) off. If this helps, check, whether there is any noticeable difference in performance (i doubt it).

Collapse
Posted by Brian Fenton on
Thanks Gustaf

I can confirm that turning off PerformanceModeP appears to fix the problem we were having with the form not advancing to the next page. But the client won't be happy with turning that off, as their performance was considerably worse with it switched off, so I need to come up with a solution to this. I have reproduced in on other installations too, so it seems to be a general issue when PerformanceModeP is switched on.

Since as long as I can remember, we always have PerformanceModeP turned on for production installations. I thought that was the official OpenACS policy?

In fact, in the case of this installation, PerformanceModeP was somehow turned off for a period, the client complained of slow performance, and we switched it back on. The performance improved considerably, but now the client is complaining of this occasional strange behaviour in the form using the "wizard" functionality.

Looking at rp_handler, I am a little confused about the logic of the use of tcl_url2file array. Whether PerformanceModeP is on or off, it always writes the file to tcl_url2file([ad_conn url]), which seems pointless when PerformanceModeP is off. But that's an aside really to my issue.

When PerformanceModeP is on, it tries to read the cached value back from tcl_url2file([ad_conn url]). Naturally, the first time, it's not going to find the value, so I don't understand why it reports an error - maybe just for debugging purposes?.
What is really confusing me is that sometimes the error we see is "no such element in array", but also often we see "no such variable", which doesn't make sense to me. Could that be an issue with the interaction of the "global" command, arrays and threads?

[25/Sep/2017:12:36:51][2034.754395456][-default:0-] Notice: RP (310 ms): error in rp_handler: can't read "tcl_url2file(/admin/site-map/)": no such variable
[25/Sep/2017:12:37:50][2034.756508992][-default:1-] Notice: RP (134 ms): error in rp_handler: can't read "tcl_url2file(/admin/site-map/)": no such element in array

We're seeing these errors even on repeated calls to the URL, where you would expect that the 2nd and subsequent calls find both the array and they key value.

(I see in the most recent OpenACS version it has been changed from global to the :: namespace - I tried changing our version to use this, but unfortunately it exhibits the same behaviour).

I'm going to focus now on the wizard tool to see if there is an obvious fix to it, but would appreciate any suggestions to resolve the tcl_url2file behaviour.

thanks
Brian

Collapse
Posted by Brian Fenton on
Sorry, I should have mentioned that this is an AOLserver 4.5 system running on Linux.

I have also just tested it on one of our Windows Naviserver installations, and while it seems to be much less frequent, there still are a number of the same errors (both types) in the error log.

On a clean OpenACS 590 Windows Naviserver install, there is only one error, which is expected, the first time the tcl_url2file is accessed:

[25/Sep/2017:16:39:45][6356.718][-conn:openacs:2-] Notice: RP (37 ms): error in rp_handler: can't read "tcl_url2file(/admin/site-map/)": no such variable

Collapse
Posted by Andrew Piskorski on
Brian, do you have the DebugP or LogDebugP ACS Kernel parameters set to true? If so, turn them off, and those Notice messages from rp_handler should go away.

tcl_url2file is a per-thread global array variable, so given that you get it that "no such variable" error at all, getting it more than once makes sense. AFAICT, tcl_url2file is only used 3 times total anywhere in the OpenACS code, all of them in rp_handler. And there, the ONLY time it is read from, is inside a catch. However, it does call rp_debug with any of the error messages caught, which logs the Tcl error as a Notice (not an Error), but only if the DebugP or parameters were true at NaviServer startup time. (If they were false, the rp_debug proc is defined to do nothing.)

Gustaf made some fixes to that rp_handler code in 2016 and 2017, but it looks like the same basic design, including the catch, had been present at least since OpenACS 5.2.3 10+ years ago, and probably longer than that. I don't understand why rp_handle is (apparently) designed to rely upon that catch, but it's been that way a long time.

Collapse
Posted by Brian Fenton on
Thanks Andrew, I appreciate the response.

The error happens even when DebugP or LogDebugP parameters are false. This is because the call to rp_debug forces the debug_p parameter to "t" i.e.
rp_debug -debug t "error in rp_handler: $errmsg"

Thanks for explaining that tcl_url2file is per-thread - I suspected something like that was going on. Makes you wonder if global array was the best choice for something like this. But at least I understand now that those errors are not necessarily a problem.

The problem we've been looking into is in the interaction of the PerformanceModeP and the "wizard" (see https://openacs.org/api-doc/proc-view?proc=template::wizard::create&source_p=1 ). I'm still trying to fully understand it, but it appears that the wizard is doing some kind of internal redirect that messes up the PerformanceModeP caching, and sometimes the form submission doesn't advance to the next page.

My issue now is that our clients claim they see a marked improvement in performance with PerformanceModeP switched on, so it's not as simple as just switching it off.

Brian

Collapse
Posted by Brian Fenton on
Some small progress made here. If I rewrite rp_handler, so that the tcl_url2file is using an ns_set instead of the global variable, it resolves the cache misses. Probably an extra check to nsv_exists before the nsv_get would clean up the "no such variable" error that shows up on the first hit of a new URL, but this actually could be useful to detect missing files.

My changes were this when reading:
set file [nsv_get tcl_url2file [ad_conn url]]
set path_info [nsv_get tcl_url2path_info [ad_conn url]]

and this when setting:
nsv_set tcl_url2file [ad_conn url] [ad_conn file]
nsv_set tcl_url2path_info [ad_conn url] [ad_conn path_info]

Please correct me if I have misunderstood here that ns_set/ns_get is a better option in this case than the global variable? I'm not very familiar with its performance and reliability across multiple threads versus global variables.

Unfortunately, I still haven't debugged the black magic that is the wizard functionality. So this continues to be an issue. Worst case scenario I will have to put a hack into rp_handler not to use PerformanceModeP for URLs that use the wizard - ugly I know.

Collapse
Posted by Gustaf Neumann on
As Andrew points out, the problem is actually no technical problem, but a reporting/debugging problem: The catch was most likely introduced to filter the cases, where the variables are not set; in this cases the logic falls back to the non-cached case - which is fine. Probably, the implementer was just lazy to test for the cached variables separately.

By reading your first posting, i realized, that you were just concerned about the entry in the log file, but you noticed no erroneous behavior.

To get rid of this wrong (debug) message, i made the change [1] to the request-processor-procs.tcl on HEAD.

all the best
-gn

[1] http://fisheye.openacs.org/browse/OpenACS/openacs-4/packages/acs-tcl/tcl/request-processor-procs.tcl?r1=1.119&r2=1.120

Collapse
Posted by Andrew Piskorski on
The error happens even when DebugP or LogDebugP parameters are false. This is because the call to rp_debug forces the debug_p parameter to "t" i.e. rp_debug -debug t "error in rp_handler: $errmsg"
No, I don't think so. (Did you actually try it?) Because when the DebugP or LogDebugP parameters are false at server startup time, the rp_debug proc literally does nothing - it is defined to immediately return. From reading the code, what you are talking about with "rp_debug -debug t" only applies when NaviServer was started up with one of the DebugP or LogDebugP parameters set to true.

Btw, I don't personally like this style of conditionally defining entirely different procedure definitions. But in this case, maybe it was done for performance.

Collapse
Posted by Andrew Piskorski on
Brian, on your "wizard" problem, I have no idea, I've never used that stuff.

In your Request Processor log above, the "Applied transformation from /web/aimsprotoff/www / forms/admin/question-add" stuff is surprising and mysterious to me, but then I'm not familiar with the guts of the Request Processor.  So that behavior might be completely normal and have nothing to do with your wizard problem.

Collapse
Posted by Brian Fenton on
Dear Gustaf

thanks for your updated code - that will definitely remove the error log messages. Once I understood the messages weren't a problem, I didn't mind them appearing in the log - in fact they actually could be useful to detect missing files.

A couple of questions for you:
1.  does putting the tcl_url2file array into the :: namespace guarantee consistency across all threads (in the same way that nsv_set does in my previous message)? It doesn't seem to on AOLserver, but maybe it does on Naviserver. Would you please take a minute to explain the benefits of this over nsv_set/nsv_get?

2. Whether PerformanceModeP is on or off, it always caches these values:
    set ::tcl_url2file([ad_conn url]) [ad_conn file]
    set ::tcl_url2path_info([ad_conn url]) [ad_conn path_info]

Seeing as this proc is called on every single request, wouldn't it be a boost to only do this when PerformanceModeP is on? i.e.
    if { [rp_performance_mode] } {
      set ::tcl_url2file([ad_conn url]) [ad_conn file]
      set ::tcl_url2path_info([ad_conn url]) [ad_conn path_info]
    }

thanks
Brian

Collapse
Posted by Brian Fenton on
Hi Andrew

sorry I should apologise. I'm running a much older version of OpenACS where the rp_debug proc isn't defined conditionally. That explains the confusion. I should have checked that the latest version had changed.

Brian

Collapse
Posted by Gustaf Neumann on
Question 1: does putting the tcl_url2file array into the :: namespace guarantee consistency across all threads (in the same way that nsv_set does in my previous message)?

no, global Tcl variables are not shared, i have left that in the way, the original implementer of the performance-mode designed it (ages ago). actually, this variables are per-request only, since these are in the global Tcl namespace, so the only caching is within a single request. Not sure, if this was the intention. I would think that a non-global Tcl namespace would be more proper (but i have not studied the so-called performance mode in detail). In general, Tcl variables have some advantages over nsvs. When using nsvs, locks are required, while in the case for Tcl variables, parallel running requests are not influenced and can run independently on e.g. different cores without any need of context switches etc.. So, if one is really concerned about performance and scalability, Tcl variables are better scalable

Question 2: Whether PerformanceModeP is on or off, it always caches these values .. Seeing as this proc is called on every single request, wouldn't it be a boost to only do this when PerformanceModeP is on?

I would not expect a "boost", rather that the difference will be hardly noticeable. On openacs.org all timings are in the range of a single microsecond or below (having maybe time granularity issues). The following commands measure in the ds/shell the basic involved commands, without taking multi-threading in account (no contention on the involved mutex). The slowest call is actually "ad_conn", which is repeated in the code, The biggest improvement would be to avoid the redundant calls to ad_conn.

lappend _ [time  {if { [rp_performance_mode] } { set ::tcl_url2file(/) /foo} } 100000]
lappend _ [time  {set ::tcl_url2file(/) /foo} 100000]
lappend _ [time  {nsv_set urlpath_file / /foo} 100000]
lappend _ [time  {ad_conn url} 100000]

set _
{0.30203 microseconds per iteration} {0.29609 microseconds per iteration} {0.33538 microseconds per iteration} {1.22374 microseconds per iteration}
You mentioned that "the client complained of slow performance" when performance mode is off. Do you have some data backing this? It can't be due to the urlpath mapping.
Collapse
Posted by Brian Fenton on
Thanks Gustaf, it's all starting to make more sense now. I can see that my approach was quite naive. Very interesting that ad_conn is the slowest part.

The performance issue that our client reported, that was resolved by switching on PerformanceModeP, was a page that had a lot of Javascript that made large numbers of AJAX calls (each to the same URL). This page was very slow to load (up to 4 minutes), and after switching on the parameter, it loaded in less than 10 seconds. However, I have just learned from another developer that actually there was a bug in that page that caused a multiplier effect in the number of AJAX calls, so that e.g. instead of 4 or 5 calls, there were 100s. This has now been fixed, so probably PerformanceModeP will no longer have the same strong effect, but still it makes an interesting case for the benefits of PerformanceModeP.

Of course, the downside to PerformanceModeP seems to be its interaction with this "wizard" tool. It is natural for people to wish for the optimum performance, but not at a price to functionality.

thanks again for the help
Brian

Collapse
Posted by Andrew Piskorski on
actually, this variables are per-request only, since these are in the global Tcl namespace, so the only caching is within a single request. Not sure, if this was the intention.
In older versions of AOLserver, I recall that Tcl variables were NOT cleaned up when the connection ended. They remained behind, and if you wrote code that attempted to re-use an old variable, sometimes it would work, if you happened to get a conn thread with that variable already present, and sometimes it would fail, when you got a different conn thread. Does that match your memory of how things worked back then, Gustaf?

That per-thread tcl_url2file caching approach dates back at least to ArsDigita's ACS 4.2 from early 2001, and probably all the way back to ACS 4.0 or even earlier. (As I recall, the abstract path handling was introduced late in the ACS 3.x series, probably sometime in the year 2000.) Thus, the tcl_url2file cache was designed for use with AOLserver version 3.3.x or earlier, and possibly even intended to be backwards compatible with AOLserver 2.3.3 as well. Likely its original author expected that the tcl_url2file array variable would remain behind in each conn thread, for use as a cache across multiple client requests.

Collapse
Posted by Gustaf Neumann on
i am not so long with AOLserver to know this. The documentation section mentioning the purge of global Tcl variables was not changed for the last 17 years [1] (i.e. early 2001). One of the oldest AOLserver article from 2000 mentions already nsv as an instrument to share global variables [2].

Nevertheless, i fully trust Andrew's historical explanation about the intention of the original design. This shows, that this aspects of the performance mode are broken since at least 17 years and nobody has missed it. My suggestion is either to remove the ::tcl_url2file global variable, or to "do it right", including handling of host-node mappings etc. Currently, i personally would have time for just the first option.

-g

[1] https://github.com/openacs/openacs-core/blame/master/packages/acs-core-docs/www/xml/developers-guide/programming-with-aolserver.xml#L46

[2] http://www.tldp.org/LDP/LG/issue58/washington.html

Collapse
Posted by Andrew Piskorski on
Or, as you suggested earlier, just move the tcl_url2file array into a Tcl namespace? Since namespaces are not cleaned up on connection close, that should restore the original intention of the caching code.
Collapse
Posted by Gustaf Neumann on
i am afraid that it will cause more harm than good since it can break running installations, leading to probably deniable performance gains (when i interpret ds-output correctly, this is 0.1-0.3ms on openacs.org) ... it needs more work (testing, documenting limitations, etc.)