Forum OpenACS Q&A: ns_returnredirect breaks with nsvhr and IE?

This is very strange: I changed nothing in my configuration, either in IE or in AOLserver, but IE abruptly stopped honoring ns_returnredirect calls, complaining, "The page cannot be displayed." I upgraded my nsvhr patches to Jerry's v.6, but it is still happening. Here's what I know:
  • the problem is related to nsvhr. If I go directly to the site's port, bypassing the master server, it works fine
  • ns_returnredirect by itself works fine. It is also okay with some forms of computation before the ns_returnredirect, but set_form_variables followed by ns_returnredirect gives this error
  • I could find no ns_writes or ns_returns before the ns_returnredirect (I checked because I know ad_return_error likes to return code 500, which IE interprets with the same "page cannot be displayed" message.)
NN 4.7 and 6.1 both work fine, but IE 5 and 6 do not.
Posted by David Walker on
Put "ReturnHeaders" into your script just prior to the
ns_returnredirect and then you'll be able to see the headers
returned.  Look at the Location: header to see what path you are
being redirected to.

Personally I've hacked ns_returnredirect on my server to use the
name sent with the Host: header instead of whatever hostname the
server thinks it should use.
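
A minimal sketch of the Host-header idea described above, not the actual patch. The proc name `absolute_redirect_url` and its arguments are hypothetical, the `http://` scheme is assumed, and redirect targets are assumed to be either absolute or server-relative (starting with "/"). Inside AOLserver the host would presumably come from `ns_set iget [ns_conn headers] Host`.

```tcl
# Hypothetical helper: build the absolute URL for a Location: header,
# preferring the host the browser sent over the server's own hostname.
#   host_header - value of the request's Host: header ("" if absent)
#   fallback    - scheme://host[:port] the server would otherwise use
#   target      - redirect target, absolute or starting with "/"
proc absolute_redirect_url {host_header fallback target} {
    # Already absolute (has a scheme): pass it through untouched.
    if {[regexp -nocase {^[a-z][a-z0-9+.-]*://} $target]} {
        return $target
    }
    if {[string length $host_header] > 0} {
        # Assumption: plain http; the Host: header may carry a port.
        set base "http://$host_header"
    } else {
        set base $fallback
    }
    return "$base$target"
}
```

With something like this, the value passed to ns_returnredirect always carries the virtual host's name, which is what the browser needs when the request came through the master server.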

Posted by Jerry Asher on
Yes, when I've seen similar it's been due to ns_returnredirect (I think) not returning a full, valid, URL complete with host (and I believe it is supposed to.)  I believe ad_returnredirect fixes that.

Alternatively, I've been seeing a bug in IE (and it's definitely a bug) that's not present in Mozilla, or Opera, or Konqueror, where really big POSTs (or something about certain POSTs) cause IE to retry the POST once, and then issue a GET on the damn thing.  It's truly bizarre.  Sadly, it does seem related to nsvhr when using the nsunix module, and it's one reason I want to migrate nsunix functionality from nsunix itself to some patches to nssock (I believe nssock has a workaround, the "graceful-closewait")

Posted by Jonathan Ellis on
ReturnHeaders gives the same result. Somehow something is writing something back to the browser that confuses it. I narrowed down the part of set_form_variables that was causing this to the ns_getform call, and the part of ns_getform that makes it break is
    set n [ns_set size $_ns_form]
    for {set i 0} {$i < $n} {incr i} {
        set key [ns_set key $_ns_form $i]
        set value [ns_set value $_ns_form $i]

        set value [ns_set value $_ns_form $i]
        set newvalue [encoding convertfrom $encoding $value]
        ns_set put $newform $key $newvalue
    }
Any of the ns_set calls here (size, key, value) break the ReturnHeaders and redirect. Removing all of them makes the redirect work fine. (Of course, that renders ns_getform completely useless.)
Posted by Jonathan Ellis on
Jerry, I saw something that started me looking for ad_returnredirect over at the aD bboards, I think, but I couldn't find it in the 3.2.5 distribution so I dropped it.

Do you have it backported, so I don't have to?  It's pretty straightforward, but it calls ad_conn, which is in another file and probably calls other stuff (what's wrong with ns_conn?) and if it's been done I don't want to reinvent the wheel.

Posted by David Walker on
I bet the reason that piece of code is causing this problem is
because of what Jerry said about IE trying POST but then trying a
GET instead.  If that's so then you should receive a similar error
when accessing the target page from a browser without passing it a
query string or POSTing to it.

Are you redirecting the target of a POST?

Posted by Jonathan Ellis on
Yes, I'm redirecting the target of a post.  And it does seem to be a size issue; the form is dynamically generated and smaller versions of the form are redirected just fine by the action page, but larger ones (not THAT large, either -- 1.3 K, about) exhibit the problem I described.  ad_returnredirect doesn't help, unfortunately.  Although util_ReturnMetaRefresh DOES work, but this is so ugly I'd rather not do it.
Posted by Jerry Asher on
I'm curious as to why ad_returnredirect doesn't work. It looks to me as though it should be calling util_ReturnMetaRefresh for you.

Intriguingly, I do see this on my system from time to time, and I have Rob Mayoff's patch (http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg_id=0006Wr) that is supposed to keep this from happening.

Posted by Jonathan Ellis on
The patch you mention looks like it's meant to solve a different problem -- that IE, when redirected to another page, "loses" its form data (by issuing a GET instead of a POST).  The problem here is that it doesn't redirect at all...

As to ad_returnredirect, it looks like it is trying to solve the same problem as the ns_form patch.  (Does anyone know if the ad_returnredirect code is outdated in this case?)  When my form action page gets to the ad_returnredirect, the content type is "application/x-www-form-urlencoded", not "multipart/form-data."

Posted by David Walker on
Are you sure it doesn't redirect at all?  The symptom you described
is normally caused by attempting to retrieve a page that normally
expects form data without giving it any.

Don't depend on the URL in the address bar.  IE sometimes shows the
origin URL rather than the target URL when a redirect happens.

If you haven't done so already, disable "Friendly HTTP error
messages" in IE as well.

Posted by Jerry Asher on
If you set debug on, and dev on in your config.tcl, your log should fill with copious obscure details of what is going on, including hex dumps of most i/o in and out of your box.

I use that to try and pin down exactly what the browser and server are doing.  That's how I've picked up on the oddities where IE can retry a POST, without the posted data, and even turn a POST into a GET.

I don't have a solution right now.  Since IE has something like a gazillion market share, I obviously want to find a solution and/or a workaround.  I guess I hear you saying the meta refresh will work.  Ugly as that is, it's probably the way to go for now.

Alternatively, I have some strategies involving more development of nsunix: 1) merge nsunix with nssock, that is, let nssock "accept" connections on a unix domain socket and after they are accepted handle everything else in a standard nssock sort of way, and 2) implement the nssock graceful closewait in nsvhr and nsunix as needed.  I'd be happy to help you attempt either of these.  My recommendation, if it works, is to do the meta refresh.  I think it's ugly too, but scanning the forums at aD and OpenACS, there is a long history of folks using that technique due to IE returnredirect oddities.

Posted by David Walker on
I use a couple of workarounds.

For small forms I use the function that returns the entire form as query string.

For larger forms I like to include both files in one so that no redirection is needed (vt_include is an include function I wrote):

if {some_condition} {
	set page page2
}
switch $page {
	page1 { vt_include page1.tcl }
	page2 { vt_include page2.tcl }
}
I thought something like this would work because I want to start at page 1 (the page with the form), have the form action go to page 2, then when form processing is done have page 2 kick me back to page 1.  So all I have to do is have the form action for page 1 be page 1, and have it source page 2 when ns_getform is nonempty.  Right?  Wrong.  I forgot that something is being written to the client during the ns_getform itself, so this gives the same "The page cannot be displayed" error as the other.  (Yes, "friendly errors" is off.)
Okay now this is weird: I tried the above with dev=true and it worked.  Turned dev off, it broke.  Dev back on, it's fine.

There's something screwed up in AOLServer's guts is the bottom line.  For the moment though if I make my form var names short enough I am OK without even having to meta-refresh unless I deliberately create unusually huge test cases.  For normal use I am satisfied.

Posted by David Walker on
Put <input type=hidden name=submitted value=true> in your page1. Then, after the set_the_usual_form_variables, put in a line something like
if {[info exists submitted] && $submitted} {
	source /path/to/page2
	return
}
Posted by Jerry Asher on
"Okay now this is weird: I tried the above with dev=true and it worked. Turned dev off, it broke. Dev back on, it's fine."
I've seen this too, and been puzzled by it, and determined that it's not completely true (about 90% of the time). The hypothesis I have drawn from that is that there is some timing thing going on: the act of having AOLserver dump its buffers in hex (which is what dev does within nsunix) slows stuff down enough for client and server to remain in sync. Fitting that hypothesis is the knowledge that nssock's graceful-closewait is there in large part because IE is known to have funky timing issues with socket shutdowns and closes.

It's not so much AOLserver's guts that are screwed up as something I just don't understand yet about IE, or TCP/IP, and nsunix. As I said, it's a goal of mine to merge the nsunix functionality into nssock itself, but that's a far off goal for now. Easier would be to add the graceful closewait.

But what do you mean when you write:

  1. "I forgot that something is being written to the client during the ns_getform itself." What is being written to the client? ns_getform isn't writing anything to the client.
  2. "For the moment though if I make my form var names short enough I am OK." What makes you think this is related to the length of form var names?
The ns_getform comment refers to what I said above when I was trying to trace exactly what makes IE give up and bail.  You're right; ns_getform isn't writing anything, but it seems to be the culprit in whatever timing issues are involved.  (So playing around with hidden variables isn't going to help that.)

The problem isn't (directly) related to the length of form var names, but it is related to the size of the form data submitted, so if I have 40 dynamically generated select boxes in my form, it makes a significant difference when I use a 10 character name for each box than when I use a 20 character name.  With the longer name, the max form elements I could have before seeing this problem was about half what it is now.
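
As a rough illustration of that arithmetic (a sketch only; the proc name `urlencoded_body_size` and the 3-character value length are assumptions, and bytes added by percent-encoding are ignored), the size of a urlencoded POST body can be estimated like this:

```tcl
# Approximate size in bytes of an application/x-www-form-urlencoded body:
# n_fields pairs of "name=value" joined by "&".
# Ignores the extra bytes that percent-encoding would add.
proc urlencoded_body_size {n_fields name_len value_len} {
    if {$n_fields <= 0} {
        return 0
    }
    # Each field costs name + "=" + value; (n_fields - 1) "&" separators.
    return [expr {$n_fields * ($name_len + 1 + $value_len) + ($n_fields - 1)}]
}

puts [urlencoded_body_size 40 10 3]   ;# 40 fields, 10-character names
puts [urlencoded_body_size 40 20 3]   ;# 40 fields, 20-character names
```

Under those assumptions the two cases come to 599 and 999 bytes, so doubling the name length adds 400 bytes for the same 40 fields.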

Posted by David Walker on
ns_getform is a red herring.  It's just causing that error because a
page was requested with no form data.  Try to access page 2 directly
(using GET) and you should get the same error.

What errors are showing up in your error log during these times?

There are no errors in my errorlog.  There is data in the form, and ns_getform returns it correctly, BUT it causes a problem with IE as we've been over.
Posted by David Walker on
ns_getform does not return it correctly. You said that that is the point where your script errors out. Try wrapping the set_form_variables in a catch {} statement.

ReturnHeaders
if {[catch {
	set_the_usual_form_variables
}]} {
	variable errorInfo
	ns_write "<html><body>$errorInfo</body></html>"
}
You misunderstand.  My script does not error out; the browser does.  My script has always worked fine with no errors except that by the time it tries to redirect the browser is already confused.  It is ns_getform that is causing the confusion, but not by throwing an error or returning incorrect data.
Posted by Bob OConnor on

Wow, this is a hot thread and I was also going to start a thread describing the same problem and happy to see our experts jump in. I'm using AOLserver/3.3.1+ad13 and Jerry's patches...

It is an occasional and annoying problem that is almost getting consistent: IE5.5 returns "Cannot find server or DNS Error", but NN4.x NEVER returns an error.

I'm using a lot of
file1 (form)-> file2 (process)
and at the end file2 does
ns_returnredirect "file1"

So I'm hoping for a solution that allows continued use of ns_returnredirect. Thank you. -Bob

Posted by David Walker on
OK.  I just don't see how that function could have any direct effect
on a page being displayed or what is returned to the browser without
the function itself erroring.
To sum up: it appears that your choices at this moment are
  • Make sure your form data is small enough (by using smaller names for your form elements or other methods) that you don't have this problem
  • use meta-refresh instead of returnredirect
  • implement the nssock graceful closewait in nsvhr and nsunix as needed, which would involve some AOLServer hacking
Option 1 was adequate for me. (I don't have time for option 3 right now. :)
Posted by Bob OConnor on

Jonathan, From your Sum up:

1 Make sure your form data is small enough

This doesn't work for me when there is a big text box like this one for sending lengthy posts to forums, or when our UBER IE5.5 users send longish letters to groups of our users.

3 implement the nssock graceful closewait in nsvhr and nsunix as needed, which would involve some AOLServer hacking

Not something I'm up to!

2 use meta-refresh instead of returnredirect

Ok, I'm left with door #2! So how do I use util_ReturnMetaRefresh as a replacement for ns_returnredirect ?

I find nothing in openacs.org/doc/... (and the proc search here is broken (500), so I used my own 3.2.4 /doc/ directory).

-Bob

You can find util_ReturnMetaRefresh in this patch which also contains ad_returnredirect. If you take the line in ad_returnredirect
  if {[string match *multipart/form-data* [string tolower $type]]} {
and add to it so it becomes
  if {[string match *multipart/form-data* [string tolower $type]]
      || [string match *application/x-www-form-urlencoded* [string tolower $type]]
     } {
it will automatically use meta-refresh for IE users, and continue to use returnredirect for others. If you are motivated you could find where the cutoff in form size is that causes IE to misbehave, and adjust this to only use meta-refresh when the form is too large. I'm curious, too.
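
A sketch of what that size-based test could look like, assuming the decision is pulled out into its own predicate. The proc name `use_meta_refresh_p` and the `size_cutoff` argument are hypothetical; the real threshold is not established anywhere in this thread and would have to be found by experiment. Inside AOLserver the arguments would presumably come from `ns_conn method` and the request headers (`ns_set iget [ns_conn headers] ...`).

```tcl
# Hypothetical predicate: should this response use a meta refresh instead
# of a 302 redirect?  Returns 1 only for IE posting form data whose body
# is at least size_cutoff bytes (size_cutoff is a placeholder value).
proc use_meta_refresh_p {user_agent method content_type content_length size_cutoff} {
    set ua   [string tolower $user_agent]
    set type [string tolower $content_type]

    # Non-IE browsers keep getting a normal redirect.
    if {[string first msie $ua] < 0} {
        return 0
    }
    # Only a redirect that follows a POST is affected.
    if {[string compare [string tolower $method] "post"] != 0} {
        return 0
    }
    # Same content-type test as the modified ad_returnredirect line.
    if {![string match *multipart/form-data* $type]
        && ![string match *application/x-www-form-urlencoded* $type]} {
        return 0
    }
    return [expr {$content_length >= $size_cutoff}]
}
```

Keeping the cutoff as an argument makes it easy to log the Content-Length of the failing requests and tune the value later.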
Posted by Jerry Asher on
I believe the answer is that instead of calling ns_returnredirect, you just call util_ReturnMetaRefresh.

Then the browser just gets back "standard html output", but instead of getting a server specified redirect, the page has a meta tag that redirects the browser. That appears to work well with IE.
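
For illustration, here is roughly what such a page could look like. This is a sketch, not the actual body of util_ReturnMetaRefresh; the proc name `meta_refresh_page` is hypothetical.

```tcl
# Hypothetical helper: return an HTML page that sends the browser to $url
# through a meta refresh, with a plain link as a fallback.
proc meta_refresh_page {url} {
    # Quote characters that are special inside an HTML attribute value.
    # In a regsub replacement, "\&" stands for a literal ampersand.
    regsub -all {&} $url {\&amp;} quoted
    regsub -all {"} $quoted {\&quot;} quoted
    regsub -all {<} $quoted {\&lt;} quoted
    regsub -all {>} $quoted {\&gt;} quoted
    return "<html><head><meta http-equiv=\"Refresh\" content=\"0; URL=$quoted\"></head><body>If this page does not refresh automatically, <a href=\"$quoted\">click here to continue</a>.</body></html>"
}
```

In a page script the result would be sent with something like `ns_return 200 text/html [meta_refresh_page $url]` in place of the ns_returnredirect call.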

By the way, while I definitely want to get this fixed, IE has well known problems with redirects and confusing POSTs and GETs and retrying when it shouldn't. It's hit a bunch of people, including AOL (the following quotes came from a conversation just last week on a similar problem that wasn't nsvhr induced.)

Hi AOLserver folks,

we're seeing an AOLserver problem in production on AOL.COM. Unfortunately, they're running a very outdated version (2.3.3), just wanted to make sure you guys fixed this problem in later releases, here it is.

Magic Carpet is POSTing to www.aol.com, which in turn issues a redirect. This causes IE5.5 and IE6 to display a "page not found" error in certain circumstances. Weird circumstances. Turns out it relates to the number of packets sent by the browser:

We have the following sequence of events:
  1. IE sends the first request TCP packet to the aol.com server with the HTTP header: POST /index.adp HTTP/1.0 ...... Content-Length: 393 ....
  2. The AOL.com server *immediately* replies with an HTTP 302 redirect
  3. IE continues the request and sends a second TCP packet with the HTTP body: siteId=aolcomprodStage&siteState.....
  4. IE starts listening for a reply from the AOL.com server
  5. IE fails to receive data from the AOL.com server (because the reply was already sent in step 2) and shows an error page to the user

So the AOL server bug is that the redirect is done *before* receiving the HTTP body for the POST request. If you turn on a proxy for the web browser, the problem goes away, because this way the packets come back in order.

Again, we're aware that AOLserver version 2.3.3 is outdated -- but can you guys confirm the problem will go away if we update to a more recent release? Any help appreciated.

-- Mike

Mike Schilli mschilli1@aol.com Magic Carpet Engineering

Also, here's another snippet showing how rubylane does the meta refresh only when needed (but ad_returnredirect is very similar)
Here's the fix we use.  What we found is that if a redirect follows a POST,
then MSIE will ignore the arguments present on the redirect.

Jim

    proc rl_returnredirect {location} {
        global __did_ns_return rlfont
        global __trace_endtime

        if {[info exists __did_ns_return]} {
            rl_log error "ignoring 2nd ns_returnredirect in [ns_conn url]"
            return
        } else {
            set __trace_endtime [ns_time]
            __rl_writeallcookies

            # This is a huge hack - MSIE 5 doesn't correctly handle a redirect after
            # a post; they don't send any arguments that came with the redirect

            if {[string first msie [string tolower [ns_set iget [ns_conn headers] "user-agent"]]] >= 0
                && [string compare [string tolower [ns_conn method]] "post"] == 0
                && [string first "?" $location] >= 0} {
                ns_return 200 text/html "<html><head><meta http-equiv=\"Refresh\" Content=\"0; URL=$location\"></head><body>Please wait. If this does not automatically refresh, <a href=\"$location\">Click here to continue</a></font></body></html>"
            } else {
                aol_ns_returnredirect $location
            }
            set __did_ns_return 1
        }
    }

    # NOTE: next rename fails when file is sourced, 2nd rename fails
    # on boot; this is correct
    catch {rename ns_returnredirect aol_ns_returnredirect}
    catch {rename ns_returnredirect {}}
    rename rl_returnredirect ns_returnredirect

"I believe the answer is that instead of calling ns_returnredirect, you just call util_ReturnMetaRefresh."

Sure, but since meta refreshing is ugly, I suggested hacking ad_returnredirect so that (1) non-IE browsers aren't penalized for MS being dumb, and (2) even IE only has to meta-refresh when it's necessary to prevent buggy behavior.