Forum OpenACS Q&A: How to return 304s for ADP pages?

Posted by James Thornton on
Is there a way to tell AOLserver to reply with a 304 for ADP pages if the file hasn't been modified?

I have a site with 40K+ ADP pages that are static except for a few ADP templating tags. Since Google uses the If-Modified-Since header, replying with a 304 would save some bandwidth.

I tried posting this to the AOLserver list, but it doesn't appear that it worked.

Posted by carl garland on
You could put something like this in a filter:
set mtime [ns_httptime [file mtime $filename]]
set none_header [ns_set iget [ns_conn headers] If-None-Match]
set since_header [ns_set iget [ns_conn headers] If-Modified-Since]

if {[string equal $none_header $mtime] || [string equal $since_header $mtime]} {
    ns_return 304 text/html ""
} else {
    ns_set put [ns_conn outputheaders] ETag $mtime
}
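
The snippet above reuses the HTTP date as the ETag value. HTTP defines entity tags as quoted strings, and If-None-Match can carry several of them (or "*"), so a slightly stricter comparison might look like the sketch below. The proc names etag_for_file and etag_matches_p are made up for illustration and are not part of AOLserver or OpenACS; the code is plain Tcl and assumes the tags themselves contain no commas.

```tcl
# Hypothetical helpers; names are illustrative, not an AOLserver/OpenACS API.

# Build a quoted entity tag from a file's modification time and size.
proc etag_for_file {path} {
    return "\"[file mtime $path]-[file size $path]\""
}

# Return 1 if the If-None-Match header value matches the given ETag.
# The header may hold "*" or a comma-separated list of tags, each
# optionally prefixed with W/ (weak validator).
# Assumption: the tags themselves contain no commas.
proc etag_matches_p {if_none_match etag} {
    foreach candidate [split $if_none_match ","] {
        set candidate [string trim $candidate]
        # Strip the weak-validator prefix before comparing
        if {[string match "W/*" $candidate]} {
            set candidate [string range $candidate 2 end]
        }
        if {[string equal $candidate "*"] || [string equal $candidate $etag]} {
            return 1
        }
    }
    return 0
}
```

In the filter you would then put [etag_for_file $filename] into the ETag output header and test the incoming If-None-Match value with etag_matches_p instead of comparing against $mtime directly.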

Posted by Steve Manning on
<blockquote>I tried posting this to the AOLserver list, but it doesn't appear that it worked.
</blockquote>

I have had trouble posting to the AOLserver list since they introduced their reverse lookup. The list's mail server bounced my message with a 450 error because the domain I'm posting from doesn't hold the MX record for the e-mail address I'm using (manning.net is a Netidentity postbox).

It might be the same issue for you. I solved it by posting through Netidentity's SMTP server, but it's a pain in the behind to have to remember to switch accounts before posting.

    Steve

Posted by James Thornton on
Since nsd.tcl's
    ns_param   checkmodifiedsince   true
does not appear to work for ADP pages, and I haven't found a way to enable nsd's checking of the If-Modified-Since header for ADP pages, I took Carl's suggestion and wrote a proc to do it.

The site I wrote it for is OpenACS 3.x so I am invoking it at the top of ad_serve_adp_page:

# code from ad_serve_adp_page
if {[jt_return_304_p]} {
	ns_return 304 text/plain ""
	return
}
Here's the code for jt_return_304_p...
proc jt_return_304_p {} {

    set return_304_p 0

    # ad_conn only works for abstract urls
    set path [ad_conn file]

    if {[string equal "" $path]} {
	# ns_conn doesn't work for abstract urls
	set path "[ns_normalizepath [ns_info pageroot][ns_conn url]]"
    }

    if { [file exists $path]} {
	set modified_time [file mtime $path]

	# googlebot doesn't use ETags;
	#set none_header [ns_set iget [ns_conn headers] If-None-Match]

	# NS 4.x SGI sends If-Modified-Since with "; length=xx"
	# If-Modified-Since = Thu, 21 Aug 2003 04:37:53 GMT
	# If-Modified-Since = Tue, 30 Sep 2003 22:28:53 GMT; length=885
	set since_header_maybe_length [ns_set iget [ns_conn headers] If-Modified-Since]
	
	if {![string equal "" $since_header_maybe_length]} {
	    # If-Modified-Since header sent

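	    # remove the "; length=xx" part if present
	    set since_header [lindex [split $since_header_maybe_length ";"] 0]

	    # convert date string to epoch time; treat an unparseable date as "no header"
	    if {[catch {clock scan $since_header} since_time]} {
		return 0
	    }

	    # <= so that a client echoing back the file's exact mtime gets a 304
	    if {$modified_time <= $since_time} {
		# file hasn't changed
		set return_304_p 1
	    }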
	}
    }
        
    return $return_304_p
}
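
If you want to check the date handling outside the server, the comparison can be pulled out into a plain-Tcl proc and run in tclsh. This is only a sketch: not_modified_since_p is a hypothetical name, not an OpenACS proc, and it assumes the header is either empty or a date with an optional "; length=xx" suffix as in the examples in the comments above.

```tcl
# Hypothetical helper (not part of OpenACS): given the raw If-Modified-Since
# value and a file's mtime in epoch seconds, return 1 if a 304 is appropriate.
proc not_modified_since_p {header_value modified_time} {
    # Drop any "; length=xx" suffix
    set date_string [string trim [lindex [split $header_value ";"] 0]]
    if {[string equal "" $date_string]} {
        # No header sent
        return 0
    }
    # An unparseable date is treated as if no header had been sent
    if {[catch {clock scan $date_string} since_time]} {
        return 0
    }
    # Unchanged when the file is no newer than the client's copy
    return [expr {$modified_time <= $since_time}]
}
```

For example, [not_modified_since_p "Tue, 30 Sep 2003 22:28:53 GMT; length=885" [file mtime $path]] would be true for any file last modified at or before that time.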
To test it, telnet to your Web server's port 80:
$ telnet jamesthornton.com 80
Trying 209.164.72.61...
Connected to jamesthornton.com.
Escape character is '^]'.
... and send something similar to the following HTTP request (do not begin with a blank line, but do terminate with a blank line)...
GET /index.html HTTP/1.0
User-Agent: Googlebot/2.1 (+http://www.googlebot.com/bot.html)
From:  googlebot(at)googlebot.com
Accept: text/html,text/plain,application/*
Host: jamesthornton.com
If-Modified-Since: Thu, 21 Aug 2010 23:44:54 GMT
You should get something back like this:
HTTP/1.0 304 Not Modified
Set-Cookie: ad_browser_id=6789763; Path=/; Expires=Fri, 01-Jan-2010 01:00:00 GMT
Set-Cookie: ad_session_id=6789764,0,Y3ajJELPiYQI8bsl.KCRGRRjLVjV1izs,1064970966; Path=/; Max-Age=86400
Set-Cookie: last_visit=1064970966; path=/; expires=Fri, 01-Jan-2010 01:00:00 GMT
Set-Cookie: CurriculumProgress=start; path=/; expires=Fri, 01-Jan-2010 01:00:00 GMT
Content-Type: text/plain; charset=iso-8859-1
MIME-Version: 1.0
Date: Wed, 01 Oct 2003 01:16:07 GMT
Server: AOLserver/3.3.1+ad13
Connection: close

Connection closed by foreign host.
Posted by Jade Rubick on
It seems to me that the problem with Carl's suggestion is that unless the actual file has been modified, Google will never index your site again. This is fine for static ADPs, but a bummer for non-static ones...
Posted by James Thornton on
A simple hack would be to touch the file when a user posts a comment or adds other dynamic content.
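
A minimal sketch of the "touch" in Tcl, assuming the code that stores the comment knows the path of the ADP file it belongs to (touch_file is a hypothetical name):

```tcl
# Hypothetical helper: set a file's modification time to "now" so that the
# next If-Modified-Since check sees the page as changed.
proc touch_file {path} {
    file mtime $path [clock seconds]
}
```

You would call touch_file with the page's path right after the comment is saved.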
Posted by carl garland on
It may be true that Googlebot doesn't use ETags, but there are many other spiders that do (or may), which is why I suggested adding it. For more info on this topic