Forum OpenACS Q&A: How do I serve up a directory without any perms, security, or cookies?

A client's website was built such that, for every dynamically generated content page, the master.adp tosses in at least 24 references to various gifs that serve as ads as well as table backgrounds.

According to the tools at web-caching.com, which crawled one URL at my request, each of these gifs is served with a session id cookie and no explicit freshness information:

http://209.203.208.261/templates/shared/background.gif
Date:            Fri, 26 Sep 2003 08:14:01 GMT
Expires:         -
Cache-Control:   -
Last-Modified:   24 weeks 6 days ago (Fri, 04 Apr 2003 19:42:40 GMT) validated
ETag:            -
Set-Cookie:      ad_session_id=4040301%2c0%20%7b34%201065168841%2009237CF575A7A017302D2B066F11A9F1D60C2F63%7d; path=/; max-age=604800
Content-Length:  514 (0.5K)
Server:          AOLserver/3.3.1+ad13
"This object doesn't have any explicit freshness information set, so a cache may use Last-Modified to determine how fresh it is with an adaptive TTL (at this time, it could be, depending on the adaptive percent used, considered fresh for: 4 weeks 6 days (20%), 12 weeks 3 days (50%), 24 weeks 6 days (100%)). It can be validated with Last-Modified. This object requests that a Cookie be set; this makes it and other pages affected automatically stale; clients must check them upon every request."

So for each "real page" the server has to go through 24 rounds of db permission checking and session-handling chokepoints. And the client's browser can't just use the curv-tl.gif in its cache; it has to re-request it so the server can compare the last-modified date.
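For comparison, a cacheable response for the same gif would look something like this (dates illustrative, one-week freshness assumed):

    Date: Fri, 26 Sep 2003 08:14:01 GMT
    Expires: Fri, 03 Oct 2003 08:14:01 GMT
    Cache-Control: public, max-age=604800
    Last-Modified: Fri, 04 Apr 2003 19:42:40 GMT
    (and no Set-Cookie header at all)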

I'm just guessing, but I suspect the site would be a lot snappier without some of this rigamarole.

One solution is just to set up another webserver (tux?) to serve these images from a different port.

But is there a way w/i the OACS to designate a particular directory or set of directories whose files get served up without all the std. OACS goodness?

See https://openacs.org/forums/message-view?message_id=120221

Put them in package/www/resources/...

Refer to them as /resources/package/...
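For example, assuming a package whose key is my-package, a file at

    packages/my-package/www/resources/background.gif

would be referred to in your templates as

    <img src="/resources/my-package/background.gif">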

Not sure if it avoids a session_id, though ... would you care to test it?

See also this thread about Mathopd (and ArtBlast).

Mathopd is found here. Michael Cleverly would probably be willing to send you his patch to the request processor for the @http@ stuff.

What if you want to put it in a global, non-package location? I get the same staleness issue with stuff put in /www/images

Is there a global /resources directory?

Ah, thanks for the links and info.

Ack!  Why does Mathopd remind me of Bill the Cat?

I implemented the resources directory via a filter that runs before the request processor so such references shouldn't generate session cookies.
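Roughly, the idea is this (a simplified sketch, not the actual OpenACS code; the proc name and path handling are illustrative only):

    # Register a preauth filter so it runs before the request
    # processor's session and permission machinery kicks in.
    ns_register_filter preauth GET /resources/* resources_filter

    proc resources_filter { why } {
        # Map /resources/<package-key>/<path> onto
        # packages/<package-key>/www/resources/<path>.
        set parts [split [string trimleft [ns_conn url] /] /]
        set package_key [lindex $parts 1]
        set rest [join [lrange $parts 2 end] /]
        set file [acs_root_dir]/packages/$package_key/www/resources/$rest
        # A real version must also reject ".." path components.
        if { [string first .. $file] == -1 && [file isfile $file] } {
            # Serve the file directly: no db hit, no session cookie.
            ns_returnfile 200 [ns_guesstype $file] $file
            return filter_return
        }
        # Otherwise fall through to the normal request processor.
        return filter_ok
    }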

I'd like to see images be package-local since the whole idea of an APM package is that one can tarball it up, ship it to someone, and they can dump it in their packages directory without fear of overwriting any other package's resources.

Wouldn't using Squid or Apache as a caching reverse proxy be simpler than redesigning your templates to explicitly repoint certain URLs to a new webserver (Mathopd, tux, etc.)?

You wouldn't have to invoke any Tcl code for resources like css, gif, etc., so it should be faster. You also wouldn't have to migrate content into separate /resources folders for the custom code you've already written.

You might have to make sure cache headers work correctly,  but you could probably get around that by enabling caching on the basis of the URL extension (gif|css|jpg|etc).
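On the AOLserver side, stamping freshness headers by extension could look something like this (a sketch; the proc name and the one-day TTL are arbitrary choices, not anything shipped with OpenACS):

    # Add freshness headers to static-looking responses so an upstream
    # Squid or Apache cache can serve them without revalidating.
    proc cache_headers_filter { why } {
        set headers [ns_conn outputheaders]
        ns_set put $headers Cache-Control "public, max-age=86400"
        ns_set put $headers Expires [ns_httptime [expr {[ns_time] + 86400}]]
        return filter_ok
    }

    foreach ext { gif jpg png css } {
        ns_register_filter postauth GET /*.$ext cache_headers_filter
    }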

Has anyone tried this?

That solution would probably be easiest on the developer and on the production server itself. A lot of care needs to be taken with the headers to ensure things work the way you want. Probably most images (though not all image requests) go through some kind of security check, and permissions can change at any time.

But the first step in tuning a production server is to remove public images from, at the least, the AOLserver process. You could still run a pair of production servers, one handling secured images and the other page requests; you might get some benefit from that, but it would need to be tested.

I think John's suggestion is something I have seen on other platforms too. I guess when you're already at this level of optimization, you are probably able to just serve those images from a Mathopd server.
Moving the serving of all public images to a separate non-OpenACS web server process seems like a common engineering approach for busy sites. However, how busy is "busy"?

It would be nice if the OpenACS toolkit could Do The Right Thing, so that the point where an OpenACS user needs to dedicate engineering man-hours to special-purpose scalability solutions is delayed as late in the site's growth as is reasonably feasible. Ideally, a medium-sized public site (but what does that mean, exactly?) should be able to just use vanilla OpenACS (and thus AOLserver) without any special image-only web server, any special hacking of the OpenACS request processor, etc.

Don, it sounds like OpenACS 5.0 already is doing this Right Thing with the new resources directory filter? How far does this seem to take stock OpenACS up the "scalability without extra man hours" curve? Is there anything else along these lines that OpenACS can or should consider doing in the future?

Although I see Don's point about making images be part of a package instead of in the www/ directory, that isn't always practical.

On my own personal site (http://rubick.com:8002), I wanted to upload my thesis, which contains about 80-85 images linked from one HTML page converted from Word, of all things. Ugh.

How am I supposed to upload this file?

#1 Make a new package, and put the images under a directory there?

#2 Put it in ETP? Well, that doesn't solve the image permissions problem, and it actually didn't work anyway because the file was too big: it would time out on Safari (more than 60 seconds), and it wouldn't copy and paste into IE on the Mac (probably a buffer-overflow protection; they didn't want to allow you to paste that much text in?).

#3 Put it under www, in an .adp and .tcl file, and put the images under /www/images or something like that.

I can't imagine a better solution than #3 right now. #1 is just too much work. While not up to the ideal of engineering purity, #3 still seems like the best option. No?

I would argue this type of thing is common enough that we should provide some default support for it. Sure I can hack the request processor -- looks pretty easy, I think. But I don't think I'm the only one who will run across this situation.

Have to agree with Jade here. A per-package resources directory is nice, but there are other reasons than package templates for having the @http@ tag or similar functionality.

My hobby-related website, just a bunch of HTML pages, has numerous pages with lots of large photos. It doesn't see much traffic, yet it still consumes an average of 31MB of bandwidth per day. I host the HTML on my DSL line at home in preparation for moving to AOLserver in the near future, but to conserve bandwidth I would still like to host the photos off site (which I do).

Having a global @http@ parameter would make it very simple for even a small outfit to host their own dynamic pages, while at the same time conserving precious bandwidth by letting their off-site ISP serve the bulk bandwidth for the in-document graphics.
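I imagine it working something like this (image_host and the ImageHost parameter are names I made up for illustration, not Michael's actual patch):

    # In the page's .tcl script: pull the off-site image host from a
    # hypothetical package parameter, defaulting to serving locally.
    set image_host [parameter::get -parameter ImageHost -default ""]

    # Then in the .adp template, prefix the bulky images with it:
    #   <img src="@image_host@/photos/clubhouse.jpg">

With ImageHost set to something like http://images.example.com, the heavy graphics come off the ISP's pipe while the dynamic pages still come from the home DSL line.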