Forum OpenACS Q&A: Strange cross-caching of users

Posted by Reuven Lerner on
I'm working on a site that will contain sensitive financial data, which should only be accessible to particular users.  I've convinced the client that OpenACS is reliable and secure, and that we don't need to worry about security issues.  The site is supposed to go live within the coming days.

We were having problems with people not being sure whether they were logged in, and whether they were logged in as themselves or as someone else, so I added a notice at the top of our administration page indicating the current user: "You are logged in as NAME."

Unfortunately, we're seeing some odd problems that appear to stem from overaggressive caching of pages.  The project manager just called to tell me that things weren't working, and told me that the system claims she's logged in as my employee.  When I went to double-check the problem, I also saw that I was logged in as my employee.

Perhaps the real culprits here are the respective HTTP caches at my office and at my client's ISP.  And luckily, she hasn't yet figured out that such problems, if real, would be a major security hole.

So I'm wondering if anyone else has ever seen such issues in OpenACS -- and if so, whether there are any obvious fixes for them.  I really hope that this is a local configuration problem, or one that I can solve by tuning my nsd.tcl file.

Posted by Matthew Geddert on
"I've convinced the client that OpenACS is reliable and secure, and that we don't need to worry about security issues."

This is a scary thing to say, especially if you are dealing with financial documents. It is true that AOLserver is relatively secure (in part because of obscurity)... but you should ALWAYS worry about security, and it is irresponsible to tell somebody they don't need to worry about it. For this type of application I would disable the cookies after 10-15 minutes of inactivity or so, and disable the feature that permanently stores somebody's username/password in cookies... just like online banks do.

If the information is cached via "work offline" or something like that in Internet Explorer, it has nothing to do with web server security per se; it has to do with training employees to do one of two things: clear their cache (or entirely disable caching of browsing information), or, more likely, log off of their computers when not in use (you are using an OS with multi-user capability if you are doing financial work, right?). Then each time users sit down at a computer they log in separately and get the cache from the previous time they themselves were logged in. Win95/98/Me doesn't do this securely; Win NT4, 2000, and XP do it okay; Linux/Unix are fine; and Mac OS X is fine for multiple users sharing a computer with sensitive information.

Although not foolproof, it would help to put

<META HTTP-EQUIV="Expires" CONTENT="SOME TIME 15 minutes after now">

in your pages...
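The HTTP-header equivalent would be a couple of lines like the following in the page's .tcl file (a rough sketch; ns_httptime and ns_time are stock AOLserver commands):

# emit an Expires header 15 minutes from now instead of a meta tag
ns_set put [ns_conn outputheaders] "Expires" \
    [ns_httptime [expr {[ns_time] + 15 * 60}]]
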
Posted by Jeff Davis on
Take a look at this thread: https://openacs.org/forums/message-view?message_id=27295 (and I think it's been discussed more recently as well, but that is what my search turned up).

A quick fix is to set the appropriate HTTP headers in the sitewide master template.
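For example, something along these lines near the top of the sitewide master's .tcl file (a sketch only; the file is typically www/default-master.tcl, but that varies per site):

# discourage proxies and browsers from caching personalized pages
ns_set put [ns_conn outputheaders] "Cache-Control" "no-cache"
ns_set put [ns_conn outputheaders] "Pragma" "no-cache"
ns_set put [ns_conn outputheaders] "Expires" [ns_httptime [ns_time]]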

Posted by Reuven Lerner on

Matthew, you're 100 percent right in what you're saying. At the same time:

  • I slightly overstated the need for security. The financial data is about aid requests that potential participants have submitted. If the information is released, then there will be a heckuva lot of embarrassment, and my head will be served up on a silver platter, but the organization won't go under, and no one will lose any money.
  • That said, the financial information is protected by a set of permissions that only we (the developers) can assign manually via the OpenACS site map. If we haven't explicitly given you permission to view the data, you won't see it.
  • The client's current security system is laughably bad. The financial data was passed around in cleartext e-mail several times per week for the last few years. And the computer on which the data is located sits in the middle of an office (without any password, of course) through which more than 100 people pass each day. I take computer and network security quite seriously, but I also have to be realistic -- and for this particular non-profit, the security that we've set up is so far beyond what they've had before that there's no comparison.
  • All this being true, we're being pretty vigilant about testing and double-checking ourselves.
  • You're probably right to have cookies expire on a regular basis. At the same time, I have a feeling that the client will be rather annoyed by having to log in again every so often. I'll run it by them later today, and will see what they say.

Finally: From the other thread that Jeff pointed me to, it looks like the pages are cached but the permissions and logins are not. This is a major relief, and means that I simply need to set the caching properties on personalized pages. The underlying security mechanism remains unchanged. I wouldn't mind setting a meta tag, as Matthew suggests, but doing these things in .tcl strikes me as a more elegant solution.

Thanks for the rapid and useful comments!

Posted by Jeff Davis on
Two other possibilities for circumventing bad caching proxies (beyond setting Cache-Control, etc. headers) are to serve sensitive pages over https or to add a "sessionid" to the URL (which will prevent cache hits, since every URL would then be unique to the given session). I think both of those will work no matter how broken your proxy is.
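For the second option, something along these lines when building a link would do (a rough sketch; the URL is made up, but [ad_conn session_id] is how OpenACS exposes the current session):

# make the URL unique per session so a shared proxy cannot serve
# one user's cached copy of the page to another user
set report_url "reports/summary?session_id=[ad_conn session_id]"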

Also, even if the pages are cached, you cannot carry out any operation that does a permissions check as the wrong user, since the session_id in the cookie will be correct even if the page presented to the browser is wrong.

Posted by Simon at TCB on
It's also worth noting that OpenACS is still (I believe) vulnerable to denial-of-service type attacks. Worth bearing in mind if your client is any kind of company that isn't all that 'popular'. Often very true of financial organisations.
Posted by Dirk Gomez on

For example it's vulnerable to cross-site scripting attacks. (see here: https://openacs.org/forums/message-view?message_id=32835)

And you can still muck around with prefetched acs_object ids on -1 forms.

Or have these two gaping holes been closed?

Posted by Jeff Davis on
Dirk, both still exist. The object id manipulation is reasonably easy to fix by signing the id but the cross site scripting one is a big job. If I had to guess a time to fix it all I would say probably 5 weeks of full time work (based on there being 332 -2.tcl files to check and on how long it took to do the noquote stuff originally).

To date no one has taken it upon themselves to fix it. The noquote stuff is a start as is sweeping through and signing all the object ids (both of which simply mitigate but do not remove the problem).

ad_form signs keys by default, but not that many other places use signed variables (in fact only download seems to use them, and then only for spam and export of data). We could sign hidden variables by default in the templated forms system, but I think that would break some pages that do JavaScript manipulation of hidden vars. Also, a lot of the most sensitive pages don't use the form API.
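For what it's worth, signing an id by hand looks roughly like this with the existing security procs (a sketch from memory of ad_sign/ad_verify_signature; check the current API before relying on it):

# page emitting the link: send a signature along with the id
set object_id 12345
set token [ad_sign $object_id]
set url "edit-object?object_id=$object_id&token=[ns_urlencode $token]"

# receiving page: reject ids whose signature does not match
if { ![ad_verify_signature $object_id $token] } {
    ad_return_forbidden "Bad request" "The object_id appears to have been tampered with."
    ad_script_abort
}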

Posted by Tom Jackson on

I haven't seen a real solution to the cross-site scripting problem. Even if you scrub all possible content from this site, you are still left with easy exploits. Just direct readers to a 'foreign url' that contains the malicious code. By malicious code, I mean an 'a' link that contains an 'href' that can execute an HTTP request in the context of the user viewing the 'foreign url'. Another possibility is to send an HTML-formatted email containing the exploit to an admin. For small communities, it wouldn't be hard to come up with good bait.

This is a huge HTTP/HTML/application bug, with no fix in sight.

10: Re: CSRF protection (response to 1)
Posted by Dirk Gomez on

We'll update the documentation for our CSRF protection module which describes its background, the solution we picked, and the migration path we took. Once this is done - I'm on vacation for a week - we'll upload a tarball to the file-storage.

Jeff, protecting OpenACS against CSRF is certainly a time-consuming (and not exactly interesting) activity. The application I converted had about 570 files - it was mostly mechanical and dull work and took about 4 days.

The migration was as follows: For the first few weeks, the CSRF protection module would only log "suspicious" accesses. Programmers and/or template authors were notified of the suspicious access and would check the affected pages and templates - and protect them. That way we did not impact the live site.

So I don't think it is too much work, it can be shared quite well, and it must be done because OpenACS with its readable URLs is quite vulnerable to it.

Posted by Andrew Piskorski on
Reuven, are you using SSL on your ACS site?  If not, I imagine adding SSL would have the nice side effect of preventing proxies from doing any bad caching.

If it's not proxies doing the caching but your own OpenACS server, then once you know that for sure, tracking down where things are being memoized or otherwise cached in OpenACS/AOLserver should be much simpler.

Posted by Reuven Lerner on
Well, I haven't gotten 100 percent to the bottom of this problem -- but I will be having a long debugging session with the client on Thursday morning, at which point I hope to determine whether the caching is happening on the server side (i.e., if AOLserver/OpenACS is caching the ADP pages) or if this is my ISP's fault.

I've started to insert some caching directives in the outgoing HTTP headers, and I believe that this is helping.  Part of the problem is that it's hard to get reliable test cases working.  But again, I hope to solve much of this tomorrow morning.

Andrew's suggestion of using SSL seems like overkill for a simple problem.  But it's an idea if everything else turns out to be too annoying.

The idea of putting an ID in the URL is also intriguing, but I really would prefer to leave the URLs alone.

I'll let you know if anything particularly exciting or enlightening happens!

Posted by Reuven Lerner on
OK, I've figured out what was going on.  I'm a bit stuck for a solution, though.

The deal is this: Neither AOLserver nor OpenACS sends Expires or Cache-Control headers.  And the URLs in OpenACS look like they should contain static documents.

Unfortunately, this means that my ISP's mandatory and invisible caching proxy caches them.  Even more unfortunately, I share a proxy cache with the graphic designer and the client with whom I'm working.

So we've had a fun time the last few weeks, with everyone discovering that they're logged in as someone else.

The real solution would be for OpenACS to put out "Expires" headers at the top of each page, or to set Cache-Control to private.

I'm going to mess with the templating system a bit in the next few minutes, so that we won't have such ridiculous caching problems.  (And if it helps, I'll submit a patch.)  But if anyone has any obvious suggestions regarding where I can put this in, or reasons why this is a bad idea, I'll be happy to listen...

Posted by C. R. Oldham on
Reuven,

We solved this one by changing the default-master for our site.  Should I post/email you the files so you can see our solution?

Posted by Reuven Lerner on
I've already modified the default master, and it helped.  But we have seven subsites, and are also seeing the caching problem on the main login ("Welcome to OpenACS") page, which isn't part of a subsite.

I basically want the system to turn caching off for anything that lacks a .jpeg, .png, or .gif extension.  Under Apache, it would be pretty easy to do this.  Surely there's a way to configure AOLServer to output HTTP headers conditionally, no?  And if not, then perhaps I should put such code in the templating system.

Posted by Reuven Lerner on

Sure enough, a quick change to adp_parse_ad_conn_file made all the difference in the world. I added the following two lines:

ns_set put [ns_conn outputheaders] "Cache-Control" "no-cache"
ns_set put [ns_conn outputheaders] "Pragma" "no-cache"

And everything now seems to work just fine.

Is there any reason why this should not be a standard part of OpenACS?

Posted by Tom Jackson on

The link that best explained to me how to control your cache was mnot.net. I don't think messing with the default-master would do much.

You should run a filter from your tcl/init.tcl file. If you don't want or care about controlling access to your images, just have the filter return the file itself and skip OpenACS completely. You still need to skip out of any trace filters as well.
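A rough sketch of that kind of filter (the URL pattern and proc name here are made up; adjust to taste):

# serve matching images straight from the pageroot, bypassing OpenACS
ns_register_filter preauth GET /graphics/*.gif serve_image_directly

proc serve_image_directly { why } {
    set file "[ns_info pageroot][ns_conn url]"
    if { [file isfile $file] } {
        ns_returnfile 200 [ns_guesstype $file] $file
        # filter_return stops further processing, so the request processor
        # never sees the request; trace filters may still fire and need
        # their own handling, as noted above
        return filter_return
    }
    return filter_ok
}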

Another option that takes care of the programming details is to use my VAT module. You would need an additional domain or subdomain name for the image files in that case. However, I would guess that configuring my module would take about the same amount of time as writing the filter.

If you do change the domain name for the images, you could use a simple static webserver for them as well: thttpd or publicfile might work for you.

Posted by Jun Yamog on
Hi,

We use the hack on the default master.  But which solution will make it into OpenACS?  This problem has cropped up several times; in my opinion a solution must get into the OpenACS tree.

Can we have someone commit the default-master change or Reuven's ns_conn solution?  Or maybe another solution at the AOLserver level.  I hope something gets committed, since someone will run into this problem again in the future.  Digging through the forums may not be the best way for future users to solve it.

Posted by Bart Teeuwisse on

What does the community think of this caching scheme (based on Tom's suggestion):


ad_proc cache_control {conn why} {

    Control caching of images, static and dynamic pages.

} {
    set url [string tolower [ad_conn url]]
    switch -regexp $url {

        .gif$ -
        .jpe?g$ -
        .png$ {

            # Expire images after an hour.

            set seconds [expr 60 * 60]
            ns_set update [ad_conn outputheaders] Expires \
                [ns_httptime [expr $seconds + [ns_time]]]
            ns_set update [ad_conn outputheaders] Cache-Control \
                "max-age=$seconds"
        }

        .css$ -
        .html?$ {

            # Expire static pages after half an hour.

            set seconds [expr 30 * 60]
            ns_set update [ad_conn outputheaders] Expires \
                [ns_httptime [expr $seconds + [ns_time]]]
            ns_set update [ad_conn outputheaders] Cache-Control \
                "max-age=$seconds"
        }

        default {

            # Expire all other pages immediately.

            ns_set update [ad_conn outputheaders] Expires \
                [ns_httptime [ns_time]]
            ns_set update [ad_conn outputheaders] Cache-Control \
                "no-cache,no-store,must-revalidate,proxy-revalidate"
            ns_set update [ad_conn outputheaders] Pragma \
                "no-cache"
        }
    }
    return filter_ok
}

# Register the cache control proc.

ad_register_filter -critical f trace * /* cache_control

I've saved this as cache-control-init.tcl in the /tcl directory.

/Bart

Posted by Tom Jackson on

So ad_proc and ad_register_filter are defined at that point? I'm not sure, but I would try running your filter as a regular ns_register_filter so you know exactly when it is run (ahead of the request processor filter).

But I'm just being paranoid. Put in an ns_log call and verify when, if, and where the filter gets run, just to make sure.

Posted by Bart Teeuwisse on

Tom,

you are paranoid ;)! Yes, both ad_ procs are defined at that point. Like I said, I'm running this script at a (staging) site. The advantage of ad_register_filter is that you can see the registered filters from the monitoring package.

/Bart

Posted by Tilmann Singer on
First of all - thanks for investigating this! I think this is long-needed and missing functionality in OpenACS, and should definitely be enabled by default. I have told users to "hit Reload, no not Reload, CTRL-Reload, yes you have to hold down the CTRL key and click on the Reload button" ... a few times, and that is annoying.

I think though that the right place to put this is the request processor, because only there are you sure what kind of file a URL really maps to. With abstract URLs you can't always tell from the URL alone, e.g. when '/some-file' maps to '/some-file.html'. Also, there might be an index.vuh that handles requests for *.html files (not sure if that's really possible), which shouldn't be cached even though the URL ends in 'html'.

If you agree but don't have the time to rewrite your proc then I could give it a try (not immediately though).

Also I think it could be done more efficiently with 'string match' instead of regexp, something worth considering when adding code that is executed upon every request.
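For instance, the same dispatch could be written with switch -glob (sketch only, mirroring the branches above):

switch -glob -- $url {
    *.gif - *.jpg - *.jpeg - *.png {
        # image branch
    }
    *.css - *.html - *.htm {
        # static page branch
    }
    default {
        # no-cache branch
    }
}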

Posted by Jon Griffin on
Actually, in terms of efficiency, expr is much faster than both.

The ability to use expr with the eq/ne operators is new, though, so you would have to use AOLserver >= 3.5, which allows you to use a real version of Tcl instead of the built-in one.
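For example, a comparison like this (assuming Tcl 8.4, as shipped with AOLserver 3.5+):

# extension check using expr's eq operator
set ext [file extension $url]
if { $ext eq ".gif" || $ext eq ".jpg" || $ext eq ".png" } {
    # image branch
}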

The OpenACS-recommended version of AOLserver is showing its age and we should be recommending the newest AOLserver (of course acs-lang in the head won't work).

Posted by Robert Locke on
Some scattered thoughts on the subject:

* There are other static content extensions which you might wish to include such as .pdf, .wav, .doc, etc.

* It may sometimes be desirable to *not* send any cache control headers for static content.  This is because it would be sent with a "Last-Modified" header and would subsequently be validated with the "If-Modified-Since" directive.  Slightly less efficient, but it at least guarantees fresh content for those sites which need it.

* There may be some sites which want all their dynamic content to be cached for a time because the information is not very time-sensitive nor user-specific, so that should be possible as well.

* You may also wish to cache only some of your dynamic content.  For instance, what if the browser/proxy is fetching the URL "/shared/portrait-bits" which maps to a CR object stored in the filesystem?  Currently, with cr_write_content, you at least get the benefit of a "Last-Modified" header.  But, with the proposed URL pattern-matching scheme, the browser/proxy would also get a no-cache directive.

Don't know if that helps at all...

Posted by Bart Teeuwisse on

Jon, Tilmann,

I agree with you both. Yes, caching should really be part of the request processor. Especially since I made a small but crucial mistake in the above code. The cache-control filter should be registered as a postauth filter rather than a trace filter. A trace filter isn't executed till after the connection closes.
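In other words, the registration line should read something like:

# register at postauth instead of trace
ad_register_filter -critical f postauth * /* cache_control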

I hadn't noticed because ACS Developer Support had me fooled for a while: the cache control headers did show up in its request information. Delorie has a cool HTTP Header Viewer for inspecting the headers of Internet-accessible sites.

There is a lot more to controlling caching than this simple script handles. Postauth filters, for example, are run before the request processor, and hence it is impossible to use [ad_conn] to find out whether the request was for a static file or not. Robert pointed out some of the other shortcomings.

Whichever solution is adopted, I believe it should be part of the request processor, where it can better distinguish between kinds of content (dynamic, static, image, content type, etc.) and pick the appropriate action. The final solution should leave room for customization, e.g. caching some parts of a site longer than others.

/Bart

Posted by Jeff Davis on
Some issues I see: .css$ etc. will also match "xcss", and you should say switch -regexp -- $url since URLs starting with "-" will break the filter. It also ignores whether the headers have been set already, which means that if you have a .vuh handler or dynamically generated images (which might produce something like vuhhandler/21987/foo.css) you won't have any way to disable it.
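That is, something more like this (a sketch; escape the dots and pass "--" so a URL beginning with "-" isn't mistaken for a switch option):

switch -regexp -- $url {
    {\.gif$} -
    {\.jpe?g$} -
    {\.png$} {
        # image branch as before
    }
    {\.css$} -
    {\.html?$} {
        # static page branch as before
    }
    default {
        # no-cache branch as before
    }
}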
Posted by Tilmann Singer on
Robert: What component of an OpenACS installation is outputting Last-Modified headers? And which one would be able to deal with If-Modified-Since? I don't see this happening at all right now. Or are you suggesting to add this instead of the Cache-Control stuff for static content?

Regarding adding .pdf, .wav, .doc: when the request processor decides whether to add the cache control headers, it's not necessary to keep a list of file extensions at all; the rp would decide based on whether it is serving an actual file from the file system or calling one of the handlers, e.g. the one for tcl/adp pairs.

I agree that it should be possible for individual scripts to override the cache handling of the rp; for example, if custom last-modified handling were to be implemented in cr_write_content, it should be able to call something like "ad_conn -set rp_output_cache_control_headers_p 0" to inform the request processor that it should not add the headers.

Posted by Reuven Lerner on
Overall, I think that Bart's solution is a great one.  It certainly would do the trick for my needs, and I expect that it would do the same for issues raised by other people.

Is there any chance of getting this into 4.6?  The number of hours that I've spent on this problem on one site leads me to believe that it'll be a major mistake not to include this patch (or one like it) ASAP.

Posted by Robert Locke on
Hi Tilmann,

I'm not sure, but it's definitely happening. Here's an example from fetching our beloved logo on the openacs.org site:

$ telnet openacs.org 80
HEAD /templates/slices/openacs.gif HTTP/1.0

HTTP/1.0 200 OK
...snip...
Last-Modified: Tue, 29 Oct 2002 16:41:11 GMT
...snip...

Notice the Last-Modified header.

Then, if I follow it up with an If-Modified-Since and:

$ telnet openacs.org 80
GET /templates/slices/openacs.gif HTTP/1.0
If-Modified-Since: Tue, 29 Oct 2002 16:41:11 GMT

HTTP/1.0 304 Not Modified

This correct behavior is not just limited to URLs which map directly to a file in the file system. cr_write_content, which calls ns_returnfile for static content stored in the file system, seems to also honor the Last-Modified/If-Modified-Since directives, which is a good thing.

And I'm not suggesting to add this *instead* of Cache-Control stuff, since it's already there. I'm suggesting that some people may want to remove all Cache-Control stuff for static content and instead rely on the Last-Modified/If-Modified-Since "protocol", which guarantees freshness along with some measure of efficiency.

Perhaps reasonable default behavior might be:

  • At the end of the request, check if there is a Last-Modified header. (From what I can tell, this would imply the URL maps to an actual file or is static content being delivered by, say, cr_write_content as described above. It could also have been dynamically added by a script.)
  • If one exists, then optionally add a configurable caching directive (30 minutes, 1 hour, whatever), if there are no caching directives already present.
  • Otherwise, add the "no cache" stuff, if there are no caching directives already present. (A rough sketch of this logic follows below.)
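In Tcl, that decision could look roughly like this (a sketch only; the proc name is made up, and where exactly it would hook into the request processor is the open question):

proc rp_default_cache_headers {} {
    set headers [ns_conn outputheaders]
    # respect whatever caching directives the page has already set for itself
    if { [ns_set ifind $headers "Cache-Control"] >= 0
         || [ns_set ifind $headers "Expires"] >= 0 } {
        return
    }
    if { [ns_set ifind $headers "Last-Modified"] >= 0 } {
        # looks like static-ish content: allow caching for a configurable period
        set seconds [expr {30 * 60}]
        ns_set put $headers "Cache-Control" "max-age=$seconds"
        ns_set put $headers "Expires" [ns_httptime [expr {[ns_time] + $seconds}]]
    } else {
        # everything else gets the "no cache" treatment
        ns_set put $headers "Cache-Control" "no-cache,no-store,must-revalidate"
        ns_set put $headers "Pragma" "no-cache"
    }
}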

Hope that makes sense...

Posted by Tilmann Singer on
Robert, that's great!

From searching a bit in the AOLserver code I would say those headers are generated by AOLserver whenever ns_returnfile is called. Since the request processor calls that for static files we get this functionality for free, and it also means we don't need to check for the existence of those headers but can just refrain from adding any cache control headers when ns_returnfile is being called.

I would therefore suggest adding some code to the request processor that adds the cache control headers in all other cases, overridable by an acs-kernel parameter and, on a per-request basis, by setting an ad_conn flag as mentioned above.

I don't think that there is much use for a 30min/1 hour/whatever cache directive - typical dynamic pages need to be regenerated on every request and should thus not be cached at all - at least those that display login information on each page, like openacs.org does.

It would be great if the Last-Modified/If-Modified-Since behaviour could be added to cr_write_content later. Being able to avoid the no-cache headers on a per-request basis makes sure that this is possible.