Forum OpenACS Development: RFC: Separate code and data directories by default in 5.2

I propose that we change OpenACS 5.2 so that (as many have suggested) the executable code directories are separated from the data directories (content repository files, logs, etc.).

Reasons

  1. better security. With an enforced distinction, it's harder to put data in through the web interface that OpenACS will then execute.
  2. simplifies production backups. In any system with multiple webservers, you want to back up the shared data only once.
  3. simplifies cvs. Usually you want different cvs regimes for the volatile data vs. the code. This puts all the data in one place. (And probably means the data directory shouldn't be a subdir of the code dir.)

Approach

  1. New recommended directory in /service0/: data. Should this be inside the service0 directory or parallel (like /service0-data )?
  2. Change the default config.tcl to have two root directories, one for data and one for code. Put log and content-repository-content-files directories in the data directory. (others? Photos? Can we do all this in config.tcl or do we have to touch the packages too?)
  3. Add a flag in config.tcl, unset by default. If set, the flag prevents OpenACS from writing anything at all anywhere but the data directories (and tmp?). This blocks upgrades (which could be malicious), dirty code writes, mistaken overwrites, etc.
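
To make the flag in item 3 a bit more concrete, here is a minimal sketch of the kind of check it implies: resolve the target path and allow the write only if it lands under one of the data roots. It is written in Python purely for illustration (OpenACS itself is Tcl), and the function name and the /srv/... paths are hypothetical, not anything that exists in OpenACS today.

```python
import os


def is_write_allowed(target, writable_roots):
    """Return True if target resolves to a path inside one of writable_roots.

    Hypothetical helper: it only illustrates the check, it is not OpenACS code.
    """
    # realpath collapses ".." and follows symlinks, so a path such as
    # data/../packages/foo.tcl cannot sneak out of the data root.
    real_target = os.path.realpath(target)
    for root in writable_roots:
        real_root = os.path.realpath(root)
        # commonpath compares whole path components, so /srv/service0-data-evil
        # is not mistaken for a child of /srv/service0-data.
        if os.path.commonpath([real_target, real_root]) == real_root:
            return True
    return False


# Assumed layout: code in /srv/service0, data in a parallel /srv/service0-data.
roots = ["/srv/service0-data", "/tmp"]
print(is_write_allowed("/srv/service0-data/log/error.log", roots))
print(is_write_allowed("/srv/service0/packages/acs-kernel/x.tcl", roots))
```

A check like this only helps if every write actually goes through it, which is the hard part of enforcing the flag.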

There are a lot of details missing before this can be a TIP. What else needs to go in here? Where does /etc belong? I think /etc should be in the code area.

This would be very useful. It can be a bit confusing in the current file structure to figure out where all the data is.

1- My preference for directories would be to have a data AND a code directory inside service0.

I agree about etc living in the code area. :)

I don't necessarily want logs in the same place as data, since on a prod site I generally like them to be written to a separate device from the one holding the content repository data. Otherwise I agree it's a good idea to isolate the instance-specific data from the codebase.

Another pet peeve is the existence of the directories in cvs. I hate seeing:

? content-repository-content-files/10
? content-repository-content-files/11
...
? content-repository-content-files/99
? log/error.log
? log/error.log.000
...
etc.
when I do a cvs update.

I don't see how you could enforce #3 without a serious overhaul of the code though.

It would be good if it were possible to have the code on a read-only partition, which argues against having the data directory in the same place as the code.

I like having the content repository and the webserver tmp dir on the same partition so that uploads are just renames (and for the anal that partition could be noexec).

Jeff
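
The point about uploads being "just renames" depends on the tmp dir and the content repository sharing a filesystem. A rough way to check that, and to see which path an upload would take, is sketched below in Python (for illustration only; OpenACS is Tcl). The function names are made up for this example.

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """True if both existing paths live on the same device (st_dev matches)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_upload(tmp_path, dest_path):
    """Move an uploaded file into place and report how it was done.

    Hypothetical helper. The destination directory must already exist.
    """
    tmp_dir = os.path.dirname(tmp_path) or "."
    dest_dir = os.path.dirname(dest_path) or "."
    if same_filesystem(tmp_dir, dest_dir):
        # Same device: rename only rewrites directory entries, no data copy.
        os.rename(tmp_path, dest_path)
        return "renamed"
    # Different devices: rename is not possible, so shutil.move falls back
    # to copying the file and then removing the original.
    shutil.move(tmp_path, dest_path)
    return "copied"
```

If this reports "copied" on a production box, the tmp dir and the content repository are on different partitions and every upload pays for a full copy.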

A .cvsignore file is your friend. Just pop one of these little beauties into the root of your service dir and stick the following in it:

    apm-workspace
    content-repository-content-files
    database-backup
    etc
    log
    .*

And cvs will ignore those directories (and files starting '.')

If the directories are already known to CVS you may have to make it forget first - probably removing the CVS directories would do it.

    Steve

Hi Joel,

Is this just an issue of changing the definition of the "standard" production environment? In my environment, for example, the content-repository is the only "data"-like store under the openacs-root. All the other things that an openacs-instance needs (database-dir, database-logs, aolserver-access-log, aolserver-error-log, aolserver-bindir, supervise-scripts, ...) are lying somewhere on the disks, maybe symlinked to the openacs-dir, but not necessarily. As far as point 3 is concerned: my cvs way is to create a checkout of the whole openacs tree and symlink the necessary directories to my real openacs-root. (This way I can use "cvs update -Pd" later on to update all levels of dirs.)
Did I misunderstand the reasons for your RFC?
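
For anyone who wants to try the checkout-plus-symlink layout described above, a minimal sketch follows. It is Python for illustration only, assumes a POSIX system, and the function name and directory names are hypothetical; adjust them to whatever your checkout actually contains.

```python
import os


def link_dirs(checkout_root, service_root, names):
    """Symlink each named directory from the cvs checkout into the service root.

    Returns the names that were newly linked. Existing entries in the
    service root are left alone so real data is never replaced.
    """
    linked = []
    for name in names:
        source = os.path.join(checkout_root, name)
        link = os.path.join(service_root, name)
        # lexists is also True for dangling symlinks, so nothing is clobbered.
        if os.path.lexists(link):
            continue
        os.symlink(source, link)
        linked.append(name)
    return linked
```

After linking, "cvs update -Pd" is run inside the checkout, and the service root picks up the changes through the symlinks.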

Yes, this would be a very good thing to do ...