Forum OpenACS Development: Multiple package instances - correct behavior on instance deletion?

Just what is supposed to happen with a package when the admin goes to /admin/site-map/unmounted and deletes a package instance, and what tools and recommendations does (Open)ACS provide to help achieve this correct behavior?

Reason I ask is, I've been fooling with using multiple package instances of File Storage, and have some concerns about its behavior.

It's clear to me that when the admin goes to /acs-admin/apm/version-view and deletes a package from the system, the proper thing to do is generally to delete all code, content, and other objects owned by that package from the database. So far so good.

But when you install multiple instances of a package, and then delete some of them, what then?

Now, File Storage is basically a thin layer on top of Content Repository. Briefly, it defines a table that looks like:

create table fs_root_folders (
    -- ID for this package instance
    package_id  integer
                constraint fs_root_folder_package_id_fk
                  references apm_packages on delete cascade
                constraint fs_root_folder_package_id_pk
                  primary key,
    -- the ID of the root folder
    folder_id   integer
                constraint fs_root_folder_folder_id_fk
                  references cr_folders on delete cascade
                constraint fs_root_folder_folder_id_un
                  unique
);

What I see File Storage doing (in ACS 4.2) is that when I unmount, say, the "fs2" instance, the row in fs_root_folders for that instance magically disappears - due to the "on delete cascade" on the fs_root_folder_package_id_fk constraint, above, I believe.

However, all the content appears to be invisibly left behind, orphaned, in the Content Repository. I guess you could argue about whether you should automatically delete the content or not, but clearly, leaving behind orphaned content without even ever mentioning anything about it in the docs is a definite bug, one way or another.

Note also that since the row is gone from fs_root_folders, when the file-storage-drop.sql script does its work, its

for v_root_folder in c_root_folders loop
    acs_object.delete(v_root_folder.folder_id);
end loop;
loop will not find this orphaned content, and it really will live on in your Content Repository, existing but unreachable, forever and ever - or at least until you drop or otherwise clean up your Content Repository.
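
To make the orphaning concrete, here is a minimal sketch using Python's sqlite3 module rather than Oracle. The fs_root_folders keys mirror the table above, but the one-column apm_packages and cr_folders tables are simplified stand-ins (an assumption, not the real ACS datamodel), and the function name is made up for illustration.

```python
import sqlite3

# Simplified stand-in schema: only the keys matter here. The one-column
# apm_packages and cr_folders tables are assumptions, not the real ACS tables.
SCHEMA = """
create table apm_packages (package_id integer primary key);
create table cr_folders   (folder_id  integer primary key);
create table fs_root_folders (
    package_id integer primary key
               references apm_packages on delete cascade,
    folder_id  integer unique
               references cr_folders on delete cascade
);
insert into apm_packages values (1);
insert into cr_folders values (100);
insert into fs_root_folders values (1, 100);
"""

def delete_instance_and_count():
    """Delete the package instance, then count what is left in each table."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless enabled
    db.executescript(SCHEMA)

    # Deleting the instance cascades to fs_root_folders only.
    db.execute("delete from apm_packages where package_id = 1")

    root_rows = db.execute("select count(*) from fs_root_folders").fetchone()[0]
    folder_rows = db.execute("select count(*) from cr_folders").fetchone()[0]
    db.close()
    return root_rows, folder_rows

print(delete_instance_and_count())
```

The cascade runs from apm_packages down to fs_root_folders, never from fs_root_folders over to cr_folders, so the folder row survives with nothing pointing at it. A drop-script loop that walks fs_root_folders then has no row left to lead it to that folder.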

In general, the ACS docs discussing multiple package instances always seemed weak to me at best. And while I've done plenty of debugging of package drop scripts, I've never before considered the special concerns of non-singleton packages. So I'd definitely like to hear your thoughts...

Oh yeah, here's a simple example of when multiple package instances might be useful, which is what originally got me looking into this:

The Static Pages module is currently a singleton. It is obnoxiously designed, in that it accepts only one "root folder" below which static content may live! And of course, it provides no way to chop nodes off the tree, so if you have static content living in multiple places (e.g., multiple ACS Packages), and don't want to just suck your whole file-system tree in from the server root on down, well, you're hosed.

One simple (if partial) solution is to do exactly what the File Storage package did - allow multiple package instances. Then you could use one Static Pages package instance per tree of static files that you want to make commentable, index with Site Wide Search, etc. Now, this approach may or may not be the best way to implement that feature, but given the package and package instance model of ACS 4.x, it clearly is a valid way to do so.

So, hey, when do I get to sign you up as part of my merry band of OpenACS 4 developers? :)

Orphaning content is a big no-no.  It should always be reachable in some way.  If I remove a file-storage package I should be able to at least optionally tell it to remove all the content physically, and all the other stuff that links to that content.

It seems that one might want to be able to "dismount" a package without causing anything destructive to happen, too.  I've not looked to see what site-map does for dismounts, but taking a package offline should be an option.  Later, perhaps I'll want to bring it back online but only visible to admins, or maybe just long enough to archive it (after I add my archiving code to OpenACS file-storage!), followed by a nice healthy nuke of the files.

I think you're probably right about static pages, but have never looked into that package myself.  Dave Bauer (dave@thedesignexperience.org) is the person working on static-pages for OpenACS.  He's working on mapping content to the CR (now that the CR knows about filesystem content) and I'm sure is open to talking about other enhancements.

Don, on "dismounting" packages, yes, the Site Map already supports
that.  It's not immediately obvious, but "mounting" and "unmounting"
packages just moves an existing package instance around to different
URLs, without otherwise changing the instance in any way.  You can
also have the same instance mounted at more than one place.

When you click "new application", on the other hand, a new package
instance is created.  (If the package is a "singleton" and already has
one instance, then that package will not show up in the list of
possible new applications to create.)  And then even if you unmount
that instance, you still have to go to "unmounted applications" and
delete the instance, in order to really get rid of it.

That all seems as it should be to me.  However, as far as I know,
there is no hook to run an "instance create" or "instance delete"
script, or anything like that.

At the lowest, package install level (controlled from the APM), there
are the load and drop sql scripts, which I imagine are sufficient.
(At least, they've been sufficient for me so far.)  And at the
highest, URL mapping level, there's probably no need for hooks to do
anything special upon mount/unmount of a package.  But at the middle,
package instance create/delete level, there's nothing, and I'm
guessing there probably SHOULD be some sort of hook there to do
package-specific stuff.  But I haven't looked into it carefully.
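
For what it's worth, here is a rough sketch of what such a middle-level hook could look like. Nothing like this exists in the APM as far as I know; register_hook, run_hooks, and the "instance-delete" event name are all hypothetical, and it is written in Python only to keep it short and runnable.

```python
from collections import defaultdict

# Hypothetical registry: (package key, event name) -> list of callbacks.
_hooks = defaultdict(list)

def register_hook(package_key, event, callback):
    """Let a package declare what should run on instance create/delete."""
    _hooks[(package_key, event)].append(callback)

def run_hooks(package_key, event, package_id):
    """Call every callback registered for this package and event.

    Returns the number of callbacks that ran, so the caller can tell
    whether the package declared any cleanup at all.
    """
    callbacks = _hooks.get((package_key, event), [])
    for callback in callbacks:
        callback(package_id)
    return len(callbacks)

# Usage: File Storage registers a cleanup step for its root folder.
cleaned = []
register_hook("file-storage", "instance-delete", cleaned.append)
run_hooks("file-storage", "instance-delete", 42)
```

The point of returning a count is that the Site Map could at least warn the admin when a non-singleton package has registered no delete hook, instead of silently leaving its data behind.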

(Personally, as far as the Site Map UI goes, I think it's a mistake to
complicate the terminology by saying "new application" when what we
really mean is "create a new instance of an application package".
Better to use the precise term and teach admins what it means, than to
use some vague "friendly" sounding term that could mean anything.)

As for joining your merry band, well, I think I have to at least
download Postgres and the OpenACS 4 codebase and try them out first.
All of which is on my todo list - no promises, but I do look forward
to it.  :)

On the Static Pages package, I'm probably going to try some solution
to the multiple trees issue soon.  (I've already done some minor stuff
to make it work correctly with PDF files and arbitrary MIME types -
this is on ACS 4.2 of course.)  But I haven't decided yet whether I'll
try the multiple instances way of doing it or not.  So I'll give
Dave a holler once/if I actually do something there, rather than just
talking about it.  :)

The general-purpose ability to map file-system content to the CR
should be very cool - sounds much cleaner than the current Static
Pages solution, which is to load a copy of all the content into
Oracle, and periodically run a proc to re-scan, diff, and re-stuff
anything modified into the CR in Oracle again.

I agree that instance deletion is a feature that is not "well specified", or at best poorly documented in the design of ACS. While the design specs require a drop script to delete an entire package, there is no corresponding mechanism that specifies what happens to the data belonging to an instance of a package once the instance is deleted.

I think we should also require that every non-singleton package specify what to do when an instance is deleted. An explicit "on delete cascade", or a trigger in the datamodel, perhaps, that gets activated when package_id (or whatever the hook to the APM is) is deleted? "On delete cascade" to take care of the cleanup is not always feasible, esp. for packages that use a service. In the case of File Storage, the data resides mostly in the CR, which knows nothing about the implementation (singleton or not) of the package that uses it. If the CR does not know anything about instances, it cannot "on delete cascade" based on instances. Thus, content in File Storage will be orphaned when a package instance is deleted, as Andrew illustrates (unless a trigger is added). It'll be interesting to look at calendar (which uses acs-events) to see how it behaves when an instance is deleted, and whether it also orphans rows in the acs-events tables.
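
A minimal sketch of the trigger idea, again using Python's sqlite3 as a stand-in for Oracle. The one-column apm_packages and cr_folders tables and the trigger name fs_instance_cleanup are assumptions for illustration; a real version would presumably go through the CR's own delete API rather than a bare "delete from cr_folders".

```python
import sqlite3

# Stand-in schema plus a cleanup trigger. Table contents are simplified
# assumptions; fs_instance_cleanup is a hypothetical trigger name.
SCHEMA = """
create table apm_packages (package_id integer primary key);
create table cr_folders   (folder_id  integer primary key);
create table fs_root_folders (
    package_id integer primary key
               references apm_packages on delete cascade,
    folder_id  integer unique
               references cr_folders on delete cascade
);

-- Before the instance row goes away, remove the root folder it owns.
-- Deleting the folder cascades back to its fs_root_folders row.
create trigger fs_instance_cleanup
before delete on apm_packages
begin
    delete from cr_folders
     where folder_id in (select folder_id
                           from fs_root_folders
                          where package_id = old.package_id);
end;

insert into apm_packages values (1);
insert into apm_packages values (2);
insert into cr_folders values (100);
insert into cr_folders values (200);
insert into fs_root_folders values (1, 100);
insert into fs_root_folders values (2, 200);
"""

def delete_instance_with_cleanup():
    """Delete instance 1 and return the folder ids that remain in the CR."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless enabled
    db.executescript(SCHEMA)
    db.execute("delete from apm_packages where package_id = 1")
    remaining = [row[0] for row in
                 db.execute("select folder_id from cr_folders order by folder_id")]
    db.close()
    return remaining

print(delete_instance_with_cleanup())
```

The trigger has to fire before the package row is deleted, because afterwards the cascade has already removed the fs_root_folders row that says which folder belonged to the instance. Hanging a package-specific trigger on a core table like apm_packages is a debatable design, which is one argument for a proper instance-delete hook instead.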

Andrew - remember that OpenACS 4 works with Oracle as well as Postgres.  While our most favorite folks are those who feel comfortable with both RDBMS systems, there's plenty of room for folks who are only using one or the other who want to contribute.  Testing, for instance.

And there will be a lot more bug fixing activity going on in OpenACS 4 than in ACS 4.2, including the Oracle version, simply because this is where the active community lives at the moment.

This even includes writing new packages that work with one or the other RDBMS system but not both.  If such a package is attractive enough, certainly someone else would be willing to help port it to the other RDBMS.

Also, I think that the kind of enhancements to static-pages you envision would be a lot easier to add for both RDBMS versions than you imagine.

So ... I'm not trying to twist your arm, but I do want to make sure you understand that an Oracle user like yourself can be very useful in our environment, and that the fact that you might not have time to explore PG at the moment is no reason (from our point of view, that is) to shy away from our project.