Forum OpenACS Development: Can't delete unmounted ETP instance

Collapse
Posted by Jowell Sabino on
I'm trying out ETP, so I installed the package (everything from CVS as of 02/10/2002, from scratch install) and mounted it on a subdirectory on the site map. Once I am done playing with it, I unmounted the instance and tried to delete it. I won't let me, with the relevant error message:
ERROR: cr_flder_pkg_id_fk referential integrity violation - key in 
apm_packages still referenced from cr_folders
I also mounted, unmounted and tried to delete an instance with nothing in it, so the problem is not because the ETP instance has user-added content. I tried to search the forums and it seems nobody yet has encountered this (no ETP yet in SDM, either). Now the constraint is an add-on to acs-content-repository specifically for ETP, which can be seen towards the end of acs-content- repository/sql/postgresql/content-create.sql:
alter table cr_folders
add constraint cr_flder_pkg_id_fk foreign key (package_id) references 
apm_packages (package_id);
  • Should we add "on delete cascade" to this constraint to fix the problem? That is, if the ETP instance is deleted from APM, the "on delete cascade" will also take care of deleting stuff in cr_folders? The problem with this however is that ETP entries in cr_folders are referenced by entries in cr_items and cr_revisions, so there will be referential integrity violations that Postgres will complain about.
  • There is already a constraint "cr_fldr_pkg_id_fk" (note the different spelling from cr_flder_pkg_id_fk). How is this constraint different? Can we clean up the data model so that the required constaint is in the cr_folders table definition, and not a separate, added on constraint? We're still in alpha. 😉
Collapse
Posted by Don Baccus on
Not for long we're in alpha ...

We need to get this cleaned up.  You can work on it by simply using the drop script in the sql/postgresql directory directly, that goes a bit quicker than working through the APM.

Luke, Jun ... can one of you dig into this?

Getting drop scripts working correctly is tricky.

aD, in the beginning of ACS 4, chose against widespread use of "on delete cascade" to clean up datamodels, an unfortunate choice IMO.  Oracle in 8.1.5  and maybe later (?) didn't/doesn't support "on delete set null" or "on delete set default", also useful in maintaining the integrity of data stored in a datamodel.

Later aD started using "on delete cascade" in more contexts but incompletely.

All this makes getting drop scripts right annoyingly tricky.

As far as whether or not the constraint is redundant or not, I don't know, and am a bit too busy to look into this at the moment.

In general, though, one package shouldn't be changing the datamodel of another package, that's mildly evil.  The whole point of having a package abstraction is to have a package abstraction.

Collapse
Posted by Jun Yamog on
Hi Jowell,

This will not be a real help.  Basically what you are seeing is what I experience too.  So the drop script will not work as well.  On ETP deleted stuff are just moved into a trash bin folder on cr_folders.  The cr_folder folder_id is -400.  From what I remeber Luke P has decided for this trash bin stuff because there was something wrong with CR data model.  So I think he is waiting for DanW or somebody to fix this.  I am not yet good enough to debug at this kind of level nor  do I have time to study right now.  Although from what remeber file-storage is able to clean the cr_items, etc. cleanly.  So I guess being the file-storage guy you may apply what you do there in ETP.  Please post a bug fix if possible.  I would like to integrate that on my mod-ETP.  Not much help but I hope it does help even a little bit.  Thanks.

Collapse
Posted by Jowell Sabino on
Ok, I think I understand now. The table definition for cr_folders changed. In the most current version (1.35) of acs-content-repository/sql/postgres/content-create.sql, cr_folders is defined as
create table cr_folders (
  folder_id	    integer
  -- removed due to postgresql RI bug which causes deletion failures.
  -- replace with user triggers
  -- DanW (dcwickstrom@earthlink.net)

  --		    constraint cr_folder_id_fk references
  --		    cr_items on delete cascade
		    constraint cr_folders_pk 
                    primary key,
  label		    varchar(1000),
  description	    text,
  has_child_folders boolean default 'f',
  has_child_symlinks boolean default 'f',
  package_id integer 
  constraint cr_fldr_pkg_id_fk
  references apm_packages
);  
In previous versions (at least in v1.26 that I still have a copy of), there is no column "package_id" referencing apm_packages. Thus, I suspect that the add-on contraint found at the end of the file
alter table cr_folders
add constraint cr_flder_pkg_id_fk foreign key (package_id) references apm_packages (package_id);
was mistakenly left behind when cr_folders changed. Given the new table definition of cr_folders, this last contraint is redundant and can be taken out.

Don, I still think that there may be specialized situations when "on delete cascade" is useful, particularly on this one. We have acs-content-repository and APM (in acs-kernel) as two separate packages, but they must somehow "communicate" with each other when packages are installed and deleted. Installation is a separate problem, but let's see what happens when an instance of a package is deleted, say ETP. APM will delete the entry with package_id in the table apm_packages corresponding to the ETP instance. Without "on delete cacade", there is no mechanism in ACS that allows APM to tell acs-content-repository, "Hey, this instance of ETP is being deleted, clean up the contents of this ETP instance in the database." They are distinct packages, APM was built without any prior knowledge of what it needs to tell other packages like acs-content-repository what to do when APM does something.

I view "on delete cascade" as a way for a package to receive communication from another package about actions that affect it, without altering the datamodel of the other package (in this case, APM in acs-kernel). File-storage solved (I should say, hacked) the problem by constructing a table that maps package_id and folder_id. Note that this was necessary because previous versions of cr_folders didn't have the package_id attribute. This file-storage table specifies "on delete cascade" on package_id, so that when the package instance is deleted by APM, file-storage is informed about it. File-storage has "on delete" triggers that cleans up the contents of the file-storage instance when APM tries to delete it (otherwise, the contents are orphaned, which is an even nastier bug that simply refusing the deletion). I won't go over the code here -- it's on the current CVS.

Note that "on delete" triggers are defined by the file-storage package, so it does not alter in any way the acs-content-repository datamodel. I think the triggers actually belong to acs-content-repository because the mechanism can also be used by ETP. The nice thing though about having package-specific triggers is that it allows the package writers flexibility on what to do when cleaning up. File-storage is actually a bit moronic in this regard, because it simply deletes all content when the package instance is deleted. I would prefer archiving it somehow in acs-content-repository, while still allowing the instance to be deleted. Or do what ETP does: put them in the "Trash" folder. Of course, we still need to clean up the "Trash" folder if the package (and not just a package instance) is totally deleted.

Now about the problem on installation: when a package_id is inserted in apm_packages coresponding to the installation of a package instance, packages that are built on top of acs-content-repository must also do other automatic stuff, like create folders (e.g., the "root folder" in file-storage). Again, there is no way for APM to tell acs-content-repository, "Hey, I just created an instance, go ahead and create the folders you need". File-storage again has a hack to solve this (see the file-storage-proc.tcl file), but it would be preferable to have in APM a way for a package to register an "initialization" function (a pgsql function) that gets called when a package instance is created.

Collapse
Posted by Jun Yamog on
Wow! lots of good info to be picked up especially not having time to
read code.

Anyway I will try to contribute some info too that might help.  ETP
is also a hack in terms of creating the folder etc from I remember.
Basically ETP gets mounted by APM or site-map.  On first time that
ETP package is accessed it detects the folders are not present.  The
cr_folders is created along with the initial contents on cr_items
and cr_revisions for the index page.

Since given the post above that there is no formal way for APM or
maybe site-map to initialize a package instance.  So file-storage
and ETP has its own way of doing things.  A good canditate for
service contract?  Since we are close to beta I think we should
already make a list of things that must be pushed to the next
release.  It would also be great if people would start experimenting
for the next release.

Hey Jowell your Pinoy right?

Collapse
Posted by Don Baccus on
Jowell ... personally I lobbied for systematic use of "on delete cascade" when the issue was originally discussed in the context of ACS 4 Classic.  Folks at aD disagreed and the tradition of "by hand" deleting of rows in related tables was continued.  This always led to problems with deleting stuff in ACS 3 as folks continuously forgot to add delete statements to "nuke-user" and other content-deleting scripts and it's led to similar problems in ACS 4, especially with drop scripts.

Systematic use would greatly simplify the maintenance of drop scripts and the like.  It is part of SQL's referential integrity system for good reason.

My comment, then, is that the current mishmash where the feature is used  in some places, but not systematically, adds to the general PITA nature of trying to fix bugs in drop scripts or anything else that deletes content.

Unfortunately changing the core to use SQL's full referential integrity features would be difficult due to the number of packages, including packages already written and being used on clint sites by Early Achievers.  They'd all have to be modified.

BTW - glad to see you posting here again!

Collapse
Posted by Luke Pond on
I checked in changes to the drop scripts so that the package_id references in the content created by ETP are eliminated.  I'd prefer not to delete content from the CR when deleting the ETP package, but this way it is possible to delete the package from your system.

Can someone test the ETP create script on Oracle?  I just added one line to check if the Trash folder at -400 already exists.  This lets you re-install the package without getting an error.

Thank you!

Collapse
Posted by Jowell Sabino on
Hi Luke,

Actually, the problem I am referring to concerns the deletion of an ETP instance, not the ETP package. Setting the package_id to null in the drop script still won't allow me to delete the ETP instance, because this action will be performed only upon removal of the entire ETP package. I don't think there is a way around this unless we specify in cr_folders:

create table cr_folders
...
package_id integer references apm_packages on delete cascade;
where "on delete cascade" will signal ETP when a package instance is to be deleted. Then, the ETP instance contents can be archived in content repository by defining a delete trigger that sets package_id to null:
create function etp_delete_trig() returns opaque as '

update cr_folders
set package_id = null
where package_id = old.package_id;

return old.package_id;
end;' language 'plpgsql';

create trigger etp_delete_trig before delete
on cr_folders for each row
execute procedure etp_delete_trig();

Gosh, now that I wrote it down, I realize this is really ugly because:
  • we need to change the cr_folders table definition to add "on delete cascade", which is part of the acs-content-repository datamodel and not ETP
  • we just created a trigger on an acs-content-repository table, which is not part of ETP (there goes package abstraction...)
So why not mimic the design of file-storage, where a separate table is created that maps package_id in APM and the folder_id in cr_folders? This approach maintains package abstraction, in that it does not alter in any way the datamodel of acs-content-repository. If we do the same for ETP, there will be a table in the ETP create script, like
create table etp_to_cr_map (
   package_id  integer references apm_packages on delete cascade,
   folder_id integer references cr_folders
);
which gets an entry whenever a package instance of ETP is created (the package_id in cr_folders will not get used). Since we want the contents of the ETP instance to remain in content repository, "on delete cascade" will simply break the mapping between the package instance and the contents in CR when the ETP instance is deleted (we may have to set folder_id to null first through a trigger to avoid RI violations, but still the trigger will be on an ETP mapping table, not an acs-content-repository table).

A separate issue though is if we really want ETP content to be orphaned in acs-content-repository when the ETP instance is deleted. I think the better solution is to put content under the "Trash" subdirectory when an ETP instance is deleted (again, using a trigger), but clean up everything under "Trash" when the entire ETP package is deleted (through the drop script).

See: apm_package_instance_new.

Create the proc etp_post_instantiation {package_id} { ...
and do whatever.

Collapse
Posted by Stephen . on
What a lot of pain and suffering. To protect against someone deleting things accidentally?  How about a big warning message that says "Are you REALLY SURE you want to delete this package (50MB of data)?" and then we can all just use on delete cascade|set null?
Collapse
Posted by Jowell Sabino on
Wow... we really need to document APM carefully. Indeed, you can create a tcl proc that does stuff after an instance is created. Code for apm_package_instance_new is in acs-tcl/tcl/apm-procs.tcl. If this proc was already in APM since 4.0, the original author of file-storage never heard of it. Never heard about it myself, too. Thanks, Stephen.

About the warning message before a package instance is deleted, is there a proc that allows a package to tell APM, "hey, could you display this message before you delete the package instance?". Deletion of the package is from APM, not from the admin page of the package.

Collapse
Posted by Stephen . on
Not as far as I know, but we could just add a generic 'are you sure?' page to the apm, and add call backs simillar to the *_post_instantiation scheme if we wanted to get fancy.
Collapse
Posted by Mark Aufflick on
Ok, so having the same problem, and I REALLY needed to delete an etp instance (so it wouldn't get indexed), I figured out the following manual process (assuming a single page in this case):

  • unmount the etp instance in the site-map
  • figure out the item_id of the page with
    select * from cr_items where name = 'your page name';
  • delete that cr item with: (substitute your item_id)
    select content_item__delete(3130);
  • delete the cr_folder associated with your instance - I used the cms packages gui interface, but i'm sure you could use a plpgsql function. Note that this won't work until you have deleted the page as above.
  • now you can delete the unmounted app in site-map/unmounted
phew! the question is, where to automate this - SO ugly to do it specifically in site-map/instance-delete... if we don't want to do cascading, maybe we make a new naming standard for an "instance-destructor" function which could be run (if it exists) by site-map/instance-delete

if someone important (ie. not me) decides how it should be done I am happy to implement it.

Do I win the record for number of posts on christmas eve?! (noticing the time stamp tells me there should be a user_timezone preference...)

...the angel said to them, "Do not be afraid, for behold, I bring you good tidings of great joy which will be to all people. For there is born to you this day in the city of David a Savior, who is Christ the Lord. And this will be the sign to you: You will find a Babe wrapped in swaddling cloths, lying in a manger."
And suddenly there was with the angel a multitude of the heavenly host praising God and saying:

"Glory to God in the highest, And on earth peace, goodwill toward men!
Merry Christmas everyone!
Collapse
Posted by Jun Yamog on
Hi Mark,

You can put those into a plsql function.  This is what I did when I had to nuke out 90% of a site and keep 10% alive.  It took me about 3 days to do it.  Its a real short coming of OpenACS when it comes to removing data on a per instance basis. The per package is problematic enough, the per instance basis is really bad.  As of now what you are proposing seems to be logical, but if you need a lot of deleting to do now.  Run the it using a plsql function.  Also if you have CR item on the file system, make sure to run the sweeper to nuke out the file in the file system.

Collapse
Posted by Lars Pind on
So here's my variation of above ...

Trying to delete the instance, I get an error like this:

Database operation "0or1row" failed (exception NSDB, "Query was not a statement returning rows.")

ERROR:  cr_fldr_pkg_id_fk referential integrity violation - key in apm_packages still referenced from cr_folders

SQL:
    select apm_package__delete('2272');

Then I go to say:

openacs-4-6=# select etp__get_folder_id(2272);
etp__get_folder_id
--------------------
              2277
(1 row)

Then you know the folder_id. You'll delete that one after you've deleted all the contents.

openacs-4-6=# select * from cr_items where parent_id = 2277;
item_id | parent_id | name  | locale | live_revision | latest_revision | publish_status |  content_type    | storage_type | storage_area_key |      tree_sortkey
---------+-----------+-------+--------+---------------+-----------------+----------------+-------------------+--------------+------------------+--------------------------
    2282 |      2277 | 2    |        |          2284 |            2284 |                | news_item        | text        | CR_FILES        | 000000100000010000000001
    2278 |      2277 | index |        |          2279 |            2285 |                | etp_page_revision | text        | CR_FILES        | 000000100000010000000000
(2 rows)

So we delete those:

openacs-4-6=# select content_item__delete(2282);
NOTICE:  Deleting symlinks...
NOTICE:  Unscheduling item...
NOTICE:  Deleting associated revisions...
NOTICE:  Deleting associated item templates...
NOTICE:  Deleting item relationships...
NOTICE:  Deleting child relationships...
NOTICE:  Deleting parent relationships...
NOTICE:  Deleting associated permissions...
NOTICE:  Deleting keyword associations...
NOTICE:  Deleting associated comments...
NOTICE:  Deleting content item...
content_item__delete
----------------------
                    0
(1 row)

openacs-4-6=# select content_item__delete(2278);
NOTICE:  Deleting symlinks...
NOTICE:  Unscheduling item...
NOTICE:  Deleting associated revisions...
NOTICE:  Deleting associated item templates...
NOTICE:  Deleting item relationships...
NOTICE:  Deleting child relationships...
NOTICE:  Deleting parent relationships...
NOTICE:  Deleting associated permissions...
NOTICE:  Deleting keyword associations...
NOTICE:  Deleting associated comments...
NOTICE:  Deleting content item...
content_item__delete
----------------------
                    0
(1 row)

Then we can delete the folder:

openacs-4-6=# select content_folder__delete(2277);
NOTICE:  Deleting symlinks...
NOTICE:  Unscheduling item...
NOTICE:  Deleting associated revisions...
NOTICE:  Deleting associated item templates...
NOTICE:  Deleting item relationships...
NOTICE:  Deleting child relationships...
NOTICE:  Deleting parent relationships...
NOTICE:  Deleting associated permissions...
NOTICE:  Deleting keyword associations...
NOTICE:  Deleting associated comments...
NOTICE:  Deleting content item...
content_folder__delete
------------------------
                      0
(1 row)

Now if you hit reload on the delete instance page that we started with.

With the new APM enhancements that we've done recently, we shuold include this in an before_uninstantiate procedure.

/Lars

Collapse
Posted by Lars Pind on
And then a slightly more tight variant ...

select content_item__delete(item_id) from cr_items where parent_id = etp__get_folder_id(2249);

(2249 is the package_id of the package you're trying to delete)

select content_folder__delete(etp__get_folder_id(2249));

/Lars