Forum OpenACS Development: Best practice for marking objects as deleted without removing them from the database

I need to have a way to mark contacts as deleted while keeping them in the database (i.e. the do not show up in the regular UI any more, but there is an option for admins to resurect them from "the trash"). I need this for person, user and organization objects now, but have needed it for many other object types in the past. Any object that is a cr_item already have a generic way to do this by marking the item as "expired" in the publish_status column. Objects taht are not cr_items do not have a way of doing this.

Possible Solutions:

  1. Add deleted_p/status column to the parties table (which would need a TIP). Advantage: Its simple, doesn't require a new table, etc. Disadvantage: I need to maintain a column in that table, and any other objects in the future that aren't cr_items need a seperate deleted_p/status column
  2. Add deleted_p/status column to the person and organizations tables and using the member state for group membership in registered users for user objects. Advantage: changes to the tables for the appropriate objects (except users who use group membership for status info). Disadvantage: need to maintain different ways of deleting different parties, which can create code bloat.
  3. Add deleted_p/status column to the acs_objects table (which would need a TIP). Advantage: any object can be marked as deleted/expired/whatever and all packages can use this one place for keeping track of their status. Disadvantage: its a change to the most important table in openacs, and cr_item objects, as well as many other object_types already have deleted_p flags in their own tables. it also requires a join to find out whether or not that object is deleted when you do not need other information from the acs_objects table
  4. Add a parties_status/deleted_p table, with a party_id as only column and status/deleted_p as the other. Advantage: No changes need to be made to core. Disadvantage its not as fast as finding all parties that are marked deleted_p in the parties table (since it requires a join)
  5. Since the contacts package makes party objects into cr_items (i.e. the cr_item item_id is the same as the party_id) for version control purposes when attributes are edited, I could also simply use the publish_status column in cr_items. Advantage: no new tables needed. Disadvantage: Parties that haven't been made into cr_item-parties cannot be marked deleted, which means that contacts that are not cr_items would have to all be made into cr_items before deleting. Also, this solution, unlike adding a deleted_p/status column to acs_objects doesn't work for other objects in the future.

The reason I said deleted_p/status column is because I'm also wondering what is the better approach for that. A true/false deleted_p column is simple and fast for the database, but a status column with a check for various statuses is more flexible.

By the way, I did a few simple greps on a head packages checkout and the following packages (and may have missed some) all had some form of marking objects as deleted in them without actually removing them from the database: acs-content-repository, acs-kernel, acs-lang, cms, contacts, dotlrn, ecommerce, lars-blogger, news-aggregator, photo-album, poll, project-manager, weblogger-portlet, webmail. This appears to be something that would be generically useful...

Am I missing some options? Which method is preferable?

Matthew,

Before I respond, a quick question relating to your proposal number 5. Why are some parties contact details considered to be not suitable to be revision controlled? That is, why are some contact objects party_id=cr_item_id and some not?

R.

Parties that are created by packages other than contacts are not necessarily cr_items (all parties created by contacts are also cr_items). So, if a user creates an account and logs into the website they are an user object (which is a person and party object as well) but not they are not part of contacts cr_items until somebody uses the contacts interface to edit that users attributes. Same goes for organizations that are created via the organizations package (the new version of project manager uses contacts for organization management so people that use the HEAD version will have cr_itemed organization_ids).
Right. So that is the way we are going with this then - to cr_itemise parties, users and organisations.

Maybe the correct answer then is to file a TIP proposing to mod the user registration code to use the content respository. This would have the immediate benefit of keeping a log of old registration data and would avoid the situation where your data lives in or outside the CR depending upon which packages you install.

Since registration is a core function I would think that the CR support should go in to the core so that contacts and pm and other 'contrib' packages can use the existing model.

What do you think?

R.

I'm working in a new module with which you can move, copy objects from one community from another and "delete" objects but keeping them in the database. That means, i've just started to develop similar functionalities so we can put our ideas together.

By now, i have only written the specifications, so i do not have any code, but it will come soon.

I think it will be really helpful to have a general way to delete or mark deleted. I had this problem too, I went to the path of the deleting the data. Another issue will be search, we want search not to output items/objects that are marked deleted already.
I don't think making more object types subtypes of content revision is going to be the best solution here. "Expired" does not mean deleted and I don't think you should change that. In reality there are no built in semantics for what the publish status means. Another issue with Cr_items is NAME since you can't have duplicate names, if you mark something deleted you can't have another object with the same name. If you move the items to a "deleted items" folder you can't delete more than one item with the same name, so using the CR isn't solving your problem, its actually making it more complex.

I think to break this down and get back to basics there are two solutions

1) Application specific code to "inactivate" objects.

2) Generic acs_object code to inactivate objects.

Of course choosing #2 does not stop applications from extending the generic feature for a specific application.

So the question is, do we want to add a column to acs_objects or otherwise extend acs_objects.

Dave,

Yes really good points - I forgot about the unique constraints. In fact I had hoped to catch you on IRC before posting!

In point of fact I think that the plan would not have been to use expired_p for the reasons you stated but to create a deleted_p. However this idea really only takes care of new visitors who enter some registration details so still leaves the problem of a general solution open - and in fact would make the general solution harder to implement in the future.

The most elegant solution as far as I can see is in fact to add delepeted_p to acs_objects since this is the correct place semantically to store object meta-data.

If we are dealing with the addition of a single dichotomous value (i.e. a small integer column) I cannot imagine that this would be a big performance hit. Certainly it would be better than storing ams attributes for this purpose on a package specific basis.

The devil would probably be in the way that other applications request the information from the core. The field may need to be carried throught to another table or two in order to avoid joining to acs_objects.

One other issue would be how this would impact the permissions tables. Is there a cheeky way of using the permissions system to tell you that an object is deleted without visiting acs_objects (a magic permission for deleted objects) or do we take all steps to ensure that any change to acs_objects is designed to avoid touching the permissions tables at all?

R.

While I cringe at the prospect of another change to acs_objects, deleted_p does appear to be general enough to belong there.
Richard said "The devil would probably be in the way that other applications request the information from the core. The field may need to be carried throught to another table or two in order to avoid joining to acs_objects.

One other issue would be how this would impact the permissions tables. Is there a cheeky way of using the permissions system to tell you that an object is deleted without visiting acs_objects (a magic permission for deleted objects) or do we take all steps to ensure that any change to acs_objects is designed to avoid touching the permissions tables at all?"

There is nothing wrong with joining on the acs_objects table. Most list of objects type pages already join on acs_objects to get creation user, and last_modified at least. So adding an addition "and deleted_p = 't'" or whatever should not be a serious issue.

I also don't think you would need to do anything clever with the permissions system. If you had a certain privilege over and object marked deleted, you should be able to "undelete" it. I can't see having a column as opposed to just deleting the object, if you can't recover it somehow.