Forum OpenACS Development: Bug in .LRN/openacs group memberships

Collapse
Posted by Dave Bauer on

I have found a problem in .LRN (at least)
where when an administrator from the group is removed as administrator the rows in party_approved_member_map are not removed.

This may exist in OpenACS also. I have not tested yet. (Most likely it would not appear in OpenACS since it doesn't use rel-segments heavily by default).

If you run this query

select r.rel_id,m.segment_id,m.group_id,m.* from
(select m.*, s.*
from party_approved_member_map m,
rel_segments s
where s.segment_id = m.party_id
and s.rel_type='dotlrn_admin_rel') m
left join
acs_rels r
on (r.object_id_one = m.group_id and r.object_id_two = m.member_id)
where r.rel_id is null;

If you get any rows it shows there are rows in party approved member map for a group membership without a row in acs_rels for that membership relationship.

The result is a user is still part of the admin relational segment while not a member of the group.

Collapse
Posted by Dave Bauer on
I have reproduced on two .LRN installs.

The query returns 0 rows on openacs.org.

Collapse
Posted by Dave Bauer on
Oracle version of the query that shows this problem

select r.rel_id,m.segment_id,m.group_id,m.* from
(select m.*, s.*
from party_approved_member_map m,
rel_segments s
where s.segment_id = m.party_id
and s.rel_type='dotlrn_admin_rel') m,
acs_rels r
where r.object_id_one(+) = m.group_id
and r.object_id_two(+) = m.member_id
and r.rel_id is null;
Collapse
Posted by Tom Jackson on
But this query would never return rows in an OpenACS install, because it relies on the dotlrn_admin_rel, which I hope is not part of OpenACS. So the bug, whatever it is, could still be in the underlying code.

How are the admin rels added/deleted in this case, is it a special pl procedure for dot_lrn? Maybe post them if this might cause the problem.

Also, could you print out the lines showing the results, not just the count, I'm having trouble figuring out your query. How do you select an acs_rel with rel_id null? Is there some magic in this query?

Collapse
Posted by Dave Bauer on
Right you are Tom!

I built the query from the ground up to show the error. Thanks for catching that. I'll try it across all rel_types to see what happens.

Collapse
Posted by Dave Bauer on
Tom,

To print out the acs_rels with null rel_id, it does a left join. The point is, there is a row in party_approved_member_map but the user is not a member of the group according to acs_rels. The rel_id is null just filters out the ones that have a missing acs_rel.

I will post more when I look into how the rels are added and deleted and see if I can reproduce this.

Collapse
Posted by Dave Bauer on

Here is the function that is having the problem:

create or replace function party_approved_member__remove_one(integer, integer, \
integer) returns integer as '
declare
  p_party_id alias for $1;
  p_member_id alias for $2;
  p_rel_id alias for $3;
begin

  delete from party_approved_member_map
  where party_id = p_party_id
    and member_id = p_member_id
    and tag = p_rel_id;

  return 1;

end;' language 'plpgsql';

There are rows in party_approved_member_map where tag is 1 instead of the rel_id. I have not figured out where these came from yet.

Collapse
Posted by Gustaf Neumann on
Dave,

we just looked into the problem on our learn server. Although the original query (https://openacs.org/forums/message-view?message_id=1172094) returns a few thousand entries, we seem to have a different situation, since we have there a value for tag of 0 and not 1 as in your case. For us, all these entries have party_id = member_id and tag == 0. These entries come most probably from parties_in_tr(), which adds identity rows when parties are created. These parties are not users in our case. Since we delete only seldomly parties, these entries are there, they would be removed trough parties_del_tr(). So, I tend to believe that for our site, everything seems fine (but we have many differences from dotlrn, even in the data model).

Tom said

Offhand I would say that there is some confusion with tag.
.... the basic problem is that the field does not have a reference constraint ..... data model problem which was somehow abused by certain applications ...
In principle agreed. The name "tag" is bad, it is as well bad, that there is no fk. Others might know better than me (i am not a veteran from the ad days), but from looking at the code, i got the impression, the basic idea might have been, that "tag" is some arbitrary "client_data" (for maybe multiple purposes). In todays version it is used only for "0" or a rel_id.

Since the key of party_approved_member_map consists of (party_id, member_id, tag), tag can't be a NULL value, therefore a rel_id for the identity rows should be created (an identity rel) and added instead of the 0 when parties are added. Renaming tag to rel_id would help to avoid confusions. These are just my 2 cents.

This does most probably not help Dave with his original problem. Dave, do you have any idea, where the entries with tag=1 are added? Is there anything special about the party_ids and member_ids (are they equal, ancient, etc.)?

-gustaf

Collapse
Posted by Dave Bauer on
This query shows that pretty much every rel type can develop this problem.
select distinct object_type from acs_objects o, acs_rels r, party_approved_member_map m where m.tag = 1 and m.party_id=r.object_id_one
and m.member_id = r.object_id_two
and o.object_id = r.rel_id;

         object_type
------------------------------
 dotlrn_admin_profile_rel
 dotlrn_admin_rel
 dotlrn_ca_rel
 dotlrn_cadmin_rel
 dotlrn_external_profile_rel
 dotlrn_guest_rel
 dotlrn_instructor_rel
 dotlrn_member_rel
 dotlrn_non_guest_rel
 dotlrn_professor_profile_rel
 dotlrn_student_profile_rel
 dotlrn_student_rel
 dotlrn_ta_rel
 membership_rel
(14 rows)
Collapse
Posted by Dave Bauer on
I can't seem to (so far) reproduce a new entry into party_approved_member_map with tag==1, although if I check my database, I can see a new membersip_rel created today that has a tag of 1 in the party_approved_member_map.

Reviewing all the triggers they look ok to me. Everything passes around the actual rel_id from the membership_rel row the trigger runs on.

Collapse
Posted by Tom Jackson on
Offhand I would say that there is some confusion with tag. Why would a pl function take in something called a rel_id (p_rel_id), and equate that with a tag. In my mind, a tag is a string, but it must refer to something else, and the basic problem is that the field does not have a reference constraint. If it did, then only valid tags could be used, and then you could guarantee that the tag was unique. The pl function takes in a compound primary key, but the tag values are not enforced correctly to allow it to be part of a compound primary key.

So partly this looks like a data model problem which was somehow abused by certain applications. If the data model problem was removed, developers would have discovered that things didn't work very quickly, and would have fixed their code. But the API that was listed above is not necessarily in error. It might be the one correct piece in this, hard to say just yet.

I ran these queries with an old 5.0... version of OpenACS and didn't find any tags = 1, no object_id = 1. So I can't add anything to the meaning of the 1.

Collapse
Posted by Dave Bauer on
It looks like when the tag == 1 it is the identity row for the party. That is every party is a member of itself in the party approved member map.

That is how its "supposed" to work as far as I can tell.

I have many rows besides those also that have tag == 1. Again the problem is the trigger to delete from party_approved_member_map relies on the tag matching the rel_id that is deleted.

Collapse
Posted by Dave Bauer on
I can not find anything special. I checked and a new membership_rel had been created yesterday and the party_approved_member_map has tag == 1.

I have tried to reproduce the probem using the user interface to add/remove people in a community and change their role, but I haven't been able to reproduce it yet.

Collapse
Posted by Gustaf Neumann on
how comes that the tag is one? Maybe a missing update?

we have:

create or replace function parties_in_tr () returns opaque as '
begin

  insert into party_approved_member_map
    (party_id, member_id, tag)
  values
    (new.party_id, new.party_id, 0);

  return new;

end;' language 'plpgsql';

Collapse
Posted by Tom Jackson on
I have this tag in a very old version of the trigger, so I would guess that there must be another source for the 1. This particular attribute seems ripe for a bug. First, tag is part of a primary key, second it is never used by that name in the functions which use it. But the fact that it is a primary key should be a big flag that this needs to be an fk from (I guess) acs_rels. Then from the rel_id you could figure out the rel_type, knowing the rel_type, you can figure out a lot of other things about the approved member.

The more you look at this code, the more questions arise. What is the purpose of the trigger here? the party is a member of itself. Is that why the rel_id is bogus? Maybe there needs to be an identity_rel. party = member, or we are me?

But I still haven't seen any obvious bug, confusion, but it looks more like a mismatch between new and delete.

Collapse
Posted by Gustaf Neumann on
Tom wrote
But the fact that it is a primary key should be a big flag that this needs to be an fk from (I guess) acs_rels
Neither 0 or 1 are rel_ids (in acs_rels). So, if there would be a FK for tag, then there should be a different value chosen for the tag in case of the identity (the hypthical "identity-rel" mentioned above, which might be another magic negative value).

The purpose of the trigger is to establish the identity relation (party is approved member of party), which is there for achieving transitive closure (my interpretation, only interesting for transitve memberships).

However, this just source for confusion, not the solution for dave's problem, since we still don't know where the "1"s are from. Are there other dotlrn sites, with

select count(*) from party_approved_member_map where tag = 1;
or is dave's site the only one?
Collapse
Posted by Tom Jackson on

Okay, here is an idea. Given the imaginary identity relationship which causes the row to be inserted into party_approved_member_map, I don't think that the original statement of the problem matches up this. The party_in_tr is inserting the identity, who knows why it is there, since it could be assumed that every party is that party. But the only way to remove it is to delete the party. The party_del_tr deletes all rows in party_approved_member_map regardless of tag. But the relational segment triggers only delete the rows which have a actual acs_rels.rel_id. So the imaginary tag prevents the removal of this identity row. Obviously this should be necessary. This is the trigger statement that leads me to this idea:

delete from party_approved_member_map
  where party_id = old.segment_id
    and member_id in (select element_id
                      from group_element_index
                      where group_id = old.group_id
                        and rel_type = old.rel_type);

Anyway, in this case, the row is added by a trigger on parties, but 'not removed' by a trigger on rel_segments. This seems correct. But if someone uses a tag which corresponds to a real acs_rel, the identity row may be removed (maybe, not sure, maybe the tag should be changed to the party_id, since we know that id will never be an acs_rels.rel_id, or is there a guarantee that 0 will never be an object_id?)

So my question is: the data exists, but what problem is being caused by it? We never got any actual data rows, so I'm still not sure what would need to be fixed.

On a side note, Dave...the hardest bug to find is one which doesn't exist. But these types of bugs are easier to fix if you consider the possibility! Let's reconsider the actual issue.

Collapse
Posted by Dave Bauer on
I checked some more dotlrn installs and only one has tag == 1, so far.

It looks like there should be rows in party_approved_member_map without a corresponding membership_rel. The number should equal the number of parties. If it does not then that is where the problem is occuring.

Collapse
Posted by Dave Bauer on
Here is what I have learned.

On this one install

There are rows in
party_approved_member_map
which are not identity rows
which do not correspond to any group or relational segement.

That is. A user is a member of the dotlrn_admin_rel relational segment (according to the party_approved_member_map) without being a member of the community.

what happened was

1) user was an administrator of community, but for unknown reason the row in party_approved_member_map has an invalid tag of 1.
2) The user was later removed from the community, and the membership_rel was removed. The row in party_approved_member_map was not removed because the plsql procedure did not remove the row because the tag did not match the rel_id that was deleted.

Collapse
Posted by Tom Jackson on
So what does it mean for this row to exist? And what is the row data (rel_segment_id, user_id, 1) or (group_id, user_id, 1)? That might help track down where it gets in and what it is there for, but if the tag is not a membership_rel, I'm not sure what the effect would be unless some code ignores the tag to grant some permission.
Collapse
Posted by Dave Bauer on
The offending rows are (rel_segment_id,user_id,1)

The actual problem is that the user appears as a member of the relational segment.

This affects permissions as the dotlrn_admin_rel relational segment have admin over the community.

Collapse
Posted by Tom Jackson on
What kind of member? The downstream functions should distinguish this based upon the tag. Othewise, why would the tag exist? I think it exists because there can be more than one type of membership, or whatever this means. But there must have been another row with the same rel_segment_id, user_id, but with a real rel_id, no? Strange stuff.

If I'm just confusing the situation, ignore my comments.

Collapse
Posted by Dave Bauer on
Tom, no problem, the questions help me think.

No there was not any additional row.

Each dotlrn user has one acs_rel, either a dotlrn_student_rel or dotlrn_admin_rel for each community membership.

A community is a group. This means the user gets one row in acs_rels regarding their membership in the group.

The membership in the group triggers an entry in party_approved_member_map with (group_id,user_id,rel_id) and the trigger also loops through all the relevenat relational_segments. So a user also gets a row with (rel_segment_id,user_id,1). At least that is what happened in the past. Now the row is (rel_segmetn_id,user_id,rel_id).

There isn't a problem since the rows are unique.

The membership_rels_in_tr function (insert trigger)
calls party_approved_member_add function which calls
party_approved_member_add_one
for the party in the acs_rel.object_id_one
and loops through the relational segments and calls party_approved_member_add_one for each relational segment.

Its only the relational segment rows that have the invalid tag.

I looks like this is legacy data from an old bug and it just needs to be cleaned up. I can't produce a new acs_rel with this problem.