Forum OpenACS Development: username is not unique on users table. Should we change it?

Hi everybody,

I'm wondering how it's possible that username has no unique index on users table. Is there a reason for that? Even if we have more than one authority, the user is registered with the authority where he came from. So, I don't see any reason so it can't be unique.

I' thinking about add an unique index to users table on username. Any comments on that?

Even if the users are in different authorities, the username, along with several other fields (including authority_id) are in the users table.

If you make the field unique, then you will almost certainly break more things than you fix. If the username was forced to be unique, then we couldn't store usernames in that table for external authorities. If we didn't have that, where would we get the username for a user_id if we needed it? Not to mention all the views that have it as well, such as cc_users.

Create a unique (authority_id,username) index, I am surprised that doesn't already exist. Actually I am pretty sure there is already a unique constraint on that. Please check. It can be duplicated across authorities so that is why it can't be unique for the entire users table.
Hi Dave and Nathan,

Thank you for your replies. In my OpenACS version (5.5) there's an unique index for authority_id or username. However, I can't see any reason for this index to exist, and I'll explain you my reasons.

My understanding from the datamodel was the same as proposed by you Dave: unique index on authority_id and username. However, I've tested a lot of situations and got to the following cases:

1 - User is on external authority and it's not on local authority

On this case, a new user is created on local authority, but the users table shows that the user came from the external authority. This means: there's an entry on the users table (of course, the user has to be created in order to login) that points to the external authority.

We could use an unique index for username here, because the table indicates a single entry in the users table.

2 - User is in local authority but it's not on external authority

The system usually doesn't deal with this situation, because if you choose to login on external authority it'll give you a message saying you don't have an account on external authority. If you choose local authority, the login will happen normally.

However it doesn't seem good to me the fact that user can choose authority. In a regular navigation, users don't care wich domain they're logging in; they just want to access the system. I'm changing acs-authentication (I actually did it already) so you can login only in first authority from the list. If the authority is external and user is not found there, check if the user is local. If it's local, add it to external authority. I've tested it with LDAP and it seems to be working fine. I can provide a patch if more people are interested.

On this case, username can be unique too. The authority_id is going to point to local authority and user will have accounts on all authorities.

3 - User doesn't exist on any authority

On this case, user is added to both authorities, and users table will hold local authority for authority_id.

As you can see, I couldn't find any case where the username is duplicated on users table, but I did found a bug. If it's not a bug, it's a strange system behavior.

Let's suppose you call acs_user::get_by_username to an user wich was created by an external authority. If you provide local authority as a parameter, you will not find it, because it's registered on external authority. It doesn't make any sense, because if an user is registered on the system and have an account on OpenACS, why would we not find it with username? The point is: there's no reason to look by username in an specific authority, because sometimes you will not find the user even if it exists. It seems like a false negative for me.

The idea to add an unique index on username for users table is to guarantee that everytime you look for an specific username you will find it, even if the user is registered in a different authority.

Sorry for the long post, but it seems like a big change, so I would like to hear everybody about it. What do you think?

You have missed the case where we have

two different users, one on the external authority and one on the local authority, both with the same username.

Now, in practice mnany installations only use the external authority so in that case you wouldn't have local users at all.

Imagine this case:

External auth from sources such as Facebook, Twitter, OpenID, Google Apps etc. These sources could contain duplicate usernames that are not associated with the same user. This is the sort of case the data model was designed for.

Hi Dave,

I got your point. It seems like a good design on that case. However, let's think about the OpenACS system: the users can authenticate on external authorities as you said. But should they have a single user_id? I can have, as user, a lot of external accounts, such as Facebook, Google Apps, etc, and it's still myself, isn't it? I can't have, for example, two different names (my personal name first_names and last_name) for different authorities. If somebody looks for my data in OpenACS system, it should not be duplicated don't you think?

Maybe the authorities design is ok, but I don't know if it should go to the users table. If it goes to the users table, we should eliminate the foreign key for persons and create a new entity of system users with different login options.

From my understading of OpenACS design (forgive me and fix me if I'm wrong) the users table is for everybody that have a system login. If the same system user can login using a lot of services, this is user can't be identified as an user_id, right? It should be more a party_id or something like this. Imagine the situation where you want to know some user email. The same user could have to entries on users table, because it could have two login ways (LDAP and OpenACS, let's say) but it would be still the same person.

My proposal, and I know it's a huge job, is to separate it. I can see two different ways of doing it: remove foreign key from users table (users_user_id_fk) and identify user by person_id; or remove the fields password, salt, password_question, password_answer, password_changed_date, auth_token, authority_id and username to a new table. This table would have a foreign key for users table (user_id) and there could be duplicated entries for user_id.

Yes, it's a huge job, but it seems like the right way to do it. What do you think?

Good question.

I don't have a good answer. I think if you wanted to write a good justifcation for this kind of change, and applications of it, we could work out the best design.

If you look at the existing Facebook package it keeps track of the facebook connection to the local account in a seperate table, which is sort of what you are looking for but not optimized for reuse.

Hi Dave,

I'll write a propposal and let you guys know when it's ready then.

P.S.: I didn't know about facebook package. Thanks for the tip.

I am actually working on a NEW Facebook Connect based code that is very experimental so far. I am using it on my personal site right now, but haven't had time to work on it lately.
Hey Dave,

I was reading the docs and it seems to mee like adding authority_id was an error considering the initial design. Take a look: http://www.openacs.org/doc/parties.html

"The users table is a more specialized form of persons table. A row in users table represents an individual that has login access to the system. The primary key of the users table references the primary key of the persons table. This guarantees that if there is a row in users table then there must be a corresponding row in persons and parties tables."

So I guess the solution is to create a new table to map external authorities and usernames. I've also realized that there's a seding documento for external authorities authentication (http://www.openacs.org/doc/ext-auth-requirements.html), so we have to keep the requirements. I'm already working on something like this and I'll share with you when it's ready.

Actually this is overly complex and I have added to the confusion.

There is a difference between say, Facebook or twitter account linking, and logging in.

You need to have only one login authority per user. The other type of linking is extra and not related to login.

user_id refers to a local user in OpenACS. The external authority_id just says which authority to login with. There is ever only ONE username/password combination that will login a particular user_id.

I worked on this specification and contracted with Collaboraid on behalf of .LRN users years ago to get this done. I also used it without issue for years in Heidelberg.

Now we are using it in Boston and authenticate against different external authorities with tens of thousands of users on each one.

I don't really have the time to follow this in detail right now, but please consider our use case when thinking about this: user name patterns on our different authorities are very similar (a combination of 2-3 letters and 2-3 numbers) and they certainly could potentially clash (we assume there is a user with the username ab12 in source1 and a different user with username ab12 on source 2).

This isn't uncommon (most system usernames are the first initial of the first name and full last name so you should assume that every authority in the USA has a unique jsmith, every authority in Spain has a unique hgarcia and every authority in Germany has a unique mbauer).

Hi Carl,

Thank you for the reply. I lost the last week digging into this problem, and I could see that your case is very difficult to solve. Looking at the specifications, they look very good. However, in my point of view (wich could possibly be wrong), they've introduced a logical error in datamodel. the users table should represent an user in OpenACS that has login access to OpenACS, and by user a mean one person. With this authority change, there can be more than one user_id referring to the same user.

I could not find any easy way to handle this problem, and I don't know if it's worthed. I guess the community majority doesn't actually see this as a real problem. However, I guess we should think about it, because there are a lot of login paths we can use in modern social networks. A small list could contain facebook, twitter, Open Social and many others, where different usernames represent the same user (I mena, one real person). If it's the best interest for everybody, I can volunteer to propose a solution that won't break everybody else's system. It's hard and it's also going to take a while, but I guess it will be worthed.

In the meantime, I added a switch to acs_user::get_by_username that allows you to search for a single username in any authority, so it will return the user_id for that username without taking authority in consideration. The following patch should do the job:


Index: packages/acs-tcl/tcl/community-core-procs.tcl
===================================================================
--- packages/acs-tcl/tcl/community-core-procs.tcl (revision 1329)
+++ packages/acs-tcl/tcl/community-core-procs.tcl (revision 1455)
@@ -394,9 +394,10 @@
{-authority_id ""}
{-username:required}
+ {-cascade:boolean}
} {
Returns user_id from authority and username. Returns the empty string if no user found.

@param authority_id The authority. Defaults to local authority.
-
+ @param username The username of the user you're trying to find.
@param username The username of the user you're trying to find.

@@ -407,9 +408,17 @@
set authority_id [auth::authority::local]
}
-
- set user_id [util_memoize [list acs_user::get_by_username_not_cached -authority_id $authority_id -username $username]]
- if {$user_id eq ""} {
- util_memoize_flush [list acs_user::get_by_username_not_cached -authority_id $authority_id -username $username]
- }
+
+ if {$cascade_p} {
+ set user_id [util_memoize [list acs_user::get_by_username_not_cached -authority_id $authority_id -username $username -cascade]]
+ if {$user_id eq ""} {
+ util_memoize_flush [list acs_user::get_by_username_not_cached -authority_id $authority_id -username $username -cascade]
+ }
+ } else {
+ set user_id [util_memoize [list acs_user::get_by_username_not_cached -authority_id $authority_id -username $username]]
+ if {$user_id eq ""} {
+ util_memoize_flush [list acs_user::get_by_username_not_cached -authority_id $authority_id -username $username]
+ }
+ }
+
return $user_id
}
@@ -418,14 +427,19 @@
{-authority_id:required}
{-username:required}
+ {-cascade:boolean}
} {
Returns user_id from authority and username. Returns the empty string if no user found.

@param authority_id The authority. Defaults to local authority.
-
@param username The username of the user you're trying to find.
+ @param cascade if cascade is set, look for username on any authority

@return user_id of the user, or the empty string if no user found.
} {
- return [db_string user_id_from_username {} -default {}]
+ if {$cascade_p} {
+ return [db_string user_id_from_username_cascade {} -default {}]
+ } else {
+ return [db_string user_id_from_username {} -default {}]
+ }
}

Index: packages/acs-tcl/tcl/community-core-procs.xql
===================================================================
--- packages/acs-tcl/tcl/community-core-procs.xql (revision 1329)
+++ packages/acs-tcl/tcl/community-core-procs.xql (revision 1455)
@@ -149,4 +149,15 @@
< /fullquery>

+< fullquery name="acs_user::get_by_username_not_cached.user_id_from_username_cascade">
+ < querytext>
+
+ select user_id
+ from users
+ where lower(username) = lower(:username)
+ limit 1
+
+ </ querytext>
+< /fullquery>
+
< fullquery name="acs_user::registered_user_p.registered_user_p">
< querytext>

Please, let me know your thoughts about it.

If you sign people up more than once, with multiple authorities, they will have multiple local accounts, but the autority data model and authentication code have no knowledge and could have no knowledge of how these users are mapped except by the username,authority_id connection.

I think what you are proposed is seperate from what the current authentication package was designed to do. It is not a flaw or a bug, it just was never designed to do that. Anything that changes that needs to make sure it doesn't change any other behavior or break any existing use cases.