Forum OpenACS Development: The User Information Elephant

Posted by Caroline Meeks on 09/19/03 07:00 PM

I've been gathering data on the current development of dotLRN and I want to heighten awareness in the community of this area.

It looks to me like there are currently 3 to 4 active projects on User Data; I see each of them as working on a different part of the elephant.

The good news is there is some awesome functionality being worked on that will put us way in front of the open source pack in this area. The caution is, I don't think the developers have seen much of each other's work, and this stuff all eventually needs to work together.

Here is a brief summary:

S&R/AIESEC: Profiles that vary by user type.
Furfly/Sloan: Fine-grained permission control on user data. New functionality in bulk upload.
Collaboraid/Heidelberg: Use of IMS Enterprise Specs for user data.
Jon Griffin/LA Schools: HR-XML based contact info.

I think Jon's code is the only one released at this time. I think the others are all at pretty much the same stage and just need a little more cleanup and generalization before release. This is not a flame suggesting that anyone has done anything bad. This is all great work, and I think the relationships among these projects are probably clear to the main developers at Furfly, Collaboraid and S&R. I just want to be sure that the funders are also aware of them and consider cross-pollinating and integrating existing work for their next projects in this area so we can avoid forking.

This is also a heads up to anyone else out there quietly working on user data management and display, whether for an educational client or not. There is lots of useful code out there; please talk about what you are trying to do. We need to integrate this work and bring it into the core of both dotLRN and OACS.

This is an opening for discussion, not a conclusion. I invite everyone who is working on this area to expand and correct what I have said.

2: Re: The User Information Elephant (response to 1)

Posted by Lars Pind on 09/20/03 01:16 PM

Caroline,

Thanks for bringing this to the table.

There's definitely a need to integrate all of this stuff.

What we've done for external authentication is to provide a service contract-based mechanism for populating the standard OpenACS user profile data, namely first names, last name, email, screen name, home page, and bio. We plan to add portrait.

We've also added an API for updating this information in a safe manner with a complete transaction log of everything done. If data are invalid, we do not crash, but instead log an error in the transaction log saying what was wrong, so an admin can resolve it.

This API makes it very simple to write implementations that parse user data in any given format, and get it into the database. The driver for the IMS Enterprise format, for example, only takes 50 lines of code.
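To illustrate the pattern described above (parse records from any source, then apply them through one update path that logs problems instead of crashing), here is a minimal sketch in Python. It is not the OpenACS API, which is written in Tcl; the names `SyncLog`, `validate`, and `sync_users`, the field list, and the in-memory `accounts` dictionary are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical field names, loosely following the profile data listed above.
REQUIRED_FIELDS = ("first_names", "last_name", "email")


@dataclass
class SyncLog:
    """Transaction log: one entry per record processed, success or failure."""
    entries: list = field(default_factory=list)

    def record(self, username, action, ok, message=""):
        self.entries.append(
            {"username": username, "action": action, "ok": ok, "message": message}
        )

    def errors(self):
        """Entries an admin would need to look at and resolve."""
        return [e for e in self.entries if not e["ok"]]


def validate(record):
    """Return a list of problems; an empty list means the record is usable."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    email = record.get("email") or ""
    if email and "@" not in email:
        problems.append(f"invalid email: {email!r}")
    return problems


def sync_users(records, accounts, log):
    """Create or update accounts from parsed records, logging instead of raising."""
    for record in records:
        username = record.get("username")
        problems = validate(record)
        if not username:
            problems.insert(0, "missing username")
        if problems:
            # Invalid data is skipped and written to the log, never raised.
            log.record(username or "<unknown>", "skip", False, "; ".join(problems))
            continue
        action = "update" if username in accounts else "insert"
        accounts[username] = {name: record[name] for name in REQUIRED_FIELDS}
        log.record(username, action, True)
    return log
```

Under these assumptions, a driver for a given format only has to turn its input into a list of dictionaries and hand them to `sync_users`; validation and logging live in one place, which is what would keep each driver short.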

Existing bulk user upload functionality could probably benefit from using this API.

We haven't tried to tackle the issue of additional attributes that are not part of the hard-coded default OpenACS set of attributes.

But we would love to work with others to resolve this after the release.

To see our code, look at packages/acs-authentication/tcl/sync-procs.tcl.

This invokes some additional API for creating and updating local accounts in packages/acs-authentication/tcl/authentication-procs.tcl.

/Lars

3: Re: The User Information Elephant (response to 1)

Posted by Caroline Meeks on 12/13/03 07:14 PM

Here is an update based on looking at Photobook and Complex Survey.

Photobook is a useful tool for managing various user data, displaying it and providing complex permissioning on who can view what. It was recently implemented at Sloan and I did an initial port to postgres.

Complex survey has cool functionality that lets you define questions where the answers are stored outside the survey package, like in the users table. This allows you to slowly gather demographic information about your users in the context of other surveys.

So Photobook's strengths are display and permissioning.
Complex Survey's strengths are the ability to collect information at different times in the context of a survey and to use the information already gathered in future surveys.

Based on reading the survey docs it looks pretty easy to use them together by defining complex survey questions that refer to the photobook views. This is good news for anyone who wants to build sophisticated user information collection and display right now.

Please note that this is programmer-only territory: you can't customize Photobook from the UI yet, and its current configuration is very Sloan-specific. Also, Photobook is new; it uses the CR and CMS automatic form building and is generally not for the faint of heart.

I haven't used complex survey yet so no comments on how easy/hard that is. The docs make it look pretty easy.

So that's the good news. The bad news is that this is not where we want to be long term. Yes, you can recreate a Photobook question in Complex Survey, but if it's a select list question you will end up repeating all the data in two unconnected places in the database. That is not good for maintenance of your select lists, which is one of the strengths of Photobook.
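As a rough picture of what avoiding that duplication could look like, here is a small Python sketch in which a profile field and a survey question both point at one shared option list, and the survey answer is written straight into the user's profile. This is only an illustration under assumed names (`SelectList`, `ProfileField`, `SurveyQuestion`, `answer`); it does not describe how Photobook or Complex Survey are actually implemented.

```python
from dataclasses import dataclass


@dataclass
class SelectList:
    """A single, shared list of allowed options (hypothetical)."""
    name: str
    options: list


@dataclass
class ProfileField:
    """A profile attribute displayed on a user's page."""
    attribute: str
    choices: SelectList


@dataclass
class SurveyQuestion:
    """A survey question whose answer is stored in the user's profile."""
    text: str
    profile_field: ProfileField


def answer(question, profiles, user, value):
    """Validate against the shared list, then store the answer in the profile."""
    field = question.profile_field
    if value not in field.choices.options:
        raise ValueError(f"{value!r} is not an option in {field.choices.name}")
    profiles.setdefault(user, {})[field.attribute] = value


# One option list, referenced from both the profile side and the survey side.
countries = SelectList("countries", ["Denmark", "Germany", "USA"])
country_field = ProfileField("country", countries)
country_question = SurveyQuestion("Where are you based?", country_field)

profiles = {}
answer(country_question, profiles, "caroline", "USA")

# Maintaining the list in one place updates both consumers.
countries.options.append("Spain")
answer(country_question, profiles, "lars", "Spain")
```

The point of the sketch is only that the option list exists once; with two unconnected copies, the `append` above would have to be repeated by hand in both places.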

Photobook (or what will evolve from it, which probably should be called Profile) needs easier mechanisms for creating questions and determining who should be asked which questions and when. That is what Complex Survey already does well.

Survey itself is undergoing a major rewrite and will probably move to greater use of the CR, which should move it closer to Photobook.

Profile/Photobook programmers and Survey programmers, please take a look at each other's work, and let's set a long-term goal of moving these packages closer together.

Thanks

4: Re: The User Information Elephant (response to 1)

Posted by Janine Ohmer on 12/22/03 05:27 PM

I think the first question to ask, before talking about how we can move these two packages closer together (I assume the end goal would be to merge them), is whether it is desirable to do so.

The issue really has nothing to do with photobook or complex survey themselves. I think a case could be made for combining them, and a case could be made for keeping them as they are. As I see it the question is more general. Both photobook and Malte's work with complex survey were done on spec for clients, and I believe that when a client pays for a package to be built, they should have a reasonable expectation that the package will remain one that meets their needs after they contribute it. That is, others will hopefully contribute bug fixes and small enhancements, but not make large structural changes.

I am not saying that Sloan and Malte's client would not be happy with a combined package - we don't know how it would turn out. But the potential exists for the needs of the community to clash with the needs of a funder, and then what are their options? They can refuse to allow their package (of which they are the maintainers) to be modified, possibly creating hard feelings and/or forcing yet another package to be created, or they can use the modified package even though they don't like it, or they can stick with their version and lose the benefit of the community's bugfixing efforts. The first is probably the most palatable option to many funders who find themselves in this situation, but not necessarily the best for the community. Of course, yet another option is for them to not contribute their work at all, and I'm afraid that we may encourage them to make this choice if we don't do something to protect the rights of funders, even though that's not the pure Open Source way.

I don't know what the answer is, but I believe this is an issue that will come up eventually, as more and more work is done on spec and then contributed. I believe it's one we should discuss and come up with guidelines for, so that funders know what their rights and risks are in this development model.

5: Re: The User Information Elephant (response to 1)

Posted by Dave Bauer on 12/22/03 05:42 PM

One way to mitigate the problems when client work is contributed is to work in as transparent a process as possible. Of course, this is not always possible, and many client packages will be so specific as to be of no use to others except academically. This is one of the reasons for my position that content management be split between the Tcl APIs and the user interface. Client requirements for content management interfaces are very specific, but the underlying services are very similar. Photobook seems to be this way; it was built to fulfill a very specific need at Sloan. On the other hand, the continuing improvement to the survey work by Malte seems to fit in with a more generalized approach.

A transparent process includes soliciting feedback on the design of a package, involving community members working on similar projects, working from a publicly readable code repository, etc.

6: Re: The User Information Elephant (response to 1)

Posted by Dave Bauer on 12/22/03 05:47 PM

After hitting confirm, I realized I wasn't clear on what "contributed" meant. There is a difference between code that is contributed to the community and code meant to be part of the "official" OpenACS packages.

Any contribution of code is greatly appreciated, and any client who allows work done for them to be added to contrib should know this. Whoever contributes the code should maintain it in whatever way they need to.

Code that someday will make it into the main OpenACS distribution has different standards.

7: Re: The User Information Elephant (response to 1)

Posted by Caroline Meeks on 12/23/03 04:37 AM

Hi Janine,

I think you have a really important point. Since the code is GPL you have no means to coerce anyone to do or not to do anything. However, Sloan, AIESEC and other such institutions are banding together to form the dotLRN consortium so that they can address exactly these issues.

The processes are still being worked out; in fact I think it is discussions like this that help develop the processes. Here is how I imagine it working for functionality like user demographic information management.

Ideally, an open spec, design and collaborative development process leads to code that is as general as possible. However, we all know, as in Pind's Rule of 5, that a totally general solution is not always possible or desirable the first few times you solve a problem. Often we will end up with situations like the one we are in, where different consortium members have implemented different solutions.

What should the consortium members do when this happens? That will be for them to decide but I hope they open their code, use cases and needs analysis and work together towards creating a solution that meets all their needs and hopefully the needs of other current and future users.

At some point general functionality like managing user data needs to get into the dotLRN core. Through participation in the consortium I expect that Sloan and AIESEC will find that that core functionality meets their requirements and that there is a reasonable upgrade path from their current code.

I think that user consortiums can vastly improve the chances that future code will be compatible with the requirements of the clients who paid for the initial work, but I want to also examine what happens if a client isn't a member of any consortium. If you create functionality that has general usefulness (like managing user demographic information) and you don't release it then someone else will write code with similar but likely incompatible functionality. The chances of this code being useful to you are pretty low (say 5 or 10%).

If you release your code to contrib and people start building on it the chances that someone will do something that will be helpful to you in the future go up dramatically. Maybe they find a bug, improve a query, or add a new feature. When and if you want/need their improvements you can go look at what they have done. The chances of getting something useful have risen to maybe 50%.

Now, in the non-consortium member case, let's imagine that at some point in the future there is an official version of the functionality they had to custom code. There are no guarantees on the features meeting their requirements, but the odds are sure a lot better if it's based on their original code! No one is going to write their upgrade scripts for them, but those scripts are likely to be a lot simpler and cheaper if the official version is based on their original code.

8: Re: The User Information Elephant (response to 1)

Posted by Dave Bauer on 12/23/03 02:45 PM

Caroline,

Thank you very much! Your explanation was much better than mine.

9: Re: The User Information Elephant (response to 1)

Posted by Janine Ohmer on 12/23/03 03:49 PM

This all makes perfect sense as long as we're talking about adopters who are doing this because they believe in the benefits of Open Source. Though let's not forget that the core is actually OpenACS, not dotLRN, and the Consortium has limited influence there.

However, it is my understanding that we are trying to take dotLRN out into the world at large and convince institutions to use it who might have otherwise purchased a commercial package. These folks don't care a bit about collaborative development; in fact they will often regard it with suspicion. The only reason for them to choose dotLRN is that they can make it work *exactly* the way they want it to, and they can put the money they would have spent on a commercial software license into building their own modules instead.

These folks are not going to be interested in following an open development model. They may be willing to contribute the finished code, and I believe it would be valuable for the community as a whole for them to do so, but often they will only do it if they feel they are going to retain some influence over it.

To take this point even further, I believe it will discourage adoption of dotLRN in these institutions if there is an implication that to be good community citizens they *have* to develop collaboratively.

As I said before, I don't know what the final answer is. I just find myself being concerned about the direction things are going, based on my experiences with clients and potential clients who had these concerns, and I wanted to mention them. It is, as you said, ultimately up to the Consortium to decide how these issues should be handled.