Forum OpenACS Development: Category Package

Posted by Jun Yamog on
Hi,

I understand that Timo will be committing a Category Package.  Is this the package that will be used for categorizing objects site-wide?  Some other developers did something similar, in particular one of Rafael's students.  How do the other efforts fit into the equation?

So is Timo's package the one recommended for use?  What will be the next release of OACS: 4.6.3 or 4.7?  I may need to use the category package for a production site, so I would like to know whether I should wait for 4.6.3 or backport Timo's package to 4.6.2.  Thanks.

2: Re: Category Package (response to 1)
Posted by Peter Marklund on
Jun,
yes, Timo is developing a site-wide categorization package. I'm hoping it will be committed soon, realistically (because of the Easter holidays) next week. The package allows for the creation of various category trees that can be used to categorize any acs object.

I don't see any particular reason why the package couldn't be used on 4.6.2. Timo can probably best answer this question though.
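
To make the idea of "category trees that can categorize any acs object" concrete, here is a minimal in-memory sketch. It is not Timo's data model: the class names `Category`, `CategoryTree` and `CategoryMap`, and all ids, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    category_id: int
    name: str
    parent_id: Optional[int] = None  # None marks a root category of its tree


@dataclass
class CategoryTree:
    tree_id: int
    name: str
    categories: dict = field(default_factory=dict)

    def add_category(self, category_id, name, parent_id=None):
        if parent_id is not None and parent_id not in self.categories:
            raise KeyError(f"unknown parent category {parent_id}")
        self.categories[category_id] = Category(category_id, name, parent_id)

    def path(self, category_id):
        """Return category names from the root down to category_id."""
        names = []
        current = self.categories[category_id]
        while current is not None:
            names.append(current.name)
            current = self.categories.get(current.parent_id)
        return list(reversed(names))


class CategoryMap:
    """Many-to-many mapping between object ids and category ids (hypothetical)."""

    def __init__(self):
        self._by_object = defaultdict(set)
        self._by_category = defaultdict(set)

    def categorize(self, object_id, category_id):
        self._by_object[object_id].add(category_id)
        self._by_category[category_id].add(object_id)

    def categories_of(self, object_id):
        return sorted(self._by_object.get(object_id, ()))

    def objects_in(self, category_id):
        return sorted(self._by_category.get(category_id, ()))
```

Because the mapping is keyed only by an object id, nothing in it depends on what kind of object is being categorized, which is the point of doing this at the acs object level.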

3: Re: Category Package (response to 1)
Posted by Timo Hentschel on
Jun, I just uploaded the new category package. You can find it here.
4: Re: Category Package (response to 1)
Posted by Jun Yamog on
Thanks Timo,

What is the target OACS version: 4.6.2 or 4.7? If it is 4.7 and I need to use it, will I have to test it and backport it to 4.6.2 as needed?

5: Re: Category Package (response to 4)
Posted by Timo Hentschel on
Afaik it will be committed to the cvs head and thus will make it into 4.7. It uses several things that you will need to backport to 4.6.2:
Internationalization
noquote-patch
service-contract to figure out the local url from an object_id
vuh-page providing stable url for all objects based on their object_id
named objects

Most of those issues still need to be agreed upon (the last three items), but I hope that in one form or another we will have those things.

To sum it up: yes, you will have to do some backporting, and even now the package does not completely fit into the current core, since it depends on extensions that have not yet been agreed upon.
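
As a rough illustration of two items from the list above (a contract for finding an object's local url from its object_id, and a stable url for every object), here is a minimal sketch. Everything in it is an assumption: the `/o/<object_id>` url scheme, the registry, and the function names are hypothetical and are not the interfaces proposed for the core.

```python
from typing import Callable, Dict

# Hypothetical registry: object type -> function that builds a local URL.
_resolvers: Dict[str, Callable[[int], str]] = {}

# Hypothetical stand-in for a lookup of an object's type by object_id.
_object_types: Dict[int, str] = {}


def register_resolver(object_type, resolver):
    """Each package registers how to build URLs for its own object type."""
    _resolvers[object_type] = resolver


def register_object(object_id, object_type):
    _object_types[object_id] = object_type


def stable_url(object_id):
    """A URL that stays the same for an object wherever it is mounted."""
    return f"/o/{object_id}"


def resolve(object_id):
    """Map an object_id to the local URL provided by its owning package."""
    object_type = _object_types[object_id]
    return _resolvers[object_type](object_id)
```

A page serving `stable_url(object_id)` would then redirect to `resolve(object_id)`, so that links stored alongside a category keep working when a package is remounted.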

6: Re: Category Package (response to 1)
Posted by Jun Yamog on
Hi Timo,

Thanks for the feedback.  From a quick glance at the code, it seems it currently only supports Oracle?  Also, I am not sure it will be possible to backport internationalization.  I think that is a huge task.  Maybe I will change the data model to fit our needs and remove any internationalization.  If I need this package for a 4.6.2 project, would you advise just removing the internationalization?

As for the others, I think those can run on 4.6.2 with some changes.

7: Re: Category Package (response to 6)
Posted by Timo Hentschel on
I just fixed some bugs in the after_install and before_uninstall callbacks. New version is here.

And yes, it currently only supports Oracle since I don't have a single clue about Postgres. But I hope that the Postgres port will be done soon.

Regarding removing internationalization: yes, you could remove the code, but it would probably be easier to add an ad_lang table, replace every [ad_conn locale] with some default value, and remove the possibility of changing the locale in cadmin/master. That way you would stay compatible with the 4.7 version.
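
The suggestion above (keep the locale-aware lookups but pin the locale to one default) can be sketched as follows. This is only an illustration in Python, not OpenACS code: `DEFAULT_LOCALE`, `MESSAGES`, `current_locale` and `lookup` are hypothetical names, and the sample messages are made up.

```python
DEFAULT_LOCALE = "en_US"  # assumed default; use whatever the site needs

# Hypothetical translation store keyed by (locale, message key).
MESSAGES = {
    ("en_US", "category.name.10"): "Software",
    ("de_DE", "category.name.10"): "Programme",
}


def current_locale(request_locale=None):
    """Stand-in for [ad_conn locale]: ignore the request, return the default."""
    return DEFAULT_LOCALE


def lookup(key, locale=None):
    """Look up a message; fall back to the key itself if nothing is stored."""
    locale = locale or current_locale()
    return MESSAGES.get((locale, key), key)
```

The calling code still passes through a locale-aware lookup, so switching back to a real per-request locale later only means changing `current_locale`.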

8: Re: Category Package (response to 1)
Posted by Jun Yamog on
Thanks Timo,

I will try your suggestion; it should be the better option considering how little I know about how internationalization works.  Thanks for the advice.

9: Re: Category Package (response to 8)
Posted by Hazi Gharagozlou on
Timo,

I am doing a complete port of OpenACS to Oracle 9i. In our talks in Copenhagen, we decided to rename the delete procedure to del. If you change delete to del in your next version, then making it available under Oracle 9i would only entail one change (acs_objects.delete).

10: Re: Category Package (response to 1)
Posted by Rafael Calvo on
Hi,

We have developed an extension to Postgres that does *automatic* classification. A machine learning algorithm has to be trained first, using already "tagged" data. Afterwards the classifier does it automatically. We have done tests with several collections of documents and the accuracy is excellent, often as good as a human's.

We have not had the time (or need) to package it as an OpenACS package, but hopefully that will be done soon.
My idea is that it could be an extension to the content repository, or maybe application specific.

I think that Timo's package is something different.
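
To illustrate the train-then-classify workflow described above, here is a minimal sketch. The post does not say which learning algorithm is used, so multinomial naive Bayes is swapped in purely as a well-known example; the class name and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.doc_counts = Counter()              # documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def train(self, tagged_docs):
        """tagged_docs is an iterable of (text, category) pairs."""
        for text, category in tagged_docs:
            words = text.lower().split()
            self.doc_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocabulary.update(words)

    def scores(self, text):
        """Return one log score per category (higher means more likely)."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[category] = score
        return result

    def classify(self, text):
        """Return the single best-scoring category."""
        scores = self.scores(text)
        return max(scores, key=scores.get)
```

After `train` has seen a set of tagged documents, `classify` assigns a category to new text without further human input.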

11: Re: Category Package (response to 10)
Posted by Malte Sussdorff on
Hello Rafael,

What is really important here is that your package classifies the data within the categorization system Timo developed, if possible. Btw.: Awesome that it works so well! This might prove to be a very good alternative to Autonomy (something I've been looking for for quite a while).

Once we have a little less workload, we would definitely like to take a peek at it and look into ways of integrating it, unless you are keen on doing it :). I think around 80k AIESEC people will be very grateful if they don't have to categorize all the documents by hand :).

How do you store the classification so far?

12: Re: Category Package (response to 11)
Posted by Rafael Calvo on
Hi Malte

Yes, the system does the same as Autonomy (I haven't used their products).

The paper:
Williams K., R. A. Calvo and D. Bell. Automatic Categorization of Questions for a Mathematics Education Service. Artificial Intelligence in Education Conference. Sydney, Australia. July 2003

at:
http://www.weg.ee.usyd.edu.au/people/rafa/papers/

explains what we did.

cheers

14: Re: Category Package (response to 12)
Posted by Bruno Mattarollo on
Hello Rafael,

That would be very interesting for an "asset management" package for OpenACS ... I wish this could also be done for Oracle ;)

These are very interesting developments happening in the community towards getting OpenACS into the core of enterprise systems; it's very promising.

Keep up the good work!

/B

13: Re: Category Package (response to 1)
Posted by Jun Yamog on
Hi Rafael,

That is great.  I think your auto-categorization code may complement Timo's code.

15: Re: Category Package (response to 1)
Posted by Rafael Calvo on
Jun,
We will have a look at how to do the integration with Timo's work. I am still not sure about the reason for that package. The CRC has a way of describing metadata (i.e. a category); taking it out of the CR and doing it for *any* acs_object is not a clear choice to me. I must admit I haven't had the time to look at it.

Bruno,

I am not sure how it could be done in Oracle, but I am sure it is possible to add a procedure that calls an external library.

16: Re: Category Package (response to 15)
Posted by Bruno Mattarollo on

Hello Rafael,

In what programming language is this library written?

Cheers,

/B

17: Re: Category Package (response to 1)
Posted by Timo Hentschel on
A new version just got uploaded and committed to cvs head. I renamed the plsql-procs category.delete and category_tree.delete to del since that will work better with oracle9i.
18: Re: Category Package (response to 1)
Posted by Jun Yamog on
Hi Rafael,

Using CR's category stuff should be fine if the package is CR based.  But packages that are not CR based will either have to become CR based to get this category feature, roll their own category package, or use Timo's code.  Timo's code may eliminate the need to write your own category package.  I think putting categories at the acs object level is good.  RHEA/CCM has also moved in this direction: the category service is already at the acs object level.

I am still at a very, very early stage of this project.  All I am doing is gathering potentially useful packages.  If the project does not use CR (even though I like using it), Timo's package is one of my alternatives.

Timo,

What will happen to cr_keywords when your package is officially in?  Will it stay?  I think they are still different, but similar too.  I am also thinking of the option to just use the CR keywords and later upgrade when 4.7/5.0 comes out.

19: Re: Category Package (response to 1)
Posted by Peter Marklund on
Jun,
the categories package that Timo developed is on head now. It's not fully functional yet as it relies on some changes to core that haven't been made yet (primarily named objects, object display redirects, and noquote). However, the bulk of the package, including its essential features, should be functional already.

I cannot really judge whether it is best to keep cr_keywords long term, although my guess is that ideally we would use the new categories package for all categorization of acs objects (including everything in CR). If we did remove cr_keywords, we would of course have to migrate that data to the categories package. I don't think this change will make it into 5.0. It would also mean making categories part of core.

20: Re: Category Package (response to 1)
Posted by Rafael Calvo on
Bruno,

The package is mostly in Perl, although we have added some extensions in C++ (using perlxs), mostly as performance enhancements. With the extensions we are able to handle over 100K documents.

21: Re: Category Package (response to 1)
Posted by Timo Hentschel on
Does anyone volunteer to port the package to postgres? I just assume the hardest part would be to port the two plsql packages, but they don't contain any fancy stuff.
22: Re: Category Package (response to 1)
Posted by Michael Steigman on
I'll volunteer but I won't be able to start until the middle of next week. If nobody gets to it before then, count me in.
23: Re: Category Package (response to 22)
Posted by Timo Hentschel on
That's great news! Thank you, Michael!
24: Re: Category Package (response to 1)
Posted by Frank Bergmann on
Hi Rafael,

<blockquote> we have developed an extension to Postgres that does
*automatic* classification
</blockquote>

We are working on a similar module, but we are trying to classify documents according to the projects (or work groups) in which they are created in a company. This way we already get training sets for free (the documents in a project's file storage).

However, we are going for multiple classifications (a vector of probabilities, one for each project) instead of a single "best fit" category.
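
The difference between a single "best fit" and a vector of probabilities can be sketched like this. It assumes the classifier yields one log score per project and uses a softmax to normalize them; that choice, the function names, and the project names in the usage note are assumptions for illustration, not a description of the module.

```python
import math


def to_probabilities(log_scores):
    """Softmax: turn per-project log scores into probabilities summing to 1."""
    peak = max(log_scores.values())
    # Subtracting the peak avoids overflow without changing the result.
    exps = {project: math.exp(score - peak) for project, score in log_scores.items()}
    total = sum(exps.values())
    return {project: value / total for project, value in exps.items()}


def best_fit(log_scores):
    """The single highest-scoring project, for comparison."""
    return max(log_scores, key=log_scores.get)
```

For scores such as `{"intranet": math.log(3), "website": math.log(1)}`, `best_fit` returns only `"intranet"`, while `to_probabilities` keeps the information that `"website"` is still a plausible second choice.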

We have looked around a bit, but we haven't found your code anywhere. Is it still available and/or GPL?

I think we should really try to kick the asses of these Autonomy guys ... 😊

Frank

25: Re: Category Package (response to 24)
Posted by Timo Hentschel on
The category package is not located in the core (yet?), but can be found in the cvs.