Forum OpenACS Development: named objects

1: named objects

Posted by Timo Hentschel on 04/22/03 06:06 PM

I just wanted to get the discussion about named objects going since I made use of that in my categorization package.

Problem: When displaying all objects categorized in a particular category, I need to list all the object names. I could use acs_object.name, but it's evident that this proc does not scale at all and thus should never be used for more than a few objects.

I added another object-type 'named_objects' derived from acs_objects. All displayable objects (i.e. not like acs_rels) should then be derived from named_objects, no longer from acs_objects.

I also added a table acs_named_objects with the columns object_id, object_name and package_id.

Is this the way we want to do this, or should we place this information in the acs_objects table (package_id should certainly go there, so why not object_name, too)? Storing all data in the acs_objects table would have the advantage of not having to join two potentially large tables every time we want to get the names of some objects.
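A minimal sketch of the separate-table layout and the join it implies, using SQLite through Python purely for illustration (OpenACS itself runs on Oracle and PostgreSQL). The acs_objects definition is heavily simplified, and the sample rows and the helper names_via_join are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for the real acs_objects table.
    CREATE TABLE acs_objects (
        object_id   INTEGER PRIMARY KEY,
        object_type TEXT NOT NULL
    );
    -- Names kept in a separate table, with the columns listed in the post.
    CREATE TABLE acs_named_objects (
        object_id   INTEGER PRIMARY KEY REFERENCES acs_objects(object_id),
        object_name TEXT NOT NULL,
        package_id  INTEGER
    );
    INSERT INTO acs_objects VALUES (1, 'category'), (2, 'category'), (3, 'acs_rel');
    INSERT INTO acs_named_objects VALUES (1, 'Sports', 100), (2, 'Music', 100);
""")

def names_via_join(object_ids):
    """Fetch the names of many objects with one join instead of one call per object."""
    marks = ",".join("?" * len(object_ids))
    rows = conn.execute(
        f"""SELECT o.object_id, n.object_name
              FROM acs_objects o
              JOIN acs_named_objects n ON n.object_id = o.object_id
             WHERE o.object_id IN ({marks})
             ORDER BY o.object_id""",
        object_ids,
    )
    return rows.fetchall()

# Object 3 (an acs_rel) has no row in acs_named_objects, so it is not listed.
print(names_via_join([1, 2, 3]))
```

If object_name lived directly in acs_objects, the same list could be produced without the join, which is the trade-off raised above.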

This solution does have one disadvantage: we could only store an object_name in one language, so it is not multilingual. But I don't see this as a real difficulty, since it probably won't happen too often that an object gets entered in more than one language. If it does, it will be the package's responsibility to figure out which name to store as the official object_name. And acs_object.name is unilingual, too. I strongly oppose adding another column like locale or something similar, since that would make queries a lot harder and would be a huge scalability threat (unless someone can come up with a really clever way to do it so queries can run fast), which is exactly what should be avoided by introducing named objects in the first place.

2: Re: named objects (response to 1)

Posted by Dirk Gomez on 04/22/03 06:19 PM

I don't think we should rule out multilinguality right from the beginning, *especially* not with a categorization module. I18N is a conscious decision that affects scalability anyway!

The fact that the current solution is not I18Nized doesn't mean that a future and hopefully *better* solution doesn't need to support multiple languages.

I don't know anything about I18N yet, but a viable solution could be to add the default name to acs_objects and localized names to acs_object_names.

Maybe Peter or Lars could tell us a fair bit of how I18N could work here and how it would affect the categorization package.

4: Re: named objects (response to 2)

Posted by Timo Hentschel on 04/22/03 07:03 PM

You got me wrong: I use multilingualism in my categories package so that every category can be translated into several languages. To speed up queries, I actually cache all names and let a Tcl proc figure out the name in a particular language (if none is available in that language, it uses the default language, which is extremely hard to do in a SQL query). All I'm saying is that I don't see a fast and scalable way to do so when storing object names, and we didn't have that feature with acs_object.name either.
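The cache-plus-fallback lookup described here can be sketched roughly as follows. This is written in Python rather than Tcl, and the cache layout, the locale codes, DEFAULT_LOCALE and the function category_name are all assumptions for illustration, not the categorization package's actual code.

```python
# Hypothetical in-memory cache: category_id -> {locale: translated name}.
DEFAULT_LOCALE = "en_US"  # assumed value; a real site would read this from configuration

category_names = {
    1: {"en_US": "Sports", "de_DE": "Sport"},
    2: {"en_US": "Music"},  # not yet translated into German
}

def category_name(category_id, locale):
    """Return the name in the requested locale, else in the default locale.

    Assumes every category has a translation in the default locale.
    """
    translations = category_names[category_id]
    return translations.get(locale, translations[DEFAULT_LOCALE])

print(category_name(1, "de_DE"))
print(category_name(2, "de_DE"))
```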

3: Re: named objects (response to 1)

Posted by Lars Pind on 04/22/03 06:58 PM

Hm. This is tricky.

There are probably generally 3 types of objects (thinking out loud here):

- Internal stuff like acs_rels, parameter values, etc., which should never be listed, and some of which maybe should not even be acs-objects in the first place. Alas, they currently are.

- Actual content, which can be multilingual, and which mostly should probably go in the content repository. These could be multilingual in the case of broadcasting/CMS, sometimes unilingual in the case of user-contributed content in a community.

- In-between stuff like actions and states in a workflow, categories in a categorization package, projects and variables in a logger application. These will generally need to be i18n'd, since they're going to be integrated into the user experience, e.g. tightly integrated into the navigation. On the other hand, I'm not sure how frequently you'll want to actually browse by category?

Maybe what we need is a catalog of the types of objects that we have and expect to have, and we can determine how they should best be handled? Where's Jeff when you need him? :)

/Lars

5: Re: named objects (response to 1)

Posted by Dirk Gomez on 04/22/03 07:11 PM

Is sorting out the categories in Tcl really faster than doing it in a SQL query? Why not store the whole category tree per language then? You wouldn't have the Tcl proc overhead per call then.

6: Re: named objects (response to 5)

Posted by Timo Hentschel on 04/22/03 07:38 PM

Not that easy. It could be that a category just got added to a tree and hasn't been translated into all languages yet, so you have to provide a default. But maybe I'm just thinking too much of scenarios that won't happen too often. And you certainly shouldn't create a separate category_id for every translation, but I guess that's pretty evident. And I certainly prefer doing some fast Tcl code over executing nasty slow queries.

7: Re: named objects (response to 1)

Posted by Dirk Gomez on 04/23/03 12:03 AM

Hmm, what about fast SQL queries over nasty slow Tcl code? It is too easy to blame Oracle/Postgres for not executing queries fast enough!

I'm not sure about scenarios either, but my gut feeling is that if you don't have a translation for a particular language+category, you don't have a categorization!!

8: Re: named objects (response to 7)

Posted by Malte Sussdorff on 04/23/03 07:31 AM

Hi Dirk, you confuse me. Didn't Timo explicitly state that he uses multilinguality to support having multiple languages for a category?

Taking Lars's three categories: the first doesn't need it, the second will take care of it using multiple objects, and it is the third that is tricky.

What I'd suggest (without looking into performance issues too much, as caching is an option), take a table i18n_supporters made up of (object_id, locale, type_of_object_support (e.g. name, description), actual_text).

9: Re: named objects (response to 1)

Posted by Dirk Gomez on 04/23/03 08:58 AM

Yeah, I got that: the categorization package already is multilingual. However now we are trying to make it perform better and may want to change acs_objects...and the discussion is about acs_objects, not the categorization package.

It should be properly thought through - that's my wish. I don't like the current solution with acs_object.name - see my previous postings on it - however, replacing the PL/SQL proc with a simple column fails another design goal: we'd be replacing a *really* bad solution performance-wise (which seems to have affected only a few sites yet) with a really bad solution I18N-wise.

Localized content needs to be stored somewhere. Why not pull all user-facing textual information out of acs_objects, maybe into a table called acs_objects_description, and make this table locale-dependent? An index on object_id+locale on acs_objects_description would make access relatively lightweight. (Long lists may still lead to expensive joins.)
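A rough sketch of such a locale-dependent description table with an index on object_id+locale, again using SQLite from Python only as a stand-in for Oracle or PostgreSQL. The column list beyond object_id and locale, the index name, the sample rows and the helper localized_name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acs_objects_description (
        object_id   INTEGER NOT NULL,
        locale      TEXT    NOT NULL,
        name        TEXT    NOT NULL,
        description TEXT
    );
    -- One row per object and locale; the index serves the lookup below.
    CREATE UNIQUE INDEX acs_objects_description_idx
        ON acs_objects_description (object_id, locale);
    INSERT INTO acs_objects_description VALUES
        (1, 'en_US', 'Sports', 'All sports'),
        (1, 'de_DE', 'Sport',  'Alle Sportarten'),
        (2, 'en_US', 'Music',  NULL);
""")

LOOKUP = """
    SELECT name FROM acs_objects_description
     WHERE object_id = ? AND locale = ?
"""

def localized_name(object_id, locale):
    """Return the name for one object in one locale, or None if untranslated."""
    row = conn.execute(LOOKUP, (object_id, locale)).fetchone()
    return row[0] if row else None

# Ask SQLite how it would run the lookup; the last column describes the plan.
for step in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, "de_DE")):
    print(step[-1])
```

The plan output is printed rather than asserted because its wording differs between SQLite versions, and it says nothing about how Oracle or PostgreSQL would cost the same lookup.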

10: Re: named objects (response to 1)

Posted by Dirk Gomez on 04/23/03 09:20 AM

Or well: call it cr_items altogether.

11: Re: named objects (response to 1)

Posted by Peter Marklund on 04/23/03 11:02 AM

Timo,
I think named objects should have multilingual names and that this should be supported by acs-core. I appreciate the concern about scalability here but there are various ways to address that issue, an index like Dirk suggests, caching in memory that you already use, maybe both of those approaches.

To me it seems natural to let the table acs_named_objects have the columns (object_id, name, locale) and potentially others that might be needed. Do we want to add description as well?

Concerning package_id, that is a denormalization that probably belongs in acs_objects. The proposal by Don, which I agree with, is to drop the context_id column (no longer needed by permissions) from acs_objects and introduce a parent_id column instead. I'm not sure what exactly that upgrade will involve.

12: Re: named objects (response to 11)

Posted by Timo Hentschel on 04/23/03 01:51 PM

No, you can't be serious about that. That's exactly the design I want to avoid. Think about it. Adding a locale field will get you into deep trouble, because what if you want to get the names of all objects in German, but most of them are not available in German? You should fall back to displaying whatever language is available (most likely the default language). I don't see a way to do that kind of query in a fast and scalable way, do you? And caching the names of all objects? I don't think so. The system can be huge, and although memory might be cheap, I just think that's too much.

13: Re: named objects (response to 12)

Posted by Dirk Gomez on 04/23/03 05:46 PM

How do you intend to support I18N then? Not only for categorization, but also for *every* other package in the system?

In a document called "Best Practices for Globalization using the Oracle 9i Internet Application Server", which Mohan pointed me to, Oracle suggests this:


-- The fallback language is English, which Oracle abbreviates as 'US'.
CREATE VIEW default_message_view AS
  SELECT msgid, message
  FROM messages
  WHERE langid = 'US';

-- View for services, with fall-back mechanism: use the translation in the
-- session language when one exists, otherwise the default-language message.
-- (The translation table is not defined in this excerpt.)
CREATE VIEW messages_view AS
SELECT d.msgid,
  CASE WHEN t.message IS NOT NULL
    THEN t.message
    ELSE d.message
  END AS message
FROM default_message_view d,
  translation t
WHERE t.msgid (+) = d.msgid AND
  t.langid (+) = sys_context('USERENV', 'LANG');

SELECT message FROM messages_view WHERE msgid = 'hello';

Looks kinda slick: If you don't find a translation with the first index lookup, you need a second one. Shouldn't be that much of a performance impediment: If an average index lookup takes 5 I/O accesses (that's already for a huge table), you add a maximum of 10 I/Os per row.

It *clearly* is much more expensive than returning the name from a row that Oracle has touched anyway (access costs amount to zero there).
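For comparison, the same fallback can be written with an ANSI LEFT JOIN and COALESCE instead of Oracle's (+) syntax. The sketch below runs it on SQLite from Python. It assumes translations live in the same messages table as the default-language rows, passes the language as a bind parameter instead of sys_context, and uses made-up sample rows ('D' as the German language id is an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        msgid   TEXT NOT NULL,
        langid  TEXT NOT NULL,
        message TEXT NOT NULL,
        PRIMARY KEY (msgid, langid)
    );
    INSERT INTO messages VALUES
        ('hello', 'US', 'Hello'),
        ('hello', 'D',  'Hallo'),
        ('bye',   'US', 'Goodbye');
""")

# d is the default-language row; t is the optional translation for :lang.
FALLBACK_QUERY = """
    SELECT d.msgid, COALESCE(t.message, d.message) AS message
      FROM messages d
      LEFT JOIN messages t
        ON t.msgid = d.msgid AND t.langid = :lang
     WHERE d.langid = 'US'
     ORDER BY d.msgid
"""

def messages_for(lang):
    """Return (msgid, message) pairs, falling back to 'US' where untranslated."""
    return conn.execute(FALLBACK_QUERY, {"lang": lang}).fetchall()

print(messages_for("D"))
```

As in the Oracle views, a message with no row in the default language never shows up at all, whatever other translations exist.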

The current I18N solution sacrifices performance for tight coupling of related objects. Is it good? I dunno. Is loose coupling - every translated object gets its own row in acs_objects - better? I dunno.

23: Re: named objects (response to 12)

Posted by Thomas Taylor on 04/29/03 09:09 AM

Have a look at https://openacs.org/forums/message-view?message_id=96326

This i18n package does not support names for objects or such, but at least some multilanguage design is possible.

24: Re: named objects (response to 23)

Posted by Timo Hentschel on 04/29/03 12:55 PM

I didn't say that i18n is not possible at all. Actually, my categorization package is multilingual in that categories can be translated in any language. I'm just saying that doing i18n for object_names so that you can write fast queries without using a cache is a hard thing to do.

14: Re: named objects (response to 1)

Posted by Timo Hentschel on 04/24/03 09:56 AM

Dirk, your solution does have certain drawbacks: the default language can't be hardcoded as in your example, but actually is an apm_parameter. Further, we can't be sure that there's always a translation in the default language available. In my categorization package I made sure that that's always the case (and maybe some people won't like that code, but it helped me write slightly faster code in Tcl to fetch the translation from the cache). So, what will you do if you can't get an object_name either in the language the user requests or in the default language? You should probably display it in just any language (although this adds a certain randomness, but so what), since I'm quite opposed to not showing such an object at all (what if the user is an admin and really needs to see every object and doesn't care at all whether he can actually read it or not?).

In chatting with Peter we concluded that his approach of caching the translations of every acs_object might be ok. But how about a mixture of both to speed up the total query time, since Dirk is worried about the speed of Tcl code? How about trying to get direct matches from the database (when there's a translation in the user's language available) and letting Tcl figure out the best translation from the cache in the other cases? That way we would have a fast db query and use Tcl only in those cases where it's really best to use it.
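A rough sketch of that mixture, in Python with SQLite standing in for the real database and a plain dict standing in for the cache. The table layout (object_id, locale, object_name), the sample rows and the helper names_for are assumptions; the "any language" fallback simply takes the first cached translation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acs_named_objects (
        object_id   INTEGER NOT NULL,
        locale      TEXT    NOT NULL,
        object_name TEXT    NOT NULL,
        PRIMARY KEY (object_id, locale)
    );
    INSERT INTO acs_named_objects VALUES
        (1, 'en_US', 'Sports'), (1, 'de_DE', 'Sport'),
        (2, 'en_US', 'Music'),
        (3, 'fr_FR', 'Cinema');
""")

# Hypothetical cache of all translations: object_id -> {locale: name}.
name_cache = {}
for oid, loc, name in conn.execute(
        "SELECT object_id, locale, object_name FROM acs_named_objects"):
    name_cache.setdefault(oid, {})[loc] = name

def names_for(object_ids, locale, default_locale="en_US"):
    """Direct matches come from one query; the rest are resolved from the cache."""
    marks = ",".join("?" * len(object_ids))
    direct = dict(conn.execute(
        f"""SELECT object_id, object_name FROM acs_named_objects
             WHERE locale = ? AND object_id IN ({marks})""",
        [locale, *object_ids]))
    result = {}
    for oid in object_ids:
        if oid in direct:
            result[oid] = direct[oid]
        else:
            translations = name_cache[oid]
            # Default locale if present, otherwise whatever translation exists.
            result[oid] = (translations.get(default_locale)
                           or next(iter(translations.values())))
    return result

print(names_for([1, 2, 3], "de_DE"))
```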

15: Re: named objects (response to 14)

Posted by Dirk Gomez on 04/24/03 12:15 PM

The SQL code is a direct cut and paste from the mentioned Oracle document and I didn't intend to put that into OpenACS without change.

As I said: at the moment (!) I am leaning towards not showing anything if a user can neither be shown something in his chosen language nor in the default language. I wouldn't want to promise a site translated into a user's language that falls back all the while into some other random language.

Maybe someone who is running (Greenpeace) or intends to run (Heidelberg) a multilingual site can give us some "real-life" input on that.

16: Re: named objects (response to 14)

Posted by Dirk Gomez on 04/24/03 12:32 PM

In fact: these two views struck me because they provide a transparent means of "multilingualising" columns. I think we should consider taking up on this!

17: Re: named objects (response to 1)

Posted by Timo Hentschel on 04/24/03 05:08 PM

Sorry Dirk, but I still think showing something - although the user might not be able to understand it - is better than creating separate worlds in one and the same system - worlds in which some objects do or do not exist from the user's point of view. But to make you happy, the user could explicitly tell the system (user parameter?) to only show content in his language and in no other.

18: Re: named objects (response to 17)

Posted by Dirk Gomez on 04/25/03 12:35 AM

I don't really have a fixed opinion :) - I can happily go along with your approach as well.

E. g. both arguments somewhat hold:

- it doesn't make sense to show an object in a "random" language.

- the category tree needs to be translated completely, and an "out-of-band" category item may lead to users asking for a complete translation. It may be good that people perceive a page to be erroneous or lacking information. Or they may just make sense of the "out-of-band" language.

How do you intend to pick the fallback if there's neither a user-default nor a system-default language? Will it just be the random first item of some list?

19: Re: named objects (response to 18)

Posted by Timo Hentschel on 04/26/03 02:04 AM

As I already said, I would opt for showing at least something if we can't get a decent translation. How about showing the object in the language the object creator used when creating the object? That would add a date field to the table acs_named_objects (object_id, locale, object_name, creation_date). If we cached the translations for each object, we could sort them by creation_date and display the first translation when none is available in the user's or the default locale.
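The fallback order proposed here (user's locale, then default locale, then the oldest translation) could look roughly like this on a cached list of translations. The tuple layout, the sample data and the function pick_name are hypothetical.

```python
from datetime import date

def pick_name(translations, user_locale, default_locale):
    """Choose a name from (locale, object_name, creation_date) tuples.

    Order of preference: user's locale, default locale, earliest-created
    translation (assumed to be the language the object creator used).
    """
    by_locale = {locale: name for locale, name, _ in translations}
    for locale in (user_locale, default_locale):
        if locale in by_locale:
            return by_locale[locale]
    oldest = min(translations, key=lambda t: t[2])
    return oldest[1]

# An object that exists only in French and Spanish, created in French first.
cinema = [
    ("es_ES", "Cine", date(2003, 4, 20)),
    ("fr_FR", "Cinema", date(2003, 4, 1)),
]
print(pick_name(cinema, "de_DE", "en_US"))
```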

20: Re: named objects (response to 1)

Posted by Peter Marklund on 04/26/03 07:58 AM

Timo,
that solution sounds really good to me.

21: Re: named objects (response to 1)

Posted by Dirk Gomez on 04/26/03 09:01 AM

Timo - I like that too!

22: Re: named objects (response to 21)

Posted by Timo Hentschel on 04/27/03 04:44 PM

Oops. I just started doing the implementation of the named_objects cache when I suddenly realized that caching is not the answer to the problem at hand: when displaying a list of objects we want it to be sortable by name and maybe more (like letting the user see the list of all objects starting with the letter 'T'), but we can't do that if we resolve the object_name in Tcl by accessing a cache.

So we are back to where we started. We have to come up with a clever and fast solution to do it all in the db.
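One possible shape for an all-in-the-database query, offered only as a sketch and not as what the thread settled on: a correlated subquery picks the best locale per object (user's locale, then default, then oldest translation), and the outer query filters and sorts on the resolved name. It runs on SQLite from Python; the table follows the (object_id, locale, object_name, creation_date) layout proposed earlier, and the sample rows and the helper list_names are assumptions. Whether this is fast enough on large Oracle or PostgreSQL tables is exactly the open question and is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acs_named_objects (
        object_id     INTEGER NOT NULL,
        locale        TEXT    NOT NULL,
        object_name   TEXT    NOT NULL,
        creation_date TEXT    NOT NULL,
        PRIMARY KEY (object_id, locale)
    );
    INSERT INTO acs_named_objects VALUES
        (1, 'en_US', 'Sports',   '2003-04-01'),
        (1, 'de_DE', 'Sport',    '2003-04-02'),
        (2, 'en_US', 'Theatre',  '2003-04-03'),
        (2, 'de_DE', 'Theater',  '2003-04-06'),
        (3, 'fr_FR', 'Tableaux', '2003-04-04'),
        (3, 'es_ES', 'Cuadros',  '2003-04-05');
""")

RESOLVED_NAMES = """
    SELECT n.object_id, n.object_name
      FROM acs_named_objects n
     WHERE n.locale = (
               -- Best available locale for this object.
               SELECT c.locale
                 FROM acs_named_objects c
                WHERE c.object_id = n.object_id
                ORDER BY CASE c.locale
                             WHEN :user_locale    THEN 0
                             WHEN :default_locale THEN 1
                             ELSE 2
                         END,
                         c.creation_date
                LIMIT 1)
       AND n.object_name LIKE :prefix || '%'
     ORDER BY n.object_name
"""

def list_names(user_locale, default_locale, prefix=""):
    """Return (object_id, name) pairs, resolved, filtered and sorted in SQL."""
    params = {"user_locale": user_locale,
              "default_locale": default_locale,
              "prefix": prefix}
    return conn.execute(RESOLVED_NAMES, params).fetchall()

print(list_names("de_DE", "en_US", prefix="T"))
```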