Forum OpenACS Development: named objects
Problem: When displaying all objects categorized in a particular category, I need to list all the object names. I could use acs_object.name, but it's evident that this proc does not scale at all and thus should never used for more than a few objects.
I added another object-type 'named_objects' derived from acs_objects. All displayable objects (i.e. not like acs_rels) should then be derived from named_objects, no longer from acs_objects.
I also added a table acs_named_objects with the columns object_id, object_name and package_id.
Is this the way we want to do this or should we place this information in the acs_objects table (package_id should certainly go there - why not object_name, too?) ? Storing all data in the acs_objects table would have the advantage of not having to join two potentially large tables everytime we want to get the names of some objects.
One disadvantage does this solution have: We could only store an object_name in one language - so this is not multilingual. But I don't see this as a real difficulty since it probably won't happen too often that an object gets entered in more than one language - if it does, it will be package's responsibility to figure out which name to store as the official object_name. And: acs_object.name is unilingual, too. I strongly oppose of adding another column like locale or something similar since that would make queries a lot lot harder and would be a huge scalability threat (unless someone can come up with a real clever way to do it so queries can run fast), which is exactly what should be avoided by introducing named objects in the first place.
The fact that the current solution is not I18Nized doesn't mean that a future and hopefully *better* solution doesn't need to support multiple languages.
I don't know anything about I18N yet, but a viable solution could be to add the default name to acs_objects and localized names to acs_object_names.
Maybe Peter or Lars could tell us a fair bit of how I18N could work here and how it would affect the categorization package.
There are probably generally 3 types of objects (thinking out loud here):
- Internal stuff like acs_rels, parameter values, etc., which should never be listed, and some of them maybe not even be acs-objects in the first place. Alas, they currently are.
- Actual content, which can be multilingual, and which mostly should probably go in the content repository. These could be multilingual in the case of broadcasting/CMS, sometimes unilingual in the case of user-contributed content in a community.
- In-between stuff like actions and states in a workflow, categories in a categorization package, projects and variables in a logger applications. These will generally need to be i18n'd, since they're going to be integrated into the user experience, e.g. tightly integrated into the navigation. On the other hand, I'm not sure how frequently you'll want to actually browse by category?
Maybe what we need is a catalog of the types of objects that we have and expect to have, and we can determine how they should best be handled? Where's Jeff when you need him? :)
I'm not sure about scenarios either, but my gutfeel is that if you don't have a translation for a particular language+category, you don't have a categorization!!
Taking Lars three categories, the first doesn't need it, the second will take care using multiple objects, it is the third that is tricky.
What I'd suggest (without looking into performance issues too much, as caching is an option), take a table i18n_supporters made up of (object_id, locale, type_of_object_support (e.g. name, description), actual_text).
It should be properly thought through - that's my wish. I I don't like the current solution with acs_objects.name - see my previous postings on it - , however replacing the PL/SQL proc with a simple columns fails another design goal: we'd be replacing a *really* bad solution performance-wise (which seems to have affected only a few sites yet) with a really bad solution I18N-wise.
Localized content needs to be stored somewhere. Why not pull out all user-facing textual information from acs_objects - maybe into a table called acs_objects_description - and make this table locale-dependant. An index on object_id+locale on acs_object_description will make an access relatively light-weight. (Long lists may lead to expensive joins)
I think named objects should have multilingual names and that this should be supported by acs-core. I appreciate the concern about scalability here but there are various ways to address that issue, an index like Dirk suggests, caching in memory that you already use, maybe both of those approaches.
To me it seems natural to let the table acs_names_objects have the columns (object_id, name, locale) and potentially others that might be needed. Do we want to add description as well?
Concerning package_id that is a denormalization that probably belongs in acs_objects. The proposal by Don, that I agree with, is to drop the context_id (no longer needed by permissions) column from acs_objects and introduce a parent_id column instead. I'm not sure what exactly that upgrade will involve.
How do you intend to support I18N then? Not only for categorization, but also for *every* other package in the system?
In a document called "Best Practices for Globalization using the Oracle 9i Internet Application Server" to which I was pointed at by Mohan, Oracle suggests this:
-- fallback language is English which is abbreviated as 'US'. CREATE VIEW default_message_view AS SELECT msgid, message FROM messages WHERE langid = 'US'; / -- create view for services, with fall-back mechanism CREATE VIEW messages_view AS SELECT d.msgid, CASE WHEN t.message IS NOT NULL THEN t.message ELSE d.message END AS message FROM default_view d, translation t WHERE t.msgid (+) = d.msgid AND t.langid (+) = sys_context('USERENV', 'LANG'); SELECT message FROM message_view where msgid = 'hello';
Looks kinda slick: If you don't find a translation with the first index lookup, you need a second one. Shouldn't be that much of a performance impediment: If an average index lookup takes 5 I/O accesses (that's already for a huge table), you add a maximum of 10 I/Os per row.
It *clearly* is much more expensive than returning the name from a row that Oracle has touched anyway (access costs amount to zero there).
The current I18N solution sacrifices performance for tight coupling of related objects. Is it good? I dunno. Is lose coupling - every translated object gets its own row in acs_objects - better. I dunno.
This i18n package does not support names for objects or such, but at least some multilanguage design is possible.
In chatting with Peter we concluded that his approach of caching the translations of every acs_object might be ok. But how about a mixture of both to speed up the total query time since Dirk is worried about the speed of tcl code? How about trying to get direct matchings from the database (when there's a translation in the users language available) and let tcl figure out the best translation from the cache in the other cases? Therefore we would have a fast db query and use tcl only in those cases, when it's really best to use it.
As I said: at the moment (!) I am leaning towards not showing anything if a user can neither be shown something in his chosen language nor in the default language. I wouldn't want to promise a site translated into a user's language that falls back all the while into some other random language.
Maybe someone who is running (Greenpeace) or intends to run (Heidelberg) a multilingual site can give us some "real-life" input on that.
E. g. both arguments somewhat hold:
- it doesn't make sense to show an object in a "random" language.
- the category tree needs to be translated completely and a "out-of-bound" category item may lead to users asking for a complete translation. It may be good that people perceive a page to be erroneous or lacking information. Or they may just make sense of the "out-of-band" language.
How do you intend to pick the fallback if there's neither a user-default nor a system-default langauge? Will it just be the random first item of some list?
that solution sounds really good to me.
So we are back to where we started. We have to come up with a clever and fast solution to do it all in the db.