Forum OpenACS Development: AMS package ams_options

Collapse
Posted by Dave Bauer on
I was looking through the AMS package trying to learn how I might use it's services. I have found some things I do not understand in the data model.

The tables: ams_options, ams_option_ids,ams_option_types.

I would imagine that ams_options would contain the valid options for a certain ams_attribute. This is not the case. The ams_option_types table seems to hold all the important information. ams_options and ams_option_ids are empty tables and refer to each other. It seems to not be used. Is this true?

I am examining the use of AMS by the contacts package, so maybe I just have overlooked a certain feature, but the ams_option_ids package only has one column that refers to the value_id in ams_options.

Collapse
Posted by Matthew Geddert on
Dave, All the tables you mention store data.

As you know, the ams_option_types table, lists the valid options per attribute.

In ams_attribute_values (the master skinny table for ams) there is a one to one to one mapping of attribute_id, value_id and object_id for the sake of speed, i tried one to many and it was a lot slower at data retrieval then a simple one to one (which is most often the case). This table only has indexed integers and is reasonably fast.

This is great of text input, addresses, telecom numbers, etc. But for multi value widgets such as checkboxes or multiselect it wouldn't work since its may options to one value_id. So, there is the ams_option_ids table, this is a index of possible answers to ams_option_types. Whenever somebody answers a question, lets say what colors you like (the options are found in the ams_option_types table):

A, Red
B, Green
C, Yellow

with A and B they get the same ams_option_ids table, value_id. This is one of the ways in which this skinny storage is kept relatively fast - since we only store unique values, with one value_id (instead of creating a new value_id for the same response).

Anyways, the ams_options table is a mapping of value_id to option_id, so it stores that value 1234 has option type A and B. And value 1234 can have option B by itself, etc.

The naming of these tables is a bit wired. The reason its named this way is because many of the other storage mechanisms for attributes such as text, number, dates, postal_address, telecom_number are always one value_id to one answer, and their tables are named ams_texts, ams_times, ams_numbers respectively.

That said, the ams tcl api is complete enough to run contacts (the most complex) and it does this all behind the scenes.

Collapse
Posted by Matthew Geddert on
The end scentence should have been "contact (the most complex application based on AMS)"... I cut some other stuff before posting.
Collapse
Posted by Malte Sussdorff on
I was wondering if creating schemas or materialized views for the AMS lists would speed up the application? At least the initial retrieval of values would be considerably sped up (we have lists with 50 attributes and the list is used by 30.000 objects, resulting in around 150ms per attribute_value to be retrieved). This might also get rid of the need for caching (AMS is heavy in using the ns_cache).

Obviously this would only work if the refreshing of the materialized view does not involve refreshing all the rows, but only those rows that are changed.

Alternatively we could solve this in TCL as well, creating a new table for each list. In AMS we would change the lists. Once we are done with changing we say "publish" (somewhat equivalent to "map", though you can have multiple versions of a list, with only the published version mapped). Upon publish, the old ams_${list_id} table is dropped and a new table recreated prefilled with a column per attribute and a row per mapped object_id.

Once we save a new value we additionally update the row in all the list tables that contain the attribute_id.

This is massively storing more data in the system, but it might speed things up considerably for retrieval of values. Additionally, upon saving the value, we could first update the list's tables and then run a scheduled proc in a seperate thread to insert properly into AMS.

Multiple values (aka options) could then be stored as TCL lists in the various lists' tables.

But again, not sure if this would be worth the effort.

Collapse
Posted by Dave Bauer on
I think the ultimate answer might be to store all the attributes for one type in one table, how regular objects do. I am looking into using contacts, and AMS, and so far, I don't see a way to implement what I need using these packages, even though I think the idea of a contacts package is the right model.

The code I am thinking of converting to contacts adds a complementary object with all the attributes we want to use to extend person. This seems much more efficient and, using the ability to arrange attributes into differnet forms, the user should never notice the difference if all the attributes are in one table, or stored in generic storage.

The other thing that bothers me about contacts is that it extends person, instead of creating an extension type that is related to a person object. This isn' a big deal though, I can see how it will work either way.

Collapse
Posted by Malte Sussdorff on
I like this approach. So basically you talk about generating a "ams_${object_type}_values" table with columns for the object_id and each of the attributes (where the column is named with the attribute name and not the id).

I think this should be fairly easy to achieve taking into account that not too many procedures need to be changed (namely the one for attribute generation and the ones for attribute retrieval).

Not sure what impact this would have though on the contact searches (brr...). Maybe Matthew can elaborate on that one?

Collapse
Posted by Dave Bauer on
THe one table approach makes queries easier, but if you want to have per-package or per-user attributes, you would need to do some additional mapping to determine which attributes belong to which package or user.
Collapse
Posted by Malte Sussdorff on
Well, AMS is a singleton, so you would only have an attribute name across all packages using AMS. It would be interesting though to have the same attribute name in multiple instances of contacts (as an example) and have different values for the contact (object_id), depending on the contacts_package_id.

On the other hand, I like the beauty that you can share the same value of an attribute and maybe even do this across contacts package instances. I guess in this area some thought still needs to be put in.

As for belonging to which user, you could probably handle this with the permissions, but for the way contacts works, this is not needed (the permissions are granted on the group / ams_list level)

Collapse
Posted by Matthew Geddert on
Thinking off of the top of my head converting AMS skinny to AMS generic storage would pose these problems, all of which are not insurmountable but its reasonably tricky:

1. Postgres has a limit of the number of columns that can be in a table, so we might need to create secondary storage tables for sites with many attributes, i.e. persons, persons1, persons2, etc.

2. we would want contact and mime_type for all richtext widgets, etc. So the oclumsns do add up

3. Multiple widgets (checkbox, multiselect) would likely require a column per option (and then be true, false or null), this could mean many, many columns are needed

4. I think it would make sense to store revisions in skinny and live data in the type specific storage type, that way searches and retrieval are fast for live data and the revisions are still there and a bit slower. Speed for frequently accessed data, efficient storage for infrequently accessed data.

5. Re-writing the storage part of ams to deal with type specific shouldn't be too difficult. Re-writing the contacts searches related to this would likely be a lot more tricky (or at least time consuming).

Anyways, I can think of a lot more specific concerns but this I think sums up the main conceptual issues related to such a re-write.

Collapse
Posted by Malte Sussdorff on
To make our life easier at the moment, we should probably have the generic approach added in addition to the the skinny one, thereby reducing the need to rewrite all the search queries. Though the data entry into the skinny tables should happen in a lazy fashion, meaning that the website should return once the value has been stored in the generic storage and not wait for the values to be safed in the skinny storage as well.

1) I think we might not have to think too hard about it. I do agree this problem exists, but if someone really hits this constraint, then we can add the person1, person2 extension.

2) Though this might sound strange, why don't we store the content *always* in text/html. We can do a conversion before storing if the mime_type was text/plain.

3) Bad idea with the one column per option. I would save the selected options in a list within the generic storage. Though the benefit only works with simple display, on edit we need to work with the option_ids again...

4 + 5) Yeah. This also helps us with the searches (as the searches work on revisions anyway and the live revision is in the skinny tables anyway).

All in all I think we should go down this way. Now the question which arises for myself could we do the generic storage using dynamic types? I mean, this is what dynamic types does in the first place or did I miss something here?

Collapse
Posted by Matthew Geddert on
1; Agreed. Once the limit is reach that person (or more likely large organization) can deal with it.

2; At least my school needs a mime_type for richtext widgets, textboxes or textareas could be stored as plain text (the way the rest of the system does). Anybody that doesn't use the javascript richtext widgets would likely want mime type storage as well. If only richtexts had it, I don't think it would add too many columns. Converting from text/html to other formats (such as text/plain, text/enhanced, text/textile or text/wiki) conversion isn't foolproof, the other way around does work consistently.

3; I'm thinking either we shouldn't add a column for multiple attributes (checkboxes, multiselects) and just have a seperate mapping table, or we should store those the way ams_options does with a value_id. Addresses and telecom numbers could just be saved as address_id or number_id entries.

4; 5; Agreed.