Forum OpenACS Development: Re: GUIDs (Globally Unique Identifiers) as Object IDs?
PostgreSQL supports UUID as a native data type (since 8.3), but it does not ship a standard function for generating UUIDs (add-on modules are available). Instead of sequences, one has to use functions to generate IDs. The organization of the files in the CR would have to be altered. The impact of UUIDs on the OpenACS magic object IDs is not clear either... Upgrade scripts look like an intellectual challenge.
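To make the point concrete, here is a minimal Python sketch of generating random (version 4) UUIDs in application code, the same kind of value an add-on module would produce on the database side; the variable names are illustrative only:

```python
import uuid

# Mint a random (version 4) UUID in application code, analogous to what a
# server-side add-on generation function would return for each new object.
new_id = uuid.uuid4()

assert new_id.version == 4
assert len(new_id.bytes) == 16   # a UUID is always 128 bits
```

Unlike a sequence, nothing here is monotonic or gap-free, which is exactly why sequence-based code paths would need to change.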
So I would not call this a small change, since in the end it affects every package (data model and input validation). Don't expect this to solve all aspects of horizontal scaling: if you "ignore unique constraints", your customers will complain about data loss.
Using UUIDs has some sex appeal; I thought about it in the past, but it would be a substantial change.
"everything looks like a hammer when all one knows is a hammer",
What are the weaknesses of addressing this with a mapping applied to all output and input, where the internals remain unchanged:
create table uu_site_object_map (
    -- hypothetical layout: map each internal object_id to an external UUID
    object_id  integer not null primary key,
    site_uuid  uuid    not null unique
);
-- indexes created for each case
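A minimal Python sketch of the mapping idea, assuming the dictionaries stand in for the uu_site_object_map table; the function names are illustrative, not OpenACS API:

```python
import uuid

# In-memory stand-in for the mapping table: internal integer object_ids
# stay unchanged, UUIDs appear only at the input/output boundary.
_uuid_by_id = {}
_id_by_uuid = {}

def to_external(object_id):
    """On output: map an internal object_id to its UUID, minting one on first use."""
    if object_id not in _uuid_by_id:
        u = uuid.uuid4()
        _uuid_by_id[object_id] = u
        _id_by_uuid[u] = object_id
    return _uuid_by_id[object_id]

def to_internal(u):
    """On input: map a UUID back to the internal object_id."""
    return _id_by_uuid[u]

u = to_external(4711)
assert to_internal(u) == 4711        # round trip; internals never change
assert to_external(4711) == u        # stable: same object, same UUID
```

The obvious weakness this exposes is that every code path touching IDs must go through these two functions, and the map itself becomes another table to keep consistent.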
So long as UUIDs are pre-generated (such as via abundant output from a bitcoin mining pool or blockchain) and confirmed unique?
OTOH, if you only need to import data from some "satellite" system B or C into A, one can store the highest known object_id of the last sync on A (e.g. B_max and C_max). When performing an import from e.g. B, map just the IDs higher than B_max (i.e. ignoring deletes). When the goal is a bidirectional sync, I guess there is no other way than having a separate acs_objects table per "satellite" as the basis for the sync....
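The one-directional case above can be sketched in a few lines of Python; the names (import_new_rows, b_max) are illustrative, not anything from the OpenACS codebase:

```python
# High-water-mark import: when pulling from satellite B into A, only rows
# with an id above the last recorded mark (B_max) are new; lower ids were
# already synced. Deletes on B are deliberately ignored, as noted above.

def import_new_rows(rows, b_max):
    """rows: iterable of (object_id, payload) from B; b_max: last synced id on A."""
    new_rows = [(oid, payload) for oid, payload in rows if oid > b_max]
    new_max = max((oid for oid, _ in new_rows), default=b_max)
    return new_rows, new_max

rows = [(10, "old"), (42, "new"), (57, "newer")]
imported, new_b_max = import_new_rows(rows, b_max=40)
assert imported == [(42, "new"), (57, "newer")]
assert new_b_max == 57   # stored on A as the mark for the next sync
```

This only works because ids on B grow monotonically; as soon as the sync is bidirectional, a single watermark is no longer enough.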
quite easy to use GUIDs (large random numbers...)
> one cannot store a UUID in a integer type
Yes, you can. This is not about adherence to standards, but about a simple solution for scaling out horizontally. So, for example, I could imagine just using "large random numbers".
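A quick Python sketch of the point: a UUID is just a 128-bit integer, so "UUID" and "large random number" are interchangeable representations; the catch is the storage width:

```python
import uuid

# A UUID and a 128-bit integer are the same value in two notations.
u = uuid.uuid4()
n = u.int                       # the UUID as a plain (arbitrary-precision) int

assert 0 <= n < 2**128
assert uuid.UUID(int=n) == u    # lossless round trip in both directions
# Note: a signed bigint tops out at 2**63 - 1, far below 2**128, so a
# database would need a uuid, numeric, or text column to hold this value.
```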
Actually, sequential numbers with a "prefix" * 1000000000 offset (with a separate "prefix" for each system) could also be sufficient to create separate "number circuits". So even sequences might continue to work... And this might actually work with normal integers and without the need for bigints...
If you mean "1000000000" literally, be aware that the maximum value of an integer in PostgreSQL is "2147483647" (roughly twice that value). Note that on sites like openacs.org we have already had overruns of 32-bit sequences.
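The arithmetic of the "prefix" scheme and its 32-bit limit can be checked directly; the function name is illustrative:

```python
# "Number circuits": each system keeps its own sequence, and global ids
# are prefix * 1_000_000_000 + local_seq. With 32-bit integers
# (max 2_147_483_647) only prefixes 0 and 1 leave room for a full billion
# ids each, so in practice this scheme needs bigint (max 2**63 - 1).
INT4_MAX = 2_147_483_647
OFFSET = 1_000_000_000

def make_id(prefix, local_seq):
    assert 0 <= local_seq < OFFSET, "local sequence overflows its circuit"
    return prefix * OFFSET + local_seq

assert make_id(1, 999_999_999) == 1_999_999_999   # still fits in int4
assert make_id(2, 147_483_648) > INT4_MAX          # prefix 2 already overflows
```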
Nevertheless, for "site-syncing" (see above), I see no need for any kind of UUIDs.
I mentioned the blockchain for obtaining a pool of pre-generated random hashes to be used as UUIDs, but those are apparently hashed at 256 bits. And as Gustaf mentions, using standards outside of their definition tends to confuse things and break convention.
Maybe verifiable, unpredictable 128-bit UUID-format hashes can be generated using a system/satellite salt reference and the object_id et cetera with SHA-3. It apparently has little risk of collisions and relatively low processor demands, and the digest length can be configured to fit the UUID format.
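A hedged sketch of that idea in Python: derive a deterministic 128-bit value from a per-satellite salt plus the object_id via SHA-3 and pack it into UUID form. The salt string and function name are assumptions, not an OpenACS convention:

```python
import hashlib
import uuid

def satellite_uuid(salt: str, object_id: int) -> uuid.UUID:
    """Derive a reproducible 128-bit UUID-shaped id from salt + object_id.

    SHA3-256 yields 32 bytes; the first 16 bytes (128 bits) fill the UUID.
    Note: the version/variant bits are not set, so this is not a strict
    RFC 4122 UUID -- exactly the "standard outside its definition" caveat.
    """
    digest = hashlib.sha3_256(f"{salt}:{object_id}".encode()).digest()
    return uuid.UUID(bytes=digest[:16])

a = satellite_uuid("satellite-B", 4711)
b = satellite_uuid("satellite-B", 4711)
c = satellite_uuid("satellite-C", 4711)
assert a == b    # verifiable: anyone with the salt can recompute it
assert a != c    # different salts give disjoint id spaces in practice
```

Because the value is a hash rather than a sequence, uniqueness is probabilistic, which is where the (very small) collision risk mentioned above comes from.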