Forum OpenACS Q&A: Holes in the acs_object_id sequence

Posted by Lars Pind on
I'm trying to move my pinds.com web site over to the latest version
of OpenACS. Since the version I'm currently running is from
October of last year, I figured that the easiest thing to do would
probably be to start from scratch: there are no upgrade scripts.

But moving data from one installation to another is no easy feat.
With all this object crud, there are tons of rows in tons of tables
to move. And if I want people's bookmarks to keep working, I'll have
to preserve the object_id of all the public objects! This isn't
going to be easy!

Anyway, the problem I've run into just now is that when I do a clean
install of OACS 4 from CVS on PG 7.2, and I install the packages
that I need to stuff data into, the installation has already used up
a lot of the object_id numbers that I need for objects from my old
server.

What's puzzling me is that there are large gaps in the object_id
sequence. There's a jump from object_id 1437 to object_id 2033, for
instance, almost 600 numbers, but there are plenty more gaps like
this.

Any idea why this is? Is this a PG feature? Or something that we do
on purpose?

Btw, if anyone has any clever ideas for how I can do this awful
migration job smarter, please do chip in :)

Thanks,
/Lars

Posted by Don Baccus on
These sequences used to be created with "cache 100" in both Oracle and PG, but that seems to be gone now in CVS (I think I removed it; I've never seen any evidence that sequence hits are a big problem in real OpenACS installations, and the holes are annoying).

I'd check to make sure that the sequence isn't being created that way.  As to how that could cause 600-number gaps: each PG backend would cache 100 sequence values, so several backends could easily produce gaps of this size.
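A quick way to check is to read the sequence's settings directly. This is
only a sketch: depending on the port the kernel sequence may be named
acs_object_id_seq or t_acs_object_id_seq (possibly with a view on top), so
confirm the actual name with \ds in psql first.

    -- PostgreSQL lets you read a sequence's settings as if it were a table:
    SELECT last_value, increment_by, cache_value
      FROM t_acs_object_id_seq;

    -- If cache_value comes back as 100, the gaps are explained: every
    -- backend grabs 100 numbers at a time.  On Oracle the equivalent
    -- cleanup would be:
    --   ALTER SEQUENCE acs_object_id_seq NOCACHE;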

Posted by Jon Griffin on
Lars, now you know why I have been harping for upgrade scripts (which, BTW, aren't happening for most packages). For OpenACS to succeed there needs to be a path to get to the next version. Telling someone to reload from scratch is a shit solution and kills momentum.

I wish I had an easy hack for you but as you found out the object model (and its complete misuse) really gets in the way of a dump into a new system. Upgrade scripts for all the data-model changes would go a LONG way towards making this usable.

I would hope that the gatekeepers don't let 4.6 out without EVERY package having upgrade scripts, with NO EXCEPTIONS. In fact, I think upgrade testing should be one of the testing procedures put in place.

Posted by Lars Pind on
Jon,

I couldn't agree more.

But I don't blame anyone for my mess. I knew full well when I started out that I was running on an unsupported version, and that there wouldn't be upgrade scripts for me.

But since 4.5, upgrade scripts should be mandatory, and definitely part of the test plan.

/Lars

Posted by Jon Griffin on
Actually, if you look at the old posts on this, you will see that from Beta on, all changes required an upgrade script. This didn't even come close to happening, which is why I don't have a lot of faith that it will be enforced this time.
Posted by Janine Ohmer on
And there are going to be at least a handful of people, myself included, with ACS 4.x sites who want to move them to OpenACS.  We'll run into all the same fun issues.  So Lars, please document what you end up doing!  I am pretty sure it will be painful, but a little less so if we know that the approach has been tried before and works.  Thanks!
Posted by Jon Griffin on
Janine,
I have started a doc at http://jongriffin.com/static/openacs/acs-conversion/upgrade. It is just some procedures but I will probably be doing this myself for a client and will make all scripts available.
Posted by Branimir Dolicki on
Here's one possible way to solve that problem:

(1) Program a package, let's call it "ID Remapper".  The package uses
    a .vuh to respond to any URL coming in and uses a lookup table
    to figure out the appropriate redirect.  ID Remapper should be
    configurable so you can make it respond to bboard URLs, news URLs,
    etc.  You should be able to specify the last part of the URL as
    well as the name of the relevant form variable to look at.

(2) Mount an instance of ID Remapper on every site_node you used to
    have on the old site.

(3) Migrate the old site in such a way that you never use the same
    URLs as on the old site.  Store the id-to-id mapping that will be
    used as a lookup table for the ID Remapper (see the sketch below).
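A minimal sketch of what that lookup table could look like; the table and
column names here are made up for illustration, not part of any package:

    -- Hypothetical mapping table for the ID Remapper.
    CREATE TABLE id_remap (
        package_id     integer NOT NULL,  -- the remapper instance / old site_node
        old_object_id  integer NOT NULL,  -- id as it appeared in the old URLs
        new_object_id  integer NOT NULL,  -- id of the migrated object
        PRIMARY KEY (package_id, old_object_id)
    );

    -- The .vuh handler would then find the redirect target with something like:
    --   SELECT new_object_id FROM id_remap
    --    WHERE package_id = :package_id AND old_object_id = :old_id;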

BTW it has occurred to me on several occasions before that although
it is a nice idea to have just one site-wide sequence of objects
for internal use (although I am one of those who would greatly
reduce the number of things we call objects), it is perhaps a bad idea to expose that sequence
to the outside world.  Lars's painful experience proves that.  As
OpenACS sites mature it will be increasingly important to be able
to move things around - merge two sites into one, split a site into
two etc.  Scripts that upgrade from one version to another won't
solve this problem.

Maybe it wouldn't be a bad idea to use one sequence per app. instance
solely for the purpose of URLs for referencing content - a composite
candidate key consisting of package_id and the "local" object_id.

Since new
pieces of content aren't created all that often, we don't really need
database sequences for that; a database cell storing the last
used ID per app. instance and object type would do.
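A rough sketch of that idea, with made-up table and column names rather
than anything from the toolkit:

    -- Hypothetical per-instance counter table.
    CREATE TABLE local_id_counters (
        package_id   integer NOT NULL,
        object_type  varchar(100) NOT NULL,
        last_id      integer NOT NULL DEFAULT 0,
        PRIMARY KEY (package_id, object_type)
    );

    -- Allocating the next local id for a forum message in one instance:
    UPDATE local_id_counters
       SET last_id = last_id + 1
     WHERE package_id = :package_id AND object_type = 'forum_message';

    SELECT last_id FROM local_id_counters
     WHERE package_id = :package_id AND object_type = 'forum_message';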

That way, if I create a new forum and post a message in it, that
message becomes message_id 1 in that forum and is retrievable
with /forums/foo/one-thread?message_id=1

If the whole forum has to be migrated to another instance of ACS 4
it would be trivial to write a redirector that wouldn't need an
id-to-id lookup table.

On a related note, an alternative to upgrade scripts would be a
convention that every application has to have an export and import
feature (to XML, I suppose).  That could be used both for migrations
(possibly even to and from other systems) and for upgrades.

Posted by Dave Bauer on
How about this?

When you load the data model for your new database, set the acs_objects sequence to start higher than the last number you used in the old database.

Then when you import your objects, specify the object_id in the call to acs_object.new.

The system-level objects will get created with different ids, but that is not an issue; it only matters for the visible objects: forum postings, news items, etc.

You don't have to grab an id from the sequence as long as the integer you use is unique.
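A sketch of what that could look like on PostgreSQL. The names below are
from memory and should be checked against acs-kernel: on PG the object id
sequence may be wrapped in a view, and the PL/pgSQL wrapper acs_object__new
has several overloaded signatures.

    -- 1. Push the sequence past the highest object_id used on the old site,
    --    so freshly created system objects never collide with imported ones:
    SELECT setval('t_acs_object_id_seq', 100000);

    -- 2. When importing, pass the old object_id explicitly instead of
    --    letting the API pull a fresh one from the sequence:
    SELECT acs_object__new(
        4711,              -- object_id carried over from the old database
        'acs_object',      -- object_type
        now(),             -- creation_date
        null,              -- creation_user
        null,              -- creation_ip
        null               -- context_id
    );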

Posted by Don Baccus on
Upgrade scripts for 4.6 are in the plan, yes.  Jon, are you up for helping make sure they happen by testing the upgrades?
Posted by Jon Griffin on
I will of course test out any upgrades, but I already spent half a day creating scripts for the core and I can't justify creating scripts for everyone else's packages.
Posted by Don Baccus on
I was offering you the chance to beat them up ... strike that, gently point out the fact that they've forgotten to write upgrade scripts ... not do their work for them.
Posted by Jon Griffin on
When the time comes I will gently remind people (everyone but you, who I will beat up!), but I don't use that many packages right now so I will undoubtedly miss some.
Posted by Lars Pind on
Branimir,

I agree with you completely that each package instance should use its own numbering scheme that starts from 1. This is exactly what I did for the bug-tracker. Each bug has both a bug_id, which is the object_id for acs_objects, and a bug_number, which is not a primary key, but is unique when combined with the package_id (which happens to be called project_id in this table). Only the bug_number is exposed in URLs.
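In outline, the relevant part of the data model looks something like this (a
simplified sketch, not the actual bug-tracker DDL):

    CREATE TABLE bt_bugs (
        bug_id      integer
                    CONSTRAINT bt_bugs_bug_id_fk
                    REFERENCES acs_objects (object_id)
                    CONSTRAINT bt_bugs_pk PRIMARY KEY,
        project_id  integer NOT NULL,   -- really the package_id
        bug_number  integer NOT NULL,   -- the number exposed in URLs
        CONSTRAINT bt_bugs_number_un UNIQUE (project_id, bug_number)
    );
    -- URLs then carry ?bug_number=17 within a project, never the global bug_id.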

Would people agree that this should be the recommended practice?

I'd also very strongly support XML-based export/import scripts, although I know that it's going to be even more painful to get people to keep those in sync than it already is with the upgrade scripts.

Alas, neither of these things are likely to happen any time soon.

/Lars

Posted by defunct defunct on
Jon,

I for one would 'do my nut' if there aren't upgrade scripts, as it'd make a bit of a joke of bothering to try and get testing going etc....

But I'm sure that won't happen ;)

However, testing them (he says, scratching his head) is going to be an absolute mare! Someone will have to create lots of dummy data etc. for each package and attempt it...

Not wishing to put you off though!!!! Are you up for taking on that task? We could organise that separately from Acceptance?

If not, does anyone else want to volunteer?

I just came up with an idea for a new OACS version (version 5?): change the way an id is associated with a chunk of data and, in its place, put a hash of the data (an MD5 digest, for instance). This would provide a universal index structure valid for OACS installations all over the world. The drawback: the sequence information (next, previous) would be lost.
Posted by Jon Griffin on
Regarding XML, I had another thread on import/export. XML isn't really that widely accepted yet; it will be, but not yet. Also, CSV and DIF are very valid formats for legacy data. There needs to be a general solution that doesn't require package designers to write their own scripts: either an ns_import/ns_export or some package in Tcl. I say make it non-ACS-specific.

Re: Testing, I will test any packages that I am using but I really don't have the time to load ACS with every package and test. Maybe we can just get a list of whoever wants to test something and has package XXX on a system. The kernel I will do.

Posted by Michael Bryzek on
> XML isn't really that widely accepted yet, it will be but not yet

I would argue that XML is extremely well accepted, especially within the Java community. But that's not really too important here. We recently completed a complex data migration from a non-relational SQL Server schema to an ACS object-oriented Oracle schema. We used XML as an intermediate representation for the data. The major advantage was that all data was loaded through our API. The major disadvantage was a serious performance penalty. If people are interested, I'd be more than happy to share more details of what we did.

With any serious data migration, where serious means having enough data that you're worried about the performance of the one-time migration, we really need to consider native tools for doing the migration. Oracle provides a loader utility (SQL*Loader) which is much more efficient than loading data through SQL.

For testing upgrade scripts, the problem is extremely difficult and has two distinct pieces:

  1. Is the structure of the upgraded schema correct?
  2. Did all my data make it to the right place?

The first question is much easier to test. For Oracle, I wrote a PL/SQL script a while ago which compares most of the objects in two different schemas. This package ensures that the same objects exist in both schemas and then does different validation based on the type of each object:

  INDEX (existence check only)

  LOB (existence check only)

  PACKAGE, PACKAGE BODY, FUNCTION, PROCEDURE (line-by-line check)

  SEQUENCE (check for identical next values)

  TABLE (check for same columns, same constraints, same number of rows)

  TRIGGER (check for same type, table, when_clause, text)

  VIEW (check for same number of rows, same text)

I'm not sure if this script ever made it into ACS, but I'd be more than happy to dig it up if somebody thinks it will be useful. I found it very useful to validate that a version 1.0 schema + an upgrade script is equivalent to a version 1.1 schema.
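To give a flavour of the simpler structural checks, here is roughly what the
table/column comparison boils down to, done directly against the Oracle data
dictionary rather than via the PL/SQL package (a sketch, not the package
itself):

    -- Columns that exist in SCHEMA_A but not in SCHEMA_B; swap the two
    -- halves and rerun to get the other direction.
    SELECT table_name, column_name, data_type
      FROM all_tab_columns
     WHERE owner = 'SCHEMA_A'
    MINUS
    SELECT table_name, column_name, data_type
      FROM all_tab_columns
     WHERE owner = 'SCHEMA_B';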

The second point (validating that the data made it) is much more difficult - I don't know of any good tools/ways to automate even parts of this validation.

Posted by Branimir Dolicki on
> With any serious data migration, where serious means having enough data that
> you're worried about the performance of the one time migration, we really
> need to consider native tools for doing the migration. Oracle provides a
> loader utility which is much more efficient than loading data through SQL.

You are right, Mike. I'm taking the XML idea back. I remember our last data migration (from an ACS3-based system to an ACS4-based one). Although the migration was performed completely within Oracle (moving data from one schema to another using PL/SQL scripts), the full run took about 12 hours to complete! I can't imagine how long it would have taken if we had dumped everything to XML and then parsed it again.

Posted by Tom Jackson on

Jon,

What should ns_export and ns_import do? I am very interested in looking into some general facility like this. I have a new data format that offers all the advantages of XML for data representation, but can be parsed very fast, faster than a CSV format.

Also, has anyone discovered a format for representing an SQL data model, besides SQL, that can then be translated into multiple SQL dialects? It would probably need to cover sequences, views, etc. I don't have much hope there is such a thing, but just thought I'd ask.

Posted by Jon Griffin on
I feel that ns_export/ns_import should genericize the ability to take lists and such and export them. On the other end, it should take a standard (or non-standard) dump format and convert it to a standard Tcl structure.

I would hope it could have a plugin architecture, but if not, it could comprise several functions.

I only mention it as an ns_module because I feel it should be below the OpenACS level and then can easily exploit existing libraries/C code.

On another note, I was looking at some libraries and found a few interesting crypto libs in C++. I think AOLserver can use libs written in C++, but I am not sure. It seems that a couple of modules are written in C++ with a wrapper in C. Does anyone know about this?

Posted by Dan Wickstrom on
I've noticed that the newer JDKs are referencing C++ libs when I build nsjava, so I think it is possible to include C++ shared libs in AOLserver.  I think all you have to do is declare the wrapper functions using extern "C" when building the shared libs in C++.
Posted by Neophytos Demetriou on
> Also, has anyone discovered a format for representing an SQL data model,
> besides SQL, that can then be translated into multiple SQL dialects? It
> would probably need to cover sequences, views, etc. I don't have much hope
> there is such a thing, but just thought I'd ask.
Tom, a persistence layer package could do what you describe (an xml-based layer could feed the data to the persistence layer). I hope that makes sense.
Posted by Jade Rubick on
Branimir, I don't suppose any of those upgrade scripts are lying around
anywhere?
Posted by Branimir Dolicki on
Jade,

I doubt you would find the scripts useful.  They upgrade a heavily
customized ACS3 to a heavily customized ACS4.