Forum OpenACS Q&A: Response to Data migration

8: Response to Data migration (response to 1)
Posted by Michael Bryzek on
John - Your perspective makes a lot of sense to me. Let me add a few more details about our particular migration. Most importantly, we were migrating from a flat, non-relational schema that we had never seen before the migration started. There were several distinct problems we had to solve:
  1. Transfer data from the original database to somewhere we could import it into our new database. In our case we were migrating from a SQL Server database, but the source doesn't really matter.
  2. Fully understand the old schema so that we could work out which objects we would need in the ACS schema. This step probably took about half of the total migration time.
  3. Export the old data and generate some kind of import script.
  4. Import the data.
For us, using XML as an intermediate representation let us focus entirely on transforming the original data into a hierarchy matching the one in the ACS. We could do this without worrying about the actual import process (e.g. I could create a user XML tag without worrying about the additional methods that must be called for password encryption).
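As a rough illustration of the mapping and emitting side, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the legacy data arrives as flat rows (hard-coded here in place of a SQL Server query), and the `<objects>`/`<user>` tags and field names are hypothetical, not a format defined by ACS or OpenACS. The password is written out untouched: the emitter never calls an ACS API and leaves encryption to the loader.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows from a flat, non-relational source table. In a real
# migration these would come from the old database (e.g. SQL Server).
LEGACY_ROWS = [
    {"email": "jane@example.com", "first": "Jane", "last": "Doe", "pw": "secret1"},
    {"email": "raj@example.com", "first": "Raj", "last": "Patel", "pw": "secret2"},
]


def rows_to_xml(rows):
    """Map flat legacy rows onto a hierarchy of ACS-style objects.

    The tag names (<objects>, <user>, <email>, ...) are hypothetical;
    they are not an OpenACS-defined format.
    """
    root = ET.Element("objects")
    for row in rows:
        user = ET.SubElement(root, "user")
        ET.SubElement(user, "email").text = row["email"]
        ET.SubElement(user, "first_names").text = row["first"]
        ET.SubElement(user, "last_name").text = row["last"]
        # The clear-text password is emitted as-is: encrypting it is the
        # loader's job, so this mapping step never touches any ACS API.
        ET.SubElement(user, "password").text = row["pw"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml(LEGACY_ROWS))
```

Because this side only produces XML, it can be rewritten for each new data source without touching the import code.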

Our actual process looked like:

  1. Grab data from SQL Server
  2. Massage data and map to appropriate ACS Objects
  3. Emit XML
  4. Parse XML
  5. Load data
In our case, the last two steps are now done and will likely need only minor modifications the next time we migrate data, even though our APIs and data model are constantly changing.

You make a good point that by the time we've organized our data into a logical XML structure, we could have emitted an appropriate script to load our data. In our case, the code that parses the XML and instantiates the actual objects is written fairly generically. We can add attributes to our object types, create new types of relations between our objects, and in some cases add entirely new object types without having to modify the actual code that imports the data. This is a big win for us as we expect to do many more migrations from disparate sources.
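A generic loader of the kind described above can be sketched in a few dozen lines of Python. This is a sketch under stated assumptions, not the import code referred to in this post: the element's tag is taken as the object type, each simple child element becomes an attribute, and a `<rel type="..." to="..."/>` child declares a relation to another object's document-local `id`. The `create_object` and `create_relation` callbacks and the `HOOKS` table are hypothetical stand-ins for real ACS API calls, and SHA-256 merely stands in for whatever password encryption the toolkit actually uses.

```python
import hashlib
import xml.etree.ElementTree as ET


def hash_password(fields):
    # Hypothetical hook: replace a clear-text password with a hash. A real
    # OpenACS loader would call the toolkit's own password API instead.
    pw = fields.pop("password", None)
    if pw is not None:
        fields["password_hash"] = hashlib.sha256(pw.encode("utf-8")).hexdigest()
    return fields


# Type-specific work lives here; everything else is handled generically.
HOOKS = {"user": hash_password}


def load(xml_text, create_object, create_relation):
    """Create every object in the document, then every relation."""
    root = ET.fromstring(xml_text)
    new_ids = {}   # document-local id -> id returned by create_object
    pending = []   # relations are resolved after all objects exist
    for elem in root:
        fields = {}
        for child in elem:
            if child.tag == "rel":
                pending.append((elem.get("id"), child.get("type"), child.get("to")))
            else:
                # Any other child is treated as an attribute, so new
                # attributes need no code change here.
                fields[child.tag] = child.text or ""
        hook = HOOKS.get(elem.tag)
        if hook is not None:
            fields = hook(fields)
        new_ids[elem.get("id")] = create_object(elem.tag, fields)
    for src, rel_type, dst in pending:
        create_relation(rel_type, new_ids[src], new_ids[dst])
    return new_ids


if __name__ == "__main__":
    doc = """<objects>
      <group id="g1"><name>Staff</name></group>
      <user id="u1"><email>jane@example.com</email><password>secret1</password>
        <rel type="member_of" to="g1"/></user>
    </objects>"""
    objects, relations = [], []

    def create_object(obj_type, fields):
        objects.append((obj_type, fields))
        return len(objects)  # fake database id

    def create_relation(rel_type, a, b):
        relations.append((rel_type, a, b))

    load(doc, create_object, create_relation)
    print(objects)
    print(relations)
```

Since `load` never names a specific attribute or relation type, adding a `<nickname>` field or a new kind of `<rel>` to the XML requires no change to the loader; only genuinely type-specific work, such as password handling, goes into `HOOKS`.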

"I would argue that in this simple case of 3.x -> 4.x data migration, it won't be worth the overhead of XML."
I agree. I have never done a 3.x -> 4.x migration, but I have done migrations between other versions of ACS. These data models share enough similarities that an additional data interface would offer little benefit for a one-time migration.

One other angle to consider is how often you intend to migrate data. A one-time migration has different requirements from a weekly, monthly, or yearly one. My limited experience so far has shown that it is easier to reason about generating an XML document than about the correct sequence of actual API calls. If you are going to migrate data more than once, from different sources, an intermediate data structure may make more sense.

And this is what led me to post in the first place. With OACS, we should expect many people to do many data migrations as they adopt it. It would be easier for users to migrate data to OACS if the process were well defined and partly implemented (the XML Reader and loader). Users familiar with an old schema could then migrate correctly to OACS without understanding every API in detail.