John -
Your perspective makes a lot of sense to me. Let me add a few more
details about our particular migration. Most importantly, we were
migrating from a flat, non-relational schema that we had never seen
before the migration started. There were several distinct problems we
had to solve:
- Transfer data from the original database to somewhere we could
import it into our new database. In our case we were migrating from
a SQL Server database, but the source doesn't really matter.
- Completely understand the old schema so that we could figure out
which objects we would need in the ACS schema. This step probably
took 50% of the total migration time.
- Export the old data and generate some kind of import script.
- Import the data.
For us, using XML as an intermediate representation allowed us to
entirely focus on transforming the original data into a hierarchy that
matches that in the ACS. We could do this without worrying about the
actual import process (e.g. I could create a user XML tag without
worrying about additional methods to call for password encryption).
Our actual process looked like:
- Grab data from SQL Server
- Massage data and map to appropriate ACS Objects
- Emit XML
- Parse XML
- Load data
In our case, the last two steps are now done for us and will likely
require only minor modifications the next time we migrate data, even
though our APIs and data model are constantly changing.
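
To make the "massage and emit XML" steps concrete, here is a minimal
Python sketch. It is not our actual code: the row fields (usr_email,
usr_name, grp), the tag names, and the attribute names are all made up
for illustration, and the real ACS object types have more to them.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows, as they might come back from the source database.
FLAT_ROWS = [
    {"usr_email": "ann@example.com", "usr_name": "Ann Lee", "grp": "staff"},
    {"usr_email": "bob@example.com", "usr_name": "Bob Roy", "grp": "staff"},
]


def rows_to_xml(rows):
    """Map flat rows onto a small object hierarchy and return it as XML text."""
    root = ET.Element("migration")
    groups = {}  # group name -> <group> element, so each group is emitted once
    for row in rows:
        group = groups.get(row["grp"])
        if group is None:
            group = ET.SubElement(root, "group", name=row["grp"])
            groups[row["grp"]] = group
        # Split the flat name column into the two fields the target expects.
        first, _, last = row["usr_name"].partition(" ")
        ET.SubElement(group, "user", email=row["usr_email"],
                      first_names=first, last_name=last)
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(FLAT_ROWS)
```

Note that nothing here knows how a user actually gets created (password
encryption and so on). That stays in the loader.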
You make a good point that by the time we've organized our data into a
logical XML structure, we could have emitted an appropriate script to
load our data. In our case, the code that parses the XML and
instantiates the actual objects is written fairly generically. We can
add attributes to our object types, create new types of relations
between our objects, and in some cases add entirely new object types
without having to modify the actual code that imports the data. This
is a big win for us as we expect to do many more migrations from
disparate sources.
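
Roughly what I mean by "generic" is sketched below in Python. This is an
illustration under assumptions, not our importer: Store is a stand-in
for the real object API, and the "rel" attribute and the default
"contains" relation are conventions I invented for the example.

```python
import xml.etree.ElementTree as ET


class Store:
    """Stand-in for the real object API; it only records what was created."""

    def __init__(self):
        self.objects = []
        self.relations = []

    def create(self, object_type, attributes):
        self.objects.append((object_type, attributes))
        return len(self.objects)  # hypothetical object id

    def relate(self, rel_type, parent_id, child_id):
        self.relations.append((rel_type, parent_id, child_id))


def load(element, store, parent_id=None):
    """Create one object per element and relate it to its parent, recursively."""
    attributes = dict(element.attrib)
    # Assumed convention: an optional "rel" attribute names the relation type.
    rel_type = attributes.pop("rel", "contains")
    # The tag names the object type; all other attributes pass straight through.
    object_id = store.create(element.tag, attributes)
    if parent_id is not None:
        store.relate(rel_type, parent_id, object_id)
    for child in element:
        load(child, store, object_id)
    return object_id


def load_document(xml_text, store):
    """Load every top-level object found under the document root."""
    for child in ET.fromstring(xml_text):
        load(child, store)
```

Because the loader never names a specific tag or attribute, a new
attribute, relation type, or object type in the XML needs no change to
this code, only to whatever sits behind Store.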
"I would argue in this simple case of 3.x->4.x data migration it
won't be worth the overhead of XML."
I would agree with you. I have never done a 3.x -> 4.x migration, but
I have done migrations with other versions of ACS. These data models
share enough similarities that an additional data interface would have
little benefit for a one-time migration.
One other angle to consider is the frequency with which you intend to
migrate data. A one-time migration has different requirements from a
weekly/monthly/yearly migration. My limited experience so far has
shown that it is easier to think about generating an XML document than
the correct sequence of actual API calls. If you are going to be
migrating data more than once, from different sources, an intermediate
data structure may make more sense.
And this is what led me to post initially. With OACS, we should expect
lots of people to do lots of data migrations as they adopt OACS. It
would be easier for users to migrate data to OACS if the process were
well defined and part of it (the XML reader and loader) were already
implemented. This would allow users familiar with an old schema to
migrate correctly to OACS without understanding every API in detail.