I have a couple of issues with the current install behaviour but wanted to collect feedback first before changing something.
a) If I load a db script for an upgrade and it fails, the changes made in the script so far are not rolled back. Shouldn't we not make this either a transaction by default or have the best practice to wrap all upgrade.sql files in a statement?
b) If there is a problem with the upgrade.sql files (see above), why not change our default behavior to do upgrades not in the SQL upgrade but within a special "sql_upgrade" callback, which allows us to source DM changes with db_transaction, therefore not loading a file but executing dml commands from TCL
c) If a callback fails all the other upgrade callbacks are still executed. shouldn't we prevent that? Preferably shouldn't we roll back the whole upgrade (with all callbacks) in an attempt to prevent us leaving the system in a broken state?
d) If it is not possible to undo all, but just the last one, wouldn't it make sense to increase the version number with each single step, effectively changing how we do upgrades? So instead of loading all upgrade.sql files in one go, we take them one step at a time, look for a corresponding callback and then execute them. If there is no error, then increase the version number to the max of both, otherwise abort. This will at least allow us to fix an error upon upgrade and run the upgrade scripts again, especially the callbacks.
e) If the after_upgrade callbacks fail, the package is still enabled. This is okay in that it describes correctly what the callback does and when it is executed. Sadly the granularity fails here, so if you upgrade from XoWIKI 0.48 to 0.60, all of them will be executed after 0.60 has been enabled, which might not be the case as the enabling requires the after_upgrade callbacks to be successful.
f) I am missing an "after DB, before enabling" callback.
I think the majority of my problems come from the fact that upgrades are not treated as a granular thing but we do them in one block. This was not an issue when we only had DB scripts to be concerned about, but with the apm callbacks this can lead to a lot of unexpected behavior. Which is fine if you have a staging server and have the habit of first running the upgrade on staging with the latest production data and then on production, but I'm not sure all users are doing that. Apart from the hassle to reload a backup of your DB 😊.