Tuesday, May 18, 2010

Maintaining legacy data is hard

I smiled when I saw that even the folks at Flickr admit to having trouble maintaining their legacy data.

On my work project, the most difficult aspect of maintaining legacy data usually stems from evolving domain model constraints, in particular when we try to create explicit relationships among entities that were previously only inferred, or not required at all. The legacy data invariably seems to violate the new relationship constraints, forcing the model to allow for and handle missing relationships.

For those familiar with the Screensaver domain model, this occurred when migrating the PlatesUsed entity to AssayPlate. PlatesUsed did not have an explicit relationship with Copy; it only stored the Copy name as a text value. When the PlatesUsed entities were converted into AssayPlate entities, some of those Copy name values turned out to be missing. Furthermore, many AssayPlates were never defined at all, even though the plates had been screened, which forced us to create AssayPlates for which the Copy is unknown. In both cases, the domain model had to allow for and handle AssayPlates that have no associated Copy. But since every AssayPlate absolutely needs to know its library plate number, we had to store that number redundantly in AssayPlate, even though it could otherwise be determined via AssayPlate.Copy.Plate.plateNumber.
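
To make the trade-off concrete, here is a minimal sketch in Python. It is not the actual Screensaver code; the class shapes are simplified from the AssayPlate.Copy.Plate.plateNumber path mentioned above, and the `migrate_plates_used` helper, its parameters, and the `copies_by_key` lookup are assumptions invented purely for illustration. It shows an AssayPlate whose Copy is optional and whose plate number is stored directly, plus a consistency check for the cases where both are present.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Plate:
    plate_number: int


@dataclass(frozen=True)
class Copy:
    name: str
    plate: Plate  # simplified: mirrors the Copy.Plate path from the post


@dataclass
class AssayPlate:
    # Stored redundantly so it is available even when copy is None.
    plate_number: int
    # Optional because legacy data may not identify the Copy.
    copy: Optional[Copy] = None

    def __post_init__(self) -> None:
        # When a Copy is known, the redundant value must agree with it.
        if self.copy is not None and self.copy.plate.plate_number != self.plate_number:
            raise ValueError("plate_number disagrees with copy.plate.plate_number")


def migrate_plates_used(
    plate_number: int,
    copy_name: Optional[str],
    copies_by_key: Dict[Tuple[str, int], Copy],
) -> AssayPlate:
    """Hypothetical migration step: resolve the legacy text Copy name if possible."""
    copy = copies_by_key.get((copy_name, plate_number)) if copy_name else None
    return AssayPlate(plate_number=plate_number, copy=copy)
```

With this shape, a legacy record that has a resolvable Copy name yields a fully linked AssayPlate, while a record with a missing name still produces a usable AssayPlate that knows its plate number. The cost is the extra invariant: the redundant plate number has to be kept in agreement with the Copy whenever a Copy is set.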