Sunday, February 1, 2015

Software Stymied by a Single Schema?

Most commonly, a software application's persistence layer is a relational database. And so the application's software architecture becomes intricately tied to a single, underlying physical relational schema. Each table is represented by a domain model class, mapped to it via an ORM framework (Hibernate, Rails ActiveRecord, etc.); these classes together comprise the domain model.
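As a concrete illustration of that mapping, here is a minimal JPA/Hibernate-style entity. It is only a sketch; the table and column names are invented for this example, not drawn from any particular application:

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // One domain model class bound to one physical table by the ORM.
    @Entity
    @Table(name = "users")
    public class User {

        @Id
        @GeneratedValue
        private Long id;              // maps to the table's primary key

        @Column(name = "email")
        private String email;         // one mapped column per attribute

        @Column(name = "full_name")
        private String fullName;

        // getters and setters omitted for brevity
    }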

Over time, the domain model and schema evolve and grow to accommodate additional features (business requirements). In turn, tables and their associated domain model classes take on additional attributes/columns and associations/foreign keys to support the persistence of data needed by the new features. Often, a non-trivial feature will require updates and additions to the domain model and schema that span numerous classes and tables. Before long, the features that comprise the application have code that is spread across the domain model. And conversely, a given domain model class will include attributes and code to support numerous and often unrelated features and business requirements.
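To make that drift concrete, here is a hypothetical continuation of the sketch above: the same entity a few releases later, carrying columns added by several unrelated features. The feature names and fields are invented purely for illustration:

    import java.time.LocalDate;
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;

    @Entity
    @Table(name = "users")
    public class User {

        @Id
        private Long id;

        @Column(name = "email")
        private String email;                  // original sign-up feature

        @Column(name = "billing_customer_id")
        private String billingCustomerId;      // added for the billing feature

        @Column(name = "trial_ends_on")
        private LocalDate trialEndsOn;         // also billing

        @Column(name = "push_token")
        private String pushToken;              // added for the notifications feature

        @Column(name = "referral_code")
        private String referralCode;           // added for the referral feature

        @Column(name = "referred_by_user_id")
        private Long referredByUserId;         // informal foreign key for referrals

        // getters and setters omitted
    }

No single feature "owns" this class any longer; each new requirement leaves a few more columns behind.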

The problem is that the application's class structure and physical schema can end up bearing little resemblance to the feature set and business requirements of the application. The mapping between the business requirements (features) and the class design of the application becomes a many-to-many relationship.

One undesirable outcome of this is that multiple features may end up depending upon many of the same classes and attributes in the domain model. And thus changing the usage, semantics, or implementation of any given model or attribute for one feature requires understanding its usage by, and the impact on, every other feature that depends upon it as well. Conversely, studying the application's domain model and physical schema does not directly reveal the underlying set of features and business requirements that comprise the application.

Is there a better way to structure our applications so that the feature set and business requirements map more directly onto both the implementation and the persistence schema?

Perhaps an application should be written as a set of mini-applications, each of which directly implements a single feature or business requirement.
Those paying attention to recent trends in software architecture might cry out "use micro-services!" And indeed, the single-responsibility tenet of that architectural pattern is what I am describing here. But note that I am not concerned specifically with the distributed deployment aspect of the pattern, since my concerns apply to distributed and "monolithic" deployments equally.

Regardless of how the application's code is structured and deployed, the disparate feature implementations inevitably require access to shared data. For example, the identity of a "user" must be consistently represented across these multiple feature implementations. Even if we find an appropriate way to structure the highest-level layers of an application to have a clean, one-to-one mapping with the application's feature set, we still end up with a single persistence layer that becomes a catch-all repository for the full set of features. In other words, the schema becomes the union of the mini-schemas that would otherwise be needed by each individual feature implementation (or "micro-service", if you prefer).
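To sketch what I mean, imagine the per-feature "mini-schemas" that a billing feature and a notifications feature would each ask for on their own. The names here are hypothetical; the point is only that the single shared schema ends up being their union, keyed on one common notion of user identity:

    // What the billing feature alone would need to persist.
    class BillingAccount {
        long userId;                       // the shared notion of "user" identity
        String billingCustomerId;
        java.time.LocalDate trialEndsOn;
    }

    // What the notifications feature alone would need to persist.
    class NotificationRecipient {
        long userId;                       // the same identity, required to be consistent
        String email;
        String pushToken;
    }

    // In a single-schema application, both sets of columns collapse into the one
    // shared users table, whose schema is effectively the union of these needs.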

And so we arrive back at the original problem posed herein. Namely, how do we maintain a persistence structure that cleanly maps to the individual feature sets and business requirements of the application?

Is there a way to maintain individual schemas--one per feature--where the previously shared data is instead redundantly stored and structured to support the needs of one and only one feature? This flies in the face of the tenets of normalized database design. Clearly, without significant additional work, our mini-applications' persistence stores will grow out of sync. Both the schemas and the data they contain will end up as very different representations of the core domain concepts and domain instances. All the benefits of normalized database design are lost.
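As a hypothetical illustration of that redundancy, suppose each feature keeps its own copy of a user's email address. Nothing below keeps the two stores consistent, so an update to one silently strands the other; the maps stand in for the feature-local persistence stores:

    import java.util.HashMap;
    import java.util.Map;

    // Two feature-local stores, each holding its own copy of shared user data.
    // Nothing here enforces that the copies agree; that is exactly the problem.
    public class RedundantStores {

        static final Map<Long, String> billingEmailByUser = new HashMap<>();
        static final Map<Long, String> notificationEmailByUser = new HashMap<>();

        public static void main(String[] args) {
            long userId = 42L;
            billingEmailByUser.put(userId, "alice@example.com");
            notificationEmailByUser.put(userId, "alice@example.com");

            // The billing feature learns of a new address, but the notification
            // feature's copy is never touched and quietly goes stale.
            billingEmailByUser.put(userId, "alice@new-example.com");

            System.out.println(billingEmailByUser.get(userId));       // alice@new-example.com
            System.out.println(notificationEmailByUser.get(userId));  // alice@example.com
        }
    }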

But might we be able to free ourselves from the strict rules of normalized database design? Can we develop a synchronization layer to guarantee that necessary, specific constraints are satisfied between the disparate data stores? Can we specify these constraints in a way that guarantees the data can still be used in future, unknown capacities? This, after all, is perhaps the greatest promise of the relational model. But can we confidently move past this "plan for the future" design mentality? And if we do, will our applications' architectures benefit from these simpler partitions of both logic and data structure?
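One minimal sketch of what such a synchronization layer might look like, under the assumption that changes to shared facts are published through a single point and each feature-local store subscribes to the ones it cares about (all names here are hypothetical, not a proposal for a specific design):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.BiConsumer;

    // A toy synchronization layer: shared facts are written through one publisher,
    // and each feature-local store registers a listener that keeps its own
    // redundant copy up to date.
    public class SyncLayer {

        // Listeners invoked with (userId, newEmail) whenever the fact changes.
        private final List<BiConsumer<Long, String>> emailListeners = new ArrayList<>();

        public void onEmailChanged(BiConsumer<Long, String> listener) {
            emailListeners.add(listener);
        }

        public void publishEmailChange(long userId, String newEmail) {
            for (BiConsumer<Long, String> listener : emailListeners) {
                listener.accept(userId, newEmail);   // propagate to every feature store
            }
        }

        public static void main(String[] args) {
            Map<Long, String> billingEmails = new HashMap<>();
            Map<Long, String> notificationEmails = new HashMap<>();

            SyncLayer sync = new SyncLayer();
            sync.onEmailChanged(billingEmails::put);
            sync.onEmailChanged(notificationEmails::put);

            sync.publishEmailChange(42L, "alice@example.com");

            // Both feature-local copies now agree, satisfying the constraint that
            // a user's email is represented consistently across stores.
            System.out.println(billingEmails.get(42L).equals(notificationEmails.get(42L))); // true
        }
    }

Whether constraints expressed this way could remain useful for future, unknown uses of the data is exactly the open question posed above.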

I hope to continue my research and thoughts on this matter, since I believe it lies at the core of the software complexity problems that plague classic application architectures today.