Saturday, February 27, 2016

Async Job Lessons


On my project at work, we are improving our HTTP request processing times by extracting expensive update operations and offloading them to a distributed job queuing system that can execute them asynchronously. While this has helped reduce our system's response times, we have run into a couple of issues that are fallout from the asynchronous design.
  1. Consider an async job that is enqueued while a database transaction is in progress. The transaction inserts row A. The async job reads row A as input, performs a calculation on it, and records the result elsewhere. If the job runs before the transaction commits, it will fail, since row A is not yet visible outside the transaction. One solution is to delay enqueuing the job until after the transaction commits. This has the added advantage that the job is never enqueued at all if the transaction aborts, but it may be difficult to implement. A less desirable solution is to enqueue the job immediately but delay its execution by a fixed amount of time (a feature commonly supported by job queuing systems); one must then decide what a reasonable delay is. A third solution is to simply rely on the job being retried, assuming the job queuing system supports retries. The possible downside is that errors are reported when the job first runs, which may be unnecessarily alarming depending on how the ops environment is configured.
  2. Async jobs that perform updates can suffer from race conditions. If multiple jobs of the same type are enqueued in quick succession, and the ordering of async jobs is not guaranteed, then incorrect updates may be made. In particular, the ordering might be reversed if the first job suffers a transient failure (e.g. a network communication error) and gets retried after the second job runs successfully. If the two jobs record different values, the recorded value will be incorrect after the retried job succeeds, since the value it records is stale. One solution is to have such jobs calculate the correct value at the time the job runs. Implemented this way, the ordering does not matter, since the last job to run will always calculate the correct value using input values that are up to date. As a corollary, it is bad design to have a job record a value that is passed to it as a parameter at the time it is enqueued, since that value may become stale.
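
Both lessons can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions, not our production code: Transaction, insert_row_and_enqueue, and the two job methods are hypothetical in-memory stand-ins, not the API of any real ORM or job queuing system.

```ruby
# Hypothetical, in-memory stand-ins; not the API of any real ORM or job queue.
class Transaction
  class Rollback < StandardError; end

  def initialize
    @after_commit = []
  end

  # Register work that should happen only if the transaction commits.
  def after_commit(&block)
    @after_commit << block
  end

  def commit!
    @after_commit.each(&:call)
  end

  # Runs the block as a "transaction". Returns true on commit, false on rollback.
  def self.run
    tx = new
    yield tx
    tx.commit!   # reached only if the block did not raise
    true
  rescue Rollback
    false        # callbacks are dropped, so the job is never enqueued
  end
end

# Lesson 1: the job is enqueued only after the row it reads has been committed.
def insert_row_and_enqueue(queue, row_id, rollback: false)
  Transaction.run do |tx|
    # ... insert row A here ...
    tx.after_commit { queue << [:process_row, row_id] }
    raise Transaction::Rollback if rollback
  end
end

# Lesson 2, stale-prone: records whatever value was captured at enqueue time.
def record_total_job(account, total)
  account[:total] = total
end

# Lesson 2, order-independent: recomputes from current inputs when the job runs.
def recalculate_total_job(account)
  account[:total] = account[:items].sum
end

# The second job runs first; the first job is retried afterwards.
def stale_total_after_retry
  account = { items: [10], total: nil }
  first_value = account[:items].sum    # captured when job 1 was enqueued
  account[:items] << 20
  second_value = account[:items].sum   # captured when job 2 was enqueued
  record_total_job(account, second_value)
  record_total_job(account, first_value)
  account[:total]
end

def fresh_total_after_retry
  account = { items: [10], total: nil }
  account[:items] << 20
  recalculate_total_job(account)       # job 2 runs first
  recalculate_total_job(account)       # job 1 is retried afterwards
  account[:total]
end

queue = []
insert_row_and_enqueue(queue, 1)                  # committed: job enqueued
insert_row_and_enqueue(queue, 2, rollback: true)  # aborted: nothing enqueued
p queue                    # => [[:process_row, 1]]
p stale_total_after_retry  # => 10 (wrong: the retried job wrote a stale value)
p fresh_total_after_retry  # => 30 (correct regardless of ordering)
```

Passing only an identifier to the job and reading current state at run time is what makes the second variant safe to retry in any order.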



Tuesday, August 18, 2015

RSpec 3.x Goodness

RSpec 3.3 was released "way back" in June, but I just looked into it, and it provides a new "aggregate_failures" feature. This lets a single example contain multiple expectations, so the test setup code runs only once, while still evaluating and reporting on every expectation (normally, if you include multiple expectations in a single example, the first failing expectation short-circuits the rest). This moves away from the "one expectation per example" pattern. Used properly, I would expect massive performance increases with feature specs, in particular! In a similar vein, the 3.x series also introduces "compound expectations" for checking a result value against multiple criteria all at once via the "and" and "or" composing methods. And if you haven't seen 3.0's "composable matchers", have a look!

Tuesday, August 11, 2015

Kill Process Group

When a UNIX process spawns child processes, they all belong to the same process group by default. And you can kill the whole process group at once using kill -9 -PGID, where PGID is the process group ID. The minus sign in front of the PGID value is the key here. How did I not know this before? I always grep'd the output of ps and killed the individual processes.
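
The same trick works from Ruby. A minimal sketch, assuming a POSIX system with sh and sleep available; the method names are mine. The child is started in its own process group so that killing the group does not also take down the Ruby process itself.

```ruby
# Start a shell that launches two background children. With pgroup: true the
# shell becomes the leader of a new process group, so its PGID equals its PID.
def spawn_in_new_group
  Process.spawn("sh", "-c", "sleep 60 & sleep 60 & wait", pgroup: true)
end

# A leading minus on the signal name makes Process.kill target the whole
# process group, like `kill -9 -PGID` in the shell.
def kill_group(pgid)
  Process.kill("-KILL", pgid)
end

def kill_group_demo
  pgid = spawn_in_new_group
  sleep 0.2                      # give the shell a moment to start its children
  kill_group(pgid)
  _pid, status = Process.wait2(pgid)
  status                         # Process::Status of the killed group leader
end

status = kill_group_demo
puts "leader killed by signal #{status.termsig}" if status.signaled?
```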

Sunday, August 9, 2015

Long vs Short Command Line Options

It's simple really. Don't use short options when invoking commands from scripts. Thanks.

Thursday, August 6, 2015

Code Review

After watching RailsConf 2015 - Implementing a Strong Code-Review Culture, I realize my own code review process is perhaps much more heavyweight than it could be. In particular, I realize that I generally perform a full QA check on each pull request, doing my own manual testing, etc. I also focus heavily on finding bugs, unlike the presenter of this talk. Of course, if I scale back my CR efforts, I know that lots of defects will make it into production, as my current team does not have a QA team at its disposal.

Sunday, February 1, 2015

Software Stymied by a Single Schema?

Most commonly, a software application's persistence layer is a relational database. And so the application's software architecture becomes intricately tied to a single, underlying physical relational schema. Each table is represented by a domain model class, which is tied to a physical table via an ORM framework (Hibernate, Rails ActiveRecord, etc.) that together comprise the domain model.

Over time, the domain model and schema evolve and grow to accommodate additional features (business requirements). In turn, tables and their associated domain model classes take on additional attributes/columns and associations/foreign keys to support the persistence of data needed by the new features. Often, a non-trivial feature will require updates and additions to the domain model and schema that span numerous classes and tables. Before long, the features that comprise the application have code that is spread across the domain model. And conversely, a given domain model class will include attributes and code to support numerous and often unrelated features and business requirements.

The problem is that the application's class structure and physical schema can end up bearing little resemblance to the feature set and business requirements of the application. The mapping between the business requirements (features) and the class design of the application becomes a many-to-many relationship.

One undesirable outcome of this is that multiple features may end up depending upon many of the same classes and attributes in the domain model. Changing the usage, semantics, or implementation of any given model or attribute for one feature therefore involves understanding how every other feature that depends upon it uses it, and how the change will affect them. Conversely, studying the application's domain model and physical schema does not directly reveal the underlying set of features and business requirements that comprise the application.

Is there a better way to structure our applications so that the implementation and persistence schema map more directly onto the feature set and business requirements?

Perhaps an application should be written as a set of mini-applications, where each of these smaller implementations directly implements a single feature or business requirement.
Those paying attention to recent developments in software architecture trends might cry out "use micro-services!" And indeed, the single-responsibility tenet of this architectural pattern is in fact what I am describing here. But note that I am not concerned specifically with the distributed deployment aspect of this pattern, since my concerns apply to distributed and "monolithic" deployments equally. Regardless of how the application's code is structured and deployed, inevitably the disparate feature implementations require access to shared data. For example, the identity of a "user" must be consistently represented across these multiple feature set implementations. Even if we find an appropriate way to structure the highest-level layers of an application to have a clean, one-to-one mapping with the application's feature set, we still end up having a single persistence layer that becomes a catch-all repository for the full set of features. In other words, the schema becomes the union of mini-schemas that might otherwise be needed by each individual feature set implementation (or "micro-service", if you prefer).

And so we arrive back at the original problem posed herein. Namely, how do we maintain a persistence structure that cleanly maps to the individual feature sets and business requirements of the application?

Is there a way to maintain individual schemas--one per feature--where the previously shared data is instead redundantly stored and structured to singularly support the needs of one and only one feature? This flies in the face of normalized database design tenets. Clearly, without significant additional work, our mini-applications' persistence stores will grow out of sync. Both the schemas and the data contained will end up as very different representations of core domain concepts and domain instances. All the benefits of normalized database design are lost.

But might we be able to free ourselves from the strict rules of normalized database design? Can we develop a synchronization layer to guarantee that necessary and specific constraints are satisfied between the disparate data stores? Can we specify these constraints in a way that guarantees the data can still be used in future, unknown capacities? This, after all, is perhaps the greatest promise of the relational model. But can we confidently move past this "plan for the future" design mentality? And if we do, will our applications' architectures benefit from these simpler partitions of both logic and data structure?

I hope to continue my research and thoughts on this matter, since I believe it is at the core of the software complexity problems that plague classic application architectures today.

Sunday, January 18, 2015

PostgreSQL: Database Quick Copy

Postgres' database creation commands allow a new database to be cloned from an existing "template" database, including both the schema and the data. As there is nothing special about a "template" database, you can use any existing database within the same database cluster as the source database, provided no other sessions are connected to it while it is being copied. This can be much faster than a dump and load operation. This is done simply by using one of the following commands (copied from the Postgres documentation for your convenience), substituting the name of your source database for template0:

To create a database by copying template0, use:
CREATE DATABASE dbname TEMPLATE template0;
from the SQL environment, or:
createdb -T template0 dbname
from the shell.