Imagine if you could design the perfect organisation and architecture to build your next product – what would it look like? Think about this for a moment, and then let it sink in that you’re wrong.
“Don’t be so arrogant,” you might retort. “How could you know?”
I know you’re wrong, because:
- you don’t even know what your next product will end up looking like;
- you don’t know how it will evolve based on feedback; and therefore
- you won’t know how you will structure and restructure your organisation over time to build it, in turn influencing the design in a paradoxical feedback loop (Conway’s Law).
This is true whether you’re building the next Spotify or rebuilding a well-known existing product. The truth is, organisations and their architectures evolve.
The question is, outside of actionable business metrics, how do we measure that our development arm, technical architecture and processes are working in order to improve them?
Let’s see how we can do this following the evolution of an imaginary startup – Soundify – an oddly familiar sounding music streaming service.
Continuous delivery and the evolution of teams and services
We want to get our newly funded product to market as soon as possible, whilst also maintaining the highest quality, lowest risk and lowest cost base. Let’s map out our organisation’s evolution, beginning with its life as a startup, tracking key metrics as we go.
Stage 1: Life as a startup
As pragmatic practitioners, we didn’t jump straight to microservices, but began by rapidly experimenting and building an MVP that garnered a portion of the market, with a good old-fashioned web-based “monolith”. We composed a single cross-functional team to see it done and we saw that it was good.
Testing was painless, deployments were quick and development was easy!
Diagram: Our starting point – one team, one monolith!
| Metric | Value |
| --- | --- |
| Time in pipeline (commit to prod) | 5 minutes |
| Deployments per day | 10 |
Stage 2: Split UI and API
The first thing we noticed was that the UI was becoming coupled to the back-end, and our growing team was finding it difficult to change components easily and independently. Moving to React was a catalyst to refactor the UI into its own component. Furthermore, the streaming aspects of the service were specialised and big enough to warrant a dedicated team.
We needed multiple environments to manage this move so that the teams could test and operate autonomously. Because we now had separate components, we had to integration test prior to releasing either one, adding a few more minutes to our pipeline. This integration made things a bit riskier, and occasionally we miscommunicated a change, causing a minor outage. Overall, we were still delivering more value to the customer and our shareholders, so we accepted this risk.
Diagram: Our team and platform at stage 2: two components, two teams!
| Metric | Value |
| --- | --- |
| Time in pipeline (commit to prod) | 10 minutes |
| Deployments per day | 15 |
Stage 3: More product lines
Our seedling has now grown to 24 people, split into teams across separate product lines: the music streaming service (API), experience (mobile and desktop + API) and administration system (Web + API).
This added 14 more test environments to enable teams to operate independently, which we cleverly managed with a combination of Docker (for local development) and CloudFormation. Builds now took up to 30 minutes, due to all of the components and integration tests having to run before a release.
Often, developers batched multiple changes together so they didn’t have to wait in the deployment queue. This occasionally led to missing something and releasing a defect, further increasing our risk profile.
Yes, we had a few problems but we were pumping out features at a rate never seen before at Soundify, our customers loved our attention to their requests and our investors saw that it was good!
Diagram: Our team and platform at stage 3
| Metric | Value |
| --- | --- |
| Time in pipeline (commit to prod) | 30 minutes |
| Deployments per day | 10 |
Stage Z: Large organisation
A year passes and we have grown to 10 teams. We also recently acquired uTunes, a cloud-based music streaming service by Mapple, doubling our customer base and product offering overnight. We have re-arranged our engineering group into 10 separate teams managing 20 components, and our next task is to offer the music catalog from uTunes to our existing Soundify customer base.
Unfortunately, uTunes runs on very old technology that cannot be run on AWS and must run on specific OBM hardware.
At this point, we needed at least 200 environments to allow headroom for all teams to do their testing. This became hard to manage and quite expensive, so we time-shared environments and coordinated releases. Furthermore, as uTunes runs on bespoke, expensive OBM kit we had to time-share that as well.
We also needed to create a shared services team (we called them “DevOps”) whose sole job was looking after our shared build servers, test environments, build scripts and so on as it no longer made sense for each team to store the exact same set of build and deployment scripts in their own repositories.
The lack of environment headroom meant that many changes would have to wait to be merged (and therefore shipped) to avoid cascading build problems. This led the build team to move from simple feature-branch based development to GitFlow, a more complex code management scheme. This increased developer idle time and therefore both cost and risk.
Builds now took about two hours, so changes were always batched to avoid huge build queues piling up. Unfortunately, this led to integration test and subsequently merge hell. The feedback loop between making a change and getting feedback was at least two hours, exacerbating the issue, so we assigned a dedicated release manager to oversee the process.
Diagram: Our team at stage Z: things are… complicated.
| Metric | Value |
| --- | --- |
| Time in pipeline (commit to prod) | 120+ minutes |
| Deployments per day | 1 |
A series of logical steps and unfortunate events
So what happened? We evolved our architecture and organisation in response to changing conditions as most best practices would encourage, yet:
- Our teams were able to build independently, but were forced to collaborate closely in order to ship;
- We had to manage and pay for an exponentially growing number of environments, or wear the cost of additional developer idle time;
- Our delivery pipeline grew linearly in duration, resulting in changes being batched together;
- Batching resulted in increased failure and defect rates;
- Integration test failures resulted in disproportionately more time spent debugging;
- Deployments became riskier than when we were continuously shipping the original monolith;
- The role of “Tester” emerged, whose job was to manually test features in fixed environments and fix build issues instead of coaching and doing exploratory testing;
- The role of “Release Manager” emerged, to govern the entire release process;
- Tech debt accumulated as a result of these pressures.
All of this meant that development costs increased significantly faster than the proportional increase in engineering resources. This is not what we had in mind when this all began (told you)!
Losing the cohesion
The root cause of much of the above can be attributed to the fact that while we wanted “cohesive components, loosely coupled”, what we instead created were cohesive teams, tightly coupled by those components. This led to a series of logical changes that made things worse as we scaled the organisation.
The key mistake was introduced at Stage 2, when we decided that integrated tests were the way to prove our system worked together. This had a cumulative effect on the system: as teams and components were introduced, nonlinear pressures were applied to build time, risk and cost. These effects are represented below:
Diagram: Incremental impact of team size and components
The delusion of integrated tests
J.B. Rainsberger said it best: “Integrated tests are a scam”. Their disadvantages are both numerous and terrifying:
- They require expensive setup/teardown routines;
- They interact with running, connected systems, traversing networks and consuming resources;
- Expensive setup/teardown routines increase the likelihood of race conditions and other failures;
- Data fixtures become orders of magnitude more complicated, as they must span the entire ecosystem;
- Test failures may occur anywhere in the system, making it difficult to find the root cause;
- They are non-deterministic, leading to race conditions and brittle tests;
- They are combinatorially complex: as we increase the size of the system, the testing required to get sufficient coverage spirals out of control.
Let’s assume we want to test the UI and its immediate collaborators, API1 and API17, from Diagram 4. To have 100% coverage, we end up also having to test their transitive dependencies – for argument’s sake, API2–5 – for a total of seven components.
Environment setup complexity aside, let’s do the maths on how many tests we need. The number of combinations is the product of the code branches in each component:

tests = branches(C1) × branches(C2) × … × branches(C7)

This reduces to tests = branches^7 if all components have the same number of branches, where the simplest of branches is the old if-else statement:
- 2 code branches = 2^7 = 128 tests
- 5 code branches = 5^7 = 78,125 tests
- 10 code branches = 10^7 = 10M tests
As you can see, the number of tests (and therefore, time) required spirals out of control very quickly, even with trivially simple services. In practice, this is an impossible feat to pull off.
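To make the growth concrete, here is a quick back-of-the-envelope check in Python (the seven components and branch counts are the hypothetical ones from the example above):

```python
# Integrated tests must cover every combination of code branches across
# all collaborating components, so the count grows exponentially.
def integrated_test_count(branches: int, components: int = 7) -> int:
    return branches ** components

# Seven components: the UI, API1, API17 and transitive deps API2-5.
print(integrated_test_count(2))   # 128
print(integrated_test_count(5))   # 78125
print(integrated_test_count(10))  # 10000000
```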
What we need is fast, isolated tests – just like unit tests – but for testing collaborating services:
Diagram: Split integration tests into separate unit tests
Such tests have the opposite properties of integrated tests:
- They run in the hundreds or thousands per second;
- They need simpler data fixtures, with no requirement for complex environment setups;
- They can pinpoint test failures, just as unit tests do;
- They have linear complexity: only ever two components are tested at any point in time.
We use Test Doubles to stand in for the real service when running these. Of course, this leads one to ask: “Can we trust that the Test Double will always be a good stand-in?”. The answer is “no”.
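As a sketch of what such an isolated test might look like in Python – TrackClient and the /tracks endpoint are hypothetical stand-ins for Soundify’s UI-to-API calls, using a mock as the Test Double:

```python
from unittest import mock

class TrackClient:
    """Hypothetical consumer-side client for a streaming API."""
    def __init__(self, http):
        self.http = http  # injected transport: the real thing, or a test double

    def track_title(self, track_id):
        response = self.http.get(f"/tracks/{track_id}")
        return response["title"]

# The test double stands in for the real API: no network, no environment.
fake_http = mock.Mock()
fake_http.get.return_value = {"id": 1, "title": "Imaginary Song"}

client = TrackClient(fake_http)
assert client.track_title(1) == "Imaginary Song"
fake_http.get.assert_called_once_with("/tracks/1")
```

The test runs in microseconds and a failure can only point at the client itself, not at anything downstream.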
Consumer-driven contracts and Pact
Using a style of integration contract testing known as consumer-driven contracts (CDC) is the final piece of our puzzle. CDC captures the requests and responses exchanged between the consumer and its test double, recording them as a contract. These contracts are then shared with the actual providing service, allowing it to validate that any changes it makes will not break its consumers. The process also supports the inverse scenario.
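A toy, hand-rolled version of the idea might look like the following – all names here are hypothetical, and real tools add request matching, contract brokers and versioning on top:

```python
import json

# --- Consumer side: record the expected interaction as a contract. ---
# In practice this is captured while the consumer's tests run against
# its test double; here we write it out by hand.
contract = {
    "consumer": "SoundifyUI",
    "provider": "StreamingAPI",
    "interactions": [{
        "description": "a request for track 1",
        "request": {"method": "GET", "path": "/tracks/1"},
        "response": {"status": 200, "body": {"id": 1, "title": "Imaginary Song"}},
    }],
}
contract_file = json.dumps(contract)  # shared with the provider team

# --- Provider side: replay each recorded request against the real code. ---
def provider_handle(method, path):
    """Hypothetical provider implementation under test."""
    if method == "GET" and path == "/tracks/1":
        return 200, {"id": 1, "title": "Imaginary Song"}
    return 404, {}

for interaction in json.loads(contract_file)["interactions"]:
    req, expected = interaction["request"], interaction["response"]
    status, body = provider_handle(req["method"], req["path"])
    assert status == expected["status"], interaction["description"]
    assert body == expected["body"], interaction["description"]
print("contract verified")
```

If the provider later changes its response shape, replaying the contract fails before the change ever ships, without either team standing up an integrated environment.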
Whilst you can certainly roll your own contract testing framework, we recommend Pact, an open source consumer-driven contract testing tool purpose-built to solve these sorts of problems. It is available in multiple languages (Java, Scala, Groovy, Ruby, .NET, Swift, JS and Golang), making it a great fit for heterogeneous microservices.
Using these strategies is how we guarantee that the Test Double is always a valid stand-in, how we avoid having to spin up fixed environments per feature/project/team, and how we uncouple the teams’ release cycles. Had we done this at stage 2 – the moment we distributed parts of our application – our scaling issues would have disappeared, and we could have added teams and components independently. We would have no need for a release manager (or team), and deployments would be simple, fast and – most importantly – safe again.
So there you have it, it was your process that was the problem after all!
Further Reading / Watching
- Introduction to consumer-driven contracts with Pact (Article)
- Pact (Website)
- Deploy with Confidence! – Ron Holshausen (Video)
- Integrated tests are a scam – J.B. Rainsberger (Video / Article)
- Verifying Microservice Integrations with Contract Testing – Atlassian (Video)
- Microservice Testing (Article)
- Escape the integration syrup with contract tests – Stefan Smith (Video)