Diabol Continuous Delivery, Microservices, Quality Assurance, Testing

2018-04-13 TOMY TYNJÄ 

Let’s address the pitfalls that integration testing still creates in our industry.

Integration Testing Dos and Don'ts

I’ve had many discussions with clients and colleagues about integration testing, especially in a microservice-oriented architecture. One of the main reasons for using such an architecture is to be able to develop, test and deploy parts of a system independently. These architectures are typically backed by automated Continuous Delivery pipelines that aim to give the user fast feedback and lower risk when deploying to production.

The shared integration environment, a.k.a. staging

A very common scenario is to have a shared integration environment (a.k.a. staging or continuous integration environment), where teams deploy their services to test them and their integrations with other services. I've seen environments where over 35 teams were deploying their services on a regular and frequent basis. The environment is supposed to be treated as production, but in reality it never is. Usually the environment is not set up with an architecture matching production, and it lacks proper monitoring, alarms and the necessary feedback loops to the developers.

Even in companies where all those aspects are addressed, teams tend not to care about the overall health of the shared integration environment, as they consider it time-consuming with minimal return on investment.

In most cases there is some end-to-end test suite running in this environment. In theory it might sound feasible to run extensive tests there to catch bugs before they hit production. In practice, quality assurance teams responsible for the overall quality of the system have to run around hunting down teams to find out why certain tests fail and why those failures are not being addressed.

With 100+ services in such an environment, new code is always being deployed and tests are executing all the time, with flakiness as a common result. A helpful pattern is to reduce the number of tests to just a few critical flows. One seldom-debated question is how to justify the time and cost of keeping this environment up and running at all times. There is a lot of waste to be removed from processes that involve a shared integration environment.

What version to test against?

One of the biggest problems with testing in a shared integration environment is that you seldom test services against the versions that are actually used in production. If you test a service and its interactions with other services that run a different version than in production, how can you be sure that you are testing the right thing? This provides a false sense of security and invalidates the whole purpose of those tests.

Here is a real world example.

A team had developed a code change which made an addition to an existing API endpoint. The code change passed the commit and test stages and was therefore deployed to the shared integration (staging) environment. Normally, the deployment pipeline would automatically proceed with deploying this change to the production environment. However, the pipeline happened to be broken between the staging and production environments due to an environment-specific configuration issue. Another team wanted to use this new API addition and tested their service in the staging environment, where it existed.

This second team deemed that their functionality worked as expected and proceeded to deploy their code change to production. They did not notice that the API endpoint they used ran a different version in the production environment, where the new functionality did not yet exist. The second team's software therefore failed miserably, breaking the customer experience, until the code change was quickly rolled back.

This could have been avoided if the service was tested against the same version as running in production.

How to do this, then? Always tag each service with a "production-latest" label after it has been successfully deployed to production, and use those versions when testing the integrations of a service. This gives high confidence that the tests are accurate, compared with tests running in a shared integration environment where the versions are seldom the ones you expect.
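The tagging idea can be sketched as follows. This is a minimal illustration using an in-memory registry as a stand-in for a real artifact store (such as a container registry); all names here are hypothetical, not a specific tool's API.

```python
# Minimal sketch of "production-latest" tagging, using an in-memory
# registry as a stand-in for a real artifact store.

class ArtifactRegistry:
    def __init__(self):
        self._tags = {}  # (service, tag) -> version

    def tag(self, service, tag, version):
        self._tags[(service, tag)] = version

    def resolve(self, service, tag):
        return self._tags[(service, tag)]


def deploy_to_production(service, version, registry):
    # ... perform the actual deployment here ...
    # On success, record which version production is now running.
    registry.tag(service, "production-latest", version)


registry = ArtifactRegistry()
deploy_to_production("orders-service", "1.4.2", registry)

# Integration tests for a consumer resolve the dependency's
# "production-latest" tag instead of whatever version happens to be
# deployed in a shared staging environment.
version_under_test = registry.resolve("orders-service", "production-latest")
```

The point of the sketch is that the tag is written as the last step of a successful production deployment, so test environments always have a reliable pointer to what production actually runs.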

Integration testing without a shared integration environment

Consumer-driven contract testing

Consumer-driven contract testing is a great alternative to the shared integration environment approach, especially in a microservice-oriented architecture, where a big benefit is being able to speed up development and delivery by composing a system from smaller building blocks. Each deployment pipeline represents a single deployable unit, e.g. a service. With consumer-driven contract testing, the interactions between services are tested by verifying the protocol between them. These tests are fast to run and thus provide fast feedback. Keep in mind that it is important to validate both sides of the API contract.
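To make the idea concrete, here is a hand-rolled sketch of a consumer-driven contract (in practice you would use a dedicated tool such as Pact). The consumer records the fields it actually relies on; the provider verifies that its responses satisfy that contract. The endpoint and field names are purely illustrative.

```python
# Contract published by the consumer: the fields (and types) it
# actually reads from the provider's response.
consumer_contract = {
    "endpoint": "GET /users/{id}",
    "response_fields": {"id": int, "name": str, "email": str},
}


def provider_response(user_id):
    # Stand-in for the provider's real handler. Extra fields beyond
    # the contract are allowed.
    return {"id": user_id, "name": "Ada", "email": "ada@example.com",
            "created_at": "2018-04-13"}


def verify_contract(contract, response):
    # Provider-side verification: every field the consumer depends on
    # must be present with the expected type.
    for field, expected_type in contract["response_fields"].items():
        if field not in response:
            return False
        if not isinstance(response[field], expected_type):
            return False
    return True


contract_holds = verify_contract(consumer_contract, provider_response(42))
```

Because the provider runs this verification in its own pipeline, it learns immediately when a change would break a consumer, without either side deploying to a shared environment.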

The next-door neighbour approach

For some teams, consumer-driven contract testing with existing tools is not possible, for instance if a custom protocol is used to communicate between services. An alternative approach is to run integration tests for a particular service by testing only its "next-door neighbours". In the real world, the dependency graph of a particular service can be daunting to traverse. Since the idea is to have short feedback loops and each service is deployable on its own, you do not focus on the overall system as such; scoping the tests to the current service and the service(s) it integrates with is enough. To support this, services have to be able to start without all of their dependencies up and running. One approach is to allow services to run with degraded performance and capabilities, so that neighbouring services can use them even when not all of their dependencies are satisfied.
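The degraded-mode requirement can be sketched like this. The service and client names are hypothetical; the point is that the service still answers requests when its own downstream dependency is unreachable, so a neighbouring service can be tested against it.

```python
class RecommendationClient:
    """Client for a downstream dependency that may be unavailable."""

    def __init__(self, available=True):
        self.available = available

    def recommendations_for(self, product_id):
        if not self.available:
            raise ConnectionError("recommendation service unreachable")
        return ["p-17", "p-42"]


class ProductService:
    def __init__(self, recommendations):
        self.recommendations = recommendations

    def product_page(self, product_id):
        page = {"id": product_id, "name": "Widget"}
        try:
            page["recommended"] = self.recommendations.recommendations_for(product_id)
        except ConnectionError:
            # Degraded mode: serve the page without recommendations
            # instead of failing the whole request.
            page["recommended"] = []
        return page


# A "next-door neighbour" test can exercise ProductService even when
# its own dependency is down.
degraded = ProductService(RecommendationClient(available=False))
page = degraded.product_page("p-1")
```

With this in place, a team testing against ProductService only needs ProductService itself running, not its entire dependency graph.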

Mocks and stubs

Most services can use mocked or stubbed equivalents of their dependencies, as in many test cases it is not of interest to validate the behaviour of the integrated service.

For cases where there is such an interest, the drawback is that you have to mimic the behaviour of the real services, which can leave bugs unnoticed. Those interactions are better tested using consumer-driven contracts or the next-door neighbour approach mentioned above.

Tests against mocks and stubs run fast and are therefore suitable for unit tests running in the deployment pipeline commit stage.

Keep in mind that overly complex mocking can leave you spending a lot of time re-implementing logic from the real service in your mock, which will also require maintenance down the road. Ideally, you want to spend most of your coding time on the service you are developing, not on the mock. If too much complex logic has to be mimicked, ask yourself whether the mock is the right choice or if you would be better off just using the real system instead. This can be the case for e.g. off-the-shelf third-party products.
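A commit-stage test with a stubbed dependency can look like the following, using the standard library's unittest.mock. The service and method names are illustrative; the stub is deliberately trivial, which is exactly when stubbing pays off.

```python
from unittest.mock import Mock


def total_price(order_id, pricing_service):
    # Unit under test: sums line prices fetched from a dependency.
    prices = pricing_service.line_prices(order_id)
    return sum(prices)


# Stub the pricing service: here we only care about our own logic,
# not the dependency's behaviour.
pricing = Mock()
pricing.line_prices.return_value = [100, 250, 25]

result = total_price("order-7", pricing)
# Verify the interaction: the dependency was called exactly once,
# with the order id we passed in.
pricing.line_prices.assert_called_once_with("order-7")
```

Because no network or real service is involved, tests like this run in milliseconds and fit naturally into the commit stage.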

Here’s what to keep in mind

The key is to have stable APIs. Don't change the behaviour of existing methods. Instead, add new methods or provide new services with the new functionality. This fosters stability, quality and easy maintenance.
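The additive style of API evolution can be sketched in a few lines. The function names and pricing logic below are made up for illustration; the pattern is that existing callers keep the exact behaviour they were released against.

```python
def shipping_cost(weight_kg):
    # Existing method: its behaviour stays exactly as released, so no
    # existing caller can be broken by the new capability.
    return 5.0 + 1.2 * weight_kg


def shipping_cost_with_express(weight_kg, express=False):
    # New method adds the capability; it builds on the old one rather
    # than changing it under existing callers.
    base = shipping_cost(weight_kg)
    return base * 1.5 if express else base
```

Old callers continue to use `shipping_cost` unchanged, while new callers opt in to the express behaviour explicitly.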

A single deployment pipeline represents a single deployable unit, such as a service. Services that must be tested together should therefore also be deployed together. Test services and their interactions by verifying the protocol that is used between them. Prioritize fast-running tests that can be run as early as possible in the deployment pipeline to reduce waste and to encourage small frequent changes with less risk.

Testing for all possible scenarios is impossible, as the problem space is infinite. No matter how much effort is spent on testing before production, bugs will happen, whether due to user input, growing data sets or infrastructure failures. It is therefore of utmost importance to have proper feedback loops in place to quickly discover errors, and the ability to quickly roll back a code change. Mean time to recovery trumps mean time between failures. The best thing we can do is to have the right tools and feedback loops in place to get feedback as fast as possible on whether a given code change works, and to be able to react when it does not.

I’m here for any questions.