The Muselet #19: Beyond TDD

Wikipedia:

Test-driven development (TDD) is a software development process relying on software requirements being converted to test cases before software is fully developed, and tracking all software development by repeatedly testing the software against all test cases. This is opposed to software being developed first and test cases created later.

Once you get into it, TDD is a beautifully satisfying experience: we write the specification of the software to be built in the form of programmatic tests, and these tests all initially fail (turn red). Then the fun starts: as we write the code that implements the spec, run the tests, write more code and run the tests again, over and over, more and more of the red lights turn green. Dopamine galore.

And then, after some time: boom. All tests turn green and... we’re done! It’s a great feeling. Not only is it clear we accomplished our goal, we now also have the tests that ensure our code will keep working according to spec tomorrow, next week and next year. We’re building up a regression suite.

TDD is awesome.

However, there’s a big, hidden assumption in TDD.

It assumes our specification is correct. It assumes we know what we need to build. Not plan to build, mind you; need to build.

Do we really know, though?

Defining success

Frankly, in some areas we most likely do, and we can probably safely nail down our requirements with a specification. Certain areas of product development have more or less been figured out, at least at the rudimentary level. For instance, having people sign up for an account, having them reset their password when they forget it, or adding an item to a shopping cart on an e-commerce site. Or we may just “borrow” some proven ideas from our competition (to return later). We can write specs for these, have engineers implement them, push them live and be reasonably convinced customers will be happy with them.

But those are “table stakes” pieces of functionality, not differentiators. Customers don’t decide to use our product over the competition because of our awesome sign-up form, or how marvelously we copied the functionality from the competitor. The value ought to be elsewhere.

And these are the areas where it’s far less clear how to make customers successful. We may think we have a pretty good idea of how to do this, but too often, we’re... just wrong. Dead wrong.

Anybody who’s ever attended a session in a UX lab where a version of their product was tested by a regular off-the-street customer will have received their share of facial pounding with reality. Often we are shockingly bad at predicting how people perceive or use our products. Even if (and that’s a big if) they understand how to use our product, they do batshit crazy, unanticipated stuff. They zoom their web browser to 200%, because they cannot find their glasses. They get lost, because a link opened in a new tab in their web browser. They casually dismiss the value of a feature that you’ve invested months in developing. Real people don’t behave as we expect them to.

So, if that’s the case, how do we adapt our approach? 

We apply TDD. At the product level.

We start by clearly defining what success looks like from the customer’s perspective. Next, we translate this picture of success into a metric that we can track closely. Then, we set a target value for this metric. With this framework in place, we can now test whether we’re there or not: whether our light is green or red.


Sidebar: if this somehow reminds you of my universal recipe for success — well, what can I say, it’s universal.


We gave my mother a picture of the kids printed on canvas for her birthday this year. I arranged this online: I uploaded a picture, cropped it, adjusted a few things, and had it shipped to my mother’s address. Let’s use this service as an example to make things a bit more concrete.

For this service, we could define success as follows:

70% of customers who upload at least one picture complete the purchase process successfully.

Conceptually, we could now apply TDD at this much higher level of abstraction: a type of “Product TDD,” if you will. We can create a dashboard that tracks this metric over time, with shiny red and green lights signaling whether the goal is achieved.
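As a minimal sketch of what such a product-level “test” could look like, here is the 70% metric expressed in Python. The `Session` record, its fields and the `light` function are hypothetical illustrations, not any particular analytics tool; in practice the numbers would come from your tracking data.

```python
from dataclasses import dataclass


# Hypothetical session record: which milestones a single visitor reached.
@dataclass
class Session:
    uploaded_picture: bool
    completed_purchase: bool


# "70% of customers who upload at least one picture complete the purchase process."
TARGET_CONVERSION = 0.70


def conversion_rate(sessions: list[Session]) -> float:
    """Share of uploading customers who went on to complete the purchase."""
    uploaders = [s for s in sessions if s.uploaded_picture]
    if not uploaders:
        return 0.0
    purchasers = sum(1 for s in uploaders if s.completed_purchase)
    return purchasers / len(uploaders)


def light(sessions: list[Session], target: float = TARGET_CONVERSION) -> str:
    """The product-level 'test': green once the metric meets its target."""
    return "green" if conversion_rate(sessions) >= target else "red"


# Hypothetical data: 10 uploaders (3 of whom bought) plus 5 visitors who never uploaded.
sessions = [Session(True, i < 3) for i in range(10)] + [Session(False, False)] * 5
print(f"{conversion_rate(sessions):.0%}", light(sessions))  # 30% red
```

Only customers who uploaded a picture count toward the denominator, mirroring the wording of the metric; the five visitors who never uploaded do not drag the rate down.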

Initially the lights would be red, and then the fun starts. However, it’s a different type of fun. It no longer starts with a clear mental picture of the code you need to write to flip the light from red to green; it requires a much wider set of skills. It requires research: if currently only 30% of customers complete the purchase process, where do they drop off? You need to collect tracking data for this. Once you know where they drop off, can you figure out why?
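Finding where people drop off is mostly arithmetic on funnel counts. The sketch below assumes hypothetical screen names and made-up visitor counts (chosen so that 30% of uploaders reach the end); it computes the conversion from each screen to the next and points at the weakest transition.

```python
# Hypothetical funnel: number of visitors who reached each screen, in order.
funnel = [
    ("upload", 1000),
    ("crop", 900),
    ("options", 820),
    ("address", 700),
    ("payment", 650),
    ("confirmation", 300),
]


def step_conversions(funnel):
    """Conversion from each screen to the next, as (from, to, rate) tuples."""
    return [
        (a_name, b_name, b_count / a_count if a_count else 0.0)
        for (a_name, a_count), (b_name, b_count) in zip(funnel, funnel[1:])
    ]


def biggest_drop_off(funnel):
    """The transition with the lowest step conversion, i.e. the largest drop."""
    return min(step_conversions(funnel), key=lambda step: step[2])


for src, dst, rate in step_conversions(funnel):
    print(f"{src:>8} -> {dst:<12} {rate:.0%}")
print("Biggest drop-off:", biggest_drop_off(funnel)[:2])
```

With these illustrative numbers the weakest step is payment to confirmation, which is where the questions in the next section would start.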

Sometimes you can just ask (as we do in a UX lab) or simply spy on them, but that’s not always the solution. It’s a good way to find obvious hurdles, but it doesn’t scale. Your next best bet is to experiment.

Experiment all the things

If people drop off at the payment page, is it the price that scares them?

What would happen if we told them the price much earlier in the process? Let’s try.

Do they drop off because they can’t find a payment method that suits them? We notice we have a lot of iOS users. Would offering Apple Pay help? Let’s try.

We try by splitting our audience in two: one group sees the old version, the other sees the adjusted version, and then we compare the results. This is what we call A/B testing.
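One common way to split an audience, sketched here as an assumption rather than a prescription, is to hash a stable user identifier together with the experiment name. That keeps each customer in the same group across visits. The function name, the `early-price` experiment label and the percentage-based buckets are illustrative.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: int = 50) -> str:
    """Deterministically place a user in 'A' (old version) or 'B' (adjusted version).

    Hashing user_id together with the experiment name means the same user always
    sees the same version, while different experiments split users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to one of 100 buckets (0..99).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "B" if bucket < treatment_percent else "A"


# Hypothetical user IDs for the "show the price earlier" experiment.
variants = [assign_variant(f"user-{i}", "early-price") for i in range(10_000)]
print(variants.count("A"), variants.count("B"))  # both counts should be close to 5,000
```

Because assignment is a pure function of the IDs, no lookup table of who saw what is needed, and raising `treatment_percent` gradually rolls the new version out to more people.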

If the results show a significant improvement, we switch everybody to the new version. If they don’t, we discard the change. Then: on to the next experiment!
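“Significant” has a concrete statistical meaning here. A minimal sketch, assuming a classic two-sided two-proportion z-test and a 5% significance threshold (common choices, not the only ones; sequential and Bayesian methods are also widely used), with made-up conversion counts:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value), using the pooled proportion under the null hypothesis
    that both versions convert equally well.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # everyone (or no one) converted in both groups: nothing to compare
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: old flow (A) vs. flow that shows the price earlier (B).
z, p = two_proportion_z_test(conv_a=300, n_a=1000, conv_b=360, n_b=1000)
ship_it = p < 0.05 and z > 0  # assumed threshold: 5% significance, B must be better
print(f"z = {z:.2f}, p = {p:.4f}, switch everybody to B: {ship_it}")
```

With 300 of 1,000 conversions in A and 360 of 1,000 in B, z comes out around 2.85 and p well below 0.05, so this hypothetical experiment would be rolled out. Checking the p-value over and over while an experiment is still running inflates false positives, so it helps to decide on the sample size up front.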

One aspect that makes TDD so satisfying is the fast feedback loop. If you write your tests in a somewhat efficient manner, you tend to get feedback on the progress you’ve made within a few seconds, at most minutes.

Sadly, this will be very hard to achieve with “Product TDD.” However, it is very much worth thinking about, and investing in, making this cycle as short as possible. Yes, we’re back to optimizing cycle time.

There are various parameters that affect this cycle time:

  1. Volume: how big is your audience? If you get a few orders per week on your website, it will take a very long time to get any reliable feedback on whether your flow improvements have the desired effect. If you get hundreds of purchases per minute, you will get this feedback much more quickly. 

  2. Lead versus lag metrics: certain metrics are “laggy,” meaning they only change value after a significant delay. For instance, if your purchase flow consists of eight screens and you measure the conversion of the entire flow (as in our example metric), it will likely take people around an hour to go through the whole flow. This means that even in the ideal case, assuming you have sufficient volume, it will take a few hours or maybe a day to know whether your change had any impact. However, if you measure conversion at the level of individual screens, you can come up with less laggy lead metrics that are reasonable predictors of the lag metric. For instance, you could set target conversion rates per screen and thereby potentially cut the feedback cycle time by a factor of eight (in this case).

  3. Time to experiment: how much time does it take to go from “hey, I got an idea” to that experiment running in production? This can vary wildly. If you release your product twice per year, an experimental approach becomes essentially impractical. However, even if you release multiple times per day, you need the proper infrastructure in place to run experiments: tools to A/B test and measure differences in behavior between the versions, quality controls, deployment pipelines, etc.
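To make the volume parameter concrete, the sketch below uses the standard normal-approximation sample-size formula for comparing two proportions, under assumed settings (5% significance, 80% power, a 50/50 split). The traffic levels and the 30% to 33% lift are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(p_base: float, p_target: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variant to detect p_base -> p_target.

    Normal-approximation formula for two proportions:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil(z_sum ** 2 * variance / (p_target - p_base) ** 2)


def days_to_result(p_base: float, p_target: float, daily_visitors: int) -> int:
    """Days until both variants of a 50/50 split have collected enough visitors."""
    return ceil(2 * sample_size_per_variant(p_base, p_target) / daily_visitors)


# Detecting a lift from 30% to 33% conversion:
n = sample_size_per_variant(0.30, 0.33)
print(f"{n} uploaders needed per variant")
for daily in (50, 5_000):  # hypothetical traffic levels
    print(f"{daily:>5} uploads/day -> {days_to_result(0.30, 0.33, daily)} days")
```

Under these assumptions, detecting that lift takes roughly 3,760 uploaders per variant: about five months of data at 50 uploads a day, but only two days at 5,000. Halving the lift you want to detect roughly quadruples the required sample.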

Similar to “regular” TDD, the beauty of this approach is that once you have the tests in place, the automation investment keeps its value even after all lights have gone green. Once your targets are achieved, your metrics can simply be turned into health metrics that are monitored for regressions.

Perhaps later, elsewhere in your product, you unintentionally introduce a change that breaks the conversion of your purchase flow. Since you already track this metric, you will be notified when this happens. This way, you effectively build up a product-level regression suite.
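A sketch of such a regression check, assuming daily conversion rates are already computed and that an alert should only fire after a sustained dip. The three-day threshold and the dates and rates are hypothetical, and how the alert is delivered is left out.

```python
from collections.abc import Iterable

TARGET = 0.70          # the target from the purchase-flow metric
CONSECUTIVE_DAYS = 3   # hypothetical: require a sustained dip, not one noisy day


def regression_alerts(daily_rates: Iterable[tuple[str, float]],
                      target: float = TARGET,
                      consecutive: int = CONSECUTIVE_DAYS) -> list[str]:
    """Return the dates on which an alert fires: the metric has been below
    target for `consecutive` days in a row (one alert per dip)."""
    alerts, streak = [], 0
    for day, rate in daily_rates:
        streak = streak + 1 if rate < target else 0
        if streak == consecutive:
            alerts.append(day)
    return alerts


# Hypothetical daily conversion rates after an unrelated release on 5 May.
history = [
    ("2024-05-01", 0.72), ("2024-05-02", 0.71), ("2024-05-03", 0.69),
    ("2024-05-04", 0.73), ("2024-05-05", 0.64), ("2024-05-06", 0.63),
    ("2024-05-07", 0.62), ("2024-05-08", 0.61),
]
print(regression_alerts(history))  # ['2024-05-07']
```

The single dip on 3 May is treated as noise; the sustained drop starting on 5 May triggers one alert on 7 May rather than a new alert every day.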

As I said: TDD is awesome, and its concepts should be used more widely.