Types versus Tests: two approaches for writing correct software
The good thing about consulting with many firms is that we get to see many projects, many approaches, and we end up developing a sense of what the best approach is for each project.
In a sense, this lets us learn from the experience of many and greatly amplifies our ability to improve. Today we want to share a bit of that experience, specifically as it relates to building correct software.
The problem: software quality
I’m sad to admit that there is probably no one in the world who hasn’t had a bad day because of bad software. Software defects keep happening far too often. I struggle to see how we can justify building driverless cars; maybe the only reason is that humans are even more likely to make mistakes.
We try our best to write software without bugs. It’s just a hard problem, and there is no silver bullet to slay this beast. All we have are several approaches to kicking the bugs out of our software.
Short, incomplete and inaccurate history of software quality
At first, software developers simply wrote code. Their code ran on gigantic, slow computers made of relays, housed in entire buildings and requiring dedicated power plants to run. They also had less processing power than the ten-dollar calculator watch you can buy today. And then the first bug happened. A moth flew into the computer and got caught between the contacts of a relay as it switched, getting squished in between and preventing the current from flowing. The first bug was literally a moth, a kind of bug.
This wouldn’t be the end of it.
Better computers came, with bug-free (the critter kind) components, but the bugs of the other kind kept appearing.
It was obvious that developers couldn’t be trusted to write code that did the right thing all the time. Sometimes they made mistakes and implemented the specifications the wrong way. Other times, they implemented the specifications the right way, but the specifications were wrong. Either way, something had to be done, and so the QA team was born.
The QA team had a simple role: check what the developer had implemented and see if it worked as expected. They had to read the blueprints and judge whether the developer’s interpretation was within the spirit of the specification. They also looked for ways the design could be made to do the wrong thing, or the program could be made to behave incorrectly, even within the specification. This new team raised costs, but it also managed to banish some of the bugs.
Then the first programming languages appeared. Developers no longer had to write everything in assembly or machine code, and since they no longer had to understand all the gritty little details of the machine, they had a bit more brain power available. That brain power was immediately put to work building more complex software, with more functionality and more bugs. The problem had to get worse before it got better.
At the same time, these first programming languages existed because computers were becoming more powerful, which opened another avenue: automation. A developer, somewhere, thought that she could start automating some of the work, especially the QA team’s work. After all, all those defects the QA team found made her look very bad when she was trying to claim her quarterly bonus. So she set out to write her own QA team in software. Since then there haven’t been many revolutionary discoveries, but there have been quite a few incremental improvements. Let’s briefly go over them:
Developer-driven manual testing
I’m a developer. I test my code. I fix the failures I find. Which is great, because my code is great, and it works only in one way, and no one is going to use it in another way. The real world out there just giggles.
QA team quality
This is old school. You get a big team of testers to pound on the software looking for ways to break it. Good QA testers are intelligent, scarce and expensive, and testing all day can bore even the strongest soul to death. So this approach is expensive, hard, and hardly sustainable. On the other hand, it is the one most likely to find the weirdest defects, the ones no one else will catch.
Post-code automated test quality
You write the code. Then you write more code, called tests, that checks that the previous code does what is expected. Then, on every change you make to your code base, you run the tests and see whether any of them fail, hoping to catch any newly introduced bug that way.
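As a minimal sketch of post-code testing, here is a hypothetical pricing function (`applyDiscount` is an illustration, not something from our projects) followed by a few hand-rolled checks, written in Haskell without any test framework so it runs on a plain GHC install. A real project would more likely use a framework such as Hspec, but the shape is the same: name an expectation, check it, and fail loudly.

```haskell
import Control.Monad (unless)
import System.Exit (exitFailure)

-- Hypothetical production code, written first: a price in cents
-- after applying a whole-number percentage discount.
applyDiscount :: Int -> Int -> Int
applyDiscount percent priceCents = priceCents - (priceCents * percent) `div` 100

-- Tests written after the code: each entry names an expectation and checks it.
tests :: [(String, Bool)]
tests =
  [ ("no discount keeps the price", applyDiscount 0 1000 == 1000)
  , ("10% off 1000 is 900", applyDiscount 10 1000 == 900)
  , ("100% off is free", applyDiscount 100 1000 == 0)
  ]

-- Run on every change: report failures and exit non-zero so a build script notices.
main :: IO ()
main = do
  let failures = [name | (name, ok) <- tests, not ok]
  mapM_ (putStrLn . ("FAILED: " ++)) failures
  unless (null failures) exitFailure
  putStrLn ("All " ++ show (length tests) ++ " tests passed")
```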
Many people claim to do this, but they don’t. They will write tests as soon as the feature is completed, but then the next feature comes in and the next deadline is too near, and we have to defer testing for a bit because of this new deadline. Then another deadline comes in, then another, and another. Maybe I’ll write the tests next week, maybe I’ll write the tests next month, maybe I’ll write the tests next year. Maybe I’ll write the tests before the heat death of the universe.
Test-Driven Development
Now we are talking about automating testing in a consistent way. Now I can’t write any software at all unless I have a test that justifies writing it. No tests, no code. If we have code, we have tests that check it. Test-Driven Development requires a lot of discipline, but it forces you to have automated tests for everything. It also forces you to be humble. You expect your code to work all the time, but the tests continuously tell you what you have broken. Your ego takes punch after punch from reality. But at least it happens on your screen, not on the investors’ screen while they are deciding whether to fund your startup.
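A sketch of the TDD order, assuming a hypothetical requirement that a quantity field accepts only positive whole numbers (`parseQuantity` is an illustrative name, not from the original). The expectations come first, and the implementation is only the minimum needed to satisfy them. In a real TDD cycle you would run the spec and watch it fail before writing step 2.

```haskell
import Text.Read (readMaybe)

-- Step 1 (red): write the expectations before parseQuantity exists.
-- Hypothetical requirement: a quantity accepts only positive whole numbers.
quantitySpec :: [(String, Bool)]
quantitySpec =
  [ ("accepts a positive number", parseQuantity "3" == Just 3)
  , ("rejects zero", parseQuantity "0" == Nothing)
  , ("rejects negative numbers", parseQuantity "-2" == Nothing)
  , ("rejects non-numeric input", parseQuantity "abc" == Nothing)
  ]

-- Step 2 (green): just enough code to make every expectation pass.
parseQuantity :: String -> Maybe Int
parseQuantity input = case readMaybe input of
  Just n | n > 0 -> Just n
  _ -> Nothing

-- Print one line per expectation so a failure is easy to spot.
main :: IO ()
main = mapM_ report quantitySpec
  where
    report (name, ok) = putStrLn ((if ok then "pass: " else "FAIL: ") ++ name)
```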
Type-Driven Development
Some programming languages let you specify details of your program that can be checked at compile time. You have seen this in action in Java when you try to assign a Bar to a Foo and the compiler refuses to compile the code. This is the type-checker at work, ensuring that you use a Bar only with operations that accept a Bar. In languages like Haskell and PureScript, we try to take these types to the next level.
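The Java Foo/Bar situation looks much the same in Haskell. In this sketch `Foo`, `Bar` and `describeFoo` are made-up illustrations; the commented-out line is the one GHC would refuse to compile.

```haskell
-- Two unrelated, hypothetical types mirroring the Java example.
data Foo = Foo Int deriving Show
data Bar = Bar String deriving Show

-- describeFoo accepts a Foo and nothing else.
describeFoo :: Foo -> String
describeFoo (Foo n) = "Foo holding " ++ show n

main :: IO ()
main = do
  putStrLn (describeFoo (Foo 42))
  -- putStrLn (describeFoo (Bar "oops"))
  -- ^ Uncommenting this line makes GHC reject the whole program at compile
  --   time: a Bar is passed where a Foo is expected.
```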
The benefits of each technique
Each new technique we add to our toolbox lets us reach a bit more quality and reduce our dependency on the other tools. Initially, when your developers are also in charge of all the QA, you need a lot of developers, because they are doing far more than writing code. They have to check that their code works, that they haven’t broken any other part of the system, and that their implementation fits the specifications.
You add a QA team. Your developers still have to check that their code works, but checking that they haven’t broken any other part of the system, or that the code fits the specification, can now be done by QA. QA is also more efficient at checking these details, because its members don’t have to understand the deepest details of how a computer works and can devote that part of their brains to understanding what the specification actually means. Your quality goes up a bit, and you depend less on having so many developers.
You make your developers start writing tests for features that are already written. Now some of the QA work is automated. You can shrink your QA team while still doing all the testing that is needed. Also, automated testing is more consistent than fallible humans. So your quality goes up another bit, and you depend less on having so many developers and such a large QA team.
You get your developers to start doing Test-Driven Development. They hate it at first, but it forces them to write tests and to split their code into separate, modular pieces. Over time, they end up with a decent architecture and a test suite that covers almost all possible paths. At this point, the most boring part of testing has been automated. Your developers receive real-time feedback on what they have broken, and the QA team can focus on the specifications instead of checking, for the twelfth time today, whether the form accepts only positive numbers. Your quality goes up another bit, and you don’t depend as much on having a brutally focused QA team.
As you have already seen, each tool adds quality and relaxes constraints. So how does Type-Driven Development fit in? The type-checker can be used as another tool, like tests, to catch defects early. If you use a programming language with strong static types, you don’t have to worry about someone passing a String to a function that only accepts Integers: the type-checker simply won’t allow it. It also works from the other side: if your function accepts Integers, the type-checker ensures that, inside the function, those Integers are used only with operations that accept Integers. All this is great, but passing a String where an Integer is expected is easy to spot in a code review. The value of the type-checker becomes more obvious when we move from toy examples to real-world code. I’m sure your reviewers will catch a String sent to something that accepts an Integer. I’m not so sure they will catch a String-to-Observer mapping sent to an operation that accepts a String-to-Priority mapping. The type-checker will catch that.
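A sketch of the Observer/Priority case, assuming both are represented as strings under the hood and that the mappings are plain functions from String (all names here are hypothetical). Wrapping each in its own `newtype` costs nothing at runtime, but it makes the two mappings incompatible types, so mixing them up becomes a compile error instead of something a reviewer has to spot.

```haskell
-- Hypothetical domain types: both wrap a String, but the compiler keeps them apart.
newtype Observer = Observer String deriving (Show, Eq)
newtype Priority = Priority String deriving (Show, Eq)

-- Hypothetical mappings from a String key.
observerFor :: String -> Observer
observerFor key = Observer ("observer-" ++ key)

priorityFor :: String -> Priority
priorityFor key = Priority (if key == "billing" then "high" else "normal")

-- An operation that expects a String-to-Priority mapping.
schedule :: (String -> Priority) -> String -> String
schedule lookupPriority key = case lookupPriority key of
  Priority p -> key ++ " scheduled with " ++ p ++ " priority"

main :: IO ()
main = do
  putStrLn (schedule priorityFor "billing")
  -- putStrLn (schedule observerFor "billing")
  -- ^ Rejected at compile time: String -> Observer is not String -> Priority,
  --   even though both types wrap a plain String.
```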
In a sense, the type-checker lets us cut down on low-level tests. You don’t need to check that a function doesn’t return Null if its type doesn’t allow returning Null at that point. If someone tries to return Null, the type-checker will reject that code. The more expressive your types, the more you can constrain what can be written, and the more the type-checker will step in and kick bugs out.
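Haskell has no null reference to return; possible absence is spelled out in the type with `Maybe`. A minimal sketch with a hypothetical `User` record and lookup: `greeting` needs no "returns null" test because its type guarantees a `User`, while `findUser` admits in its type that it can come back empty, and the compiler makes callers handle that case.

```haskell
import Data.List (find)

-- Hypothetical record and in-memory table, for illustration only.
data User = User { userId :: Int, userName :: String } deriving (Show, Eq)

users :: [User]
users = [User 1 "Ada", User 2 "Grace"]

-- Absence is part of the type: the result may be Nothing.
findUser :: Int -> Maybe User
findUser uid = find ((== uid) . userId) users

-- A User always has a name, so there is no "returns null" case to test here.
greeting :: User -> String
greeting u = "Hello, " ++ userName u

-- `greeting (findUser uid)` would not compile, because Maybe User is not User.
-- The caller has to say what happens when the user is missing.
greetById :: Int -> String
greetById uid = maybe "No such user" greeting (findUser uid)

main :: IO ()
main = mapM_ (putStrLn . greetById) [1, 3]
```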
But the type-checker is not the final solution. It will certainly guarantee that you call the function with an Integer, but it will not check that you call it with the right Integer. This brings us back to what all the other techniques did: the type-checker adds another bit of quality, lets us write fewer tests, and reduces our dependency on tests. After all, nothing prevents you from using the type-checker the wrong way, so you should still keep a safety net of tests to make sure you are not just bypassing the type-checker and pretending everything is fine.
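For example, where plain JavaScript would silently give you undefined at runtime for a field that doesn’t exist, PureScript will complain at compile time that the User record doesn’t have that field. What the type-checker can’t tell you is whether the value stored in a field that does exist is the right one.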
This means we can’t remove testing; we still need it. But decent types let us get by with less testing. Decent types can help you remove roughly 90% of your tests, the ones that check low-level details. But you still have to test the other 10%: you need to check that you are actually sending the right Integer to the Integer-accepting function. We haven’t yet found a way to create a type called RightInteger that holds the right Integer for each situation, and we suspect it’s impossible.
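To make the RightInteger point concrete, here is a sketch with hypothetical `Minutes` and `Seconds` newtypes. The types stop you from passing seconds where minutes are expected, but they cannot stop a wrong multiplier; only a test pins down the right value.

```haskell
-- Hypothetical unit types: distinct at compile time, plain Ints at runtime.
newtype Minutes = Minutes Int deriving (Show, Eq)
newtype Seconds = Seconds Int deriving (Show, Eq)

-- The signature rules out passing Seconds where Minutes are expected...
toSeconds :: Minutes -> Seconds
toSeconds (Minutes m) = Seconds (m * 60)
-- ...but `Seconds (m * 6)` would type-check just as well.
-- Only a test can say which Int is the right one.

main :: IO ()
main =
  putStrLn $
    if toSeconds (Minutes 2) == Seconds 120
      then "toSeconds: ok"
      else "toSeconds: WRONG"
```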
Expressive strong types add a new dimension to your toolbox. They catch cases that slip through tests, code review and QA. The type-checker is always there to help you, telling you which parts of the code you are writing don’t fit with the rest.
It’s one of the many tools a strong development team should use. It doesn’t replace any other tool; instead, it’s part of what makes a development team strong: a strong team uses the best tools for the job, and strong types provide an extra edge that no other tool can provide.