Enhance Software Reliability with Formal Verification

Software is an integral part of modern society. We use it every day, directly or indirectly, to make our lives easier. We find it in our phones, it is used to run companies and governments all over the world, and it is also found on vehicles like cars and airplanes.

Naturally, given how ubiquitous software is, we would want it to be as free of bugs as possible. Errors in software affect us in different ways, like leading us to conclude inaccurate information or causing companies to lose money. In the more extreme cases, they could even cause fatal consequences resulting in deaths, for instance in a vehicle failure. These kinds of errors have happened in the past:

Therac-25, a machine unit used to treat cancer patients with radiotherapy, had several failures due to race conditions and ended up administering lethal doses of radiotherapy to six people, killing some of them and leaving the others with permanent injuries.
The Ariane 5 rocket had a software failure due to an integer overflow which resulted in the explosion of the rocket, causing a loss of $370 million. Fortunately, the Ariane 5 was unmanned.

We as developers understand the importance of correct software. At Stack Builders, we are committed to delivering high-quality code. To achieve this, we continuously explore cutting-edge techniques and tools to ensure the correctness of the software we develop. In this blog post, we will provide an introduction to formal verification, a technique used to rigorously prove that a program is correct. Later on, we will dig a bit deeper into the more precise meaning of the term "correct" in the context of software programs.

Wanting our programs to be free of these critical bugs brings us to the question: how can we increase our confidence in the software we are developing? One way to do that, and the most widespread approach, is what we usually think of as "testing". This includes all sorts of tests: unit tests, integration tests, end-to-end tests, etc. Testing helps us spot potential bugs in our software earlier before it gets shipped, and can also help prevent regressions.

Testing and the absence of bugs

Tests can help us find bugs in our programs, especially when covering corner cases. However, tests cannot, in general, ensure the absence of bugs. Testing is an approach that aims to check the behavior of software against a given set of test cases. Ensuring the absence of bugs with testing requires us to define a set of test cases that covers the entirety of the domain of any given function that we want to test.

// Logical AND
function and(x: Bool, y: Bool) -> Bool {
    if (x == False) then
        return False;
    else
        return y;
}

// Tests
assert(and(False, False) == False)
assert(and(False, True) == False)
assert(and(True, False) == False)
assert(and(True, True) == True)

In this example (in a made-up programming language, to keep things simple) we have defined an AND function which takes two boolean arguments and returns a boolean value. We have defined four tests to check the behavior of AND. These four tests ensure the absence of bugs in the function since we have defined tests that cover the entire domain of the function; that is, we have one test for every possible combination of the arguments' values.

The easiest way to know how many test cases we need for a particular function is to calculate the size of its domain. In this case, we have two possible values for the first argument and two possible values for the second argument, so the size of AND's domain is 2 * 2 = 4.

Easy, right? Well, let's try to use the same testing approach in another example:

function square(x: Int32, y: Int32) -> Int32 {
    return x * y;
}

Uh-oh... what happens if we try to cover the whole domain of the square function? Let's see: the first argument has 2^32 possible values, and the second argument has 2^32 possible values, so the size of the square's domain is... 2^32 * 2^32 = 18446744073709551616.

It is clearly unfeasible to have that many test cases written! But it can get worse:

function toUppercase(name: String) -> String {
    for (letter in name) do
        uppercase(letter);
    return name;
}

The toUppercase function takes a string and returns that string with every letter turned into uppercase. In this case, the domain is theoretically infinite, so it's straight out impossible to write test cases for every input. Realistically, if we assume that a String is arbitrarily large and allocated in dynamic memory, the number of possible values depends on the free memory of the system, which is still an astronomical number.

As we were saying earlier, to ensure the absence of bugs with testing, we need to cover the domain of the function we are testing. We have seen that in general, this is not doable or practical.

However, there are other techniques more effective than testing when we aim to ensure that our program is correct. Many of these techniques come from formal methods research efforts.

One of the most well-known and widely applied techniques, and the one we will be focusing on, is formal verification.

Formal verification in short

Formal verification is a technique used to prove the correctness of a program against a given specification. That might be a lot to take in, so let's break it down.

The first important term to introduce is specification. A specification, simply put, is a description of what a program should do. The concept of specification appears in other areas of software engineering, and we as developers often encounter specifications in its informal form. One example of a specification, taking the toUppercase function we defined earlier, would be:

The toUppercase function receives a string argument and returns a matching string in which every letter appearing in the argument is turned into uppercase. Every other character is left untouched.

In formal verification, as the name suggests, we deal with formal specifications of our programs. A formal specification is a more mathematical and rigorous description of what our program does. Formal specifications are written in what's called a specification language; the specifications we write might look like code written in a conventional programming language, but specification languages are usually much limited in their expressive power. The goal of a specification language is not to be a programming language on its own, but rather to describe particular properties of our programs. For example, going back to our toUppercase function:

function toUppercase(name: String) -> result: String {
    satisfies:
        - result.length == name.length
        - forall (i <- 0..result.length-1). result[i] == uppercase(name[i])
}

This could be an example of a specification for our toUppercase function in a specification language I just made up purely for illustrative purposes. In our specification, we unambiguously state:

The toUppercase function receives a string argument name, and returns a string result, satisfying the two following conditions:

"the length of result (the returned string) must be equal to the length of name (the argument string)."
"for all indices i from 0 to the length of result minus 1, the character at index i in the result string must be equal to the character at index i in the name string applied to the uppercase function."

This is relatively similar to our informal specification we gave earlier! Just a bit more math-y.

The properties we can describe of our software are determined by the focus and intent of the specification language. A specification language can be aimed at checking the kinds of data our programs deal with, or whether our programs return values satisfying some properties (like in our example), or even more general semantic properties, like how a program behaves when seeing it from a concurrency point of view.

Now that we have introduced the concept of a specification for our functions or programs, we can now answer the question of what we mean when we say that a program is correct. As you might have guessed, we say that a program is correct when it adheres to the specification we have given for that program. It's as simple as that! It also relates to our intuitive notion of correctness: we would say that a program is correct when it behaves exactly like it is supposed to. A specification describes this behavior.

This naturally implies that the program is free of bugs! ...well, while being true, this is kind of inaccurate. The kinds of bugs that get ruled out depend on the kind of behavior that can be expressed by our specification language. One specification language could be great at allowing you to express concurrency behavior, but not have support for modeling memory access behavior. The fact that our program is correct against a specification written in such a specification language means that it is free of concurrency bugs, which is still great if that's what we're looking for! But it means nothing in terms of memory safety.

Finally, we can go back to our main topic. We stated at the beginning of this section that formal verification is the process of verifying (i.e. checking) that a program adheres to the formal specification we have given for that program. With everything we have introduced so far, we can now make sense of this definition.

Here is what a general process of formal verification looks like:

Using formal verification

Now, how can I as a developer make use of this? You might have already been using a form of verification all this time without realizing it! If you have developed programs in a statically-typed language such as Java, C/C++, C#, Go, Rust or Haskell, you have used one of the most common and useful verification tools out there: static type checking. With static types, you are describing the expected behavior of your program when it comes to data: if an argument is tagged as a string, then it must be treated as a string or it will result in a compiler error (meaning that your program is not correct). The specification for your program is comprised of the type annotations.

float findArea(float x, float y) { ... }

In this example of a function definition in C, we have just provided a specification for how you expect your function to behave:

The findArea function receives two floating-point numbers as arguments and returns another floating-point number.

Admittedly, this specification is fairly limited in expressing the behavior of our function: it says nothing about what the function does, and there are many other functions that we could come up with that would fit in that description. But it still does a very good job of ruling out potential bugs when calling the findArea function with incorrect data (e.g. a string argument). The type systems found in Java or C are not particularly expressive, but you can find type systems like the ones found in Haskell or Rust that allow you to express much more nuanced properties of the kinds of data that your program can accept. Type systems keep being researched and taken even further to check for the correctness of other useful properties such as concurrency (e.g. session types) or memory access (e.g. linear types).

However, as we showed in the examples earlier, when relying on verification we usually want to have ways to describe in more detail how our program behaves. It's possible to find specification languages and tools that integrate tightly with a given programming language, allowing us to express formal properties as annotations in the source code. For example, Java Modelling Language (JML):

public static final int MAX_BALANCE = 1000; 
private /*@ spec_public @*/ int balance;

//@ requires 0 < amount && amount + balance < MAX_BALANCE;
//@ assignable balance;
//@ ensures balance == \old(balance) + amount;
public void addToAccount(int amount) { balance += amount; }

The formal specification in this case is given by the annotations just above the function. The required annotation is a precondition, a property that must hold right before the addToAccount function is called. The ensures annotation is a postcondition, a property that must hold right after the addToAccount function is called and has finished executing. This fashion of specifying a function's behavior is called design-by-contract.

The useful thing about functions that use design-by-contract is that if we are able to guarantee that when calling the function the stated precondition holds, then we can safely assume that the stated postcondition will hold when the function completes its execution. This is possible because a design-by-contract verification tool will try to ensure that the postcondition holds by formally reasoning about the implementation of the function. Otherwise, the verification tool will raise an error stating that the function is not correct! We will need to go back to the function and see how the implementation is violating the postcondition.

Other formal verification tools are independent of particular programming languages and are instead focused on verifying an abstract model of our program's behavior. A couple of examples:

TLA+ is a formal specification language with a complete toolchain and IDE, with a great focus on concurrent and distributed systems.
Coq is an interactive theorem prover that is also used to verify algorithms and programs. It provides a formal specification language called Gallina.

These kinds of tools can be extremely powerful if used properly, although admittedly they can also be pretty hard to use for people less familiar with formal verification. They can be used to verify very complex programs. For example, CompCert C is a verified compiler for C that can provide extra guarantees that there won't be bugs generated during compilation (compilers are also programs created by humans, and can have bugs!), intended for use in life-critical software. CompCert C is primarily written and verified in Coq.

Conclusions

Formal verification is not a substitute for testing. Given that it's noticeably harder and more time-consuming to ensure correctness for a program when compared to writing some relevant (even if they are not 100% exhaustive) test cases, it still makes more sense to simply rely on testing when deadlines are tight and what you care about is catching the most prominent bugs that could appear in your software.

However, we have seen that there are times when tests are not enough to ensure that our programs behave exactly as we expect them to. When we need such a high degree of confidence in our software, formal verification can help us accomplish that.

To get started with formal verification, you can try out one of the tools we mentioned earlier, or you can look up on the Internet what's currently available for a programming language of your choice. If your chosen programming language is mainstream enough, chances are there will be at least one tool to perform some degree of formal verification in your programs.

Increasing confidence in your software with formal verification

Increasing confidence in WebAssembly code with formal verification

Diving into Property-Based Testing with JS - Part 3