# Obverse versus Reverse: Benchmarking in Haskell with criterion

In the Obverse and Reverse blog post, we used mathematics to improve the performance of a function that returns the elements of a list in reverse order. More specifically, we used mathematics to get from:

``````obverse :: [a] -> [a]
obverse []     = []
obverse (x:xs) = obverse xs ++ [x]``````

To:

``````reverse :: [a] -> [a]
reverse xs = reverse' xs []
where
reverse' []      ys = ys
reverse' (x:xs') ys = reverse' xs' (x:ys)``````

Additionally, we measured the time it takes for both functions to reverse specific lists using the `clockSomething` function, which uses the clock package as described by Chris Done in Measuring duration in Haskell:

``````clockSomething :: a -> IO ()
clockSomething something = do
start <- getTime Monotonic
void (evaluate something)
end <- getTime Monotonic
fprint (timeSpecs % "\n") start end``````

In particular, we clocked the time it takes to reverse lists of one and ten million consecutive integers, as follows:

``````ghci> clockSomething (obverse [1..1000000] :: [Integer])
320.47 ms
ghci> clockSomething (reverse [1..1000000] :: [Integer])
176.93 ms
ghci> clockSomething (obverse [1..10000000] :: [Integer])
3.38 s
ghci> clockSomething (reverse [1..10000000] :: [Integer])
1.63 s``````

Instead of using lists of consecutive integers, we can use QuickCheck to generate lists of a given length with arbitrary integers:

``````arbitraryIntVectorOf :: Int -> IO [Int]
arbitraryIntVectorOf n = generate (vectorOf n arbitrary)``````

For example:

``````ghci> arbitraryIntVectorOf 10
[5,15,11,6,-22,21,-18,29,10,1]
ghci> arbitraryIntVectorOf 10
[24,-23,-19,15,20,-9,21,-30,-28,5]``````

Now we can generate several lists of ten million random integers and compare the time it takes for both functions to reverse them:

``````main :: IO ()
main = do
putStrLn "obverse versus reverse/10000000..."
putStr   "  obverse/1... "
arbitraryIntVectorOf 10000000 >>= clockSomething . obverse
putStr   "  reverse/1... "
arbitraryIntVectorOf 10000000 >>= clockSomething . reverse
putStr   "  obverse/2... "
arbitraryIntVectorOf 10000000 >>= clockSomething . obverse
putStr   "  reverse/2... "
arbitraryIntVectorOf 10000000 >>= clockSomething . reverse
...``````

And this is just an executable that we can add to a Cabal file as a benchmark:

``````benchmark clock-benchmarks
build-depends:
base,
clock,
formatting >= 6.1.0,
QuickCheck,
reverse,
time
ghc-options:      -Wall
hs-source-dirs:   benchmarks
main-is:          ClockBenchmarks.hs
type:             exitcode-stdio-1.0``````

In order to run `clock-benchmarks`, we use `cabal bench`:

``````\$ cabal bench clock-benchmarks
...
Benchmark clock-benchmarks: RUNNING...
obverse versus reverse/10000000...
obverse/1... 3.63 s
reverse/1... 1.85 s
obverse/2... 4.44 s
reverse/2... 1.33 s
obverse/3... 3.30 s
reverse/3... 2.27 s
obverse/4... 3.22 s
reverse/4... 2.14 s
obverse/5... 3.38 s
reverse/5... 1.32 s
Benchmark clock-benchmarks: FINISH``````

In average, `obverse` and `reverse` took 3.59 and 1.78 seconds to reverse lists of ten million elements, respectively, which confirms that `reverse` beats `obverse` in terms of performance.

Even though clocking things is useful, Haskell provides much more powerful ways to benchmark programs, like Bryan O’Sullivan’s criterion benchmarking library, which has an easy tutorial and complete documentation. Let’s use criterion for comparing the `obverse` and `reverse` functions.

First, we define a function that generates two lists of a given length with arbitrary integers using the `arbitraryIntVectorOf` function:

``````arbitraryIntVectorsOf :: Int -> IO ([Int],[Int])
arbitraryIntVectorsOf n = do
xs <- arbitraryIntVectorOf n
ys <- arbitraryIntVectorOf n
return (xs,ys)``````

Second, we create an executable in which we run `obverse` and `reverse` over random lists of one hundred, one thousand, and ten thousand integers:

``````main :: IO ()
main =
defaultMainWith
(defaultConfig {reportFile = Just "obverse-versus-reverse.html"})
[env (arbitraryIntVectorsOf 100)
(\ ~(xs,ys) ->
bgroup
"obverse versus reverse/100"
[bench "obverse" \$ nf obverse xs
,bench "reverse" \$ nf reverse ys])
,env (arbitraryIntVectorsOf 1000)
(\ ~(xs,ys) ->
bgroup
"obverse versus reverse/1000"
[bench "obverse" \$ nf obverse xs
,bench "reverse" \$ nf reverse ys])
,env (arbitraryIntVectorsOf 10000)
(\ ~(xs,ys) ->
bgroup
"obverse versus reverse/10000"
[bench "obverse" \$ nf obverse xs
,bench "reverse" \$ nf reverse ys])]``````

We use `criterion`’s `defaultMainWith`, which takes a configuration and a list of benchmarks. We use the default configuration, but specify a report file to get a complete report. We have three benchmarks, which correspond to the lists of one hundred, one thousand, and ten thousand elements.

For each benchmark, we use `env`, which creates an environment (in each case, it creates two lists of the given length) and then uses it to produce a benchmark. Each benchmark consists of two benchmarks grouped by `bgroup` and identified by either `"obverse"` or `"reverse"`. The benchmarks apply both functions to the generated lists using `nf` (which evaluates the result to head normal form) taking into account that the result is a lazy structure.

Of course, we can add this executable to a Cabal file, as follows:

``````benchmark criterion-benchmarks
build-depends:    base, criterion, QuickCheck, reverse
ghc-options:      -Wall
hs-source-dirs:   benchmarks
main-is:          CriterionBenchmarks.hs
type:             exitcode-stdio-1.0``````

And run it using `cabal bench`:

``````\$ cabal bench criterion-benchmarks
...
Benchmark criterion-benchmarks: RUNNING...
benchmarking obverse versus reverse/100/obverse
time                 35.82 μs   (35.34 μs .. 36.44 μs)
0.998 R²   (0.998 R² .. 0.999 R²)
mean                 36.06 μs   (35.66 μs .. 36.50 μs)
std dev              1.406 μs   (1.218 μs .. 1.591 μs)
variance introduced by outliers: 43% (moderately inflated)

benchmarking obverse versus reverse/100/reverse
time                 919.8 ns   (912.1 ns .. 930.7 ns)
0.999 R²   (0.998 R² .. 0.999 R²)
mean                 936.5 ns   (925.9 ns .. 949.1 ns)
std dev              40.75 ns   (35.83 ns .. 44.23 ns)
variance introduced by outliers: 60% (severely inflated)

benchmarking obverse versus reverse/1000/obverse
time                 5.994 ms   (5.934 ms .. 6.050 ms)
1.000 R²   (0.999 R² .. 1.000 R²)
mean                 5.987 ms   (5.964 ms .. 6.021 ms)
std dev              81.83 μs   (58.04 μs .. 130.1 μs)

benchmarking obverse versus reverse/1000/reverse
time                 9.551 μs   (9.495 μs .. 9.633 μs)
1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.510 μs   (9.490 μs .. 9.558 μs)
std dev              92.05 ns   (52.81 ns .. 165.5 ns)

benchmarking obverse versus reverse/10000/obverse
time                 1.520 s    (1.499 s .. 1.537 s)
1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.518 s    (1.514 s .. 1.520 s)
std dev              3.648 ms   (0.0 s .. 4.206 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking obverse versus reverse/10000/reverse
time                 133.6 μs   (133.1 μs .. 134.4 μs)
1.000 R²   (0.999 R² .. 1.000 R²)
mean                 134.7 μs   (133.9 μs .. 136.1 μs)
std dev              3.611 μs   (2.359 μs .. 5.243 μs)
variance introduced by outliers: 22% (moderately inflated)

Benchmark criterion-benchmarks: FINISH``````

The output of each benchmark lists the time needed for a single execution of the function being benchmarked, followed by a goodness of fit measure which should lie between 0.99 and 1, the mean execution time, the standard deviation, and an optional warning of variance introduced by outliers, which is moderate in three cases and severe in one (but we’ll ignore that for now and blame “the phase of the moon”).

Again, it’s clear that `reverse` beats `obverse` considering the execution times, but the report file includes charts and a lot of useful information, as well as details about what everything means.

Many thanks to Franklin Chen, who suggested using criterion for comparing the `obverse` and `reverse` functions.