Obverse versus Reverse: Benchmarking in Haskell with criterion

In the Obverse and Reverse blog post, we used mathematics to improve the performance of a function that returns the elements of a list in reverse order. More specifically, we used mathematics to get from:

obverse :: [a] -> [a]
obverse []     = []
obverse (x:xs) = obverse xs ++ [x]

To:

reverse :: [a] -> [a]
reverse xs = reverse' xs []
  where
    reverse' []      ys = ys
    reverse' (x:xs') ys = reverse' xs' (x:ys)

Additionally, we measured the time it takes for both functions to reverse specific lists using the clockSomething function, which uses the clock package as described by Chris Done in Measuring duration in Haskell:

clockSomething :: a -> IO ()
clockSomething something = do
  start <- getTime Monotonic
  void (evaluate something)
  end <- getTime Monotonic
  fprint (timeSpecs % "\n") start end

In particular, we clocked the time it takes to reverse lists of one and ten million consecutive integers, as follows:

ghci> clockSomething (obverse [1..1000000] :: [Integer])
320.47 ms
ghci> clockSomething (reverse [1..1000000] :: [Integer])
176.93 ms
ghci> clockSomething (obverse [1..10000000] :: [Integer])
3.38 s
ghci> clockSomething (reverse [1..10000000] :: [Integer])
1.63 s

Instead of using lists of consecutive integers, we can use QuickCheck to generate lists of a given length with arbitrary integers:

arbitraryIntVectorOf :: Int -> IO [Int]
arbitraryIntVectorOf n = generate (vectorOf n arbitrary)

For example:

ghci> arbitraryIntVectorOf 10
[5,15,11,6,-22,21,-18,29,10,1]
ghci> arbitraryIntVectorOf 10
[24,-23,-19,15,20,-9,21,-30,-28,5]

Now we can generate several lists of ten million random integers and compare the time it takes for both functions to reverse them:

main :: IO ()
main = do
  putStrLn "obverse versus reverse/10000000..."
  putStr   "  obverse/1... "
  arbitraryIntVectorOf 10000000 >>= clockSomething . obverse
  putStr   "  reverse/1... "
  arbitraryIntVectorOf 10000000 >>= clockSomething . reverse
  putStr   "  obverse/2... "
  arbitraryIntVectorOf 10000000 >>= clockSomething . obverse
  putStr   "  reverse/2... "
  arbitraryIntVectorOf 10000000 >>= clockSomething . reverse
  ...

And this is just an executable that we can add to a Cabal file as a benchmark:

benchmark clock-benchmarks
  build-depends:
    base,
    clock,
    formatting >= 6.1.0,
    QuickCheck,
    reverse,
    time
  default-language: Haskell2010
  ghc-options:      -Wall
  hs-source-dirs:   benchmarks
  main-is:          ClockBenchmarks.hs
  type:             exitcode-stdio-1.0

In order to run clock-benchmarks, we use cabal bench:

$ cabal bench clock-benchmarks
...
Benchmark clock-benchmarks: RUNNING...
obverse versus reverse/10000000...
  obverse/1... 3.63 s
  reverse/1... 1.85 s
  obverse/2... 4.44 s
  reverse/2... 1.33 s
  obverse/3... 3.30 s
  reverse/3... 2.27 s
  obverse/4... 3.22 s
  reverse/4... 2.14 s
  obverse/5... 3.38 s
  reverse/5... 1.32 s
Benchmark clock-benchmarks: FINISH

In average, obverse and reverse took 3.59 and 1.78 seconds to reverse lists of ten million elements, respectively, which confirms that reverse beats obverse in terms of performance.

Even though clocking things is useful, Haskell provides much more powerful ways to benchmark programs, like Bryan O'Sullivan's criterion benchmarking library, which has an easy tutorial and complete documentation. Let's use criterion for comparing the obverse and reverse functions.

First, we define a function that generates two lists of a given length with arbitrary integers using the arbitraryIntVectorOf function:

arbitraryIntVectorsOf :: Int -> IO ([Int],[Int])
arbitraryIntVectorsOf n = do
  xs <- arbitraryIntVectorOf n
  ys <- arbitraryIntVectorOf n
  return (xs,ys)

Second, we create an executable in which we run obverse and reverse over random lists of one hundred, one thousand, and ten thousand integers:

main :: IO ()
main =
  defaultMainWith
    (defaultConfig {reportFile = Just "obverse-versus-reverse.html"})
    [env (arbitraryIntVectorsOf 100)
         (\ ~(xs,ys) ->
            bgroup
              "obverse versus reverse/100"
              [bench "obverse" $ nf obverse xs
              ,bench "reverse" $ nf reverse ys])
    ,env (arbitraryIntVectorsOf 1000)
         (\ ~(xs,ys) ->
            bgroup
              "obverse versus reverse/1000"
              [bench "obverse" $ nf obverse xs
              ,bench "reverse" $ nf reverse ys])
    ,env (arbitraryIntVectorsOf 10000)
         (\ ~(xs,ys) ->
            bgroup
              "obverse versus reverse/10000"
              [bench "obverse" $ nf obverse xs
              ,bench "reverse" $ nf reverse ys])]

We use criterion's defaultMainWith, which takes a configuration and a list of benchmarks. We use the default configuration, but specify a report file to get a complete report. We have three benchmarks, which correspond to the lists of one hundred, one thousand, and ten thousand elements.

For each benchmark, we use env, which creates an environment (in each case, it creates two lists of the given length) and then uses it to produce a benchmark. Each benchmark consists of two benchmarks grouped by bgroup and identified by either "obverse" or "reverse". The benchmarks apply both functions to the generated lists using nf (which evaluates the result to head normal form) taking into account that the result is a lazy structure.

Of course, we can add this executable to a Cabal file, as follows:

benchmark criterion-benchmarks
  build-depends:    base, criterion, QuickCheck, reverse
  default-language: Haskell2010
  ghc-options:      -Wall
  hs-source-dirs:   benchmarks
  main-is:          CriterionBenchmarks.hs
  type:             exitcode-stdio-1.0

And run it using cabal bench:

$ cabal bench criterion-benchmarks
...
Benchmark criterion-benchmarks: RUNNING...
benchmarking obverse versus reverse/100/obverse
time                 35.82 μs   (35.34 μs .. 36.44 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 36.06 μs   (35.66 μs .. 36.50 μs)
std dev              1.406 μs   (1.218 μs .. 1.591 μs)
variance introduced by outliers: 43% (moderately inflated)

benchmarking obverse versus reverse/100/reverse
time                 919.8 ns   (912.1 ns .. 930.7 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 936.5 ns   (925.9 ns .. 949.1 ns)
std dev              40.75 ns   (35.83 ns .. 44.23 ns)
variance introduced by outliers: 60% (severely inflated)

benchmarking obverse versus reverse/1000/obverse
time                 5.994 ms   (5.934 ms .. 6.050 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 5.987 ms   (5.964 ms .. 6.021 ms)
std dev              81.83 μs   (58.04 μs .. 130.1 μs)

benchmarking obverse versus reverse/1000/reverse
time                 9.551 μs   (9.495 μs .. 9.633 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.510 μs   (9.490 μs .. 9.558 μs)
std dev              92.05 ns   (52.81 ns .. 165.5 ns)

benchmarking obverse versus reverse/10000/obverse
time                 1.520 s    (1.499 s .. 1.537 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.518 s    (1.514 s .. 1.520 s)
std dev              3.648 ms   (0.0 s .. 4.206 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking obverse versus reverse/10000/reverse
time                 133.6 μs   (133.1 μs .. 134.4 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 134.7 μs   (133.9 μs .. 136.1 μs)
std dev              3.611 μs   (2.359 μs .. 5.243 μs)
variance introduced by outliers: 22% (moderately inflated)

Benchmark criterion-benchmarks: FINISH

The output of each benchmark lists the time needed for a single execution of the function being benchmarked, followed by a goodness of fit measure which should lie between 0.99 and 1, the mean execution time, the standard deviation, and an optional warning of variance introduced by outliers, which is moderate in three cases and severe in one (but we'll ignore that for now and blame "the phase of the moon").

Again, it's clear that reverse beats obverse considering the execution times, but the report file includes charts and a lot of useful information, as well as details about what everything means.

Many thanks to Franklin Chen, who suggested using criterion for comparing the obverse and reverse functions.

Obverse versus Reverse: Benchmarking in Haskell with criterion

Hiring for Haskell? Here's What You Need to Know

Celebrating 10 Years of Our Haskell Training Course

Announcing semantic-release-hackage, a semantic-release plugin to publish packages in Hackage

Hiring for Haskell? Here's What You Need to Know

Celebrating 10 Years of Our Haskell Training Course

Announcing semantic-release-hackage, a semantic-release plugin to publish packages in Hackage