Pattern Matching: Wot's... Uh the Deal?

The patch will not be noticeable if the pattern is skilfully matched.

—Idabelle McGlauflin, Handicraft for Girls

Pattern matching is everywhere in Haskell. It's simple, but very interesting: there's more to it than just from left to right and from top to bottom. One way to better understand pattern matching is to answer two questions:

What is the syntax of pattern matching?
What are the semantics of pattern matching?

In order to summarize the different types of patterns for pattern matching, we can study the definitions of three standard Haskell functions: span, unzip, and fromMaybe.

First, let's consider the span function, which splits a list into the longest prefix of elements that satisfy a predicate and the remainder of the list, as defined in GHC.List:

span :: (a -> Bool) -> [a] -> ([a],[a])
span _ xs@[]       = (xs,xs)
span p xs@(x:xs')
  | p x            = let (ys,zs) = span p xs' in (x:ys,zs)
  | otherwise      = ([],xs)

The span function has:

A variable, p, which matches anything and binds p to the value matched against.
A wildcard pattern, _, which matches anything and binds nothing.
Two as patterns, xs@[] and xs@(x:xs'), which match something and bind xs to [] or appropriate values if [] and (x:xs') match something, respectively.

Now, let's take a look at the unzip function, which "transforms a list of pairs into a list of first components and a list of second components," as defined in GHC.List:

unzip :: [(a,b)] -> ([a],[b])
unzip = foldr (\(a,b) ~(as,bs) -> (a:as,b:bs)) ([],[])

The unzip function contains:

A lazy pattern, ~(as,bs), which matches anything, and binds as and bs to appropriate values if (as,bs) matches something, or to undefined otherwise.

Finally, let's consider the fromMaybe function, which takes a default value and a Maybe, and returns the value contained in the Maybe or the default value if there's nothing there, as defined in Data.Maybe:

fromMaybe :: a -> Maybe a -> a
fromMaybe d mx =
  case mx of
    Nothing -> d
    Just x  -> x

The fromMaybe function contains regular patterns inside a case expression. In reality, all patterns are transformed to case expressions, and the (formal) semantics of pattern matching are actually the semantics of case expressions, as described in the Haskell 2010 Language Report.

However, we can study how pattern matching works in terms of patterns other than the ones inside case expressions, that is, the informal semantics of pattern matching.

In short, the informal semantics of pattern matching are as follows:

Patterns are matched against values.
Pattern matching may
succeed,
fail, or
diverge (that is, undefined).
Pattern matching proceeds
from left to right and
from top to bottom.
Patterns can be
irrefutable, which always succeed, even when matched against undefined, or
refutable, which may succeed or fail, but always diverge when matched against undefined.

Additionally, there are five different types of irrefutable patterns: variables, wildcards, as patterns where the actual pattern is irrefutable, and lazy patterns¹. All other patterns (for instance, [] and Nothing, which are constructors defined by data) are refutable.

(For more information about pattern matching and the (formal and informal) semantics of pattern matching, see section 3.7 of the Haskell 2010 Language Report or section 4 of A Gentle Introduction to Haskell 98.)

There are subtle differences between refutable and irrefutable patterns. As examples, let's analyze two computations taken from the Haskell 2010 Language Report:

> (\ ~(x,y) -> 0) undefined
0

Here, matching the lazy pattern ~(x,y) against undefined succeeds, and binds both x and y to undefined because matching the pattern (x,y) against undefined would diverge. Since binding does not imply evaluation and neither x nor y are used anywhere else, the result of the computation is 0.

What happens if we remove the lazy pattern?

> (\  (x,y) -> 0) undefined
undefined

In this case, matching (x,y) against undefined diverges and, for that reason, the computation diverges, that is, undefined.

> (\  (x:xs) -> x:x:xs) undefined
undefined

Here, the computation diverges because matching (x:xs) against undefined diverges.

What happens if we add a lazy pattern?:

> (\ ~(x:xs) -> x:x:xs) undefined
undefined:undefined:undefined

Matching ~(x:xs) against undefined succeeds. Since matching (x:xs) against undefined would diverge, both x and xs are bound to undefined. Thus, x:x:xs becomes undefined:undefined:undefined, a list which GHCi would start printing and stop after the first undefined, that is, [undefined.

In section 4.2 of A Gentle Introduction to Haskell 98, the authors discuss two slightly different versions of the take function to show how subtle changes in patterns yield different functions.

To wrap up, let's compare two slightly different versions of the drop function:

drop1 :: Int -> [a] -> [a]
drop1 n xs     | n <= 0 = xs
drop1 _ []              = []
drop1 n (x:xs)          = drop1 (n - 1) xs

drop2 :: Int -> [a] -> [a]
drop2 _ []              = []
drop2 n xs     | n <= 0 = xs
drop2 n (x:xs)          = drop2 (n - 1) xs

(drop1 is just drop as defined in GHC.List.)

What happens if the first argument of drop is undefined?

> drop1 undefined []
undefined
> drop2 undefined []
[]

drop2 is more defined with respect to its first argument. In other words, drop1 is strict in its first argument, whereas drop2 is nonstrict².

What happens if the second argument of drop is undefined?

> drop1 0 undefined
undefined
> drop2 0 undefined
undefined

In this case, both functions are equally undefined, that is, both are strict in its second argument.

Even though both functions satisfy the specification of the drop function (they return the suffix of a list after a given number of elements), their meaning is slightly different. As Hudak, Peterson, and Fasel suggest, "it is difficult to say (...) which definition is better. (...) In certain applications, it may make a difference."

This is by no means a thorough examination of pattern matching in Haskell. If anything, these examples show that thinking in terms of the semantics of pattern matching, and, more specifically, in terms of undefined, irrefutable and refutable patterns, and nonstrict and strict functions, is interesting and can really help in better understanding pattern matching and Haskell in general.

Happy pattern matching!

Yes, that's four... Patterns of the form N ... where N is a constructor defined by newtype and the actual pattern is irrefutable are also irrefutable, but patterns of this form are beyond the scope of this blog post. ↩
A function f is strict if f undefined is undefined and nonstrict otherwise. ↩

Pattern Matching: Wot's... Uh the Deal?

Celebrating 10 Years of Our Haskell Training Course

Enhancing Functional Programming in Python with Coconut

My Speaker Experience at JSDay Canarias 2023

Celebrating 10 Years of Our Haskell Training Course

Enhancing Functional Programming in Python with Coconut

My Speaker Experience at JSDay Canarias 2023