First published: February 1, 2017

by Mark Karpov

tags: haskell

Tested with:

- Resolver: LTS Haskell 7.16 (ghc-8.0.1)

This tutorial will get you up to speed with GHC generics quickly. It should be noted that generics is not something academic and useless, quite the contrary, it's a very pragmatic way to reduce the amount of boilerplate (and associated with it errors) in your functional code with minimal mental overhead. In fact, by the time you get to the end of the tutorial, I hope you'll agree that generics are easy to use and easy “to get” as well. The tutorial is for intermediate-level Haskellers.

Let's start from the beginning. What are generics? **Generics**, like in other languages, **are a way to use the same code to manipulate different data types**. We should note that in this regard the technique is very close to the much more common notion of polymorphism, which in Haskell comes in two flavors:

**Parametric polymorphism**, when we have type variables in functions/data types. This allows the same function, such as`const`

to work with any types of arguments, as long as the more general types from a function's signature can be made equal to concrete types we want to work with by instantiating/fixing the type variables.**Ad-hoc polymorphism**, which allows us to perform a computation abstracted over instances of one or more type classes. We request that a type has some properties of interest and then describe the computation in terms of these properties. The code is then applicable to any data type that has the properties.

How are generics different? Generics allow us to describe transformation that works in terms of not concrete types, but general combinators that describe shape and meta information about a data type. This way, we can declare how to perform a computation on almost any data type!

Haskell's features that open the possibility for such a thing as generics are type classes and ad-hoc polymorphism. The ability to describe a data type in terms of a set of combinators is our property, encapsulated by the `Generic`

type class. A data type that is an instance of the type class can be processed by functions that work with a representation of data type, not the data type itself. These functions, however, are by definition polymorphic and also usually hidden behind a type class interface, so the whole thing usually takes the form of automatic deriving of a type class instance.

The process can be described like this:

A given data type gets

`Generic`

instance automatically, as it can be generated by the compiler with the help of the`DeriveGeneric`

language extension.There is a type class instance defined for instances of

`Generic`

(or`Generic1`

). Since our data type is now an instance of those type classes, that generics-based instance starts to work for us and we magically get all the useful methods of that type class.

If you know about `Data`

and `Typeable`

type classes, then you probably know that it's possible to do something similar using information which methods of those type classes provide. `Data`

and `Typeable`

are beyond the scope of this tutorial, but you read about them in this blog post by Chris Done if you're interested.

Step 2 is not as simple as it sounds though, at least not until we examine the details of this business more closely.

What could they look like? Well, if I showed you the data types as they are right now, you would probably run away in fear, cursing the tutorial and Haskell's generics. I have a better idea. Let's start with the simplest thing possible and then iterate asking ourselves how to work around some more interesting use cases that we might need to support. This will force us to make the data types more hairy but also more powerful step by step, till we arrive at the definitions that are actually used.

For better or worse, algebraic data types lock us into a view on the world that consists of sums and products (another Haskell's feature that makes the existence of generics framework possible; generics do not work with GADTs for example). So we need to be able to represent the following:

Data types without constructors at all: uninhabited types like

`Void`

. This can be described as`data V1`

(note: it has no constructors).Constructors without arguments. Let this be something like this:

`data U1 = U1`

.Sums:

`data (f :+: g) = L1 f | R1 g`

. If we have a sum data type with two alternatives we can represent other sum data types with any number of alternatives using nesting.Products:

`data (f :*: g) = f :*: g`

.

Not bad for a start. Let's try use this representation to derive a `Functor`

instance. Deriving a such an instance means that we should provide the `fmap`

function which looks like this:

`fmap :: (a -> b) -> Rep f -> Rep f`

Here, `Rep f`

maps to the type of generic representation of `f :: * -> *`

. There is a problem though. You see, the functor's inner type that goes from `a`

to `b`

in this example has no place to live in the representation! Our current approach will work only for type classes that work with things of kind `*`

, such as `Show`

, but if we want to use this system to derive a `Functor`

instance, we need to allow it to work with kind `* -> *`

.

We want to be able to derive `Functor`

(and similar) instances generically, so we have to fix things up. The solution is to add one more type parameter `p`

to all our combinators:

```
data V1 p
data U1 p = U1
data (f :+: g) p = L1 (f p) | R1 (g p)
data (f :*: g) p = (f p) :*: (g p)
```

This way `fmap`

would be:

```
fmap :: (a -> b) -> Rep f a -> Rep f b
-- p = a p = b
```

But what happens to the type classes that work with `*`

kinds? Here are our choices:

Have a separate set of combinator types for each case (

`*`

and`* -> *`

kinds).Use the most general form (with

`p`

), but for`*`

kinds just treat the extra`p`

parameter as a dummy type index that has no meaning.

The authors of the generic extension went with the second option, and I can't blame them. We'll see that there are a lot of wrappers already and we really should try to keep their number from exploding any further (this may make more sense on the second read).

Let's try to map from a data type to its representation and see if we're still missing something:

```
data Maybe a = Nothing | Just a
-- Interestingly, we could build a representation that works on ‘Maybe a’,
-- that is, a thing of kind *, if we wanted to derive something like ‘Show’.
-- At the same time if we wanted to derive ‘Functor’, we would work with
-- ‘Maybe’ of kind * -> *. This means that there are actually two different
-- possible representations depending on our aim. This is addressed with two
-- different generics type classes, as we will see later.
-- For kind *, things like ‘Show’:
-- type: (U1 :+: ?) p
```

Oh, how to represent `Just a`

? We need to way to allow it have an argument! Let's add the following:

`data Rec0 c p = Rec0 { unRec0 :: c }`

`Rec`

part in the type's name hints that this thing may be possibly recursive.

But I'm lying to you, due to some historical stuff, it's defined a bit differently:

```
type Rec0 = K1 R
newtype K1 i c p = K1 { unK1 :: c } -- c is the value, ‘a’ in ‘Maybe a’
-- ^ ^
-- | |
-- | +-------- dummy p
-- |
-- type-level tag, R or P
```

You see this `R`

type-level tag? There used to be another one, `P`

and this `type Par0 = K1 P`

type synonym, but forget it, it's deprecated. Bottom line: `Rec0`

is used for data constuctor arguments (fields), that **are not** `p`

parameter.

With `Rec0`

, we can finally build representation of `Maybe a`

:

```
-- This is the type of our representation: (U1 :+: Rec0 a) p
-- Examples of values for ‘Maybe Int’:
-- Nothing => L1 U1
-- Just 5 => R1 (K1 5) -- remember where K1 comes from?
-- ^
-- |
-- +--- L1 and R1 are from our representation of sum types
```

Let's derive a different representation that works with `* -> *`

kinds:

`-- Type of our representation: (U1 :+: ?) p`

We need a way to tell if we have an argument of `p`

type (like `a`

in `Functor f => f a`

) or some other type, if we're to write a generic `Functor`

function. For this reason the generics extension uses `Par1 p`

:

`newtype Par1 p = Par1 { unPar1 :: p } -- “par” stands for “parameter”`

`Par1`

is used to mark occurrences of `p`

. Our representation thus becomes:

```
-- The type of our representation: (U1 :+: Par1) p
-- Examples of values for ‘Maybe Int’:
-- Nothing => L1 U1 -- the same
-- Just 5 => R1 (Par1 5)
```

Almost there. The final example is that for lists. Given the standard definition of linked list:

`data List a = Nil | Cons a (List a)`

How do we build its representation (for kind `* -> *`

)? The tricky part is, of course, `List a`

, which is a recursive occurrence of entire functorish part with parameter inside it. If we mark occurrences of parameter by wrapping them with `Par1`

, then why not mark this thing with something too? `Rec1`

shall it be.

```
-- The type of representation: (U1 :+: (Par1 :*: Rec1 List)) p
-- Examples of values for ‘List Int’:
-- Nil => L1 U1
-- Cons 5 Nil => R1 (Par1 5 :*: Rec1 Nil)
-- Cons 5 (Cons 4 Nil) => R1 (Par1 5 :*: Rec1 (Cons 4 Nil))
```

If we had just arguments of a data constructor that are not related to parameter `p`

, plain `Rec0`

(`K1`

) would be used for both first and second arguments of `Cons`

.

And that's it, we have everything we need.

`Generic`

and `Generic1`

type classesType classes that map types to their representations are called `Generic`

(for type classes that work with `*`

kinds) and `Generic1`

(for type classes that work with `* -> *`

kinds). They live in the module called `GHC.Generics`

together with the types used to build the data type representations we've just discussed.

Let's see what these type classes have:

```
class Generic a where
type Rep a :: * -> *
from :: a -> Rep a p
to :: Rep a p -> x
class Generic1 f where
type Rep1 f :: * -> *
from1 :: f p -> Rep1 f p
to1 :: Rep1 f p -> f p
```

`from`

and `from1`

map values of data types to their generic representations. `Rep`

and `Rep1`

are associated type functions (the feature is a part of the `TypeFamilies`

GHC extension) that take the type of data we want to manipulate and yield the type of its representation. Of course, if we want to derive `Functor`

instances, we need a way to go back from representation to actual value of target data type. This is done via `to`

and `to1`

. The good thing about this, of course, that GHC can derive `Generic`

and `Generic1`

for us automatically with `DeriveGeneric`

language extension enabled.

Now let's open GHCi and try to infer `Rep`

of some type:

```
λ> :t (undefined :: Rep (List a) p)
(undefined :: Rep (List a) p)
:: D1
('MetaData "List" "GenericsTutorial" "main" 'False)
(C1 ('MetaCons "Nil" 'PrefixI 'False) U1
:+: C1
('MetaCons "Cons" 'PrefixI 'False)
(S1
('MetaSel
'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
(Rec0 a)
:*: S1
('MetaSel
'Nothing 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy)
(Rec0 (List a))))
p
```

Ah OK, there is just a little bit more to this representation thing…

A representation also has associated metadata. It may look messy and difficult to read, but I'll explain the logic behind it in a moment, it should help to clarify why it's done like this.

First of all, metadata should not get in our way if we don't want to deal with it. Thus all metadata is attached using the same simple wrapper:

```
newtype M1 i c f p = M1 { unM1 :: f p } -- ‘f p’ is what lives inside, U1 for example
-- ^ ^
-- | |
-- | +---- compiler-generated data type that allows us to get meta information
-- |
-- type-level tag, see below
```

The `i`

type-level tag can be one of the three:

`D`

for data type metadata`type D1 = M1 D`

`C`

for constructor metadata`type C1 = M1 C`

`S`

for record selector metadata`type S1 = M1 S`

It should be understandable why metadata is attached this way. If we want to, we can ignore metadata easily, since it's just a wrapper:

`f (M1 x) = f x`

If we want to look at a particular type of metadata we can specify the `i`

type-level tag, or we can leave it unspecified to deal with all metadata at once.

The `c`

type is auto-generated by the compiler and encodes metadata on type level. Why not store the metadata on value level, in `M1`

constructor? Well, if it were there, we would have to provide it if we wanted to generate some values, and providing metadata for already existing data type is certainly something that only the compiler can do properly. With the current approach, a given data type determines the type of its representation, including its metadata, so we don't have to bother.

The last thing to mention about metadata is which wrappers are generated for you. They are very predictable:

The entire representation is wrapped in

`D1`

, which provides datatype-level information (datatype name, module name, and whether it's a`newtype`

).Every constructor is wrapped in

`C1`

, which provides information about constructor (such as constructor name, fixity, and whether it's a record).Every argument of constructor is wrapped with

`S1`

(even if it's not actually a record selector), which tells us the selector name.

Look at the Haddock, to find out names of functions that help extract the metadata. Using them is straightforward: feed the wrapped data to functions like `datatypeName`

and get the information.

`Functor`

derivingOf course, after talking so much about `Functor`

instances and adding the clumsy `p`

parameter to support them, we absolutely must to derive `Functor`

instance for generics! Anyway, `Functor`

instance for generics is already defined in `GHC.Generics`

, so instead of re-implementing it, let's just go through the code:

```
-- If we have a parameter ‘p’, we just map over it, as expected:
instance Functor Par1 where
fmap f (Par1 p) = Par1 (f p)
-- The same with ‘Rec1’ (just use ‘fmap’ because the inner part is a ‘Functor’):
instance Functor f => Functor (Rec1 f) where
fmap f (Rec1 a) = Rec 1 (fmap f a)
-- A constructor without fields only can be returned untouched:
instance Functor U1 where
fmap _ U1 = U1
-- A field that is not ‘p’ parameter should not change:
instance Functor (K1 i c) where
fmap _ (K1 a) = a
-- Metadata has no effect, just unwrap it and continue with the inner value,
-- if the inner value is an instance of ‘Functor’:
instance Functor f => Functor (M1 i c f) where
fmap f (M1 a) = M1 (fmap f a)
-- When we have a sum, we should try to map what we get, provided that it
-- contains something that has ‘Functor’ instance:
instance (Functor l, Functor r) => Functor (l :+: r) where
fmap f (L1 a) = L1 (fmap f a)
fmap f (R1 a) = R1 (fmap f a)
-- The same for products:
instance (Functor a, Functor b) => Functor (a :*: b) where
fmap f (a :*: b) = fmap f a :*: fmap f b
```

I think now we're dangerous enough to implement a simple (and pretty useless) type class that will count constructor fields of a given value. (Yeah, I could go with the venerable `Encode`

example here, but I like to be “original”, so you get a dumber example instead :-D)

The type class looks like this:

```
class CountFields a where
-- | Return number of constuctor fields for a value.
countFields :: a -> Natural
```

We will start by implementing `countFields`

method that works on representations:

```
instance CountFields (V1 p) where
countFields _ = 0
instance CountFields (U1 p) where
countFields _ = 0
instance CountFields (K1 i c p) where
countFields _ = 1
instance CountFields (f p) => CountFields (M1 i c f p) where
countFields (M1 x) = countFields x
instance (CountFields (a p), CountFields (b p)) => CountFields ((a :+: b) p) where
countFields (L1 x) = countFields x
countFields (R1 x) = countFields x
instance (CountFields (a p), CountFields (b p)) => CountFields ((a :*: b) p) where
countFields (a :*: b) = countFields a + countFields b
```

Let's write a single function called, say, `defaultCountFields`

that does the whole thing:

```
defaultCountFields :: (Generic a, CountFields (Rep a)) => a -> Natural
defaultCountFields = countFields . from
```

But here is the catch, the code above does not compile. `CountFields`

has the kind `CountFields :: * -> Constraint`

, but we give it `Rep a`

of kind `* -> *`

. It's not going to take our abuse like this!

The solution, typical with this setting, is to have a helper class that works with things of `* -> *`

kind (this also removes the `p`

parameters from signatures):

```
class CountFields1 f where
countFields1 :: f p -> Natural
defaultCountFields :: (Generic a, CountFields1 (Rep a)) => a -> Natural
defaultCountFields = countFields1 . from
instance CountFields1 V1 where
countFields1 _ = 0
instance CountFields1 U1 where
countFields1 _ = 0
instance CountFields1 (K1 i c) where
countFields1 _ = 1
instance CountFields1 f => CountFields1 (M1 i c f) where
countFields1 (M1 x) = countFields1 x
instance (CountFields1 a, CountFields1 b) => CountFields1 (a :+: b) where
countFields1 (L1 x) = countFields1 x
countFields1 (R1 x) = countFields1 x
instance (CountFields1 a, CountFields1 b) => CountFields1 (a :*: b) where
countFields1 (a :*: b) = countFields1 a + countFields1 b
```

You could notice that some data types like `Par0`

, `Rec1`

didn't get their definitions. This is OK because we work with `Generic`

, not `Generic1`

here, but omitting definitions for some data types that generic representations are built from is normal. As `GHC.Generics`

docs say:

If no

`:+:`

instance is given, the function may still work for empty datatypes or datatypes that have a single constructor, but will fail on datatypes with more than one constructor.If no

`:*:`

instance is given, the function may still work for datatypes where each constructor has just zero or one field, in particular for enumeration types.If no

`K1`

instance is given, the function may still work for enumeration types, where no constructor has any fields.If no

`V1`

instance is given, the function may still work for any datatype that is not empty.If no

`U1`

instance is given, the function may still work for any datatype where each constructor has at least one field.

An `M1`

instance is always required (but it can just ignore the meta-information).

Having dealt with generic implementation of the functionality of interest, let's put it all together and use a special GHC extension to allow the user derive type classes without knowing anything about generics.

For a generic implementation to work without user's definition we need to provide it as the default definition. As you have already seen, a generic implementation often involves a `Generic`

constraint and it would be ugly to add it to every type class (on the left hand side of `=>`

) just to make deriving easier. The `default`

keyword, enabled by the `DefaultSignatures`

language extension, allows us to give a different type signature for default implementation of a type class:

```
class Functor f where
fmap :: (a -> b) -> f a -> f b
default fmap :: (Generic1 f, Functor (Rep1 f)) => (a -> b) -> f a -> f b
fmap = to1 . fmap . from1
class CountFields a where
countFields :: a -> Natural
default countFields :: (Generic a, CountFields1 (Rep a)) => a -> Natural
countFields = defaultCountFields
```

This way we can have our cake and eat it too: deriving is easy and otherwise no ugly details are visible!

Generics are a powerful means of automating error-prone and boring defining of type class instances by hand. However, this feature is helpful beyond just that use-case, as with a bit of creativity it allows us to reason about data types generically (for example, we could calculate the number of constructors at compile time), and generate values in a type safe way (useful, for example, in combination with Quick Check's value generation). Finally, there are quite a few very interesting packages that complement or build on top of GHC generics. Once you feel comfortable with vanilla generics, libraries like `generic-sop`

may be of interest.

A set of recommended resources on the topic:

- A generic deriving Mechanism for Haskell (paper)
- Documentation for
`GHC.Generics`

on Hackage - The page on Haskell Wiki
- The
`generics-sop`

package on Hackage - Typeable and Data in Haskell — the blog by Chris Done contains some straightforward examples of generic programming using the
`Typeable`

and`Data`

type class

Thanks for reading this tutorial! If you have any feedback, please join the discussion below, or open issues and pull requests on GitHub.