Skip to the content.

How does Prettyprinter print pretty?

There are many prettyprinter libraries in Haskell, but most are based on Phillip Wadler’s paper A prettier printer.

To learn Haskell libraries I usually peak at the implementation, but Prettyprinter defeated me twice. Partly because printing is rarely the most interesting part of a project. But Wadler’s approach also has some action-at-a-distance which makes it tricky to understand the pieces in isolation. This is a very brief and rough introduction into how Prettyprinter actually works. It is not a guide on how to use it, but after understanding the big ideas it should be easier to read the documentation and source code.

Example:

First a small example to understand what Prettyprinter does. This is a Haskell-style let binding:

pretty (Let binds body)
    = group ("let" <+> align letBinds <> line <> "in" <+> pretty body)
  where
    letBinds = encloseSep open close sep [pretty k <+> "=" <+> pretty v | (k,v) <- lets]
    open = flatAlt "" "{ "
    close = flatAlt "" " }"
    sep = flatAlt "" "; "

Crucially, it can print in two distinct styles:

>>> pretty $ Let [("x", 3), ("y", 5)] (Var "x" * Var "y")

-- if it fits into a single line:
let {x = 3; y = 5} in x * y

-- otherwise:
let x = 3
    y = 5
in x * y

Fancy! But a bit magical. What do the group and align functions do exactly?

The core trick

First let us summarize our goal. We want alternative layouts: The newline version is narrow but long, the flat version is short but wide. The layouting should pick the shortest version which fits some target width.

This means documents are non-deterministic, in the sense that lists are non-deterministic. Can we use lists?

newtype NondetDoc = ND [Doc]
newline = ND [Text " ", Newline]

instance Semigroup NondetDoc where
    (<>) (ND a) (ND b) = ND $ liftA2 (<>) a b

>>> "foo" <> newline <> "bar" <> newline <> "baz"
ND ["foo bar baz", "foo bar\nbaz", "foo\nbar baz", "foo\nbar\nbaz"]

This has two key problems:

The first problem is simple to solve. Using liftA2 multiplies the lists out, generating all combinations in advance. We have all choices on top, and the document content below that. Instead, we interweave choices and document content by embedding them as a new constructor. Same idea as ListT done right!

data Doc = Cat [Doc] -- concatentate documents
         | Text String
         | Alts Doc Doc -- alternative documents
         | Line -- Add a newline
instance Semigroup Doc where
    (<>) a b = Cat [a,b]

Now layouting can perform a greedy search, rather than the depth-first search of the list monad. When layouting Alts a b:

This is linear rather than exponential. First problem solved!

The second problem is a luxury, but a great one to have. We want to synchronize certain choices. For instance, the following layouts should be impossible:

let x = 3
    y = 5 in x * y

let {x = 3; y = 5}
in x * y

Intuitively, we take a sequence of choices like Cat [Alts a b, Alts c d] and turn it into a choice of sequences Alts (Cat [a,c]) (Cat [b,d]). But this is too synchronized. Either the entire output is one line, or nothing is flat.

We can solve this with two ingredients:

data Doc = Cat [Doc] -- concatentate documents
         | Text String
         | Alts Doc Doc -- alternative documents
         | Line -- Add a newline
         | Stop Doc -- Independent group, stop synchronizing

group :: Doc -> Doc
group a = Stop (Alts (shortVersion a) (longVersion a))


longVersion :: Doc -> Doc
longVersion (Stop a) = Stop a
longVersion (Alts long _) = long
longVersion (Cat a) = Cat (map longVersion a)
longVersion a = a

shortVersion :: Doc -> Doc
-- the short version should always be a single line, so keep going
shortVersion (Stop a) = shortVersion a
shortVersion (Alts _ flat) = flat
shortVersion (Cat a) = Cat (map shortVersion a)
shortVersion a = a

Here is an example of how group operates:

newline = Alts (Text " ") Line
instance IsString Doc where
    fromString = Text
    
test = group $ Cat ["foo", newline, "bar", newline, "foobar"]

>>> test
Stop 
  (Alts 
    (Cat ["foo", " ", "bar", " ", "foobar"])
    (Cat ["foo", Line, "bar", Line, "foobar"]))

it is important to note the asymmetry between shortVersion and longVersion. The shortVersion never contains newlines. But in longVersion nested groups are independent.

>>> group (Cat ["baz", newline, test])
Stop
  (Alts 
    (Cat ["baz", " ", shortVersion test])
    (Cat ["foo", Line, test])) -- test could be short or long

Our approach differs slightly from Prettyprinter: Prettyprinter has no Stop constructor. Instead, it has Union and FlatAlt. Union a b is Stop (Alts a b), FlatAlt is Alts a b. Prettyprinter additionally uses a fairly subtle trick: longVersion = id. We know shortVersion will remove all FlatAlt’s. This means all remaining FlatAlt's should be long and we can keep them in the document. The layouting step must work around them, reimplementing our longVersion.

This is clever but it adds an implicit longVersion at the toplevel; without group nothing is flattend. Many operators use group internally, but it’s something to keep in mind.

Indentation

The final piece to the puzzle is indentation. Docs track a current indentation, and each newline adds that many spaces. You can modify the indent with the nest offset innerBlock function. The align function sets the indent to the current column, the hang function to the current column+an offset.

Layouting

Turning a Doc into a something we can consume takes two steps

The first option has other candidates such as layoutSmart. The difference is small: When picking a Union, layoutPretty only checks if the first line of the flat version fits. Meanwhile layoutSmart checks if any future lines are too wide, stopping only when the indentation becomes flat. The latter is a heuristic because most syntax constructs end when indentation becomes flat - presumably to guard against non-linear behaviour.

For consumers we have many options, one for each output type. Thankfully this makes the choice simple. Just search the docs for the type you need. There are some external consumers on hackage, supporting fancy formats such as pandoc or terminals. If you want to dive deep you can add consumer specific annotations to documents, allowing for formatting or even onclick handlers.

The End

As mentioned, this is a pretty rough introduction to Prettyprinter’s internals. We haven’t discussed any of the actually useful combinators. Thankfully, the documentation is solid once you have understood grouping and indentation. Feel free to click view source if the documentation is unclear, most combinators are thin wrappers around the internals we discussed. Good luck with pretty printing!

To prove my point here are the helpers we used in the example:

(<+>) a b = a <> Char ' ' <> b
-- newline, but in `group` try a space instead
line = FlatAlt Line (Char ' ')

encloseSep l r sep ds = l <> mconcat (intersperse sepWithNL ds) <> line' <> r
  where sepWithNL = line' <> sep -- haskell style leading seperators
line' = FlatAlt Line Empty