# Ghosts in the machine

This is the second (and final) part of my examination of the State monad and its variations. If you are unfamiliar with the basic concepts surrounding the State monad, you may wish to read the first part before continuing.

## Backtracking State

One of the features of the `State`

monad worth looking at is how it implements a simple interpreter. Our current interpreter lets us write any sort of procedural code we needed, though it might get rather unwieldy (only one register and one memory location). While interesting, its not especially useful.

Recall my claim that State is the basis of more interesting monads. We are going to look at a modification to the code from `state-m`

that lets us backtrack if the computation fails to the last choice point.

The paragraph above introduces the three new features we need to provide:

- A way to define choice points
- A way to define computation failure
- A mechanism to handle backtracking.

The two former points happily coincide with the interface for `monad plus`

monads. Specifically the function `m-zero`

will represent our failure mode, and `m-plus`

will represent choice points. The last case (backtracking) will be implemented in `m-bind`

and `m-plus`

.

Before we go any further, here is an implementation I have cooked up based on the previously defined `state-m1`

monad.

```
(use 'clojure.contrib.monads)
(defmonad backtrack-state
[m-result (fn [value]
(fn [state]
[value state]))
m-bind (fn [computation func]
(fn [state]
(when-let [[value new-state] (computation state)]
((func value) new-state))))
m-zero (fn [new-state] nil)
m-plus (fn [left right]
(fn [state]
(if-let [result (left state)]
result
(right state))))])
```

Note that `m-plus`

is a function that returns a function with the same type as `m-bind`

and `m-result`

. However, `m-zero`

is strictly a value, which happens to be a function that always returns `nil`

.

`m-zero`

is our first hint at how this `backtrack-state`

is constructed. A failure is `nil`

, a successful computation returns the familiar `[value state]`

pair from the regular `state-m`

.

A small change has been made to `m-bind`

. In particular, where `state-m1`

would just use a let to destructure the result of `(computation state)`

, `backtrack-state`

uses `when-let`

. This form fuses `when`

and `let`

so that when the `let`

s binding-form's source is also the test for `when`

. If the test returns a *truthy* value (in our case not nil) then the body of the form is evaluated and returned. Otherwise `nil`

is returned.

An alternative formulation is presented here for clarity

```
(when (computation state)
(let [[value new-state] (computation state)]
((func value) new-state)))
```

This means that if 'computation' fails the result will be failure without having to evaluate 'func'.

The remaining logic is in `m-plus`

. This function takes two arguments, both potential computations to perform, and returns a state consuming function. `if-let`

is a cousin to `when-let`

, and provides a failure branch rather than just returning nil.

Lastly, the implementations of `get-state`

and `put-state`

are identical to the versions from `state-m1`

To illustrate the expressive power of this new monad we have written, we are going to look at a small library built on top of it. There are two parts to this, first is a basic set of parser tools, and then there is the specific parser we are going to write. At the end we will have a little parser that will take a string representing a number and return a pair of either `:int`

or `:float`

depending if there is a decimal component. If anything else is found, it will fail returning only `nil`

. So, the parser library:

```
(with-monad backtrack-state
(defn run-parser
"run-parser takes a top level parser and a string or seq'able
to run the parser on"
[parser input]
(parser (lazy-seq input)))
;; primatives
(defn get-one []
"Gets the next item from the input and returns it"
(domonad [input (get-state)
_ (put-state (rest input))]
(first input)))
(def eof
(domonad [remaining (get-state)
:when (= (count remaining) 0)]
nil))
;; simple parsers
;;
;; These parser's are built on top of the primatives above
(defn one-satisfying [p]
"This is the most basic matching parser. It tests the next
item in the sequence against the predicate provided. If
true, then it is returned, otherwise fails."
(domonad [one (get-one)
:when (p one)]
one))
(defn one-of [any-of]
"Matches any one item in the provided collection or string"
(one-satisfying (set any-of)))
(defn not-one-of [none-of]
"Matches any one item not in the provided collection or string"
(let [none-of (set none-of)]
(one-satisfying #(not (none-of %)))))
;; combinators
;;
;; These parsers combine other parsers into new parsers
(defn choice [& parsers]
"Choice takes one or more parsers (in order) and returns a new
parser that tries each one in turn until one matches. If
all fail, then choice fails"
(reduce m-plus parsers))
(defn many [p]
"Many matches the same parser 0 or more times until it it fails,
then it returns a sequence of the match results"
(choice (domonad [r p
rs (many p)]
(cons r rs))
(m-result '())))
(defn many1 [p]
"Many1 is like many, but must match at least 1 item"
(domonad [r p
rs (many p)]
(cons r rs)))
(defn optional
"This parser matches 0 or 1 of some parser. Optionally may take
a default result"
([p] (optional p nil))
([p default] (choice p (m-result default)))))
```

If you have never used a parser combinator library before, this might seem a bit strange. Possibly the most surprising thing initially is that there is no lexical analysis step. Next, you have infinite lookahead. If parsing fails, we just fail back too the last `m-plus`

point. Lastly, we build parsers by combining simpler parsers programmatically. The toolkit provides the simplest parsers and the basic parser combinators (parsers that are built by combining other parsers).

To put two parsers into sequence, we can just use `m-bind`

. This means we can leverage monad comprehensions (i.e. the `domonad`

form) to specify that one parser.

Looking at the functions in this little library you could come to the conclusion that its effectively a regular expression engine written verbosely. Certainly we have all the same primitives (one-of, many, many1, optional). The difference is that we have the full power of the host language (in this case Clojure) to call one. We'll come back to this later with a small hypothetical parser.

This library also has one very nice feature: it is completely data agnostic, nowhere in it does it care if its processing a string of characters, a vector of ints or anything else. As long as it can be converted into a sequence, and the items in that sequence are comparable it's good.

One last note about this particular implementation. Besides being tied to our `backtracking-state`

monad it also suffers from a very flawed implementation of `many`

. In particular, our implementation is elegant but relies on recursion. Remember that `domonad`

will transform our code into a heavily nested set of function calls, and you can see that this fundamental combinator would quickly cause our whole parser to blow stack.

So with this basic set of tools in hand, lets look at the specific parser implementation:

```
(with-monad backtrack-state
(def parse-digit (one-of "0123456789"))
(def parse-digits (many1 parse-digit))
(def parse-sign (one-of "+-"))
(def parse-dot (one-of "."))
(def parse-integer
(domonad [digits parse-digits]
[:int digits]))
(def parse-float
(domonad [whole-digits parse-digits
_ parse-dot
decimal-digits parse-digits]
[:float (concat whole-digits "." decimal-digits)]))
(def parse-number
(domonad [sign (optional parse-sign "+")
[type digits] (choice (domonad [int parse-integer
_ eof]
int)
(domonad [float parse-float
_ eof]
float))]
[type (apply str (cons sign digits))])))
```

This parser is very simple. It only has 2 backtracking points; the `optional`

for `parse-sign`

and the `choice`

for digits. Both of these backtracking points only use `m-plus`

indirectly, and you might have noticed that `get-state`

and `put-state`

are never needed outside the primitive `get-one`

.

To run this parser, we'll pass it and an input into `run-parser`

:

```
(map #(run-parser parse-number %) ["123"
""
"123.123"
"0"
"0.0"
"-1"
"1 2"])
;; => ([[:int "+123"] ()] nil [[:float "+123.123"] ()] [[:int "+0"] ()] [[:float "+0.0"] ()] [[:int "-1"] ()] nil)
```

You can see that each string we have parsed has returned a pair or `nil`

. `nil`

indicates a failure (no match), the pair is the complete parsed result containing the result of the parser and the remaining tokens to parse. Because we specified an `eof`

in our `parse-number`

parser we never have any remaining input.

Just as an illustration of how this style of parser is so expressive, here is a hypothetical that parses an xml element:

```
(with-monad backtracking-state
(defn open-tag []
(domonad [_ (one-of "<")
name (elem-name)
attrs (attributes)
_ (one-of ">")]
[name attrs]))
(defn xml-element
[]
(domonad
[[name attrs] (open-tag)
children (many xml-node)
_ (close-tag name)]
[:element name attrs children])))
```

Obviously, this is a very simplified ideal of xml, but you can see how leveraging the host language allows us to provide recursive parsers (`xml-node`

will eventually try `xml-element`

again), and pass information through (e.g. name from `open-tag`

is passed to `close-tag`

).

## Nondeterministic State

After `backtrack-state`

the step to `nondeterministic-state`

is very small. Conceptually, we want to evaluate the computation for ever possible choice we encounter. Once again we will use `monad plus`

to define choice points and failure. Unlike `backtrack-state`

we will need to change the type of data returned by the state machine from a single value (a pair or nil) to a collection of value-state pairs.

The choice of collection proves interesting. There are two obvious candidates: `vector`

and `set`

. The difference is small but significant. A vector will result in *every* result including any duplicates. This will be useful if you are searching for the total possible solutions. However if you only care for every *unique* result then the set makes more sense as it will reduce the total computation in the case of repetition. We will come back to this decision in the last section (Transformations).

Heres the code for a set based `nondetermistic-state`

monad.

```
(use 'clojure.contrib.monads)
(use 'clojure.set)
(defmonad nondeterministic-state
[m-result (fn [value]
(fn [state]
#{[value state]}))
m-bind (fn [computation func]
(fn [state]
(let [results (computation state)
result-sets (map (fn [[value new-state]]
((func value) new-state))
results)]
(apply union result-sets))))
m-zero (fn [new-state] #{})
m-plus (fn [left right]
(fn [state]
(union (left state) (right state))))])
(defn nd-get-state
[]
(fn [state] #{[state state]}))
(defn nd-put-state
[state]
(fn [_] #{[nil state]}))
```

The changes here are all fairly straight forward. Places we added logic for backtracking now find the union of sets, and `m-zero`

returns a empty set rather than `nil`

. The final change is that we have to use a new version of get-state and put-state that handles the set wrapper.

A simple example of this monad in action is the quadratic formula; square root fails for negative numbers, and returns both positive and negative values for any positive number. The follow implementation is a little rough but demonstrates the concept:

```
(defn sqrt [v]
(with-monad nondeterministic-state
(if (> v 0)
(m-plus (m-result (Math/sqrt v)) (m-result (* -1 (Math/sqrt v))))
m-zero)))
(defn quadratic
[a b c]
((domonad nondeterministic-state
[roots (sqrt (- (* b b) (* 4 a c)))]
(/ (+ (- b) roots) (* 2 a))) nil))
```

Here it is in action with some sample inputs:

```
(quadratic 1 20 4)
; => #{[-0.20204102886728847 nil] [-19.79795897113271 nil]}
(quadratic 1 2 4)
; => #{}
```

## Transformations

We have looked at three variations of the State concept. Recall that I mentioned there is a relationship between the simple monads: Identity, Maybe and Set. The type signatures for the Haskell equivalents make this particularly clear

```
newtype State s a = State { runState :: (s -> (a,s)) }
newtype BTState s a = BTState { runBRState :: (s -> Maybe (a,s)) }
newtype NDState s a = NDState { runNDState :: (s -> [(a,s)]) }
-- Note the use of list/seq instead of set, but provides similar behaviour
```

Since we started with `state-m1`

as our base, it is fairly obvious that this is going to relate to the Identity monad. `backtracking-state`

relates to the Maybe monad, and `nondeterminstic-state`

relates to the Set monad, alternatively it is possible to implement a `nondetermistic-state`

on top of a Sequence / List monad.

This idea that one monad can wrap another is interesting. Without going into too much detail, it is possible to write a function that takes a monad type as its argument (the inner monad), and wraps it up in another monad that has been written in terms of the standard monadic operations.

This idea is know as a *monad transformer*. The transformer is the function that transforms (or decorates) the inner monad. The Clojure monad library provides a macro for defining transforms called (unsurprisingly) `monad-transformer`

. We can shortcut all the hard work above with a couple of one liners!

```
(def backtrack-m (state-t maybe-m)) ; you might also see this called parser-m
(def nondeterministic-m (state-t set-m))
```

You could also choose to use `sequence-m`

in place of `set-m`

for the `nondeterministic-m`

. It is important to realise that the order of the wrapping makes a significant difference with regard to the semantics of the generated Monad.

The specific implementation details of creating a transformer are left to the reader to explore if they are interested. Jim Duey's second monad's tutorial covers transformers in more detail. Personally I think this is one area where Clojure's monad support is superior to Haskell's as we don't have to appease such a strict type system. This also makes Haskell based Monad Transformer tutorials a lot more complex.

## Wrapping up

We have now looked at three related monads in some detail, seen how they can be used to write some amazingly expressive code, and encountered the tools to make these sorts of operations much easier to wield.

The parser library provided here has an elegant interface but has a very naïve implementation. Recursion will cause problems with the implementation of `many`

, and the astute reader will have noticed that the design causes it to hold onto the head of the input sequence. To convert it from a generic backtracking monad to a specialized parser monad is an interesting project well worth exploring. Another exercise would be implementing a non-determistic monad without a state component.

Finally, a big thanks to Matt and Steven for their help proof reading these posts.

Discuss this post on the Hacker News thread

#### See Also

*Updated 2011-02-08:* I have written an introduction to `fnparse`

that takes the ideas presented here in a more practical direction.