Laziness

Most mainstream functional languages employ an eager (a.k.a. strict) evaluation strategy, where an expression is evaluated entirely even if its resulting value is not needed or only parts of it are needed. As we will see, there are sometimes advantages in lazily evaluating certain expressions. There are two important aspects of lazy evaluation:

1. suspending (a.k.a delaying) a computation until its result is actually needed (a.k.a demanded or forced); and

2. memoizing (a.k.a caching) the result of a suspended computation in case its value is demanded again.

Many eager languages, including Elm, offer additional language constructs for selectively introducing laziness. We will work through two example encodings — natural numbers and streams — that motivate and illustrate lazy evaluation.

Natural Numbers

We will work through a few encodings of natural numbers and define some simple operations on them.

First Version — `Nat.elm`

We start by inductively defining `Nat`ural numbers to be either `Z`ero or the `S`uccessor of some other `Nat`ural number.

``type Nat = Z | S Nat``

Next, let's define functions `toInt` and `fromInt` to convert `Nat`ural numbers to and from `Int`egers.

``````fromInt : Int -> Nat
fromInt n =
if n <= 0
then Z
else S (fromInt (n-1))``````

If we take `fromInt` for a spin, we see that it busts the stack rather quickly.

``````> fromInt 0
Z : Nat.Nat

> fromInt 1
S Z : Nat.Nat

> fromInt 10
S (S (S (S (S (S (S (S (S (S Z))))))))) : Nat.Nat

> fromInt 10000
RangeError: Maximum call stack size exceeded``````

Ah, right, we should define it tail-recursively so that it runs in constant stack space.

``````fromInt n =
let foo acc n =
if n <= 0
then acc
else foo (S acc) (n-1)
in foo Z n``````

That should do the trick...

``````> fromInt 10000
RangeError: Maximum call stack size exceeded``````

Or not. Okay, it's time for a little more investigative journalism. We could fire up Elm Reactor to start debugging. Or we can be lazy (pun intended) and continue to poke around at the REPL.

``````> let _ = fromInt 10000 in ()
() : ()``````

That's interesting. The call to `fromInt` was not the problem. So the act of printing the resulting `Nat` causes the stack overflow? Let's write our own tail-recursive printing function to test this hypothesis.

``````strNat : Nat -> String
strNat n =
let foo acc n = case n of
Z    -> acc
S n' -> foo ("S" ++ acc) n'
in foo "Z" n``````

Sure enough, that does the trick... until we run out of heap space, that is.

``````> fromInt 10000 |> strNat
" ... SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSZ" : String

> fromInt 10000000 |> strNat
" ... SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSZ" : String

> fromInt 100000000 |> strNat
FATAL ERROR: JS Allocation failed - process out of memory``````

Okay, now that we have sorted out our call stack and heap space concerns, let's return to the task of programming with `Nat`s. First, a bit of refactoring:

``````foldl : (b -> b) -> b -> Nat -> b
foldl f acc n =
case n of
Z    -> acc
S n' -> foldl f (f acc) n'

strNat = foldl ((++) "S") "Z"``````

And a function to convert `Nat`s to `Int`s:

``````toInt : Nat -> Int
toInt = foldl ((+) 1) 0``````

We can add two `Nat`s together by peeling `S`uccessor labels off of `y` one at a time and wrapping them around `x`.

``````plus : Nat -> Nat -> Nat
plus x y =
let foo acc n = case n of
Z    -> acc
S n' -> foo (S acc) n'
in foo x y``````

Or:

``plus x y = foldl S y x``

Or, better yet:

``plus = foldl S``

This `plus` function encodes the usual notion of addition for our `Nat` type.

``````> plus (fromInt 0) (fromInt 0) |> strNat
"Z" : String

> plus (fromInt 0) (fromInt 2) |> strNat
"SSZ" : String

> plus (fromInt 10) (fromInt 2) |> strNat
"SSSSSSSSSSSSZ" : String

> plus (fromInt 10) (fromInt 2) |> toInt
12 : Int``````

We can define `eq`uality for `Nat`s by peeling off one data constructor at a time.

``````eqNat : Nat -> Nat -> Bool
eqNat x y =
let foo x y = case (x, y) of
(Z, Z)       -> True
(S x', S y') -> foo x' y'
_            -> False
in foo x y``````

This seems to work just fine...

``````> eqNat (fromInt 0) (fromInt 0)
True : Bool

> eqNat (fromInt 0) (fromInt 2)
False : Bool

> eqNat (fromInt 10) (fromInt 2)
False : Bool``````

... but it is really slow for some comparisons that seem like they should be easy to decide quickly.

``````> eqNat (fromInt 10000) (fromInt 10000000)
False : Bool

> eqNat (fromInt 0) (fromInt 10000000)
False : Bool``````

The problem is that, under eager evaluation, both `Nat`ural numbers are evaluated completely before calling `eqNat`, which then very quickly decides the last two disequalities.

Delaying Evaluation with Thunks — `ThunkNat.elm`

A common approach to delaying the evaluation of an expression `e` of type `a` in an eager language is to define a function `\() -> e`, called a thunk, that waits for a dummy argument before evaluating the expression.

We will port the implementations above in order to delay computing the representations of natural numbers. In our new representation of `Nat`s, a `S`uccessor value stores the delayed computation of the `Nat` that it succeeds. The `force` function is used to evaluate a suspended computation.

``````type Nat = Z | S (Thunk Nat)

type alias Thunk a = () -> a

force : Thunk a -> a
force thunk = thunk ()``````

Note that implementing a function like the following is not a good idea, because a call to `delay` will force its argument to be evaluated!

``````delay : a -> Thunk a
delay e = \() -> e``````

To implement `fromInt`, we no longer need to implement a (tail-recursive) helper function because there are no direct recursive calls; instead, the latter case immediately returns a `S`uccessor value (which may some time later lead to a call to `fromInt`).

``````fromInt n =
if n <= 0
then Z
else S (\_ -> fromInt (n-1))``````

Notice that our new representation of non-`Z`ero numbers is quite different from before:

``````> import Nat as N
> import ThunkNat exposing (..)

> N.fromInt 10
S (S (S (S (S (S (S (S (S (S Z))))))))) : Nat.Nat

> fromInt 10
S <function> : ThunkNat.Nat``````

Unlike `fromInt`, `toInt` does need to make recursive calls immediately, because the resulting type (`Int`) does not have the notion of delayed computation built in to its representation. Therefore, we will want to employ the tail-recursive helper strategy.

``````toInt n =
let foo acc n = case n of
Z    -> acc
S n' -> foo (1 + acc) (force n')
in foo 0 n``````

As before, `fromInt` and `toInt` are inverses:

``````> fromInt 100000000 |> toInt
100000000 : Int``````

Notice how `toInt` uses `force` to evaluate all of the nested suspensions that are stored within a `Nat`. A function like this is called monolithic, whereas a function like `fromInt` is called incremental because it does not trigger the evaluation of all nested suspensions.

Another example of a monolithic function is `strNat`.

``````strNat n =
let foo acc n = case n of
Z    -> acc
S n' -> foo ("S" ++ acc) (force n')
in foo "Z" n``````

However, this function is no longer needed for its original purpose above, because printing the representation of a `S`uccessor value is now very quick.

``````> fromInt 10000
S <function> : ThunkNat.Nat

> fromInt 10000 |> strNat
" ... SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSZ" : String``````

We can now return to our motivation for delaying the evaluation of `Nat`s.

``````eqNat x y =
let foo x y = case (x, y) of
(Z, Z)       -> True
(S x', S y') -> foo (force x') (force y')
(_, _)       -> False
in foo x y``````

When `x` and `y` represent the same number, `eqNat` evaluates all of the nested suspensions in both `x` and `y`. Otherwise, it evaluates only enough of their delayed representations in order to demonstrate a difference in corresponding data constructors. As a result, all of the following comparisons run quickly, unlike with our original `Nat.elm` implementation.

``````> fromInt 0 `eqNat` fromInt 0
True : Bool

> fromInt 0 `eqNat` fromInt 2
False : Bool

> fromInt 10 `eqNat` fromInt 2
False : Bool

> fromInt 10000 `eqNat` fromInt 10000000
False : Bool

> fromInt 0 `eqNat` fromInt 10000000
False : Bool``````

We finish porting the original module with the following incremental implementation of `plus`.

``````plus x y = case y of
Z    -> x
S y' -> S (\_ -> plus x (force y'))``````

As a result, the following comparison evaluates quickly.

``````> fromInt 0 `eqNat` (fromInt 10000000 `plus` fromInt 10000000)
False : Bool``````

Digression

What happens if we ask Elm to compare `Nat`s using built-in equality?

``````> fromInt 0 == fromInt 0
True : Bool

> fromInt 0 == fromInt 1
False : Bool

> fromInt 1 == fromInt 2
Error: Equality error: general function equality is undecidable,
and therefore, unsupported``````

Elm throws a run-time error when trying to compare two different function values. Fair enough. Notice that "physical equality" between function values is supported, however.

``````> fromInt 1 == fromInt 1
Error: Equality error: general function equality is undecidable,
and therefore, unsupported

> let foo = fromInt 1 in foo == foo
True : Bool``````

Memoizing Thunks — `LazyNat.elm`

Defining suspensions and incremental functions can be really valuable techniques, but there's no free lunch. The representation of a thunk is a closure, which is a function to evaluate along with bindings for all free variables referred to by the function. Delaying computations willy nilly, then, can lead to a huge number of these closures building up. So one should restrict the use of thunks to situations where the benefits of being able to define incremental functions outweights the overheads associated with delayed computations.

Another concern is that the same delayed computation may be demanded more than once. If the computation takes significant resources to evaluate, then redoing the work every time is undesirable. In a pure language with only strict evaluation, there is no recourse: every time a thunk is forced, it must be re-evaluated. As a result, many strict languages offer special-purpose constructs for manipulating delayed computations with the guarantee that the result of forcing a delayed computation is cached in case it is forced again in the future. The term lazy evaluation is often used to describe support for delayed computations with the guarantee of evaluating any such computation at most once.

In Elm, the `Lazy` library provides support for lazy evaluation. (`Lazy` is a community library, so an `elm-package.json` file is required to declare this dependency.) The `lazy` function turns a `Thunk a` into a `Lazy a` value, which `force` evaluates, reusing the result of any previous call to `force`.

``````lazy  : (() -> a) -> Lazy a
force : Lazy a -> a``````

The native JavaScript implementation of `lazy` uses a mutable variable (called `isForced`) to track whether the particular thunk has been evaluated and a mutable variable (called `value`) to store this result.

It is simple to port `ThunkNat.elm` to use the `Lazy` library in order to obtain the benefits of memoization. First, we redefine the type of `Nat` as follows.

``type Nat = Z | S (Lazy Nat)``

Then, we sprinkle a call to `lazy` in front of every thunked value. The resulting implementation can be found in `LazyNat.elm`. (Use `diff` or `vimdiff` to see how similar the two files are.)

Using this implementation, we now expect that `force`-ing an expensive suspension for the second time should be practically instantaneous. As we discussed above, the worst case for `eqNat` is when both its arguments are equal. So let's use a call to `eqNat` as an example of a slow computation.

``````> import LazyNat exposing (..)

> foo i = fromInt i `eqNat` fromInt i
<function> : Int -> Bool

> foo 100000
True : Bool

> foo 1000000
True : Bool``````

The last operation above is quite slow. So, we should be able to delay its evaluation, `force` and memoize its result, and reevaluate it a second time nearly instantaneously. But the second `force` is just as slow as the first!

``````> slow = lazy (\_ -> foo 1000000)
Lazy <function> : Lazy.Lazy Bool

> force slow
True : Bool

> force slow   -- still slow... :-(
True : Bool``````

Good thing we still have our investigative journalist hats on.

``````> force slow
True : Bool

> (force slow, force slow, force slow, force slow, force slow)
(True,True,True,True,True) : ( Bool, Bool, Bool, Bool, Bool )``````

These two expressions require about the same amount of time to evaluate, which suggests that caching is kicking in for the latter case. So it appears that the native memo tables do not persist across REPL operations.

Optional Exercise — Write an Elm program that measures the time it takes to evaluate the previous two expressions (for example, using `Time.fps`).

Streams

A common data structure that incorporates laziness is a stream (a.k.a. lazy list). Having worked through laziness in Elm in detail using the previous examples, our discussion of streams here will be brief, mainly focusing on picking the right representation.

First Attempt — `NotSoLazyList.elm`

One possibility for representing `LazyList`s is the following type.

``````type LazyList a
= Nil
| Cons (Lazy a) (LazyList a)``````

This datatype describes lists that are not very lazy, however. We can define a function `range : Int -> Int -> LazyList Int` and demonstrate how a `LazyList` of n elements immediately builds n `Cons` cells.

``````> range 1 10
Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>)
(Cons (Lazy <function>) Nil)))))))))
: NotSoLazyList.LazyList Int``````

Second Attempt — `PrettyLazyList.elm`

Another option is the following.

``````type LazyList a
= Nil
| Cons a (Lazy (LazyList a))``````

This is pretty good, but notice that a non-`Nil` list must have its first value evaluated. Consider what the representation of a `range` of `Int`s looks like.

``````> range 1 10
Cons 1 (Lazy <function>) : PrettyLazyList.LazyList Int``````

Final Attempt — `LazyList.elm`

What we really want is for all elements in the list, including the first, to be delayed until needed. We can achieve this as follows.

``````type alias LazyList a = Lazy (LazyListCell a)

type LazyListCell a
= Nil
| Cons a (LazyList a)``````

Thought Exercise: Why didn't we use a similar strategy in defining the the lazy `Nat`s before?

The `range` function is incremental. Notice the trivial suspension `lazy (\_ -> Nil)`.

``````range : Int -> Int -> LazyList Int
range i j =
if i > j
then lazy (\_ -> Nil)
else lazy (\_ -> Cons i (range (i+1) j))``````

We can also describe infinite streams.

``````infinite : Int -> LazyList Int
infinite i = lazy (\_ -> Cons i (infinite (i+1)))``````

The `take` function is incremental.

``````take : Int -> LazyList a -> LazyList a
take k l = case (k, force l) of
(0, _)         -> lazy (\_ -> Nil)
(_, Nil)       -> lazy (\_ -> Nil)
(_, Cons x xs) -> lazy (\_ -> Cons x (take (k-1) xs))``````

A lazier version of `take`:

``````take k l =
if k == 0
then lazy (\_ -> Nil)
else case force l of
Nil       -> lazy (\_ -> Nil)
Cons x xs -> lazy (\_ -> Cons x (take (k-1) xs))``````

Incremental function in action:

``````> infinite 1
Lazy <function> : Lazy.Lazy (LazyList.LazyListCell Int)

> infinite 1 |> take 10
Lazy <function> : Lazy.Lazy (LazyList.LazyListCell Int)

> infinite 1 |> take 10 |> toList
[1,2,3,4,5,6,7,8,9,10] : List Int``````

Converting a stream to a `List` is monolithic:

``````toList : LazyList a -> List a
toList l =
let foo acc l = case force l of
Nil       -> acc
Cons x xs -> foo (x::acc) xs
in
List.reverse <| foo [] l``````

The `drop` function is also monolithic, but does not necessarily force every suspension in the stream.

``````drop : Int -> LazyList a -> LazyList a
drop k l =
let foo k l =
if k == 0 then l
else case force l of
Nil       -> lazy (\_ -> Nil)
Cons _ xs -> foo (k-1) xs
in
foo k l``````

For example:

``````> infinite 1 |> drop 10 |> take 10 |> toList
[11,12,13,14,15,16,17,18,19,20] : List Int``````

Can make `drop` slightly lazier by delaying the initial recursive call:

``````  ...
in
lazy (\_ -> force (foo k l))``````

Combining two streams using `append` is incremental.

``````append : LazyList a -> LazyList a -> LazyList a
append xs ys = case force xs of
Nil        -> ys
Cons x xs' -> lazy (\_ -> Cons x (append xs' ys))``````

Reversing a stream is monolithic. Notice that `lazy (\_ -> Cons x acc)` is another example of a trivial thunk. The values `x` and `acc` have already been evaluated, so building the `Cons` value does not force any additional computations.

``````reverse : LazyList a -> LazyList a
reverse l =
let foo acc xs = case force xs of
Nil        -> acc
Cons x xs' -> foo (lazy (\_ -> Cons x acc)) xs'
in
foo (lazy (\_ -> Nil)) l``````

As with `drop`, we can delay the initial recursive call:

``````  ...
in
lazy (\_ -> force (foo (lazy (\_ -> Nil)) xs))``````

(UPDATE 3/3: See `lazylist.ml` for an implementation in OCaml; the `reverse` function runs as intended.)

Our final monolithic example function checks for equality.

``````eq : LazyList a -> LazyList a -> Bool
eq x y =
let foo x y = case (force x, force y) of
(Cons x xs, Cons y ys) -> if x == y then foo xs ys else False
(Nil, Nil)             -> True
_                      -> False
in
foo x y``````

As with equality on `Nat`s, our implementation of equality for `LazyList`s can decide disequalities more quickly than for regular `List`s.

``````> [] == [1..10000000]
False : Bool

> [] == [1..100000000]
FATAL ERROR: JS Allocation failed - process out of memory

> import LazyList exposing (..)

> range 1 0 `eq` range 1 1000000
False : Bool

> range 1 0 `eq` range 1 10000000
False : Bool

> range 1 0 `eq` range 1 1000000000000000
False : Bool

> range 1 0 `eq` infinite 1
False : Bool``````