What makes implementing efficient data structures harder in a purely functional setting compared to an imperative one?

The lack of

*imperative*, or*destructive*, updates leads to copying possibly large parts of the data structure, andAll versions of a data structure are expected to be

*persistent*and can be used even after updates.

Any practical compiler for a purely functional language aggressively looks for opportunities to *share* the representation of data that is common to multiple data structures. Nevertheless, space overhead is unavoidable when compared to similar data structures in imperative settings. However, as we will see later in the course, with clever design and a dash of lazy evaluation, many efficient data structures can be designed for purely functional settings. But first, some basics...

The following function updates the element in `xs`

at index `i`

, if one exists, with the value `y`

:

```
update : List a -> Int -> a -> List a
update xs i y =
case (xs, i) of
([], _) -> []
(x::xs', 0) -> y :: xs'
(x::xs', _) -> x :: update xs' (i-1) y
```

As with the `find`

function that walks through a list one element at a time, so too does `update`

by keeping track of how many more elements `i`

to traverse. The worst-case running time of `update`

is *O(n)*, where *n* is the size of `xs`

. Why isn't the worst-case running time infinite, when `i`

is negative?

In addition to running times, we also need to pay attention to the space overheads incurred. That is, how much additional data needs to be represented during and after a call to the function.

In the second case, notice that the output list `y :: xs'`

refers to `xs'`

, which is identical to the tail of the input list. This is an opportunity for the language implementation, which will of course use pointers and pointer manipulation, to share the underlying representation of `xs'`

among the representations of the input and output lists.

In the third case, on the other hand, there is no such opportunity for sharing; the resulting list is built from `x`

and some unknown list that results from the recursive call to `update'`

. As a result, a new memory cell must be allocated for `x`

in the output list and concatenated to the list that results from the recursive call. This case embodies the worst case with respect to space (in addition to time). If `update`

needs to update the last element in the list, then the first *n-1* elements of `xs`

are copied and cons-ed on to the singleton list `y :: []`

. Thus, `update`

allocates *O(n)* space in the worst case.

Another list operation to consider is `append`

:

```
append : List a -> List a -> List a
append xs ys =
case xs of
[] -> ys
x::xs' -> x :: append xs' ys
```

Let *n* be the size of `xs`

and *m* be the size of `ys`

. Notice that the elements of `ys`

are not traversed at all (so its number of elements cannot contribute to execution time) and `ys`

is referred to by name in the first case, where the language implementation can share its representation in the input and output lists. Thus, the worst-case time and space cost of `append`

is *O(n)*.

In practice, it is common to use a single variable *n* to describe the combined size of the lists, or the size of the larger list. In either case, the worst-case time and space costs are still *O(n)*.

Recall the `findBST`

function that searches for an element in a binary-search-ordered tree, making a recursive call only in the subtree where the element can possibly appear. The insertion algorithm that preserves the binary search property is similar:

```
insert : comparable -> Tree comparable -> Tree comparable
insert x t = case t of
Empty -> Node x Empty Empty
Node y left right ->
if x == y then t
else if x < y then Node y (insert x left) right
else {- x > y -} Node y left (insert x right)
```

As with `findBST`

, `insert`

runs in *O(h)* time, where *h* is the height of the tree. The height *h* of a tree *t* is equal to its size *n* in the worst case, where every `Node`

has only one element (that is, *t* is a crooked list). And *h* is *O(n·log(n))* in the best case, when *t* is (roughly) balanced.

Regarding space, the `insert`

function copies into the output tree all of the node values `y`

that are observed until an `Empty`

is reached. But the `right`

subtree in the second case and the `left`

subtree in the third case are common to both the input and output trees, so the implementation has the opportunity to share this representation. Because at most *h* nodes are traversed, `insert`

allocates *O(h)* space in the worst case.

`Persistence.elm`

contains the `List`

and `Tree`

examples above.

- Okasaki, Chapters 1 and 2