Let us review the basic mathematical techniques for describing the time- and space-efficiency of algorithms. The "costs" of algorithmic operations are typically specified as functions of (properties of) their inputs. As a result, tools are required to "compare" functions.

*f(n)* is *O(g(n))* if there exist constants *c > 0* and *N > 0* such that for all *n ≥ N*, *f(n) ≤ c·g(n)*. This is read "*f(n)* is Big-O of *g(n)*". Definitions sometimes vary in their uses of *>, ≥, <, and ≤*, but these differences do not matter much.

**Example**

*f*_{1}(n) = n/100*f*_{2}(n) = 100n*f*_{3}(n) = 100n + 23

*f _{1}(n) is O(n)* because

*f*for all_{1}(n) ≤ n*n ≥ 0*.

*f _{2}(n) is O(n)* because

*f*for all_{2}(n) ≤ 100·n*n ≥ 0*.

*f _{3}(n) is O(n)* because

*f*for all_{3}(n) ≤ 101·n*n ≥ 23*, or*f*for all_{3}(n) ≤ 102·n*n ≥ 12*, or*f*for all_{3}(n) ≤ 111·n*n ≥ 3*, or*f*for all_{3}(n) ≤ 200·n*n ≥ 0*, or ...

In computer science settings, *g(n)* is often referred to as Big-Omega of *f(n)* (written *g(n)* is *Ω(f(n))*) if *f(n)* is *O(g(n))*. However, this definition conflicts with other definitions of Big-Omega.

A Big-O relationship specifies an asymptotic upper bound but provides no information about how "tight" the bound is. For example, *n ^{2}* is

It is often useful to establish a lower bound *L(n)* for a function *f(n)* (such that *L(n)* is *O(f(n))*) in addition to an upper bound *U(n)* (such that *f(n)* is *O(U(n))*). When *L = U*, the bound is referred to as tight.

Tight bounds are described by Big-Theta notation. That is, *f(n)* is *Θ(g(n))* if *f(n)* is *O(g(n))* and *g(n)* is *O(f(n))*.

**Optional Exercise** — Prove that for all *i* and *j*, *log _{i}(n)* is

Unless otherwise noted, we write *O(log(n))* as shorthand for *O(log _{2}(n))*. Considering the fact above, this convention is justified with respect to asymptotic growth rates.

**Optional Exercise** — Define an ordering *[f _{1}, f_{2}, f_{3,}...]* of the functions below such that for all

*2*^{n}*n**4**sqrt(n)**n*^{2}+3n*log(n)**n·log(n)**n!**n*^{5}- n^{4}*1.5*^{n}

We will consider a couple simple algorithms that traverse data structures looking for a particular element.

Here is a simple `find`

function that traverses the list `xs`

:

```
find x xs = case xs of
[] -> False
y::xs' -> if x == y then True else find x xs'
```

We want to define a function *T(n)* to describe the time cost of `find`

, where *n* is the size of (i.e. number of elements in) the `xs`

list.

There are three cases to consider:

If

`xs`

is`[]`

, then`find`

returns after some constant*c1*amount of work for the pattern matching. So,*T(n) = c1*.If the head

`y`

of the list equals`x`

, then`find`

returns after some constant*c2*amount of work for the pattern matching and equality test (note that we are assuming that`(==)`

is a constant-time operation, which may not always be a reasonable assumption). So,*T(n) = c2*.Otherwise,

`find`

performs some constant*c3*amount of work for the pattern matching and comparison*and*also recursively calls`find`

with the rest of the list`xs'`

. So,*T(n) = c3 + T(n-1)*.

To compute an upper bound on the running time of `find`

, we must consider the worst case, which means traversing the entire list without finding the desired element `x`

. By unrolling the three cases above for the entire list, we obtain the following recurrence relation below:

```
T(n) = c3 + T(n-1)
T(n-1) = c3 + T(n-2)
T(n-2) = c3 + T(n-3)
...
T(1) = c3 + T(0)
T(0) = c1
```

To compute a closed form solution for *T(n)*, we sum the system of equations and obtain:

` T(n) = n*c3 + c1`

Because, *T(n)* is *O(n)*, we say that `find`

runs in worst-case *O(n)* time.

What about the running time of `find`

if the input list `xs`

is sorted? Even so, the worst case for the function is that `x`

is absent from the list, and the only way to be sure is to traverse all of its elements.

Note that this *linear search* algorithm is guided by the constraint that, in a purely functional language, only the *head* of a list can be accessed at once. In a language with pointers or references, *arbitrary* elements of a list can be accessed in constant-time (such lists are usually called arrays). The cost of the *binary search* algorithm on arrays has a much smaller asymptotic growth rate. (Later, we will see how purely functional languages typically provide an array data structure.)

An analogous lookup function on binary trees:

```
type Tree a = Empty | Node a (Tree a) (Tree a)
findBT x t = case t of
Empty -> False
Node y left right -> x == y || findBT x left || findBT x right
```

Without knowing anything about the structure of the tree, `findBT`

must search the entire tree, by recursively traversing the `left`

and `right`

subtrees in the worst case. If the size of `t`

= `Node y left right`

is *n*, then the combined size of `left`

and `right`

is *n-1*. Therefore, making both recursive calls takes time *T(n-1)* in total. This leads to the following recurrence relation, where we make use of constants *c* and *c'* for operations involved in the worst (requiring both recursive calls) and best (inspecting the `Empty`

tree) cases, respectively.

```
T(n) = c + T(n-1)
T(n-1) = c + T(n-2)
T(n-2) = c + T(n-3)
...
T(1) = c + T(0)
T(0) = c'
-------------------
T(n) = n*c + c'
```

As a result, `findBT`

runs in worst-case *O(n)* time (in fact, in *Θ(n)* time).

A binary tree satisfies the *binary search order* property if for every subtree of the form `Node x left right`

, all `Node`

s in `left`

store values `y`

such that `y < x`

and all `Node`

s in `right`

store values `y`

such that `x < y`

.

Assuming that `t`

satisfies this ordering property, `findBST`

is defined to look *only* in the subtree that contains the desired element `x`

:

```
findBST x t = case t of
Empty -> False
Node y left right ->
if x == y then True
else if x < y then findBST x left
else {- x > y -} findBST x right
```

The recurrence relation for `findBST`

, however, looks just like the one for `findBT`

; even though only one recursive call is made in the worst case (either to `left`

or `right`

), the worst case size of the traversed subtree is *n-1*. Therefore, `findBST`

runs in *O(n)* time.

If a binary-search ordered tree is also *balanced*, where every subtree's children are roughly the same size (there are various reasonable definitions, as we will see later in the course), then the recurrence relation for `findBST`

is:

```
T(n) = c + T(n/2)
T(n/2) = c + T(n/4)
T(n/4) = c + T(n/8)
...
T(1) = c + T(0)
T(0) = c'
-----------------------
T(n) = (log n)*c + c'
```

As opposed to *n* recursive calls in the worst case before, there are now at most *log _{2}(n)* recursive calls, because each child tree is (roughly) half the size. Therefore,

`findBST`

runs in worst-case Notice that we are not being thorough in tracking exactly what the sizes of subtrees are, and we are simply taking them to be half the size of the entire tree. Although this can and should be rigorously worked out, in the end it does not matter, because of the wiggle room provided by the constants and inequalities in the definitions of rates of growth. If you are not already comfortable with this material, you may wish to read more background on these topics.

A more long-winded primer on asymptotic analysis and recurrence relations (using Java programming examples) can be found in Chapters 7 and 8 of these notes.

A comprehensive presentation of these topics can be found in the classic CLRS textbook.