# Red-Black Trees

Red-black trees are binary search ordered trees that are roughly balanced, resulting in O(log n) membership, insertion, and deletion operations. The code for this lecture can be found in `RedBlackTrees.elm`.

All nodes in a red-black tree are colored red or black. Empty nodes are considered to be black.

``````type Color = R | B
type Tree  = E | T Color Tree Int Tree

color t = case t of T c _ _ _ -> c
E -> B``````

## Invariants

A tree `t` is a valid red-black tree if:

1. `t` satisfies the binary search order property. That is, `bso t == True`, where

``````bso t = case t of
E         -> True
T _ l x r -> (l == E || root l < x) &&
(r == E || x < root r) &&
bso l && bso r

root t = case t of T _ _ x _ -> x
E -> Debug.crash "root"``````
2. No red node in `t` has a red child. That is, `noRedRed t == True`, where

``````noRedRed t = case t of
E                   -> True
T R (T R _ _ _) _ _ -> False
T R _ _ (T R _ _ _) -> False
T _ l _ r           -> noRedRed l && noRedRed r``````
3. Every path from the root of `t` to a leaf contains the same number of black nodes. That is, `okBlack t == True`, where

``````okBlack t = case blackHeight t of
Just _ -> True
Nothing -> False

blackHeight t = case t of
E -> Just 0
T c l _ r ->
case (blackHeight l, blackHeight r) of
(Just n, Just m) -> if n /= m then Nothing
else if c == B then Just (1 + n)
else Just n
_                -> Nothing``````

Note that we do not include `E` nodes in path lengths. When `blackHeight t == Just n`, we refer to `n` as the black height of `t`.

``````bh t =
case blackHeight t of
Just n  -> n
Nothing -> Debug.crash "bh"``````
4. The root node of `t` is black. That is, `color t == B`. This requirement is not strictly necessary, but it makes the relationship between height and black height, below, more intuitive.

To summarize the invariants:

``rb t = bso t && noRedRed t && okBlack t && root t == B``

## Properties

A consequence of the `noRedRed` invariant is that the length of the longest path from root to leaf in a red-black tree `t` is at most twice the length of the shortest path. Thus, `height t <= 2 * bh t`, where

``````height t = case t of
E         -> 0
T _ l _ r -> 1 + max (height l) (height r)

size t = case t of
E         -> 0
T _ l _ r -> 1 + size l + size r``````

In-Class Exercise

• Prove: `t`. `rb t``size t``2^(bh t) - 1`
• Prove: `t`. `rb t``height t``2(log(1 + size t))`

Thus, the height of a red-black tree `t` of size `n` is O(`log n`).

## Membership

Finding an element in a red-black tree proceeds just like finding an element in an unbalanced binary search tree (cf. `findBST`).

``````member : Int -> Tree -> Bool
member x t = case t of
E -> False
T _ l y r ->
if x == y then True
else if x < y then member x l
else member x r``````

## Insertion

When not worrying about maintaining the balancedness of a binary search tree, the insertion procedure walks down a path in the tree making left and right turns as necessary according to the order property. Then, if the element is found nowhere in the tree, it is added as a leaf.

This naive approach could simply add a black node at this final position, satisfying the `noRedRed` invariant, but unfortunately does not preserve the `blackHeight` property.

Instead, the idea behind the insertion algorithm is to color the new node red, possibly resulting in temporary red-red violation, and then walk back up the search path fixing and propagating any violations upwards. The algorithm maintains the invariant that at most one red-red violation is allowed at a time.

The `ins` function walks down the search path, inserts a red node as the new leaf, and walks back up the search path calling `balance` to fix any temporary red-red violations.

``````ins : Int -> Tree -> Tree
ins x t =
case t of
E -> T R E x E
T c l y r ->
if x == y then t
else if x < y then balance c (ins x l) y r
else balance c l y (ins x r)``````

The `balance` function looks for red-red violations, which can occur in one of four configurations. In each case, the solution is the same.

``````balance : Color -> Tree -> Int -> Tree -> Tree
balance c l val r =
case (c, l, val, r) of
(B, T R (T R a x b) y c, z, d) -> T R (T B a x b) y (T B c z d)
(B, T R a x (T R b y c), z, d) -> T R (T B a x b) y (T B c z d)
(B, a, x, T R (T R b y c) z d) -> T R (T B a x b) y (T B c z d)
(B, a, x, T R b y (T R c z d)) -> T R (T B a x b) y (T B c z d)
_                              -> T c l val r``````

The `balance` function fixes a red-red violation when processing a black parent node that contains it. If `ins` propagates a red-red violation all the way up to the root, there is no call to `balance` to fix it. Therefore, the last step in the insertion algorithm is to unconditionally color the new root node black.

``````insert : Int -> Tree -> Tree
insert x t =
let (T _ l y r) = ins x t in
T B l y r``````

## Contracts

Although it is great to prove properties of pseudocode on paper, it is often useful to dynamically check that the input and output behavior of a function is as expected.

One style of such testing is referred to as design by contract, where a function may be decorated with predicates (`Bool`-valued functions) that describe preconditions on arguments that callers are expected to satisfy and a postconditions on return values that the callee is expected to establish.

For example, the function

``\x -> e``

may expect `x` to satisfy some predicate `pArg` and may intend `e` to satisfy some predicate `pRet`. If either of these expectations is violated at run-time, an error should be reported (one that, ideally, can help to identify what part of the program is responsible for the failure).

Below is one way to rewrite the above function with such pre- and postconditions. Notice how this function is a "wrapper" around the original function.

``````\x ->
let _ = if pArg x then () else Debug.crash "..." in
let ret = e in
if pRet ret then ret
else Debug.crash "..."``````

The insertion algorithm is a bit tricky, so let's define wrapper versions with contracts to gain more confidence that the implementation corresponds to the analysis that we have done on paper.

We start by defining some helper functions.

``````check s p x =
if p x
then x
else Debug.crash (s ++ "\n" ++ toString x)

checkArg s = check ("ARG CONTRACT ERROR: " ++ s)
checkRet s = check ("RET CONTRACT ERROR: " ++ s)``````

Now, we'll define versions of the previous functions (labeled with a "`'`") that perform contract checks on each invocation. In practice, we would probably want to include more descriptive error strings to report upon failure.

The overall `insert'` algorithm expects its argument to satisfy `rb` and returns an updated tree that satisfies `rb`.

``````insert' x t_ =
let t = checkArg "[insert']" rb t_ in
let ret =
let (T _ l y r) = ins' x t in
T B l y r
in
checkRet "[insert']" rb ret``````

The precondition for `ins'` is interesting, because its argument ought to satisfy all the red-black tree invariants except possibly that the color of the root is red. The resulting tree has the same `blackHeight` as the original tree and might have one red-red violation. We'll omit the `bso` post-condition from `ins'` (and `balance'` below) to save a bit of run-time overhead.

``````ins' x t_ =
let t = checkArg "[ins']" rbExceptRoot t_ in
let ret =
case t of
E -> T R E x E
T c l y r ->
if x == y then t
else if x < y then balance' c (ins' x l) y r
else balance' c l y (ins' x r)
in
checkRet "[ins']" (\t' -> bh t' == bh t && maybeOneRedRed t') ret

rbExceptRoot t = bso t && noRedRed t && okBlack t

maybeOneRedRed t = oneRedRed t || noRedRed t

oneRedRed t = case t of
E                             -> False
T R (T R _ _ _) _ (T R _ _ _) -> False
T R (T R l1 _ r1) _ r         -> noRedRed l1 && noRedRed r1 && noRedRed r
T R l _ (T R l2 _ r2)         -> noRedRed l && noRedRed l2 && noRedRed r2
T _ l _ r                     -> False``````

The `balance'` function expects the subtrees `l` and `r` to have the same `blackHeight` and that at most of them has a red-red violation. The resulting tree has the same `blackHeight` as a tree constructed directly with the `T` data constructor rather than via the different rotations. If one of the four rebalancing cases is triggered, then the resulting tree `t'` has `noRedRed` violation. Otherwise, the (untransformed) tree `t'` may have one. Therefore, the postcondition is `maybeOneRedRed`.

``````balance' c l val r_ =
let r =
checkArg "[balance']" (\arg ->
(bh l == bh arg) &&
(xor [  noRedRed l &&  noRedRed arg
, oneRedRed l &&  noRedRed arg
,  noRedRed l && oneRedRed arg ])) r_
in
let ret =
case (c, l, val, r) of
(B, T R (T R a x b) y c, z, d) -> T R (T B a x b) y (T B c z d)
(B, T R a x (T R b y c), z, d) -> T R (T B a x b) y (T B c z d)
(B, a, x, T R (T R b y c) z d) -> T R (T B a x b) y (T B c z d)
(B, a, x, T R b y (T R c z d)) -> T R (T B a x b) y (T B c z d)
_                              -> T c l val r
in
checkRet "[balance']"
(\t' -> maybeOneRedRed t' && bh t' == bh (T c l val r)) ret

xor bs = List.filter (\b -> b == True) bs == [True]``````

Because pre- and post-conditions are evaluated on every function call (and the predicates we have supplied are linear in the size of their arguments), these versions run much slower than the original versions.

``````> t = buildRandom insert 1000        -- fast

> t' = buildRandom insert' 1000      -- slow

> t == t'
True : Bool``````

During program development, however, we are often willing to trade performance in exchange for help with testing and identifying bugs.

Try introducing some bugs into the implementation in order to trigger run-time contract violations.

Some languages, such as Racket, provide extensive support for specifying and checking contracts. One might imagine an extension to Elm that allowed us to write the following more precise function types, which automatically get translated to the wrapped functions we have manually written above. Notice that, in this syntax, types are "refined" by predicate functions.

``````insert' : x:Int -> t:Tree{rb t} -> t':Tree{rb t'}

ins' : x:Int -> t:Tree{rb t} -> t':Tree{bh t' == bh t && maybeOneRedRed t'}

balance' : c:Color
-> l:Tree
-> val:Int
-> r:Tree{bh l == bh r && xor [ ... ]}
-> t':Tree{bh t' == bh (T c l val r) && maybeOneRedRed t'}``````

Better still would be for the language to statically figure out that these contracts will never violated at run-time and, hence, do not need to be checked dynamically.

Alas, in this course, we will not cover software verification techniques or dependent type systems that are able to reason about certain fine-grained program invariants like these. But this is a longstanding and very active area of research.

Stay tuned...