Red-Black Trees

Red-black trees are binary search ordered trees that are roughly balanced, resulting in O(log n) membership, insertion, and deletion operations. The code for this lecture can be found in RedBlackTrees.elm.

All nodes in a red-black tree are colored red or black. Empty nodes are considered to be black.

type Color = R | B
type Tree  = E | T Color Tree Int Tree

color t = case t of T c _ _ _ -> c
                    E -> B

Invariants

A tree t is a valid red-black tree if:

  1. t satisfies the binary search order property. That is, bso t == True, where

    bso t = case t of
      E         -> True
      T _ l x r -> (l == E || root l < x) &&
                   (r == E || x < root r) &&
                   bso l && bso r
    
    root t = case t of T _ _ x _ -> x
                       E -> Debug.crash "root"
  2. No red node in t has a red child. That is, noRedRed t == True, where

    noRedRed t = case t of
      E                   -> True
      T R (T R _ _ _) _ _ -> False
      T R _ _ (T R _ _ _) -> False
      T _ l _ r           -> noRedRed l && noRedRed r
  3. Every path from the root of t to a leaf contains the same number of black nodes. That is, okBlack t == True, where

    okBlack t = case blackHeight t of
      Just _ -> True
      Nothing -> False
    
    blackHeight t = case t of
      E -> Just 0
      T c l _ r ->
        case (blackHeight l, blackHeight r) of
          (Just n, Just m) -> if n /= m then Nothing
                              else if c == B then Just (1 + n)
                              else Just n
          _                -> Nothing

    Note that we do not include E nodes in path lengths. When blackHeight t == Just n, we refer to n as the black height of t.

    bh t =
      case blackHeight t of
        Just n  -> n
        Nothing -> Debug.crash "bh"
  4. The root node of t is black. That is, color t == B. This requirement is not strictly necessary, but it makes the relationship between height and black height, below, more intuitive.

To summarize the invariants:

rb t = bso t && noRedRed t && okBlack t && root t == B

Properties

A consequence of the noRedRed invariant is that the length of the longest path from root to leaf in a red-black tree t is at most twice the length of the shortest path. Thus, height t <= 2 * bh t, where

height t = case t of
  E         -> 0
  T _ l _ r -> 1 + max (height l) (height r)

size t = case t of
  E         -> 0
  T _ l _ r -> 1 + size l + size r

In-Class Exercise

  • Prove: t. rb tsize t2^(bh t) - 1
  • Prove: t. rb theight t2(log(1 + size t))

Thus, the height of a red-black tree t of size n is O(log n).

Membership

Finding an element in a red-black tree proceeds just like finding an element in an unbalanced binary search tree (cf. findBST).

member : Int -> Tree -> Bool
member x t = case t of
  E -> False
  T _ l y r ->
    if x == y then True
    else if x < y then member x l
    else member x r

Insertion

When not worrying about maintaining the balancedness of a binary search tree, the insertion procedure walks down a path in the tree making left and right turns as necessary according to the order property. Then, if the element is found nowhere in the tree, it is added as a leaf.

This naive approach could simply add a black node at this final position, satisfying the noRedRed invariant, but unfortunately does not preserve the blackHeight property.

Instead, the idea behind the insertion algorithm is to color the new node red, possibly resulting in temporary red-red violation, and then walk back up the search path fixing and propagating any violations upwards. The algorithm maintains the invariant that at most one red-red violation is allowed at a time.

The ins function walks down the search path, inserts a red node as the new leaf, and walks back up the search path calling balance to fix any temporary red-red violations.

ins : Int -> Tree -> Tree
ins x t =
  case t of
    E -> T R E x E
    T c l y r ->
      if x == y then t
      else if x < y then balance c (ins x l) y r
      else balance c l y (ins x r)

The balance function looks for red-red violations, which can occur in one of four configurations. In each case, the solution is the same.

balance : Color -> Tree -> Int -> Tree -> Tree
balance c l val r =
  case (c, l, val, r) of
    (B, T R (T R a x b) y c, z, d) -> T R (T B a x b) y (T B c z d)
    (B, T R a x (T R b y c), z, d) -> T R (T B a x b) y (T B c z d)
    (B, a, x, T R (T R b y c) z d) -> T R (T B a x b) y (T B c z d)
    (B, a, x, T R b y (T R c z d)) -> T R (T B a x b) y (T B c z d)
    _                              -> T c l val r

The balance function fixes a red-red violation when processing a black parent node that contains it. If ins propagates a red-red violation all the way up to the root, there is no call to balance to fix it. Therefore, the last step in the insertion algorithm is to unconditionally color the new root node black.

insert : Int -> Tree -> Tree
insert x t =
  let (T _ l y r) = ins x t in
    T B l y r

Contracts

Although it is great to prove properties of pseudocode on paper, it is often useful to dynamically check that the input and output behavior of a function is as expected.

One style of such testing is referred to as design by contract, where a function may be decorated with predicates (Bool-valued functions) that describe preconditions on arguments that callers are expected to satisfy and a postconditions on return values that the callee is expected to establish.

For example, the function

\x -> e

may expect x to satisfy some predicate pArg and may intend e to satisfy some predicate pRet. If either of these expectations is violated at run-time, an error should be reported (one that, ideally, can help to identify what part of the program is responsible for the failure).

Below is one way to rewrite the above function with such pre- and postconditions. Notice how this function is a "wrapper" around the original function.

\x ->
  let _ = if pArg x then () else Debug.crash "..." in
  let ret = e in
    if pRet ret then ret
    else Debug.crash "..."

The insertion algorithm is a bit tricky, so let's define wrapper versions with contracts to gain more confidence that the implementation corresponds to the analysis that we have done on paper.

We start by defining some helper functions.

check s p x =
  if p x
    then x
    else Debug.crash (s ++ "\n" ++ toString x)

checkArg s = check ("ARG CONTRACT ERROR: " ++ s)
checkRet s = check ("RET CONTRACT ERROR: " ++ s)

Now, we'll define versions of the previous functions (labeled with a "'") that perform contract checks on each invocation. In practice, we would probably want to include more descriptive error strings to report upon failure.

The overall insert' algorithm expects its argument to satisfy rb and returns an updated tree that satisfies rb.

insert' x t_ =
  let t = checkArg "[insert']" rb t_ in
  let ret =
    let (T _ l y r) = ins' x t in
       T B l y r
  in
  checkRet "[insert']" rb ret

The precondition for ins' is interesting, because its argument ought to satisfy all the red-black tree invariants except possibly that the color of the root is red. The resulting tree has the same blackHeight as the original tree and might have one red-red violation. We'll omit the bso post-condition from ins' (and balance' below) to save a bit of run-time overhead.

ins' x t_ =
  let t = checkArg "[ins']" rbExceptRoot t_ in
  let ret =
    case t of
      E -> T R E x E
      T c l y r ->
        if x == y then t
        else if x < y then balance' c (ins' x l) y r
        else balance' c l y (ins' x r)
  in
  checkRet "[ins']" (\t' -> bh t' == bh t && maybeOneRedRed t') ret

rbExceptRoot t = bso t && noRedRed t && okBlack t

maybeOneRedRed t = oneRedRed t || noRedRed t

oneRedRed t = case t of
  E                             -> False
  T R (T R _ _ _) _ (T R _ _ _) -> False
  T R (T R l1 _ r1) _ r         -> noRedRed l1 && noRedRed r1 && noRedRed r
  T R l _ (T R l2 _ r2)         -> noRedRed l && noRedRed l2 && noRedRed r2
  T _ l _ r                     -> False

The balance' function expects the subtrees l and r to have the same blackHeight and that at most of them has a red-red violation. The resulting tree has the same blackHeight as a tree constructed directly with the T data constructor rather than via the different rotations. If one of the four rebalancing cases is triggered, then the resulting tree t' has noRedRed violation. Otherwise, the (untransformed) tree t' may have one. Therefore, the postcondition is maybeOneRedRed.

balance' c l val r_ =
  let r = 
    checkArg "[balance']" (\arg ->
      (bh l == bh arg) &&
      (xor [  noRedRed l &&  noRedRed arg
           , oneRedRed l &&  noRedRed arg
           ,  noRedRed l && oneRedRed arg ])) r_
  in
  let ret =
    case (c, l, val, r) of
      (B, T R (T R a x b) y c, z, d) -> T R (T B a x b) y (T B c z d)
      (B, T R a x (T R b y c), z, d) -> T R (T B a x b) y (T B c z d)
      (B, a, x, T R (T R b y c) z d) -> T R (T B a x b) y (T B c z d)
      (B, a, x, T R b y (T R c z d)) -> T R (T B a x b) y (T B c z d)
      _                              -> T c l val r
  in
  checkRet "[balance']"
    (\t' -> maybeOneRedRed t' && bh t' == bh (T c l val r)) ret

xor bs = List.filter (\b -> b == True) bs == [True]

Because pre- and post-conditions are evaluated on every function call (and the predicates we have supplied are linear in the size of their arguments), these versions run much slower than the original versions.

> t = buildRandom insert 1000        -- fast

> t' = buildRandom insert' 1000      -- slow

> t == t'
True : Bool

During program development, however, we are often willing to trade performance in exchange for help with testing and identifying bugs.

Try introducing some bugs into the implementation in order to trigger run-time contract violations.

Some languages, such as Racket, provide extensive support for specifying and checking contracts. One might imagine an extension to Elm that allowed us to write the following more precise function types, which automatically get translated to the wrapped functions we have manually written above. Notice that, in this syntax, types are "refined" by predicate functions.

insert' : x:Int -> t:Tree{rb t} -> t':Tree{rb t'}

ins' : x:Int -> t:Tree{rb t} -> t':Tree{bh t' == bh t && maybeOneRedRed t'}

balance' : c:Color
        -> l:Tree
        -> val:Int
        -> r:Tree{bh l == bh r && xor [ ... ]}
        -> t':Tree{bh t' == bh (T c l val r) && maybeOneRedRed t'}

Better still would be for the language to statically figure out that these contracts will never violated at run-time and, hence, do not need to be checked dynamically.

Alas, in this course, we will not cover software verification techniques or dependent type systems that are able to reason about certain fine-grained program invariants like these. But this is a longstanding and very active area of research.

Deletion

Stay tuned...


Reading

Required

  • Okasaki, Chapter 3.3