# Red-Black Tree Delete

Deleting an element from a red-black tree is considerably harder than inserting one. Matt Might presents a deletion algorithm that extends the temporary-invariant-violation plus bubble-and-rotate approach for insertion. Today, we'll reformulate his approach using slightly different terminology and drawing conventions to match the ones we used last time. The code below is found in `RedBlackDelete.elm`.

There will be new kinds of temporary invariant violations, tracked using three new kinds of nodes:

• the black empty tree `BE` which counts as one black node,
• the double black node `T BB left x right` which counts as two black nodes, and
• the double red node `T RR left x right` which counts as negative one black node.

We extend our `Tree` type as follows (differences in terminology and weights compared to the original are noted in parentheses):

``````type Tree  = E    --  0  "empty"        ("black leaf"        +1)
| BE   -- +1  "black empty"  ("double black leaf" +2)

| T Color Tree Int Tree

type Color = BB   -- +2  "double black" ("double black")
| B    -- +1
| R    --  0
| RR   -- -1  "double red"   ("negative black")``````

We extend our convention of drawing square boxes for `B`lack nodes and round circles for Red nodes to the new kinds of `Tree`s:

## Balance: Redux

Before we start, let's split the original `balance` function into pieces, which will set us up for some changes later. The `Balance` alias is shorthand for the type of `balance` as before, and the new `BalanceMaybe` alias describes balancing functions that may or may not perform a transformation:

``````type alias Balance = Color -> Tree -> Int -> Tree -> Tree
type alias BalanceMaybe = Color -> Tree -> Int -> Tree -> Maybe Tree``````

The four rotation cases of `balance` can be factored into a one function ...

``````balance_B_R_R : BalanceMaybe
balance_B_R_R color l val r =
case (color, l, val, r) of
(B, T R (T R a x b) y c, z, d) -> Just <| T R (T B a x b) y (T B c z d)
(B, T R a x (T R b y c), z, d) -> Just <| T R (T B a x b) y (T B c z d)
(B, a, x, T R (T R b y c) z d) -> Just <| T R (T B a x b) y (T B c z d)
(B, a, x, T R b y (T R c z d)) -> Just <| T R (T B a x b) y (T B c z d)
_                              -> Nothing``````

... and the default case can be handled as follows:

``````balance : Balance
balance c l val r =
(balance_B_R_R c l val r)
`maybeElse` (T c l val r)``````

We use the helper function

``````maybeElse : Maybe a -> a -> a
maybeElse mx x =
case mx of
Just x  -> x
Nothing -> x``````

to provide a "default" value in case the `balance_B_R_R` function does not perform a rotation. (The `maybeElse` function is the same as `flip Maybe.withDefault`.)

We will define additional re-balancing functions later. To stitch them together easily (that is, to try one after another until one succeeds), we define another helper function for "adding" `Maybe` values:

``````maybePlus : Maybe a -> Maybe a -> Maybe a
maybePlus mx my =
case mx of
Just x  -> mx
Nothing -> my``````

## Deletion Algorithm: Overview

There are three phases in the algorithm.

### 1. Removal

To remove an element `y`, we first need to traverse the `Tree`, making left or right turns as dictated by the search order property until we find `y` or reach an `E`mpty.

#### Empty

If we reach `E`mpty, there is nothing to remove.

#### Node with 0 Children

If we find `y` at a node with zero children, there are two cases.

 This is the case where a black empty is created, to track the removal of a black node.

#### Node with 1 Children

Now, let's say we find `y` at a node with one child. The following six cases are impossible, the first two because the original tree cannot have red-red violations and the last four because the original tree would not have satisfied the black height property.

There are only two cases that can appear. In both cases, we can locally re-establish the red-black tree invariants:

#### Node with 2 Children

Lastly, let's say we find `y` at a node with two children. We can reduce this case to removing `y` from a node with zero or one child. See the article for an example (look for the diagram with blue and green nodes).

### 2. Bubbling

If the removal phase converts a black node to a black empty, the black empty must be bubbled upwards and dealt with. There are six configurations in which a black node could have been transformed to `BE`.

 1a The black empty becomes empty. This may introduce a `R`/`R` violation, which can be handled with the original `balance_B_R_R` function. 2a The black node becomes double black. This may introduce a `R`/`R` violation below a `BB` node, which will be handled by a new `balance_BB_R_R` function. 3a The black node becomes double black. The red node becomes double red. A new `balance_BB_RR` function will eliminate `RR` nodes, which always appear below `BB` ones.

There are three cases analogous to the above:

 1b 2b 3b

Some transformations above introduce double black nodes that, like black empties, must be bubbled upwards. So, there are six more cases analogous to the ones above.

In each case above, the transformation re-colored the parent with one more unit of black and re-colored each child with one less. The same strategy will apply to the new cases.

 1a' 1b' 2a' 2b' 3a' 3b'

### 3. Rebalancing

The last aspect of the algorithm is to implement the two new re-balancing functions to clean up the effects of bubbling.

Recall the original `balance_B_R_R` (below, left). The four cases for `balance_BB_R_R` (below, right) are similar.

Because only red nodes turn into double red ones, there are only two cases that `balance_BB_RR` must handle.

In both cases, the original `balance_B_R_R` must be called (at most once) to handle a new `R`/`R` violation that may be introduced.

## Deletion Algorithm: Code

We will now implement the three steps.

### 3. Rebalancing

The `balance_BB_R_R` function is quite similar to `balance_B_R_R`:

``````balance_BB_R_R : BalanceMaybe
balance_BB_R_R color l val r =
case (color, l, val, r) of
(BB, T R (T R a x b) y c, z, d) -> Just <| T B (T B a x b) y (T B c z d)
(BB, T R a x (T R b y c), z, d) -> Just <| T B (T B a x b) y (T B c z d)
(BB, a, x, T R (T R b y c) z d) -> Just <| T B (T B a x b) y (T B c z d)
(BB, a, x, T R b y (T R c z d)) -> Just <| T B (T B a x b) y (T B c z d)
_                               -> Nothing``````

One more new balancing function:

``````balance_BB_RR : BalanceMaybe
balance_BB_RR color l val r =
case (color, l, val, r) of
(BB, T RR (T B a w b) x (T B c y d), z, e) -> Just <| T B (balance B (T R a w b) x c) y (T B d z e)
(BB, a, w, T RR (T B b x c) y (T B d z e)) -> Just <| T B (T B a w b) x (balance B c y (T R d z e))
_                                          -> Nothing``````

Finally, we stitch them all together:

``````balance : Balance
balance c l val r =
(balance_B_R_R c l val r)
`maybePlus` (balance_BB_R_R c l val r)
`maybePlus` (balance_BB_RR c l val r)
`maybeElse` (T c l val r)``````

### 2. Bubbling

We define two helper functions for incrementing and decrementing a node's color:

``````incr c =
case c of
BB -> Debug.crash "incr BB"
B  -> BB
R  -> B
RR -> R

decr c =
case c of
BB -> B
B  -> R
R  -> RR
RR -> Debug.crash "decr RR"``````

The 12 cases for our bubbling function are organized according to the three tables above. If we wanted to, we could also factor this function into smaller pieces like we did for `balance`.

``````bubble_BB t =
case t of

-- cases 1a, 2a, 3a
T c1 (T c2 a x b) y BE ->
case (c1, c2) of
(R, B) -> balance (incr c1) (T (decr c2) a x b) y E
(B, B) -> balance (incr c1) (T (decr c2) a x b) y E
(B, R) -> balance (incr c1) (T (decr c2) a x b) y E
_      -> t

-- cases 1b, 2b, 3b
T c1 BE y (T c3 c z d) ->
case (c1, c3) of
(R, B) -> balance (incr c1) E y (T (decr c3) c z d)
(B, B) -> balance (incr c1) E y (T (decr c3) c z d)
(B, R) -> balance (incr c1) E y (T (decr c3) c z d)
_      -> t

-- cases 1a', 1b', 2a', 2b', 3a', 3b'
T c1 (T c2 a x b) y (T c3 c z d) ->
case (c1, c2, c3) of
(R, B, BB) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
(R, BB, B) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
(B, B, BB) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
(B, BB, B) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
(B, R, BB) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
(B, BB, R) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
_          -> t

_ -> t``````

### 1. Removal

Lastly, the top-level removal procedure. As with `insert`, a helper function `rem` does the traversal, and then the last step is to color the resulting root node black.

``````remove : Int -> Tree -> Tree
remove x t =
case rem x t of
T _ l y r -> T B l y r
_         -> Debug.crash "remove"

rem n t =
case t of

BE -> Debug.crash "rem"
E -> E

-- 0 children
T R E x E -> if n == x then BE else t
T B E x E -> if n == x then T BB E x E else t

-- 1 child
T B (T R E x E) y E -> if n == y then T B E x E else t
T B E y (T R E z E) -> if n == y then T B E z E else t
T _ E _ _           -> Debug.crash "rem"
T _ _ _  E          -> Debug.crash "rem"

-- 2 children
T c l y r ->
if n < y then balance c (bubble_BB (rem n l)) y r
else if n > y then balance c l y (bubble_BB (rem n r))
else {- n == y -} Debug.crash "TODO"``````

Calls to `bubble_BB` may return trees with red roots (via `balance_B_R_R` via `balance`), so we propagate these red-red violations upwards and call `balance` to fix downward violations. We leave the case for removing a node with two children as an exercise. We use `Debug.crash` for the cases that should never appear.