Heaps

Watch the video first, and try to work through and code up the various pieces that come up during the pauses. If you get stuck somewhere, you can peek at the reading and solutions below (scroll slowly!). But then get back to working through the tasks before reading everything below in full.

Here is the starter ArrayHeap.elm file (mostly) as shown in the video. This file also has a bare-bones tester — paired with ArrayHeap.html — which uses TreantJS to draw heaps. You'll also need these files three in the same directory. Run elm make ArrayHeap.elm --output=ArrayHeap.js and view ArrayHeap.html. Play with the hard-coded theTestCase. As you develop your ArrayHeap module, perhaps you'll find the urge to turn this into a fancier visualization and testing tool.


Regularly scheduled programming below...


The min-heap abstraction below is defined for any element type that is comparable. The following type signatures define our Heap interface:

type Heap a

empty     : Heap comparable
isEmpty   : Heap comparable -> Bool
findMin   : Heap comparable -> Maybe comparable
deleteMin : Heap comparable -> Maybe (comparable, Heap comparable)
insert    : comparable -> Heap comparable -> Heap comparable
merge     : Heap comparable -> Heap comparable -> Heap comparable

Max-heaps are defined similarly.

Heaps as Arrays

One way to represent a min-heap is with a complete binary tree — where every level except possibly the last is completely full and the nodes in the last level are as far left as possible — that satisfies the min-heap property — which requires that every node be no bigger than its children.

Complete binary trees can be implemented using arrays: The nodes...

    Depth 0              0

    Depth 1        1           2

    Depth 2     3     4     5     6

               . .   . .   . .   . .

... can be stored in in breadth-first order in an array a:

      i:     0   1   2   3   4   5   6   ...
           ----------------------------------
   a[i]:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | ...
           ----------------------------------

And then navigating the parent and child edges for a node stored in a[i] is easy:

  • the parent is stored in a[(i-1)//2],
  • the left child (if any) is stored in a[(2*i)+1], and
  • the right child (if any) is stored in a[(2*i)+2].

In functional languages, purely functional arrays are often implemented in terms of balanced search trees, which typically provide O(log n) access to arbitrary elements. We will talk about how to implement balanced search trees later.

In imperative languages, reading and writing an arbitrary element in an array takes O(1) time. In Elm, the Array library is implemented by using native JavaScript arrays. But because this implementation of set copies the input array in order to provide a functional update API, it does not run in O(1) time. (Check out the implementation if you're curious.)

Below, we're going to assume O(1) time for set anyway, just to keep the discussion of the algorithm as it would be in an imperative language. After reading through, it would be a good exercise to reconsider the running times for insert and deleteMin taking into account the actual running times of the Array primitives. Later in the course, we'll see how to implement purely functional arrays.

Internal Representation vs. External Interface

We will make use of the Array library to implement the heap abstraction. If we choose to define and export

type alias Heap a = Array a

then clients of our module will be able to see that we have implemented Heaps using Arrays and will have access to the Array values that we will use to represent Heaps. If we wish to hide the representation from clients, we can instead define a new type

type Heap a = Heap (Array a)

and export only those functions that we want clients to use:

module Heaps exposing
  (Heap, empty, isEmpty, findMin, deleteMin, insert, merge)
  ...

We can't create a Heap value directly...

> import ArrayHeap exposing (..)
> import Array
> Heap Array.empty
-- NAMING ERROR ------------------------------------------------------------ elm

I cannot find a `Heap` constructor:

6|   Heap Array.empty
     ^^^^

because the module declaration exposes only the Heap type and not the Heap data constructor. We could change the module declaration to expose the data constructors, too (notice the Heap(..)):

module Heaps exposing
  (Heap(..), empty, isEmpty, findMin, deleteMin, insert, merge)
  ...

Then we would be able to use the Heap data constructor to directly create Heap values from outside the module:

> Heap Array.empty
Heap (Array.fromList []) : Heap a

However, we choose not to expose the internal representation of the Heap type to the client. Furthermore, we will not provide any functions of type Heap comparable -> Array comparable that allow accessing the internal representation. So, clients will not be able to manipulate Heaps directly with Array operators.

Creating abstraction boundaries like this facilitate the software engineering process, by preventing clients from (intentionally or accidentally) violating invariants that the correctness of the module implementation depends on, as well as facilitating changes to the implementation by limiting and making explicit the boundaries between modules.

To implement the Heap abstraction, we will use the following Array operators with assumed bounds for their running times:

Array.empty   : Array a                           -- O(1)
Array.isEmpty : Array a -> Bool                   -- O(1)
Array.length  : Array a -> Int                    -- O(1)
Array.push    : a -> Array a -> Array a           -- O(1) assumed, for simplicity
Array.get     : Int -> Array a -> Maybe a         -- O(1)
Array.set     : Int -> a -> Array a -> Array a    -- O(1) assumed, for simplicity
Array.slice   : Int -> Int -> Array a -> Array a  -- O(1) when called with 0 -1

The Array.push function creates a new array that is the same as the input array except that it contains an additional element at the end. Let's "rename" it so that we remember which end it works with.

addToEnd : a -> Array a -> Array a
addToEnd = Array.push

For convenience, we will define an analogous function called removeFromEnd that retrieves ("pops") the last element and creates a new array without it.

removeFromEnd : Array a -> Maybe (a, Array a)
removeFromEnd array =
  let n = Array.length array in
  case Array.get (n-1) array of
    Nothing   -> Nothing
    Just last -> Just (last, Array.slice 0 -1 array)

We will start with the simple operators.

empty : Heap comparable
empty =
  Heap Array.empty

isEmpty : Heap comparable -> Bool
isEmpty (Heap array) =
  Array.isEmpty array

Notice that the definition of isEmpty uses pattern matching to, at once, deconstruct the argument value and bind the underlying array value to the variable array. This definition is equivalent to all of the following:

isEmpty h = case h of Heap array -> Array.isEmpty array

isEmpty = \h -> case h of Heap array -> Array.isEmpty array

isEmpty = \h -> let (Heap array) = h in Array.isEmpty array

isEmpty = \(Heap array) -> Array.isEmpty array

Because values of type Heap can only be constructed by the Heap data constructor, we can use patterns in bindings (rather than case expressions) and be sure that we have covered all patterns.

The findMin implementation is also straightforward.

findMin : Heap comparable -> Maybe comparable
findMin (Heap array) =
  Array.get 0 array

Insertion

Let's now look at the first non-trivial operator, inserting an element into a Heap. The idea is to add the element to the next position in the complete binary tree (conveniently represented as the last element in the array) and then "bubble" or "percolate" the element up the tree until it is no longer bigger than its parent.

insert : comparable -> Heap comparable -> Heap comparable
insert x (Heap array) =
  array
    |> addToEnd x
    |> bubbleUp (Array.length array)
    |> Heap

As we manipulate the underlying Array representations, we will make sure to always access elements within bounds. So we define an "unsafe" version of get:

justGet : Int -> Array a -> a
justGet i array =
  case Array.get i array of
    Nothing -> Debug.todo ("justGet: " ++ Debug.toString i)
    Just x  -> x

We also define a helper function to swap two elements in an Array:

swap : Int -> Int -> Array a -> Array a
swap i j array =
  let ai = justGet i array in
  let aj = justGet j array in
  array |> Array.set i aj |> Array.set j ai

The bubbleUp function is defined to swap node i with its parent node (i-1)//2 if the parent is larger and, if so, recursively traverses up the tree. Notice how bubbleUp is a function that works directly with the internal representation and is not exposed to clients.

parentIdx i = (i-1) // 2

bubbleUp : Int -> Array comparable -> Array comparable
bubbleUp i array =
  let
    child  = justGet i array
    parent = justGet (parentIdx i) array
  in
  if parent <= child
    then array
    else array |> swap i (parentIdx i) |> bubbleUp (parentIdx i)

Let's now consider the worst-case time cost of the insertion algorithm. The insert function pushes an element on the end (assumed to run in O(1) time), computes the length of the Array (also assumed to run in O(1) time), and calls bubbleUp.

The bubbleUp function makes use of justGet and swap. Array.get (and thus justGet) and Array.set are assumed to run in O(1) time. The swap function makes several calls to Array.set and justGet, taking O(1) time. The bubbleUp function visits at most O(log n) elements because the index i is divided in half before each recursive call. Therefore, there are O(log n) calls to bubbleUp, each of which performs O(1) work. So the running time of bubbleUp, and hence insert, is O(log n).

Deletion

To delete the minimum element, which is stored at index 0, we first overwrite the root with the value currently stored in the last position (conveniently stored in the last element of the Array). We pop the last element because its value is now stored in the root and then "bubble" or "percolate" this value down as long as necessary. When recursively bubbling down, we choose the child tree whose root is smaller in order to maintain the heap order property.

deleteMin : Heap comparable -> Maybe (comparable, Heap comparable)
deleteMin (Heap array) =
  case removeFromEnd array of
    Nothing ->
      Nothing
    Just (lastElement, choppedArray) ->
      let
        minElement =
          justGet 0 array
        newArray =
          choppedArray
            |> Array.set 0 lastElement
            |> bubbleDown 0
      in
      Just (minElement, Heap newArray)

If choppedArray has no elements, then the minimum element is the last element, so there is nothing left to bubble down. Rather than checking for this case explicitly, however, we rely on the fact that Array.set and bubbleDown leave the array unchanged if the index argument (i.e. 0) is out of bounds.

For a given index i, the index of the left child is j = 2*i + 1 and of the right child as k = 2*i + 2.

leftIdx i  = 2*i + 1
rightIdx i = 2*i + 2

If there is no left child (in which case, neither is there a right one), then there is nothing left to bubble down:

bubbleDown i array =

  if leftIdx i >= Array.length array then
    array

If there is a left child but not left child, we need to compare the value of the left child to the current to decide whether or not to bubble down to the left:

  else if rightIdx i >= Array.length array then 
    let
      this  = justGet i array
      left  = justGet (leftIdx i) array
    in
    if this <= left
      then array
      else bubbleDownLeft ()

If both children are defined, there are several cases:

    let
      this  = justGet i array
      left  = justGet (leftIdx i) array
      right = justGet (rightIdx i) array
    in
    if this <= left && this <= right then array
    else if left < this && this <= right then bubbleDownLeft ()
    else if right < this && this <= left then bubbleDownRight ()
    else {- left <= this && right <= this -}
      if left <= right
        then bubbleDownLeft ()
        else bubbleDownRight ()

In the above, we defined a few helpers to reduce some of the commonalities in bubbling down to the left and right in various cases:

  let swapAndRecurse j = array |> swap i j |> bubbleDown j in
  let bubbleDownLeft () = swapAndRecurse (leftIdx i) in
  let bubbleDownRight () = swapAndRecurse (rightIdx i) in

(Note: how we use functions with dummy arguments so that we can give names to expressions that we may want to evaluate in one or more subsequent expressions. Functions with dummy arguments, called thunks, are often useful in functional programming to "delay" the evaluation of an expression until later.)

Nevertheless, this has become overly verbose. Instead, we can define a single operation (called smaller below) to check if each child index is in bounds and, if so, whether its value is less than the others.

bubbleDown : Int -> Array comparable -> Array comparable
bubbleDown i array =
  let
    n =
      Array.length array
    smaller j acc =
      if j < n && justGet j array < justGet acc array
        then j
        else acc
    smallest =
      i |> smaller (leftIdx i) |> smaller (rightIdx i)
  in
  if i == smallest
    then array
    else array |> swap i smallest |> bubbleDown smallest

If the value at index i is smaller than both children, the output array is unchanged. Otherwise, the value at index i is swapped with the smaller value among the two subtrees and bubbleDown recurses on that subtree.

The analysis of bubbleDown and deleteMin is similar to the insertion algorithm, resulting in a O(log n) worst-case time cost.

Merging

We will not go through the algorithm for merging heaps that are as arrays in this course. But you may be curious to go read about it if you have not seen it before.

In our Elm implementation, we will pretend that we implement the merge function faithfully, but instead we will always trigger a run-time error.

merge _ _ = Debug.todo "merge not implemented"

The ArrayHeap.elm file contains the implementation above.

Testing

The ArrayHeapTest.elm file provides a modicum of testing. (The "Elm Test" portions of this file relate to the optional Testing notes.)

> import ArrayHeapTest exposing (..)

> simpleHeapSort [100,123,235,235,1,999,998]
[1,100,123,235,235,998,999] : List number

And the starter files (ArrayHeap.elm and ArrayHeap.html) also supported a bit of testing.

In general, it would be a good idea to design your own testing infrastructure to help sanity check your implementations of the various data structures we will study. Later we will learn about the Elm Test package to help with this goal.



Polymorphic Types for Heaps

Haskell-style Type Classes

Recall that Elm provides the special polymorphic type variable comparable. We may want to define interfaces in terms of types that come with operators besides those specified in the comparable "type class." Elm does not provide a way for the programmer to define an interface that describes a set of types.

In Haskell, user-defined type classes can be used to specify the heap abstraction as follows.

class Ord a where
  (<)  :: a -> a -> Bool
  (>=) :: a -> a -> Bool
  (>)  :: a -> a -> Bool
  (<=) :: a -> a -> Bool
  max  :: a -> a -> a
  min  :: a -> a -> a

class Ord a => Heap a where
  empty     : Heap a
  isEmpty   : Heap a -> Bool
  findMin   : Heap a -> Maybe a
  insert    : a -> Heap a -> Heap a
  deleteMin : Heap a -> Maybe (a, Heap a)
  merge     : Heap a -> Heap a -> Heap a

The second class definition says that a type a "is a" Heap if (1) a "is a" Ord and (2) the six functions specified can be defined for that type. Particular implementations are then defined to "implement" the Heap "interface".

type ArrayHeap a = Heap (Array a)

instance ArrayHeap (Heap a) where
  insert = ...
  ...

For simplicity, the definition above does not mention the Haskell Eq type class.

ML-style Modules

In Standard ML and OCaml, signatures are used to describe modules, which are structures that contain values and types.

signature ORDERED =
sig
  type T

  val eq  : T -> T -> bool
  val lt  : T -> T -> bool
  val leq : T -> T -> bool
end

signature HEAP =
sig
  structure Elem : ORDERED

  type H

  val empty     : H
  val isEmpty   : H -> bool
  val insert    : Elem.T -> H -> H
  val merge     : H -> H -> H
  val findMin   : H -> Elem.t option
  val deleteMin : H -> (Elem.T * H) option
end

An implementation satisfies the Heap signature by defining a functor that takes a module as an argument and returns another module. Note that these ML functors are the not same thing as Haskell Functors.

functor MyHeap (Element : ORDERED) : HEAP =
struct
  structure Elem = Element

  datatype H = ...

  fun insert = ...
  ...
end