CS221/321 Lecture 3, Oct 11, 2011
----------------------------------

Preview
-------
Continuing to explore dynamic semantics (evaluation) of SAE.
  1. Contextual reductions (system CR)
  2. Abstract machine (system CK)

Adding variables
  1. Let expressions: local variable bindings (Lecture 4)
    * substitutions
    * free variable capture
    * alpha conversion (change of bound variables)
    * static semantics (forbidding free variables)
      - values still all in Nat
    * environments (lazy substitutions)
    * extension of dynamic semantics (big-step, small-step, etc.)

  2. functions
    * lambda abstraction and function application
    * beta reduction
    * call-by-value and call-by-name
    * recursive functions (tying a knot)
    * static semantics?
      - functions are a new kind of value (heterogeneous values)
    * extension of dynamic semantics (big-step, small-step, etc.)
      - dynamic type checking

Adding types
  1. new kinds of values
     * e.g. booleans, pairs, ...
  2. type errors
  3. a type language
  4. static semantics
     * static type checking
  5. relating static semantics and dynamic semantics
     * type soundness (Progress and Preservation Theorems)

More language features

  * references, stores, side-effects
  * continuations
  * exceptions
  * coroutines and threads
  * objects
  * modules

Proof assistants

======================================================================

SAE dynamic semantics (continued)

Terminology:  An expression that cannot be further evaluated (by
the rules of a given dynamic semantics) is called a "value".  For
SAE, the value expressions are those of the form Num(n).


Contextual reductions
---------------------

New reduction rules

Take the reduction rules from small-step semantics:

  (1)  Plus(Num n1, Num n2)  ↦  Num p   where p = n1 + n2

  (2)  Times(Num n1, Num n2)  ↦  Num p   where p = n1 * n2
  
Recall that expressions matching either of the left-hand sides
are called "redexes".

We can factor any non-value expression into a redex and a context.
E.g.:

    Plus(2, Times(13,4))  =  C[Times(13,4)]
            -----------        -----------

where 

    C = Plus(2, [])

Here the "[]" is called the hole. It is understood to be filled by a
redex.

Contexts C are defined by an abstract grammar:

    C := []
      |  Plus(C, e)  |  Plus(Num(n), C)
      |  Times(C, e) |  Times(Num(n), C)

Given a non-value expression e (i.e. an expression not of the form
Num(n)), there is a unique way of expressing e as C[r] where r is a
redex subexpression. The hole will identify the leftmost redex when e
contains more than one redex (why?).
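
As a small illustration of the "leftmost redex" claim, here is a
minimal SML sketch. It assumes the expr datatype has the constructors
Num, Plus and Times used in these notes; the function name redexes is
hypothetical, introduced only for this example. It lists every redex
subexpression of an expression in left-to-right order.

```sml
(* assumed shape of the SAE expression type *)
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

(* redexes e: all redex subexpressions of e, in left-to-right order *)
fun redexes (Num _) = []
  | redexes (e as Plus(Num _, Num _)) = [e]
  | redexes (e as Times(Num _, Num _)) = [e]
  | redexes (Plus(e1, e2)) = redexes e1 @ redexes e2
  | redexes (Times(e1, e2)) = redexes e1 @ redexes e2

(* (1 + 2) + (3 * 4) contains two redexes *)
val e = Plus(Plus(Num 1, Num 2), Times(Num 3, Num 4))
val rs = redexes e
```

The context grammar has Plus(C, e) and Plus(Num(n), C) but no form
Plus(e, C) for a nonvalue e, so the hole cannot reach Times(3,4) until
the left operand has become a number. The only factorization of e
therefore places the hole at hd rs, i.e. at Plus(1,2).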

Now there is just one general contextual reduction rule instead of the
4 search rules in the earlier small-step semantics.

  (3)  e = C[r] ↦ C[r'] = e'    where r ↦ r'

where r is a redex and r' is the expression it reduces to by
rule (1) or (2) above.  Observe that (3) applies even when e itself
is a redex, since in that case the context C will be [].

Exercise: How do we show that the factorization of an expression into
the form C[r] is unique, and that the subterm r is the leftmost redex?


After a given reduction by rule (3), the resultant expression e' may be
further reduced, unless it is a value.  To reduce e' by (3), it has to be
refactored into the form C'[r''] for some new context C' and redex r''.

Note that in general, r'' is not r'. In fact for SAE, r' will always
be a value, so r'' will never be the same as r'.  In more interesting
languages, such as the λ calculus, r' may contain a redex, and may even
be a redex itself.

----------------------------------------------------------------------
Fig 1.5: Example: evaluate 2 + ((5 + 8) * 4) using CR
----------------------------------------------------------------------

   e = Plus(2,Times(Plus(5,8),4))    [omitting Num constructors]

     = C1[Plus(5,8)]  where C1 = Plus(2,Times([],4))

     ↦ C1[13]

     = Plus(2,Times(13,4))

     = C2[Times(13,4)]  where C2 = Plus(2,[])

     ↦ C2[52]

     = Plus(2,52)
 
     = C3[Plus(2,52)]  where C3 = []

     ↦ C3[54]

     = 54
----------------------------------------------------------------------

See how each transition involves refactoring the expression into
context and redex.

Comparing a reduction context with the corresponding transition derivation,
such as Fig. 1.3, we see that each layer used to construct the context 
corresponds to a search rule instance in the transition derivation.

Question: Is the contextual reduction system given by rules (1), (2),
and (3) equivalent to the earlier big-step and small-step semantics?
If so, how do we prove it?

To Summarize:

----------------------------------------------------------------------
Fig 1.6: SAE[CR] - Contextual Reductions for SAE
----------------------------------------------------------------------
Contexts:

    C := []
       |  Plus(C, e)  |  Plus(Num(n), C)
       |  Times(C, e) |  Times(Num(n), C)

Redex rules:

  (1)  Plus(Num n1, Num n2)  ↦  Num p    where p = n1 + n2

  (2)  Times(Num n1, Num n2)  ↦  Num p   where p = n1 * n2

Contextual reduction:

  (3)  C[r] ↦ C[r']  where r is a redex and r ↦ r'
----------------------------------------------------------------------


----------------------------------------------------------------------
Implementation: Evaluation by contextual reduction

This is how we can implement the factoring of an expression into a
context and redex in SML and use that for evaluation.

----------------------------------------------------------------------
Program 1.2: Contextual reduction
----------------------------------------------------------------------
(* representation of contexts
 * -- why can't we just use Plus and Times again? *)
datatype context
  = Hole
  | CPlusL of context * expr
  | CPlusR of int * context     (* left operand is a value Num n; store n *)
  | CTimesL of context * expr
  | CTimesR of int * context    (* left operand is a value Num n; store n *)
 
(* factor: expr -> (context * expr) option
 * factor a nonvalue expression into a context and redex,
 * returning NONE if expression is a value *)
fun factor (e as Num _) = NONE
  | factor (e as Plus(Num _, Num _)) = SOME(Hole, e)
  | factor (e as Times(Num _, Num _)) = SOME(Hole, e)
  | factor (Plus(Num n, e2)) =
    (case factor e2  (* e2 not Num _ *)
       of SOME(c,e') => SOME(CPlusR(n, c), e')
        | NONE => raise Fail "factor")
  | factor (Times(Num n, e2)) =
    (case factor e2  (* e2 not Num _ *)
       of SOME(c,e') => SOME(CTimesR(n, c), e')
        | NONE => raise Fail "factor")
  | factor (Plus(e1,e2)) =
    (case factor e1  (* e1 not Num _ *)
       of SOME(c,e') => SOME(CPlusL(c, e2), e')
        | NONE => raise Fail "factor")
  | factor (Times(e1,e2)) =
    (case factor e1  (* e1 not Num _ *)
       of SOME(c,e') => SOME(CTimesL(c, e2), e')
        | NONE => raise Fail "factor")

(* recombine a context and redex expr into an expr *) 
fun wrap (Hole,e) = e
  | wrap (CPlusL(c,e'), e) = Plus(wrap(c,e), e')
  | wrap (CPlusR(n,c), e) = Plus(Num n, wrap(c,e))
  | wrap (CTimesL(c,e'), e) = Times(wrap(c,e), e')
  | wrap (CTimesR(n,c), e) = Times(Num n, wrap(c,e))

fun reduce (Plus(Num m, Num n)) = Num(m+n)
  | reduce (Times(Num m, Num n)) = Num(m*n)
  | reduce _ = raise Fail "reduce - nonredex"

fun transition e =
    case factor e
      of NONE => NONE
       | SOME(c,r) => SOME(wrap(c,reduce r))

fun eval e =
    case transition e
      of NONE => e  (* e is already a value *)
       | SOME e' => eval e'
----------------------------------------------------------------------
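
As a usage sketch for Program 1.2, the following block replays the
evaluation of Fig 1.5 and collects every intermediate expression. The
definitions are repeated in compressed form so that the block stands
alone. The expr datatype is assumed to be the one from the earlier
lectures, and the helpers under and trace are hypothetical names that
do not appear in the course code.

```sml
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

datatype context
  = Hole
  | CPlusL of context * expr | CPlusR of int * context
  | CTimesL of context * expr | CTimesR of int * context

(* factor as in Program 1.2; the repeated inner case analysis is
 * folded into the helper "under", which adds one context layer *)
fun factor (Num _) = NONE
  | factor (e as Plus(Num _, Num _)) = SOME(Hole, e)
  | factor (e as Times(Num _, Num _)) = SOME(Hole, e)
  | factor (Plus(Num n, e2)) = under (fn c => CPlusR(n, c)) e2
  | factor (Times(Num n, e2)) = under (fn c => CTimesR(n, c)) e2
  | factor (Plus(e1, e2)) = under (fn c => CPlusL(c, e2)) e1
  | factor (Times(e1, e2)) = under (fn c => CTimesL(c, e2)) e1
and under layer e =
    case factor e
      of SOME(c, r) => SOME(layer c, r)
       | NONE => raise Fail "factor: operand is a value"

(* recombine a context and an expression *)
fun wrap (Hole, e) = e
  | wrap (CPlusL(c, e2), e) = Plus(wrap(c, e), e2)
  | wrap (CPlusR(n, c), e) = Plus(Num n, wrap(c, e))
  | wrap (CTimesL(c, e2), e) = Times(wrap(c, e), e2)
  | wrap (CTimesR(n, c), e) = Times(Num n, wrap(c, e))

fun reduce (Plus(Num m, Num n)) = Num(m + n)
  | reduce (Times(Num m, Num n)) = Num(m * n)
  | reduce _ = raise Fail "reduce: not a redex"

fun transition e =
    case factor e
      of NONE => NONE
       | SOME(c, r) => SOME(wrap(c, reduce r))

(* trace e: e followed by every expression reached by CR steps *)
fun trace e =
    case transition e
      of NONE => [e]
       | SOME e' => e :: trace e'

val steps = trace (Plus(Num 2, Times(Plus(Num 5, Num 8), Num 4)))
```

The list steps should contain the four expressions of Fig 1.5:
Plus(2,Times(Plus(5,8),4)), Plus(2,Times(13,4)), Plus(2,52) and 54.
Each call to transition refactors the whole expression from the root,
which is the cost the abstract machine is meant to avoid.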


======================================================================


Abstract Machine (the CK machine)

The factoring and wrapping at each transition is quite inefficient,
especially for a large expression. How can we avoid all this redundant
work?  Refocus by moving incrementally around the expression tree,
maintaining a factoring into focus expression and context.

Let's view contexts as being built up in layers.  For instance, in
the example above for 2 + ((5 + 8) * 4), the context (suppressing Num
constructors for brevity)

   C1 = Plus(2,Times([],4))

for the (5 + 8) redex consists of two nested layers:

   Plus(2, [])
   Times([], 4)
   ------------
   Plus(5, 8)

The line separates the stack of context layers from the focus 
expression, i.e. the redex.

Reducing the redex evaluates the first argument of the inner Times
layer. The next task is to shift the focus to the next redex, which
will be the Times expression. We can perform this shift
incrementally:

   Plus(2, [])     Plus(2, [])     Plus(2, [])
   Times([], 4)    Times([], 4)    Times*(13, [])
   ------------    ------------    --------------
   Plus(5, 8)      13              4
  
After evaluating the redex Plus(5,8), we shift focus to the right
argument of Times (4), which is already evaluated, but we annotate
the Times constructor with an asterisk to indicate that we know
that the left argument has been evaluated.

In the next step we recognize that the right argument of the Times*
context layer is a value, so the Times* is a redex and can be reduced:

   Plus(2, [])       Plus(2, [])     Plus(2, [])
   Times*(13, [])    -----------     -----------
   --------------    Times(13,4)     52
   4

If we were paying attention as we constructed the context as a set
of layers, we would have noticed that the left argument of the outer
Plus layer was a value, so we should have used the annotated Plus*
form to indicate this:

   Plus*(2, [])      Plus*(2, [])    54
   Times*(13, [])    -----------
   --------------    52
   4

Here is a set of rules for carrying out the context analysis
and reductions:

----------------------------------------------------------------------
Fig 1.7: SAE[CK] - CK-machine for SAE
----------------------------------------------------------------------
Frames (context layers): 

     F ::= Plus([],e2) | Plus*(Num(n), []) |
           Times([], e2) | Times*(Num(n), [])

Stack/Context (innermost layers first):

     k = nil | F :: k

Machine state:  (e,k) ∈ expr * k

Transition judgement:  (e,k) => (e',k')

Initial states: 

   (e, [])   (where e is an expression to be evaluated)

Final states:

   (Num(n), [])   (where n ∈ Nat is the result value)
 
Transition rules:  (read n as Num(n))
 
(1)  (Plus(e1,e2), k)      =>  (e1, Plus([],e2)::k)
(2)  (Times(e1,e2), k)     =>  (e1, Times([],e2)::k)

(3)  (n, Plus([],e2)::k)   =>  (e2, Plus*(n,[])::k)
(4)  (n, Times([],e2)::k)  =>  (e2, Times*(n,[])::k)

(5)  (n, Plus*(m,[])::k)   =>  (p, k)    where p = m+n
(6)  (n, Times*(m,[])::k)  =>  (p, k)    where p = m*n

----------------------------------------------------------------------

In the name CK for this system, C stands for "Control", which is
the current focus expression that we are trying to evaluate.  K stands
for "Context" or "Continuation", which determines the context in which
we are evaluating the focus expression. The machine works by first
moving the focus inward until it finds a redex. Then it reduces that
redex to a number, and shifts the focus (outward and rightward)
looking for the next redex.

Transition rules (1) and (2) are analysis rules, removing layers from
the expression and adding them to the context.  Rules (3) and (4) are
shift rules, moving the focus after we have either (a) found a constant
subexpression that is already evaluated, or (b) just finished
reducing a redex to a constant.  Rules (5) and (6) are reduction rules
used when we have discovered a redex.


----------------------------------------------------------------------
Example: 2 + ((5 + 8) * 4)

Here we label the transitions with A for analysis, S for shift,
and R for reduction (of a redex).

(Plus(2, (Times(Plus(5,8), 4))), []) =>           [A]

(2, Plus([], (Times(Plus(5,8), 4)))::[]) =>       [S]

(Times(Plus(5,8),4), Plus*(2,[])::[]) =>          [A]

(Plus(5,8), Times([],4)::Plus*(2,[])::[]) =>      [A]

(5, Plus([],8)::Times([],4)::Plus*(2,[])::[]) =>  [S]

(8, Plus*(5,[])::Times([],4)::Plus*(2,[])::[]) =>  [R]

(13, Times([],4)::Plus*(2,[])::[]) =>              [S]

(4, Times*(13,[])::Plus*(2,[])::[]) =>             [R]

(52, Plus*(2,[])::[]) =>                           [R]

(54, [])    ( = 54)
----------------------------------------------------------------------
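
The labelling of transitions can be reproduced mechanically. The
following is a minimal sketch, assuming the frame representation of
Program 1.3 (PlusL e2 for Plus([],e2), PlusR m for Plus*(m,[]), and
similarly for Times). The functions step and labels are hypothetical
names used only here; step performs one transition of Fig 1.7 and
reports whether it was an analysis (A), shift (S) or reduction (R)
rule.

```sml
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

datatype frame
  = PlusL of expr | PlusR of int
  | TimesL of expr | TimesR of int

(* step: one CK transition with its label, or NONE in a final state *)
fun step (Plus(e1,e2), k) = SOME(#"A", (e1, PlusL(e2)::k))      (* rule 1 *)
  | step (Times(e1,e2), k) = SOME(#"A", (e1, TimesL(e2)::k))    (* rule 2 *)
  | step (Num n, PlusL(e2)::k) = SOME(#"S", (e2, PlusR(n)::k))  (* rule 3 *)
  | step (Num n, TimesL(e2)::k) = SOME(#"S", (e2, TimesR(n)::k))(* rule 4 *)
  | step (Num n, PlusR(m)::k) = SOME(#"R", (Num(m+n), k))       (* rule 5 *)
  | step (Num n, TimesR(m)::k) = SOME(#"R", (Num(m*n), k))      (* rule 6 *)
  | step (Num _, []) = NONE

(* labels s: labels of all transitions from state s to a final state *)
fun labels s =
    case step s
      of NONE => []
       | SOME(l, s') => l :: labels s'

val e = Plus(Num 2, Times(Plus(Num 5, Num 8), Num 4))
val ls = implode (labels (e, []))
```

For this expression ls is "ASAASRSRR": three analysis steps (one per
operator node), three shifts and three reductions.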


----------------------------------------------------------------------
Program 1.3: CK machine for SAE  (SAE-context-reduction.sml)
----------------------------------------------------------------------
(* stack frames used to build evaluation contexts *)
datatype frame
  = PlusL of expr
  | PlusR of int
  | TimesL of expr
  | TimesR of int

type context = frame list    (* a stack of frames *)

(* CK abstract machine states *)
type state = expr * context

(* runCK : state -> int *)
fun runCK (Num n, []) = n
  | runCK (Plus(e1,e2), k) = runCK(e1, PlusL(e2)::k)
  | runCK (Num m, PlusL(e)::k) = runCK(e, PlusR(m)::k)
  | runCK (Num n, PlusR(m)::k) = runCK(Num(m+n), k)
  | runCK (Times(e1,e2), k) = runCK(e1, TimesL(e2)::k)
  | runCK (Num m, TimesL(e)::k) = runCK(e, TimesR(m)::k)
  | runCK (Num n, TimesR(m)::k) = runCK(Num(m*n), k)

fun eval e = runCK(e,[])

----------------------------------------------------------------------


Equivalence of SAE[CR] and SAE[CK].
-----------------------------------

Note: It would have been somewhat simpler to ask for the equivalence
of the ordinary small-step transition relation (SAE[SS]) and the
CK machine, since we haven't seen examples before where we have to
reason about CR transitions.

Here is the formal statement of equivalence:

Theorem 3.1. ∀n. e ↦! Num(n) in CR <=> (e,[]) =>* (Num(n), []).

Intuition: The evaluation under CR and under CK are not synchronized
step by step; that is, the transition sequence for the CK machine has
more steps than that for the CR system. But there is a one-to-one
correspondence between the CR transitions and the redex reduction
transitions performed by the CK machine. In between these reduction
transitions, the CK machine performs additional administrative steps
concerned with either (1) analyzing an expression to find the next
redex to reduce (these are analogous to SAE[SS] search rules), or
(2) returning partial results and shifting focus to another
subexpression.

The file CR-CK-correspondence.pdf gives a picture of how CR and
CK transitions are related.  Each CR transition can expand into 
several CK transitions that are required to move from the state
after a reduction to the next reduction.

It is clear that states in the CR system correspond simply to
expressions, which are factored into context and redex for each
transition.

It is convenient to define a function e$C that reconstructs an
expression from a context and an expression. We will treat the
notation C[e] as equivalent to e$C, but it is sometimes useful to have
an explicit operator symbol for this wrapping function.

Defn 4: $ : expr * context -> expr
   e $ [] = e
   e $ Plus(C,e2) = Plus(e$C, e2)
   e $ Plus(n1, C) = Plus(n1, e$C)
   e $ Times(C,e2) = Times(e$C, e2)
   e $ Times(n1, C) = Times(n1, e$C)

To say that (r,C) is a factorization of expression e means that e =
r$C, and r is a redex. We also know, by Homework 2.2 above, that for
any nonvalue SAE expression e there is a unique factorization of e
into a redex/context pair (r,C). Call the function from e to its
unique factorization factor(e). We can define it as follows:

Defn 1:
   factor : expr -> expr * context 

   factor(Num n) = undefined    (n a number)
   factor(Plus(n1,n2)) = (Plus(n1,n2), [])
   factor(Plus(n1,e2)) = (r, Plus(n1,C))  where (r,C) = factor(e2)
   factor(Plus(e1,e2)) = (r, Plus(C,e2))  where (r,C) = factor(e1)
   factor(Times(n1,n2)) = (Times(n1,n2), [])
   factor(Times(n1,e2)) = (r, Times(n1,C))  where (r,C) = factor(e2)
   factor(Times(e1,e2)) = (r, Times(C,e2))  where (r,C) = factor(e1)

Then factor and $ are inverses on the set of closed, nonvalue
expressions:

Lemma 0: (1) If e not a value, and factor(e) = (r,C), then e = r$C.
 (2) If (r,C) is a redex/context pair, then (r,C) = factor(r$C).

Proof: (1) by straightforward induction on the structure of e.
(2) by straightforward induction on the structure of C.

-----------------

CK machine states can also be translated into expressions. We can
express this translation by an infix operator @ that takes the
expression ("control") and stack ("context") of a CK machine state and
produces the corresponding full expression (here suppressing the Num
constructors for brevity):

Defn 2:
   @ : expr * stack -> expr

   e @ [] = e
   e1 @ (Plus([],e2)::k) = Plus(e1,e2) @ k
   e2 @ (Plus*(n1,[])::k) = Plus(n1,e2) @ k
   e1 @ (Times([],e2)::k) = Times(e1,e2) @ k
   e2 @ (Times*(n1,[])::k) = Times(n1,e2) @ k
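
The operator @ can be transcribed into SML almost line for line. This
is a minimal sketch under the frame representation of Program 1.3; the
function name unload is an assumption made for this example and does
not come from the course code.

```sml
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

datatype frame
  = PlusL of expr | PlusR of int
  | TimesL of expr | TimesR of int

(* unload (e, k) computes e @ k: wrap e in the frames of k,
 * starting with the innermost frame at the head of the stack *)
fun unload (e, []) = e
  | unload (e1, PlusL(e2)::k) = unload (Plus(e1, e2), k)
  | unload (e2, PlusR(n1)::k) = unload (Plus(Num n1, e2), k)
  | unload (e1, TimesL(e2)::k) = unload (Times(e1, e2), k)
  | unload (e2, TimesR(n1)::k) = unload (Times(Num n1, e2), k)

(* the state (8, Plus*(5,[])::Times([],4)::Plus*(2,[])::[])
 * from the CK example for 2 + ((5 + 8) * 4) *)
val full = unload (Num 8, [PlusR 5, TimesL (Num 4), PlusR 2])
```

Here full is the original expression Plus(2,Times(Plus(5,8),4)),
since no reduction has happened yet at that point of the run.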
   
By performing some evaluation experiments with CK, we can observe
that if (e,k) => (e',k') is an administrative (nonreduction)
transition, then e@k = e'@k', i.e. the full expression corresponding
to a machine state does not change in an administrative transition.
It is also the case that if (e,k) => (e',k') is a reduction
transition, it will be the case that e@k factors into a context/redex
pair C[r], which transitions under SAE[CR] to C[r'], where r' is the
contractum of r, and then it will turn out that C[r'] represents the
same expression as e'@k'.  Actually, we could say even more: at a
reduction transition (e,k) => (e',k'), we could derive the context
C such that e@k = C[r] from the stack k, by translating the frames
of tail(k) into context layers (where tail(k) is the stack resulting
from removing the first frame of k). Here is the function mapping
a CK stack k to a CR context C.

Defn 3a:
   cont : stack -> context

   cont([]) = []    (empty stack yields the hole context)
   cont(Plus([],e2)::k) = C[Plus([],e2)] where C = cont(k)
   cont(Plus*(n1,[])::k) = C[Plus(n1,[])] where C = cont(k)
   cont(Times([],e2)::k) = C[Times([],e2)] where C = cont(k)
   cont(Times*(n1,[])::k) = C[Times(n1,[])] where C = cont(k)

where the notation (e.g.) C[Plus([],e2)] represents building a
context from the inside out.  We could also express this by building
the context directly using the reverse of the stack:

Defn 3b:
   cont' : stack -> context    (* stack reversed *)

   cont'([]) = []
   cont'(Plus([],e2)::k) = Plus(cont'(k),e2)
   cont'(Plus*(n1,[])::k) = Plus(n1,cont'(k))
   cont'(Times([],e2)::k) = Times(cont'(k),e2)
   cont'(Times*(n1,[])::k) = Times(n1,cont'(k))

   cont(k) = cont'(rev k)

Since stacks are finite sequences, we can also view them as
being constructed from the right end by a "right-cons" operation
(sometimes called "snoc"). We'll use the infix symbol % for this
right-cons operation, with % being left associative while :: is
right associative:

    Plus*(n1,[])::(Plus([],e2)::[]) 
  = ([] % Plus*(n1,[])) % Plus([],e2)

Using % (snoc) we get the simplest definition of cont:

Defn 3c:
   cont : stack -> context

   cont([]) = []
   cont(k % Plus([],e2)) = Plus(cont(k),e2)
   cont(k % Plus*(n1,[])) = Plus(n1,cont(k))
   cont(k % Times([],e2)) = Times(cont(k),e2)
   cont(k % Times*(n1,[])) = Times(n1,cont(k))

--------------------

Define CK "redex states" as those states (e,k) that match the lhs
patterns of rules =>(5) and =>(6) (Fig. 1.7). We can go further and
define a function factorCK that maps a redex state (e,k) to a
redex/context pair (r,C) such that e@k = C[r].

   factorCK : state -> expr * context
   factorCK(n, Plus*(m,[])::k) = (Plus(m,n), cont(k))
   factorCK(n, Times*(m,[])::k) = (Times(m,n), cont(k))


Having discovered all this, how do we set up the formal proof?  We could
try to attack it directly by a single induction over the structure of
the expression e. This would require several lemmas about how
transition sequences (both CR and CK) for argument subexpressions
could be lifted to produce transition sequences for compound Plus
and Times expressions.

Another approach is to break the proof of the equivalence into two
parts, for the left-to-right implication, and for the right-to-left
implication.

First we need a few useful lemmas relating @, stack concatenation
(k1^k2), % and $.  Stack concatenation is defined the same as list
concatenation:

   []^k = k
   (f::k1)^k2 = f::(k1^k2)


Lemma 1: e@(f::k) = (e@[f])@k.
Proof: Immediate from the defn of @, by simple case analysis on f.


Lemma 2: e@(k1^k2) = (e@k1)@k2
Proof: By induction on k1 (as a ::-list).
Base: k1 = []. 
   e@(k1^k2) = e@k2               [Defn ^]
             = (e@k1)@k2.         [Defn @]  [X]
Ind: k1 = f::k1'. 
   IH: ∀e.∀k2. e@(k1'^k2) = (e@k1')@k2
   e@(k1^k2) = e@((f::k1')^k2)    [Hyp.]
             = e@(f::(k1'^k2))    [Defn ^]
             = (e@[f])@(k1'^k2)   [Lemma 1]
             = ((e@[f])@k1')@k2   [IH]
             = (e@(f::k1'))@k2    [Lemma 1]
             = (e@k1)@k2.         [Hyp.]    [X]
[QED Lemma 2]
 

Lemma 3: e@(k%f) = (e@k)@[f].
Proof: By calculation:
  e@(k%f) = e@(k^[f])      [Defns %, ^]
          = (e@k)@[f]      [Lemma 2]
[QED Lemma 3]


Lemma 4: For any e, k, e @ k = e $ cont(k)
Proof: by induction on k (as left recursive % list)
Base: k = []. e@k = e = e $ [].
Ind: k = k' % Plus([],e2).

  IH: e @ k' = e $ cont(k').

  e @ k = (e @ k') @ [Plus([],e2)]       [Lemma 3]
        = Plus(e@k', e2)                 [Defn @]

  e $ cont(k) = e $ Plus(cont(k'), e2)   [Defn 3c]
              = Plus(e $ cont(k'), e2)   [Defn $]
              = Plus(e@k', e2)           [IH]

Thus e@k = Plus(e@k',e2) = e $ cont(k).  [X]
Inductive cases for other frames are similar.
[QED Lemma 4].
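
Lemma 4 can be spot-checked in SML. The sketch below is not a proof,
only a test on one stack and its suffixes. It assumes the context
datatype of Program 1.2 and the frame datatype of Program 1.3, builds
cont from the reversed stack as in Defn 3b, and uses the hypothetical
names plug (for $), unload (for @) and agrees.

```sml
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

datatype frame
  = PlusL of expr | PlusR of int
  | TimesL of expr | TimesR of int

datatype context
  = Hole
  | CPlusL of context * expr | CPlusR of int * context
  | CTimesL of context * expr | CTimesR of int * context

(* plug (c, e) is e $ c *)
fun plug (Hole, e) = e
  | plug (CPlusL(c, e2), e) = Plus(plug(c, e), e2)
  | plug (CPlusR(n, c), e) = Plus(Num n, plug(c, e))
  | plug (CTimesL(c, e2), e) = Times(plug(c, e), e2)
  | plug (CTimesR(n, c), e) = Times(Num n, plug(c, e))

(* unload (e, k) is e @ k *)
fun unload (e, []) = e
  | unload (e1, PlusL(e2)::k) = unload (Plus(e1, e2), k)
  | unload (e2, PlusR(n1)::k) = unload (Plus(Num n1, e2), k)
  | unload (e1, TimesL(e2)::k) = unload (Times(e1, e2), k)
  | unload (e2, TimesR(n1)::k) = unload (Times(Num n1, e2), k)

(* cont' expects the outermost frame first (Defn 3b) *)
fun cont' [] = Hole
  | cont' (PlusL(e2)::k) = CPlusL(cont' k, e2)
  | cont' (PlusR(n1)::k) = CPlusR(n1, cont' k)
  | cont' (TimesL(e2)::k) = CTimesL(cont' k, e2)
  | cont' (TimesR(n1)::k) = CTimesR(n1, cont' k)

fun cont k = cont' (rev k)

(* agrees (e, k): does e @ k equal e $ cont(k)? *)
fun agrees (e, k) = (unload (e, k) = plug (cont k, e))

val k = [PlusR 5, TimesL (Num 4), PlusR 2]
val checks = [agrees (Num 8, k), agrees (Num 8, tl k),
              agrees (Num 8, tl (tl k)), agrees (Num 8, [])]
```

For the stack k above, cont k is the context
Plus(2,Times(Plus(5,[]),4)), and every entry of checks is true.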

Now back to the main proof.  The following lemma relates redex
transitions in the CK machine to CR reductions on the corresponding
expression.

Lemma 5: If (e,k) => (e',k') by a reduction rule =>(5) or =>(6) of
SAE[CK], then factorCK(e,k) = (r,C) is a factoring such that 
e@k = C[r], and furthermore, e'@k' = C[r'] where r' is the contractum
of r (by rule (1) or (2) of SAE[CR]).

Proof: We prove this by cases on the reduction rule giving
(e,k) => (e',k'), or equivalently on the two rules defining the
function factorCK. The proofs for the two rules are essentially
the same, switching Plus and Times.

Case: Rule =>(5). Then (e,k) = (n, Plus*(m,[])::k') for some n, m, and
k'. Let (r,C) = factorCK(e,k). Then r = Plus(m,n), a redex, C =
cont(k'), and r' = Num(m+n).

Now we show that e@k = C[r] by calculation and Lemma 4.

   e@k = n @ (Plus*(m,[])::k')
       = Plus(m,n) @ k'             [Defn @]

   C[r] = Plus(m,n) $ C
        = Plus(m,n) $ cont(k')

But Plus(m,n)@k' = Plus(m,n)$cont(k') by Lemma 4. [X]

The resulting CK state is (Num(p), k'), where p=m+n. Num(p) = r',
the contractum of r by SAE[CR] Rule (1). Again by Lemma 4 we have

   e' @ k' = Num(p) @ k' = Num(p) $ cont(k') = C[r'].

The case for rule =>(6) is similar, with Times in place of Plus.
[QED Lemma 5].

Now we prove that for administrative (nonreduction) transitions, the
expression represented by CK states does not change.

Lemma 6: If (e,k) => (e',k') by a nonreduction rule, i.e. one of
rules =>(1) to =>(4), then e@k = e'@k'.

Proof: By case analysis on the SAE[CK] rule giving (e,k) => (e',k').
We'll do the cases of rules (1) and (3) of SAE[CK].

Case (1): Suppose (e,k) => (e',k') by rule (1):

(1)  (Plus(e1,e2), k)  =>  (e1, Plus([],e2)::k)

Then e@k = Plus(e1,e2)@k.
     e'@k' = e1 @ (Plus([],e2)::k)
           = Plus(e1,e2)@k.  [Defn @]

Thus e@k = Plus(e1,e2)@k = e'@k'.

Case (3): Suppose (e,k) => (e',k') by rule (3):

(3)  (n, Plus([],e2)::k)  =>  (e2, Plus*(n,[])::k)

Then e@k = n @ (Plus([],e2)::k)
         = Plus(n,e2)@k            [Defn @]
     e'@k' = e2 @ (Plus*(n,[])::k)
           = Plus(n,e2)@k.         [Defn @]

Thus e@k = Plus(n,e2)@k = e'@k'.
[QED Lemma 6]

Corollary 7: If (e,k) => (e',k'), then either e@k = e'@k'
   or e@k ↦ e'@k' in SAE[CR].
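
Corollary 7 can be checked along a concrete machine run. The following
is a sketch, not part of the proof. It swaps in a direct recursive
leftmost-redex step (crStep) in place of the factor/wrap
implementation of rule (3); by unique factorization the two should
take the same step, but that is an assumption of this example. The
names step, unload, crStep and ok are hypothetical.

```sml
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

datatype frame
  = PlusL of expr | PlusR of int
  | TimesL of expr | TimesR of int

(* one CK transition (Fig 1.7), or NONE in a final state *)
fun step (Plus(e1,e2), k) = SOME(e1, PlusL(e2)::k)
  | step (Times(e1,e2), k) = SOME(e1, TimesL(e2)::k)
  | step (Num n, PlusL(e2)::k) = SOME(e2, PlusR(n)::k)
  | step (Num n, TimesL(e2)::k) = SOME(e2, TimesR(n)::k)
  | step (Num n, PlusR(m)::k) = SOME(Num(m+n), k)
  | step (Num n, TimesR(m)::k) = SOME(Num(m*n), k)
  | step (Num _, []) = NONE

(* unload (e, k) is e @ k *)
fun unload (e, []) = e
  | unload (e1, PlusL(e2)::k) = unload (Plus(e1, e2), k)
  | unload (e2, PlusR(n1)::k) = unload (Plus(Num n1, e2), k)
  | unload (e1, TimesL(e2)::k) = unload (Times(e1, e2), k)
  | unload (e2, TimesR(n1)::k) = unload (Times(Num n1, e2), k)

(* one CR step: reduce the leftmost redex, NONE for a value *)
fun crStep (Num _) = NONE
  | crStep (Plus(Num m, Num n)) = SOME(Num(m+n))
  | crStep (Times(Num m, Num n)) = SOME(Num(m*n))
  | crStep (Plus(Num m, e2)) = Option.map (fn e2' => Plus(Num m, e2')) (crStep e2)
  | crStep (Times(Num m, e2)) = Option.map (fn e2' => Times(Num m, e2')) (crStep e2)
  | crStep (Plus(e1, e2)) = Option.map (fn e1' => Plus(e1', e2)) (crStep e1)
  | crStep (Times(e1, e2)) = Option.map (fn e1' => Times(e1', e2)) (crStep e1)

(* ok s: along the run from s, every CK transition either leaves
 * e@k unchanged or corresponds to one CR step on e@k *)
fun ok s =
    case step s
      of NONE => true
       | SOME s' =>
           let val (a, b) = (unload s, unload s')
           in (a = b orelse crStep a = SOME b) andalso ok s'
           end

val holds = ok (Plus(Num 2, Times(Plus(Num 5, Num 8), Num 4)), [])
```

For the running example holds is true: the six administrative
transitions leave e@k unchanged and each of the three reduction
transitions matches one CR step.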


Lemma 8: e ↦ e' (CR) ==>  ∀k. (e',k) =>* (Num(n),[]) ==>
         (e,k) =>* (Num(n), []).

Proof: By induction on the context C used in e ↦ e'.

Base: C = [], meaning that e is a redex. Let's assume that
e = Plus(n1,n2), so e' = p where p = n1+n2. Then let's assume that for
some stack k, (p, k) =>* (n,[]). We have

    (e,k) = (Plus(n1,n2), k)
         => (n1, (Plus([],n2)::k))
         => (n2, (Plus*(n1,[])::k))
         => (p, k)                  [p = n1+n2]
         =>* (n, [])    [X]

Ind: e = C[r] where C = Plus(C1,e2).
Then e = Plus(e1,e2) where e1 = C1[r], and e' = C[r'], e1' = C1[r'].

IH: ∀k. (e1',k) =>* (n,[]) ==> (e1,k) =>* (n, []).

Suppose

    (e',k) =>* (n,[])

where e' = Plus(e1',e2) with e1' = C1[r']. Then

    (Plus(e1',e2), k) => (e1', Plus([],e2)::k) =>* (n,[])

Then by IH,

    (e1, Plus([],e2)::k) =>* (n,[])

    (e,k) = (Plus(e1,e2),k)
         => (e1, Plus([],e2)::k)
         =>* (n,[])                  [X]

Ind: e = Plus(n1,e2), C = Plus(n1,C2), e = C[r], e2 = C2[r],
     e' = C[r'], e2' = C2[r'].

IH: ∀k. (e2',k) =>* (n,[]) ==> (e2,k) =>* (n, []).

Suppose

    (e',k) =>* (n,[])

where e' = Plus(n1,e2') with e2' = C2[r']. Then

    (Plus(n1,e2'), k) => (n1, Plus([],e2')::k)
                      => (e2', Plus*(n1,[])::k)
                      =>* (n,[])

Then by IH,

    (e2, Plus*(n1,[])::k) =>* (n,[])

    (e,k) = (Plus(n1,e2),k)
         => (n1, Plus([],e2)::k)
         => (e2, Plus*(n1,[])::k)
         =>* (n,[])                  [X]

The inductive cases for Times context constructors are similar.
[QED Lemma 8]

Now we are finally ready to prove the desired equivalence.

Theorem 5: e ↦* n (CR)  <==>  (e,[]) =>* (n,[]).

Proof: We prove the two implications separately.

Part 1(==>): We prove this by induction on the defn of ↦*.

Base: e = n (zero steps). Then (e,[]) =>* (n,[]) in zero steps as well.

Ind: e ↦ e' & e' ↦* n. 

  IH: (e',[]) =>* (n, [])

Then (e, []) =>* (n, []) by Lemma 8 and the IH. [X]

Part 2(<==): We prove a slightly stronger statement:

  ∀k. (e,k) =>* (n,[])  ==>  e@k ↦* n  (CR).

We prove this also by induction on =>*.

Base: (e,k) =>* (n,[]) in zero steps, meaning e = n and k = [].
Then e@k = e ↦* n in zero steps as well.

Ind: Suppose (e,k) => (e',k') and (e',k') =>* (n,[]).

IH: e'@k' ↦* n.

By Corollary 7, e@k = e'@k' or e@k ↦ e'@k'. In the first case, we
have e@k ↦* n by the IH. In the second case we have e@k ↦* n
by IH and the definition of ↦*. [X]

Now we get the <== for the Theorem by taking k = [] and noting
that e@[] = e.
 
[QED]
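
The theorem can also be tested empirically. The sketch below is a
sanity check rather than a proof: it compares the two evaluators on
all expressions of depth at most 2 over the leaves 0, 1 and 2. CR
evaluation is written as an iterated leftmost-redex step (assumed here
to coincide with rule (3) by unique factorization), runCK follows
Program 1.3, and the names crStep, evalCR, exprs and agree are
hypothetical.

```sml
datatype expr = Num of int | Plus of expr * expr | Times of expr * expr

datatype frame
  = PlusL of expr | PlusR of int
  | TimesL of expr | TimesR of int

(* one CR step: reduce the leftmost redex, NONE for a value *)
fun crStep (Num _) = NONE
  | crStep (Plus(Num m, Num n)) = SOME(Num(m+n))
  | crStep (Times(Num m, Num n)) = SOME(Num(m*n))
  | crStep (Plus(Num m, e2)) = Option.map (fn e2' => Plus(Num m, e2')) (crStep e2)
  | crStep (Times(Num m, e2)) = Option.map (fn e2' => Times(Num m, e2')) (crStep e2)
  | crStep (Plus(e1, e2)) = Option.map (fn e1' => Plus(e1', e2)) (crStep e1)
  | crStep (Times(e1, e2)) = Option.map (fn e1' => Times(e1', e2)) (crStep e1)

(* iterate CR steps until a value is reached *)
fun evalCR e =
    case crStep e
      of NONE => e
       | SOME e' => evalCR e'

(* CK machine as in Program 1.3 *)
fun runCK (Num n, []) = n
  | runCK (Plus(e1,e2), k) = runCK(e1, PlusL(e2)::k)
  | runCK (Num m, PlusL(e)::k) = runCK(e, PlusR(m)::k)
  | runCK (Num n, PlusR(m)::k) = runCK(Num(m+n), k)
  | runCK (Times(e1,e2), k) = runCK(e1, TimesL(e2)::k)
  | runCK (Num m, TimesL(e)::k) = runCK(e, TimesR(m)::k)
  | runCK (Num n, TimesR(m)::k) = runCK(Num(m*n), k)

(* exprs d: all expressions of depth <= d over the leaves 0, 1, 2 *)
fun exprs 0 = [Num 0, Num 1, Num 2]
  | exprs d =
      let val sub = exprs (d - 1)
          fun pairs f =
              List.concat (map (fn a => map (fn b => f (a, b)) sub) sub)
      in sub @ pairs Plus @ pairs Times
      end

(* both systems compute the same number on every test expression *)
val agree = List.all (fn e => evalCR e = Num (runCK (e, []))) (exprs 2)
```

On this finite test set agree is true. A counterexample here would
point to an error in one of the two implementations, while the general
statement still rests on the proof above.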