An introduction to trees.

Trees are recursive structures for organizing information that fits in a hierarchy. Think of directories, shown here one level per row:

    MacLab Resources
    Applications   Courses   Utilities   ...
    Spring 1998   Old Courses
    CS102-01   CS102-01   CS 111-01   CS117-01   CS117-02
    Drop Box   Programs
    Assignment2

Let's introduce some notation: Node, Root, Arc, Parent, Child, Leaf, Interior Node, Subtree, Height.

Binary trees.

Each node has just a left branch and a right branch. The advantage is that the structure is simple. Implementing sets as binary trees gives O(n) performance for both union and intersection, following the way that merge does. We can have O(log n) retrieval from the set, provided the tree is balanced. A full binary tree of height n has 2^n leaves and 2^(n+1) - 1 nodes in total.

Implementing as vectors: with the root stored at index 0, let the children of node i be stored at 2i+1 and 2i+2 respectively. This works best for a complete tree, since any missing node leaves an unused slot in the vector.

Binary trees are good structures for search. Think about the process of binary search. If I know that the values of the elements on the left are always less than or equal to the values of the elements on the right, then I can at least determine at each level which side the element I'm looking for is on. Further, if I know that the number of elements on the left is always equal to, or at most one more or less than, the number of elements on the right, then I cut the number of elements to search through in half at each level.

Binary search.

OK, let's assume that we have a sorted vector:

    5 8 23 67 77 99 106 130

and we want to find out if 23 is in the set. Since the vector is sorted, we go to the midpoint, in this case 77, and then it's a simple matter of asking whether we should look to the left or to the right of it. Since 23 < 77, we should look to the left. So we can do a binary search on the left half of the array. So this is a --- procedure. OK, so we can use iterators to do this.
    template <class Iterator, class ValueType>
    Iterator lower_bound (Iterator start, Iterator stop, const ValueType & value)
    // return the position of the first element not less than value,
    // or stop if there is no such element
    {
        unsigned low = 0;
        unsigned max = stop - start;
        unsigned high = max;
        while (low < high) {
            unsigned mid = (low + high) / 2;
            if (start[mid] < value)
                low = mid + 1;      // answer lies to the right of mid
            else
                high = mid;         // answer is mid or lies to its left
        }
        if (low < max)
            return start + low;
        return stop;
    }

Binary search trees:

1. each node has associated with it some key (e.g. an integer value)
2. key values are unique
3. at any node, the key value is greater than all keys in the left subtree and less than all keys in the right subtree.

Looking for a value is easy if we apply this strategy. What about insertion into the tree? Let's go through creating a binary tree using the dq data, one at a time: 11 37 0 9 5

We'll say that a binary tree is balanced when:

1) the number of elements on either side of the root is within one of the number of elements on the other side of the root.

The necessity to add data to the tree means that a vector is not always the best implementation. Alternatively, we can link the nodes with pointers:

    template <class T> class node {
    public:
        node (T & v, node * par, node * left, node * right)
            : value(v), parent(par), leftChild(left), rightChild(right) { }

        // operations
        node * copy (node *);
        void release ();
        int count (T & testElement);
        void insert (node *);
        node * find (T &);
        int size ();
        node * merge (node *, node *);

        // data fields
        T value;
        node * parent;
        node * leftChild;
        node * rightChild;
    };

    template <class T> int node<T>::size ()
    // count number of elements in subtree rooted at node
    {
        int count = 1;
        if (leftChild != 0)
            count += leftChild->size();
        if (rightChild != 0)
            count += rightChild->size();
        return count;
    }

    template <class T> void node<T>::insert (node<T> * newNode)
    // insert a new element into a binary search tree
    {
        if (newNode->value < value) {
            if (leftChild != 0)
                leftChild->insert(newNode);
            else {
                newNode->parent = this;
                leftChild = newNode;
            }
        }
        else {
            if (rightChild != 0)
                rightChild->insert(newNode);
            else {
                newNode->parent = this;
                rightChild = newNode;
            }
        }
    }

    template <class T> node<T> * node<T>::find (T & value)
    // search for a node with a particular key in a binary search tree;
    // returns the node holding the key, or the last node visited
    // if the key is not in the tree
    {
        if (this->value == value)
            return this;
        if (value < this->value) {          // smaller keys are on the left
            if (this->leftChild == 0)
                return this;
            return this->leftChild->find(value);
        }
        if (this->rightChild == 0)          // larger keys are on the right
            return this;
        return this->rightChild->find(value);
    }

The assumption that each step cuts the search in half breaks down if the tree isn't balanced: consider 1 2 3 4 5 6 inserted in order. Every new element goes to the right of the previous one, so the tree degenerates into a linked list and a search has to walk through all n elements. The tree is balanced if the left node and right node counts differ by at most one. There are strategies to balance trees as elements are inserted, for example as implemented in AVL trees. We'll get back to this in a bit.

But right now what I want to do is go to parse trees.

Operator precedence parsing.

Let's go over an example. Suppose you wanted to implement a calculator. Instead of RPN, what about "normal" infix notation:

    6 + 7

Parsing.

So the heart of the exercise is the process of reading in an expression, figuring out if it's valid, and then executing (or running) it. And this can be expressed in a simple loop:

    int main ()
    {
        expressionInfo * exp;
        int value;
        while (true) {
            exp = ReadExp();            // read and parse one expression
            value = EvalExp(exp);       // compute its result
            cout << "Value is " << value << endl;
        }
    }

So the step of evaluation is taking the expression in the appropriate order and computing the appropriate result. So what does this reading process consist of?

1. Read in the input.
2. Lexical analysis: divide an input line into tokens.
3. Parsing. Do the tokens make sense? Do the tokens constitute a legal expression and, if so, what is its structure?

Here's an expression to consider for Monday:

    y = 3 * ( x + 1 )

How do we develop a parse of it? How do we verify that it constitutes a valid expression (whatever that is)? Well, let's say an expression is

1. an INTEGER constant
2. a VARIABLE value
3. an EXPRESSION enclosed in parentheses
4. a sequence of two EXPRESSIONS separated by an OPERATOR.
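
To make the four rules concrete, here is a minimal sketch of the lexical analysis and parsing steps. It is one possible approach, not necessarily the one developed in class: it uses precedence climbing (a common form of operator precedence parsing) and evaluates as it parses instead of building a tree. The names Token, tokenize, Parser and evaluate are made up for this illustration, the operators are assumed to be + - * /, and only the right-hand side of an assignment such as y = 3 * ( x + 1 ) is handled.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type: kind is 'n' (integer), 'v' (variable),
// or the operator / parenthesis character itself.
struct Token { char kind; std::string text; };

// Lexical analysis: divide an input line into tokens.
std::vector<Token> tokenize(const std::string& line) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        unsigned char u = static_cast<unsigned char>(c);
        std::size_t start = i;
        if (std::isspace(u)) { ++i; continue; }
        if (std::isdigit(u)) {
            while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) ++i;
            tokens.push_back({'n', line.substr(start, i - start)});
        } else if (std::isalpha(u)) {
            while (i < line.size() && std::isalnum(static_cast<unsigned char>(line[i]))) ++i;
            tokens.push_back({'v', line.substr(start, i - start)});
        } else if (std::string("+-*/()").find(c) != std::string::npos) {
            tokens.push_back({c, std::string(1, c)});
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return tokens;
}

class Parser {
public:
    Parser(std::vector<Token> toks, std::map<std::string, int> vars)
        : tokens(std::move(toks)), variables(std::move(vars)), pos(0) {}

    // Parse and evaluate all tokens; throws if they are not a legal expression.
    int run() {
        int value = expression(1);
        if (pos != tokens.size()) throw std::runtime_error("unexpected token");
        return value;
    }

private:
    std::vector<Token> tokens;
    std::map<std::string, int> variables;
    std::size_t pos;

    static int precedence(char op) {
        if (op == '+' || op == '-') return 1;
        if (op == '*' || op == '/') return 2;
        return 0;                               // not a binary operator
    }

    static int apply(char op, int a, int b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/':
                if (b == 0) throw std::runtime_error("division by zero");
                return a / b;
        }
        assert(false && "apply called with a non-operator");
        return 0;
    }

    // Rules 1-3: integer constant, variable, or parenthesized expression.
    int primary() {
        if (pos >= tokens.size()) throw std::runtime_error("unexpected end of input");
        Token t = tokens[pos++];
        if (t.kind == 'n') return std::stoi(t.text);
        if (t.kind == 'v') {
            auto it = variables.find(t.text);
            if (it == variables.end()) throw std::runtime_error("unknown variable");
            return it->second;
        }
        if (t.kind == '(') {
            int value = expression(1);
            if (pos >= tokens.size() || tokens[pos].kind != ')')
                throw std::runtime_error("missing )");
            ++pos;
            return value;
        }
        throw std::runtime_error("expected an operand");
    }

    // Rule 4: two expressions separated by an operator, grouped by precedence.
    int expression(int minPrec) {
        int left = primary();
        while (pos < tokens.size()) {
            char op = tokens[pos].kind;
            int prec = precedence(op);
            if (prec < minPrec) break;          // also stops at ')' and operands
            ++pos;
            int right = expression(prec + 1);   // prec + 1 makes operators left-associative
            left = apply(op, left, right);
        }
        return left;
    }
};

int evaluate(const std::string& line, const std::map<std::string, int>& vars) {
    return Parser(tokenize(line), vars).run();
}
```

Calling evaluate("3 * ( x + 1 )", vars) with x bound to 4 in vars walks rule 4 at the top level (3 * ...), rule 3 for the parentheses, and rules 1 and 2 for the operands. An input such as "3 * ( x + 1" is rejected with an exception, which is the "is it valid" half of the question.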