CMSC 11700-01 Spring 2002
Sharon Salveter

Homework #1
Due Tuesday 9 April at the beginning of class

Markov Algorithms

Markov algorithms transform strings by substituting one substring for another according to a set of rules (often called productions.) Here is such a set:

NumLeftRightNextRule
0  [ABBBAABA1
1[AAA[A1
2[ABB[A1
3[BAA[B1
4[BBB[B1
5[[:6
6A[[A6
7B[[B6
8[:  10
9[[1
10: 10

In general, to execute rule number n, you search for Left(n) in MainString. If you find it, replace the leftmost occurrance of Left(n) by Right(n). This operation might lenghten or shorten MainString. The next rule is NextRule(n). If you fail to find Left(n) in MainString, the next rule is n+1. If there is no such rule, you are done.

To start the Markov algorithm, set MainString to empty. By convention, the first rule, number 0, has an empty Left part. Therefore, the first rule always succeeds, since it is easy to find the empty string, Left(0), in the initially empty MainString. You first execute rule 0, and its effect is to set MainString to the Right part (in this example [ABBBAABA). You then proceed to NextRule(0) (in this example, number 1).

Here is a partial trace of the successful rules for the example above:

ruleresulting MainString
0[ABBBAABA
2B[ABBAABA
2BB[ABAABA
2BBB[AAABA
1BBBA[AABA

The algorithm continues for many steps.

Write a program that readsand then executes one Markov algorithm. The input format is as follows:
First, on a separate line, comes an integer that tells how many rules to expect. (If the integer is n, the rules will be numbered 0 through n-1.) [Hint: think dynamic array.] Then come the rules themselves, each on a separate line. Each rule is represnted by two strings and an integer. Strings are delimited in the input by single quotation marks ('). The first string is the left part, the second the right part, and the integer is the next rule number. A single blank separates the left part from the right, and the right part from the next rule number. Rules arrive in order, starting at number 0. For example,

2
'' '[ABBBAABA' 1
'[AA' 'A[A' 1

represents the Markov algorithm consisting of just the first two rules of the above example. (In the second line, the left string is the empty string, which is represented by two successive single quote marks, nota single double quote mark.)

The program you turn in must run on the test data I provide, which will be available on the website Monday 8 April.

Your program should first list the input rules in a tabular format similiar to that shown above. It should then perform the corresponding Markov algorithm, printing out each production number that succeeds and the state of MainString after the substitution has occurred, just as in the example above.

You may assume that the only characters that appear in the rules are letters (allletters, not just A and B) and the special characters '[' and ':'. Include a cut-off (say at 100) on the number of substitutions allowed. Print an error message and halt if this cut-off is exceeded. Anything left unspecified here may be specified however you want, but your program must work on reasonable data. In particular, you may notassume that the Left parts are always non-empty. The empty string matches MainString before the first character of MainString.

You must implement a linked list solution. You may notuse the C++ standard string class, or any STL classes. Don't, of course, use the book's implementation of a list ADT. The idea is that you will implement MainString as a linked list of nodes, where the data in each node is a single character. (See text page 200 - 205.) You should also manage your own availalbe space, invoking new only when necessary. Try to make you string matching algorithm efficient for cases where MainString is long and the search fails (the usual case).

********************************************

Extra credit: Extend your program by implementing special charactersin Left sides of productions.

"." matches any character,
"*" matches as many of the previous character as possible (maybe zero).

The pattern "AB*.*C." would match an A followed by any number of Bs, followed by any number of any character, then a C, then any character. For example, it would match "ABBBDDCA" and "ACB", but would not match "ABBBB" or "ABC".