Admin: Expect hw5 (written) on Friday, including a KMP question like:

    For the pattern string ABABCABCD:
    (a) Diagram the KMP DFA for this pattern.
    (b) Diagram the KMP NFA for this pattern.
        (Or equivalently, give the next[] array.)

For lack of time, we will not cover the remaining two string-matching algorithms in Section 5.3 (Rabin-Karp and Boyer-Moore). We will also skip 5.4 (regular expressions and NFAs).

Next topic: 5.5, Data Compression

We mentioned text compression as an application of the BWT.
Slides (as usual): 55DataCompression.pdf

Today we look at some classic "lossless" compression algorithms:

Run-length encoding: a warmup (runs of repeated bits, or other characters).

Huffman: Suppose we want to encode symbols with a prefix-free code, and we know the symbol frequencies. This greedy algorithm computes an optimal code (represented as a binary trie).

LZW: Transmit a stream of text, looking for repeated substrings from the recent past. This is dynamic coding: we do not need to see the entire input before we can start transmitting.

(Sketches of all three appear at the end of these notes.)

Remark: the textbook presents Huffman as a two-pass algorithm (i.e., it looks at the entire input before generating the code), but there are also dynamic variants of Huffman coding.

We have demos for the last two. There is working code, for example:

    cd ../book
    cat abra.txt | run LZW - | run LZW +

(Here - compresses and + expands, so the pipeline should reproduce abra.txt.)
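
Sketch 1: run-length encoding. This is not the book's version: RunLength.java in the book compresses a bit stream with BinaryStdIn/BinaryStdOut, while the class below (a made-up name) just shows the count-plus-symbol idea on ordinary Java strings.

    // A simplified, character-level run-length encoder. (The book's
    // RunLength.java works on a bit stream via BinaryStdIn/BinaryStdOut;
    // this class only illustrates the idea.)
    public class RunLengthSketch {
        // Encode "AAAABBBCC" as "4A3B2C": each run becomes count + symbol.
        static String encode(String s) {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (i < s.length()) {
                char c = s.charAt(i);
                int run = 1;
                while (i + run < s.length() && s.charAt(i + run) == c)
                    run++;                       // extend the current run
                out.append(run).append(c);       // emit count, then symbol
                i += run;
            }
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(encode("AAAABBBCC"));   // prints 4A3B2C
        }
    }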
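
Sketch 2: the greedy Huffman construction, assuming the symbol frequencies are already known. The class name, the Node layout, and the example frequency table are mine, for illustration; the book's Huffman.java additionally counts frequencies in a first pass, serializes the trie, and handles expansion. The greedy step is the loop that repeatedly merges the two tries of smallest total frequency.

    import java.util.PriorityQueue;

    public class HuffmanSketch {
        // Trie node: a leaf holds a symbol; an internal node holds two subtries.
        static class Node implements Comparable<Node> {
            final char ch;          // meaningful only in leaves
            final int freq;         // total frequency of this subtree
            final Node left, right;
            Node(char ch, int freq, Node left, Node right) {
                this.ch = ch; this.freq = freq; this.left = left; this.right = right;
            }
            boolean isLeaf() { return left == null && right == null; }
            public int compareTo(Node that) { return this.freq - that.freq; }
        }

        // Greedy rule: repeatedly merge the two tries of smallest frequency.
        static Node buildTrie(char[] symbols, int[] freqs) {
            PriorityQueue<Node> pq = new PriorityQueue<>();
            for (int i = 0; i < symbols.length; i++)
                pq.add(new Node(symbols[i], freqs[i], null, null));
            while (pq.size() > 1) {
                Node x = pq.poll();
                Node y = pq.poll();
                pq.add(new Node('\0', x.freq + y.freq, x, y));
            }
            return pq.poll();
        }

        // Walk the trie: left edge = 0, right edge = 1; leaves give the codewords.
        static void printCodes(Node node, String code) {
            if (node.isLeaf()) { System.out.println(node.ch + " -> " + code); return; }
            printCodes(node.left,  code + "0");
            printCodes(node.right, code + "1");
        }

        public static void main(String[] args) {
            // Frequencies of ABRACADABRA! (the contents of abra.txt).
            char[] symbols = { 'A', 'B', 'R', 'C', 'D', '!' };
            int[]  freqs   = {  5,   2,   2,   1,   1,   1  };
            printCodes(buildTrie(symbols, freqs), "");
        }
    }

The priority queue is the natural data structure here because each greedy step needs the two currently smallest frequencies.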
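
Sketch 3: LZW compression, simplified to return a list of integer codewords rather than writing 12-bit codes with BinaryStdOut the way the book's LZW.java does (which also uses a ternary search trie for the code table). The alphabet size R = 256 and reserving codeword R for EOF follow the book; the class name and the List interface are just for this sketch.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LZWSketch {
        static final int R = 256;   // codes 0..255 are the single 8-bit characters

        static List<Integer> compress(String input) {     // assumes 8-bit chars
            // Code table starts with every single-character string.
            Map<String, Integer> table = new HashMap<>();
            for (int c = 0; c < R; c++)
                table.put("" + (char) c, c);
            int nextCode = R + 1;   // codeword R is reserved for EOF, as in the book

            List<Integer> output = new ArrayList<>();
            int i = 0;
            while (i < input.length()) {
                // Greedily find the longest prefix of the remaining input
                // that is already in the table.
                String s = input.substring(i, i + 1);
                while (i + s.length() < input.length()
                       && table.containsKey(input.substring(i, i + s.length() + 1)))
                    s = input.substring(i, i + s.length() + 1);
                output.add(table.get(s));                  // emit its codeword
                // Add that prefix plus the next input character as a new entry.
                // (Real implementations also cap the table; the book stops
                // adding entries at 4096 codewords.)
                if (i + s.length() < input.length())
                    table.put(input.substring(i, i + s.length() + 1), nextCode++);
                i += s.length();
            }
            output.add(R);                                 // EOF codeword
            return output;
        }

        public static void main(String[] args) {
            System.out.println(compress("ABRACADABRA!"));
        }
    }

On ABRACADABRA! the repeated substrings AB and RA come out as the single codewords 257 and 259, which is where the compression comes from.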