Admin: Expect hw5 (written) on Friday, including a KMP question like:

    For the pattern string ABABCABCD:
    (a) Diagram the KMP DFA for this pattern.
    (b) Diagram the KMP NFA for this pattern.
        (Or equivalently, give the next[] array.)

For lack of time, we will not cover the remaining two string-matching algorithms in Section 5.3 (Rabin-Karp and Boyer-Moore). We will also skip 5.4 (regular expressions and NFAs).

Next topic: 5.5, Data Compression

We mentioned text compression as an application of the BWT.
Slides (as usual): 55DataCompression.pdf

Today we look at some classic "lossless" compression algorithms:

Run-length encoding: a warmup (runs of repeated bits, or other characters).

Huffman: Suppose we want to encode symbols with a prefix-free code, and we know the symbol frequencies. This greedy algorithm computes an optimal code (represented as a binary trie).

LZW: Transmit a stream of text, looking for repeated substrings from the recent past. This is dynamic coding: we do not need to see the entire input before we can start transmitting.

(Sketches of all three appear at the end of these notes.)

Remark: the textbook presents Huffman as a two-pass algorithm (i.e., it looks at the entire input before generating the code), but there are also dynamic variants of Huffman coding.

We have demos for the last two. There is working code, for example:

    cd ../book
    cat abra.txt | run LZW - | run LZW +

(Here - compresses and + expands, so the pipeline should reproduce abra.txt.)
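
Sketch 1: run-length encoding. This is not the book's version: RunLength.java in the book compresses a bit stream with BinaryStdIn/BinaryStdOut, while the class below (a made-up name) just shows the count-plus-symbol idea on ordinary Java strings.

    // A simplified, character-level run-length encoder. (The book's
    // RunLength.java works on a bit stream via BinaryStdIn/BinaryStdOut;
    // this class only illustrates the idea.)
    public class RunLengthSketch {
        // Encode "AAAABBBCC" as "4A3B2C": each run becomes count + symbol.
        static String encode(String s) {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (i < s.length()) {
                char c = s.charAt(i);
                int run = 1;
                while (i + run < s.length() && s.charAt(i + run) == c)
                    run++;                       // extend the current run
                out.append(run).append(c);       // emit count, then symbol
                i += run;
            }
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(encode("AAAABBBCC"));   // prints 4A3B2C
        }
    }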
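
Sketch 2: the greedy Huffman construction, assuming the symbol frequencies are already known. The class name, the Node layout, and the example frequency table are mine, for illustration; the book's Huffman.java additionally counts frequencies in a first pass, serializes the trie, and handles expansion. The greedy step is the loop that repeatedly merges the two tries of smallest total frequency.

    import java.util.PriorityQueue;

    public class HuffmanSketch {
        // Trie node: a leaf holds a symbol; an internal node holds two subtries.
        static class Node implements Comparable<Node> {
            final char ch;          // meaningful only in leaves
            final int freq;         // total frequency of this subtree
            final Node left, right;
            Node(char ch, int freq, Node left, Node right) {
                this.ch = ch; this.freq = freq; this.left = left; this.right = right;
            }
            boolean isLeaf() { return left == null && right == null; }
            public int compareTo(Node that) { return this.freq - that.freq; }
        }

        // Greedy rule: repeatedly merge the two tries of smallest frequency.
        static Node buildTrie(char[] symbols, int[] freqs) {
            PriorityQueue<Node> pq = new PriorityQueue<>();
            for (int i = 0; i < symbols.length; i++)
                pq.add(new Node(symbols[i], freqs[i], null, null));
            while (pq.size() > 1) {
                Node x = pq.poll();
                Node y = pq.poll();
                pq.add(new Node('\0', x.freq + y.freq, x, y));
            }
            return pq.poll();
        }

        // Walk the trie: left edge = 0, right edge = 1; leaves give the codewords.
        static void printCodes(Node node, String code) {
            if (node.isLeaf()) { System.out.println(node.ch + " -> " + code); return; }
            printCodes(node.left,  code + "0");
            printCodes(node.right, code + "1");
        }

        public static void main(String[] args) {
            // Frequencies of ABRACADABRA! (the contents of abra.txt).
            char[] symbols = { 'A', 'B', 'R', 'C', 'D', '!' };
            int[]  freqs   = {  5,   2,   2,   1,   1,   1  };
            printCodes(buildTrie(symbols, freqs), "");
        }
    }

The priority queue is the natural data structure here because each greedy step needs the two currently smallest frequencies.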
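
Sketch 3: LZW compression, simplified to return a list of integer codewords rather than writing 12-bit codes with BinaryStdOut the way the book's LZW.java does (which also uses a ternary search trie for the code table). The alphabet size R = 256 and reserving codeword R for EOF follow the book; the class name and the List interface are just for this sketch.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LZWSketch {
        static final int R = 256;   // codes 0..255 are the single 8-bit characters

        static List<Integer> compress(String input) {     // assumes 8-bit chars
            // Code table starts with every single-character string.
            Map<String, Integer> table = new HashMap<>();
            for (int c = 0; c < R; c++)
                table.put("" + (char) c, c);
            int nextCode = R + 1;   // codeword R is reserved for EOF, as in the book

            List<Integer> output = new ArrayList<>();
            int i = 0;
            while (i < input.length()) {
                // Greedily find the longest prefix of the remaining input
                // that is already in the table.
                String s = input.substring(i, i + 1);
                while (i + s.length() < input.length()
                       && table.containsKey(input.substring(i, i + s.length() + 1)))
                    s = input.substring(i, i + s.length() + 1);
                output.add(table.get(s));                  // emit its codeword
                // Add that prefix plus the next input character as a new entry.
                // (Real implementations also cap the table; the book stops
                // adding entries at 4096 codewords.)
                if (i + s.length() < input.length())
                    table.put(input.substring(i, i + s.length() + 1), nextCode++);
                i += s.length();
            }
            output.add(R);                                 // EOF codeword
            return output;
        }

        public static void main(String[] args) {
            System.out.println(compress("ABRACADABRA!"));
        }
    }

On ABRACADABRA! the repeated substrings AB and RA come out as the single codewords 257 and 259, which is where the compression comes from.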