No new homework yet, probably not until Friday or Monday (last regular homework of semester, probably a fast suffix array algorithm). We detour from the textbook (and Chapter 5) to present the QuadTree data structure. This topic is not in our textbook, but see: http://en.wikipedia.org/wiki/Quadtree (the "point" variant) http://algs4.cs.princeton.edu/92search/ (Chapter 9!? under construction) The QuadTree gives us a way to construct the geometric graphs needed in hw4/. A QuadTree generalizes the naive BST: a tree node holds a point (like a BST key), and its children are four subtrees, holding points from the four quadrants around the top point. I'll describe it further on the blackboard. In particular, if we insert N points in random order, then the same argument as for the naive BST shows that the expected depth of the tree is O(lg N). There is a pile of code (with Makefile) in subdirectory quad/. We have the slow GeomGraph class (like in hw4), and then also a faster version GeomGraphQuad, that uses a QuadTree. Looking at quad/GeomGraphQuad.java, we see that the QuadTree gives us one new thing: a faster way to iterate over the closest neighbors of a given query point (the iterator uses the prebuilt QuadTree structure, and a MinHeap keeping track of the unexplored subtrees). It gives us about a 60x speedup for input mona-20k.txt (with D=15): cs323000@lab0z:~/share/0409/quad$ make ARGS='mona-20k.txt 15' both java -cp '.:book.jar' -da GeomGraphQuad mona-20k.txt 15 Reading mona-20k.txt ... 20000 points in 0.361 secs Creating G (no edges yet) ... done in 0.047 secs Calling addEdges(D=15) ... 168598 edges in 0.678 secs 1.089 seconds overall java -cp '.:book.jar' -da GeomGraph mona-20k.txt 15 Reading mona-20k.txt ... 20000 points in 0.386 secs Creating G (no edges yet) ... done in 0.002 secs Calling addEdges(D=15) ... 168597 edges in 67.421 secs 67.814 seconds overall cs323000@lab0z:~/share/0409/quad$ Two more things to try in quad/: make anim # an animation of the QuadTree iterator make circuit # compare the two graphs, possible bug? Coming back to last topic of Section 5.1: How hard is it to sort the suffixes of a String of length N? For English text (like Moby Dick), we see that naive sorting already works pretty well in practice, but this would be quadratic (O(N*N) time) in the worst case. There are some better results. See slides, starting at page 61. 1. There exists an O(N) time "suffix tree" algorithm, but it is too complicated to present here. 2. There is a relative simple O(N lg N) algorithm of Manber, I'll just sketch the idea. If there is time, we'll start on Section 5.2, Tries. When using strings as keys, we want a symbol table with some flexibility like a BST (with ordered keys, search by prefix, etc.) but with speed of hashing, i.e. time roughly O(length(key)). For this there are multiple "trie" structures, we'll look at some of them. As usual, more slides coming.