No new homework yet, probably not until Friday or Monday (last regular
homework of semester, probably a fast suffix array algorithm).

We detour from the textbook (and Chapter 5) to present the QuadTree
data structure.  This topic is not in our textbook, but see:

  http://en.wikipedia.org/wiki/Quadtree   (the "point" variant)
  http://algs4.cs.princeton.edu/92search/ (Chapter 9!? under construction)

The QuadTree gives us a way to construct the geometric graphs needed
in hw4/.  A QuadTree generalizes the naive BST: a tree node holds a
point (like a BST key), and its children are four subtrees, holding
points from the four quadrants around the top point.  I'll describe it
further on the blackboard.  In particular, if we insert N points in
random order, then the same argument as for the naive BST shows that
the expected depth of the tree is O(lg N).

There is a pile of code (with Makefile) in subdirectory quad/.  We
have the slow GeomGraph class (like in hw4), and then also a faster
version GeomGraphQuad, that uses a QuadTree.

Looking at quad/GeomGraphQuad.java, we see that the QuadTree gives us
one new thing: a faster way to iterate over the closest neighbors of a
given query point (the iterator uses the prebuilt QuadTree structure,
and a MinHeap keeping track of the unexplored subtrees).  It gives us
about a 60x speedup for input mona-20k.txt (with D=15):

cs323000@lab0z:~/share/0409/quad$ make ARGS='mona-20k.txt 15' both
java -cp '.:book.jar' -da GeomGraphQuad mona-20k.txt 15
Reading mona-20k.txt ... 20000 points in 0.361 secs
Creating G (no edges yet) ... done in 0.047 secs
Calling addEdges(D=15) ... 168598 edges in 0.678 secs
1.089 seconds overall

java -cp '.:book.jar' -da GeomGraph mona-20k.txt 15
Reading mona-20k.txt ... 20000 points in 0.386 secs
Creating G (no edges yet) ... done in 0.002 secs
Calling addEdges(D=15) ... 168597 edges in 67.421 secs
67.814 seconds overall

cs323000@lab0z:~/share/0409/quad$


Two more things to try in quad/:
     make anim        # an animation of the QuadTree iterator
     make circuit     # compare the two graphs, possible bug?


Coming back to last topic of Section 5.1:

How hard is it to sort the suffixes of a String of length N?
For English text (like Moby Dick), we see that naive sorting
already works pretty well in practice, but this would be
quadratic (O(N*N) time) in the worst case.   There are some
better results.

See slides, starting at page 61.

  1. There exists an O(N) time "suffix tree" algorithm,
     but it is too complicated to present here.
  2. There is a relative simple O(N lg N) algorithm of Manber,
     I'll just sketch the idea.

If there is time, we'll start on Section 5.2, Tries.

When using strings as keys, we want a symbol table with some
flexibility like a BST (with ordered keys, search by prefix, etc.) but
with speed of hashing, i.e. time roughly O(length(key)).  For this
there are multiple "trie" structures, we'll look at some of them.

As usual, more slides coming.