This is a review session schedule for 11am to 12:30 in W302, on
Wednesday 5/2 (reading day).  I'll answer questions, and also
give solutions to the written hw5 problems.

See images in this directory, for solutions as presented on the
blackboard.  Some remarks:

I omit 3(b), since I have decided not to ask you to design a KMP NFA
on the exam, not even as "extra credit".  The point being, it was not
taught well (our only proper description was the code).  You still
might be expected to know that a KMP NFA uses only O(M) space (for a
pattern of length M), compared to a KMP DFA which needs space O(MR).
They both need only O(N) time to search a text of length N.

For 4(c), you only had to figure out the number of bits, but I went
ahead and worked out the string as well.   Of course the details of the
string depend on trie you chose, but the length is always 59 bits.


Here are a few solutions rendered in ASCII (view in fixed-width font):

1(a)

no  pa  ai
is  pe	al
th  of	co
ti  th	fo
fo  th	go
al  th	is
go  ai	no
pe  ti	of
to  al	pa
co  co	pe
to  fo	th
th  go	th
ai  no	th
of  to	ti
th  to	to
pa  is	to

1(b).

Original:

now
is
the
time
for
all
good
people
to
come
to
the
aid
of
their
party

After first pass (sort on char 0), we get these groups.
Note groups of size one are "done", they need no more work.

all
aid

come

for

good

is

now

of

people
party

the
time
to
to
the
their

Note we may name groups by the prefix we know they have in common.
Our unfinished groups above are "a", "p", and "t".

After one more recursive level of sorting (on the second char), groups
"a" and "p" are finished (they split into groups of size one).  Group
"t" splits into "th", "ti", and "to", of which only "ti" is finished:

aid

all

party

people

the
the
their

time

to
to

In the next step (the third char, at index 2), we realize the "to"
group is finished, because they both terminate (we imagine a
terminator char value of "-1", one past the end of the string).  The
"th" group does not split at all, but it becomes the "the" group (all
the strings had an "e"):

to  [saw terminator, group is done]
to

the
the
their

In one more step, we find the two strings that both terminate as "the",
and the third string that differs with an "i".

the  [saw terminator, group is done]
the

their

Remark: I presented the MSD operations here "level by level", but
really MSD operates by a typical "depth first" recursion.  For
example, it would completely finish sorting the "a" and "p" groups,
before taking its first look at the "t" group.


1(c).  Here we indicate the chars that were not inspected by MSD
in uppercase:

aiD
alL
cOME
fOR
gOOD
iS
nOW
oF
paRTY
peOPLE
the
the
theiR
tiME
to
to


3(a).
Here is the DFA transition table in tabular form, this is really
just another way to present the state diagram.

   A  B
0  0  1
1  2  1
2  0  3
3  2  4
4  5  1
5  0  6
6  7  4
7  8  3

Recall 0 is the start state, and 8 is the goal state.  If we ever
reach 8, we know we have just found the pattern.


4(a,b).  Note there were many other ways we could have built our
Huffman code trie, but in part (b) we should always end up with
a bit string of 34 bits.

4(c).  Our code trie can be encoded in 59 bits.  Note this depends
only on the number of leaves in the trie, not its shape.  Here is the
string for our particular trie (see also in 4c_extra.jpg):
 0 0 1 'A' 1 'B' 0 1 'O' 0 0 1 '!' 1 'Y' 1 'D'

Each 'X' denotes an ASCII char in 8 bits.  For example 'A' is 65, or
01000001.  I will not expect you to know ASCII codes for the exam.


5.  On the blackboard, I did not make it very clear when each code is
discovered by the compressor, versus the decompressor (or expander).
Here it is, presented a bit more clearly.

5(a). LZW compressor

step  transmits    adds to table
 1.   41 = A       81 = AB
 2.   42 = B       82 = BA
 3.   81 = AB      83 = ABA
 4.   83 = ABA     84 = ABAA
 5.   84 = ABAA    85 = ABAAB
 6.   42 = B       nothing
 7.   80 = end     nothing

5(b). LZW decompressor

step  receives     adds to table
 1.   41 = A       nothing
 2.   42 = B       81 = AB
 3.   81 = AB      82 = BA
 4.   83 = ?       83 = ABA  (by tricky case, AB+A)
 5.   84 = ?       84 = ABAA  (by tricky case, ABA+A)
 6.   42 = B       85 = ABAAB
 7.   80 = end     nothing

In steps 4 and 5, the decompressor figures out the received string by
the "tricky case": the code must stand for the previously received
string, plus its first letter.