|
|
Notice that:
|
|
because the records may not be sorted in that order
|
|
|
|
while ( R ≠ empty and S ≠ empty ) do
{
Read the index of R with the smallest join key in buf(R):
|
|
|
Hence the name zig-zag join algorithm.....
|
|
Graphically:
|
(It will not zigzag as long as the search key is the same, it only zig-zag when it moves from one serach key value to the next search key value)
relation R(X,Y) relation S(Y,Z)
--------------------------------------------
B(R) = 1000 blks B(S) = 500 blocks
T(R) = 10000 tuples T(S) = 5000 tuples
R is clustered S is clustered
No index on R
Clustering sorted index on S
V(S,Y) = 100
|
|
The one-pass Join algorithm is not applicable !!!
|
According to this webpage click here, the cost of (version 2) join algorithm based on TPMMS is:
# disk IOs = 3 B(R) + 3 B(S) // because R and S are clustered
= 3 × 1000 + 3 × 500
= 4500 blocks
|
|
However, we could adapt the zig-zag join...
|
We proceed as follows:
|
Cost of this Join algorithm:
We need to scan all sorted chunks of R:
we will use: B(R) disk I/Os
Because the index on S is clustering, we will scan S once (will little skipping around):
we will use: B(S) disk I/Os
|
Total cost of the Join algorithm:
First sort R and save sorted chunks on disk: 2 × B(R)
Join sorted chunks of R and S, using clusting index: B(R) + B(S)
Total cost = 3 B(R) + B(S)
= 3 × 1000 + 500
= 3500 blocks
(We beat TPMMS based join which has cost: 4500 block disk IOs)
|
Note:
|
relation R(X,Y) relation S(Y,Z)
--------------------------------------------
B(R) = 1000 blks B(S) = 500 blocks
T(R) = 10000 tuples T(S) = 5000 tuples
R is clustered S is clustered
Clustering sorted index on R
Clustering sorted index on S
V(S,Y) = 100
|
|
We use:
|
Use the clustering index of R and scan R once
we will use: B(R) disk I/Os (there is few skipping around
because the index is clustering)
Use the clustering index of S and scan S once
we will use: B(S) disk I/Os (there is few skipping around
because the index is clustering)
|
Note:
|