|
|
Example:
The join operation will produce no output tuples!
Input 1:     Input 2:     Output:
---------    ----------   ------------
(a1,b0)      (a2,c0)
(a2,b1)      (a1,c1)      (a1,b0,a1,c1), (a1,b0,a1,c2), ..., (a1,b0,a1,ck),
(a2,b2)      (a1,c2)      (a2,b1,a2,c0), (a2,b2,a2,c0), ..., (a2,bk,a2,c0)
...          ...
(a2,bk)      (a1,ck)
There is only a single tuple (a1,b0) in R1
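A minimal sketch of why the single tuple (a1,b0) matters, assuming the example is about joining samples of the two inputs on their first attribute (that reading is an assumption; the surrounding text is incomplete). The helper names make_inputs and join_on_first and the value k = 5 are hypothetical. The samples s1 and s2 are picked by hand to show the bad case in which the two rare tuples are missed:

```python
def make_inputs(k):
    """Build Input 1 and Input 2 from the example for a given k."""
    r1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
    r2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]
    return r1, r2


def join_on_first(r1, r2):
    """Nested-loop equi-join on the first attribute of each tuple."""
    return [x + y for x in r1 for y in r2 if x[0] == y[0]]


k = 5                                   # assumed value, for illustration only
r1, r2 = make_inputs(k)
full_join = join_on_first(r1, r2)       # join of the complete inputs

# Hand-picked samples that miss the two rare tuples (a1,b0) and (a2,c0)
s1 = [t for t in r1 if t != ("a1", "b0")]
s2 = [t for t in r2 if t != ("a2", "c0")]
sample_join = join_on_first(s1, s2)     # join of the samples

print(len(full_join), len(sample_join))
```

Every output tuple depends on (a1,b0) or on (a2,c0), so a sample that misses both yields an empty join even though it contains almost all of the input.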
|
Answer:
|
|
Example:
|
|
w(i) = relative weight of item i     // Must be given a priori
N = 0;                               // N = number of tuples selected
while ( not EOF )
{
   t = read_next_tuple();            // Get next tuple from input
   // Select tuple t with probability w(t)
   if ( random() < w(t) )
   {
      Sample[N] = t;
      N++;
   }
}
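The loop above can be sketched in Python as follows. This is only an illustration under assumptions: the input is any iterable, w is a function returning a selection probability in [0, 1], and the name weighted_sample is hypothetical.

```python
import random


def weighted_sample(stream, w, rng=random):
    """Keep each tuple t independently with probability w(t)."""
    sample = []
    for t in stream:                 # plays the role of read_next_tuple()
        if rng.random() < w(t):      # rng.random() is uniform in [0, 1)
            sample.append(t)
    return sample


# Example: items with weight 1 are always kept, items with weight 0 never
weights = {"x": 1.0, "y": 0.0}
picked = weighted_sample(["x", "y", "x", "y"], lambda t: weights[t])
```

Because random() never returns 1, a weight of 1 always selects the tuple and a weight of 0 never does.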
|
|
f = 1;                               // f*w(t) = selection probability
S = empty;                           // S = Concise Sample
while ( not EOF )
{
   t = next input value;
   if ( random() < f*w(t) )
   {
      if ( t ∈ S )
      {
         increase count of the t value in S
      }
      else
      {
         add (t,1) to S
      }
   }
   /* -------------------------------------------
      Deletion step:
      Adjust sample when it gets too large...
      ------------------------------------------- */
   if ( size(S) > MaxSize )
   {
      f' = β × f;                    // New selection probability factor
                                     // β < 1
      for ( each sample t ∈ S ) do
      {
         n = t.count;                // Fix the number of trials before t.count changes
         for ( i = 1; i <= n; i++ )
         {
            if ( random() < 1-β )    // Drop each occurrence with probability 1-β
               t.count--;
         }
         if ( t.count == 0 )
            delete t from S;
      }
      f = f';
   }
}
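A rough Python sketch of the algorithm above, under stated assumptions: size(S) is taken to be the number of distinct values in S, β defaults to 0.5, and the names concise_sample, max_size and beta are hypothetical. As in the pseudocode, the deletion step runs at most once per input value, so the sample can stay above max_size for a while.

```python
import random


def concise_sample(stream, w, max_size, beta=0.5, rng=random):
    """Return (counts, f): value -> count pairs and the final factor f."""
    f = 1.0
    counts = {}                                   # S stored as value -> count
    for t in stream:
        if rng.random() < f * w(t):               # select t with probability f*w(t)
            counts[t] = counts.get(t, 0) + 1
        if len(counts) > max_size:                # deletion step
            for value in list(counts):
                # keep each sampled occurrence independently with probability beta
                kept = sum(1 for _ in range(counts[value]) if rng.random() < beta)
                if kept == 0:
                    del counts[value]
                else:
                    counts[value] = kept
            f *= beta                             # f' = beta * f
    return counts, f


# With weight 1 and enough room, the sample is just the exact value counts
exact, f_exact = concise_sample("aabac", lambda t: 1.0, max_size=10)

# With a small limit, deletion steps shrink the sample and lower f
small, f_small = concise_sample(range(1000), lambda t: 1.0, max_size=20,
                                rng=random.Random(1))
```

Counting the survivors first and then updating the entry avoids changing a count while it is still being used as a loop bound.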
|
|
|
The sample S (before the deletion step) is clearly a weighted random sample.
It is not obvious that S' (the sample after the deletion step) will be a weighted random sample.
So, just as in the uniform case, we must show that S' will also be a weighted random sample.
Input stream:  ... x1 ... x1 ... x1 ... x1 .... x1 ...
                   |      |      |      |       |      Selected with
                   |      |      |      |       |      probability = f*w(x1)
                   v      v      v      v       v
S:                        x1            x1
|
Each value x1 is included with probability f*w(x1)
So:
|
Graphically:
Input stream:  ... x1 ... x1 ... x1 ... x1 .... x1 ...
                   |      |      |      |       |      Selected with
                   |      |      |      |       |      probability = f*w(x1)
                   v      v      v      v       v
S:                        x1            x1
                          |      Selected with
                          |      probability = β
                          v
S':                       x1
|
Input stream:  ... x1 ... x1 ... x1 ... x1 .... x1 ...
                   |      |      |      |       |
                   |      |      |      |       |
                   v      v      v      v       v
                          |      Selected with
                          |      probability = β * f * w(x1)
                          v
S':                       x1
|
The resulting sample S' is also a weighted random sample: each value x1 is included with probability β * f * w(x1) = f' * w(x1).
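The two-stage argument can be checked numerically with a small simulation. This is only a sketch: the function name two_stage_rate and the values f*w(x1) = 0.5 and β = 0.4 are assumptions chosen for illustration.

```python
import random


def two_stage_rate(p_select, beta, trials, rng):
    """Fraction of occurrences that are selected into S and then kept in S'."""
    survived = 0
    for _ in range(trials):
        in_s = rng.random() < p_select        # stage 1: probability f*w(x1)
        if in_s and rng.random() < beta:      # stage 2: probability beta
            survived += 1
    return survived / trials


rng = random.Random(0)
rate = two_stage_rate(p_select=0.5, beta=0.4, trials=200_000, rng=rng)
print(rate)   # should be close to beta * f * w(x1) = 0.4 * 0.5 = 0.2
```

The two stages are independent, so the observed rate should approach the product of the two probabilities as the number of trials grows.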