This step requires some computer science: a dynamic programming algorithm.
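This problem is solved by a brute force search
(a smart brute force search :-): instead of trying every possible bucketing separately, it reuses the optimal histograms already found for shorter ranges)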
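For contrast, here is a minimal sketch of a plain (not smart) brute force search: try every possible last bucket and solve the rest the same way, without remembering anything. The values are assumed to sit in a 0-based array, and `bucket_error` and `brute_force_error` are hypothetical names. The number of bucketings it visits is C(n-1, k-1), which grows exponentially when k is a sizeable fraction of n.

```c
#include <float.h>

/* Error of one bucket holding x[lo..hi] (0-based, inclusive),
   computed directly as the sum of squared deviations from its mean. */
static double bucket_error(const double *x, int lo, int hi)
{
    double mean = 0.0, err = 0.0;
    for (int t = lo; t <= hi; t++)
        mean += x[t];
    mean /= (hi - lo + 1);
    for (int t = lo; t <= hi; t++)
        err += (x[t] - mean) * (x[t] - mean);
    return err;
}

/* Plain brute force: smallest total error over every way to split
   x[0..n-1] into k non-empty buckets (requires 1 <= k <= n).
   Nothing is remembered, so the same prefixes are solved again and
   again, and every one of the C(n-1, k-1) bucketings is visited. */
double brute_force_error(const double *x, int n, int k)
{
    if (k == 1)
        return bucket_error(x, 0, n - 1);
    double best = DBL_MAX;
    /* Last bucket is x[j..n-1]; the prefix x[0..j-1] needs >= k-1 items. */
    for (int j = k - 1; j <= n - 1; j++) {
        double e = brute_force_error(x, j, k - 1) + bucket_error(x, j, n - 1);
        if (e < best)
            best = e;
    }
    return best;
}
```

For x = {1, 1, 5, 5} and k = 2 the best split is {1, 1} | {5, 5}, so `brute_force_error` returns 0.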
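Meaning of the figure: suppose the last bucket of the optimal histogram for [a..b] using k buckets is [x..b]. Then the other k-1 buckets cover [a..x-1], and they must themselves form an optimal histogram for [a..x-1] using k-1 buckets: if they did not, replacing them by a better one would lower the total error.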
Therefore:

Optimal Histogram for [a..b] using k buckets
=
Optimal Histogram of [a..x-1] using k-1 buckets
+
last bucket [x..b]

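Answer: the start x of the last bucket is not known in advance, so try every possible x and keep the best one. Writing Err(a,b,k) for the error of the optimal histogram for [a..b] using k buckets:

$$\mathrm{Err}(a,b,k) = \min_{x=a+1,\dots,b} \Big\{ \mathrm{Err}(a,x-1,k-1) + \mathrm{SqError}(x,b) \Big\}, \qquad \mathrm{Err}(a,b,1) = \mathrm{SqError}(a,b)$$

Here SqError(x,b) is the error of the last bucket [x..b]; x starts at a+1 so that [a..x-1] is not empty.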
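The recurrence can be turned into code almost word for word. Below is a minimal C sketch of a memoized (top-down) version for a = 1: each Err value is computed once and then looked up, which is what makes the search "smart" instead of exponential. MAXN, MAXK, `Err` and `opt_error_memo` are hypothetical names; SqError uses the prefix-sum form from Procedure V-Opt, with items numbered from 1 and item t stored in v[t-1].

```c
#include <float.h>

#define MAXN 64   /* hypothetical limit on the number of items */
#define MAXK 16   /* hypothetical limit on the number of buckets */

static double Sum[MAXN + 1], SqSum[MAXN + 1];
static double memo[MAXN + 1][MAXK + 1];
static int    known[MAXN + 1][MAXK + 1];

/* Error of the single bucket [a..b] via prefix sums. */
static double SqError(int a, int b)
{
    double s2 = SqSum[b] - SqSum[a - 1];
    double s1 = Sum[b] - Sum[a - 1];
    return s2 - s1 * s1 / (b - a + 1);
}

/* Err(1, b, k) = min over x of Err(1, x-1, k-1) + SqError(x, b). */
static double Err(int b, int k)
{
    if (k == 1)
        return SqError(1, b);
    if (known[b][k])
        return memo[b][k];               /* already solved: just look it up */
    double best = DBL_MAX;               /* stays DBL_MAX if b < k */
    for (int x = k; x <= b; x++) {       /* [1..x-1] must hold k-1 items */
        double e = Err(x - 1, k - 1) + SqError(x, b);
        if (e < best)
            best = e;
    }
    known[b][k] = 1;
    memo[b][k] = best;
    return best;
}

/* Loads v[0..n-1] as items 1..n and returns Err(1, n, k).
   Requires 1 <= k <= n, n <= MAXN, k <= MAXK. */
double opt_error_memo(const double *v, int n, int k)
{
    Sum[0] = SqSum[0] = 0.0;
    for (int i = 1; i <= n; i++) {
        Sum[i]   = Sum[i - 1] + v[i - 1];
        SqSum[i] = SqSum[i - 1] + v[i - 1] * v[i - 1];
    }
    for (int i = 0; i <= n; i++)         /* forget results of earlier calls */
        for (int j = 0; j <= k; j++)
            known[i][j] = 0;
    return Err(n, k);
}
```

The bottom-up loops in Procedure V-Opt compute the same values; they only fix the order in which the table is filled instead of letting the recursion decide.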
[Figures (1) to (4): the state of the computation after processing 1, 2, 3 and 4 items]
Procedure V-Opt
BestError[i][k] = best error of histogram using k buckets on (1..i)
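// Squared Error Function: error of one bucket holding items a..b,
//   sum over t = a..b of (x[t] - mean)^2  =  s2 - s1*s1/(b-a+1)
double SqError(int a, int b)
{
    double s2 = SqSum[b] - SqSum[a-1];   // sum of x[t]^2 for t = a..b
    double s1 = Sum[b] - Sum[a-1];       // sum of x[t]   for t = a..b
    return s2 - s1*s1/(b-a+1);
}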
// Prepare prefix-sum arrays so that SqError takes O(1) time:
//   Sum[i] = x[1] + ... + x[i],   SqSum[i] = x[1]^2 + ... + x[i]^2
Sum[0] = 0;
SqSum[0] = 0;
for (i = 1; i <= N; i++)
{
    Sum[i]   = Sum[i-1] + x[i];
    SqSum[i] = SqSum[i-1] + x[i]*x[i];
}
// The dynamic programming algorithm to find the best histogram
//
// k = # buckets
// i = current item - items processed are: (1..i)
// BestError[i][k] = min. error in histogram of k buckets for x[1..i]
for (k = 1; k <= B; k++)
{
    // Find the optimal k-bucket histogram for every prefix [1..i]
    for (i = 1; i <= N; i++)
    {
        if ( k == 1 )
            BestError[i][k] = SqError(1,i);    // Single bucket (easy)
        else
        {
            // Multiple buckets
            BestError[i][k] = INFINITE;        // Start value
            // Try every possible last bucket;
            // [1..j] must hold at least k-1 items
            for (j = k-1; j <= i-1; j++)       // Last bucket is [j+1..i]
            {
                if ( BestError[j][k-1] + SqError(j+1,i) < BestError[i][k] )
                {
                    BestError[i][k] = BestError[j][k-1] + SqError(j+1,i);
                }
            }
        }
    }
}
// The optimal error for all N items is BestError[N][B]
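Procedure V-Opt as a compilable C function, in a minimal sketch: the size limits MAXN and MAXB, the function name `v_opt`, and the extra table `LastStart` (which the procedure above does not have) are hypothetical. LastStart[i][k] records where the last bucket of the best k-bucket histogram for [1..i] begins, so the bucket boundaries can be read back once BestError[N][B] is known. With SqError in O(1), the three nested loops take O(B·N²) time.

```c
#include <float.h>

#define MAXN 64   /* hypothetical limit on the number of items */
#define MAXB 16   /* hypothetical limit on the number of buckets */

/* Error of the bucket [a..b] (items numbered from 1) via prefix arrays. */
static double sq_error(const double *Sum, const double *SqSum, int a, int b)
{
    double s2 = SqSum[b] - SqSum[a - 1];
    double s1 = Sum[b] - Sum[a - 1];
    return s2 - s1 * s1 / (b - a + 1);
}

/* v[0..n-1] holds items 1..n. Requires 1 <= B <= n <= MAXN, B <= MAXB.
   Returns BestError[n][B]; start[k-1] receives the first item of bucket k. */
double v_opt(const double *v, int n, int B, int start[])
{
    double Sum[MAXN + 1], SqSum[MAXN + 1];
    double BestError[MAXN + 1][MAXB + 1];
    int    LastStart[MAXN + 1][MAXB + 1];   /* first item of the last bucket */

    Sum[0] = SqSum[0] = 0.0;
    for (int i = 1; i <= n; i++) {
        Sum[i]   = Sum[i - 1] + v[i - 1];
        SqSum[i] = SqSum[i - 1] + v[i - 1] * v[i - 1];
    }

    for (int k = 1; k <= B; k++) {
        for (int i = 1; i <= n; i++) {
            if (k == 1) {                          /* single bucket [1..i] */
                BestError[i][1] = sq_error(Sum, SqSum, 1, i);
                LastStart[i][1] = 1;
                continue;
            }
            BestError[i][k] = DBL_MAX;             /* stays DBL_MAX if i < k */
            LastStart[i][k] = 0;
            for (int j = k - 1; j <= i - 1; j++) { /* last bucket [j+1..i] */
                double e = BestError[j][k - 1] + sq_error(Sum, SqSum, j + 1, i);
                if (e < BestError[i][k]) {
                    BestError[i][k] = e;
                    LastStart[i][k] = j + 1;
                }
            }
        }
    }

    /* Walk back from [1..n]: peel off the last bucket, then the one before it. */
    for (int k = B, i = n; k >= 1; k--) {
        start[k - 1] = LastStart[i][k];
        i = start[k - 1] - 1;
    }
    return BestError[n][B];
}
```

For example, for the values 1, 1, 5, 5, 9, 9 and B = 3, `v_opt` returns 0 and the buckets start at items 1, 3 and 5, that is [1..2], [3..4] and [5..6]. One design caveat: the O(1) form s2 - s1*s1/(b-a+1) subtracts two large numbers, so it can lose precision when the values are large compared to their spread.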