This step requires computer science...
Their paper: click here
This problem is solved by a brute force search
(a smart brute force search: dynamic programming, which reuses the optimal solutions of smaller sub-problems :-))
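Meaning of the figure: the last bucket covers [x..b], and the remaining
k-1 buckets form a histogram on [a..x-1].

Therefore, for a fixed x:

   Optimal Histogram for [a,b] using k buckets, with last bucket [x..b]
      =
   Optimal Histogram of [a..x-1] using k-1 buckets  +  last bucket [x..b]

Answer: try every possible x and keep the best one
(x starts at a+1 so that [a..x-1] is not empty):

   Optimal Histogram for [a,b] using k buckets
      = min_{x=a+1..b} { Optimal Histogram of [a..x-1] using k-1 buckets + last bucket is [x..b] }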
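The recurrence can be tried out directly as a recursive function with a cache. The following is a minimal sketch, not the notes' own code: the class name `VOptRecurrence`, the method names, and the 0-based indexing are assumptions made for illustration, and the bucket error is computed straight from its definition (sum of squared distances to the bucket average).

```java
import java.util.Arrays;

class VOptRecurrence {
    // Squared error of one bucket holding x[lo..hi] (0-based, inclusive):
    // sum of (value - average)^2.
    static double bucketError(double[] x, int lo, int hi) {
        double sum = 0;
        for (int i = lo; i <= hi; i++) sum += x[i];
        double avg = sum / (hi - lo + 1);
        double err = 0;
        for (int i = lo; i <= hi; i++) err += (x[i] - avg) * (x[i] - avg);
        return err;
    }

    // Minimum error of a histogram with k non-empty buckets on x[0..b].
    // memo[k][b] caches results; NaN means "not computed yet".
    static double best(double[] x, int k, int b, double[][] memo) {
        if (k > b + 1) return Double.POSITIVE_INFINITY; // more buckets than items
        if (k == 1) return bucketError(x, 0, b);
        if (!Double.isNaN(memo[k][b])) return memo[k][b];
        double min = Double.POSITIVE_INFINITY;
        // Last bucket is x[s..b]; the first k-1 buckets cover x[0..s-1].
        for (int s = 1; s <= b; s++) {
            double candidate = best(x, k - 1, s - 1, memo) + bucketError(x, s, b);
            min = Math.min(min, candidate);
        }
        memo[k][b] = min;
        return min;
    }

    // Convenience wrapper: k buckets on the first n values of x.
    static double best(double[] x, int k, int n) {
        double[][] memo = new double[k + 1][n];
        for (double[] row : memo) Arrays.fill(row, Double.NaN);
        return best(x, k, n - 1, memo);
    }

    public static void main(String[] args) {
        double[] x = {4, 2, 3, 6, 5, 6, 12, 16};
        System.out.println(best(x, 2, 3)); // 2 buckets on the first 3 values
        System.out.println(best(x, 3, 5)); // 3 buckets on the first 5 values
    }
}
```

The cache is what makes the brute force "smart": each (k, b) pair is solved once, no matter how many larger problems ask for it.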
Problem: construct a V-optimal histogram with B = 3 buckets

Input:   4  2  3  6  5  6  12  16

Histogram with 1 bucket:

   Values:    | 1..1 | 1..2 | 1..3 | 1..4 | 1..5 | 1..6 | 1..7 | 1..8
   -----------+------+------+------+------+------+------+------+-------
   Min Error: | 0.0  | 2.0  | 2.0  | 8.75 | 10.0 | 13.3 | 63.7 | 161.5
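The 1-bucket errors can be checked with a few lines of code. This is a small sketch under the assumption that the error of a bucket is the sum of squared distances to the bucket average; the class name `OneBucketErrors` and its method are made up for illustration.

```java
class OneBucketErrors {
    // Squared error of a single bucket holding the first n values of x:
    // sum of (value - average)^2.
    static double error(double[] x, int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) sum += x[i];
        double avg = sum / n;
        double err = 0;
        for (int i = 0; i < n; i++) err += (x[i] - avg) * (x[i] - avg);
        return err;
    }

    public static void main(String[] args) {
        double[] x = {4, 2, 3, 6, 5, 6, 12, 16};
        // One line per prefix 1..n of the input
        for (int n = 1; n <= x.length; n++) {
            System.out.printf("1..%d: %.2f%n", n, error(x, n));
        }
    }
}
```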
Histogram with 2 buckets, values 1..3:

   [ 4 2 ] [ 3 ]    ===>  MinError[1][2] + 0
   [ 4 ] [ 2 3 ]    ===>  MinError[1][1] + (2 - 2.5)^2 + (3 - 2.5)^2
   |     |
   +-----+
   1 bucket optimal
   histogram

Using the result from the 1 bucket optimal histogram:

   [ 4 2 ] [ 3 ]    ===>  2.0 + 0   = 2.0
   [ 4 ] [ 2 3 ]    ===>  0.0 + 0.5 = 0.5   <---- Min

Result: MinError[2][3] = 0.5, using buckets [ 4 ] [ 2 3 ]
Input:   4  2  3  6  5  6  12  16

Histogram with 2 buckets, values 1..4:

   [ 4 2 3 ] [ 6 ]    ===>  MinError[1][3] + 0
   [ 4 2 ] [ 3 6 ]    ===>  MinError[1][2] + (3 - 4.5)^2 + (6 - 4.5)^2
   [ 4 ] [ 2 3 6 ]    ===>  MinError[1][1] + (2 - 3.67)^2 + (3 - 3.67)^2 + (6 - 3.67)^2
   |       |
   +-------+
   1 bucket optimal
   histogram

Using the result from the 1 bucket optimal histogram:

   [ 4 2 3 ] [ 6 ]    ===>  2.0 + 0     = 2.0     <--- Min
   [ 4 2 ] [ 3 6 ]    ===>  2.0 + 4.5   = 6.5
   [ 4 ] [ 2 3 6 ]    ===>  0.0 + 8.667 = 8.667

Result: MinError[2][4] = 2.0, using buckets [ 4 2 3 ] [ 6 ]
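The candidate list for a 2-bucket histogram can be generated mechanically: one candidate per position where the second bucket may start. A minimal sketch follows; the class name `TwoBucketSplits`, the helper names, and the 0-based indexing are assumptions for illustration only.

```java
class TwoBucketSplits {
    // Squared error of one bucket holding x[lo..hi] (0-based, inclusive).
    static double error(double[] x, int lo, int hi) {
        double sum = 0;
        for (int i = lo; i <= hi; i++) sum += x[i];
        double avg = sum / (hi - lo + 1);
        double err = 0;
        for (int i = lo; i <= hi; i++) err += (x[i] - avg) * (x[i] - avg);
        return err;
    }

    // candidates[s-1] = error of bucket x[0..s-1] + error of bucket x[s..n-1],
    // for every start s = 1..n-1 of the second bucket.
    static double[] candidates(double[] x, int n) {
        double[] c = new double[n - 1];
        for (int s = 1; s < n; s++) {
            c[s - 1] = error(x, 0, s - 1) + error(x, s, n - 1);
        }
        return c;
    }

    public static void main(String[] args) {
        double[] x = {4, 2, 3, 6, 5, 6, 12, 16};
        double[] c = candidates(x, 4); // 2 buckets on the first 4 values
        for (int s = 1; s <= c.length; s++) {
            System.out.printf("second bucket starts at item %d: %.3f%n", s + 1, c[s - 1]);
        }
    }
}
```

The smallest candidate is the entry that goes into the 2-bucket row of the MinError table.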
Input:   4  2  3  6  5  6  12  16

Histogram with 3 buckets, values 1..4:

   { 4 2 3 } [ 6 ]    ===>  MinError[2][3] + 0
   { 4 2 } [ 3 6 ]    ===>  MinError[2][2] + (3 - 4.5)^2 + (6 - 4.5)^2
   { 4 } [ 2 3 6 ]    ===>  MinError[2][1] + (2 - 3.67)^2 + (3 - 3.67)^2 + (6 - 3.67)^2
   |       |
   +-------+
   2 bucket optimal
   histogram

Using the result from the 2 bucket optimal histogram:

   { 4 2 3 } [ 6 ]    ===>  0.5 + 0     = 0.5     <---- Min
   { 4 2 } [ 3 6 ]    ===>  0.0 + 4.5   = 4.5
   { 4 } [ 2 3 6 ]    ===>  0.0 + 8.667 = 8.667

Result: MinError[3][4] = 0.5, using buckets [ 4 ] [ 2 3 ] [ 6 ]
Input:   4  2  3  6  5  6  12  16

Histogram with 3 buckets, values 1..5:

   { 4 2 3 6 } [ 5 ]    ===>  MinError[2][4] + 0
   { 4 2 3 } [ 6 5 ]    ===>  MinError[2][3] + (6 - 5.5)^2 + (5 - 5.5)^2
   { 4 2 } [ 3 6 5 ]    ===>  MinError[2][2] + (3 - 4.67)^2 + (6 - 4.67)^2 + (5 - 4.67)^2
   { 4 } [ 2 3 6 5 ]    ===>  MinError[2][1] + (2 - 4)^2 + (3 - 4)^2 + (6 - 4)^2 + (5 - 4)^2
   |         |
   +---------+
   2 bucket optimal
   histogram

Using the result from the 2 bucket optimal histogram:

   { 4 2 3 6 } [ 5 ]    ===>  2.0 + 0     = 2.0
   { 4 2 3 } [ 6 5 ]    ===>  0.5 + 0.5   = 1.0     <--- Min
   { 4 2 } [ 3 6 5 ]    ===>  0.0 + 4.667 = 4.667
   { 4 } [ 2 3 6 5 ]    ===>  0.0 + 10.0  = 10.0

Result: MinError[3][5] = 1.0, using buckets [ 4 ] [ 2 3 ] [ 6 5 ]
/* ------------------------------------------------
Help function to compute Error in a bucket
------------------------------------------------ */
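double SqError(int a, int b)
{
   // Error of one bucket holding x[a..b]:
   //    sum of x[i]^2  -  (sum of x[i])^2 / (number of items)
   double s2 = PP[b] - PP[a-1];     // sum of x[i]^2 for i = a..b
   double s1 = P[b] - P[a-1];       // sum of x[i]   for i = a..b
   return (s2 - s1*s1/(b-a+1));
}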
/* ----------------------------------------------
Prepare arrays to compute error efficiently
---------------------------------------------- */
P[0] = 0;
PP[0] = 0;
for (i = 1; i <= N; i++)
{
   P[i] = P[i-1] + x[i];           // P[i]  = x[1] + ... + x[i]
   PP[i] = PP[i-1] + x[i]*x[i];    // PP[i] = x[1]^2 + ... + x[i]^2
}
/* ---------------------------------------------
Compute the best error for 1 bucket histogram
--------------------------------------------- */
for (i = 1; i <= N; i++)
{
   // Single bucket: use error formula...
   BestError[1][i] = SqError(1,i);
}
/* ---------------------------------------------------------
   Now we compute the V-opt. histogram with B buckets
   Output:
      BestError[k][i] = best error of histogram
                        using k buckets
                        on data points (1..i)
   --------------------------------------------------------- */
// The dynamic algorithm uses these variables:
//
//   k = # buckets
//   i = current item - items processed are: (1..i)
//   BestError[k][i] = min. error in histogram of k buckets for x[1]..x[i]
for (k = 2; k <= B; k++)                  // k = 1 is the single bucket case
{
   // Find optimal histogram using k buckets
   for (i = 1; i <= N; i++)
   {
      // Multiple buckets: search
      BestError[k][i] = INFINITE;         // Start value
      // Try every possible size for the last bucket
      for (j = 1; j <= i-1; j++)          // Last bucket is [j+1..i]
      {
         if ( BestError[k-1][j] + SqError(j+1,i) < BestError[k][i] )
         {
            BestError[k][i] = BestError[k-1][j] + SqError(j+1,i);
            // Better division found
         }
      }
   }
}
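The error helper works because the squared error of a bucket can be rewritten in terms of two plain sums. A short derivation, where the symbols n (number of items in the bucket) and \mu (bucket average) are introduced here for illustration:

```latex
\begin{aligned}
\sum_{i=a}^{b} (x_i - \mu)^2
  &= \sum_{i=a}^{b} x_i^2 \;-\; 2\mu \sum_{i=a}^{b} x_i \;+\; n\mu^2 \\
  &= \sum_{i=a}^{b} x_i^2 \;-\; \frac{1}{n}\Bigl(\sum_{i=a}^{b} x_i\Bigr)^2,
  \qquad \mu = \frac{1}{n}\sum_{i=a}^{b} x_i,\quad n = b-a+1 .
\end{aligned}
```

Each of the two sums over a..b is a difference of two prefix sums (the prefix through b minus the prefix through a-1), so the error of any bucket costs a constant number of operations. Check with the bucket [ 3 6 ]: the sum of squares is 45 and the sum is 9, so the error is 45 - 81/2 = 4.5, the same value as (3 - 4.5)^2 + (6 - 4.5)^2.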
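The same algorithm, extended to remember where the last bucket starts (lines marked ***), so that the bucket boundaries can be printed: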
/* ------------------------------------------------
Help function to compute Error in a bucket
------------------------------------------------ */
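double SqError(int a, int b)
{
   // Error of one bucket holding x[a..b]:
   //    sum of x[i]^2  -  (sum of x[i])^2 / (number of items)
   double s2 = PP[b] - PP[a-1];     // sum of x[i]^2 for i = a..b
   double s1 = P[b] - P[a-1];       // sum of x[i]   for i = a..b
   return (s2 - s1*s1/(b-a+1));
}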
/* ----------------------------------------------
Prepare arrays to compute error efficiently
---------------------------------------------- */
P[0] = 0;
PP[0] = 0;
for (i = 1; i <= N; i++)
{
   P[i] = P[i-1] + x[i];           // P[i]  = x[1] + ... + x[i]
   PP[i] = PP[i-1] + x[i]*x[i];    // PP[i] = x[1]^2 + ... + x[i]^2
}
/* ---------------------------------------------
   Compute the best error for 1 bucket histogram
   --------------------------------------------- */
for (i = 1; i <= N; i++)
{
   // Single bucket: use error formula...
   BestError[1][i] = SqError(1,i);
   index[1][i] = 1;                       // First index *************
}
/* ---------------------------------------------------------
   Now we compute the V-opt. histogram with B buckets
   Output:
      BestError[k][i] = best error of histogram
                        using k buckets
                        on data points (1..i)
      index[k][i]     = start of the last bucket in that histogram
   --------------------------------------------------------- */
// The dynamic algorithm uses these variables:
//
//   k = # buckets
//   i = current item - items processed are: (1..i)
//   BestError[k][i] = min. error in histogram of k buckets for x[1]..x[i]
for (k = 2; k <= B; k++)                  // k = 1 is the single bucket case
{
   // Find optimal histogram using k buckets
   for (i = 1; i <= N; i++)
   {
      // Multiple buckets: search
      BestError[k][i] = INFINITE;         // Start value
      // Try every possible size for the last bucket
      for (j = 1; j <= i-1; j++)          // Last bucket is [j+1..i]
      {
         if ( BestError[k-1][j] + SqError(j+1,i) < BestError[k][i] )
         {
            BestError[k][i] = BestError[k-1][j] + SqError(j+1,i);
            // Better division found
            index[k][i] = j+1;            // ********************
         }
      }
   }
}
/* ---------------------------------
   Print bucket boundaries
   (from the last bucket to the first)
   --------------------------------- */
k = B;
j = N;
while (k >= 2)
{
   int end_point;
   end_point = j;
   j = index[k][j];                       // Start of the k-th bucket
   System.out.println("[" + j + " .. " + end_point + "]");
   j--;                                   // Items 1..j remain ...
   k--;                                   // ... for the first k-1 buckets
}
System.out.println("[" + 1 + " .. " + j + "]");
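For experimenting, the listing can be packed into one runnable class. This is a sketch under stated assumptions, not the notes' own program: the names `VOptHistogram`, `Result` and `solve` are made up for illustration, the split positions are stored per bucket count in a two-dimensional `index[k][i]` array, the input array is 0-based while the tables stay 1-based, and the code assumes 1 <= B <= N.

```java
class VOptHistogram {
    // Total error plus the buckets as {start, end} pairs
    // (1-based, inclusive), listed left to right.
    static final class Result {
        final double error;
        final int[][] buckets;
        Result(double error, int[][] buckets) {
            this.error = error;
            this.buckets = buckets;
        }
    }

    static Result solve(double[] x, int B) {
        int N = x.length;                    // assumes 1 <= B <= N
        double[] P = new double[N + 1];      // P[i]  = sum of the first i values
        double[] PP = new double[N + 1];     // PP[i] = sum of their squares
        for (int i = 1; i <= N; i++) {
            P[i] = P[i - 1] + x[i - 1];
            PP[i] = PP[i - 1] + x[i - 1] * x[i - 1];
        }
        double[][] best = new double[B + 1][N + 1];
        int[][] index = new int[B + 1][N + 1];
        for (int i = 1; i <= N; i++) {       // single bucket
            best[1][i] = sqError(P, PP, 1, i);
            index[1][i] = 1;
        }
        for (int k = 2; k <= B; k++) {
            for (int i = 1; i <= N; i++) {
                best[k][i] = Double.POSITIVE_INFINITY;
                for (int j = 1; j <= i - 1; j++) {   // last bucket is [j+1..i]
                    double candidate = best[k - 1][j] + sqError(P, PP, j + 1, i);
                    if (candidate < best[k][i]) {
                        best[k][i] = candidate;
                        index[k][i] = j + 1;         // start of the last bucket
                    }
                }
            }
        }
        // Walk back from the last bucket to the first.
        int[][] buckets = new int[B][];
        int end = N;
        for (int k = B; k >= 1; k--) {
            int start = index[k][end];
            buckets[k - 1] = new int[] {start, end};
            end = start - 1;
        }
        return new Result(best[B][N], buckets);
    }

    // Error of one bucket holding items a..b (1-based), from prefix sums.
    static double sqError(double[] P, double[] PP, int a, int b) {
        double s2 = PP[b] - PP[a - 1];
        double s1 = P[b] - P[a - 1];
        return s2 - s1 * s1 / (b - a + 1);
    }

    public static void main(String[] args) {
        double[] x = {4, 2, 3, 6, 5, 6, 12, 16};
        Result r = solve(x, 3);
        for (int[] bucket : r.buckets) {
            System.out.println("[" + bucket[0] + " .. " + bucket[1] + "]");
        }
        System.out.println("error = " + r.error);
    }
}
```

Keeping one `index` row per bucket count matters for the walk back: after the last bucket is removed, the remaining items must be split using the choice recorded for k-1 buckets, not the one recorded for k buckets.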