Problem 6: [30pts] Dynamic Programming for k-Mean Square Clustering
Let $X = (x_1, x_2, \ldots, x_n)$ be a sequence of reals.
The problem is to partition $X$ into $k$ non-overlapping contiguous subsequences so as to minimize the sum of the mean-square errors of the subsequences.
This is a common first step in approximating large data sequences.
As an example, the diagram below shows two ways of partitioning the same sequence of 8 items into three subsequences. The one on the right has a smaller mean-square error cost. In fact, the partition on the right provides the minimum cost over all possible 3-partitions.
| $i$   | 1 | 2 | 3  | 4 | 5  | 6  | 7  | 8 |
|-------|---|---|----|---|----|----|----|---|
| $x_i$ | 6 | 0 | -6 | 0 | -6 | 12 | -6 | 6 |
$I_1 = (0, 3, 6, 8)$
$\bar{x}_{1,3} = 0$, $\bar{x}_{4,6} = 2$, $\bar{x}_{7,8} = 0$
$\mathrm{MSC}(1, 3) = 6^2 + 0^2 + 6^2 = 72$
$\mathrm{MSC}(4, 6) = 2^2 + 8^2 + 10^2 = 168$
$\mathrm{MSC}(7, 8) = 6^2 + 6^2 = 72$
$\mathrm{MSC}_3(I_1) = 72 + 168 + 72 = 312$
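The arithmetic in the example above can be checked with a short script. This is a minimal sketch, assuming the cost of an interval is the sum of squared deviations from the interval mean (which is what the worked numbers indicate); the helper names `msc` and `partition_cost` are made up for illustration.

```python
def msc(x, i, j):
    """Sum of squared deviations of x_i..x_j (1-based, inclusive) from their mean."""
    seg = x[i - 1:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def partition_cost(x, index_list):
    """Total cost of the partition described by (i_0, i_1, ..., i_k)."""
    # Consecutive pairs (i_{t-1}, i_t) describe the interval [i_{t-1} + 1, i_t].
    return sum(msc(x, a + 1, b) for a, b in zip(index_list, index_list[1:]))


X = [6, 0, -6, 0, -6, 12, -6, 6]
I1 = (0, 3, 6, 8)

print([msc(X, a + 1, b) for a, b in zip(I1, I1[1:])])  # [72.0, 168.0, 72.0]
print(partition_cost(X, I1))                           # 312.0
```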
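The formal definitions are below:
For $1 \le i \le j \le n$, $\bar{x}_{i,j} = \frac{1}{j-i+1} \sum_{t=i}^{j} x_t$ is the mean of $x_i, \ldots, x_j$, and $\mathrm{MSC}(i, j) = \sum_{t=i}^{j} (x_t - \bar{x}_{i,j})^2$ is the mean-square cost of that interval.
A $k$-partitioning index list $I = (i_0, i_1, \ldots, i_k)$ satisfies $i_0 = 0$, $i_k = n$ and $i_0 < i_1 < i_2 < \cdots < i_k$. $[i_{t-1} + 1, i_t]$ is the $t$th interval of the partition.
$\mathcal{I}_n$ is the set of all $k$-partitioning index lists for $[1 \ldots n]$.
Given an index list $I$, the $k$-cost $\mathrm{MSC}_k(I)$ of $X$ is the sum of the MSCs of its intervals: $\mathrm{MSC}_k(I) = \sum_{t=1}^{k} \mathrm{MSC}(i_{t-1} + 1, i_t)$.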
The problem is to find the partition that minimizes $\mathrm{MSC}_k(I)$ over all possible partitions. More specifically, use Dynamic Programming to find
$\mathrm{OPT}_k(X) = \min_{I \in \mathcal{I}_n} \mathrm{MSC}_k(I)$.
Set $X_m = (x_1, x_2, \ldots, x_m)$.
In the example above, $X_4 = (6, 0, -6, 0)$.
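For very small inputs, the definition of the optimum can be evaluated directly by enumerating every index list. The sketch below is only a reference check, not the dynamic program the problem asks for: it tries all $\binom{n-1}{k-1}$ ways of choosing the interior cut points. The function names are hypothetical, and the interval cost is assumed to be the sum of squared deviations from the interval mean.

```python
from itertools import combinations


def msc(x, i, j):
    """Sum of squared deviations of x_i..x_j (1-based, inclusive) from their mean."""
    seg = x[i - 1:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def brute_force_opt(x, k):
    """Return (minimum cost, index list) over all k-partitions of x; needs 1 <= k <= len(x)."""
    n = len(x)
    best_cost, best_list = float("inf"), None
    # Choose the k - 1 interior cut points i_1 < ... < i_{k-1} from {1, ..., n - 1}.
    for cuts in combinations(range(1, n), k - 1):
        index_list = (0, *cuts, n)
        cost = sum(msc(x, a + 1, b) for a, b in zip(index_list, index_list[1:]))
        if cost < best_cost:
            best_cost, best_list = cost, index_list
    return best_cost, best_list


X = [6, 0, -6, 0, -6, 12, -6, 6]
cost, index_list = brute_force_opt(X, 3)
print(index_list, cost)
```

The enumeration grows exponentially in $k$ for large $n$, so it is useful only as a correctness check for a faster method on short sequences.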
For $1 \le t \le k$ and $1 \le m \le n$ define
$M(m : t) = \mathrm{OPT}_t(X_m)$.
This is the (minimum) cost of the best $t$-partition for $X_m$. Note that
$\mathrm{OPT}_k(X_n) = M(n : k)$, so filling in the $M(m : t)$ table solves the problem.
$$M(m : t) = \begin{cases} \mathrm{MSC}(1, m) & \text{if } t = 1, \\ \text{??????????????} & \text{otherwise.} \end{cases}$$
(c) Write code that computes $M(n : k)$, based on your recurrence relation from (b).
Your code may assume that you have a procedure for calculating
MSC(i, j) in O(1) time (even if you did not solve part (a)).
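One standard way such an O(1) procedure could be obtained is with prefix sums of the values and of their squares, using the identity $\sum (x_t - \bar{x})^2 = \sum x_t^2 - (\sum x_t)^2 / (j - i + 1)$. Whether this matches what part (a) intends is an assumption, and the class name `MSCOracle` is made up for illustration.

```python
class MSCOracle:
    """Answers MSC(i, j) queries in O(1) after O(n) preprocessing (1-based indices)."""

    def __init__(self, x):
        # s[m] = x_1 + ... + x_m and q[m] = x_1^2 + ... + x_m^2, with s[0] = q[0] = 0.
        self.s = [0.0]
        self.q = [0.0]
        for v in x:
            self.s.append(self.s[-1] + v)
            self.q.append(self.q[-1] + v * v)

    def msc(self, i, j):
        total = self.s[j] - self.s[i - 1]      # sum of x_i..x_j
        squares = self.q[j] - self.q[i - 1]    # sum of squares of x_i..x_j
        return squares - total * total / (j - i + 1)


oracle = MSCOracle([6, 0, -6, 0, -6, 12, -6, 6])
print(oracle.msc(1, 3), oracle.msc(4, 6), oracle.msc(7, 8))  # 72.0 168.0 72.0
```

With floating-point inputs of very different magnitudes, the subtraction in `msc` can lose precision; for the integer data of the example the results are exact.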
Analyze the running time of your algorithm. For full marks, your algorithm should run in $O(kn^2)$ time.
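As a rough illustration of how the stated bound can be met, the sketch below fills the table under one common assumption about the recurrence: the best $t$-partition of $X_m$ ends with some last interval $[j + 1, m]$, preceded by a best $(t-1)$-partition of $X_j$. This is a hedged sketch and not necessarily the intended answer to (b); the function name `k_msc_dp` and the inlined prefix-sum helper are assumptions made for the example.

```python
def k_msc_dp(x, k):
    """Minimum total cost of splitting x into k contiguous intervals (1 <= k <= len(x))."""
    n = len(x)

    # Prefix sums of values and squares give each interval cost in O(1).
    s, q = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        q.append(q[-1] + v * v)

    def msc(i, j):
        total = s[j] - s[i - 1]
        return (q[j] - q[i - 1]) - total * total / (j - i + 1)

    INF = float("inf")
    # M[m][t] holds the best cost of a t-partition of the prefix x_1..x_m.
    M = [[INF] * (k + 1) for _ in range(n + 1)]
    for m in range(1, n + 1):
        M[m][1] = msc(1, m)

    for t in range(2, k + 1):
        for m in range(t, n + 1):
            # j is the end of the first t - 1 intervals, so j >= t - 1 and j < m.
            M[m][t] = min(M[j][t - 1] + msc(j + 1, m) for j in range(t - 1, m))

    return M[n][k]


X = [6, 0, -6, 0, -6, 12, -6, 6]
print(k_msc_dp(X, 3))
```

The table has at most $kn$ entries and each entry takes a minimum over at most $n$ candidates with O(1) work apiece, which gives $O(kn^2)$ time and $O(kn)$ space under these assumptions.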