
### Assignment 3 - Problem 2 - K-Means

Posted: 25 Jun 2013 16:56
Hi, short question: are we supposed to implement k-means ourselves, or can we use the Matlab implementation? If the Matlab implementation is allowed, how can I tell Matlab to reinitialize empty clusters with random data points? According to the documentation, the only option is to assign the point furthest away from the centroid ('singleton'). So I guess this is an argument for our own implementation?

----> OK, I accidentally skipped the first page... forget about the question.

### Re: Assignment 3 - Problem 2 - K-Means

Posted: 25 Jun 2013 21:25
So we're supposed to implement our own k-means, right? However, I am still a bit confused: are the mean values vectors (because x_j is a vector, isn't it? It's a 128-dimensional point)? So to choose the minimum, would I compare the distances per dimension, or simply sum them up and compare the sums?
Appreciate any hints.

### Re: Assignment 3 - Problem 2 - K-Means

Posted: 25 Jun 2013 22:18
I think it is stated on the exercise sheet. To compare two of these vectors (128 elements each), you just use the 2-norm.
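To illustrate what "use the 2-norm" means here, a minimal sketch of the assignment step (in Python/NumPy rather than the Matlab the thread is about; function and variable names are just illustrative):

```python
import numpy as np

def assign_clusters(points, centroids):
    """Assign each point to the centroid with the smallest
    Euclidean (2-norm) distance over all 128 dimensions at once --
    no per-dimension comparison needed."""
    # dists[i, j] = || points[i] - centroids[j] ||_2
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```

The key point for the question above: the 2-norm collapses each 128-dimensional difference vector into a single scalar, and those scalars are what you compare to pick the nearest centroid.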

### Re: Assignment 3 - Problem 2 - K-Means

Posted: 26 Jun 2013 09:54
Thank you, I think that helped.

### Re: Assignment 3 - Problem 2 - K-Means

Posted: 26 Jun 2013 13:55
Does your k-means algorithm terminate? Mine doesn't, but I think my implementation is right. I choose the cluster centroids, as stated, totally at random. In each iteration there is a centroid which has no points associated with it. Choosing a new random point does not solve this; there's always a centroid without points. Do I have to limit the range of the random variables?

OK, I think I got it. I should read the tasks more carefully.
The task states: "replace a centroid with no points with a random data point". Doing so, the performance improves.
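That reseeding rule from the task sheet could look roughly like this (a Python/NumPy sketch instead of Matlab; the function name is made up for illustration):

```python
import numpy as np

def fix_empty_clusters(points, centroids, labels, rng):
    """Replace every centroid that received no assignments with a
    randomly chosen data point (not an arbitrary random position)."""
    k = centroids.shape[0]
    for j in range(k):
        if not np.any(labels == j):  # cluster j is empty
            centroids[j] = points[rng.integers(len(points))]
    return centroids
```

The important detail is that the replacement is drawn from the data set itself, so the new centroid is guaranteed to have at least one point nearby.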

### Re: Assignment 3 - Problem 2 - K-Means

Posted: 26 Jun 2013 14:21
hymGo wrote: Does your k-means algorithm terminate? Mine doesn't, but I think my implementation is right. I choose the cluster centroids, as stated, totally at random. In each iteration there is a centroid which has no points associated with it. Choosing a new random point does not solve this; there's always a centroid without points. Do I have to limit the range of the random variables?
Hi, two thoughts here:

1) Only in the beginning do you choose totally random points. Clusters can eventually become empty, that is, some clusters do not get any assignments at all. In this case, as stated in the task, you are supposed to pick random points from the data set; you are NOT supposed to choose totally random points again. That said, you should also make sure that the newly picked centroids are unique. More concretely: if you have N empty clusters, pick a permutation of size N from the set of points M. randperm may be useful here!

2) For debugging, it's probably a good idea to print the number of changed assignments in each iteration. This number should get smaller with increasing iterations (though not necessarily monotonically). K-means may take a couple of seconds to terminate, depending on your number of clusters. I would also recommend debugging with 2D points rather than directly clustering in 128 dimensions, because then you can VISUALIZE what the hell is going on.
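Putting both points together, a rough debugging loop could look like this (a Python/NumPy sketch, not the assignment's required Matlab solution; all names are illustrative):

```python
import numpy as np

def kmeans_debug(points, k, max_iter=100, seed=0):
    """K-means that prints the number of changed assignments per
    iteration as a convergence diagnostic. Empty clusters are
    reseeded with a random data point, as the task requires."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points
    centroids = points[rng.permutation(len(points))[:k]].copy()
    labels = np.full(len(points), -1)
    for it in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changed = np.count_nonzero(new_labels != labels)
        print(f"iteration {it}: {changed} assignments changed")
        if changed == 0:
            break  # converged: no point moved to another cluster
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) == 0:
                centroids[j] = points[rng.integers(len(points))]  # reseed empty cluster
            else:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Running this on a small 2D data set first, as suggested above, lets you plot the points and centroids after each iteration and see immediately whether the assignments are settling down.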

Cheers lustiz