
### Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 11:41**

by **hymGo**

Hi,

I thought it would be interesting to compare the resulting error rates. What rates do you get, and how do they vary across runs (since different images are used for training)? I will post my rates soon (once I have finished the task).

Another question: do your computations take some time as well? Mine do, but I think it's just because of SIFT and k-means.

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 13:35**

by **lustiz**

hymGo wrote: Hi,

I thought it would be interesting to compare the resulting error rates. What rates do you get, and how do they vary across runs (since different images are used for training)? I will post my rates soon (once I have finished the task).

Another question: do your computations take some time as well? Mine do, but I think it's just because of SIFT and k-means.

Hi, depending on both the resulting codebook and the training images used, I often (but not always) get zero errors on the training set for all three methods (naive Bayes, separable SVM, non-separable SVM with a pretty low C). On the test set I get between 0% and 2.5%, again depending on the images/codebook. Most of the time it's 1.83333% or so (can't remember exactly, I don't have it at hand right now).

As far as timings are concerned, extracting the features takes most of the time (ca. 1 minute for all images). K-means takes about 4-6 seconds. All other computations are negligible.

Btw: my processor is a Core 2 Duo, 2.5 GHz.

Cheers

EDIT: sorry, the time for extracting all features was actually a minute.

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 16:47**

by **ampelmann**

Hi, how do you get k-means down to 4-6 seconds?? I have been working on it for days now and am getting really frustrated. It stops after about 30-50 iterations (a hard cap, so it doesn't run forever). However, finding the closest cluster for each of the several thousand feature vectors (120 images * 10 or more vectors per image) by computing ||vector - mean|| for all 50 means, and then recomputing the means whenever assignments change, takes so much time: 4-6 seconds PER ITERATION (which makes 2-3 minutes in total!!). Where am I going wrong??? Thankful for any help!

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 18:12**

by **lustiz**

ampelmann wrote: Hi, how do you get k-means down to 4-6 seconds?? I have been working on it for days now and am getting really frustrated. It stops after about 30-50 iterations (a hard cap, so it doesn't run forever). However, finding the closest cluster for each of the several thousand feature vectors (120 images * 10 or more vectors per image) by computing ||vector - mean|| for all 50 means, and then recomputing the means whenever assignments change, takes so much time: 4-6 seconds PER ITERATION (which makes 2-3 minutes in total!!). Where am I going wrong??? Thankful for any help!

Hi,

I am not quite sure what you mean by

finding the closest cluster for each of the several thousand feature vectors

You have 50 clusters at any point in time! Sometimes clusters stop getting any assignments; then you have to replace them with random data points.

I just timed it a bit more accurately:

*KMeans* takes 30-70 iterations, mostly around 50, and one iteration takes ca. 0.075 seconds, fairly constantly. The standard trick for writing efficient Matlab code is not to use any loops whatsoever. In other words: always vectorize and make use of bsxfun etc. The resulting code is actually pretty compact.

Computing the Euclidean distances takes most of the time, that's right. Try to do it in as few lines as possible. It is possible to compute the distances from all points to all means in a one-liner!
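A minimal sketch of such a one-liner (my own illustration, not the poster's code; the names `X` and `M` and the expansion ||x - m||^2 = ||x||^2 + ||m||^2 - 2*x*m' are assumptions):

```matlab
% X: n-by-128 feature vectors as rows, M: 50-by-128 means as rows (hypothetical names)
% All squared point-to-mean distances at once via the binomial expansion:
D = bsxfun(@plus, sum(X.^2, 2), sum(M.^2, 2)') - 2 * X * M';  % n-by-50
[~, idx] = min(D, [], 2);   % idx(i) = index of the nearest mean for point i
```

Since min is taken per row, the squared distances suffice; the sqrt can be skipped entirely.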

Cheers

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 18:22**

by **ampelmann**

Hi, thanks for that reply! I have roughly the same steps (even though I need a few more lines). Just to clarify: you are working with feature vectors (128x1), right? So I have ~4800 feature vectors in total. For all of these vectors I compute the distances to the 50 means and do all the min/move/recompute-means stuff. Is that right?

PS: I don't use any loops at all - I work with arrayfun a lot. Is that recommendable?

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 18:32**

by **lustiz**

ampelmann wrote: Hi, thanks for that reply! I have roughly the same steps (even though I need a few more lines). Just to clarify: you are working with feature vectors (128x1), right? So I have ~4800 feature vectors in total. For all of these vectors I compute the distances to the 50 means, search for the minimum, assign them to the respective cluster, recompute the means and start the next iteration?!

PS: I don't use any loops at all - I work with arrayfun a lot. Is that recommendable?

Ahh, so here is the problem: do not consider arrayfun or cellfun to be fast. They are just convenience functions; they are NOT VECTORIZED implementations. They take a function handle and apply it to each element in turn. Do not confuse these two concepts! I would recommend taking another look at the Matlab intro given by the TAs. There is a section on vector operations, though it does not cover all of them; you might add dot/sqrt/log... and of course the handy BSXFUN!
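A toy sketch of the difference (my own example, not from the thread): both lines compute the same result, but arrayfun invokes the function handle once per element, while the vectorized form is a single array operation:

```matlab
x = rand(4800, 1);           % e.g. one value per feature vector
y1 = arrayfun(@(v) v^2, x);  % hidden loop: one function call per element
y2 = x.^2;                   % vectorized: one operation on the whole array
% isequal(y1, y2) holds, but the vectorized form is typically far faster
```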

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 18:35**

by **ampelmann**

Ok, so I will spend some time on speeding up my code later. However, the output should be correct by now, so I will do the vectorization part when I am done with the rest. Thanks for the help, I feel a bit encouraged now

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 19:20**

by **hymGo**

lustiz wrote:
As far as timings are concerned, extracting the features takes most of the time (ca. 1 minute for all images). K-means takes about 4-6 seconds. All other computations are negligible.

Btw: my processor is a Core 2 Duo, 2.5 GHz.

I have nearly the same processor. Loading the images takes ca. 80 secs and k-means needs 20 secs, so I think I can optimize my k-means as well

Regarding the error rates and Bayes, I often get 0 percent on the training set and something like 2.5%, 1.6%, 0.8% or 0% on the test set. For the SVM (only the separable case, since that seems to work) I mostly get 0 percent for both.

### Re: Ex 3.2 - Comparing error rates

Posted: **28 Jun 2013 19:47**

by **lustiz**

hymGo wrote: lustiz wrote:
As far as timings are concerned, extracting the features takes most of the time (ca. 1 minute for all images). K-means takes about 4-6 seconds. All other computations are negligible.

Btw: my processor is a Core 2 Duo, 2.5 GHz.

I have nearly the same processor. Loading the images takes ca. 80 secs and k-means needs 20 secs, so I think I can optimize my k-means as well

Regarding the error rates and Bayes, I often get 0 percent on the training set and something like 2.5%, 1.6%, 0.8% or 0% on the test set. For the SVM (only the separable case, since that seems to work) I mostly get 0 percent for both.

Sounds about right.

I guess I would optimize in the following order:

1) The biggest speedup comes from rewriting the distance computation as a fully vectorized one-liner without any loops at all.

2) After that I would recommend taking a look at how the new means are computed; *accumarray* may be handy here.

3) To replace empty clusters, take a look at http://www.mathworks.de/company/newslet ... atrix.html, particularly the section on logical indexing.
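Putting the three steps together, one k-means iteration might look like the sketch below. This is a hypothetical illustration, not anyone's actual submission: the names `X`, `M`, `k` and the sparse assignment-indicator trick for step 2 are my choices.

```matlab
% X: n-by-d feature matrix, M: k-by-d current means (hypothetical names)
n = size(X, 1);  k = size(M, 1);
% 1) fully vectorized point-to-mean squared distances, no loops
D = bsxfun(@plus, sum(X.^2, 2), sum(M.^2, 2)') - 2 * X * M';
[~, idx] = min(D, [], 2);                    % nearest mean for each point
% 2) recompute the means; accumarray counts assignments per cluster
counts = accumarray(idx, 1, [k 1]);
S = sparse((1:n)', idx, 1, n, k);            % n-by-k assignment indicator
M = bsxfun(@rdivide, full(S' * X), max(counts, 1));
% 3) logical indexing: reseed empty clusters with random data points
empty = (counts == 0);
M(empty, :) = X(randi(n, nnz(empty), 1), :);
```

Repeating this until the assignments `idx` stop changing (with the iteration cap mentioned above) gives the full loop; only the outer iteration needs a `while`/`for`, everything inside stays loop-free.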