
### Ex 2.3

Posted: 13 Jun 2013 21:49
Hi,

I have a question concerning the task description. It says that we should apply non-maximum suppression. The task describes non-maximum suppression as the extraction of the local maximum in a 5x5 neighborhood, so all pixels aside from the maximum are canceled out (set to zero). I don't understand the neighborhood itself. Is it a 5x5 window which is shifted over the Harris values without overlapping (e.g. first position 1, next position 6)? Or do we lay the neighborhood window over each pixel? If so, a lot of interest points would be left, even after thresholding; or are already determined local maxima erased if they lie in the window of a neighboring pixel which contains a larger value?

Moreover, I have a question concerning image coordinates and filters. Regarding central differences, we may have the following filter: 0.5 [-1 0 1]. This filter is defined in the direction of the image's x axis. So I thought it is the x filter for correlation (since correlation is applied by overlaying the filter, multiplying the pixels by the weights, and summing them up). But I got somehow confused, and I am not sure whether it is the x-axis filter for correlation or for convolution. Also, the slides provide no way to distinguish between convolution and correlation filters. So I just wanted to ask how I can tell the two apart (are correlation or convolution x-axis filters defined in the x direction of the image?).

### Re: Ex 2.3

Posted: 14 Jun 2013 12:01
Hi, a few thoughts that may help:

You can interpret non-maximum suppression as a non-linear filter you apply to the image. That is, you run through all pixels of the image, keep the ones which are greater than or equal to all pixels in their surrounding 5x5 window, and discard all others. In other words, it is not a 5x5 window you shift by 5 pixels each time, but a 5x5 sliding window centered on each pixel in turn. After thresholding, the result will probably have a number of interest points on the order of the one shown in Figure 2. By the way: discarding a lot of interest points is exactly what you want! In the end we want a sparse description rather than a dense one.
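As a minimal sketch of this sliding-window idea (pure Python; the function names, the toy values, the threshold, and the clipping of the window at the image border are assumptions for illustration, not part of the task description):

```python
def non_max_suppression(values, size=5):
    """Keep values that are >= every value in their size x size window; zero the rest."""
    rows, cols = len(values), len(values[0])
    r = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Window centred on (i, j), clipped at the image border (assumption).
            window = [
                values[y][x]
                for y in range(max(0, i - r), min(rows, i + r + 1))
                for x in range(max(0, j - r), min(cols, j + r + 1))
            ]
            if values[i][j] >= max(window):
                out[i][j] = values[i][j]
    return out


def interest_points(values, threshold, size=5):
    """Return (row, col) positions that survive suppression and exceed the threshold."""
    nms = non_max_suppression(values, size)
    return [
        (i, j)
        for i, row in enumerate(nms)
        for j, v in enumerate(row)
        if v > threshold
    ]


# Toy Harris response: two nearby peaks and one isolated peak.
harris = [[0.0] * 8 for _ in range(8)]
harris[2][2] = 9.0
harris[3][3] = 7.0  # lies inside the 5x5 window of (2, 2), so it is suppressed
harris[6][6] = 5.0  # far enough away from the other peaks to survive

print(interest_points(harris, threshold=1.0))  # [(2, 2), (6, 6)]
```

Note that with "greater than or equal" a flat plateau keeps all of its pixels, which is one reason the thresholding step still matters.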

Correlation means we weight each pixel of the sliding window with the weight in the corresponding position of the filter kernel:

$$g(i, j) = \sum\limits_{k,l} f(i+k, j+l) h(k,l)$$.

Convolution reverses the signs of the offsets, which you may interpret either as offsets into the sliding window or as offsets into the kernel mask. In effect, we just mirror the weights:

$$g(i, j) = \sum\limits_{k,l} f(i-k, j-l) h(k,l) = \sum\limits_{k,l} f(k, l) h(i-k,j-l)$$.
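
The two formulas can be written down almost literally. This is only a sketch: the helper names, the zero padding at the border, and the toy kernel are assumptions made for illustration. Filtering a single bright pixel shows the mirroring directly.

```python
def correlate(f, h):
    """g(i, j) = sum over k, l of f(i + k, j + l) * h(k, l)."""
    return _filter(f, h, sign=+1)


def convolve(f, h):
    """g(i, j) = sum over k, l of f(i - k, j - l) * h(k, l)."""
    return _filter(f, h, sign=-1)


def _filter(f, h, sign):
    rows, cols = len(f), len(f[0])
    rk, rl = len(h) // 2, len(h[0]) // 2  # offsets run over -rk..rk and -rl..rl
    g = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(-rk, rk + 1):
                for l in range(-rl, rl + 1):
                    y, x = i + sign * k, j + sign * l
                    # Assumption: pixels outside the image count as zero.
                    if 0 <= y < rows and 0 <= x < cols:
                        g[i][j] += f[y][x] * h[k + rk][l + rl]
    return g


def flip(h):
    """Mirror the kernel horizontally and vertically."""
    return [row[::-1] for row in h[::-1]]


def centre(g):
    """Crop the 3x3 block around the centre of a 5x5 result."""
    return [row[1:4] for row in g[1:4]]


h = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# A single bright pixel in the middle of a 5x5 image.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1

print(centre(convolve(impulse, h)) == h)          # True: convolution reproduces the kernel
print(centre(correlate(impulse, h)) == flip(h))   # True: correlation reproduces its mirror image
print(convolve(impulse, h) == correlate(impulse, flip(h)))  # True
```
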
Example:

$$\left( \begin{array}{ccc} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{array} \right)$$

$$\left( \begin{array}{ccc} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{array} \right)$$

Using convolution with the top kernel and correlation with the bottom one gives the same result when you apply them to an image. However, your question might have been about the signs, and why the top one is the convolution kernel and not vice versa. The reason is the definition of the x and y axes in images. The top-left pixel is (0,0) and the bottom-right pixel has the coordinates (M-1, N-1), or (1,1) and (M, N) in MATLAB coordinates. Now central differences approximate the derivative df/dx at x as:

$$\frac{\partial f(x)}{\partial x} \approx \frac{f(x + h) - f(x - h)}{2 h}.$$

Our spacing here is one pixel, so we use $$h = 1$$, which gives us:

$$\frac{\partial f(x)}{\partial x} \approx \frac{f(x + 1) - f(x - 1)}{2}.$$

And there it is, this is already the answer to your sign question! We approximate a pixel's derivative by subtracting the previous pixel from the next one and dividing the difference by two (or multiplying by 0.5). And "next" means next in our coordinate system! Now whenever you use either convolution or correlation, just think about the signs your result should have w.r.t. the image coordinate system.
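A quick way to check the sign is a one-dimensional toy row. The sketch below is an illustration only (the function names and the sample row are assumptions); it evaluates interior pixels so that no border handling is needed.

```python
def correlate1d(f, h):
    """g(x) = sum over k of f(x + k) * h(k), for interior pixels only."""
    r = len(h) // 2
    return [
        sum(f[x + k] * h[k + r] for k in range(-r, r + 1))
        for x in range(r, len(f) - r)
    ]


def convolve1d(f, h):
    """g(x) = sum over k of f(x - k) * h(k), for interior pixels only."""
    r = len(h) // 2
    return [
        sum(f[x - k] * h[k + r] for k in range(-r, r + 1))
        for x in range(r, len(f) - r)
    ]


kernel = [-0.5, 0.0, 0.5]     # 0.5 * [-1 0 1]
row = [0, 1, 4, 9, 16, 25]    # f(x) = x^2, increasing along the x axis

print(correlate1d(row, kernel))  # [2.0, 4.0, 6.0, 8.0]
print(convolve1d(row, kernel))   # [-2.0, -4.0, -6.0, -8.0]
```

The row increases with x, so the derivative should be positive. With correlation, 0.5 [-1 0 1] gives (f(x+1) - f(x-1)) / 2 as intended; with convolution the same kernel gives the negated result, so you would have to mirror it to 0.5 [1 0 -1].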

### Re: Ex 2.3

Posted: 14 Jun 2013 14:50
Hi,

I wrote a reply, but apparently I forgot to press send and now it is lost. Anyway, lustiz's reply pretty much answers your question.

Of course it only makes sense to do NMS for each pixel. Non-overlapping windows would not be able to suppress surrounding values consistently. Think about a grid of 5x5 blocks: pixels x_1=5 and x_2=6 fall into two different blocks. They are directly adjacent, yet they would play no role in suppressing each other. This is not really sensible, right?
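The same situation as a one-dimensional toy example. This is a sketch only: the function names and the sample values are made up for illustration, and indices are 0-based, so pixels 5 and 6 are indices 4 and 5.

```python
def block_nms(values, size=5):
    """Keep the maximum of each non-overlapping block; return the kept indices."""
    kept = []
    for start in range(0, len(values), size):
        block = values[start:start + size]
        kept.append(start + block.index(max(block)))
    return kept


def sliding_nms(values, size=5):
    """Keep an index if its value is >= everything in the window centred on it."""
    r = size // 2
    return [
        i
        for i, v in enumerate(values)
        if v >= max(values[max(0, i - r):i + r + 1])
    ]


# Two strong, directly adjacent responses at indices 4 and 5 (pixels 5 and 6).
values = [1, 2, 1, 3, 8, 9, 2, 1, 1, 0]

print(block_nms(values))    # [4, 5]: both survive, one per block
print(sliding_nms(values))  # [5]: the weaker neighbour is suppressed
```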

As I said in the Asgn1 1 presentation, you have to pay attention to the image coordinate system (y axis facing downward), the directionality of the filter, and whether conv or corr is used for filtering. It is easiest if you create a toy example and visualize what you are actually doing.

Regards,
Thorsten