For all those struggling to understand the mean shift algorithm, it's much easier to understand in pictures. Here's an article that explains it with some diagrams: http://eric-yuan.me/continuously-adaptive-shift/
Basically, it goes like this:
1. Select a data point of interest.
2. Draw a circle of a specified radius around the point of interest.
3. Collect all data points within the circle and compute their mean.
4. Move the center of the circle to the mean.
5. Repeat 3 & 4 until convergence. Each iteration will move "uphill" on the density gradient of the data distribution until it reaches the top of the hill (a local maximum).
6. Repeat 1-5 for all data points. Points that converge to the same local maximum are members of the same cluster. The number of clusters is the number of local maxima.
For higher dimensions, replace "circle" with sphere (3-D) or hypersphere (4 & higher dimensions). Obviously, this algorithm depends on a choice of radius, which determines the granularity of the search for local maxima.
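The steps above can be sketched in a few lines of Python. This is a flat-kernel version for a single starting point; the function name, the radius, and the two-blob test data are just for illustration:

```python
import numpy as np

def mean_shift_point(seed, data, radius, max_iter=100, tol=1e-6):
    """Walk one seed point uphill on the data's density (flat kernel)."""
    center = np.asarray(seed, dtype=float)
    for _ in range(max_iter):
        # Step 3: collect all data points within `radius` of the center
        dists = np.linalg.norm(data - center, axis=1)
        inside = data[dists <= radius]
        if len(inside) == 0:
            break  # empty window: nowhere to move
        # Step 4: move the center of the circle to the mean
        new_center = inside.mean(axis=0)
        # Step 5: stop once the center stops moving (a local maximum)
        if np.linalg.norm(new_center - center) < tol:
            break
        center = new_center
    return center

# Two well-separated blobs; a seed started near either blob should
# converge to that blob's density peak, not the other's.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (50, 2)),
                  rng.normal(5, 0.3, (50, 2))])
mode = mean_shift_point([0.5, 0.5], data, radius=1.0)
```

Running this for every seed and grouping seeds whose converged centers coincide (step 6) gives you the clusters.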
Do I understand correctly that you run the converging process on every single point?
Couldn't you have a situation where points really far from a cluster would still converge to that cluster by this method? Something like a long string of points slowly getting closer and closer?
Does the mean shift algorithm have any guarantees on running time and/or the quality of the clustering it finds?
> Do I understand correctly that you run the converging process on every single point?
I don't think you have to run it on every point in the data set. If your data set is very large, you could run it on a random sample of points, or define a regular grid of starting positions at the resolution you require.
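For what it's worth, scikit-learn's MeanShift has a `seeds` argument, so you can start the hill-climb from a subsample rather than from every point. A sketch (the blob positions, sample sizes, and quantile are arbitrary):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.4, (200, 2)),
                  rng.normal(6, 0.4, (200, 2))])

# Start the hill-climb from only 40 randomly chosen points, not all 400.
seeds = data[rng.choice(len(data), size=40, replace=False)]
bandwidth = estimate_bandwidth(data, quantile=0.3)  # the "radius"
ms = MeanShift(bandwidth=bandwidth, seeds=seeds).fit(data)
# ms.cluster_centers_ holds the local maxima the seeds converged to;
# every data point is then labeled by its nearest center.
```

It also has a `bin_seeding` option that does the regular-grid idea for you.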
> Couldn't you have a situation where points really far from a cluster would still converge to that cluster by this method? Something like a long string of points slowly getting closer and closer?
Yes, that's the idea. For each point, you're basically asking "If I start here and keep walking up the density gradient from here until I hit a maximum, where do I end up?" If the shape of the probability density function has a very long ridge, you could end up walking the entire length of the ridge until you hit the highest point. This means that you can have arbitrary-shaped clusters, within the smoothness bounds imposed by your chosen radius. This feature is considered a potential advantage over k-means clustering, which can only produce convex clusters.
> Does the mean shift algorithm have any guarantees on running time and/or the quality of the clustering it finds?
I haven't actually used it in practice, so I don't know.
So I guess the purpose is to find center-of-mass of a large collection of points, without using all those points for calculation (instead using only the ones inside a pre-defined radius)?
Not the center of mass, the point with the greatest mass nearby. Think of a thin stick with a ball on one end. The center of mass, i.e. the point where the stick balances, might be somewhere along the stick, closer to the end with the ball on it. But the point of maximum local density will be at the center of the ball, no matter how long the stick is.
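A small numeric version of the stick-and-ball picture (the positions and radius are made up): the center of mass lands partway along the stick, while the mean-shift walk ends at the middle of the ball.

```python
import numpy as np

# "Thin stick with a ball on one end": 11 sparse points along [0, 10]
# (the stick) plus 100 tightly packed points near x = 0 (the ball).
rng = np.random.default_rng(2)
stick = np.linspace(0.0, 10.0, 11)
ball = rng.normal(0.0, 0.1, 100)
data = np.concatenate([stick, ball])

center_of_mass = data.mean()  # pulled part-way along the stick

# Flat-kernel mean shift started on the stick walks into the ball.
center = 1.5
for _ in range(100):
    window = data[np.abs(data - center) <= 2.0]
    new_center = window.mean()
    if abs(new_center - center) < 1e-9:
        break
    center = new_center
```

Here `center_of_mass` comes out around 0.5 (on the stick), while `center` converges essentially to 0, the middle of the ball.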
Based on this plot, my naive reaction would be to decide that DBSCAN is the best algorithm, as it recovers the (qualitatively) best clusters in each case. Does anybody have experience with DBSCAN? What are the downsides?
This sounds similar to k-means - pick some points "at random" and iteratively move them to the average location of all nearest neighbors until you stabilize. But mean shift goes in the opposite direction, giving each point a window and grouping them on overlaps. Can someone who understands both confirm this is the case, or correct my understanding?
At a high level, we can specify Mean Shift as follows:
1. Fix a window around each data point.
2. Compute the mean of data within the window.
3. Shift the window to the mean and repeat till convergence.
Given an estimate of the mean, for each data point Mean Shift defines a window around it and computes a new estimated mean, weighting each point by the probability density at the previous estimated mean, calculated using the window.
The (weighted) 'mean of the data points within the window' makes sense if you use the other perspective of looking at the window around the current estimated mean. You'll get the same answer, and to me this explanation is easier to grasp: the PDF only depends on the distance between the point and the estimated mean, so you can think of either as the 'center'. But saying you calculate the mean of the data points within the window for each data point mixes up two perspectives and makes no sense.
Nope. It would be correct if it said 'For each estimate of the mean, Mean shift defines a window around it and computes the mean of the data points within the window, then repeats the process with this new estimate.'
Suppose we had 500 data points. Daveguy's process calculates a separate mean for a window around each data point. Now we have 500 means...and?
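For what it's worth, the weighted update being discussed looks something like this with a Gaussian kernel (a sketch; the bandwidth, data, and starting point are arbitrary). There is one running mean estimate, updated repeatedly, not one mean per data point:

```python
import numpy as np

def gaussian_mean_shift_step(mean, data, bandwidth):
    """One update: weight every data point by a Gaussian kernel centered
    on the current mean estimate, then take the weighted average."""
    sq_dists = np.sum((data - mean) ** 2, axis=1)
    weights = np.exp(-sq_dists / (2 * bandwidth ** 2))
    return (weights[:, None] * data).sum(axis=0) / weights.sum()

# A single blob; the estimate starts away from it and walks in.
rng = np.random.default_rng(3)
data = rng.normal([2.0, 2.0], 0.5, (200, 2))
mean = np.array([0.0, 0.0])
for _ in range(50):
    mean = gaussian_mean_shift_step(mean, data, bandwidth=1.0)
```

Nearby points dominate the weighted average, so each step moves the estimate toward the densest region; with a flat kernel the weights are just 0/1 and you recover the "mean of points within the window" version.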