A few days back I came across the Carrot Clustering Framework, which inspired me to write something similar for Ruby. So I started this project, and have so far implemented the basic K-Means and Hierarchical Clustering algorithms.

The first release can be downloaded from Rubyforge using the following command:

gem install clusterer

The gem requires the stemmer gem as a dependency.

There are also two example files that show how to use the library by clustering search results returned by Yahoo and Google. To try an example, the corresponding API key is needed.

Basically, one passes an array of strings to the clustering algorithm, and it returns the clusters as arrays of indices into the input.

Clusterer::Clustering.kmeans_clustering(["hello world","mea culpa","goodbye world"])

Clusterer::Clustering.hierarchical_clustering(["hello world","mea culpa","goodbye world"])

The result might be something like **[[1,3],[2]]**.
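Since the result holds indices rather than the strings themselves, it usually has to be mapped back onto the input. Here is a minimal sketch of that step. `groups_to_docs` is a hypothetical helper, not part of the gem, and it assumes the indices are 1-based, as the sample result above suggests; pass `base = 0` if they turn out to be 0-based.

```ruby
# Hypothetical helper (not part of the clusterer gem): turn groups of
# indices back into the original strings.
# Assumption: indices are 1-based, as in the sample result [[1,3],[2]].
def groups_to_docs(docs, index_groups, base = 1)
  index_groups.map { |group| group.map { |i| docs[i - base] } }
end

docs = ["hello world", "mea culpa", "goodbye world"]
p groups_to_docs(docs, [[1, 3], [2]])
# => [["hello world", "goodbye world"], ["mea culpa"]]
```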

The method signature for K-means is as follows:

def kmeans_clustering (docs, k = nil, max_iter = 10, &similarity_function)

K-means is a simple hill-climbing algorithm and can get stuck at a local optimum, but it is fast. The maximum number of iterations is needed to ensure that the algorithm doesn't get stuck oscillating between states.
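To make the role of the iteration cap concrete, here is a small standalone K-means sketch over word-count vectors. It is not the gem's implementation: the names `term_vector`, `cosine`, `centroid` and `kmeans_sketch` are made up for illustration, there is no stemming, the first `k` documents seed the centroids so the run is repeatable, and it returns 0-based indices.

```ruby
# Word counts for a string, e.g. "a b a" => {"a"=>2, "b"=>1}.
def term_vector(text)
  text.downcase.split.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
end

# Cosine similarity between two sparse vectors stored as hashes.
def cosine(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm = Math.sqrt(a.values.sum { |w| w * w }) * Math.sqrt(b.values.sum { |w| w * w })
  norm.zero? ? 0.0 : dot / norm
end

# Component-wise mean of a list of sparse vectors.
def centroid(vectors)
  sums = Hash.new(0.0)
  vectors.each { |v| v.each { |term, weight| sums[term] += weight } }
  sums.transform_values { |total| total / vectors.size }
end

# Illustrative K-means; expects max_iter >= 1. Returns 0-based index groups.
def kmeans_sketch(docs, k, max_iter = 10)
  vectors = docs.map { |d| term_vector(d) }
  centroids = vectors.first(k) # deterministic seeding, for the demo only
  assignment = nil
  max_iter.times do
    # Assign every document to its most similar centroid.
    new_assignment = vectors.map do |v|
      (0...k).max_by { |c| cosine(v, centroids[c]) }
    end
    break if new_assignment == assignment # nothing moved: converged
    assignment = new_assignment
    # Recompute each centroid; keep the old one if a cluster went empty.
    centroids = (0...k).map do |c|
      members = vectors.each_index.select { |i| assignment[i] == c }.map { |i| vectors[i] }
      members.empty? ? centroids[c] : centroid(members)
    end
  end
  (0...k).map { |c| (0...docs.size).select { |i| assignment[i] == c } }
end

p kmeans_sketch(["hello world", "mea culpa", "goodbye world"], 2)
# => [[0, 2], [1]]
```

The loop stops either when no document changes cluster or when `max_iter` passes have run, whichever comes first; the cap is what bounds the run if assignments keep flipping.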

When k=nil the algorithm finds k = Math.sqrt(docs.size) clusters.
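To get a feel for how this default scales, the sketch below prints the value for a few input sizes. `default_k` is a hypothetical name, and rounding to the nearest integer is an assumption; the gem may floor or ceil instead.

```ruby
# Hypothetical helper: the square-root default for the number of clusters.
# Assumption: rounded to the nearest integer (the gem may round differently).
def default_k(docs)
  Math.sqrt(docs.size).round
end

[3, 10, 100].each do |n|
  puts "#{n} docs -> k = #{default_k(Array.new(n, ''))}"
end
# 3 docs -> k = 2
# 10 docs -> k = 3
# 100 docs -> k = 10
```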

The method signature for hierarchical clustering is as follows:

def hierarchical_clustering (docs, k = nil, &similarity_function)

Hierarchical clustering gives much better results, but is comparatively slower when the data volume is high.
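The slowdown is easy to see in a naive bottom-up version, sketched below. This is not the gem's code: `word_set`, `jaccard`, `cluster_similarity` and `hierarchical_sketch` are illustrative names, and the choice of Jaccard similarity with average linkage is an assumption made for the example. It starts with one cluster per document and repeatedly merges the most similar pair until `k` clusters remain, returning 0-based indices.

```ruby
require "set"

# The set of distinct lowercase words in a string.
def word_set(text)
  text.downcase.split.to_set
end

# Jaccard similarity: shared words divided by all words.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Average-linkage similarity between two clusters of document indices.
def cluster_similarity(c1, c2, sets)
  pairs = c1.product(c2)
  pairs.sum { |i, j| jaccard(sets[i], sets[j]) } / pairs.size
end

# Illustrative agglomerative clustering. Returns 0-based index groups.
def hierarchical_sketch(docs, k)
  sets = docs.map { |d| word_set(d) }
  clusters = (0...docs.size).map { |i| [i] } # one cluster per document
  while clusters.size > [k, 1].max
    # Find the most similar pair of clusters and merge it.
    a, b = clusters.combination(2).max_by { |x, y| cluster_similarity(x, y, sets) }
    clusters = clusters - [a, b] + [(a + b).sort]
  end
  clusters
end

p hierarchical_sketch(["hello world", "mea culpa", "goodbye world"], 2)
# => [[1], [0, 2]]
```

Every merge re-scores all remaining pairs of clusters, so the work in this naive form grows roughly with the cube of the number of documents, whereas each K-means pass only compares every document with `k` centroids.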

If you are using this gem on a live, public-facing site, let me know; I would like to link to it.

Update: New release: Clusterer + other plugins