Tuesday 6 April 2010

Graph processing for really big data

MapReduce implementations of graph algorithms like PageRank and adsorption scale to millions of nodes on a cluster of around 50 machines, but if you want to process billions of nodes (or even tens of millions, depending on your algorithm) then you need a different framework. Google uses Pregel, about which they've said little except that it was inspired by the Bulk Synchronous Parallel (BSP) model for parallel programming.

So the BSP package for Hadoop announced in the Apache HAMA project could be an interesting one to watch. There's even a BSP hello world, although getting further may be hard work given the current level of documentation.
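
For anyone who hasn't met BSP before, here is a rough single-process sketch of the idea applied to PageRank. It is only an illustration of the superstep pattern (local compute, message exchange, barrier), not the Pregel or HAMA API; the function name `bsp_pagerank`, its parameters, and the toy graph are all made up for this example, and it assumes every neighbour also appears as a key in the graph.

```python
# Toy illustration of BSP-style supersteps on one machine.
# Not the Pregel or HAMA API: all names here are hypothetical.
from collections import defaultdict


def bsp_pagerank(graph, supersteps=30, damping=0.85):
    """graph maps each vertex to a list of its out-neighbours."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Compute and communicate: each vertex sends an equal share
        # of its current rank to every out-neighbour as a "message".
        inbox = defaultdict(float)
        dangling = 0.0
        for v, out in graph.items():
            if out:
                share = rank[v] / len(out)
                for w in out:
                    inbox[w] += share
            else:
                # Vertices with no out-links spread their rank evenly.
                dangling += rank[v]
        # Barrier: only after every message has been delivered do the
        # vertices update their state for the next superstep.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in graph
        }
    return rank


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = bsp_pagerank(toy_graph)
```

In a real BSP system the vertices would be partitioned across machines and the barrier would be a cluster-wide synchronisation, which is the part MapReduce has to fake by launching a whole new job per iteration.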
