Here’s a simple version of the MapReduce framework presented in the now-famous Google paper by Dean and Ghemawat. My version of MapReduce is not intended as a usable high-performance framework, but rather as a learning tool. My goal is twofold: first, to learn to write algorithms in the distributed/parallel MapReduce style; second, to see how simply these concepts can be expressed in Ruby.
I use the Rinda framework to distribute tasks to remote workers. This simplifies a great deal of the MapReduce grunt work. The map and reduce code, along with data, is marshaled and sent over the network transparently. Creating a MapReduce job is as easy as creating an object, assigning lambdas for map and reduce, assigning data, then telling it to run.
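To make the "create an object, assign lambdas, assign data, run" workflow concrete, here is a minimal sketch of what such a job might look like. The class name `SimpleMapReduce` and its accessors are hypothetical, chosen only for illustration, and this version runs everything in a single process: it leaves out the Rinda distribution step entirely and only shows the map, group, and reduce phases.

```ruby
# Hypothetical class for illustration only; it runs in-process and
# does not distribute work over Rinda.
class SimpleMapReduce
  attr_accessor :map, :reduce, :data

  def run
    # Map phase: each input item produces a list of [key, value] pairs.
    pairs = data.flat_map { |item| map.call(item) }

    # Shuffle phase: collect all values emitted for the same key.
    grouped = Hash.new { |hash, key| hash[key] = [] }
    pairs.each { |key, value| grouped[key] << value }

    # Reduce phase: collapse each key's values into a single result.
    grouped.each_with_object({}) do |(key, values), out|
      out[key] = reduce.call(key, values)
    end
  end
end

# Word count, the canonical MapReduce example.
job = SimpleMapReduce.new
job.map    = lambda { |line| line.downcase.scan(/\w+/).map { |word| [word, 1] } }
job.reduce = lambda { |_word, counts| counts.sum }
job.data   = ["the quick brown fox", "the lazy dog", "The end"]

result = job.run
puts result.inspect
```

In a distributed version, the map and reduce phases would presumably be split into tasks and handed to remote workers rather than executed in the caller's loop, but the shape of the job definition could stay the same.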
Last night I attended the inaugural meeting of the Boulder-Denver Ruby User’s Group. “Meeting” is a term I use in the loose sense: it was more a gaggle of Ruby enthusiasts sitting around tables with beer, chatting about Ruby and other geek stuff. The meeting was held at a brewery, so it was impossible to hear people more than a couple of feet away, but as the group shifted around I probably talked with half a dozen others.