If it won't be simple, it simply won't be. [source code] by Miki Tebeka, CEO, 353Solutions

Friday, September 14, 2012

Using Hadoop Streaming With Avro

One of the ways to use Python with Hadoop is via Hadoop Streaming. However, it's geared mostly toward text-based formats, and at work we mostly use Avro.

It took me a while to figure out the magic, but here it is. Note that the input to the mapper is one JSON object per line.

Note that it's a bit old (Avro is now at 1.7.4); it is originally from here.
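
For the mapper side, here is a minimal sketch of what consuming "one JSON object per line" can look like in Python. It is not the original script: the function names `map_records` and `main` and the field name `user` are made up for illustration, so swap in a field from your own Avro schema.

```python
"""Streaming mapper sketch: one JSON object per input line, key<TAB>1 out."""
import json
import sys


def map_records(lines, field="user"):
    """Yield tab-separated key/count pairs, one per JSON record.

    `field` is a hypothetical field name; use one from your own schema.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # each line holds one complete JSON object
        yield "{}\t1".format(record.get(field))


def main():
    """Read records from stdin and write mapper output to stdout."""
    for out in map_records(sys.stdin):
        print(out)

# In a real mapper script, finish by calling main().
```

Keeping the parsing in a generator that takes any iterable of lines means you can try it on a plain list of strings before running it on a cluster.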