At work, we store logs as a single CSV inside a zip file in HDFS (for historical reasons :).
Looking around, I couldn't find a FileInput library that works with Hadoop streaming on CDH4 (the version we're using).
So I wrote one. I hope you'll find it useful (you can download the jar directly from here).
Here's an example of how to use it:
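On the mapper side, a streaming job over such logs might look roughly like the sketch below. It is only an illustration under a few assumptions: that the input format hands each CSV line of the zipped file to the mapper on standard input, that the first column is the field you want to count by, and that the name `map_rows` is made up for this example. None of this is taken from the jar itself.

```python
#!/usr/bin/env python
"""Hypothetical Hadoop streaming mapper: count log rows per value of one CSV column."""
import csv
import sys


def map_rows(lines, column=0):
    """Yield "key<TAB>1" for each CSV row in lines, keyed by the given column.

    Assumption: each item in `lines` is one raw CSV line from the log file.
    """
    for row in csv.reader(lines):
        if len(row) <= column:
            continue  # skip blank or short rows
        yield '%s\t1' % row[column]


if __name__ == '__main__':
    # Streaming feeds records on stdin and collects key/value pairs from stdout.
    for out in map_rows(sys.stdin):
        print(out)
```

Keeping the logic in a plain function makes it easy to try the mapper locally on a few lines before submitting the job; a reducer that sums the counts per key would complete it.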
If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions