At work, we store logs as a single CSV inside a zip file in HDFS (for historical reasons :).
Looking around, I couldn't find a FileInput library that works with Hadoop streaming on CDH4 (the version we're using).
So I wrote one. I hope you'll find it useful (you can download the jar directly from here).
Here's an example of how to use it:
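As a rough illustration of the consuming side only (this is not the library's own invocation, which depends on the jar and its class names), a streaming mapper might look like the sketch below. It assumes the input format hands each CSV record to the mapper as one plain line on stdin. The `map_lines` helper and the choice of the first column as the key are hypothetical.

```python
import csv
import sys


def map_lines(lines, key_column=0):
    """Yield 'key<TAB>1' for each CSV record.

    key_column is a hypothetical choice; adjust it to the real log layout.
    """
    for row in csv.reader(lines):
        if len(row) <= key_column:
            continue  # skip blank or short rows
        yield "%s\t1" % row[key_column]


if __name__ == "__main__":
    # Hadoop streaming feeds records on stdin and collects output from stdout.
    for out in map_lines(sys.stdin):
        print(out)
```

If the mapper actually receives a key and a tab before each record, that prefix would need to be stripped before the line is parsed as CSV.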