If it won't be simple, it simply won't be. By Miki Tebeka, CEO, 353Solutions

Wednesday, May 01, 2013

Getting Good Errors from Python Map/Reduce Jobs

At work, we use some Python map/reduce jobs (using Hadoop streaming).

Debugging can be difficult, since Hadoop does not always keep the Python stack traces, and even when it does, they are very hard to find. We decided to use crashlog, and now we get wonderful emails with a detailed description of what went wrong.

Notes:
  • The current mapper input file is in the map_input_file environment variable
  • Don't forget to add crashlog.py with -file (see here)
  • You must add "." to PYTHONPATH in order to import
    • import sys; sys.path.append('.') should do the trick
  • Email is not the best solution for distributed logging (you get a lot of email when things go south). I'm going to play with graylog2 in the future.
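
To make the notes above concrete, here is a minimal sketch of the general idea, using only the standard library. It is not crashlog's API: the helper names (input_file, format_report, report_crash, install) are made up for illustration, and it writes the report to stderr instead of sending email. The mapreduce_map_input_file fallback is an assumption about newer Hadoop versions.

```python
import os
import sys
import traceback

# Files shipped with -file land in the task's working directory,
# so add it to the import path before importing them.
sys.path.append('.')


def input_file(environ=os.environ):
    """Return the mapper's current input file, or a placeholder."""
    # map_input_file is the variable from the notes above; the second
    # name is an assumption for newer Hadoop versions.
    for key in ('map_input_file', 'mapreduce_map_input_file'):
        if key in environ:
            return environ[key]
    return '<unknown>'


def format_report(exc_type, exc_value, tb, environ=os.environ):
    """Build a plain-text crash report (hypothetical helper)."""
    header = 'input file: %s\n\n' % input_file(environ)
    return header + ''.join(traceback.format_exception(exc_type, exc_value, tb))


def report_crash(exc_type, exc_value, tb):
    """Exception hook: write the report to stderr (the task log)."""
    sys.stderr.write(format_report(exc_type, exc_value, tb))


def install():
    """Route uncaught exceptions through report_crash."""
    sys.excepthook = report_crash
```

Calling install() at the top of the mapper script means any uncaught exception produces a report that names the input file being processed. Swapping the stderr write for an email or a log shipper is where a tool like crashlog comes in.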
