Debugging can be difficult, since Hadoop does not keep the Python stacktraces and even if it does - it's very hard to find it. We decided to use crashlog, and now we get wonderful emails with detailed description of what went wrong.
- The current mapper input file is in map_input_file environment variable
- Don't forget to add crashlog.py with -file (see here)
- You must add "." to PYTHONPATH in order to import
- import sys; sys.path.append('.') should do the trick
- Email is not the best solution for distributed logging (you get a lot of email when things go South). I'm going to play with graylog2 in the future.