If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Thursday, December 31, 2015

Using HAProxy to Prevent Deletes from Elasticsearch

At one of my clients, we wanted something quick and dirty to prevent deletes from Elasticsearch (shield is too expensive and would take too much time to integrate with our systems - we'll fix this technical debt later).

The quick solution was to place HAProxy in front of Elasticsearch and use its acl mechanism to prevent HTTP DELETE. Works like a charm.

Here's the HAProxy configuration and the docker-compose setup file I used to test the configuration.

Tuesday, December 22, 2015

Python's deque for Go

Working on a Go project with my friend Fabrizio, I investigated ways to get a faster data structure for storing history items with append and pop.

Got the idea to try implementing Python's deque in Go. The C implementation is pretty easy to read. The result is deque for Go, which implements a subset of the features of Python's deque, but enough for our needs. And it's pretty fast too:

$ make compare
Git head is 765f6b0
cd compare && go test -run NONE -bench . -v
testing: warning: no tests to run
PASS
BenchmarkHistAppend-4  3000000        517 ns/op
BenchmarkHistList-4    2000000        702 ns/op
BenchmarkHistQueue-4   3000000        576 ns/op
BenchmarkHistDeque-4   3000000        423 ns/op
ok   _/home/miki/Projects/go/src/github.com/tebeka/deque/compare 8.505s
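
For reference, here is a small sketch of the Python behavior being ported, using collections.deque as a bounded history. The command strings and the maxlen of 3 are made up for illustration; the Go package's own API is not shown here.

```python
from collections import deque

# A bounded history: once maxlen items are stored, appending drops the oldest.
history = deque(maxlen=3)
for cmd in ["ls", "cd /tmp", "make", "git status"]:
    history.append(cmd)

print(list(history))      # ['cd /tmp', 'make', 'git status']
print(history.pop())      # git status (newest item)
print(history.popleft())  # cd /tmp (oldest item)
```

Both ends are O(1), which is what makes a deque a good fit for history items.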

Wednesday, November 11, 2015

aenumerate - enumerate for async for

Python's new async/await syntax helps a lot with writing async code. Here's a little utility that provides the async equivalent of enumerate.
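
The utility itself is not reproduced on this page. A minimal sketch of the idea, assuming Python 3.7+ (async generators and asyncio.run, both newer than this post), could look like the following. The ticker source is hypothetical and exists only for the demo.

```python
import asyncio


async def aenumerate(aiterable, start=0):
    """Async counterpart of enumerate: yield (index, item) pairs."""
    index = start
    async for item in aiterable:
        yield index, item
        index += 1


async def ticker(n):
    """Hypothetical async source used only for this demo."""
    for i in range(n):
        await asyncio.sleep(0)  # give control back to the event loop
        yield chr(ord("a") + i)


async def demo():
    return [pair async for pair in aenumerate(ticker(3), start=1)]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```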

Thursday, September 24, 2015

git - Creating Pull Request for master

A co-worker asked me for a code review (we're using Stash, but this can work for other systems as well). The problem was that he had worked on master (he started his own project) and not in a development branch. The solution was to create an empty orphan branch and then open a pull request from master to that branch (reversing the usual order).

Here's how to create such a branch.

Tuesday, September 01, 2015

Go Tour Exercise Solutions

As a backup plan for the last Go Meetup, I wrote the solutions to the exercises in Go Tour and we discussed some of them.

You can find the solutions here.

Monday, August 10, 2015

re2 available on conda

We're using re2 to get some speed gains on the many regular expressions we're trying to match. So far building it was either a manual step or a script that ran when building the docker container. I decided to create a conda package (we're using Miniconda as our Python environment).

I started with conda skeleton pypi re2 (you need to conda install conda-build first). Then after some tweaking to build.sh we were good to go.

The result - you can now conda install -c tebeka re2 (only 64bit linux supported currently).

The project is here. I'll gladly accept any comments/improvements.

Here's build.sh, which patches the re2 Makefile and adds the library and header locations to the Python build step.

Tuesday, July 14, 2015

fastavro moved to github

If you can't beat them ... :)

fastavro is now on github. I still prefer mercurial as an SCM, but most of the pull requests I get are on github and it isn't worth the effort of maintaining two repositories (though hg-git is a big help).

Wednesday, July 08, 2015

dockermon - A Docker Event Monitor

I'm currently working with the awesome team at CyberInt (and yes, they are hiring).

We're moving to a docker based environment. The old environment used Supervisor to monitor and relaunch daemons. We had an event listener that notified us on our HipChat room every time a daemon crashed and wanted the same feature with our docker containers.

We didn't find a ready solution, so we wrote one and made it open source. The project is called dockermon. It is a single Python script with no external dependencies, compatible with both Python 2 and 3.

Tuesday, June 30, 2015

Naming "with open" Variable

Python's "with" statement is great for resource handling. However, I find myself struggling with naming (and naming is important) the context manager variable.

When you write "with open('/path/to/somethere') as X", what's the best name for X? In some cases it's obvious, but in most cases I find myself using the generic "fo" (stands for "file object").

I decided to run a little script on Python 3.4's Lib directory and find out what the most common name is. Here are the results:
Seems like f is the most common, but I really don't like single letter variables. I'll go with the 2nd place - fp.

Here's the script used to generate this chart:
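
The original script is not reproduced on this page. A minimal sketch of how such a count could be done, assuming the ast module is used to find `with open(...) as NAME` statements (the function names here are made up for illustration):

```python
import ast
from collections import Counter


def open_names(source):
    """Yield the names bound by `with open(...) as NAME` in Python source."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.With):
            continue
        for item in node.items:
            call, target = item.context_expr, item.optional_vars
            if (isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Name)
                    and call.func.id == "open"
                    and isinstance(target, ast.Name)):
                yield target.id


def count_names(paths):
    """Count `with open` variable names across the given files."""
    counts = Counter()
    for path in paths:
        try:
            with open(path, encoding="utf-8") as fp:
                counts.update(open_names(fp.read()))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # skip files that don't decode or parse
    return counts
```

Something like count_names(pathlib.Path(lib_dir).rglob("*.py")).most_common(5) would then give the top names, where lib_dir is wherever your Python's Lib directory lives. This sketch only counts plain open calls, not io.open or codecs.open.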

Friday, June 26, 2015

353Solutions - A Year in Review

353Solutions was founded a bit more than a year ago. I wasn't planning on doing consulting, I'm a techie and love the development abstraction layer that companies give you and let you code most of the time. (If this is not the case in the company you're working at - consider finding a better one :)

However as the old saying goes - "Man plans and god laughs". I found myself owning a teaching/consulting company called 353Solutions. So far it's fun and provides for the family - what else can you ask for?

Here are results from a short retrospective we did lately.

The Numbers

  • 6 clients
  • 204 work days
  • 204 hours teaching Python (7 courses)

Thoughts

I like working from home, however most companies I talked to wanted some office time. This is understandable since I don't only code but also do system and process design - these roles require more face to face communication. I'm still looking for something that will allow me to spend most of my time working from home.

Teaching is fun! I did that on and off for most of my professional career, but now it's a big chunk of my time. I'm grateful to Raymond Hettinger, who started me off and showed me what a top-notch class/workshop should look like. So far I'm mostly going to companies and teaching there, but just now we launched our own classes - it will be awesome!

The downside for teaching is that it takes me away from home. For a limited amount this is great (I spend about a week every month teaching Python in the UK). However I'm looking for opportunities that will let me teach from home - stay tuned.

The social network is by far my biggest source of new jobs. Talking to other people - it's not just me. Investing time in making connections and keeping them will pay off. The main downside is that people want to hire me and not 353Solutions. This means I need to work harder to market the other people I work with - I can't do everything.

Learning to say "no" was the hardest thing for me. So many interesting things to do, so many cool companies ... But I like spending time with my family, friends and hobbies. You need to find the things that make you happy and pay enough, going cheap is not a good thing in most cases. What I did in some cases was to take less money and get equity instead. Something like "technical" investing in startups.

The main point I need to improve is marketing. It's not something I like to do, but I feel the need, especially now that we have our own classes. I'm learning and looking for the approach that will get maximal impact in a minimal amount of time. Or maybe hire someone for that? If you know a good option - please let me know :)

Thursday, June 04, 2015

Use contextlib.closing to Handle "Legacy" Resources

Python's context managers (with statement) are very handy for handling resources. (You see way less finally in Python code thanks to them.) Many objects in Python can be used as context managers - files, locks, database drivers and more. But some objects still cannot.

To handle these "legacy" objects you can use the contextlib.closing function, which returns a context manager that calls obj.close() once the context manager exits.

Here's an example of using contextlib.closing with sockets. We'll be doing a simple HTTP request (Yeah, you should probably use requests or urlopen - this is just an example :)

Note also the use of iter with a sentinel to read chunks of up to 1K from the socket.
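
The example code is not reproduced on this page. A minimal sketch of the same idea follows; the function names read_all and http_get are made up for illustration. Note that on Python 3 sockets are context managers themselves, so closing is used here purely to show the technique.

```python
import socket
from contextlib import closing
from functools import partial


def read_all(sock, chunk_size=1024):
    """Read from sock until the peer closes, in chunks of up to chunk_size."""
    # iter(callable, sentinel) calls sock.recv(chunk_size) until it returns b''
    return b"".join(iter(partial(sock.recv, chunk_size), b""))


def http_get(host, path="/", port=80):
    """Very small HTTP/1.0 GET; closing() guarantees sock.close() is called."""
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    with closing(socket.create_connection((host, port))) as sock:
        sock.sendall(request.encode("ascii"))
        return read_all(sock)
```

Calling http_get("example.com") returns the raw response bytes, headers included. HTTP/1.0 is used so the server closes the connection when it is done, which is what ends the iter loop.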

Wednesday, May 06, 2015

Combining jQuery and Multi Components of React

React is a great library for generating reactive web UI (and mobile). React works well if there are isolated components or one big component with a hierarchy. However, I wanted to have a page with several isolated React components that are updated from the same data. The solution I found is to use an observer pattern and have each component register a callback to handle data changes. See the code below and a live demo here.

Note that I am a React newbie. If you know of a better way - please enlighten me.

Tuesday, April 21, 2015

Solving Project Euler Problem 8 with numpy

I'm teaching a course in scientific Python these days. Usually I give "homework" from Project Euler (which I personally use every time I learn a new programming language). I thought it'd be fun to solve the problem not just with Python but with what numpy has to offer as well.

Here's an example solution for Project Euler problem 8.
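
The solution itself is not reproduced on this page. A minimal numpy sketch of one possible approach (not necessarily the one from the course): build an index matrix with one row per window of adjacent digits and take the row products. The 1000-digit number from the problem is not included here; pass it in as a string. The function name is made up for illustration.

```python
import numpy as np


def max_adjacent_product(digits, n):
    """Largest product of n adjacent digits in a string of digits."""
    arr = np.array([int(c) for c in digits], dtype=np.int64)
    # Row i of idx holds the indices i, i+1, ..., i+n-1 (one row per window)
    idx = np.arange(len(arr) - n + 1)[:, None] + np.arange(n)
    return int(arr[idx].prod(axis=1).max())


print(max_adjacent_product("1234567890", 3))  # 504 (7 * 8 * 9)
```

int64 is wide enough for the problem: the largest possible product of 13 digits is 9 ** 13, about 2.5e12.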

Tuesday, April 07, 2015

Docker + MiniConda = A Perfect Match

Working with one of my clients (who is hiring BTW), we decided to use Docker as the deployment platform. Since many Linux systems now use Python for many utilities, it's advisable to install your own Python next to the system one and use it.

Installing CPython from source requires some system packages, libraries, headers and some knowledge. The much easier path is to use MiniConda (from the wonderful people at Continuum). Not only is the Python installation super simple, but the conda package manager will also get you a lot of pre-compiled packages, so you don't have to install gcc and header files for C extensions. And if you can't find the package you need with conda, pip is also available.

Here's a little project to demonstrate how to do this. The application is an image server which has two entry points: /edge for edge detection and /resize for image resizing. We'll be using scikit-image and Pillow for image manipulation and Flask as the web server. All of them can be conda installed.

Here's the Dockerfile for the project. Build with docker build -t imgsrv . and run with docker run -p 8080:8080 imgsrv (see Makefile).

Tuesday, February 24, 2015

Adding Server SSL Certificate on Linux

Here's a small script to add a server SSL certificate on Linux. You can export the certificate from your browser. Inspired by this stackoverflow post.

Tuesday, February 03, 2015

Logging from Celery to logstash and a structured log (JSON)

I love Celery, and we're using it at one of my customers. One thing we wanted to have is centralized logging, since you can have multiple workers on multiple machines. We looked at several solutions and the winner turned out to be logstash + kibana (AKA ELK stack).

Here's some code to log to logstash with Celery current task information (if available) and also to a structured log (every line is a JSON object) for backup in case of network issues.
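
The original code is not reproduced on this page. The sketch below covers only the structured-log half (one JSON object per line) using the standard logging module; shipping records to logstash is not shown. The task_id and task_name fields are hypothetical extras that a Celery-aware filter could attach to each record.

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    """Format every record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # task_id / task_name are hypothetical extras; they are optional here
        for key in ("task_id", "task_name"):
            value = getattr(record, key, None)
            if value is not None:
                entry[key] = value
        return json.dumps(entry)


def make_logger(name, stream):
    """Return a logger writing JSON lines to stream (a file-like object)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JSONFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Usage is the usual logging call, e.g. log.info("resized %d images", 3, extra={"task_id": "abc123"}). Since every line parses on its own, the file can be replayed into the central log after a network outage.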

Tuesday, January 27, 2015

Using supervisord to Manage Your Daemons

Say you have some daemons running. You'd like to restart them automatically if they fail, grab logs from them and in general manage them - supervisord will do it all for you.

One of my (super cool) clients also needed to start/stop daemons when the configuration changes. The solution was to have a script that updates supervisord.conf every time we have a configuration change and then selectively starts/stops the daemons that have changed (by default if you run supervisorctl update, it will restart all the daemons).

For this example, I'll assume that the daemon processes are python -m SimpleHTTPServer and I have a list of ports I'd like to listen on. This list of ports might change.
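
The update script itself is not reproduced on this page. A minimal sketch of its two pure pieces, under stated assumptions: rendering one program section per port and working out which process groups to remove and add. The httpd-PORT naming follows the output in the example usage; the function names and the autorestart setting are illustrative. Applying the changes through supervisorctl is left out.

```python
SECTION = """\
[program:httpd-{port}]
command=python -m SimpleHTTPServer {port}
autorestart=true
"""


def program_sections(ports):
    """Render one [program:x] section per port, sorted for stable output."""
    return "\n".join(SECTION.format(port=port) for port in sorted(set(ports)))


def changes(old_ports, new_ports):
    """Return (to_remove, to_add) process names between two port lists."""
    old, new = set(old_ports), set(new_ports)
    name = "httpd-{}".format
    return (sorted(name(p) for p in old - new),
            sorted(name(p) for p in new - old))


print(changes([8000, 8001, 8002], [8000, 8004, 8002]))
# (['httpd-8001'], ['httpd-8004'])
```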

Example Usage


    $ ./updated.py 8000 8001 8002
    $ supervisorctl status
    httpd-8000                       RUNNING   pid 31768, uptime 0:00:04
    httpd-8001                       RUNNING   pid 31767, uptime 0:00:04
    httpd-8002                       RUNNING   pid 31766, uptime 0:00:04
    $ ./updated.py 8000 8004 8002 # Remove 8001, add 8004
    httpd-8001: disappeared
    httpd-8004: available
    httpd-8001: stopped
    httpd-8001: removed process group
    httpd-8004: added process group
    $ supervisorctl status
    httpd-8000                       RUNNING   pid 31768, uptime 0:00:12
    httpd-8002                       RUNNING   pid 31766, uptime 0:00:12
    httpd-8004                       RUNNING   pid 31785, uptime 0:00:02
    $

    

Friday, January 09, 2015

python -m

python -m lets you run modules as scripts. If your module is just one .py file it'll be executed (which usually means code under if __name__ == '__main__'). If your module is a directory, Python will look for __main__.py (next to __init__.py) and will run it.
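
As a small illustration, here is a hypothetical single-file module (greet.py is a made-up name) that works both as an import and with python -m greet NAME:

```python
"""greet.py - hypothetical module, runnable with `python -m greet NAME`."""
import sys


def greet(name):
    return "Hello, {}!".format(name)


def main(argv=None):
    args = sys.argv[1:] if argv is None else argv
    print(greet(args[0] if args else "world"))


# Runs for `python -m greet` and `python greet.py`, but not on `import greet`
if __name__ == "__main__":
    main()
```

For a package, the same main() call would go in __main__.py inside the package directory.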

One of Python's mottoes is "batteries included", and this goes for python -m as well. Here are some (all?) of the gems hidden in the standard library. Sadly not all of them have help, but I poked around in the source code to see the usage.

json.tool

This is by far the one I use most. It nicely indents JSON input and writes it to standard output, which is very helpful in combination with curl.
$ curl -sL http://j.mp/1IuxaLD
[{"x":1,"y":2},{"x":3,"y":4},{"x":5,"y":6}]
$ curl -sL http://j.mp/1IuxaLD | python -m json.tool
[
    {
        "x": 1,
        "y": 2
    },
    {
        "x": 3,
        "y": 4
    },
    {
        "x": 5,
        "y": 6
    }
]

zipfile

zipfile will let you view, extract and create zip files - very much like zip and unzip. Here's the help:
$ python -m zipfile -h
Usage:
    zipfile.py -l zipfile.zip        # Show listing of a zipfile
    zipfile.py -t zipfile.zip        # Test if a zipfile is valid
    zipfile.py -e zipfile.zip target # Extract zipfile into target dir
    zipfile.py -c zipfile.zip src ... # Create zipfile from sources

gzip

Like zipfile, it lets you compress and decompress .gz files, like gzip/gunzip. By default it'll compress a file, but with -d it will decompress.
python -m gzip wordlist.txt  # Will create wordlist.txt.gz
python -m gzip -d wordlist.txt.gz  # Will extract to wordlist.txt

filecmp

Compare two directories.
$ python -m filecmp /tmp/a /tmp/b
diff /tmp/a /tmp/b
Only in /tmp/a : ['1']
Only in /tmp/b : ['2']
Identical files : ['4']
Differing files : ['3']

Encode/Decode

Several modules let you encode/decode in various formats:
  • base64
  • uu
  • encodings.rot_13
  • binhex
  • mimify
  • quopri
For example
$ echo 'secertpassword' | python -m encodings.rot_13
frpregcnffjbeq

Servers

There are several servers that you can run, the ones I know are SimpleHTTPServer, CGIHTTPServer and smtpd (mail). If you quickly want to serve some files from a directory on your machine, just run:
python -m SimpleHTTPServer

Clients

Modules that provide simple clients to various protocols are:
  • ftplib
  • poplib
  • nntplib
  • smtplib (on localhost only)
  • telnetlib
For example if you want to view Star Wars in text mode, do
$ python -m telnetlib towel.blinkenlights.nl

System Info

You can use platform to get some platform information (very much like uname -a) and locale to get locale information. Use mimetypes to get the mime type of a file:
$ python -m mimetypes doc.html
type: text/html encoding: None

Python Utilities

  • compileall will compile all Python files to .pyc
  • dis will show bytecode for a file
  • pdb will start the Python debugger on a given file (see here)
  • pydoc will show documentation on a module/class/function
  • site will print some site information (sys.path, USER_BASE ...)
  • sysconfig will show a lot of system-related information (such as exec_prefix)
  • tabnanny will tell you if you mix tabs and spaces (like starting python with -t or -tt)
  • tokenize will print the list of tokens in a Python file
I mostly use pdb and pydoc, for example:
$ python -m pydoc os.remove
Help on built-in function remove in os:

os.remove = remove(...)
    remove(path)
    
    Remove a file (same as unlink(path)).

Profiling

There are several profilers and timers you can use from the command line:
  • cProfile - Show profile information
  • profile (use cProfile :)
  • timeit - Time how long things run
  • pstats - Print output of profiles
  • trace - Show tracing information on run
Example:
$ python -m cProfile match.py
         28537 function calls (27503 primitive calls) in 0.057 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 <string>:1(ArgInfo)
        1    0.000    0.000    0.000    0.000 <string>:1(ArgSpec)
   ...

$ python -m timeit 'import math; math.factorial(100)'
100000 loops, best of 3: 12.9 usec per loop

timeit has good help from the command line.

IDLE

You can start IDLE by running python -m idlelib.idle

ensurepip

Python 2.7.9 and 3.4+ come with an easy way to install pip. Run python -m ensurepip and PyPI is at your service.


That's about it ... What are your favorite python -m tools? Which ones did I miss?

EDIT: The good folks at comp.lang.python reminded me of a few I forgot:

unittest

python -m unittest discover will run unittest in discovery mode. Just drop in a new Python file whose name starts with test and it'll be picked up the next time you run the tests. You can also specify a specific test to run with python -m unittest test_file.TestClass.test_method.

calendar

python -m calendar will show the calendar of the current year. You can also run python -m calendar YEAR to display a specific year and python -m calendar YEAR MONTH to display a specific month.

Easter Eggs

python -m this will display the Zen of Python.
python -m antigravity will open the XKCD comic web page (which my company is named after).
