If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Monday, December 26, 2016

Automatically Running BigQuery Flows

At a current project we're using Google's BigQuery to crunch some petabyte-scale data. We have several SQL scripts that we need to run in a specific order. The script below detects the table dependencies and runs the SQL scripts in order. As a bonus, you can run it with --view and it'll show you the dependency graph.
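As a rough illustration of the idea (not the original script), dependency detection can be sketched with a regular expression and the ordering with a topological sort. The naming convention (a script named daily produces a table named daily), the regular expression, and the function names dependencies and run_order are all assumptions made for this example.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Assumption: tables are referenced right after FROM or JOIN
TABLE_RE = re.compile(r'\b(?:FROM|JOIN)\s+([\w.]+)', re.IGNORECASE)


def dependencies(sql, known):
    """Tables read by the SQL text that are produced by a known script."""
    return {name for name in TABLE_RE.findall(sql) if name in known}


def run_order(scripts):
    """Order script names so each one comes after the scripts it reads from.

    scripts maps a script name to its SQL text. Assumption (hypothetical
    convention): a script named X creates the table X.
    """
    graph = {
        name: dependencies(sql, scripts) - {name}
        for name, sql in scripts.items()
    }
    return list(TopologicalSorter(graph).static_order())


scripts = {
    'report': 'SELECT * FROM daily JOIN users ON daily.uid = users.id',
    'daily': 'SELECT day, uid, COUNT(*) FROM events GROUP BY day, uid',
    'users': 'SELECT id, name FROM raw_users',
}
order = run_order(scripts)
print(order)  # 'report' comes after both 'daily' and 'users'
```

A real SQL parser would be more robust than a regular expression; the sketch only shows how the ordering falls out once the dependencies are known.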

Friday, November 25, 2016

Using built-in slice for indexing

At one place I consult I saw something like the following code:
def fn1(items, use_tail=False):
    if use_tail:
        key = items[-2:]
    else:
        key = items[0]
    print(key)


items = [1, 2, 3, 4, 5]
fn1(items)
fn1(items, True)
This is good and valid Python code; however, we can use the built-in slice to do the same job.

def fn2(items, use_tail=False):
    idx = slice(-2, None) if use_tail else 0
    key = items[idx]
    print(key)


items = [1, 2, 3, 4, 5]
fn2(items)
fn2(items, True)
Also when you're writing your own __getitem__ consider that key might be a slice object.
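
Here's a minimal sketch of a __getitem__ that handles both cases. The Window class is a made-up example, not code from the project mentioned above.

```python
class Window:
    """Read-only sequence wrapper (hypothetical example class)."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice key yields a new Window instead of a bare list
            return Window(self.items[key])
        return self.items[key]  # integer index: a single item


w = Window([1, 2, 3, 4, 5])
head = w[0]
tail = w[-2:]  # Python passes slice(-2, None) as the key
print(head, tail.items)
```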

Saturday, November 12, 2016

Friday, September 16, 2016

Simple Object Pools

Sometimes we need object pools to limit the number of resources consumed. The most common example is database connections.

In Go we sometimes use a buffered channel as a simple object pool.

// Pool in Go using buffered channels
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// worker simulates work of a goroutine
func worker(id int, pool chan int, start chan bool, wg *sync.WaitGroup) {
	<-start                        // Wait for all goroutines
	rsc := <-pool                  // Get item from the pool
	defer func() { pool <- rsc }() // Return item at end
	defer wg.Done()                // Signal we're done
	time.Sleep(time.Duration(rand.Int()%100) * time.Millisecond)
	fmt.Printf("worker %d got resource %d\n", id, rsc)
}

func main() {
	var wg sync.WaitGroup
	start := make(chan bool)

	// Create and fill pool
	pool := make(chan int, 3)
	for i := 0; i < 3; i++ {
		pool <- i
	}

	// Run workers
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go worker(i, pool, start, &wg)
	}

	close(start) // Signal to start
	wg.Wait()    // Wait for goroutines to finish before exiting
}
In Python, we can do something similar with a Queue. Python's context manager makes the resource handling automatic so clients don't need to remember to return the object.


"""Simple Pool object"""
from queue import Queue


class Proxy:
    """Wraps original object with context manager that returns the object to
    the pool."""
    def __init__(self, obj, pool):
        self._obj = obj
        self._pool = pool

    def __enter__(self):
        return self._obj

    def __exit__(self, typ, val, tb):
        self._pool._put(self._obj)


class Pool:
    """Pool of objects"""
    def __init__(self, objects):
        self._queue = Queue()
        for obj in objects:
            self._queue.put(obj)

    def lease(self):
        """Lease an object from the pool, should be used as context manager. e.g.:

            with pool.lease() as conn:
                cur = conn.cursor()
                cur.execute('SELECT ...')
        """
        return Proxy(self._queue.get(), self)

    def _put(self, obj):
        self._queue.put(obj)


if __name__ == '__main__':
    from threading import Thread, Barrier
    from time import sleep
    from random import random

    n = 10
    b = Barrier(n)
    p = Pool([1, 2, 3])

    def worker(n, barrier, pool):
        barrier.wait()  # Wait for all threads to be ready
        sleep(random() / 10)
        with pool.lease() as val:
            print('worker %d got resource %d' % (n, val))

    for i in range(n):
        Thread(target=worker, args=(i, b, p)).start()
Here's the output of both programs:


$ go run pool.go
worker 7 got resource 0
worker 0 got resource 2
worker 3 got resource 1
worker 8 got resource 2
worker 1 got resource 0
worker 9 got resource 1
worker 5 got resource 1
worker 4 got resource 0
worker 2 got resource 2
worker 6 got resource 1

$ python pool.py
worker 5 got resource 1
worker 8 got resource 2
worker 1 got resource 3
worker 4 got resource 1
worker 0 got resource 2
worker 7 got resource 3
worker 6 got resource 1
worker 3 got resource 2
worker 9 got resource 3
worker 2 got resource 1

Tuesday, August 30, 2016

"Manual" Breakpoints in Go

When debugging, sometimes you need to set conditional breakpoints. This option is available both in gdb and delve. However, when the condition is complicated, it's hard or even impossible to set it. A way around this is to temporarily write the condition in Go and set the breakpoint "manually".

In Python we do it with pdb.set_trace(); in Go we'll need to work a little harder. The main idea is that a breakpoint is a special signal called SIGTRAP.

Here's the code to do this:
// "Manual" breakpoint in Go
// Compile with
//	go build -gcflags "-N -l" manual-bp.go
// Then
//	gdb manual-bp
//	(gdb) run
// When you hit the breakpoint, call "fin" twice to get to the current location
//	(gdb) fin
//	(gdb) fin
// After that it's the usual gdb commands
//	(gdb) p i
//	$1 = 3
package main

import (
	"fmt"
	"syscall"
)

func foo() {
	for i := 0; i < 5; i++ {
		if i == 3 { // Some complicated condition
			// Initiate a breakpoint
			syscall.Kill(syscall.Getpid(), syscall.SIGTRAP)
		}
		fmt.Printf("i = %d\n", i)
	}
}

func main() {
	foo()
}
You'll need to tell the go tool not to optimize and to keep variable information:

$ go build -gcflags "-N -l" manual-bp.go

Then run a gdb session

$ gdb manual-bp 
(gdb) run 

When you hit the breakpoint, you'll be in assembly code. Exit two functions to get to your code

(gdb) fin
(gdb) fin

Then you'll be in your code and can run gdb commands

(gdb) p i
$1 = 3

This scheme also works with delve

$ dlv debug manual-bp.go 
(dlv) c 

Sadly delve doesn't have a "fin" command so you'll need to hit "n" (next) until you reach your code.

That's it, happy debugging.

Oh - and in the very old days we did about the same trick in C code. There we manually inserted asm("int $3") into the code. You can do this with cgo but sending a signal seems easier.
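
For comparison, here's a sketch of the Python side mentioned at the start of this post. It uses the built-in breakpoint() (Python 3.7+), which calls pdb.set_trace() by default. The function foo and its condition simply mirror the Go example; returning the list of seen values is an addition for illustration.

```python
def foo(n=5):
    """Loop like the Go example, stopping in the debugger when i == 3."""
    seen = []
    for i in range(n):
        if i == 3:  # Some complicated condition
            # Initiate a breakpoint (defaults to pdb.set_trace())
            breakpoint()
        print('i = %d' % i)
        seen.append(i)
    return seen
```

Calling foo() drops you into pdb when the condition holds. Setting PYTHONBREAKPOINT=0 in the environment turns the breakpoint() call into a no-op.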

Wednesday, August 24, 2016

Generate Relation Diagram from GAE ndb Model

Working with GAE, we wanted to create a relation diagram from our ndb model. By deferring the rendering to dot and using Python's reflection this became an easy task.
#!/usr/bin/env python2
"""Export GAE model to dot format."""
# Usage example:
#   ./ndb2dot.py cool.model | dot -Tpng -o model.png
# Install graphviz to get the dot command line utility

from inspect import isclass
from os import environ
import imp
import sys

# Inject GAE SDK path
sys.path.insert(0, environ.get('GAE_PY_SDK', '/opt/google_appengine'))
import dev_appserver  # noqa
dev_appserver.fix_sys_path()
from google.appengine.ext import ndb  # noqa

header = '''\
digraph G {
    graph [rankdir=LR];
    node [shape=none];
'''


def load_module(name):
    """Load module from name"""
    mod = None
    for mod_name in name.split('.'):
        file, pathname, desc = imp.find_module(mod_name, mod and mod.__path__)
        mod = imp.load_module(mod_name, file, pathname, desc)
    return mod


def models(module):
    """Iterate over the ndb models in module (unsorted)"""
    for attr in dir(module):
        obj = getattr(module, attr)
        if isclass(obj) and issubclass(obj, ndb.Model):
            yield obj


def prop_type(prop):
    """Property type (string)"""
    if isinstance(prop, ndb.StructuredProperty):
        return prop._modelclass.__name__
    name = prop.__class__.__name__
    suffix = 'Property'
    if name.endswith(suffix):
        name = name[:-len(suffix)]
    return name


def model2dot(model):
    """dot representation of model"""
    cls = model.__name__
    print('''\
    %s [
        label = <<table>
        <tr><td colspan="2"><b>%s</b></td></tr>''' % (cls, cls))
    links = []
    for name, prop in sorted(model._properties.iteritems()):
        if isinstance(prop, ndb.StructuredProperty):
            port = 'port="%s"' % name
            links.append((name, prop))
        else:
            port = ''
        print('%s<tr><td %s>%s</td><td>%s</td></tr>' % (
            ' ' * 8, port, name, prop_type(prop)))
    print('''\
        </table>>
    ];''')
    for name, prop in links:
        print('    %s:%s -> %s;' % (cls, name, prop_type(prop)))


if __name__ == '__main__':
    from operator import attrgetter
    from argparse import ArgumentParser

    parser = ArgumentParser(description=__doc__)
    parser.add_argument('module', help='module name (e.g. "cool.model")')
    args = parser.parse_args()

    try:
        mod = load_module(args.module)
    except ImportError as err:
        raise SystemExit('error: cannot load %r - %s' % (args.module, err))

    print(header)
    for model in sorted(models(mod), key=attrgetter('__name__')):
        model2dot(model)
    print('}')
Some links are still missing since we're using ancestor queries, but this can be handled by some class docstring syntax or just manually editing the resulting dot file.

Tuesday, July 05, 2016

Friday, June 10, 2016

Work with AppEngine SDK in the REPL

Working again with AppEngine for Python. Here's a small code snippet that will let you work with your code in the REPL (much better than the previous solution).
"""Initialize GAE Python environment so you can work with it from the REPL"""
from os import environ
import sys
sys.path.insert(0, environ.get('GAE_PY_SDK', '/opt/google_appengine'))
import dev_appserver # noqa
dev_appserver.fix_sys_path()
from google.appengine.ext import testbed # noqa
tb = testbed.Testbed()
tb.activate()
tb.init_all_stubs()
What I do in IPython is:
In [1]: %run initgae.py

In [2]: %run app.py

And then I can work with my code and test things out.

Monday, May 30, 2016

Using ImageMagick to Generate Images

One of the exercises we did this week in the Python workshop used the term "bounding box diagonal". I had a hard time explaining it to the students without an image. Google image search didn't find anything great, so I decided to create such an image.

First I tried with drawing programs, but couldn't make the rectangle a square and the circle non-oval. Then I remembered ImageMagick. I have it installed and mostly use it to resize images, but it can do much more. A quick look at the examples and some trial and error, and here's the result.


And here's the script that generated it:
#!/bin/bash
# Generate "Bounding Box Diagonal" image

convert \
    -size 400x400 \
    xc:skyblue \
    -fill red \
    -draw "circle 200,200 0,200" \
    -stroke black \
    -strokewidth 2 \
    -draw "line 0,0 400,400" \
    -stroke skyblue \
    -pointsize 40 \
    -draw "rotate 45 text 250,10 \"bbd\"" \
    bbd.png
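
For readers who prefer numbers to pictures, here's a small sketch that computes the same quantity. It assumes "bounding box diagonal" means the diagonal of the axis-aligned bounding box of a set of points; the function name bounding_box_diagonal is made up for this example.

```python
from math import hypot


def bounding_box_diagonal(points):
    """Length of the diagonal of the axis-aligned bounding box of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return hypot(max(xs) - min(xs), max(ys) - min(ys))


# Extreme points of the circle drawn above: center (200, 200), radius 200
circle = [(0, 200), (400, 200), (200, 0), (200, 400)]
diagonal = bounding_box_diagonal(circle)
print(diagonal)  # length of the black line from (0, 0) to (400, 400)
```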

Saturday, April 16, 2016

Waiting for HTTP Server - Go Testing

I don't like mocking in tests. If I have a server to test, I prefer to start an instance and hit the API in my tests. Waiting for the server to start with a simple sleep is unpredictable. I prefer to start the server, then try to hit a URL until it's OK, or fail after a long timeout. This is a simple task with Go's select and time.After.
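// Needs the "fmt", "net/http" and "time" imports
func waitForServer(URL string, timeout time.Duration) error {
	ch := make(chan bool, 1) // Buffered so the goroutine never blocks on send
	deadline := time.Now().Add(timeout)
	go func() {
		// Stop polling at the deadline so the goroutine doesn't leak
		for time.Now().Before(deadline) {
			resp, err := http.Get(URL)
			if err == nil {
				resp.Body.Close()
				ch <- true
				return
			}
			time.Sleep(10 * time.Millisecond)
		}
	}()

	select {
	case <-ch:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("server did not reply after %v", timeout)
	}
}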
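
The same polling idea can be sketched in Python using only the standard library. The function name wait_for_server and the interval parameter are assumptions made for this example. One difference from the Go version: urlopen raises on HTTP error statuses, so a 4xx or 5xx reply counts as "not ready yet" here.

```python
import time
from urllib.request import urlopen


def wait_for_server(url, timeout=5.0, interval=0.01):
    """Poll url until it answers; return False if timeout (seconds) passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urlopen(url, timeout=1):
                return True
        except OSError:  # URLError, refused connections and timeouts
            time.sleep(interval)
    return False
```

In a test you'd start the server first, then call something like wait_for_server('http://localhost:8080/health') and fail the test if it returns False. The URL here is only a placeholder.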

Tuesday, March 29, 2016

Slap a --help on it

Sometimes we write "one off" scripts to deal with a certain task. However, more often than not these scripts live on well past that one time. This is very common in ops-related code, which for some reason people don't apply the regular coding standards to.

It really upsets me when I try to see what a script is doing, run it with the --help flag and it happily deletes the database while I wait :) It's so easy to add help support in the command line. In Python we do it with argparse, and we roll our own in bash. In both cases it's 3 extra lines of code.

from argparse import ArgumentParser
parser = ArgumentParser(description='Does something')
parser.parse_args()
case $1 in
    -h | --help ) printf "usage: %s\n\nDoes something\n" "$(basename "$0")"; exit;;
esac
Please be kind to your future self and add --help support to your scripts.

Friday, March 11, 2016

vfetch - Fetch Go Vendor Dependencies

Go 1.6 now supports vendoring. I found myself cloning dependencies to "vendor" directory, then cloning their dependencies ... This got old really fast so vfetch was born. It's a quick and dirty solution, uses "go get" with a temporary GOPATH to get the package and its dependencies, then uses rsync to copy them to the vendor directory.

Installing is the usual "go get github.com/tebeka/vfetch" then you can use "vfetch github.com/gorilla/mux".

Comments, ideas and pull requests are more than welcome.

Tuesday, March 08, 2016

Super Simple nvim UI

I've been playing with neovim lately, enjoying it and a leaner RC file. nvim comes currently only in terminal mode and I wanted a way to spin a new window for it. Here's a super simple script (I call it e) to start a new xfce4-terminal window with nvim.
#!/usr/bin/env python
"""Run nvim in new terminal window"""
from subprocess import Popen
from sys import argv

cmd = [
    'xfce4-terminal',
    '-T', 'nvim',
    '-e', ' '.join(['nvim'] + argv[1:])
]
Popen(cmd)

Tuesday, February 23, 2016

Removing String Columns from a DataFrame

Sometimes you want to work just with numerical columns in a pandas DataFrame. The rule of thumb is that everything that has a type of object is something not numeric (you can get fancier with numpy.issubdtype). We're going to use the DataFrame dtypes with some boolean indexing to accomplish this.

In [1]: import pandas as pd  

In [2]: df = pd.DataFrame([
   ...:     [1, 2, 'a', 3],
   ...:     [4, 5, 'b', 6],
   ...:     [7, 8, 'c', 9],
   ...: ])  

In [3]: df  
Out[3]: 
   0  1  2  3
0  1  2  a  3
1  4  5  b  6
2  7  8  c  9

In [4]: df.dtypes  
Out[4]: 
0     int64
1     int64
2    object
3     int64
dtype: object

In [5]: df[df.columns[df.dtypes != object]]
Out[5]: 
   0  1  3
0  1  2  3
1  4  5  6
2  7  8  9

In [6]:   
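
As a sketch of the "fancier" route, pandas can also pick columns by dtype for you. Both DataFrame.select_dtypes and pandas.api.types.is_numeric_dtype are part of pandas; the variable names below are just for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame([
    [1, 2, 'a', 3],
    [4, 5, 'b', 6],
    [7, 8, 'c', 9],
])

# Option 1: let pandas select the numeric columns
numeric = df.select_dtypes(include='number')

# Option 2: build a boolean mask from dtypes, as in the session above,
# but test for "is numeric" instead of "is not object"
mask = [is_numeric_dtype(dtype) for dtype in df.dtypes]
numeric_by_mask = df.loc[:, mask]

print(list(numeric.columns))
```

Testing for numeric dtypes directly avoids depending on how a given pandas version stores string columns.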

Saturday, January 23, 2016

Forging Python - First Chapter is Up

Finally, the first chapter of my upcoming book "Forging Python" is up. I'm doing it leanpub style so comments and suggestions are more than welcome.

I plan to finish the book this year, hopefully during the summer. However more than one person said I'm way too optimistic - time will tell :)

Tuesday, January 05, 2016

353Solutions - 2015 in Review

Happy new year!

First full calendar year that 353solutions is operating. Let's start with the numbers and then some insights and future goals.

Numbers

  • 170 days of work in total
    • Work day is a day where I billed someone for some part of it
      • Can be an hour, can be 24 hours (when teaching abroad)
    • There were total of 251 work days in 2015
    • There were some work days that are not billable (drafting syllabuses, answering emails ...) but not that many
  • 111 days consulting for 4 clients
    • 1st Go project!!!
  • 58 days teaching 14 courses
    • Python at all levels and scientific Python (including new async workshop)
    • In UK, Poland and Israel

Insights

  • Social network provided almost all the work
    • Keep investing in good friends (not just for work :)
  • Workshops pay way more than consulting
    • However can't work from home in workshops
    • Consulting keeps you updated with latest tech
  • Had to let go of a client due to draconian contract
    • No regrets here, it was the right decision
    • Super nice team. Sadly lawyers had the final say in the company
  • Python and data science are big and in high demand
  • Delegating overhead to the right person helps a lot
    • Accounting, contracts ...

Future Goals

  • Keep positioning in Python and Scientific Python area
  • Drive more Go projects and workshops
  • Work fewer days, have the same revenue at end of year
  • Start some "public classes" where we rent a class and people show up
    • Some companies don't have big enough data science team
    • Need to invest in advertising
  • Publish my book (more on that later)
