If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Wednesday, March 20, 2019

Speed: Default value vs checking for None

Python's dict has a get method. It'll either return an existing value for a given key or return a default value if the key is not in the dict. It's very tempting to write code like val = d.get(key, Object()), however you need to think about the performance implications. Since function arguments are evaluated before calling the function, this means the a new Object will be created regardless if it's in the dict or not. Let's see this affects performance.

get_default will create new Point every time and get_none will create only if there's no such object, it works since or evaluate it's arguments lazily and will stop once the first one is True.

First we'll try with a missing key:

In [1]: %run default_vs_none.py                                     
In [2]: locations = {}  # name -> Location 
In [3]: %timeit get_default(locations, 'carmen')
384 ns ± 2.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit get_none(locations, 'carmen')
394 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Not so much difference. However if the key exists:

In [5]: locations['carmen'] = Location(7, 3)
In [6]: %timeit get_default(locations, 'carmen')                 
377 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit get_none(locations, 'carmen')
135 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

We get much faster results.

Monday, March 04, 2019

CPU Affinity in Go

Go's concurrency unit is a goroutine, the Go runtime multiplexes goroutines to operating system (OS) threads. At an upper level, the OS maps threads to CPUs (or cores). To see this goroutine/thread migration, you'll need to use some C code (note that this is Linux specific).


If you run this code and sort the output, you'll see the workers moving between cores:
$ go run affinity.go | sort -u
worker: 0, CPU: 0
worker: 0, CPU: 1
worker: 0, CPU: 2
worker: 0, CPU: 3
worker: 1, CPU: 0
worker: 1, CPU: 1
worker: 1, CPU: 2
worker: 1, CPU: 3
worker: 2, CPU: 0
worker: 2, CPU: 1
worker: 2, CPU: 2
worker: 2, CPU: 3
worker: 3, CPU: 0
worker: 3, CPU: 1
worker: 3, CPU: 2
worker: 3, CPU: 3

There is a cost for moving a thread from one core to another (see more here). In some cases this cost might be unacceptable and you'd like to pin a goroutine to a specific CPU.

Go has runtime.LockOSThread which pins the current goroutine to the current thread it's running on. For the rest of the way - pinning the thread to a CPU, you'll need to use C again.

Now if you run our code with the LOCK environment variable set, you can see each worker is pinned to a specific core.
$ LOCK=1 go run affinity.go | sort -u
worker: 0, CPU: 0
worker: 1, CPU: 1
worker: 2, CPU: 2
worker: 3, CPU: 3

Blog Archive