PythonWise

If it won't be simple, it simply won't be. By Miki Tebeka, CEO, 353Solutions

Wednesday, March 20, 2019

Speed: Default value vs checking for None

Python's dict has a get method. It returns the existing value for a given key, or a default value if the key is not in the dict. It's very tempting to write code like val = d.get(key, Object()), but you need to think about the performance implications. Since function arguments are evaluated before the function is called, a new Object will be created whether or not the key is in the dict. Let's see how this affects performance.

get_default creates a new Location every time, while get_none creates one only if there's no such object. This works because or evaluates its operands lazily and stops at the first one that is truthy.
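The script default_vs_none.py is not reproduced here. A minimal sketch of what the two functions might look like follows; the Location class and its (0, 0) default are assumptions made for illustration, only the names get_default, get_none and Location come from the post.

```python
class Location:
    """Hypothetical stand-in for the Location class used in the session."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def get_default(d, key):
    # Location(0, 0) is built before d.get is called, whether or not key exists
    return d.get(key, Location(0, 0))


def get_none(d, key):
    # d.get returns None for a missing key; `or` evaluates its right-hand
    # side only when the left-hand side is falsy
    return d.get(key) or Location(0, 0)
```

Note that the or version also falls back to the default when the stored value is falsy (0, an empty string, an empty list). That is fine for plain objects like Location, which are truthy by default, but not for every dict.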

First we'll try with a missing key:

In [1]: %run default_vs_none.py                                     
In [2]: locations = {}  # name -> Location 
In [3]: %timeit get_default(locations, 'carmen')
384 ns ± 2.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit get_none(locations, 'carmen')
394 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Not much of a difference. However, if the key exists:

In [5]: locations['carmen'] = Location(7, 3)
In [6]: %timeit get_default(locations, 'carmen')                 
377 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit get_none(locations, 'carmen')
135 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Here get_none is roughly 2.8 times faster (135 ns vs 377 ns), since it skips creating an object that is never used.

Monday, March 04, 2019

CPU Affinity in Go

Go's concurrency unit is the goroutine. The Go runtime multiplexes goroutines onto operating system (OS) threads, and the OS in turn maps threads to CPUs (or cores). To see this goroutine/thread migration, you'll need to use some C code (note that this is Linux specific).


If you run this code and sort the output, you'll see the workers moving between cores:
$ go run affinity.go | sort -u
worker: 0, CPU: 0
worker: 0, CPU: 1
worker: 0, CPU: 2
worker: 0, CPU: 3
worker: 1, CPU: 0
worker: 1, CPU: 1
worker: 1, CPU: 2
worker: 1, CPU: 3
worker: 2, CPU: 0
worker: 2, CPU: 1
worker: 2, CPU: 2
worker: 2, CPU: 3
worker: 3, CPU: 0
worker: 3, CPU: 1
worker: 3, CPU: 2
worker: 3, CPU: 3

There is a cost for moving a thread from one core to another (see more here). In some cases this cost might be unacceptable and you'd like to pin a goroutine to a specific CPU.

Go has runtime.LockOSThread, which pins the current goroutine to the thread it's currently running on. For the rest of the way, pinning the thread to a CPU, you'll need to use C again.

Now if you run our code with the LOCK environment variable set, you can see each worker is pinned to a specific core.
$ LOCK=1 go run affinity.go | sort -u
worker: 0, CPU: 0
worker: 1, CPU: 1
worker: 2, CPU: 2
worker: 3, CPU: 3

Thursday, January 17, 2019

Place a string in the middle of the screen - in bash

For ages my shell has greeted me with "Let's boogie, Mr Swine". I had it printed at a fixed offset, but following a discussion in comp.lang.python I decided to get the screen size and calculate the exact location.

I'm using tput to get the screen size, and the fact that printf takes the field width as a parameter if you specify a * in the format (BTW, Python supports that as well).

Here's the code.
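The bash script itself is not reproduced here. Since Python supports the same * width specifier, the idea can be sketched in Python instead; this is an illustration rather than the original code, and center_line is a hypothetical helper name. It uses shutil.get_terminal_size in place of tput.

```python
import shutil


def center_line(msg, columns):
    """Return msg padded on the left so it appears centered in `columns`."""
    # Right-align msg in a field that ends half a message past the middle
    width = (columns + len(msg)) // 2
    # The * takes the field width from the argument tuple, like printf's %*s
    return '%*s' % (width, msg)


if __name__ == '__main__':
    # Falls back to 80x24 when the size cannot be determined
    size = shutil.get_terminal_size()
    print('\n' * (size.lines // 2), end='')  # move down to the middle row
    print(center_line("Let's boogie, Mr Swine", size.columns))
```

If the message is wider than the screen, the computed width is smaller than the message and %*s prints it unpadded.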

Sunday, January 13, 2019

353Solutions - 2018 in Review

Happy new year! Here’s a summary of 353Solutions’ 2018.

Numbers

  • Total of 242 calendar days where I worked for a customer (including partial days)
    • Down 11 days from 2017
    • Out of 261 work days in 2018
  • Median consulting work day is 3:36h
    • Workshops are usually a full day
  • Normalized work days (total hours divided by 8): 203.3
    • Up from 171.6 normalized days in 2017
  • Of these 53.5 days were in workshops and the rest in consulting
    • Up from 30 in 2017
  • 13 workshops
    • Up from 10 last year
    • 4 more video courses on LinkedIn Learning/Lynda (over 300K viewers already)
    • Teaching in Israel, the UK, the US and Germany
  • Several new clients, including MGT, Gett, Actiview and others
  • Revenue up by 20%
    • 40% from workshops

Insights

  • Personal social network keeps bringing all the work
  • Go is exploding
    • GopherCon Israel brought many connections
    • Much more Go in consulting
  • Python & Data Science in demand for workshops
    • As well as Go
  • We need to get better at marketing open classes
    • Might have an alliance for that

Last Year’s Goals

  • Work less while increasing revenue
    • Mixed results here. Worked more but revenue is up ☺
  • Publish my book
    • Done: https://forging-python.com
  • More workshops and less consulting
    • Done
  • Two open enrollment workshops
    • Nope ☹
  • Two free 1/2 day workshops
    • Nope ☹
  • Keep working from home
    • Done
  • Attend at least 3 conferences
    • Only PyCon Israel
  • Give at least 4 talks in meetups or conferences
    • Done (2 PyWeb IL, 1 Go Israel, 1 PyCon Israel, 1 Big Data)
  • Get better at marketing
    • Decided marketing is the open source work I’m doing.

Goals for 2019

  • GopherCon Israel (February 11, 2019)
  • Work less while increasing revenue
  • Attend at least 3 conferences
  • Give at least 4 talks in meetups or conferences
  • Two free 1/2 day workshops
  • Start another book

Tuesday, November 13, 2018

direnv

I use the command line a lot. Some projects require different settings, such as a Python virtual environment, GOPATH for installing Go packages, and more.

I'm using direnv to help with per-project settings in the terminal. For every project I have a .envrc file which specifies the required settings. This file is automatically loaded once I change directory to the project directory or any of its subdirectories.

You'll need the following in your .zshrc

if whence direnv > /dev/null; then
    eval "$(direnv hook zsh)"
fi

Every time you create or change your .envrc, you'll need to run direnv allow to validate it and make sure it's loaded. (If you made some changes and want to check them, run "cd .")

Here are some .envrc examples for various scenarios:

Python + pipenv

source $(pipenv --venv)/bin/activate

Go

export GOPATH=$(pwd | sed 's#/src/.*##')
export PATH=${GOPATH}/bin:${PATH}

This assumes your project's path looks like /path/to/project/src/github.com/project

If you're using the new Go modules (in 1.11+), you probably don't need this.

Python + virtualenv

source venv/bin/activate

Python + conda

source activate env-name

Replace env-name with the name of your conda environment.

Wednesday, November 07, 2018

Go, protobuf & JSON

Sometimes you'd like more than one way to serve an API. In my case I'm currently working on serving both gRPC and HTTP. I'd like to have one place where objects are defined and a nice way to serialize them both to protobuf (which is the serialization gRPC uses) and to JSON.

When producing Go code, protobuf adds JSON struct tags. However since JSON comes from dynamic languages, fields can have any type. In Go we can use map[string]interface{} but in protobuf this is a bit more complicated and we need to use oneof. The struct generated by oneof does not look like regular JSON and will make users of the API write complicated JSON structures.

What's nice about Go is that we can have any type implement json.Marshaler and json.Unmarshaler. What's extra nice is that in Go, you can add these methods to the generated structs in another file (in Python, we'd have to change the generated source code since methods need to be inside the class definition).

Let's have a look at a simple Job definition


And now we can add some helper methods to aid with JSON serialization (protoc generates code into the pb directory)


As a bonus, we added job.Properties that returns a "native" map[string]interface{}

Let's look at a simple example on how we can use it

And its output:
$ go run job.go
[j1]  user:"Saitama" count:1 properties: > properties:
[json]  {"user":"Saitama","count":1,"properties":{"retries":3,"target":"Metal Knight"}}

[j2]  user:"Saitama" count:1 properties: > properties:

Thursday, July 26, 2018

Specifying test cases for pytest using TOML

Say you have a function that converts text and you'd like to test it. You can write a directory with input and output files and use pytest.mark.parametrize to iterate over the cases. The problem is that the input and the output are in different files, so it's hard to compare them side by side.

If the text for testing is not that long, you can place all the cases in a configuration file. In this example I'll be using the TOML format to hold the cases, and each case will be a table in an array of tables. You can probably do the same with multi-document YAML.

Here are the test cases
And here's the testing code (mask removes passwords from the text)
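The original case file and test module are not reproduced here. As a rough sketch of the approach, the block below embeds the TOML in a string so it is self-contained; the case texts, the field names (name, text, expected) and this mask implementation are assumptions made for illustration. Only the case names match the pytest output. It uses tomllib from the standard library (Python 3.11+) and falls back to the third-party tomli package on older versions.

```python
import re

import pytest

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Hypothetical cases; in a real project these would live in their own file
CASES_TOML = """
[[case]]
name = "passwd"
text = "user=joe passwd=s3cr3t"
expected = "user=joe passwd=***"

[[case]]
name = "password"
text = "password=hunter2 host=db"
expected = "password=*** host=db"

[[case]]
name = "no change"
text = "nothing secret here"
expected = "nothing secret here"
"""

# [[case]] defines an array of tables, parsed as a list of dicts
CASES = tomllib.loads(CASES_TOML)['case']


def mask(text):
    """Illustrative mask: replace the value after passwd= or password=."""
    return re.sub(r'(passw(?:or)?d=)\S+', r'\1***', text)


# ids gives each case a readable name, such as test_mask[passwd]
@pytest.mark.parametrize('case', CASES, ids=[c['name'] for c in CASES])
def test_mask(case):
    assert mask(case['text']) == case['expected']
```

Keeping the input and the expected output in the same table puts them next to each other, which is the point of the exercise.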

When running pytest, you'll see the following:

$ python -m pytest -v
========================================= test session starts =========================================
platform linux -- Python 3.7.0, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/miki/.local/share/virtualenvs/pytest-cmp-F3l45TQF/bin/python
cachedir: .pytest_cache
rootdir: /home/miki/Projects/pythonwise/pytest-cmp, inifile:
collected 3 items                                                                                     

test_mask.py::test_mask[passwd] PASSED                                                          [ 33%]
test_mask.py::test_mask[password] PASSED                                                        [ 66%]
test_mask.py::test_mask[no change] PASSED                                                       [100%]

====================================== 3 passed in 0.01 seconds =======================================
