If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Wednesday, March 20, 2019

Speed: Default value vs checking for None

Python's dict has a get method. It'll either return an existing value for a given key or return a default value if the key is not in the dict. It's very tempting to write code like val = d.get(key, Object()), however you need to think about the performance implications. Since function arguments are evaluated before calling the function, a new Object will be created regardless of whether the key is in the dict or not. Let's see how this affects performance.

get_default will create a new object every time, while get_none will create one only if there's no such object. This works since or evaluates its arguments lazily and stops once it reaches the first one that is truthy.
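A minimal sketch of the two functions, assuming a hypothetical Location class with a counter added to make the extra construction visible (the counter and the (0, 0) default are my assumptions, they're only there for illustration):

```python
class Location:
    """Hypothetical value type; `created` counts constructor calls."""
    created = 0

    def __init__(self, x, y):
        Location.created += 1
        self.x, self.y = x, y


def get_default(d, key):
    # Location(0, 0) is built before d.get is even called
    return d.get(key, Location(0, 0))


def get_none(d, key):
    # `or` evaluates its right side only when d.get returned a falsy value
    return d.get(key) or Location(0, 0)


locations = {'carmen': Location(7, 3)}
Location.created = 0  # reset after building the stored value

get_default(locations, 'carmen')
after_default = Location.created  # one wasted Location was created

get_none(locations, 'carmen')
after_none = Location.created  # unchanged, nothing new was created
```

Note that the or trick also replaces stored values that are falsy (0, an empty string, an empty list), so it fits best when the values are always truthy objects.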

First we'll try with a missing key:

In [1]: %run default_vs_none.py                                     
In [2]: locations = {}  # name -> Location 
In [3]: %timeit get_default(locations, 'carmen')
384 ns ± 2.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit get_none(locations, 'carmen')
394 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Not much of a difference. However, if the key exists:

In [5]: locations['carmen'] = Location(7, 3)
In [6]: %timeit get_default(locations, 'carmen')                 
377 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit get_none(locations, 'carmen')
135 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

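We get much faster results: get_none is almost three times faster (135 ns vs 377 ns) since it skips creating the default object.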

Monday, March 04, 2019

CPU Affinity in Go

Go's concurrency unit is a goroutine. The Go runtime multiplexes goroutines onto operating system (OS) threads and, in turn, the OS maps threads to CPUs (or cores). To see this goroutine/thread migration, you'll need to use some C code (note that this is Linux specific).


If you run this code and sort the output, you'll see the workers moving between cores:
$ go run affinity.go | sort -u
worker: 0, CPU: 0
worker: 0, CPU: 1
worker: 0, CPU: 2
worker: 0, CPU: 3
worker: 1, CPU: 0
worker: 1, CPU: 1
worker: 1, CPU: 2
worker: 1, CPU: 3
worker: 2, CPU: 0
worker: 2, CPU: 1
worker: 2, CPU: 2
worker: 2, CPU: 3
worker: 3, CPU: 0
worker: 3, CPU: 1
worker: 3, CPU: 2
worker: 3, CPU: 3

There is a cost for moving a thread from one core to another (see more here). In some cases this cost might be unacceptable and you'd like to pin a goroutine to a specific CPU.

Go has runtime.LockOSThread, which pins the current goroutine to the thread it's currently running on. For the rest of the way, pinning the thread to a CPU, you'll need to use C again.

Now if you run our code with the LOCK environment variable set, you can see each worker is pinned to a specific core.
$ LOCK=1 go run affinity.go | sort -u
worker: 0, CPU: 0
worker: 1, CPU: 1
worker: 2, CPU: 2
worker: 3, CPU: 3

Thursday, January 17, 2019

Place a string in the middle of the screen - in bash

For ages my shell has been greeting me with "Let's boogie, Mr Swine". I had it printed at a fixed offset, but following a discussion in comp.lang.python I decided to get the screen size and calculate the exact location.

I'm using tput to get the size, and the fact that printf can take the width of the string as a parameter if you specify a * there (BTW - Python supports that as well).

Here's the code.
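For comparison, the Python support for * mentioned above looks roughly like this. It's a sketch and not the bash script; center_line is a name I made up, and shutil.get_terminal_size stands in for tput:

```python
import shutil


def center_line(text, columns):
    """Right-align text in a field that ends at the middle of the screen
    plus half the text length, which centers it."""
    width = (columns + len(text)) // 2
    # '%*s' takes the field width from the argument list, like printf's *
    return '%*s' % (width, text)


size = shutil.get_terminal_size()  # falls back to 80x24 without a terminal
print('\n' * (size.lines // 2), end='')  # move down to the middle row
print(center_line("Let's boogie, Mr Swine", size.columns))
```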

Sunday, January 13, 2019

353Solutions - 2018 in Review

Happy new year! Here’s a summary of 353Solutions’ 2018.

Numbers

  • Total of 242 calendar days where I worked for a customer (including partial days)
    • Down 11 days from 2017
    • Out of 261 work days in 2018
  • Median consulting work day is 3:36h
    • Workshops are usually a full day
  • Normalized work days (total divided by 8) is 203.3
    • Up from 171.6 normalized days in 2017
  • Of these 53.5 days were in workshops and the rest in consulting
    • Up from 30 in 2017
  • 13 workshops
    • Up from 10 last year
    • 4 more video courses on LinkedIn Learning/Lynda (over 300K viewers already)
    • Teaching in Israel, the UK, the US and Germany
  • Several new clients including MGT, Gett, Actiview and others
  • Revenue up by 20%
    • 40% from workshops

Insights

  • Personal social network keeps bringing all the work
  • Go is exploding
    • GopherCon Israel brought many connections
    • Much more Go in consulting
  • Python & Data Science in demand for workshops
    • As well as Go
  • We need to get better at marketing open classes
    • Might have an alliance for that

Last Year’s Goals

  • Work less while increasing revenue
    • Mixed results here. Worked more but revenue is up ☺
  • Publish my book
    • Done: https://forging-python.com
  • More workshops and less consulting
    • Done
  • Two open enrollment workshops
    • Nope ☹
  • Two free 1/2 day workshops
    • Nope ☹
  • Keep working from home
    • Done
  • Attend at least 3 conferences
    • Only PyCon Israel
  • Give at least 4 talks in meetups or conferences
    • Done (2 PyWeb IL, 1 Go Israel, 1 PyCon Israel, 1 Big Data)
  • Get better at marketing
    • Decided marketing is the open source work I’m doing.

Goals for 2019

  • GopherCon Israel (February 11, 2019)
  • Work less while increasing revenue
  • Attend at least 3 conferences
  • Give at least 4 talks in meetups or conferences
  • Two free 1/2 day workshops
  • Start another book

Tuesday, November 13, 2018

direnv

I use the command line a lot. Some projects require different settings, say a Python virtual environment, GOPATH for installing Go packages and more.

I'm using direnv to help with settings per project in the terminal. For every project I have a .envrc file which specifies the required settings. This file is automatically loaded once I change directory to the project directory or any of its subdirectories.

You'll need the following in your .zshrc:

if whence direnv > /dev/null; then
    eval "$(direnv hook zsh)"
fi

Every time you create or change your .envrc, you'll need to run direnv allow to validate it and make sure it's loaded. (If you made some changes and want to check them, run "cd .")

Here are some .envrc examples for various scenarios:

Python + pipenv

source $(pipenv --venv)/bin/activate

Go

export GOPATH=$(pwd | sed s"#/src/.*##")
export PATH=${GOPATH}/bin:${PATH}

This assumes your project's path looks like /path/to/project/src/github.com/project

If you're using the new Go modules (in 1.11+), you probably don't need this.

Python + virtualenv

source venv/bin/activate

Python + conda

source activate env-name

Replace env-name with the name of your conda environment.

Wednesday, November 07, 2018

Go, protobuf & JSON

Sometimes you'd like more than one way to serve an API. In my case I'm currently working on serving both gRPC and HTTP. I'd like to have one place where objects are defined and have a nice way to serialize them to and from both protobuf (which is the serialization gRPC uses) and JSON.

When producing Go code, protobuf adds JSON struct tags. However, since JSON comes from dynamic languages, fields can have any type. In Go we can use map[string]interface{}, but in protobuf this is a bit more complicated and we need to use oneof. The struct generated by oneof does not look like regular JSON and will make users of the API write complicated JSON structures.

What's nice about Go is that we can have any type implement json.Marshaler and json.Unmarshaler. What's extra nice is that in Go, you can add these methods to the generated structs in another file (in Python, we'd have to change the generated source code since methods need to be inside the class definition).

Let's have a look at a simple Job definition:


And now we can add some helper methods to aid with JSON serialization (protoc generates code into the pb directory):


As a bonus, we added job.Properties that returns a "native" map[string]interface{}.

Let's look at a simple example of how we can use it:

And its output:
$ go run job.go
[j1]  user:"Saitama" count:1 properties: > properties:
[json]  {"user":"Saitama","count":1,"properties":{"retries":3,"target":"Metal Knight"}}

[j2]  user:"Saitama" count:1 properties: > properties:

Thursday, July 26, 2018

Specifying test cases for pytest using TOML

Say you have a function that converts text and you'd like to test it. You can create a directory with input and output files and use pytest.mark.parametrize to iterate over the cases. The problem is that the input and the output are in different files and it's hard to see them next to each other.

If the text for testing is not that long, you can place all the cases in a configuration file. In this example I'll be using the TOML format to hold the cases, and each case will be a table in an array of tables. You can probably do the same with multi-document YAML.

Here are the test cases:
And here's the testing code (mask removes passwords from the text):

When running pytest, you'll see the following:

$ python -m pytest -v
========================================= test session starts =========================================
platform linux -- Python 3.7.0, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/miki/.local/share/virtualenvs/pytest-cmp-F3l45TQF/bin/python
cachedir: .pytest_cache
rootdir: /home/miki/Projects/pythonwise/pytest-cmp, inifile:
collected 3 items                                                                                     

test_mask.py::test_mask[passwd] PASSED                                                          [ 33%]
test_mask.py::test_mask[password] PASSED                                                        [ 66%]
test_mask.py::test_mask[no change] PASSED                                                       [100%]

====================================== 3 passed in 0.01 seconds =======================================

Saturday, June 09, 2018

pexify - Package Python scripts using PEX

Sometimes you'd like to publish a simple Python script, but it depends on some external packages (e.g. requests). The usual solution is to create a package, which is a great solution but requires some work and might not be suitable for internal packages.

Another solution is to use PEX, which creates an executable virtual environment. The user running the script just needs a Python interpreter of the same version installed on their machine (which is usually the case).

Converting a script to PEX requires some work. To make this simpler I wrote pexify, which automates the process of creating a PEX from a single Python script.

Wednesday, April 18, 2018

Creating a Book with pandoc

One of my clients asked me for a printable book to accompany one of my workshops. My teaching style is more fluid and we write a lot of code in class. I wanted a quick way to take all the source code files and some images and make a book out of them.

Since I already work with markdown, I looked for a solution that can convert markdown to PDF. The winner was Pandoc (which can be installed with conda). Of course the out-of-the-box results weren't satisfactory and some tweaking was required, mostly to make images appear where they are defined in the markdown and not where pandoc/LaTeX thinks is the optimal location.

The solution is composed of an awk script to add an include directive to markdown, a custom LaTeX header to inline images and a Makefile to bind them all.

You can view the whole project here, including an example book. You can view the output here (decorators anyone?).

Tuesday, April 10, 2018

Installing Arch Linux on Laptop with UEFI

After some time with Xubuntu I decided to get back to Arch Linux.

Arch has a command-line-based installer. The installation instructions are pretty good, but for a laptop with UEFI I had to do some extra steps. Here's what I came up with, hope you'll find it useful as well.

Most of my files are backed up on "the cloud", and my home directory with all the RC files is in a private git repository. Getting up and running after the initial install was pretty easy.

Tuesday, February 27, 2018

Python's iter & functools.partial

Python's built-in iter function is mostly used to extract an iterator from an iterable (if you're confused by this, see Raymond's excellent answer on StackOverflow).

iter has a second form where it takes a function with no arguments and a sentinel value. This doesn't seem that useful, but check out the readline example in the documentation. iter will call the function repeatedly until the function returns the sentinel value and then will stop.

What happens when the function producing the values requires some arguments? In this case we can use functools.partial to create a zero-argument function.

Here's an example of a simple HTTP client using raw sockets. We can directly use join on the iterator.
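The pattern itself can be sketched without a network connection. This is not the HTTP client; it uses io.BytesIO as a stand-in for the socket, and read_chunks is a name I made up. With a real socket the callable would be partial(sock.recv, size), which also returns b'' once the peer closes the connection:

```python
import io
from functools import partial


def read_chunks(stream, size):
    """Iterate over size-byte chunks until the stream is exhausted."""
    # partial(stream.read, size) is a zero-argument callable;
    # iter calls it until it returns the sentinel b''
    return iter(partial(stream.read, size), b'')


stream = io.BytesIO(b'HTTP/1.1 200 OK\r\n\r\nhello')
# join consumes the iterator directly, no intermediate list needed
data = b''.join(read_chunks(stream, 4))
```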

Sunday, February 11, 2018

353Solutions - 2017 in Review

A little late, but here's a summary of 2017.

Numbers

  • Total of 255 calendar days where I worked for a customer (including partial days)
    • Up 62 days from 2016
    • Out of 260 work days in 2017
  • Median work day is 6:53h
  • Normalized work days (total divided by 8) is 171.6
    • Up from 158.1 normalized days in 2016
  • Of these 30 days were in workshops and the rest in consulting
  • 10 workshops
    • Down from 18 last year
    • First video course on Lynda (over 100K viewers already)
    • Teaching in Israel, UK, US and Poland
  • Several new clients including PayPal, CommonSense Robotics, Iguazio and others
  • Revenue down by 5%
    • But earnings are up :)
  • First newsletter went out

Insights

  • Personal social network keeps bringing all the work
  • Very big workshops customer cut down a lot - picked up the difference with more consulting, which is less lucrative
  • Python & Data Science in demand for workshops
  • Much more Go in consulting
  • Free 1/2 day open class did not bring any work
    • However will do some more - it was fun
  • We need to get better at marketing open classes

Last Year's Goals

  • Work less while keeping same revenue
    • Failed here. Worked more to keep about the same revenue
  • Work more from home
    • Success here
  • Publish my book
    • I can't believe this is not done yet.
  • Publish a video course
    • Done (with two more to come)
  • More open enrollment classes
    • Tried that but not enough enrollment, need to get better at marketing

Goals for 2018

  • Work less while increasing revenue
  • Publish my book
    • Learned from last year and reserved several days this quarter to finish it
  • More workshops and less consulting
    • Two open enrollment workshops
    • Two free 1/2 day workshops
  • Keep working from home
    • Attend at least 3 conferences
  • Give at least 4 talks in meetups or conferences
    • First talk was Jan 1 on PyWeb-IL, have another one slated for March
  • Get better at marketing

Friday, December 08, 2017

Advent of Code 2017 #8 and Python's eval

I'm having fun solving Advent of Code 2017. Problem 8 reminded me of the power of Python's eval (and before you start commenting that "eval is evil", may I remind you of this :)

You can check the Go implementation, which doesn't have eval and needs to work harder.
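To give a feel for how eval helps, here's a rough sketch and not the solution linked above. It assumes instructions shaped like "b inc 5 if a > 1", with registers starting at 0; run and the sample program are my own illustration. Building the condition as a string and handing it to eval saves writing a dispatch table for every comparison operator (only do this with input you trust):

```python
from collections import defaultdict

# Assumed instruction format: <reg> inc|dec <amount> if <reg> <op> <value>
PROGRAM = """\
b inc 5 if a > 1
a inc 1 if b < 5
c dec -10 if a >= 1
c inc -20 if c == 10"""


def run(program):
    regs = defaultdict(int)  # registers start at 0
    for line in program.splitlines():
        name, op, amount, _if, cond_reg, cmp_op, cond_val = line.split()
        # e.g. "0 > 1"; eval handles >, <, >=, <=, == and != for us
        condition = f'{regs[cond_reg]} {cmp_op} {cond_val}'
        if eval(condition):
            delta = int(amount)
            regs[name] += delta if op == 'inc' else -delta
    return regs


regs = run(PROGRAM)
largest = max(regs.values())
```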

Friday, November 24, 2017

Python → Go Cheat Sheet


I'm teaching and consulting in Go a lot lately, and I work a lot with Python as well. There's a big trend of rewriting backend services in Go, and to help people coming from the Python world I've created a Python → Go cheat sheet.

The code is here. I'd love to hear if you have suggestions (via a PR ;)

Thursday, September 14, 2017

Checking for Zero Values in Go

In Go, every type has a zero value, which is the value a variable of this type gets if it's not initialized. I had a configuration object of type map[string]interface{} and I needed to check if a value exists and is not a zero value.

Here's a small piece of code that checks for zero values:

Saturday, July 15, 2017

Generating Power Set using Bitmap

I was asked to write a function that generates the power set of a collection of items. At first I wrote a recursive algorithm, but then another approach came to mind. When you calculate how many subsets there are, you can say that each item in the original set can either be or not be in a subset, which means 2^n subsets. This yes/no for including can be seen as a bitmask, and since we know that there are 2^n subsets we can use the numbers from 0 to 2^n-1 as bitmasks.
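The bitmask idea can be sketched in a few lines of Python (power_set is my name for it, and returning lists instead of sets is a choice made to keep the order predictable):

```python
def power_set(items):
    """Yield every subset of items, one per bitmask from 0 to 2^n-1."""
    items = list(items)
    n = len(items)
    for mask in range(1 << n):  # 1 << n == 2^n
        # bit i of mask says whether items[i] is in this subset
        yield [item for i, item in enumerate(items) if mask & (1 << i)]


subsets = list(power_set('abc'))
```

Mask 0 gives the empty subset and mask 2^n-1 (all bits set) gives the full set.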

Monday, June 19, 2017

Who Touched the Code Last? (git)

Sometimes I'd like to know who to ask about a piece of code. I've developed a little Python script that shows the last people who touched a file/directory and the ones who touched it most.

Example output (on the arrow project):
$ owners
Most: Wes McKinney (39.5%), Uwe L. Korn (15.3%), Kouhei Sutou (10.8%)
Last: Kengo Seki (31M), Uwe L. Korn (39M), Max Risuhin (23H)

So ask Wes or Uwe :)

Here's the code:

Monday, June 12, 2017

Go's append vs copy

When we'd like to concatenate slices in Go, most people usually reach for append. Most of the time this solution is OK, however if you need to squeeze out more performance, copy will be better.

EDIT: As Henrik Johansson suggested, if you pre-allocate the slice you append to, it'll be fast as well.

Sunday, May 14, 2017

scikit-learn Compatible Pipeline Steps

A client wanted a way to create a pipeline of transformations on DataFrames. Since they already work with scikit-learn, they were familiar with Pipelines. It took very little code to create a base class for a pipeline step that will work with DataFrames.


If you find this interesting, you might want to hire me ;)
