If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Thursday, July 26, 2018

Specifying test cases for pytest using TOML

Say you have a function that converts text and you'd like to test it. You can create a directory of input and output files and use pytest.mark.parametrize to iterate over the cases. The problem is that the input and the expected output live in different files, so it's hard to see them side by side.

If the text for testing is not that long, you can place all the cases in a configuration file. In this example I'll use the TOML format to hold the cases; each case is a table in an array of tables. You can probably do the same with multi-document YAML.

Here are the test cases:
And here's the testing code (mask removes passwords from the text):

When running pytest, you'll see the following:

$ python -m pytest -v
========================================= test session starts =========================================
platform linux -- Python 3.7.0, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/miki/.local/share/virtualenvs/pytest-cmp-F3l45TQF/bin/python
cachedir: .pytest_cache
rootdir: /home/miki/Projects/pythonwise/pytest-cmp, inifile:
collected 3 items                                                                                     

test_mask.py::test_mask[passwd] PASSED                                                          [ 33%]
test_mask.py::test_mask[password] PASSED                                                        [ 66%]
test_mask.py::test_mask[no change] PASSED                                                       [100%]

====================================== 3 passed in 0.01 seconds =======================================

Saturday, June 09, 2018

pexify - Package Python scripts using PEX

Sometimes you'd like to publish a simple Python script, but it depends on some external packages (e.g. requests). The usual solution is to create a package, which is a great solution but requires some work and might not be suitable for internal packages.

Another solution is to use PEX, which creates an executable virtual environment. The user running the script just needs a Python interpreter of the same version installed on their machine (which is usually the case).

Converting a script to PEX requires some work. To make this simpler I wrote pexify, which automates the process of creating a PEX from a single Python script.

Wednesday, April 18, 2018

Creating a Book with pandoc

One of my clients asked me for a printable book to accompany one of my workshops. My teaching style is more fluid and we write a lot of code in class. I wanted a quick way to take all the source code files and some images and make a book out of them.

Since I already work with markdown, I looked for a solution that can convert markdown to PDF. The winner was Pandoc (which can be installed with conda). Of course the out-of-the-box results weren't satisfactory and some tweaking was required, mostly to make images appear where they are defined in the markdown and not where pandoc/LaTeX thinks is the optimal location.

The solution is composed of an awk script to add an include directive to markdown, a custom LaTeX header to inline images and a Makefile to bind them all.

You can view the whole project here, including an example book. You can view the output here (decorators anyone?).

Tuesday, April 10, 2018

Installing Arch Linux on Laptop with UEFI

After some time with Xubuntu I decided to get back to Arch Linux.

Arch has a command-line based installer. The installation instructions are pretty good, but for a laptop with UEFI I had to do some extra steps. Here's what I came up with; I hope you'll find it useful as well.

Most of my files are backed up in "the cloud", and my home directory with all the RC files is in a private git repository. Getting up and running after the initial install was pretty easy.

Tuesday, February 27, 2018

Python's iter & functools.partial

Python's built-in iter function is mostly used to extract an iterator from an iterable (if you're confused by this, see Raymond's excellent answer on StackOverflow).

iter has a second form where it takes a function with no arguments and a sentinel value. This doesn't seem that useful, but check out the readline example in the documentation. iter will call the function repeatedly until the function returns the sentinel value, and then it will stop.

What happens when the function producing the values requires some arguments? In this case we can use functools.partial to create a zero-argument function.

Here's an example of a simple HTTP client using raw sockets. We can use join directly on the iterator.
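
A minimal sketch of the idea, assuming a local socket pair in place of a real HTTP server so that it runs without a network. The read_all helper and the canned response are made up for illustration.

```python
import socket
from functools import partial


def read_all(sock, chunk_size=1024):
    """Read from sock until the peer closes the connection."""
    # sock.recv needs a size argument, so partial turns it into a
    # zero-argument callable. recv returns b'' once the peer has closed,
    # which makes b'' the sentinel.
    return b''.join(iter(partial(sock.recv, chunk_size), b''))


# A connected socket pair stands in for the client/server connection.
client, server = socket.socketpair()
server.sendall(b'HTTP/1.1 200 OK\r\n\r\nhello')
server.close()  # closing the peer makes recv return b'' on the client side

reply = read_all(client)
client.close()
print(reply)
```

With a real server you would connect, send the request and then call the same join over iter on the connected socket.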

Sunday, February 11, 2018

353Solutions - 2017 in Review

A little late, but here's a summary of 2017.

Numbers

  • Total of 255 calendar days where I worked for a customer (including partial days)
    • Up 62 days from 2016
    • Out of 260 work days in 2017
  • Median work day is 6:53h
  • Normalized work days (total divided by 8) is 171.6
    • Up from 158.1 normalized days in 2016
  • Of these 30 days were in workshops and the rest in consulting
  • 10 workshops
    • Down from 18 last year
    • First video course on Lynda (over 100K viewers already)
    • Teaching in Israel, UK, US and Poland
  • Several new clients including PayPal, CommonSense Robotics, Iguazio and others
  • Revenue down by 5%
    • But earnings are up :)
  • First newsletter went out

Insights

  • Personal social network keeps bringing all the work
  • A very big workshops customer cut down a lot - picked up the difference with more consulting, which is less lucrative
  • Python & Data Science in demand for workshops
  • Much more Go in consulting
  • Free 1/2 day open class did not bring any work
    • However will do some more - it was fun
  • We need to get better at marketing open classes

Last Year's Goals

  • Work less while keeping same revenue
    • Failed here. Worked more to keep about the same revenue
  • Work more from home
    • Success here
  • Publish my book
    • I can't believe this is not done yet.
  • Publish a video course
    • Done (with two more to come)
  • More open enrollment classes
    • Tried that but not enough enrollment, need to get better at marketing

Goals for 2018

  • Work less while increasing revenue
  • Publish my book
    • Learned from last year and reserved several days this quarter to finish it
  • More workshops and less consulting
    • Two open enrollment workshops
    • Two free 1/2 day workshops
  • Keep working from home
    • Attend at least 3 conferences
  • Give at least 4 talks in meetups or conferences
    • First talk was Jan 1 at PyWeb-IL, have another one slated for March
  • Get better at marketing

Friday, December 08, 2017

Advent of Code 2017 #8 and Python's eval

I'm having fun solving Advent of Code 2017. Problem 8 reminded me of the power of Python's eval (and before you start commenting that "eval is evil", may I remind you of this :)

You can check the Go implementation, which doesn't have eval and needs to work harder.
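
To give a feel for it, here is a rough sketch (not the solution linked above) that lets eval handle the comparison in each instruction. It assumes the puzzle's instruction format, `reg inc|dec amount if reg op value`, and trusted input; the run function and the sample program are for illustration only.

```python
from collections import defaultdict

PROGRAM = '''\
b inc 5 if a > 1
a inc 1 if b < 5
c dec -10 if a >= 1
c inc -20 if c == 10'''


def run(program):
    """Execute the instructions and return the final register values."""
    regs = defaultdict(int)  # registers start at 0
    for line in program.splitlines():
        reg, op, amount, _, cond_reg, cmp_op, cmp_val = line.split()
        # Build an expression such as "0 > 1" and let eval do the comparison.
        # Builtins are disabled, but this is still meant for trusted input only.
        expr = f'{regs[cond_reg]} {cmp_op} {int(cmp_val)}'
        if eval(expr, {'__builtins__': {}}):
            regs[reg] += int(amount) if op == 'inc' else -int(amount)
    return regs


regs = run(PROGRAM)
print(max(regs.values()))  # → 1
```

Without eval you need a table that maps each comparison operator to a function, which is the extra work a language without eval has to do.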

Friday, November 24, 2017

Python → Go Cheat Sheet


I'm teaching and consulting in Go a lot lately, and I work a lot with Python as well. There's a big trend of rewriting backend services in Go, and to help people coming from the Python world I've created a Python → Go cheat sheet.

The code is here; I'd love to hear if you have suggestions (via a PR ;)

Thursday, September 14, 2017

Checking for Zero Values in Go

In Go, every type has a zero value, which is the value a variable of this type gets if it's not initialized. I had a configuration object of type map[string]interface{} and I needed to check whether a value exists and is not a zero value.

Here's a small piece of code that checks for zero values:

Saturday, July 15, 2017

Generating Power Set using Bitmap

I was asked to write a function that generates the power set of a collection of items. At first I wrote a recursive algorithm, but then another approach came to mind. When you calculate how many subsets there are, you can say that each item in the original set can either be or not be in a subset, which means 2^n subsets. This yes/no choice per item can be seen as a bitmask, and since we know that there are 2^n subsets we can use the numbers from 0 to 2^n-1 as bitmasks.
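
A minimal sketch of the bitmask approach. The name power_set and the choice to yield lists are mine, for illustration.

```python
def power_set(items):
    """Yield every subset of items, one per bitmask from 0 to 2^n - 1."""
    items = list(items)
    n = len(items)
    for mask in range(1 << n):  # 1 << n == 2^n
        # Bit i of mask decides whether items[i] is in this subset.
        yield [item for i, item in enumerate(items) if mask & (1 << i)]


for subset in power_set('abc'):
    print(subset)
# []
# ['a']
# ['b']
# ['a', 'b']
# ['c']
# ['a', 'c']
# ['b', 'c']
# ['a', 'b', 'c']
```

Since it is a generator, the 2^n subsets are produced one at a time and never held in memory together.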

Monday, June 19, 2017

Who Touched the Code Last? (git)

Sometimes I'd like to know who to ask about a piece of code. I've developed a little Python script that shows the last people who touched a file/directory and the ones who touched it most.

Example output (on arrow project)
$ owners
Most: Wes McKinney (39.5%), Uwe L. Korn (15.3%), Kouhei Sutou (10.8%)
Last: Kengo Seki (31M), Uwe L. Korn (39M), Max Risuhin (23H)

So ask Wes or Uwe :)

Here's the code:
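
A simplified sketch of the idea rather than the full script: it counts commits per author from `git log --format=%an` and leaves out the "how long ago" part of the output above. The function names and the sample data are assumptions for illustration.

```python
import subprocess
from collections import Counter


def git_authors(path='.'):
    """Author names of the commits touching path, newest first."""
    out = subprocess.run(
        ['git', 'log', '--format=%an', '--', path],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def most(authors, n=3):
    """Top n authors as (name, percent of commits)."""
    total = len(authors)
    return [(name, 100 * count / total)
            for name, count in Counter(authors).most_common(n)]


def last(authors, n=3):
    """First n distinct authors in log order, i.e. the most recent ones."""
    return list(dict.fromkeys(authors))[:n]


# Sample data instead of a real repository; use git_authors('some/path') for real.
sample = ['ann', 'bob', 'ann', 'cy', 'ann']
print(most(sample, 2))  # [('ann', 60.0), ('bob', 20.0)]
print(last(sample, 2))  # ['ann', 'bob']
```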

Monday, June 12, 2017

Go's append vs copy

When we'd like to concatenate slices in Go, most people reach for append. Most of the time this solution is OK; however, if you need to squeeze out more performance, copy will be better.

EDIT: As Henrik Johansson suggested, if you pre-allocate the slice you append to, it'll be fast as well.

Sunday, May 14, 2017

scikit-learn Compatible Pipeline Steps

A client wanted a way to create a pipeline of transformations on DataFrames. Since they already work with scikit-learn, they were familiar with Pipelines. It took very little code to create a base class for a pipeline step that will work with DataFrames.


If you find this interesting, you might want to hire me ;)

Wednesday, March 29, 2017

Color Log Lines

There are several log handlers out there that color log lines. However, I prefer to leave the log as is, and when I want colors I pipe the log through a utility that does that. Coloring log lines by level can easily be done with a little awk.
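
The same idea can be sketched in Python instead of awk. This is a swapped-in illustration, not the script from this post; the level names and the color choices are assumptions.

```python
import re

RESET = '\033[0m'
# ANSI escape codes: red, yellow, green.
COLORS = {
    'ERROR': '\033[31m',
    'WARNING': '\033[33m',
    'INFO': '\033[32m',
}
LEVEL_RE = re.compile(r'\b(' + '|'.join(COLORS) + r')\b')


def colorize(line):
    """Wrap the line in the color of the first log level found in it."""
    match = LEVEL_RE.search(line)
    if match is None:
        return line  # no known level, leave the line as is
    return f'{COLORS[match.group(1)]}{line}{RESET}'


for line in ['2017-03-29 INFO started', '2017-03-29 ERROR disk full', 'no level']:
    print(colorize(line))
```

To use it as a filter, loop over sys.stdin instead of the sample list and pipe the log into the script.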

Here's an example of how it looks:

Monday, January 09, 2017

353Solutions - 2016 in Review

Happy new year!

Let's see how 353solutions did in 2016. We'll start with the numbers and then some insights and future goals.

Numbers

  • Total of 193 work days (days where customers were billed)
    • Up 23 days from 2015
    • Out of 261 work days in 2016
  • Out of these 61 days are in workshops and the rest consulting
  • 18 workshops
    • Up from 14 in 2015
    • Go workshops
    • First docker workshop (teaching with ops specialist)
    • First open enrollment class
    • In Israel, UK, and Poland
      • Only 4 out of the country (vs 8 last year, which is good - I prefer to fly less)
    • Workshops range from 1 to 4 days
  • Revenue up 10% from 2015
  • Several new clients including GE, AOL, Outbrain, EMC and more

Insights

  • Social network keeps providing almost all the work
    • Doing good work will get you more work
  • Workshops pay way more than consulting
    • However can't work from home in workshops
    • Consulting keeps you updated with latest tech
  • Python and data science are big and in high demand
    • However Go is getting traction
  • Having someone else take care of the "overhead" (billing, contracts ...) is great help.

Last Year's Goals

We set some goals in 2015; let's see how we did:
  • Keep positioning in Python and Scientific Python area
    • Done
  • Drive more Go projects and workshops
    • Done
  • Work fewer days, have same revenue at end of year
    • Failed here. Worked more and revenue went up
  • Start some "public classes" where we rent a class and people show up
    • Did only one this year
  • Publish my book
    • Failed here, though I'm close. Probably Q1 this year.

Goals for 2017

  • Work less while keeping same revenue
  • Work more from home
  • Publish my book
  • Publish a video course
  • More open enrollment classes

Monday, December 26, 2016

Automatically Running BigQuery Flows

On a current project we're using Google's BigQuery to crunch some petabyte-scale data. We have several SQL scripts that we need to run in a specific order. The script below detects the table dependencies and runs the SQL scripts in order. As a bonus, you can run it with --view and it'll show you the dependency graph.
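
The ordering part can be sketched as follows. This is a simplified illustration, not the script itself: it assumes each script produces one table, that dependencies can be spotted with a naive FROM/JOIN regular expression, and it uses graphlib (Python 3.9+). The table names and SQL are made up.

```python
import re
from graphlib import TopologicalSorter


def dependencies(sql):
    """Table names that appear after FROM or JOIN (a naive scan)."""
    return set(re.findall(r'\b(?:FROM|JOIN)\s+([\w.]+)', sql, re.IGNORECASE))


def run_order(scripts):
    """scripts maps the table a script produces to the SQL text that builds it."""
    # Only tables produced by another script count as dependencies.
    graph = {table: dependencies(sql) & set(scripts)
             for table, sql in scripts.items()}
    # static_order yields every table after all the tables it depends on.
    return list(TopologicalSorter(graph).static_order())


scripts = {
    'report': 'SELECT * FROM daily JOIN users ON daily.uid = users.id',
    'daily': 'SELECT * FROM events',
    'users': 'SELECT * FROM raw_users',
}
order = run_order(scripts)
print(order)  # 'daily' and 'users' (in either order) come before 'report'
```

The same graph dictionary is what a --view style option would draw.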

Friday, November 25, 2016

Using built-in slice for indexing

At one place I consult for, I saw something like the following code:

This is good and valid Python code. However, we can use the built-in slice to do the same job.

Also when you're writing your own __getitem__ consider that key might be a slice object.
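
A short sketch of both points. The log line, the names DATE and LEVEL and the Window class are assumptions for illustration, not the code from the client.

```python
# Name each field once with a slice object instead of repeating magic numbers.
line = '2016-11-25 ERROR disk full'
DATE = slice(0, 10)
LEVEL = slice(11, 16)
print(line[DATE], line[LEVEL])  # → 2016-11-25 ERROR


class Window:
    """A tiny sequence wrapper whose __getitem__ handles both ints and slices."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # w[2:5] arrives here as slice(2, 5, None); return the same type.
            return Window(self.items[key])
        return self.items[key]


w = Window(range(10))
print(w[2], w[2:5].items)  # → 2 [2, 3, 4]
```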

Saturday, November 12, 2016
