On a current project we're using Google's BigQuery to crunch some petabyte-scale data. We have several SQL scripts that we need to run in a specific order. The script below detects the table dependencies and runs the SQL scripts in order. As a bonus, you can run it with --view and it'll show you the dependency graph.
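The original script isn't reproduced here, so what follows is only a minimal sketch of the idea, not the actual implementation. It assumes a script "creates" the table named after CREATE TABLE or INSERT INTO and "reads" the tables named after FROM or JOIN; the regular expressions, the run_order function and the sample script names are all made up for illustration. Real BigQuery SQL (backticked names, subqueries, comments) would need a more careful parser. The ordering itself uses graphlib.TopologicalSorter from the standard library (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Assumed (simplistic) patterns for tables a script writes and reads
CREATES = re.compile(
    r'\b(?:CREATE\s+(?:OR\s+REPLACE\s+)?TABLE|INSERT\s+INTO)\s+([\w.]+)', re.I)
READS = re.compile(r'\b(?:FROM|JOIN)\s+([\w.]+)', re.I)


def run_order(scripts):
    """scripts: dict of script name -> SQL text. Returns names in run order."""
    # Map each table to the script that creates it
    creator = {}
    for name, sql in scripts.items():
        for table in CREATES.findall(sql):
            creator[table] = name

    # A script depends on the creator of every table it reads
    graph = {}
    for name, sql in scripts.items():
        deps = {creator[t] for t in READS.findall(sql) if t in creator}
        deps.discard(name)  # ignore self references
        graph[name] = deps

    # static_order yields dependencies first, raises CycleError on cycles
    return list(TopologicalSorter(graph).static_order())


# Hypothetical scripts, inlined as strings to keep the sketch self-contained
scripts = {
    'report.sql': 'CREATE TABLE report AS '
                  'SELECT * FROM daily JOIN users ON daily.uid = users.id',
    'daily.sql': 'CREATE TABLE daily AS SELECT * FROM events',
    'users.sql': 'CREATE TABLE users AS SELECT * FROM raw_users',
}
order = run_order(scripts)
print(order)
```

Running the scripts is then a loop over order; the same graph dictionary could also be written out as a dot file for a --view style option.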
If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions
Monday, December 26, 2016
Friday, November 25, 2016
Using built-in slice for indexing
At one place I consult I saw something like the following code:
This is good and valid Python code, however we can use the built-in slice to do the same job.
Also, when you're writing your own __getitem__, consider that key might be a slice object.
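The snippets from the original post are not shown here, so this is a small sketch of both points under made-up assumptions: the log line, its field positions and the Stack class are hypothetical. It first names fields with the built-in slice, then shows a __getitem__ that handles both integer and slice keys.

```python
# Hypothetical fixed-width log record; the field positions are made up
line = '2016-11-25 INFO  server started'

# Name each field once with the built-in slice instead of repeating
# magic numbers such as line[0:10] at every call site
DATE = slice(0, 10)
LEVEL = slice(11, 15)
MESSAGE = slice(17, None)  # None means "to the end"

print(line[DATE], line[LEVEL], line[MESSAGE])


class Stack:
    """Toy container (hypothetical) whose __getitem__ accepts int or slice."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Return the same type for a slice, like list and str do
            return Stack(self._items[key])
        return self._items[key]


stack = Stack('abcdef')
print(stack[1], len(stack[1:4]))
```

When Python evaluates stack[1:4] it passes slice(1, 4, None) as key, which is why the isinstance check is needed.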
Saturday, November 12, 2016
Sunday, October 02, 2016
Friday, September 16, 2016
Simple Object Pools
Sometimes we need object pools to limit the number of resources consumed. The most common example is database connections.
In Go we sometimes use a buffered channel as a simple object pool.
In Python, we can do something similar with a Queue. Python's context manager makes the resource handling automatic so clients don't need to remember to return the object.
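The original pool.py is not included here; the following is a sketch of what a Queue-based pool with a context manager might look like. The names pool, resource and worker are assumptions, and the "resources" are just the integers 1 to 3.

```python
from contextlib import contextmanager
from queue import Queue
from threading import Thread

# The pool: a queue pre-filled with the available resources
pool = Queue()
for i in range(1, 4):
    pool.put(i)


@contextmanager
def resource():
    obj = pool.get()  # blocks until a resource is free
    try:
        yield obj
    finally:
        pool.put(obj)  # always return the resource, even on error


def worker(n):
    with resource() as r:
        print('worker %d got resource %d' % (n, r))


threads = [Thread(target=worker, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since Queue.get blocks, at most three workers hold a resource at any time and the rest wait their turn.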
Here's the output of both programs:
$ go run pool.go
worker 7 got resource 0
worker 0 got resource 2
worker 3 got resource 1
worker 8 got resource 2
worker 1 got resource 0
worker 9 got resource 1
worker 5 got resource 1
worker 4 got resource 0
worker 2 got resource 2
worker 6 got resource 1
$ python pool.py
worker 5 got resource 1
worker 8 got resource 2
worker 1 got resource 3
worker 4 got resource 1
worker 0 got resource 2
worker 7 got resource 3
worker 6 got resource 1
worker 3 got resource 2
worker 9 got resource 3
worker 2 got resource 1
Tuesday, August 30, 2016
"Manual" Breakpoints in Go
When debugging, sometimes you need to set conditional breakpoints. This option is available both in gdb and delve. However, sometimes the condition is complicated and it's hard or even impossible to set it. A way around this is to temporarily write the condition in Go and set the breakpoint "manually".
In Python we do it with pdb.set_trace(); in Go we'll need to work a little harder. The main idea is that a breakpoint is a special signal called SIGTRAP.
Here's the code to do this:
You'll need to tell the go tool not to optimize and to keep variable information, then run a gdb session:
$ go build -gcflags "-N -l" manual-bp
$ gdb manual-bp
(gdb) run
When you hit the breakpoint, you'll be in assembly code. Exit two functions to get to your code
(gdb) fin
(gdb) fin
Then you'll be in your code and can run gdb commands
(gdb) p i
$1 = 3
This scheme also works with delve
$ dlv debug manual-bp.go
(dlv) c
Sadly delve doesn't have a "fin" command so you'll need to hit "n" (next) until you reach your code.
That's it, happy debugging.
Oh, and in the very old days we did about the same trick in C code. There we manually inserted asm("int $3") into the code. You can do this with cgo but sending a signal seems easier.
Labels:
go
Wednesday, August 24, 2016
Generate Relation Diagram from GAE ndb Model
Working with GAE, we wanted to create a relation diagram from our ndb model. By deferring the rendering to dot and using Python's reflection this became an easy task.
Some links are still missing since we're using ancestor queries, but this can be handled by some class docstring syntax or just manually editing the resulting dot file.
Tuesday, July 05, 2016
Friday, June 10, 2016
Work with AppEngine SDK in the REPL
Working again with AppEngine for Python. Here's a small code snippet that will let you work with your code in the REPL (much better than the previous solution).
What I do in IPython is:
In [1]: %run initgae.py
In [2]: %run app.py
And then I can work with my code and test things out.
Labels:
python
Monday, May 30, 2016
Using ImageMagick to Generate Images
One of the exercises we did this week in the Python workshop used the term "bounding box diagonal". I had a hard time explaining it to the students without an image. Google image search didn't find anything great, so I decided to create such an image.
First I tried with drawing programs, but couldn't make the rectangle a square and the circle non-oval. Then I remembered imagemagick, I have it installed and mostly use it to resize images - but it can do much more. A quick look at the examples and some trial and error, and here's the result.
And here's the script that generated it:
Saturday, April 16, 2016
Waiting for HTTP Server - Go Testing
I don't like mocking in tests. If I have a server to test, I prefer to start an instance and hit the API in my tests. Waiting for the server to start with a simple sleep is unpredictable. I prefer to start the server, then try to hit a URL until it's OK, or fail after a long timeout. This is a simple task with Go's select and time.After.
Tuesday, March 29, 2016
Slap a --help on it
Sometimes we write "one off" scripts to deal with a certain task. However, more often than not these scripts live longer than just the one time. This is very common in ops related code that for some reason people don't apply the regular coding standards to.
It really upsets me when I try to see what a script is doing, run it with the --help flag and it happily deletes the database while I wait :) It's so easy to add help support in the command line. In Python we do it with argparse, and we roll our own in bash. In both cases it's an extra 3 lines of code.
Please be kind to your future self and add --help support to your scripts.
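As a sketch of how little it takes in Python, here is a minimal argparse setup. The script's task, the --days option and the function names are hypothetical; only the ArgumentParser calls are standard library API.

```python
from argparse import ArgumentParser


def build_parser():
    # The description is what --help prints; say what the script does
    parser = ArgumentParser(
        description='Delete rows older than N days (hypothetical task)')
    parser.add_argument(
        '--days', type=int, default=30,
        help='age threshold in days (default: 30)')
    return parser


def main(argv=None):
    # parse_args handles --help itself: it prints the usage message and
    # exits before any of the code below runs
    args = build_parser().parse_args(argv)
    print('would delete rows older than %d days' % args.days)


if __name__ == '__main__':
    main()
```

The important part is that argument parsing happens before any destructive work, so running the script with --help is always safe.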
Labels:
python
Friday, March 11, 2016
vfetch - Fetch Go Vendor Dependencies
Go 1.6 now supports vendoring. I found myself cloning dependencies to the "vendor" directory, then cloning their dependencies ... This got old really fast so vfetch was born. It's a quick and dirty solution: it uses "go get" with a temporary GOPATH to get the package and its dependencies, then uses rsync to copy them to the vendor directory.
Installing is the usual "go get github.com/tebeka/vfetch", then you can use "vfetch github.com/gorilla/mux".
Comments, ideas and pull requests are more than welcome.
Tuesday, March 08, 2016
Super Simple nvim UI
I've been playing with neovim lately, enjoying it and a leaner RC file. nvim currently comes only in terminal mode and I wanted a way to spin up a new window for it. Here's a super simple script (I call it e) to start a new xfce4-terminal window with nvim.
Tuesday, February 23, 2016
Removing String Columns from a DataFrame
Sometimes you want to work just with the numerical columns in a pandas DataFrame. The rule of thumb is that every column with a dtype of object is not numeric (you can get fancier with numpy.issubdtype). We're going to use the DataFrame dtypes with some boolean indexing to accomplish this.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([
   ...:     [1, 2, 'a', 3],
   ...:     [4, 5, 'b', 6],
   ...:     [7, 8, 'c', 9],
   ...: ])
In [3]: df
Out[3]:
   0  1  2  3
0  1  2  a  3
1  4  5  b  6
2  7  8  c  9
In [4]: df.dtypes
Out[4]:
0     int64
1     int64
2    object
3     int64
dtype: object
In [5]: df[df.columns[df.dtypes != object]]
Out[5]:
   0  1  3
0  1  2  3
1  4  5  6
2  7  8  9
Labels:
python
Saturday, January 23, 2016
Forging Python - First Chapter is Up
Finally, the first chapter of my upcoming book "Forging Python" is up. I'm doing it leanpub style so comments and suggestions are more than welcome.
I plan to finish the book this year, hopefully during the summer. However more than one person said I'm way too optimistic - time will tell :)
Tuesday, January 05, 2016
353Solutions - 2015 in Review
Happy new year!
This was the first full calendar year that 353solutions has been operating. Let's start with the numbers and then some insights and future goals.
Numbers
- 170 days of work in total
- Work day is a day where I billed someone for some part of it
- Can be an hour, can be 24 hours (when teaching abroad)
- There were a total of 251 work days in 2015
- There were some work days that are not billable (drafting syllabuses, answering emails ...) but not that many
- 111 days consulting for 4 clients
- 1st Go project!!!
- 58 days teaching 14 courses
- Python at all levels and scientific Python (including new async workshop)
- In UK, Poland and Israel
Insights
- Social network provided almost all the work
- Keep investing in good friends (not just for work :)
- Workshops pay way more than consulting
- However can't work from home in workshops
- Consulting keeps you updated with latest tech
- Had to let go of a client due to draconian contract
- No regrets here, it was the right decision
- Super nice team. Sadly lawyers had the final say in the company
- Python and data science are big and in high demand
- Delegating overhead to the right person helps a lot
- Accounting, contracts ...
Future Goals
- Keep positioning in Python and Scientific Python area
- Drive more Go projects and workshops
- Work fewer days, have the same revenue at the end of the year
- Start some "public classes" where we rent a class and people show up
- Some companies don't have big enough data science team
- Need to invest in advertising
- Publish my book (more on that later)