If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Wednesday, January 11, 2012

fastavro with Cython

Added an optional step of compiling fastavro with Cython. Just doing that, with no Cython-specific code, reduced the time to process 10K records from 2.9 sec to 1.7 sec. Not bad for so little work.

Also added a __main__.py so you can use fastavro to process Avro files:

  • python -m fastavro weather.avro # Dump records in JSON format
  • python -m fastavro --schema weather.avro # Dump schema

Friday, January 06, 2012

fastavro

Just released fastavro to PyPI. It has far fewer features than the official avro package, but according to my tests it's about 5 times faster.

Tuesday, December 27, 2011

How To Use A Chat To Amplify Your Team

A chat room is a great tool for any team. I've worked in both distributed and "local" teams, and in both cases a central chat room was one of the most effective tools we used. The chat room is the central location for updates, conversations, shared knowledge and more. It's also highly effective in many open source projects.


Starting is super easy: you can either set up an internal server in your company with one of the many open IRC/Jabber servers, or try an external service (like Campfire or HipChat).


The success of the chat room depends on the signal/noise ratio in the main room. Try to have every new conversation start in the main room, and move it to another room only if it becomes too noisy. One-on-one conversations hide information from the team and should be used only for personal communication.

Don't forget to add services to the chat room. It's easy to add build notifications, source commits, reminders and other smart bots. We use the Jenkins Jabber plugin to get build notifications. However, be careful not to generate too much noise, and tweak these bots accordingly.

Logging and searching is another critical feature. Some services (like Campfire) provide it out of the box. If your chat server does not provide logging, it's pretty easy to add it with one of the many logging bots out there, and hooking search on top of the logs is pretty easy as well. (That's what we did with selenium.saucelabs.com which is a combination of Supybot and Omega).

Another issue is presence: people need to make it clear when they are "in" or "out" of the room. In some cases, when the server logs it, this is as easy as login/logout. In other cases just notify the team that you are away (we use the IRC /me lunching a lot). Stating that it's acceptable to be "out" of the room allows people to close the chat when they need to be in the zone.

So go ahead, set up a chat for your team. Soon you'll wonder how you managed before.

Thursday, December 15, 2011

Decimals Class Decorator

A friend at work had a problem: he wanted to make sure some attributes are always Decimal.

We considered using the excellent traits library, but it's overkill and we didn't want to introduce another dependency. To me this looked like a job for properties, but we wanted some way to generate the properties automatically. Luckily for us we now have class decorators (in the old days I'd probably have used a metaclass).
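The original code did not survive on this page. As a minimal sketch of the idea (the names `decimals` and `Item` are made up for illustration, and this is not necessarily how the original was written), a class decorator can generate one converting property per attribute name:

```python
from decimal import Decimal


def _decimal_property(name):
    """Build a property that converts every assigned value to Decimal."""
    private = "_" + name  # where the converted value is actually stored

    def getter(self):
        return getattr(self, private)

    def setter(self, value):
        # Go through str() so a float such as 0.1 becomes Decimal("0.1")
        setattr(self, private, Decimal(str(value)))

    return property(getter, setter)


def decimals(*names):
    """Class decorator: the named attributes always hold a Decimal."""
    def decorator(cls):
        for name in names:
            setattr(cls, name, _decimal_property(name))
        return cls
    return decorator


@decimals("price", "tax")
class Item:
    def __init__(self, price, tax):
        self.price = price  # goes through the generated setter
        self.tax = tax
```

Any assignment, including the ones in `__init__`, passes through the generated setter, so `Item(10, 0.1).tax` is a `Decimal` no matter what was passed in.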

Friday, December 02, 2011

branches

At work, our workflow involves JIRA and feature branches. We name each feature branch after the issue it solves.

However, when running hg branches it's hard to know which branch you want.
Below is a small script that annotates the output of hg branches with the issue description from JIRA (and also adds the branch parent).
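The script itself is not preserved here. A rough sketch of the annotation step only, assuming the issue descriptions have already been fetched into a dict (the JIRA lookup and the branch parent are left out, and `annotate_branches` is a made-up name):

```python
def annotate_branches(hg_output, summaries):
    """Append an issue summary to each line of `hg branches` output.

    `summaries` maps a branch name (the issue key) to its description.
    In a real script it would be filled from JIRA, which is not shown.
    """
    annotated = []
    for line in hg_output.splitlines():
        if not line.strip():
            continue
        branch = line.split()[0]  # assumes branch names contain no spaces
        summary = summaries.get(branch, "")
        annotated.append(f"{line}  {summary}".rstrip())
    return annotated
```

The input could come from something like `subprocess.check_output(["hg", "branches"], text=True)`.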

Friday, November 11, 2011

VirtualBox and USB

A little something I found out, writing it here in hope someone else will find it useful.

I'm on an Ubuntu system (11.10), and needed to connect to a USB device from an XP VM (I wanted to sync and upgrade an iPod touch, and there is no iTunes for Linux). I couldn't connect the XP VM to any USB port (the USB menu was empty). The solution is to add yourself to the vboxusers group.

Open Users and Groups, click on Manage Groups, double click on vboxusers and make sure your name is checked. After that, log out and log back in; you should now see the USB devices in the USB menu.

Tuesday, November 01, 2011

Summarize tuples

Raymond asked: "Group the list of tuples on a given field and sum or count selected other fields." Here's my solution using sqlite3 (original answer here).
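The answer itself is not reproduced on this page. A small sketch of the approach, with a made-up table layout and sample data: load the tuples into an in-memory sqlite3 database and let SQL do the grouping.

```python
import sqlite3


def summarize(rows):
    """Group (name, amount) tuples by name; return (name, total, count)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (name TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name, SUM(amount), COUNT(*) FROM items "
            "GROUP BY name ORDER BY name"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Illustrative data, not taken from the original question
data = [("apple", 3), ("pear", 1), ("apple", 4), ("pear", 5), ("fig", 2)]
```

Changing which field to group on or which aggregate to compute is then just a matter of editing the SELECT statement.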

Friday, October 28, 2011

Using dbus To Control Pidgin

I use Pidgin as my IM client; however, after work I'd like to disconnect from our Jabber server. Being the command line geek I am, I wrote the following. (An earlier version was written in bash using purple-remote.) You can learn more about the Pidgin dbus interface here.

Saturday, October 15, 2011

Finding Module Version

A friend at work asked me how he could find which version of a Python module is currently installed. The quick answer is to use pip:
     
        pip freeze | grep <module>
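On newer Pythons (3.8 and later) the same question can also be answered from code with the standard library's `importlib.metadata`. A small sketch; `installed_version` is a made-up helper name, and note that it takes the distribution name as pip knows it, which may differ from the import name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None
```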


Friday, September 30, 2011

Old Berlios Projects

Seems like Berlios is closing down. I've placed a backup of my old projects here. Let me know if you'd like to take ownership of one or several of them.

Using Generators in nose Tests

Sometimes you have many tests that do exactly the same thing but with different data. nose provides an easy way to do that with test functions written as generators.
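A minimal illustration (the function names and data are made up): the test function yields a check function plus its arguments, and nose runs each yielded tuple as a separate test case.

```python
def check_square(value, expected):
    assert value * value == expected


def test_squares():
    # nose calls each yielded (function, *args) tuple as its own test
    cases = [(0, 0), (2, 4), (-3, 9)]
    for value, expected in cases:
        yield check_square, value, expected
```

A failure in one case is reported on its own, together with the arguments that triggered it, and the remaining cases still run.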

Saturday, September 24, 2011

Feynman On The Importance Of Playing

I'm reading "Surely You're Joking, Mr. Feynman" (very good read).
What he says about burnout, playing and doing the things you love is priceless:

But when it came time to do some research, I couldn't get to work. I was a little tired; I was not interested; I couldn't do research! ...

And then I thought to myself, "You know, what they think of you is so fantastic, it's impossible to live up to it. You have no responsibility to live up to it!"... 

Then I had another thought; Physics disgusts me a little bit now, but I used to enjoy doing physics. Why did I enjoy it? I used to play with it. I used to do whatever I felt like doing - it didn't have to do with whether it was important for the development of nuclear physics...

So I get this new attitude ... I'm going to play with physics, whenever I want to, without worrying about any importance whatsoever.
Within a week I was in the cafeteria and some guy, fooling around, throws a plate in the air. ...

I had nothing to do, so I start to figure out the motion of the rotating plate...
And before I knew it (it was a very short time) I was "playing" - working, really - with the same old problem that I loved so much, that I had stopped working on when I went to Los Alamos; my thesis-type problems; all those old-fashioned wonderful things.
It was effortless. It was easy to play with these things. It was like uncorking a bottle: Everything flowed out effortlessly. ...

There was no importance to what I was doing, but ultimately there was. The diagrams and the whole business that I got the Nobel Prize for came from that piddling around with the wobbling plate.

Go out and play, I'm sure you'll make wonderful things.

EDIT: Thanks to HN for proofreading this. Also found a more complete excerpt here.

Thursday, September 08, 2011

Monday, August 29, 2011

"Top X" UI

While doing some experiments with data at work, one of the common tasks was "show top X items". However, X was changing all the time. Finally I came up with a web-based UI where you can set X yourself. Hopefully someone else will find it useful.

Key points:
  • Using jQuery, jQuery UI and flot
  • Site is static, no backend needed
You can see a demo on word frequency of "Alice In Wonderland" here.


As usual, the code is in bitbucket.

Thursday, August 18, 2011

Crucible Command Line Client

At work, we use Crucible (which I mostly like). However, we do reviews post-commit (using feature branches), and creating a review from a patch takes too many clicks for my taste. Hence the crucible command line client. You can install it with easy_install crucible and then run it.

crucible has many options, but you can save many defaults in ~/.cruciblerc (see example at top of the crucible script).

Wednesday, July 13, 2011

logio

The logging module is nice; one thing I find myself missing here and there is the ability to treat logs as file objects. This helps them "play nice" with many modules.

Here is a quick hack to do that, at the bottom there's an example on how to create CSV log files (and then we can use TimedRotatingFileHandler to rotate the logs).
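The hack itself is not preserved on this page. A minimal sketch of the idea, not the original code (`LogFile` is a made-up name): a write-only file-like object that forwards each line to a logger, which is enough for `csv.writer`.

```python
import csv
import logging


class LogFile:
    """Minimal write-only file-like wrapper around a logger."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level

    def write(self, data):
        # One log record per non-empty line; the handler adds the newline
        for line in data.splitlines():
            if line:
                self.logger.log(self.level, line)
        return len(data)

    def flush(self):
        pass  # handlers do their own flushing
```

With this, `csv.writer(LogFile(logger)).writerow(["alice", 3])` emits the record `alice,3`. Attaching a `TimedRotatingFileHandler` with a `"%(message)s"` formatter to that logger would then give rotating CSV files.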

Thursday, July 07, 2011

PDB - The Movie

I did it again, this time showing how to work with PDB (the Python debugger).



Could it be the proximity to Hollywood that causes me to do these movies?

Saturday, June 25, 2011

ingress

One of the many cool features in Twisted is "manhole". It lets you connect to any running server and get a shell (over ssh or telnet) in the server environment. This is very helpful when debugging.

It's very easy to implement this using SocketServer, which is in the standard library. This way, even if you're not running under Twisted, you can still have this feature. I've created ingress just for this (easy_install ingress); using it is super easy:
import ingress
ingress.install()

The code itself is not that complicated. (Could have been simpler if there was a way to override writing to stdout in code.InteractiveConsole).
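To give a feel for the technique, here is a rough sketch written for Python 3 (where the module is called `socketserver`). It is not the actual ingress code: the class names and the port are invented, there is no authentication, and `contextlib.redirect_stdout` swaps `sys.stdout` for the whole process, so this is for local debugging only.

```python
import code
import contextlib
import io
import socketserver
import threading


class StreamConsole(code.InteractiveConsole):
    """Interactive console that talks to a pair of binary streams."""

    def __init__(self, rfile, wfile, local_vars=None):
        super().__init__(local_vars)
        self.rfile = rfile
        self.wfile = wfile

    def write(self, data):
        self.wfile.write(data.encode("utf-8"))
        self.wfile.flush()

    def raw_input(self, prompt=""):
        self.write(prompt)
        line = self.rfile.readline()
        if not line:  # client closed the connection
            raise EOFError
        return line.decode("utf-8").rstrip("\r\n")

    def runcode(self, code_obj):
        # Capture whatever the code prints and send it to the client.
        # redirect_stdout is process-wide, so other threads are affected.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            super().runcode(code_obj)
        self.write(buffer.getvalue())


class ConsoleHandler(socketserver.StreamRequestHandler):
    def handle(self):
        StreamConsole(self.rfile, self.wfile).interact()


def install(port=9998):  # port number is an arbitrary choice
    """Serve consoles on localhost from a background daemon thread."""
    server = socketserver.ThreadingTCPServer(("localhost", port), ConsoleHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling `install()`, connecting with telnet or nc to that port on localhost should give a Python prompt inside the running process.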

Thursday, June 09, 2011

3 IDLE Tips - The Movie

I've managed to make a screencast, giving tips on how to use IDLE.
(Yeah, I know - it sucks, but that's how I learn ... ;)

Sunday, May 29, 2011

avrocat

After playing a bit with simpleavro, I've decided it'll be better done in the Unix tradition of "doing one thing well" and using pipes. Hence avrocat was born, a "cat" like utility for avro files.
usage: avrocat [-h] [-n COUNT] [-s SKIP] [-f {json,csv}] [--header]
               [--filter FILTER] [--schema]
               filename

`cat` for Avro files

positional arguments:
  filename              avro file (- for stdin)

optional arguments:
  -h, --help            show this help message and exit
  -n COUNT, --count COUNT
                        number of records to print
  -s SKIP, --skip SKIP  number of records to skip
  -f {json,csv}, --format {json,csv}
                        record format
  --header              print CSV header
  --filter FILTER       filter records (e.g. r['age']>1)
  --schema              print schema

EDIT: Seems like avrocat is finding its way into the "avro" package.
