If it won't be simple, it simply won't be.
by Miki Tebeka, CEO, 353Solutions

Wednesday, March 21, 2007


Python 2.5 has a new defaultdict type in the collections module.
defaultdict takes a factory function in its constructor. This function
is called to create a default value each time you look up a missing key,
and the new value is stored under that key.
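To see exactly when the factory runs, here is a small sketch with a hypothetical factory, make_default, that records each call (written so it also runs on Python 3). Only a plain d[key] lookup on a missing key calls the factory; get() and the in operator leave the dictionary unchanged.

```python
from collections import defaultdict

# Hypothetical factory that records each call, so we can see when it runs.
calls = []

def make_default():
    calls.append(1)
    return 'n/a'

d = defaultdict(make_default)
d['a'] = 'present'

existing = d['a']   # key exists: factory is not called
missing = d['b']    # key missing: factory is called and 'b' is inserted
got = d.get('c')    # .get() never calls the factory and inserts nothing
has_c = 'c' in d    # neither does a membership test
```

After this runs, the factory has been called once and d holds only 'a' and 'b'.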

Then you can write a word histogram function like this:
from collections import defaultdict

def histogram(text):
    histogram = defaultdict(int) # int() -> 0
    for word in text.split():
        histogram[word] += 1
    return histogram
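For comparison, here is a sketch of the same histogram with a plain dict, where the missing-key case has to be handled by hand with dict.get. The names histogram_defaultdict and histogram_plain and the sample text are just for illustration.

```python
from collections import defaultdict

def histogram_defaultdict(text):
    histogram = defaultdict(int)  # int() -> 0
    for word in text.split():
        histogram[word] += 1
    return histogram

def histogram_plain(text):
    # Without defaultdict, every lookup must supply its own default.
    histogram = {}
    for word in text.split():
        histogram[word] = histogram.get(word, 0) + 1
    return histogram

text = "the cat and the hat"
counts = histogram_plain(text)
```

Both return the same counts ({'the': 2, 'cat': 1, 'and': 1, 'hat': 1} for the sample text); the defaultdict version just drops the get(word, 0) bookkeeping.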
Or, if you want to store the locations of the words as well:

def histogram(text):
    histogram = defaultdict(list) # list() -> []
    for location, word in enumerate(text.split()):
        histogram[word].append(location)
    return histogram
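Here is a self-contained sketch of the location version with a sample call. Note that the locations are word indexes produced by enumerate, not character offsets into the string; the sample sentence is only for illustration.

```python
from collections import defaultdict

def histogram(text):
    histogram = defaultdict(list)  # list() -> []
    for location, word in enumerate(text.split()):
        histogram[word].append(location)  # record each position of the word
    return histogram

locations = histogram("to be or not to be")
```

For "to be or not to be" this gives {'to': [0, 4], 'be': [1, 5], 'or': [2], 'not': [3]} (wrap it in dict() for a plain repr), and len(locations[word]) recovers the count from the first version.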


Anonymous said...

nice usage man :)

Miki Tebeka said...

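Raymond Hettinger pointed out that int is a slow default factory. It's better to use the next method of an itertools.repeat iterator, which returns 0 on every call:

from collections import defaultdict
from itertools import repeat
zero = repeat(0).next  # Python 2; in Python 3 this is repeat(0).__next__
words = defaultdict(zero)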
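If you want to check the speed claim on your own interpreter, here is a rough timing sketch. It assumes Python 3, where the method is __next__ rather than next, and adds lambda: 0 as a third common spelling for comparison. Every key is new, so the factory is called on each iteration; the fill helper and the key count are arbitrary choices, and no particular result is implied, since the numbers depend on the Python version and machine.

```python
from collections import defaultdict
from itertools import repeat
from timeit import timeit

# Python 3 spelling of the trick: iterators expose __next__ instead of next.
zero = repeat(0).__next__

def fill(factory, keys):
    # Every key is new, so the default factory runs once per key.
    d = defaultdict(factory)
    for key in keys:
        d[key] += 1
    return d

keys = range(10000)
factories = {"int": int, "repeat(0).__next__": zero, "lambda: 0": lambda: 0}

timings = {}
for name, factory in factories.items():
    # Bind factory as a default argument so each timed lambda uses its own.
    timings[name] = timeit(lambda f=factory: fill(f, keys), number=20)

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-20s %.4f s" % (name, seconds))
```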
