Wednesday, March 21, 2007

defaultdict

Python 2.5 adds defaultdict, a dict subclass, to the collections
module.
defaultdict takes a factory function as the first argument of its
constructor. When you look up a missing key with d[key], this function
is called with no arguments to create the default value, which is then
stored under that key.
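
To make the "missing item" behavior concrete, here is a small sketch. The make_default factory and the calls list are names made up for this illustration; they are not part of the collections module.

```python
from collections import defaultdict

calls = []  # hypothetical bookkeeping, only here to show when the factory runs

def make_default():
    # Hypothetical factory: records each call, then returns the default value.
    calls.append(1)
    return 0

d = defaultdict(make_default)
d["a"] += 1           # "a" is missing, so make_default() supplies 0 first
d["a"] += 1           # "a" exists now, so the factory is not called again
missing = d.get("b")  # get() bypasses the factory and returns None
```

After this runs, the factory has been called exactly once, and "b" was never added to the dictionary, because only d[key] lookups trigger the factory.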

Then you can write a word histogram function like this:

    from collections import defaultdict

    def histogram(text):
        """Count how often each whitespace-separated word occurs in text."""
        counts = defaultdict(int)  # int() -> 0
        for word in text.split():
            counts[word] += 1
        return counts

Or, if you want to store the locations of the words as well:

    def histogram(text):
        """Map each word to the list of positions where it occurs in text."""
        locations = defaultdict(list)  # list() -> []
        for location, word in enumerate(text.split()):
            locations[word].append(location)
        return locations
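
For a quick look at what the two functions return, here is a self-contained sketch. Both functions in the post are called histogram, so this example renames them to word_counts and word_locations (names chosen only for this illustration) so they can sit side by side; the sample sentence is made up too.

```python
from collections import defaultdict

def word_counts(text):
    # Same logic as the first histogram function above.
    counts = defaultdict(int)  # int() -> 0
    for word in text.split():
        counts[word] += 1
    return counts

def word_locations(text):
    # Same logic as the second histogram function above.
    locations = defaultdict(list)  # list() -> []
    for location, word in enumerate(text.split()):
        locations[word].append(location)
    return locations

sample = "the cat and the hat"
counts = word_counts(sample)
locations = word_locations(sample)
print(counts["the"])     # 2
print(locations["the"])  # [0, 3]
```

Note that looking up a word that never appeared, such as counts["dog"], returns 0 but also adds "dog" to the dictionary. Use counts.get("dog", 0) if you don't want that side effect.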

2 comments:

  1. Anonymous 27/3/07 16:09

    nice usage man :)

  2. Raymond Hettinger pointed out that int is slow as a default factory. It's better to use itertools.repeat for the default factory:

    from itertools import repeat
    zero = repeat(0).next  # Python 2 spelling; Python 3 uses repeat(0).__next__
    words = defaultdict(zero)
