PythonWise: January 2008

Friday, January 18, 2008

Simple Text Summarizer

Comments:

About 50 lines of code
Gives reasonable results (try it out)
tokenize needs a lot more work (better word detection, stop words, ...)
split_to_sentences needs a lot more work (handling "3.2", "Mr. Smith", ...)
In real life you'll need to "clean" the text first (ads, credits, ...)
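The script itself isn't shown here. As a rough sketch of the general idea, assuming simple word-frequency scoring (which may not be what the original does), the Python 3 code below scores each sentence by the average frequency of its words and keeps the top few, in their original order. The names tokenize and split_to_sentences come from the notes above; summarize, STOP_WORDS and ABBREVIATIONS are hypothetical, and both word lists are placeholders that would need to be much longer in practice.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (placeholder only)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "that", "for", "on", "was", "with"}

# Hypothetical abbreviations that end with "." but do not end a sentence
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st", "vs", "etc"}

def tokenize(text):
    """Return lowercase words with stop words removed."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def split_to_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace.

    A period after a known abbreviation ("Mr. Smith") is not a break, and
    decimals ("3.2") never match because no whitespace follows the period.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start()
        words_before = text[start:end].split()
        last_word = words_before[-1].lower() if words_before else ""
        if text[end] == "." and last_word in ABBREVIATIONS:
            continue  # e.g. "Mr." in "Mr. Smith": keep reading
        sentences.append(text[start:end + 1].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

def summarize(text, num_sentences=2):
    """Return the num_sentences best-scoring sentences in original order."""
    sentences = split_to_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences don't win by length
        return sum(freq[w] for w in words) / len(words) if words else 0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)

if __name__ == "__main__":
    text = "Python is great. Python is fast and Python is fun. I had lunch."
    print(summarize(text, 2))
```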

Tuesday, January 15, 2008

attrgetter is fast

#!/usr/bin/env python

from operator import attrgetter
from random import shuffle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

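def sort1(points):
    # Key function is a Python lambda, called once per element
    points.sort(key=lambda p: p.x)

def sort2(points):
    # attrgetter("x") builds an equivalent key function without a lambda
    points.sort(key=attrgetter("x"))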

if __name__ == "__main__":
    from timeit import Timer

    points1 = [Point(x, 2 * x) for x in range(100)]
    points2 = points1[:]

    num_times = 10000

    t1 = Timer("sort1(points1)", "from __main__ import sort1, points1")
    print(t1.timeit(num_times))

    t2 = Timer("sort2(points2)", "from __main__ import sort2, points2")
    print(t2.timeit(num_times))

$ ./attr.py
0.492087125778
0.29891705513
$
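attrgetter also accepts several attribute names, in which case the key function it returns produces a tuple, which gives a multi-key sort without writing a lambda. A small Python 3 sketch; the Point class mirrors the one above, with a __repr__ added only so the printed list is readable:

```python
from operator import attrgetter

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "Point(%d, %d)" % (self.x, self.y)

points = [Point(2, 1), Point(1, 5), Point(2, 0), Point(1, 3)]

# attrgetter("x", "y") maps p -> (p.x, p.y): sort by x, break ties by y
points.sort(key=attrgetter("x", "y"))
print(points)  # [Point(1, 3), Point(1, 5), Point(2, 0), Point(2, 1)]

# The equivalent lambda version, for comparison
points.sort(key=lambda p: (p.x, p.y))
```

In CPython, attrgetter is implemented in C, so it avoids the Python-level function call that the lambda costs for each element, which is a plausible explanation for the timings above.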

Friday, January 04, 2008

Faster and Shorter "dot" using itertools

Let's calculate the dot product of two vectors:


from itertools import starmap, izip  # izip exists only in Python 2
from operator import mul

def dot1(v1, v2):
    # Plain loop: index into v2 for each element of v1
    result = 0
    for i, value in enumerate(v1):
        result += value * v2[i]
    return result

def dot2(v1, v2):
    # izip pairs the elements lazily, starmap applies mul to each pair,
    # and sum adds up the products
    return sum(starmap(mul, izip(v1, v2)))

if __name__ == "__main__":
    from timeit import Timer

    num_times = 1000
    v1 = range(100)
    v2 = range(100)

    t1 = Timer("dot1(%s, %s)" % (v1, v2), "from __main__ import dot1")
    print(t1.timeit(num_times)) # 0.038722038269

    t2 = Timer("dot2(%s, %s)" % (v1, v2), "from __main__ import dot2")
    print(t2.timeit(num_times)) # 0.0260770320892

dot2 is faster and shorter, though dot1 is arguably more readable. My vote goes to dot2.
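izip exists only in Python 2. In Python 3, zip is already lazy, and map accepts several iterables, so map(mul, v1, v2) does the same job as starmap(mul, izip(v1, v2)). A Python 3 sketch; dot3 is a hypothetical name used here for the map variant:

```python
from itertools import starmap
from operator import mul
from timeit import timeit

def dot2(v1, v2):
    # Python 3: zip is lazy, so it replaces Python 2's izip directly
    return sum(starmap(mul, zip(v1, v2)))

def dot3(v1, v2):
    # map with two iterables calls mul(a, b) pairwise,
    # stopping at the end of the shorter input
    return sum(map(mul, v1, v2))

if __name__ == "__main__":
    v1 = list(range(100))
    v2 = list(range(100))
    print(dot2(v1, v2), dot3(v1, v2))  # 328350 328350
    # globals= (Python 3.5+) lets the timed statement see dot3, v1 and v2
    print(timeit("dot3(v1, v2)", globals=globals(), number=1000))
```

Unlike dot1, both versions silently stop at the shorter vector instead of raising an error, so mismatched lengths go unnoticed.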
