# PythonWise

If it won't be simple, it simply won't be. By Miki Tebeka, CEO, 353Solutions

## Friday, January 18, 2008

### Simple Text Summarizer

• About 50 lines of code
• Gives reasonable results (try it out)
• tokenize needs much more work (better word detection, stop words ...)
• split_to_sentences needs much more work (it should handle "3.2", "Mr. Smith" ...)
• In real life you'll need to "clean" the text first (ads, credits, ...)
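
To make the notes above concrete, here is a minimal sketch of a frequency-based summarizer. It is not the original listing: the scoring scheme (sum of word frequencies per sentence) is an assumption, and `STOP_WORDS`, `summarize` and `num_sentences` are hypothetical names. Only `tokenize` and `split_to_sentences` are names taken from the post.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "it", "of", "to"}


def split_to_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Return the lowercase words of a sentence, minus stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [word for word in words if word not in STOP_WORDS]


def summarize(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_to_sentences(text)
    frequency = Counter(word for s in sentences for word in tokenize(s))

    def score(sentence):
        # Assumed scoring: sum of document-wide frequencies of the words
        return sum(frequency[word] for word in tokenize(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

The naive splitter shows why the sentence detection needs work: `split_to_sentences("Mr. Smith left.")` breaks right after "Mr.".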

## Tuesday, January 15, 2008

### attrgetter is fast

    #!/usr/bin/env python

    from operator import attrgetter
    from random import shuffle

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    def sort1(points):
        # Key function written as a lambda
        points.sort(key=lambda p: p.x)

    def sort2(points):
        # Key function built by operator.attrgetter
        points.sort(key=attrgetter("x"))

    if __name__ == "__main__":
        from timeit import Timer

        points1 = [Point(x, 2 * x) for x in range(100)]
        points2 = points1[:]

        num_times = 10000

        t1 = Timer("sort1(points1)", "from __main__ import sort1, points1")
        print(t1.timeit(num_times))

        t2 = Timer("sort2(points2)", "from __main__ import sort2, points2")
        print(t2.timeit(num_times))

    $ ./attr.py
    0.492087125778
    0.29891705513
    $
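
The benchmark above sorts a list that is already ordered by `x`, so a quick check on shuffled input may be useful. The sketch below (Python 3) only verifies that the two key functions give the same ordering. The fixed seed and the multi-attribute sort at the end are illustrative additions, not part of the original timing script.

```python
from operator import attrgetter
from random import Random


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


# Fixed seed so the shuffle is reproducible (the seed value is arbitrary)
points = [Point(x, 2 * x) for x in range(100)]
Random(0).shuffle(points)

by_lambda = sorted(points, key=lambda p: p.x)
by_getter = sorted(points, key=attrgetter("x"))

# Both key functions produce the same ordering
same_order = [p.x for p in by_lambda] == [p.x for p in by_getter]

# attrgetter also accepts several names and then returns a tuple,
# which gives a multi-key sort without writing a lambda
by_y_then_x = sorted(points, key=attrgetter("y", "x"))
```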

## Friday, January 04, 2008

### Faster and Shorter "dot" using itertools

Let's calculate the dot product of two vectors:

    try:
        from itertools import izip  # Python 2
    except ImportError:
        izip = zip  # Python 3: the built-in zip is already lazy
    from itertools import starmap
    from operator import mul

    def dot1(v1, v2):
        # Explicit loop over indices
        result = 0
        for i, value in enumerate(v1):
            result += value * v2[i]
        return result

    def dot2(v1, v2):
        # Multiply the pairs lazily and sum the products
        return sum(starmap(mul, izip(v1, v2)))

    if __name__ == "__main__":
        from timeit import Timer

        num_times = 1000
        v1 = list(range(100))
        v2 = list(range(100))

        # The vectors are interpolated into the timed statement as list literals
        t1 = Timer("dot1(%s, %s)" % (v1, v2), "from __main__ import dot1")
        print(t1.timeit(num_times)) # 0.038722038269

        t2 = Timer("dot2(%s, %s)" % (v1, v2), "from __main__ import dot2")
        print(t2.timeit(num_times)) # 0.0260770320892

dot2 is faster and shorter, but dot1 is more readable. My vote goes to dot2.
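
As a sanity check that the two versions agree, here is a small Python 3 sketch. `dot3` is a hypothetical third spelling added for illustration; it was not part of the benchmark, and no timing claim is made for it.

```python
from itertools import starmap
from operator import mul


def dot1(v1, v2):
    # Explicit loop, as in the original
    result = 0
    for i, value in enumerate(v1):
        result += value * v2[i]
    return result


def dot2(v1, v2):
    # Pair the elements with zip, multiply each pair, sum the products
    return sum(starmap(mul, zip(v1, v2)))


def dot3(v1, v2):
    # Hypothetical variant: map accepts several iterables directly
    return sum(map(mul, v1, v2))


v = list(range(100))
results = (dot1(v, v), dot2(v, v), dot3(v, v))
```

All three entries of `results` should be equal, since each function computes the same sum of pairwise products.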