PythonWise

If it won't be simple, it simply won't be. By Miki Tebeka, CEO, 353Solutions

Friday, January 18, 2008

Simple Text Summarizer

• About 50 lines of code
• Gives reasonable results (try it out)
• tokenize needs much more work (better word detection, stop words, ...)
• split_to_sentences needs much more work too (it should handle "3.2", "Mr. Smith", ...)
• In real life you'll need to "clean" the text first (ads, credits, ...)
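
To make the bullets above concrete, here is a rough sketch of what a frequency-based summarizer of this size might look like. It is not the original code: only the names tokenize and split_to_sentences come from the list above, while summarize, STOP_WORDS and the scoring rule (average word frequency per sentence) are my assumptions, and the sketch is written for Python 3.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one is much longer)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def tokenize(text):
    """Return lowercase words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_to_sentences(text):
    """Naive split after '.', '!' or '?'; it breaks on '3.2' and 'Mr. Smith'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, num_sentences=2):
    """Return the highest scoring sentences, in their original order."""
    sentences = split_to_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences do not win by length alone
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    text = "Python is great. Python makes text processing simple. I had lunch."
    print(summarize(text, 1))
```

The naive sentence splitter shows exactly the weakness mentioned above: "Mr. Smith" would be cut in two after "Mr.".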

Tuesday, January 15, 2008

attrgetter is fast

```
#!/usr/bin/env python

from operator import attrgetter
from random import shuffle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def sort1(points):
    points.sort(key = lambda p: p.x)

def sort2(points):
    points.sort(key = attrgetter("x"))

if __name__ == "__main__":
    from timeit import Timer

    points1 = [Point(x, 2 * x) for x in range(100)]
    points2 = points1[:]

    num_times = 10000

    t1 = Timer("sort1(points1)", "from __main__ import sort1, points1")
    print t1.timeit(num_times)

    t2 = Timer("sort2(points2)", "from __main__ import sort2, points2")
    print t2.timeit(num_times)
```

```
$ ./attr.py
0.492087125778
0.29891705513
$
```
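
For readers who have not met it: attrgetter("x") builds a callable that returns the x attribute of whatever it is given, so it is a drop-in replacement for lambda p: p.x as a sort key. The small sketch below (Python 3, with made-up points and my own variable names) only checks that the two keys give the same order; it does not repeat the timing.

```python
from operator import attrgetter


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


points = [Point(3, 1), Point(1, 2), Point(2, 0)]

get_x = attrgetter("x")
by_lambda = sorted(points, key=lambda p: p.x)
by_getter = sorted(points, key=get_x)

# Both keys should produce the same ordering
same_order = [p.x for p in by_lambda] == [p.x for p in by_getter]

# Several names at once return a tuple, handy for multi-key sorts
get_xy = attrgetter("x", "y")
pair = get_xy(points[0])

if __name__ == "__main__":
    print(same_order, pair)
```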

Friday, January 04, 2008

Faster and Shorter "dot" using itertools

Let's calculate the dot product of two vectors:
```
from itertools import starmap, izip
from operator import mul

def dot1(v1, v2):
    result = 0
    for i, value in enumerate(v1):
        result += value * v2[i]
    return result

def dot2(v1, v2):
    return sum(starmap(mul, izip(v1, v2)))

if __name__ == "__main__":
    from timeit import Timer

    num_times = 1000
    v1 = range(100)
    v2 = range(100)

    t1 = Timer("dot1(%s, %s)" % (v1, v2), "from __main__ import dot1")
    print t1.timeit(num_times) # 0.038722038269

    t2 = Timer("dot2(%s, %s)" % (v1, v2), "from __main__ import dot2")
    print t2.timeit(num_times) # 0.0260770320892
```
dot2 is faster and shorter, while dot1 is more readable. My vote goes to dot2.
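
The code above targets Python 2. In Python 3 izip is gone, since the built-in zip is already lazy, so a port might look like the sketch below. This is my assumption about how one would port it, not the original code, and dot3 is a name I made up for a variant that leans on map taking several iterables. I have not timed either version.

```python
from itertools import starmap
from operator import mul


def dot2(v1, v2):
    # zip is lazy in Python 3, so it plays the role of izip
    return sum(starmap(mul, zip(v1, v2)))


def dot3(v1, v2):
    # map accepts several iterables and stops at the shortest one
    return sum(map(mul, v1, v2))


if __name__ == "__main__":
    print(dot2([1, 2, 3], [4, 5, 6]))
    print(dot3([1, 2, 3], [4, 5, 6]))
```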