If it won't be simple, it simply won't be. By Miki Tebeka, CEO, 353Solutions

Friday, January 18, 2008

Simple Text Summarizer

Comments:
  • About 50 lines of code
  • Gives reasonable results (try it out)
  • tokenize needs much more work (better token detection, stop words, ...)
  • split_to_sentences needs much more work (it should not split on "3.2" or "Mr. Smith", ...)
  • In real life you'll need to "clean" the text first (ads, credits, ...)
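
The code itself is not reproduced on this page, so here is a minimal sketch of what a summarizer of this size might look like. It is not the original implementation. It assumes a simple word-frequency scoring scheme: count how often each word appears in the whole text, score each sentence by the average frequency of its words, and keep the top-scoring sentences in their original order. The names tokenize and split_to_sentences come from the comments above; summarize, STOP_WORDS and ABBREVIATIONS are hypothetical names made up for this sketch, and both word lists are tiny placeholders.

```python
import re
from collections import Counter

# Tiny illustrative lists (assumptions for this sketch, real ones would be much longer)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "are", "was"}
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof"}


def tokenize(text):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def split_to_sentences(text):
    """Split after ., ! or ? followed by whitespace.

    "3.2" is left alone because no whitespace follows the dot, and
    "Mr. Smith" is left alone because "Mr" is a known abbreviation.
    """
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+\s+", text):
        words = text[start:match.start()].split()
        if words and words[-1].lower() in ABBREVIATIONS:
            continue  # the dot belongs to an abbreviation, keep going
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def summarize(text, count=3):
    """Return the `count` highest-scoring sentences, in original order."""
    sentences = split_to_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored automatically
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]
```

Calling summarize(article_text, 3) returns a list of three sentences. Averaging the score instead of summing it is a design choice of this sketch; summing would tend to pick the longest sentences.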

4 comments:

Chris said...

You could improve the sentence splitting by using NLTK's Punkt tokenizer.

Anonymous said...

Hey, another person reads your blog! Oh my, it's getting crowded :)

Anonymous said...

I read it too :)

Anonymous said...

Here you go !
