Friday, January 18, 2008

Simple Text Summarizer

Comments:
  • About 50 lines of code
  • Gives reasonable results (try it out)
  • tokenize needs much more work (better token detection, stop words ...)
  • split_to_sentences needs much more work (it should handle cases like "3.2" and "Mr. Smith" ...)
  • In real life you'll need to "clean" the text first (ads, credits, ...)
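
The tokenize and split_to_sentences points above could be tackled roughly like this. It is a minimal sketch under stated assumptions, not the code from this post: the STOP_WORDS and ABBREVIATIONS lists, the summarize function and its word-frequency scoring are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, far from complete.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}

# Assumption: a few abbreviations that should not end a sentence.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "vs"}


def tokenize(text):
    """Return lowercase words and numbers, minus stop words."""
    # Numbers such as 3.2 are kept as a single token.
    words = re.findall(r"\d+(?:\.\d+)?|[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def split_to_sentences(text):
    """Split after ., ! or ? followed by whitespace, skipping abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        before = text[start:match.start()].split()
        last_word = before[-1].lower() if before else ""
        if last_word in ABBREVIATIONS:
            continue  # "Mr. Smith" stays in one sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def summarize(text, max_sentences=3):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_to_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences do not win automatically.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

"3.2" survives because its period is not followed by whitespace, and "Mr. Smith" survives because "mr" is in the abbreviation list. An abbreviation that really does end a sentence will still be missed, so treat this as a starting point only.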

4 comments:

  1. You could improve the sentence splitting by using NLTK's Punkt tokenizer.

  2. Anonymous, 18/3/10 17:57

    Hey, another person reads your blog! Oh my, it's getting crowded :)

  3. I read it too :)

  4. Here you go !
