If it won't be simple, it simply won't be. [source code] by Miki Tebeka, CEO, 353Solutions

Friday, January 18, 2008

Simple Text Summarizer

Comments:
  • About 50 lines of code
  • Gives reasonable results (try it out)
  • tokenize needs much more work (better word detection, stop words ...)
  • split_to_sentences needs much more work (handle 3.2, Mr. Smith ...)
  • In real life you'll need to "clean" the text (Ads, credits, ...)
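
The listing itself is not reproduced above, so here is a minimal sketch of one common way to build such a summarizer: score each sentence by the average frequency of its words and keep the top few in their original order. Only the names tokenize and split_to_sentences come from the comments above. The scoring scheme, the summarize function, its num_sentences parameter and the tiny STOP_WORDS set are assumptions made for illustration, not necessarily what the original code does.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one is much longer)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "that", "on", "for", "was"}


def tokenize(text):
    """Return lowercase words from text, without stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def split_to_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace.

    Leaves '3.2' alone (no whitespace after the dot) but still
    breaks on abbreviations such as 'Mr. Smith'.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def summarize(text, num_sentences=3):
    """Return the highest scoring sentences, in their original order."""
    sentences = split_to_sentences(text)
    freq = Counter(tokenize(text))  # word frequencies over the whole text

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences don't win on length alone
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    import sys
    print(summarize(sys.stdin.read()))
```

Averaging rather than summing the word frequencies is a design choice: a plain sum tends to favor the longest sentences regardless of content.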