- About 50 lines of code
- Gives reasonable results (try it out)
- tokenize needs much more work (better word detection, stop words, ...)
- split_to_sentences needs much more work (handle "3.2", "Mr. Smith", ...)
- In real life you'll need to "clean" the text first (ads, credits, ...)
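
To make the two "needs work" items above concrete, here is a rough sketch of what slightly better versions of tokenize and split_to_sentences could look like. This is not the code from the post: the stop-word and abbreviation lists are tiny, made-up placeholders, and the regular expressions are just one plausible choice using only the standard library.

```python
import re

# Hypothetical, deliberately tiny lists; real ones would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "etc"}


def tokenize(text):
    """Return lowercase words from text, skipping stop words."""
    # Keep simple contractions such as "isn't" together as one word.
    words = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def split_to_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    A period right after a known abbreviation ("Mr.") is not treated as a
    sentence end. Decimals such as "3.2" are left alone because the period
    there is not followed by whitespace.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if match.group().startswith(".") and last_word in ABBREVIATIONS:
            continue  # "Mr. Smith": keep scanning for the real sentence end
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

An abbreviation list will never be complete, and this version will wrongly glue sentences together when one really does end in "etc.", so treat it as a starting point only.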
If it won't be simple, it simply won't be. By Miki Tebeka, CEO, 353Solutions
Friday, January 18, 2008
Simple Text Summarizer
4 comments:
You could improve the sentence splitting by using NLTK's Punkt tokenizer.
Hey, another person reads your blog! Oh my, it's getting crowded :)
I read it too :)
Here you go !