- About 50 lines of code
- Gives reasonable results (try it out)
- tokenize needs much more work (better word detection, stop words, ...)
- split_to_sentences needs much more work (handle "3.2", "Mr. Smith", ...)
- In real life you'll need to "clean" the text first (ads, credits, ...)
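
To make the two "needs work" items above concrete, here is a rough sketch of what slightly smarter versions might look like. It is not the code from the post: the function names are taken from the notes, but the bodies, the `STOP_WORDS` and `ABBREVIATIONS` sets, and the regular expressions are all assumptions made up for illustration, and the lists are far too short for real use.

```python
import re

# Hypothetical, deliberately tiny lists; a real summarizer would need far more.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "vs", "etc"}


def tokenize(text):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def split_to_sentences(text):
    """Split on ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if match.group().startswith(".") and last_word in ABBREVIATIONS:
            continue  # e.g. "Mr. Smith": not a sentence boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Because a boundary requires whitespace after the punctuation, "3.2" is never split, and the abbreviation check keeps "Mr. Smith" together. It will still get plenty of cases wrong (initials, quotes, abbreviations that really do end a sentence), which is the point of the notes above.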
If it won't be simple, it simply won't be. [source code]
Friday, January 18, 2008
Simple Text Summarizer
