I've converted some PDF files into TXT files. These files, however, contain a lot of redundant material. Take an ordinary text (a newspaper article, for example): I need a script that does the following:
IF a sentence X contains more than 15 words, THEN print sentence X, ELSE not.
But how should a sentence be defined? Is it correct to say that a sentence is a string of words (and some other characters) that ends with '.'?
With the script you propose, I hope to exclude things such as titles, short enumerations, etc.
Thanks,
Lizebé
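Treating '.' as the end of a sentence is a reasonable first approximation, but it misses sentences ending in '!' or '?', it splits after abbreviations such as "e.g." or "Dr.", and a title usually has no closing period, so it gets merged into the following sentence and may push that sentence over the 15-word limit. The following is a minimal Python sketch of the rule above, not a definitive solution. It assumes that a sentence ends at '.', '!' or '?' followed by whitespace, and that a blank line (paragraph break) also ends a sentence, which keeps period-less titles and list items separate. The names `split_sentences`, `long_sentences`, `MIN_WORDS` and the file name `long_sentences.py` are hypothetical.

```python
import re
import sys

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumption: a blank line also ends a sentence, so titles and list items
# without a closing period are not glued onto the next sentence.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")

MIN_WORDS = 15  # print only sentences with MORE than this many words


def split_sentences(text):
    """Split text into sentences using the two rules above."""
    sentences = []
    for block in PARAGRAPH_BREAK.split(text):
        # Re-join lines that the PDF-to-text conversion broke apart.
        block = " ".join(block.split())
        if not block:
            continue
        sentences.extend(s for s in SENTENCE_END.split(block) if s)
    return sentences


def long_sentences(text, min_words=MIN_WORDS):
    """Return the sentences that contain more than min_words words."""
    return [s for s in split_sentences(text) if len(s.split()) > min_words]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python long_sentences.py article.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for sentence in long_sentences(f.read()):
            print(sentence)
```

Words are counted by splitting on whitespace, so punctuation attached to a word does not add to the count. Abbreviations and numbered list markers ("1.") will still cause false splits; a trained sentence tokenizer such as NLTK's `sent_tokenize` handles abbreviations better, at the cost of an extra dependency.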