Tek-Tips is the largest IT community on the Internet today!


How to define a sentence?

Status
Not open for further replies.

Guest_imported

Jan 1, 1970
I've converted some PDF files into text files. These files, however, contain a lot of redundancy. For an ordinary text (such as a newspaper article), I need a script that does the following:
IF a sentence X contains more than 15 words, THEN print sentence X; ELSE skip it.

But how do I define a sentence? Is it right to say that a sentence is a string of words (and some other characters) that ends with a '.'?
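Taking that definition at face value, here is a minimal sketch in shell plus awk. Several things in it are assumptions, not rules from the thread: a sentence ends at '.', '!' or '?'; a blank line (paragraph break) also ends one, so a heading on its own line is not glued onto the next sentence; and lines are joined with a space because PDF-to-text output usually wraps sentences across lines. The function name `long_sentences` is hypothetical.

```sh
# long_sentences MIN [FILE...]: print every sentence that has more than MIN
# words. Assumed definition: a sentence is any text ending in ".", "!" or "?",
# and a blank line also ends one. Reads stdin when no FILE is given.
long_sentences() {
  min_words=$1
  shift
  awk -v min="$min_words" '
    # Trim leading blanks from s and print it if it has more than min words.
    # split() with the default FS counts whitespace-separated words.
    function check(s,   w) {
      sub(/^[ \t]+/, "", s)
      if (split(s, w) > min)
        print s
    }
    # A blank line (paragraph break) flushes whatever is pending, so a
    # heading on its own line is not glued onto the next sentence.
    /^[ \t]*$/ { check(buf); buf = ""; next }
    {
      # Join wrapped lines with a space, then emit every complete sentence.
      buf = buf " " $0
      while (match(buf, /[^.!?]*[.!?]+/)) {
        check(substr(buf, RSTART, RLENGTH))
        buf = substr(buf, RSTART + RLENGTH)
      }
    }
    # Whatever is left at end of input has no terminator; test it anyway.
    END { check(buf) }
  ' "$@"
}
```

For example, `long_sentences 15 article.txt` prints each sentence of more than 15 words on a line of its own. Abbreviations such as "e.g." or "Dr." will still end a sentence early; that is the price of the simple '.' definition.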

With the script you propose, I hope to exclude things like titles, short enumerations, etc.
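For titles and short enumerations, a rough line-level pre-filter can drop obvious non-prose before any sentence counting. The criteria below are purely illustrative assumptions, not rules from the thread: a line counts as an enumeration item if it starts with "-", "*", "1." / "1)" or "a." / "a)" followed by a blank, and as a title if it has fewer than 8 words and no closing punctuation. The function name `drop_headings_and_lists` and the threshold of 8 are hypothetical.

```sh
# drop_headings_and_lists [FILE...]: a rough, hypothetical pre-filter that
# removes lines that look like headings or list items. Thresholds are guesses.
drop_headings_and_lists() {
  awk '
    # List item: optional indent, then "-", "*", "12." / "12)" or "a." / "a)",
    # followed by a blank.
    function is_list_item(s) {
      return s ~ /^[ \t]*([-*]|[0-9]+[.)]|[a-z][.)])[ \t]/
    }
    # Heading: fewer than 8 words and no ".", "!", "?" or ":" at the end.
    function is_heading(s,   w) {
      return split(s, w) < 8 && s !~ /[.!?:][ \t]*$/
    }
    # Keep blank lines so paragraph breaks survive for later processing.
    /^[ \t]*$/ { print; next }
    # Default action: print lines that are neither list items nor headings.
    !is_list_item($0) && !is_heading($0)
  ' "$@"
}
```

Expect the heuristics to misfire now and then: a wrapped prose line that happens to begin with "1999. " is dropped as a list item, and a short paragraph-final line without punctuation is dropped as a heading.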

Thanks,

Lizebé
 
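awk '
# Return 1 (pass) if candidate passes every test, 0 (fail) otherwise.
function define_a_sentence(candidate,   pass, fail) {
    pass = 1
    fail = 0
    # Placeholder test: fail unless "bo" occurs in the first 7 characters.
    if (substr(candidate, 1, 7) !~ /bo/)
        return fail
    # Fail if the line ends in a digit (e.g. a page or list number).
    if (candidate ~ /[0-9]$/)
        return fail
    # ... other tests ...
    return pass
}
{
    # Initial dummy match: more than 15 words (NF counts whitespace-separated
    # fields) plus a placeholder pattern; replace [regexp] with your own.
    if (NF > 15 && $0 ~ /[regexp]/) {
        line = $0
        if (define_a_sentence(line) == 0)
            print line, "Is not a my_type_sentence."
        else
            print line, "Looks okay to me."
    }
}' file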


Without concrete examples this question is hard to answer properly. You have to devise the tests yourself: scan the file line by line with an initial dummy match, then make further decisions via functions or other matches in the main program body.
How about giving at least some criteria?
 
