Printing a sentence.

Status
Not open for further replies.

Guest_imported

New member
Jan 1, 1970
0
To be more concrete: suppose I have a txt-file that contains the following cases:

(1) John!
(2) I will give you a few examples.
(3) This new dictionary I have, recognizes the importance
of spoken English, showing the words and phrases used
to communicate naturally in spoken English.
(4) Is it true that you go to South Africa for at least 2
months?
(5) Law §1 states that it is forbidden; law §2 claims the
opposite; law §3 is about something completely
different!
(6) I have bought the following things: 2 eggs, 5 apples
and 3 bananas.
(7) - the country's fragile economy is threatened by the
continued drought;
- many African people are starving

Examples 1 to 7 are all types of sentences. My question is: Could you help me with a (gawk) script that will only print examples 3 - 6, following the criteria that a sentence should contain at least 10 words and that it must not be preceded by a '-' (since I want to avoid enumerations of this kind)?

HINT: a sentence (as is shown in examples 1 - 7) consists of a number of words (=necessary!), numerals (=optional) and all types of characters (=optional) except for: '.', '!' and '?'. These three elements mark the end of each sentence.

So, perhaps I need to define the variable 'sentence' in the BEGIN-block by making use of a regexp? Or maybe I should treat the sentence as an array?

I hope someone can help me with this; I don't really know how to approach the problem. Anyway, many thanks!

Lizebé



 
Hi Lizebé,

I am not sure exactly what you want here, but this should work, provided
each line that is to be joined into a sentence has a space at the end of the line.


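#!/bin/sh

nawk '
# Collect lines until one ends in ".", "!" or "?"; that closes a sentence.
{
    sub(/[ \t]+$/, "")                      # drop trailing blanks
    sentence = (sentence == "" ? $0 : sentence " " $0)
}

/[.!?]$/ {
    body = sentence
    sub(/^\([0-9]+\)[ \t]*/, "", body)      # ignore a leading "(n)" label, if any
    # keep sentences of at least 10 words that do not start with "-"
    if (split(body, words, " ") >= 10 && body !~ /^-/)
        print sentence
    sentence = ""
}' input > output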
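
The script above only looks for the end of a sentence at the end of a line. If a '.', '!' or '?' can also turn up in the middle of a line, something along these lines might do. It is only a sketch: the function name long_sentences is made up for illustration, a leading "(n)" label is assumed to be optional and is not counted as a word, and every '.', '!' or '?' is taken as a sentence end (so abbreviations and decimals would be split). It uses only match(), substr(), sub() and split(), so gawk or nawk should both run it.

```shell
#!/bin/sh
# Hypothetical helper: print sentences of at least 10 words
# that do not start with "-". Reads the files named as arguments.
long_sentences() {
    awk '
    {
        buf = (buf == "" ? $0 : buf " " $0)       # rejoin wrapped lines
        while (match(buf, /[.!?]/)) {             # terminator anywhere in the buffer
            s   = substr(buf, 1, RSTART)          # sentence, terminator included
            buf = substr(buf, RSTART + 1)         # whatever follows it
            sub(/^[ \t]+/, "", s)
            sub(/^[ \t]+/, "", buf)
            body = s
            sub(/^\([0-9]+\)[ \t]*/, "", body)    # ignore a leading "(n)" label, if any
            if (split(body, w, " ") >= 10 && body !~ /^-/)
                print s
        }
    }' "$@"
}
```

Call it as: long_sentences input > output. Text that never reaches a terminator (like the enumeration in example 7 at the end of the file) stays in the buffer and is not printed.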


Hope this helps



flogrr
flogr@yahoo.com

 