Guest_imported
To be more concrete: suppose I have a txt-file that contains the following cases:
(1) John!
(2) I will give you a few examples.
(3) This new dictionary I have, recognizes the importance
of spoken English, showing the words and phrases used
to communicate naturally in spoken English.
(4) Is it true that you go to South Africa for at least 2
months?
(5) Law §1 states that it is forbidden; law §2 claims the
opposite; law $3 is about something completely
different!
(6) I have bought the following things: 2 eggs, 5 apples
and 3 bananas.
(7) - the country's fragile economy is threatened by the
continued drought;
- many African people are starving
Examples 1 to 7 are all types of sentences. My question is: could you help me with a (gawk) script that prints only examples 3 to 6, using the criteria that a sentence must contain at least 10 words and must not be preceded by a '-' (since I want to skip enumerations of this kind)?
HINT: a sentence (as is shown in examples 1 - 7) consists of a number of words (=necessary!), numerals (=optional) and all types of characters (=optional) except for: '.', '!' and '?'. These three elements mark the end of each sentence.
So, perhaps I need to define the variable 'sentence' in the BEGIN-block by making use of a regexp? Or maybe I should treat the sentence as an array?
I hope someone could help me with this. I don't really know how I should solve this problem. Anyway, many thanks!
Lizebé
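One possible starting point, offered as a rough sketch rather than a finished solution: collect the input lines in a buffer, cut the buffer at every '.', '!' or '?', and print a piece only if it does not start with '-' and has at least 10 words. Several things here are assumptions and not part of the question: the file name sentences.txt, the function name long_sentences, the idea that the labels (1) to (7) are not in the file itself, and the rule that a "word" is any whitespace-separated token containing at least one ASCII letter (so numerals and §1 are not counted). It sticks to plain awk features, so it should also run under gawk.

```shell
#!/bin/sh
# Hypothetical sample file: the seven cases, without the (1)..(7) labels.
cat > sentences.txt <<'EOF'
John!
I will give you a few examples.
This new dictionary I have, recognizes the importance
of spoken English, showing the words and phrases used
to communicate naturally in spoken English.
Is it true that you go to South Africa for at least 2
months?
Law §1 states that it is forbidden; law §2 claims the
opposite; law $3 is about something completely
different!
I have bought the following things: 2 eggs, 5 apples
and 3 bananas.
- the country's fragile economy is threatened by the
continued drought;
- many African people are starving
EOF

# long_sentences FILE...: print sentences that have at least 10 words
# and do not start with a dash. The name is made up for this sketch.
long_sentences() {
    awk -v min=10 '
    # Count whitespace-separated tokens containing at least one ASCII letter.
    function count_words(s,    parts, n, i, c) {
        n = split(s, parts, " ")
        c = 0
        for (i = 1; i <= n; i++)
            if (parts[i] ~ /[A-Za-z]/) c++
        return c
    }
    {
        # Append the line to the buffer so a sentence can span several lines.
        buf = (buf == "") ? $0 : buf " " $0
        # Cut off every complete sentence that the buffer now holds.
        while (match(buf, /[.!?]/)) {
            s = substr(buf, 1, RSTART)
            buf = substr(buf, RSTART + 1)
            sub(/^[ \t]+/, "", s)
            if (s !~ /^-/ && count_words(s) >= min)
                print s
        }
        # Text without a closing . ! or ? stays in the buffer and is never printed.
    }
    ' "$@"
}

long_sentences sentences.txt
```

On the sample above this should print examples 3 to 6, each joined onto a single line. Two limitations to keep in mind: any '.' ends a sentence, so abbreviations and decimal numbers would be split, and an enumeration without a closing mark gets glued to whatever sentence follows it, which is then skipped because the glued piece starts with '-'.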