concordance program

Status
Not open for further replies.

missippi

IS-IT--Management
Feb 8, 2001
42
US
Hi, I am teaching myself CGI and found this tutorial. I am stuck on this part:

Exercise
A useful tool in natural language processing is concordance. This allows a specific string to be displayed in its immediate context wherever it appears in a text. For example, a concordance program identifying the target string "the" might produce some of the following output. Notice how the occurrences of the target string line up vertically.
discovered (this is the truth) that when he
t kinds of metal to the leg of a frog, an e
rrent developed and the frog's leg kicked,
 longer attached to the frog, which was dea
normous advances in the field of amphibian
ch it hop back into the pond -- almost.  Bu
ond -- almost.  But the greatest Electrical
ectrical Pioneer of them all was Thomas Edi

This exercise is to write such a program. Here are some tips:

Read the entire file into an array (this obviously isn't useful in general because the file may be extremely large, but we won't worry about that here). Each item in the array will be a line of the file.
When the chop function is used on an array it chops off the last character of every item in the array.
Recall that you can join the whole array together with a statement like $text = "@lines";
Use the target string as the delimiter for splitting the text. (I.e., use the target string in place of the colon in our previous examples.) You should then have an array of all the strings between the target strings.
For each array element in turn, print it out, print the target string, and then print the next array element.
Recall that the last element of an array @food has index $#food.
As it stands this would be a pretty good program, but the target strings won't line up vertically. To tidy up the strings you'll need the substr function. Here are three examples of its use.
substr("Once upon a time", 3, 4); # returns "e up"
substr("Once upon a time", 7); # returns "on a time"
substr("Once upon a time", -6, 5); # returns "a tim"

The first example returns a substring of length 4 starting at position 3. Remember that the first character of a string has index 0. The second example shows that missing out the length gives the substring right to the end of the string. The third example shows that you can also index from the end using a negative index. It returns the substring that starts at the 6th character from the end and has length 5.
If you use a negative index that extends beyond the beginning of the string then Perl will return nothing or give a warning. To avoid this happening you can pad out the string by using the x operator mentioned earlier. The expression (" "x30) produces 30 spaces, for example.
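The padding tip can be tried out in isolation. Below is a minimal sketch, assuming a helper named left_context (a hypothetical name, not part of the tutorial) and a positive width: it pads the string on the left with the x operator and then takes the last few characters with a negative index, so the result always has the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: return exactly $width characters ending at the end
# of $str, padding on the left with spaces when $str is too short.
# Assumes $width is a positive integer.
sub left_context {
    my ($str, $width) = @_;
    my $padded = (" " x $width) . $str;   # guarantees at least $width characters
    return substr($padded, -$width);      # the last $width characters
}

# Brackets make the padding visible when printed.
print "[", left_context("to ", 10), "]\n";                      # short string: padded
print "[", left_context("no longer attached to ", 10), "]\n";  # long string: trimmed
```

Because the padded string is never shorter than the requested width, the negative index cannot run past the beginning of the string.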


This is what I have written so far:


$file='story.txt'; #need file called story.txt to work
open(INFO, $file);
@lines = <INFO>;
chop @lines;
$text = "@lines";


@line2 = split(/the/, $text);

foreach $structure (@line2)
{
print "$structure";
print "The";
print "$structure \n";
}
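Note that the loop above prints the same element on both sides of the target, and prints "The" rather than the string that was split on. One possible reading of the tip about printing each element, the target, and then the next element is to loop by index up to $#parts - 1. The sketch below assumes a helper named occurrences (a hypothetical name) and uses a short sample string in place of story.txt; it does not yet line the targets up.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on $target and return one string per
# occurrence, made of the piece before it, the target, and the piece after it.
sub occurrences {
    my ($text, $target) = @_;
    # A limit of -1 keeps trailing empty pieces, so a target at the very
    # end of the text still counts as an occurrence.
    my @parts = split(/\Q$target\E/, $text, -1);
    my @out;
    for my $i (0 .. $#parts - 1) {
        push @out, $parts[$i] . $target . $parts[$i + 1];
    }
    return @out;
}

# Sample text stands in for the contents of story.txt.
print "$_\n" for occurrences("to the leg of the frog", "the");
```

The \Q...\E around the target is a precaution so that characters such as "." or "(" in the target are matched literally.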

 
Another approach would be to use some pattern matching
rather than the substr's.

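my $term = 'the';    # target string (assumed here; set it to whatever you need)
my $buffer = '';

open(IPF, "<some_file")
    or die "Failed to open input file, $!\n";
while (<IPF>) { $buffer .= $_; }
close IPF;

# This will match 2 words before, then the term of interest,
# and the 2 words following the term. \b is zero-width, so the
# words are separated explicitly with \W+ (spaces, punctuation, newlines).
while ($buffer =~ /\b\w+\W+\w+\W+\Q$term\E\W+\w+\W+\w+\b/gi)
{
    print "$&\n";
}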

HTH If you are new to Tek-Tips, please use descriptive titles, check the FAQs,
and beware the evil typo.
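The pattern-matching approach suggested above can also be combined with substr so that the targets line up vertically, as the exercise asks. What follows is only a sketch under stated assumptions: the helper name concordance, the sample sentence, and the context width of 20 are all made up for illustration, and the text is passed in as a string rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: return one line per occurrence of $term in $text,
# with $width characters of context on each side so the terms line up.
sub concordance {
    my ($text, $term, $width) = @_;
    $text =~ s/\s+/ /g;                      # treat line breaks as single spaces
    my @lines;
    while ($text =~ /\Q$term\E/gi) {
        my ($start, $end) = ($-[0], $+[0]);  # offsets of the current match
        # Pad on the left so the negative index never runs off the string.
        my $left  = substr((" " x $width) . substr($text, 0, $start), -$width);
        my $match = substr($text, $start, $end - $start);
        my $right = substr($text, $end, $width);
        push @lines, $left . $match . $right;
    }
    return @lines;
}

my $sample = "He attached wires to the leg of a frog, and the frog's leg kicked.";
print "$_\n" for concordance($sample, "the", 20);
```

Every returned line has the target starting at the same column, because the left context is always exactly the requested width.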