detecting substrings within strings

Status
Not open for further replies.

SotonStu

Programmer
May 7, 2003
33
GB
This is an awk script I have written in an attempt to isolate all the links in an HTML page and print out the frequency of each link:

#!/bin/gawk -f
# Print list of word frequencies

BEGIN{FS="\""}
NR == 1 { printf("%s\n%s", (NR==1) ? "" : ")", FILENAME)}
/http/{
printf "\n"
for (i = 1; i <= NF; i++)
freq[$i]++
}

END {
for (word in freq)
printf "%s\t%d\n", word, freq[word]
}

This produces the following output on a simple HTML file containing two links to Google:

2
<A HREF = 1
from the word <A HREF = 1
>me!</A> 2

How can I get it to add JUST the links to the associative array, and not all of the tags surrounding them?

Thanks again!
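
The script above counts every field that FS="\"" produces on a matching line, which is why the tag fragments end up in the array. One possible approach is to pull out only the quoted HREF values with match() and count those. This is a rough sketch, not a full HTML parser: it assumes each link sits in a double-quoted HREF attribute on a single line and starts with http, and the name count_links is made up for illustration.

```shell
#!/bin/sh
# count_links: hypothetical helper name, not part of the original script.
# Reads HTML from the named files (or stdin) and prints "link<TAB>count".
count_links() {
    awk '
    {
        line = $0
        # Find each HREF = "..." attribute on the line, one at a time.
        while (match(line, /[Hh][Rr][Ee][Ff][ \t]*=[ \t]*"[^"]*"/)) {
            attr = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)   # continue after this match
            sub(/^[^"]*"/, "", attr)                # strip up to the opening quote
            sub(/"$/, "", attr)                     # strip the closing quote
            if (attr ~ /^http/)                     # assumption: only http links wanted
                freq[attr]++
        }
    }
    END {
        # Iteration order of "for (x in array)" is unspecified in awk.
        for (url in freq)
            printf "%s\t%d\n", url, freq[url]
    }
    ' "$@"
}
```

It could be called as count_links page.html, where page.html stands for whatever file is being scanned. Because only the text between the quotes is stored, the surrounding <A HREF = and >me!</A> pieces never reach the array.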
 
Sorry, my last thread has just been replied to, which kind of makes my line of thinking here obsolete. Looks like I was barking up the wrong tree.
 
Write an awk command file to do the following:

For each line of input, check that the number of brackets "(" and ")" in the line is balanced. Print something out for each line as it is processed. At the end of input, print out the number of lines that had brackets in them, and the number of lines where the brackets are unbalanced.
You just need to check that the total numbers of "(" and ")" match, not that the ordering is feasible for balanced brackets. E.g. a+(5*2))( can be considered balanced.
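
A minimal sketch of one way to approach this, relying on the fact that gsub() returns the number of replacements it made. The message wording and the names check_brackets, nopen, nclose, nbracket and nunbal are illustrative choices, not part of the task statement.

```shell
#!/bin/sh
# check_brackets: hypothetical helper name chosen for this sketch.
# Reports per line whether the counts of "(" and ")" match, then prints totals.
check_brackets() {
    awk '
    {
        tmp = $0                        # work on a copy so $0 is left alone
        nopen  = gsub(/\(/, "", tmp)    # gsub returns how many "(" it removed
        nclose = gsub(/\)/, "", tmp)    # likewise for ")"
        if (nopen + nclose == 0) {
            printf "%d: no brackets\n", NR
        } else {
            nbracket++
            if (nopen == nclose) {
                printf "%d: balanced\n", NR
            } else {
                nunbal++
                printf "%d: unbalanced (%d open, %d close)\n", NR, nopen, nclose
            }
        }
    }
    END {
        # Adding 0 forces a numeric 0 when a counter was never incremented.
        print "lines with brackets: " (nbracket + 0)
        print "unbalanced lines: " (nunbal + 0)
    }
    ' "$@"
}
```

Only the totals are compared, so a line such as a+(5*2))( is reported as balanced, as the task allows.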
 