word frequencies 1

Status
Not open for further replies.

Guest_imported

New member
Jan 1, 1970
0
Hi all,

I need a (gawk) script that counts how often each word occurs in different text files, sorts the words from most frequent (left) to least frequent (right), and writes the result to a new file. Suppose the first file - named sport.txt - looks like this:

bird sport basket tennis basket guard victory basket guard victory

This becomes:

basket guard victory bird sport tennis

and should be automatically copied to a file, named sport.key (so extension "txt" should be replaced with "key").

Suppose the second file - named war.txt - looks like this:

guns food defence shield defence victory guns victory rebels

This becomes:

guns defence victory food shield rebels

and should be copied to a new file, called "war.key".

I hope you can help me with this. Many thanks in advance!

sunny

P.S. Before I forget: if two or more words have the same frequency of occurrence, the order in which these words appear is of no importance.


 
Something like this can get you started. You just have to arrange the saving part. It's not all awk, as you can see. An all-awk version would make writing the output file very easy, but it would require writing sorting and formatting routines, which takes time. This looks like a college project, but we'll help anyway.

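awk '{
    # count every word on every input line
    for (i = 1; i <= NF; i++) {
        A[$i]++
    }
}
END {
    # print "count word" pairs once, after all input has been read
    for (item in A) {
        print A[item], item
    }
}' sport.txt |
sort -k 1nr,2 |
sed -e 's/^[0-9]*[ ]*//;1h;1!{x;G;$!x;}' -e 's/\n/ /g;$!d;'

Cheers,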
ND [smile]

bigoldbulldog@hotmail.com
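
The saving part left open above could be arranged with a small shell loop. This is only a sketch under a few assumptions: the input files sit in the current directory and end in .txt, the function name word_freq is made up for illustration, and cut and paste are swapped in for the sed command to strip the counts and join the lines.

```shell
#!/bin/sh
# word_freq is a hypothetical helper name, not part of the original reply.
# It prints the distinct words of one file on a single line, most frequent first.
word_freq() {
    awk '{ for (i = 1; i <= NF; i++) A[$i]++ }
         END { for (item in A) print A[item], item }' "$1" |
    sort -k1,1nr |        # highest count first
    cut -d" " -f2- |      # drop the leading count
    paste -s -d" " -      # join all lines into one, separated by spaces
}

# Assumption: all input files are in the current directory and end in .txt.
for f in *.txt; do
    [ -f "$f" ] || continue            # skip if the pattern matched nothing
    word_freq "$f" > "${f%.txt}.key"   # sport.txt -> sport.key
done
```

The ${f%.txt} expansion removes the trailing .txt, so each result lands in a file with the same base name and the .key extension. Words with equal counts come out in whatever order sort leaves them, which the question says is acceptable.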
 
Problem solved indeed! And no, this is not a school project :) Many thanks!
 