Accurate AWK array searching 2

Status
Not open for further replies.

starrysky1

Programmer
Apr 26, 2018
4
PR
Can anybody offer some help getting this AWK to search correctly?

I need to search the "sample.txt" file for all 6 array elements in the "combinations" file. However, I need the search to start at every single character rather than behave like an ordinary text editor search box, which resumes searching after the end of each occurrence. In other words, I need to count overlapping matches so that every occurrence is reported. For example, I need the kind of search that finds the combination "AAA" 3 times in the string "AAAAA", not 1 time. See my previous post about this:
The sample.txt file is:
Bash:
AAAAAHHHAAHH
The combinations file is:
Bash:
AA  
HH  
AAA  
HHH  
AAH  
HHA
How do I get the script
Bash:
#!/bin/bash
awk 'NR==FNR {data=$0; next} {printf "%s %d \n",$1,gsub($1,$1,data)}' 'sample.txt' combinations > searchoutput
to output the desired output:
Bash:
AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1
instead of what it is currently outputting:
Bash:
AA 3 
HH 2 
AAA 1 
HHH 1 
AAH 2 
HHA 1
?

As we can see, the script only finds the combinations the way a text editor would, skipping past each match before looking for the next one. I need it to try a match starting at every character instead, so that it produces the desired output.

How do I get the AWK script to produce the desired output instead? Can't thank you enough.
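
To make the discrepancy concrete, here is a minimal sketch of what gsub() is counting. It is not part of the original script: `count_gsub` is a hypothetical helper name, and the sketch assumes the text and pattern contain no backslashes (awk's `-v` option would interpret them as escapes). gsub() resumes scanning after the end of each match, so the matches it counts never overlap.

```shell
#!/bin/sh
# Hypothetical helper: count matches the way gsub() does (non-overlapping).
#   $1 = text to search, $2 = pattern (treated as a regular expression)
count_gsub() {
  # "&" puts back the matched text, so the string is unchanged
  # and only the return value (number of substitutions) matters.
  awk -v text="$1" -v pat="$2" 'BEGIN { print gsub(pat, "&", text) }'
}

count_gsub AAAAA AAA          # prints 1, although "AAA" starts at positions 1, 2 and 3
count_gsub AAAAAHHHAAHH AA    # prints 3, the same count as the "AA 3" line above
```

That is why the counts for AA, HH and AAA come out too low, while HHH, AAH and HHA, which cannot overlap with themselves in this sample, are already correct.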
 
Try something like this:
Code:
# Run:
#   awk -f starrysky1.awk sample.txt combinations.txt

{ 
  # remove spaces
  gsub(/[ ]+/, "", $0)
} 

NR==FNR {
  data=$0;
  data_len = length(data)
  next
} 

{
  pattern = $0
  pattern_len = length(pattern)
  pattern_count = 0
  for (j=1; j+pattern_len-1 <= data_len; j++) {
    if (substr(data, j, pattern_len) ~ pattern) {
      pattern_count++
    }      
  }
  printf("%s\t%d \n", pattern, pattern_count)
}

Output:
Code:
$ awk -f starrysky1.awk sample.txt combinations.txt
AA      5
HH      3
AAA     3
HHH     1
AAH     2
HHA     1
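
One detail worth noting about the script above: `substr(...) ~ pattern` treats each combination as a regular expression. For plain letters such as AA or HHA that gives the same result as a literal comparison, but a combination containing a character like `.` or `*` would match more than intended. A minimal sketch of a literal variant follows, assuming the text and pattern contain no backslashes; `count_overlaps` is a hypothetical helper name, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper: count overlapping, literal occurrences of a substring.
#   $1 = text to search, $2 = substring to look for
count_overlaps() {
  awk -v text="$1" -v pat="$2" 'BEGIN {
    n = 0
    plen = length(pat)
    if (plen > 0) {
      # Slide a window of length plen over the text, one character at a time.
      for (i = 1; i + plen - 1 <= length(text); i++)
        if (substr(text, i, plen) == pat)   # string equality, not a regex match
          n++
    }
    print n
  }'
}

count_overlaps AAAAA AAA         # prints 3
count_overlaps AAAAAHHHAAHH HH   # prints 3
```

The window advances by one character regardless of whether the previous position matched, which is what lets overlapping occurrences be counted.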
 
Hi

Here on Tek-Tips we usually say thanks for help received by giving stars. Please click the

★ Great post!

link at the bottom of mikrom's post (then confirm in the pop-up window). That way you both show your gratitude and mark this thread as helpful.


Feherke.
feherke.github.io
 