Compare data in four consecutive lines using awk

Status
Not open for further replies.

AnirudhDive

Technical User
May 13, 2016
1
US
Hi

I have a trajectory file that looks like this:
3LIG C1 1 -0.531 3.372 3.189
3LIG C1 2 -0.598 3.333 3.325
3LIG O2 3 -0.521 3.246 3.124
3LIG O2 6 -0.596 3.194 3.331
4LIG C1 12 -0.471 -1.170 3.326
4LIG C1 13 -0.483 -1.195 3.179
4LIG O2 14 -0.533 -1.043 3.347
4LIG O2 17 -0.589 -1.105 3.143
14LIG C1 23 3.300 -1.089 3.161
14LIG C1 24 3.279 -0.942 3.180
14LIG O2 25 3.258 -1.145 3.277
14LIG O2 28 3.236 -0.925 3.312
15LIG C1 34 1.808 3.160 3.227
15LIG C1 35 1.722 3.285 3.230
15LIG O2 36 1.933 3.216 3.240
15LIG O2 39 1.792 3.386 3.178
16LIG C1 45 -3.325 -0.288 3.188
16LIG C1 46 -3.197 -0.365 3.199
16LIG O2 47 -3.276 -0.156 3.176
16LIG O2 50 -3.114 -0.297 3.114
19LIG C1 56 -3.643 -0.138 3.289
19LIG C1 57 -3.616 0.009 3.313
19LIG O2 58 -3.575 -0.193 3.402
19LIG O2 61 -3.492 0.018 3.378
22LIG O2 67 -4.063 -2.776 3.958
26LIG C1 72 -1.888 -3.464 3.919
29LIG C1 75 1.965 4.140 5.273
29LIG O2 76 2.085 4.063 5.253
29LIG O2 78 2.054 4.159 5.054
31LIG C1 81 -3.715 -0.470 3.157
31LIG C1 82 -3.731 -0.522 3.297
31LIG O2 83 -3.794 -0.567 3.094
31LIG O2 86 -3.867 -0.562 3.303
33LIG C1 92 -2.117 4.064 3.277
33LIG C1 93 -1.987 4.078 3.354
33LIG O2 94 -2.068 4.043 3.145
33LIG O2 97 -1.890 4.091 3.254
35LIG C1 103 -1.360 -1.957 3.171
35LIG C1 104 -1.351 -1.970 3.325
35LIG O2 105 -1.226 -1.939 3.132
35LIG O2 108 -1.216 -2.019 3.338
36LIG C1 114 -3.480 -4.514 3.349
36LIG C1 115 -3.332 -4.523 3.349
36LIG O2 116 -3.507 -4.397 3.273
36LIG O2 118 -3.288 -4.446 3.241
42LIG C1 120 0.413 -2.912 3.190
42LIG C1 121 0.438 -2.781 3.124
42LIG O2 122 0.529 -2.923 3.272
42LIG O2 125 0.578 -2.785 3.098
47LIG C1 131 -2.571 -0.985 3.402
47LIG C1 132 -2.448 -0.902 3.413
47LIG O2 133 -2.620 -0.955 3.271
47LIG O2 136 -2.409 -0.890 3.281

The file consists of repeating sets of 4 lines whose first-column values are exactly the same. Certain lines do not follow this pattern.

How do I get rid of the lines that do not follow the pattern, using awk or sed?

Kindly help me out with this.

Thanks in advance.

- Aniruddha M Dive
 
For example, like this:

anirudhdive.awk
Code:
# Run:
# awk -f anirudhdive.awk anirudhdive.txt > anirudhdive_out.txt
BEGIN {
  FS = " "
  pattern = ""
  nr_lines = 0
}

{
  # no group in progress: the first column of this line starts one
  if (pattern == "") {
    pattern = $1
  }

  if ($1 == pattern) {
    # add to next array element
    nr_lines = nr_lines + 1
    lines[nr_lines] = $0
  }
  else {
    # set as a new pattern; lines stored so far are discarded
    pattern = $1

    # store to first array element
    nr_lines = 1
    lines[nr_lines] = $0
  }

  if (nr_lines == 4) {
    # print all 4 lines stored in the array
    for (i = 1; i <= 4; i++) {
      print lines[i]
    }
    # initialize variables for next pattern
    pattern = ""
    nr_lines = 0
  }
}

END {
  # report on stderr so the message stays out of the redirected output file
  print ".. Done." > "/dev/stderr"
}

If you have the data posted above in a file anirudhdive.txt and you run the script

Code:
awk -f anirudhdive.awk anirudhdive.txt > anirudhdive_out.txt

then you will get a file anirudhdive_out.txt that no longer contains the lines that are not grouped in fours with the same first column, i.e.:
Code:
22LIG O2 67 -4.063 -2.776 3.958
26LIG C1 72 -1.888 -3.464 3.919
29LIG C1 75 1.965 4.140 5.273
29LIG O2 76 2.085 4.063 5.253
29LIG O2 78 2.054 4.159 5.054
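
A shorter variant of the same idea might look like the sketch below. It buffers each contiguous run of lines that share the same first column and prints the run only when the first column changes (or the input ends) and the run holds exactly 4 lines. The names keep_groups_of_four and emit_run are made up for this illustration. One assumption to be aware of: a run of five or more lines with the same first column is dropped entirely here, whereas the script above would print its first four lines.

```shell
#!/bin/sh
# keep_groups_of_four is a hypothetical helper name used only for this sketch.
# It reads the named files (or stdin) and keeps only contiguous runs of
# exactly 4 lines that share the same first column.
keep_groups_of_four() {
  awk '
    # Print the buffered run only if it holds exactly 4 lines, then reset it.
    function emit_run(   i) {
      if (n == 4)
        for (i = 1; i <= n; i++)
          print buf[i]
      n = 0
    }

    NF == 0 { next }                 # ignore blank lines

    # First column changed: close the previous run and start a new one.
    # Appending "" forces key to be a string, so the comparison is textual.
    $1 != key { emit_run(); key = $1 "" }

    { buf[++n] = $0 }                # buffer the current line

    END { emit_run() }               # close the final run
  ' "$@"
}
```

With the data from the question it could be run as keep_groups_of_four anirudhdive.txt > anirudhdive_out.txt after sourcing the function in a shell.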
 