Filter double lines

Status
Not open for further replies.

laurentiuz

Programmer
Oct 26, 2002
FR
Hello everybody!
Can you help, please? I want to create a new file from an existing one, excluding the erroneous double lines while keeping the other, justified double lines. Example:
000000
111111
111111#
222222
222222
333333
444444
444444#

The 111111 + 111111# and 444444 + 444444# doubles are OK.
The doubled 222222 is erroneous and must be excluded!
Thank you in advance!
PS: Sorry, I think I messed up another thread by mistake!
 
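Hard way:
Code:
awk -v fn="destination filename" '
    # Store every input line, indexed by line number.
    { array[NR] = $0 }

    END {
        # Compare each surviving line with every later line
        # and delete the later exact copies.
        for (m = 1; m <= NR; m++) {
            if (!(m in array)) continue
            for (p = m + 1; p <= NR; p++) {
                if ((p in array) && array[m] == array[p]) {
                    delete array[p]
                }
            }
        }

        # Print the survivors in their original order.
        # Testing (m in array) also keeps lines such as 000000,
        # which a plain truth test would treat as zero and skip.
        for (m = 1; m <= NR; m++) {
            if (m in array) { print array[m] >> fn }
        }
    }' filename
Hint:
man sort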
 
> The 222222 is erouneous and must be excluded !
How do you tell that this is bad, and that 111111 is good?
Or are those '#' really part of the file?
 
Would it not be possible to simply store the current line in a variable and then compare it with the next line?

Something in the style of:
[tt]
NR > 1 && $0 == lastLine { next }   # same as the previous line: skip it
{ print; lastLine = $0 }            # otherwise print it and remember it
[/tt]

PS: This is only a rough sketch rather than tested sample code, but it should give you an idea ...
 
Salem,
How else are you going to differentiate?
The question doesn't make any sense if the lines
don't contain the end hash.

Baraka,
You need to compare every element against every other one,
not just each line against the next. That doesn't do anything.
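
To illustrate the difference being pointed at here, a minimal sketch, assuming a POSIX shell and awk. The function names `adjacent_only` and `anywhere` and the array name `seen` are made up for this illustration; neither function is code from the thread, and neither handles the "#" rule yet.

```shell
#!/bin/sh
# Hypothetical helper: drop a line only when it repeats the line just before it.
# Appending "" stores the line as a string, so the comparison is textual.
adjacent_only() {
    awk 'NR > 1 && $0 == last { next } { print; last = $0 "" }'
}

# Hypothetical helper: drop a line when it has appeared anywhere earlier.
anywhere() {
    awk '!seen[$0]++'
}

# The two copies of 222222 are not adjacent in this input.
printf '222222\n333333\n222222\n' | adjacent_only   # all three lines survive
printf '222222\n333333\n222222\n' | anywhere        # the second 222222 is dropped
```

If the doubles are always adjacent, as in the sample file, both approaches give the same result.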
 
Sorry for not being clear enough from the beginning. If there is a line with "#", then the double line is accepted. If not, it must be filtered. Thank you!
 
With the example file given, the 'uniq' command (with no options) would give the desired result.

From man uniq ...

uniq prints a copy of the original input file with
the second and succeeding copies of any repeated
lines removed. (Note that repeated lines must be
adjacent in order to be found).
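
A quick way to check this against the sample from the first post, as a sketch assuming a POSIX shell. The function name `sample` is made up here just to hold the data.

```shell
#!/bin/sh
# Hypothetical helper that emits the sample lines from the original question.
sample() {
    printf '%s\n' 000000 111111 '111111#' 222222 222222 333333 444444 '444444#'
}

# uniq compares whole lines, so 111111 and 111111# count as different lines
# and both stay; only the second, adjacent 222222 is removed.
sample | uniq
```

This relies on the doubles being adjacent, as the man page excerpt notes.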
 
Hi Ygor! I agree, and so would "sort -u", but I need to keep the double lines where one copy of the key/element is marked with "#". Those are accepted double lines.
 
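I misunderstood originally.
Here's code that works.
Code:
awk -v fn="destination filename" '
    # Store every input line, indexed by line number.
    { array[NR] = $0 }

    END {
        # Delete later exact copies of a line, unless the line ends in "#".
        for (m = 1; m <= NR; m++) {
            if (!(m in array)) continue
            for (p = m + 1; p <= NR; p++) {
                if ((p in array) && array[m] == array[p] && array[p] !~ /#$/) {
                    delete array[p]
                }
            }
        }

        # Print the survivors in their original order.
        # Testing (m in array) also keeps lines such as 000000,
        # which a plain truth test would treat as zero and skip.
        for (m = 1; m <= NR; m++) {
            if (m in array) { print array[m] >> fn }
        }
    }' filename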

Test output with your sample:
000000
111111
111111#
222222
333333
444444
444444#
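
The same rule (drop repeated lines unless they end in "#") can probably also be written as a single pass. This is a sketch of an alternative, not the code posted above: the function name `filter_doubles` and the array name `seen` are made up for illustration, and it writes to standard output instead of appending to `fn`.

```shell
#!/bin/sh
# Hypothetical helper: print a line if it ends in "#",
# or if the same line has not been printed before.
filter_doubles() {
    awk '/#$/ || !seen[$0]++'
}

# Run it on the sample from the original question.
printf '%s\n' 000000 111111 '111111#' 222222 222222 333333 444444 '444444#' |
    filter_doubles
```

Using the whole line as an array subscript avoids the nested loops, so the work grows with the number of lines rather than with its square.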
 
Thanks, marsd! It works! You really helped me a LOT! Thanks!
 