Filter double lines

laurentiuz · Jun 1, 2003

Hello everybody!
Can you help, please? I want to create a new file from an existing one, in which I must find and exclude some erroneous double lines, even though there are other, justified double lines. Example:
000000
111111
111111#
222222
222222
333333
444444
444444#

The 111111 + 111111# and 444444 + 444444# doubles are OK.
The 222222 is erroneous and must be excluded!
Thank you in advance!
PS: Sorry, I think I messed up another thread by mistake!

marsd · Jun 1, 2003

Hard way:

Code:

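 awk -v fn="destination filename" 'BEGIN  {i=1}
    {
           array[i++] = $0
    }

    END {
    # Compare every line with every other and delete the repeats.
    for (m=1 ; m < i ; m++) {
        for (p=1 ; p < i ; p++) {
            if ((m in array) && (p in array) && m != p &&
                array[m] == array[p]) {
               delete array[p]
          }
      }
    }

    # Use "in" rather than truthiness so lines such as 000000 are kept.
    for (m=1 ; m < i ; m++) {
        if (m in array) {print array[m] >> fn}
    }
}' filename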

Hint:
man sort

Salem · Jun 1, 2003

> The 222222 is erroneous and must be excluded!
How do you tell that this is bad, and that 111111 is good?
Or are those '#' really part of the file?

Baraka69 · Jun 2, 2003

Would it not be possible to simply assign the current line to a variable and then compare it with the next line?

Something in the style of:
    if $0 == lastLine {next}
    else {print}
    lastLine = $0

PS: I know that this code will not work; it's not sample code, it's just to give you an idea ...
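For reference, the idea above can be turned into runnable awk. This is only a minimal sketch: `filter_adjacent` and `last` are names made up for illustration, and it only catches doubles that sit directly next to each other, which happens to be the case in the sample from the first post.

```shell
# filter_adjacent is a hypothetical helper: reads stdin, writes stdout.
filter_adjacent() {
    # Print a line only when it differs from the line just before it.
    # NR == 1 guards the first line, because "last" starts out empty.
    # Appending "" forces a string comparison, so 10 and 10.0 stay distinct.
    awk 'NR == 1 || ($0 "") != last { print } { last = $0 "" }'
}

# Sample data from the first post; only the second 222222 is dropped.
printf '%s\n' 000000 111111 '111111#' 222222 222222 333333 444444 '444444#' |
    filter_adjacent
```

Because 111111 and 111111# are different strings, they are never treated as doubles of each other here.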

marsd · Jun 2, 2003

Salem,
How else are you going to differentiate?
The question doesn't make any sense if the lines
don't contain the end hash.

Baraka,
You need to compare every element against every other one,
not the last vs. the next. That doesn't do anything.

laurentiuz · Jun 2, 2003

Sorry for not being clear enough from the beginning. If there is a line with "#", then the double line is accepted. If not, it must be filtered. Thank you!

Ygor · Jun 3, 2003

With the example file given, then the 'uniq' command (with no options) would give the desired result.

From man uniq ...

uniq prints a copy of the original input file with
the second and succeeding copies of any repeated
lines removed. (Note that repeated lines must be
adjacent in order to be found).
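The adjacency note in the man page excerpt matters in practice. A small sketch with made-up three-line inputs (not the poster's file) shows the difference:

```shell
# Adjacent repeats: uniq alone collapses them (prints 2 lines).
printf '%s\n' 222222 222222 333333 | uniq

# Non-adjacent repeats: uniq leaves both copies in place (prints 3 lines).
printf '%s\n' 222222 333333 222222 | uniq

# Sorting first makes the repeats adjacent (prints 2 lines),
# at the cost of changing the original line order.
printf '%s\n' 222222 333333 222222 | sort | uniq
```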

laurentiuz · Jun 3, 2003

Hi Ygor! I agree, and the same goes for "sort -u", but I need to keep the double lines in which one of the pair is marked with "#". Those are accepted double lines.

marsd · Jun 3, 2003

I misunderstood originally.
Here's code that works.

Code:

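awk -v fn="destination filename" ' BEGIN  {i=1}
    {
           array[i++] = $0
    }

    END {
    # Delete every repeat of a line unless the line ends in "#".
    for (m=1 ; m < i ; m++) {
        for (p=1 ; p < i ; p++) {
            if ((m in array) && (p in array) && m != p &&
                array[m] == array[p] && array[p] !~ /#$/) {
               delete array[p]
          }
      }
    }

    # Use "in" rather than truthiness so lines such as 000000 are kept.
    for (m=1 ; m < i ; m++) {
        if (m in array) {print array[m] >> fn}
    }
}' filename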

Test output with your sample:
000000
111111
111111#
222222
333333
444444
444444#
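The same rule (keep the first copy of each line, and never drop a line that ends in "#") can probably be written as a single pass with an associative array. This is a sketch under that reading of the requirement, not the code posted above; `filter_doubles` and `seen` are hypothetical names.

```shell
# filter_doubles is a hypothetical helper: reads stdin, writes stdout.
filter_doubles() {
    # Keep a line if it ends in "#", or if this exact text was not seen before.
    # seen[$0]++ yields 0 the first time a line appears, so !seen[$0]++ is true once.
    awk '/#$/ || !seen[$0]++'
}

# Sample data from the first post; only the second 222222 is dropped.
printf '%s\n' 000000 111111 '111111#' 222222 222222 333333 444444 '444444#' |
    filter_doubles
```

It avoids the nested loops, catches doubles that are not adjacent, and writes to standard output, so the result can be redirected to the destination file.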

laurentiuz · Jun 6, 2003

Thanks, marsd! It works! You really helped me a LOT! Thanks!
