Tired of Excel filter 2

demis001 · Sep 4, 2008

Hi Guys,

I hope awk can give me a solution; I am tired of opening and filtering the sequences in Excel a million lines at a time. I want to discard a sequence if it has more than two "." characters at the beginning (meaning characters 1-10).

Data:

GCGGAA.GATCATTA
GCGA.GGCA.GCCG.CC.
GCTCCGGGA.GGCTCGGG
CTCC...A.GGCTGGGA
GCT.....A.GGT....A
GCAGGA.GGTGGCCA
GCAGGA.GGTGGCCA
CGTGGA.GGTGTGAG
GGAGGGTCA.GTAGTGAG
GCT.CGCGA.GTCCCAGA
GCGGCG.A.GTGGTGAG
CTCGTA.GA.T..TAGC.
GCTCCGTGA.T.CTGGCA
GAGGGAGTA.TTTTTTTT
GGCG.GCTAA.ACGTACG
GAGCGCTTAA.TC.AA.G
CGGTTGGGAAAAAAAAAA
CGGTTGGGAAAAAAAAAA

For example, I need only the last two lines from the above data. Bear in mind that I have 64 million lines with the same problem. I did the majority of the filtering with awk regexes, but I can't handle this case.

awk '$1 !~ /AAAAAAAAAAAAAAAA|TTTTTTTTTTTTTT|\.\.\.\.\.\.\.\.\.\.\.\.\./ {print $1}' filein.txt

In Excel I do the following:
1. =LEFT(A1, 10)
2. Sort
3. Manually remove the low-complexity sequences at the top
Then I repeat the same with:
1. =RIGHT(A1, 10)
2. Sort
3. Remove manually
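
The two Excel helper columns map fairly directly onto awk's substr: LEFT(A1, 10) corresponds to substr($0, 1, 10) and RIGHT(A1, 10) to substr($0, length($0) - 9). The following is only a rough sketch. The function name clean_ends and the rule "drop a line if either end contains a '.'" are assumptions made for illustration; the manual low-complexity check is not reproduced.

```shell
# Hypothetical helper: keep a line only if neither its first 10 nor its
# last 10 characters contain a "." (this rule is an assumption).
clean_ends() {
  awk '{
    head = substr($0, 1, 10)             # like Excel LEFT(A1, 10)
    tail = substr($0, length($0) - 9)    # like Excel RIGHT(A1, 10)
    if (index(head, ".") == 0 && index(tail, ".") == 0) print
  }'
}

# Three lines taken from the sample data; only the last one survives.
printf '%s\n' 'GCGGAA.GATCATTA' 'GAGCGCTTAA.TC.AA.G' 'CGGTTGGGAAAAAAAAAA' | clean_ends
```

On a real file the same function would presumably be fed with something like clean_ends < filein.txt > outfile.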

LKBrwnDBA · Sep 4, 2008

Your statement "if more than two . at" does not match the result you are asking for, "I need only the last two line".

To get only the last two (three, you missed one) lines, try:

Code:

awk 'index(substr($0,1,10),".") == 0 {print $0;}' InFile.txt
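
If the literal wording of the question is what is wanted (discard only when there are more than two "." in characters 1-10), the dots can be counted instead of merely detected. This is a minimal sketch under that assumption; the function name max_dots and its threshold argument are hypothetical and not from the thread. It relies on gsub returning the number of substitutions it made.

```shell
# Hypothetical helper: keep lines with at most $1 dots in characters 1-10.
max_dots() {
  awk -v max="$1" '{
    head = substr($0, 1, 10)       # work on a copy of the first 10 characters
    n = gsub(/\./, "", head)       # gsub returns how many "." it replaced
    if (n <= max) print
  }'
}

# With a threshold of 2, the second line (four dots in its first 10
# characters) is dropped and the other two are kept.
printf '%s\n' 'GCGGAA.GATCATTA' 'CTCC...A.GGCTGGGA' 'CGGTTGGGAAAAAAAAAA' | max_dots 2
```

Calling max_dots 0 gives the same result as the index(...) == 0 test above.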

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

demis001 · Sep 4, 2008

It does the job, but I don't understand the == 0 part. Would you please explain it to me? I understand this part: index(substr($0,1,10),"."). Why do you say == 0?

Dereje

demis001 · Sep 4, 2008

How can I print the data that was removed? The script discards a line if "." is found in characters 1-10. Is there any way I can print the filtered-out part before I pipe to a file, so I can check what the script has done?

For example,

I run the following first:

awk '$1~/^TTTTTTTTTTTTT/{print $1}'
before I run:

awk '$1!~/^TTTTTTTTTTTTT/{print $1}' infile>outfile

Dereje

PHV · Sep 5, 2008

awk '$1!~/^TTTTTTTTTTTTT/{print $1;next}{print $1>"filtered.txt"' infile>outfile

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

MadMichael · Sep 5, 2008

Would you please explain to me. I know this part 'index(substr($0,1,10),".") . Why you say ==0?

The index function returns the position in the string of the first "." character. If it returns 0, no "." was found in the string.
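
A quick way to see this is to print the value index returns for a couple of lines from the sample data. This is just an illustrative sketch; the function name first_dot is made up for the example.

```shell
# Hypothetical helper: print the 1-based position of the first "." within
# characters 1-10 of each line, or 0 when there is none.
first_dot() {
  awk '{ print index(substr($0, 1, 10), ".") }'
}

# Prints 7 for the first line (dot is the 7th character) and 0 for the
# second (its first dot is at position 11, outside the substring).
printf '%s\n' 'GCGGAA.GATCATTA' 'GAGCGCTTAA.TC.AA.G' | first_dot
```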

PHV · Sep 5, 2008

Is there any I can print the filtered part
OOps, sorry for the typo:
awk '$1!~/^TTTTTTTTTTTTT/{print $1;next}{print $1>"filtered.txt"}' infile>outfile

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

demis001 · Sep 5, 2008

-------------------------------------------
Is there any I can print the filtered part
OOps, sorry for the typo:
awk '$1!~/^TTTTTTTTTTTTT/{print $1;next}{print $1>"filtered.txt"}' infile>outfile
---------------------------------------------

In fact, I meant to ask about printing the lines rejected by the
"awk 'index(substr($0,1,10),".") == 0" statement. I want to print the lines with awk 'index(substr($0,1,10),".") == 1' before I run:
awk 'index(substr($0,1,10),".") == 0 {print $0;}' InFile.txt

Thanks

PHV · Sep 5, 2008

awk 'index(substr($0,1,10),".")==0{print;next}{print > "filtered.txt"}' InFile.txt
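
To only preview what would be removed, without writing any file, the condition can be inverted. A small sketch, assuming the goal is simply to eyeball the rejected lines first; the function name removed_lines is hypothetical.

```shell
# Hypothetical helper: show only the lines the == 0 filter would drop,
# i.e. those with a "." anywhere in characters 1-10.
removed_lines() {
  awk 'index(substr($0, 1, 10), ".") != 0'
}

# Of these three sample lines only the first has a dot in its first 10
# characters, so only GCT.CGCGA.GTCCCAGA is printed.
printf '%s\n' 'GCT.CGCGA.GTCCCAGA' 'GAGCGCTTAA.TC.AA.G' 'CGGTTGGGAAAAAAAAAA' | removed_lines
```

Note that the opposite of == 0 is != 0, not == 1: a test of == 1 would match only lines whose very first character is ".", so a line like GCT.CGCGA.GTCCCAGA (first dot at position 4) would not show up in the preview.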

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
