Tired of Excel filter 2

Status
Not open for further replies.

demis001

Programmer
Aug 18, 2008
94
US
Hi Guys,

I hope awk can solve this. I am tired of opening and filtering the sequences in Excel a million lines at a time. I want to discard a sequence if it has more than two "." characters at the beginning (meaning characters 1-10).

Data:

GCGGAA.GATCATTA
GCGA.GGCA.GCCG.CC.
GCTCCGGGA.GGCTCGGG
CTCC...A.GGCTGGGA
GCT.....A.GGT....A
GCAGGA.GGTGGCCA
GCAGGA.GGTGGCCA
CGTGGA.GGTGTGAG
GGAGGGTCA.GTAGTGAG
GCT.CGCGA.GTCCCAGA
GCGGCG.A.GTGGTGAG
CTCGTA.GA.T..TAGC.
GCTCCGTGA.T.CTGGCA
GAGGGAGTA.TTTTTTTT
GGCG.GCTAA.ACGTACG
GAGCGCTTAA.TC.AA.G
CGGTTGGGAAAAAAAAAA
CGGTTGGGAAAAAAAAAA

For example, I need only the last two lines from the above data. Bear in mind that I have 64 million lines with the same problem. I did the majority of the filtering with awk regexes, but I can't handle this case.

awk '$1!~/AAAAAAAAAAAAAAAA/||/TTTTTTTTTTTTTT/||/\.\.\.\.\.\.\.\.\.\.\.\.\./{print $1}' filein.txt
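A note on the command above: in awk, `!~` binds more tightly than `||`, so the pattern reads as "field 1 has no A-run, OR the line has a T-run, OR the line has a dot-run", which lets almost every line through. If the intent is to drop any line containing one of the three runs (an assumption about what was meant), one possible sketch is a single negated alternation. The sample lines and the shortened run length of 8 are made up for illustration.

```shell
# Hypothetical sample input: one clean read, one with a T-run, one with a dot-run.
# Keep a line only if field 1 matches none of the runs.
# Run lengths are shortened to 8 for this demo; adjust them to the real thresholds.
kept=$(printf '%s\n' 'GCGGAAGATCATTA' 'GAGGTTTTTTTTTT' 'GCT..........A' |
    awk '$1 !~ /AAAAAAAA|TTTTTTTT|\.\.\.\.\.\.\.\./ {print $1}')
echo "$kept"
```

Only the first sample line should survive, since the other two each contain one of the runs.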

In Excel I am doing the following:
1. left(A1, 10)
2. sort
3. manually remove the low-complexity sequences at the top
Then I repeat the same for:
1. right(A1, 10)
2. sort
3. remove manually
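The left/right steps above could probably be folded into one awk pass. The sketch below assumes the rule "drop the line if a dot appears in the first 10 or the last 10 characters" and assumes every line is at least 10 characters long; both the rule and the sample lines are illustrative, not taken from the real data.

```shell
# Hypothetical sample lines: dot near the start, dot near the end, dot only in the middle.
kept=$(printf '%s\n' 'GC.GGAAGATCATTAGCA' 'GCGGAAGATCATTAG.CA' 'GAGCGCTTAA.ACGTACGTTGCA' |
awk '{
    head = substr($0, 1, 10)             # same role as left(A1, 10)
    tail = substr($0, length($0) - 9)    # same role as right(A1, 10)
    if (index(head, ".") == 0 && index(tail, ".") == 0) print
}')
echo "$kept"
```

Only the third sample line is kept, because its single dot sits outside both 10-character windows.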
 
Your statement "if more than two . at" does not match the result you are asking for, "I need only the last two line".

To get only the last two (actually three; you missed one) lines, try:
Code:
awk 'index(substr($0,1,10),".") == 0 {print $0;}' InFile.txt
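For comparison, the rule as literally stated in the question ("more than two . in characters 1-10") could be sketched by counting the dots instead of testing for the first one. This relies on gsub returning the number of substitutions it made; the three input lines are taken from the sample data in the question.

```shell
kept=$(printf '%s\n' 'CTCC...A.GGCTGGGA' 'CTCGTA.GA.T..TAGC.' 'CGGTTGGGAAAAAAAAAA' |
awk '{
    head = substr($0, 1, 10)       # only look at characters 1-10
    dots = gsub(/\./, "", head)    # gsub edits the copy and returns the dot count
    if (dots <= 2) print           # discard only when there are more than two dots
}')
echo "$kept"
```

Under this reading the first line (four dots in its first 10 characters) is dropped, while the second (two dots) and third (none) are kept, so it is a looser filter than the index(...) == 0 test.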

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
 
It does the job, but I don't understand the ==0 part. Would you please explain it to me? I understand this part: index(substr($0,1,10),"."). Why do you say ==0?

Dereje
 
How can I print the data that was removed? The script discards a line if "." is found in characters 1-10. Is there any way I can print the filtered-out part before I redirect to a file, so I can check what the script has done?

For example, I run the following first:

awk '$1~/^TTTTTTTTTTTTT/{print $1}'

before I run:

awk '$1!~/^TTTTTTTTTTTTT/{print $1}' infile>outfile

Dereje
 
awk '$1!~/^TTTTTTTTTTTTT/{print $1;next}{print $1>"filtered.txt"' infile>outfile


Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Would you please explain it to me? I understand this part: index(substr($0,1,10),"."). Why do you say ==0?
The index function returns the position of the first "." character in the string, which here is substr($0,1,10), the first 10 characters of the line. If it returns 0, no "." was found in the string.
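A quick way to see the return values is to call index on two literal strings in a BEGIN block. This is only a small illustration; the two strings are 10-character prefixes chosen from the sample data.

```shell
out=$(awk 'BEGIN {
    print index("GCGGAA.GAT", ".")   # the dot is the 7th character
    print index("CGGTTGGGAA", ".")   # no dot anywhere in the string
}')
echo "$out"
```

The first call prints 7 and the second prints 0, so `== 0` selects lines without a dot in the prefix and `> 0` (or `!= 0`) selects lines with one at any position.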
 
Is there any way I can print the filtered part
Oops, sorry for the typo:
awk '$1!~/^TTTTTTTTTTTTT/{print $1;next}{print $1>"filtered.txt"}' infile>outfile

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
-------------------------------------------
Is there any way I can print the filtered part
Oops, sorry for the typo:
awk '$1!~/^TTTTTTTTTTTTT/{print $1;next}{print $1>"filtered.txt"}' infile>outfile
---------------------------------------------

In fact, I meant to ask about printing what is filtered out by the
"awk 'index(substr($0,1,10),".") == 0" statement. I want to print the lines selected by awk 'index(substr($0, 1, 10), ".")==1' before I run:
awk 'index(substr($0,1,10),".") == 0 {print $0;}' InFile.txt

Thanks
 
awk 'index(substr($0,1,10),".")==0{print;next}{print > "filtered.txt"}' InFile.txt
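To check what the one-liner above does before running it on the full data, it can be tried on a few lines in a scratch directory. The sketch below is the same logic with the rejected-file name passed in through `-v`; the directory, the `rejected` variable, and `kept.txt` are placeholder names, not part of the original command.

```shell
# Work in a throwaway directory so nothing in the real data is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'GCGGAA.GATCATTA' 'GAGCGCTTAA.TC.AA.G' 'CGGTTGGGAAAAAAAAAA' > "$tmpdir/InFile.txt"

# Lines with no dot in the first 10 characters go to stdout (kept.txt);
# every other line is written to filtered.txt for inspection.
awk -v rejected="$tmpdir/filtered.txt" \
    'index(substr($0,1,10),".")==0 {print; next} {print > rejected}' \
    "$tmpdir/InFile.txt" > "$tmpdir/kept.txt"

kept_count=$(wc -l < "$tmpdir/kept.txt" | tr -d ' ')
filtered_count=$(wc -l < "$tmpdir/filtered.txt" | tr -d ' ')
echo "kept=$kept_count filtered=$filtered_count"
```

The two counts should add up to the number of input lines, which is a cheap sanity check that no line was lost between the two outputs.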

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 