Challenging Awk Array problem

Status
Not open for further replies.

Scifi9547

Technical User
May 6, 2010
10
US
Hi,

I have a rather complicated awk problem here, at least for me. I have two files.

File 1:

607 687 174 0 0 chr1 3000001 3000156 -194195276 - L1_Mur2 LINE L1 -4310 1567 1413 1
607 917 214 114 45 chr1 3000237 3000733 -194194699 - L1_Mur2 LINE L1 -4488 1389 913 1
607 215 31 0 30 chr1 3000733 3000766 -194194666 + (TTTG)n Simple_repeat Simple_repeat 2 33 0 2
607 845 233 76 114 chr1 3000766 3000792 -194194640 - L1_Mur2 LINE L1 -6816 912 887 1
607 621 250 65 37 chr1 3001287 3001583 -194193849 - Lx9 LINE L1 -1596 6048 5742 3
607 1320 197 332 7 chr1 3001722 3002005 -194193427 - RLTR25A LTR ERVK 0 1028 625 4


File 2:
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1 3000072 TTTATCGTCATCGTC
28|3721 + gi|149352351|ref|NC_000069.5|NC_000069 chr3 154935392 GAGTTTTACAGTCCA
28|3721 + gi|149288852|ref|NC_000067.5|NC_000067 chr1 152633707 GAGTTTTACAGTCCA
28|3721 + gi|149361432|ref|NC_000073.5|NC_000073 chr7 86595415 GAGTTTTACAGTCCA
34|3145 - gi|149321426|ref|NC_000084.5|NC_000084 chr18 43464724 ACGGCTTACGA
34|3145 - gi|149354224|ref|NC_000071.5|NC_000071 chr5 37676290 ACGGCTTACGA

If field 6 of file 1 is the same as field 4 of file 2, then check whether field 5 of file 2 lies within the range given by fields 7 and 8 of file 1. If it does, write the line from file 2, followed by fields 11, 12 and 13 of file 1, to a separate file. Whew!

OK, for example: field 4 of file 2 (chr1) is the same as field 6 of file 1. Then check whether field 5 of file 2 (3000072, which is always a number) lies in the range given by fields 7 and 8 of file 1 (3000001 to 3000156). It does, so I need the following output (the line from file 2 plus fields 11, 12 and 13 of file 1) in a separate file:

4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1 3000072 TTTATCGTCATCGTC L1_Mur2 LINE L1

Thank you very much in advance
 
What have you tried so far, and where are you stuck?

Search this forum for solutions containing FNR==NR (or NR==FNR) for some tips on how to process two input files like that by loading the first one into an array, and processing the second one with reference to that array.
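
For what it's worth, here is a minimal sketch of that NR==FNR approach applied to the two files described above. It is only one possible way to do it, and it rests on a few assumptions that are not stated in the question: both files are whitespace-separated, the range in fields 7 and 8 is inclusive at both ends, file 1 fits in memory, and a line from file 2 should be printed once for every interval it falls into. The function name match_ranges and the array names count, lo, hi and note are made up for illustration.

```shell
# Hypothetical helper: match_ranges FILE1 FILE2
# Prints each FILE2 line whose position ($5) falls inside a FILE1 interval
# on the same chromosome, followed by fields 11-13 of that FILE1 line.
match_ranges() {
    awk '
    NR == FNR {                        # true only while reading the first file
        n = ++count[$6]                # number of intervals seen for this chromosome
        lo[$6, n]   = $7               # interval start
        hi[$6, n]   = $8               # interval end
        note[$6, n] = $11 " " $12 " " $13
        next                           # do not fall through to the file 2 rule
    }
    {                                  # second file: scan intervals stored for $4
        for (i = 1; i <= count[$4]; i++)
            # "+ 0" forces a numeric comparison; inclusive range is an assumption
            if ($5 + 0 >= lo[$4, i] + 0 && $5 + 0 <= hi[$4, i] + 0)
                print $0, note[$4, i]
    }
    ' "$1" "$2"
}
```

It would be called as match_ranges file1 file2 > outfile, where the three file names are placeholders. The inner loop is a plain linear scan over every interval stored for the chromosome, which is fine for small inputs but may be slow on whole-genome annotation files; sorting the intervals and using a binary search would be one way to speed it up.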

Annihilannic.
 