Hi,
I have a rather complicated awk problem here, at least for me. I have two files.
File 1:
607 687 174 0 0 chr1 3000001 3000156 -194195276 - L1_Mur2 LINE L1 -4310 1567 1413 1
607 917 214 114 45 chr1 3000237 3000733 -194194699 - L1_Mur2 LINE L1 -4488 1389 913 1
607 215 31 0 30 chr1 3000733 3000766 -194194666 + (TTTG)n Simple_repeat Simple_repeat 2 33 0 2
607 845 233 76 114 chr1 3000766 3000792 -194194640 - L1_Mur2 LINE L1 -6816 912 887 1
607 621 250 65 37 chr1 3001287 3001583 -194193849 - Lx9 LINE L1 -1596 6048 5742 3
607 1320 197 332 7 chr1 3001722 3002005 -194193427 - RLTR25A LTR ERVK 0 1028 625 4
File 2:
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1 3000072 TTTATCGTCATCGTC
28|3721 + gi|149352351|ref|NC_000069.5|NC_000069 chr3 154935392 GAGTTTTACAGTCCA
28|3721 + gi|149288852|ref|NC_000067.5|NC_000067 chr1 152633707 GAGTTTTACAGTCCA
28|3721 + gi|149361432|ref|NC_000073.5|NC_000073 chr7 86595415 GAGTTTTACAGTCCA
34|3145 - gi|149321426|ref|NC_000084.5|NC_000084 chr18 43464724 ACGGCTTACGA
34|3145 - gi|149354224|ref|NC_000071.5|NC_000071 chr5 37676290 ACGGCTTACGA
If field 6 of file 1 is the same as field 4 of file 2, then check whether field 5 of file 2 lies within the range given by fields 7 and 8 of file 1. If it does, write the line from file 2, with fields 11, 12 and 13 of file 1 appended, to a separate file. Whew!
OK, for example: field 4 of file 2 (chr1) is the same as field 6 of file 1. Then check whether field 5 of file 2 (3000072, which is always a number) lies in the range of fields 7 and 8 (3000001 3000156) of file 1. It does, so I need the output (the line from file 2 plus fields 11, 12 and 13 of file 1) in a separate file as:
4|17999 - gi|149361523|ref|NC_000074.5|NC_000074 chr1 3000072 TTTATCGTCATCGTC L1_Mur2 LINE L1
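For what it's worth, the matching rule described above could be sketched roughly like this in awk. It is only a sketch under a few assumptions: both files are whitespace-separated, the range bounds are inclusive, and file 1 is small enough to hold in memory. The function name annotate_hits is made up for illustration.

```shell
# Hypothetical helper: annotate_hits FILE1 FILE2
# Prints each line of FILE2 whose position falls inside a FILE1 range on the
# same chromosome, followed by fields 11, 12 and 13 of that FILE1 line.
annotate_hits() {
    awk '
        # First file: remember every range and its annotation, per chromosome.
        NR == FNR {
            n = ++count[$6]
            start[$6, n] = $7
            end[$6, n]   = $8
            annot[$6, n] = $11 " " $12 " " $13
            next
        }
        # Second file: compare the position with every range stored for the
        # same chromosome (bounds assumed inclusive; +0 forces numeric compare).
        {
            for (i = 1; i <= count[$4]; i++)
                if ($5 + 0 >= start[$4, i] + 0 && $5 + 0 <= end[$4, i] + 0)
                    print $0, annot[$4, i]
        }
    ' "$1" "$2"
}
```

It could then be called as `annotate_hits file1.txt file2.txt > matches.txt` (file names are placeholders). Note that a position sitting exactly on a shared boundary, such as 3000733 in the sample, would match two ranges and be printed twice; and since every range on the chromosome is scanned for each line of file 2, very large inputs might need a sorted or indexed approach instead.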
Thank you very much in advance