Urgent Help Needed. Merging lines in .txt file

Status
Not open for further replies.

awknerd

Programmer
Jul 4, 2008
2
US
I need to write a script that reads an input .txt file and merges every line whose distance is <=4000 into the line before it: once the script finds a distance <=4000, it needs to replace the end of the previous line with the end of the current line. The first label line below is not actually in the input. The distance column is the gap between the end of the previous line and the start of the current line. In the example below, 3217 is the distance from the end of the first line to the start of the second line, and 14021 is the distance from the preceding line (not included) to the start of the first line.

Any help would be greatly appreciated! Thanks!

INPUT:

chrm start end block length distance
chr7 27398704 27399096 ENm010Block536 392 14021
chr7 27402314 27402466 ENm010Block537 152 3217
chr7 27412536 27412726 ENm010Block538 190 10069
chr7 27416032 27416424 ENm010Block539 392 3305
chr7 27420022 27420972 ENm010Block540 950 3597

Desired OUTPUT:

chr7 27398704 27402466
chr7 27412536 27420972
 
What have you tried so far and where in your code are you stuck ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Well, here's some code I've used for the following input, using distance <=1000 to merge.

INPUT: Use distance <=1000 to merge

chr7 27104483 27104633 ENm010Block71 150 0
chr7 27104634 27104812 ENm010Block72 178 0
chr7 27104813 27105154 ENm010Block73 341 0
chr7 27106872 27106977 ENm010Block74 105 1717
chr7 27106978 27107481 ENm010Block75 503 0
chr7 27107482 27108156 ENm010Block76 674 0
chr7 27108157 27108194 ENm010Block77 37 0
chr7 27108422 27108700 ENm010Block78 278 227
chr7 27109258 27109365 ENm010Block79 107 557
chr7 27109366 27109431 ENm010Block80 65 0
chr7 27109432 27110017 ENm010Block81 585 0
chr7 27110018 27110056 ENm010Block82 38 0
chr7 27110057 27110309 ENm010Block83 252 0
chr7 27110310 27110435 ENm010Block84 125 0
chr7 27110436 27110489 ENm010Block85 53 0
chr7 27110490 27110550 ENm010Block86 60 0
chr7 27110551 27110789 ENm010Block87 238 0
chr7 27111956 27112348 ENm010Block88 392 1166
chr7 27112374 27112830 ENm010Block89 456 25
chr7 27114388 27114881 ENm010Block90 493 1557
chr7 27114882 27115338 ENm010Block91 456 0
chr7 27115339 27115870 ENm010Block92 531 0
chr7 27116098 27116173 ENm010Block93 75 227
chr7 27116174 27116705 ENm010Block94 531 0
chr7 27116706 27116755 ENm010Block95 49 0
chr7 27116756 27116781 ENm010Block96 25 0
chr7 27116782 27116945 ENm010Block97 163 0
chr7 27116946 27117276 ENm010Block98 330 0
chr7 27117277 27117960 ENm010Block99 683 0
chr7 27118910 27119137 ENm010Block100 227 949
chr7 27119138 27119213 ENm010Block101 75 0
chr7 27119214 27119365 ENm010Block102 151 0
chr7 27119366 27119783 ENm010Block103 417 0
chr7 27119784 27119822 ENm010Block104 38 0
chr7 27119823 27119948 ENm010Block105 125 0
chr7 27119949 27119985 ENm010Block106 36 0
chr7 27119986 27120353 ENm010Block107 367 0
chr7 27120354 27120430 ENm010Block108 76 0
chr7 27120431 27120734 ENm010Block109 303 0
chr7 27120735 27120784 ENm010Block110 49 0
chr7 27120785 27121113 ENm010Block111 328 0
chr7 27121114 27121886 ENm010Block112 772 0
chr7 27121887 27121912 ENm010Block113 25 0
chr7 27121950 27122139 ENm010Block114 189 37
chr7 27122140 27122368 ENm010Block115 228 0
chr7 27122369 27122596 ENm010Block116 227 0
chr7 27123470 27123811 ENm010Block117 341 873
chr7 27123812 27124306 ENm010Block118 494 0
chr7 27124307 27125180 ENm010Block119 873 0
chr7 27126966 27127320 ENm010Block120 354 1785
chr7 27127612 27127725 ENm010Block121 113 291
chr7 27127726 27128410 ENm010Block122 684 0
chr7 27128411 27129055 ENm010Block123 644 0
chr7 27129056 27129182 ENm010Block124 126 0
chr7 27129183 27129550 ENm010Block125 367 0
chr7 27130006 27130043 ENm010Block126 37 455
chr7 27130044 27130880 ENm010Block127 836 0
chr7 27130881 27131260 ENm010Block128 379 0
chr7 27135440 27135630 ENm010Block129 190 4179
chr7 27136554 27136807 ENm010Block130 253 923
chr7 27136808 27136820 ENm010Block131 12 0
chr7 27136821 27136845 ENm010Block132 24 0
chr7 27136846 27136895 ENm010Block133 49 0
chr7 27136896 27137035 ENm010Block134 139 0
chr7 27137036 27137071 ENm010Block135 35 0
chr7 27137072 27137237 ENm010Block136 165 0
chr7 27137238 27137580 ENm010Block137 342 0
chr7 27137581 27137618 ENm010Block138 37 0
chr7 27137619 27137796 ENm010Block139 177 0


DESIRED OUTPUT:

chr7 27104483 27105154
chr7 27106872 27110789
chr7 27111956 27112830
chr7 27114388 27125180
chr7 27126966 27131260
chr7 27135440 27137796

The code so far:

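awk '
# Start a new region on the first line, or when the distance is over the threshold.
# Strict > is used so that a distance of exactly 1000 still merges (distance <=1000).
NR == 1 || $NF > 1000 {
    if (NR > 1) print start, end   # flush the region that just closed
    start = $1 FS $2
}
{ end = $3 }                       # the latest line always supplies the end
END { if (NR) print start, end }   # flush the final region
' file
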
It's not working, and I've tried lots of changes, but I'm really lost here and in need of help. If anyone can help, that would be great. Thanks!
 
When I run your code against that data it produces exactly what you say is the desired output... so I'm confused! What erroneous output are you getting from your script?
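For what it's worth, here is a minimal sketch of the same logic with the threshold passed in as a variable, so one script covers both the <=1000 and the <=4000 case. The function name merge_regions and the awk variable max are my own placeholders, not anything from the original post. It compares with a strict > so that a distance exactly equal to the threshold still merges, which is how I read the "<=" rule in the question.

```shell
#!/bin/sh
# merge_regions THRESHOLD   (hypothetical helper name, for illustration only)
# Reads rows of "chrm start end block length distance" on stdin and prints
# one "chrm start end" line per merged region.
merge_regions() {
    awk -v max="$1" '
        # A row opens a new region if it is the first row
        # or its distance (last field) is greater than max.
        NR == 1 || $NF > max {
            if (NR > 1) print start, end   # flush the region that just closed
            start = $1 FS $2
        }
        { end = $3 }                       # the latest row always supplies the end
        END { if (NR) print start, end }   # flush the final region
    '
}

# The five-line sample from the first post, merged at distance <=4000.
merge_regions 4000 <<'EOF'
chr7 27398704 27399096 ENm010Block536 392 14021
chr7 27402314 27402466 ENm010Block537 152 3217
chr7 27412536 27412726 ENm010Block538 190 10069
chr7 27416032 27416424 ENm010Block539 392 3305
chr7 27420022 27420972 ENm010Block540 950 3597
EOF
```

On that sample this should print the two lines given as the desired output in the first post. If your own run differs, posting the actual output next to the expected output would make it much easier to see where the two diverge.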

Annihilannic.
 