Hi folks,
I've got one of those "merge files" questions. What I have is literally thousands of files, each containing a text string (let's call it "XXXXXXXXXXXXXXX", "YYYYYYY", etc.). The string may be on one or more lines in the file. Each file has a different file name, and all files are...
Hi PHV,
I've just tried your suggestion but the script doesn't run. Couldn't work out what the problem was. But basically, I have two files. fileA contains the lines:
>TA0001
>TA0002
>TA0006
>TA0008
>TA0012
fileB looks like this:
>TA0001 some other information
GATAGGATTAGATCGATGATGATAGAGA...
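For anyone reading along, here is a minimal sketch of one way this could be done in awk. It is not PHV's script. It assumes the goal is to print every record in fileB whose header ID appears in fileA, that fileA holds one ">ID" per line, and that each fileB record is a ">" header line (ID in the first field) followed by one or more sequence lines. The function name extract_records is made up for illustration.

```shell
# Sketch only: assumes fileA lists one ">ID" per line and fileB is
# FASTA-like (a ">" header line followed by sequence lines).
extract_records() {
    # $1 = ID list (fileA), $2 = records (fileB)
    awk '
        NR == FNR { want[$1]; next }       # first file: remember each ID
        /^>/      { keep = ($1 in want) }  # header line: is this ID wanted?
        keep                               # print header and the lines below it
    ' "$1" "$2"
}
```

It would be called as `extract_records fileA fileB > matches.txt`. The `keep` flag stays set until the next header line, which is what carries the sequence lines along with their header.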
I'm using gawk and I already have the scripts to compare the two files to pull out the common field. The example that I have posted in the document file shows that file2.txt has additional rows of information below the common field. It is this additional information that I am trying to capture...
iribach
I assume you asked me what language I speak ... only English unfortunately.
I can compare the two files and pull out the common item, but I don't know how to pull out the extra information from file2. I'm on UNIX.
Thanks
iribach
Thanks for the very quick response and your confidence in my abilities ... but I'm afraid I'm not that far advanced to take the suggestion and put it into a workable script :-(
Hi,
I'm trying to do the usual "compare two files and pull out the common field", but in addition to that, I want some extra information from file two to be extracted as well. I've tried a few different approaches, but I think it's time to ask the experts.
I've placed an example of the two files at...
Hi PHV,
Thanks for the suggestion. It mostly worked well, but I just noticed one important thing (for me anyway). In cases where the Query= string is not followed by a line starting with >gnl, the Query= header is not printed, i.e. if I have this situation:
Query=...
Hi all,
I have the following input file:
START
Query= TaBa0-000001
>gnl|UG|Ta#S13248438 ug=Ta.24021
>gnl|UG|Ta#S17880947 ug=Ta.8115
>gnl|UG|Ta#S17984398 ug=Ta.29549
Query= TaBa0-000002
>gnl|UG|Ta#S17988614 ug=Ta.30772
>gnl|UG|Ta#S16202639 ug=Ta.25603
>gnl|UG|Ta#S13134499 ug=Ta.16563...
Hi,
I know this is really simple, but it's been a while since I've written anything in awk. Basically, I have a file that has the characters "gi=" followed by a number. These strings are in no particular column in the file, hence I can't just tell the script to print a particular column. If...
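A minimal sketch of one possible approach, assuming the aim is simply to print every "gi=" string and its number, one per output line, wherever it sits in the input. The function name extract_gi is hypothetical. It uses only match(), RSTART and RLENGTH, so it should behave the same in gawk and other POSIX awks.

```shell
# Sketch only: prints each "gi=<digits>" occurrence on its own line.
extract_gi() {
    awk '{
        line = $0
        # match() sets RSTART/RLENGTH for the leftmost match in line
        while (match(line, /gi=[0-9]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)  # keep scanning after the match
        }
    }' "$@"
}
```

It can be run as `extract_gi yourfile` or at the end of a pipe. The loop is there because a line may contain more than one "gi=" string. A plain `if` would only catch the first.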
Hi all,
I'm still struggling with Gawk and wouldn't mind some help on a 'difficult' (to me anyway) problem. I have a huge (thousands of pages) file that looks something like this:
Query=
>gi12345
Query:
Sbjct:
Query:
Sbjct:
>gi67859
etc
etc
Query=
>gi98765
What I'd like to do is compare...