Matching Keys in Two Files

Status
Not open for further replies.

biobrain

MIS
Jun 21, 2007
I have two different files.

The first file contains the following data:

1VYW, 2.3, A, B, C, D,
1W98, 2.15, A, B, PHOSPHOTHREONINE
1H4L, 2.65, A, B, D, E,
1GII, 2.00, A,
1JST, 2.6, A, B, C, D, PHOSPHORYLATED
1GII, 2.00, A,
1OIR, 1.91, A,

and the 2nd file contains the following data:

pdb|1VYW|A Chain A, Structure Of Cdk2CYCLIN A WITH PNU-292137 >g... 605 e-174
pdb|1W98|A Chain A, Human Cyclin-Dependent Kinase 2 >gi|4139570|... 605 e-174
pdb|1H4L|A Chain A, Cyclin A - Cyclin-Dependent Kinase 2 Complex... 605 e-174
pdb|1OIT|A Chain A, Imidazopyridines: A Potent And Selective Cla... 603 e-173
pdb|1PF8|A Chain A, Crystal Structure Of Human Cyclin-Dependent ... 603 e-173
pdb|1OIR|A Chain A, Imidazopyridines: A Potent And Selective Cla... 602 e-173
pdb|1OGU|A Chain A, Structure Of Human Thr160-Phospho Cdk2CYCLIN... 602 e-173
pdb|1H01|A Chain A, Cdk2 In Complex With A Disubstituted 2, 4-Bi... 602 e-173
pdb|1JST|A Chain A, Phosphorylated Cyclin-Dependent Kinase-2 Bou... 602 e-173


Here 1VYW, 1W98, and 1H4L are examples of PDB identifiers common to both files.

I want to read the 2nd file and, wherever an identifier from the 1st file matches an identifier in the 2nd, extract the last two fields (each matching [\S]+ at the end of the line, e.g. 602 e-173) and write them to a third file.

That is, after running the script, the 3rd file should look like this:

1VYW, 2.3, A, B, C, D, 605, e-174
1W98, 2.15, A, B, PHOSPHOTHREONINE, 605, e-174

Here 605, e-174 is obtained from the 2nd file for the corresponding identifier.
 
I know you keep asking a lot of questions, but normally your code is pretty close to what you want, so you should go ahead and post it.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
I have started on this.

I have broken my work into small steps.

Now, in

pdb|1VYW|A Chain A, Structure Of Cdk2CYCLIN A WITH PNU-292137 >g... 605 e-174

I want to first extract 1VYW and store it in a variable.

I have written some code, but it is not working:

Code:
if ($_=~/^pdb\/|(\....)\/|[\S\s]+/){
  

  print "$1";

}
 
I have also tried this

Code:
if ($_=~/^pdb\/|(\S)\/|[\S\s]+$/){
  print "$1";
}

and

Code:
if ($_=~/^pdb\/|(\S+)\/|[\S\s]+$/){
  print "$1";
}

but neither of these works either.
 
Try this to get "1VYW" into its own variable:

my $variable_you_want = (split('\|', $_))[1];
print $variable_you_want . "\n";
 
Code:
if ($_=/^pdb\/|/) {
my $variable= (split('\|', $_))[1];
print $variable. "\n";

I have tried this, and it is not working.
 
@test = ("pdb|1VYW|A Chain A, Structure Of Cdk2CYCLIN A WITH PNU-292137");
for (@test){
if (/^pdb\|(\S+)\|/) {
print $1;
}
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
I have to extract this from a big file, for example:

pdb|1VYW|A Chain A, Structure Of Cdk2CYCLIN A WITH PNU-292137 >g... 605 e-174
pdb|1W98|A Chain A, Human Cyclin-Dependent Kinase 2 >gi|4139570|... 605 e-174
pdb|1H4L|A Chain A, Cyclin A - Cyclin-Dependent Kinase 2 Complex... 605 e-174
pdb|1OIT|A Chain A, Imidazopyridines: A Potent And Selective Cla... 603 e-173
pdb|1PF8|A Chain A, Crystal Structure Of Human Cyclin-Dependent ... 603 e-173
pdb|1OIR|A Chain A, Imidazopyridines: A Potent And Selective Cla... 602 e-173
pdb|1OGU|A Chain A, Structure Of Human Thr160-Phospho Cdk2CYCLIN... 602 e-173
pdb|1H01|A Chain A, Cdk2 In Complex With A Disubstituted 2, 4-Bi... 602 e-173
pdb|1JST|A Chain A, Phosphorylated Cyclin-Dependent Kinase-2 Bou... 602 e-173
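
For a large file, one option is to read it line by line and pull out the identifier together with the last two fields in a single pattern. The sketch below is only an illustration, not code from this thread: the sub name parse_hit and the variable names are made up for the example, and it assumes every line of interest starts with pdb| and ends with two whitespace-separated fields.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (id, score, e-value) for a line such as
#   pdb|1VYW|A Chain A, ... 605 e-174
# or an empty list when the line does not have that shape.
sub parse_hit {
    my ($line) = @_;
    chomp $line;
    # \| is a literal pipe; [^|]+ captures everything up to the next pipe.
    # The two trailing (\S+) groups capture the last two fields on the line.
    if ($line =~ /^pdb\|([^|]+)\|.*\s(\S+)\s+(\S+)\s*$/) {
        return ($1, $2, $3);
    }
    return;
}

my @sample = (
    "pdb|1VYW|A Chain A, Structure Of Cdk2CYCLIN A WITH PNU-292137 >g... 605 e-174\n",
    "pdb|1W98|A Chain A, Human Cyclin-Dependent Kinase 2 >gi|4139570|... 605 e-174\n",
);

for my $line (@sample) {
    my @fields = parse_hit($line);
    next unless @fields;                 # skip lines that do not match
    my ($id, $score, $evalue) = @fields;
    print "$id $score $evalue\n";
}
```

With a real file, the same sub could be called inside a while (my $line = <$fh>) loop, so the whole file never has to sit in memory at once.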
 
travs69's will work. This, I think (because I have not tested it), is incorrect:

if ($_=/^pdb\/|/)

I believe you will need to escape the '|'. The match operator should also be =~ rather than =.

Try if ($_ =~ /^pdb\|/) and notice the "\|". That's what travs69 does. Sorry if I led you down the wrong path.
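
To make the difference concrete, here is a small sketch (the sub names loose_match and strict_match are invented for this example). In the first pattern, \/ is a literal slash and the bare | is alternation with an empty alternative, so the pattern matches any string at all. In the second, \| is a literal pipe, so only lines that really start with pdb| match.

```perl
use strict;
use warnings;

# Pattern from the failing attempt: "^pdb/" OR the empty string.
# The empty alternative matches everywhere, so this is true for any input.
# Note the binding operator =~ ; a single = would assign instead of match.
sub loose_match  { my ($s) = @_; return $s =~ /^pdb\/|/ ? 1 : 0; }

# Escaped pipe: the string must start with the literal text "pdb|".
sub strict_match { my ($s) = @_; return $s =~ /^pdb\|/ ? 1 : 0; }

for my $s ("pdb|1VYW|A Chain A", "something else") {
    printf "%-20s loose=%d strict=%d\n", $s, loose_match($s), strict_match($s);
}
```

The loose pattern reports a match for both strings, while the strict one accepts only the first.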

Thanks travs!

 
I have made my job a bit easier.

I have now extracted only the relevant information from the two files to keep things simple. The two files now contain:

file 1.
1VYW, 2.3, A, B, C, D,
1W98, 2.15, A, B, PHOSPHOTHREONINE
1FIN, 2.65, A, B, D, E,
1GII, 2.00, A,
1JST, 2.6, A, B, C, D, PHOSPHORYLATED
1GII, 2.00, A,
1OIR, 1.91, A,
1H1P, 2.1, A, B, C, D, PHOSPHOTHREONINE
1V0B, 2.2, A, B,
1OB3, 1.9, A, B,
1OGU, 2.6, A, B, C, D, PHOSPHOTHREONINE
2IW6, 2.3, A, B, C, D, PHOSPHOTHREONINE
1FIN, 2.3, A, B, C, D,
1FIN, 2.3, A, B, C, D,
1H01, 1.79, A,
1JST, 2.6, A, B, C, D, PHOSPHORYLATED
1V0B, 2.2, A, B,
1OB3, 1.9, A, B,
1mOW, 3.10, A, B,
1QMZ, 2.2, A, B, C, D, E, F, PHOSPHOTHREONINE
1JOW, 3.10, A, B,

and file 2

1VYW 605 e-174
1W98 605 e-174
1FIN 605 e-174
1OIT 603 e-173
1PF8 603 e-173
1OIR 602 e-173
1OGU 602 e-173
1H01 602 e-173
1JST 602 e-173
1QMZ 602 e-173
1H1P 602 e-173
1W98 601 e-173
1GZ8 601 e-173
2IW6 600 e-172
1E9H 599 e-172
1GII 598 e-172
2IW8 595 e-171
1V0B 362 e-101

Now I want to write a 3rd file with this as output:

1VYW, 2.3, A, B, C, D, 605 e-174
1W98, 2.15, A, B, PHOSPHOTHREONINE, 605 e-174
1FIN, 2.65, A, B, D, E, 605 e-174

etc.

That is, by matching keys such as 1VYW or 1W98 in both files and writing the complete information for each matching key to the 3rd file.
 
No problem cfmartin2000.. normally I would just split the data (like what you were doing) but biobrain has been using regexp for everything else so I figured he would just want to stick with that.

Here is one way.. I'm sure there's a better one.
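# assuming you read file1 into @file1, and file2 into @file2

for $line (@file1) {
    @tmp  = split /\,/, $line;          # $tmp[0] is the key, e.g. 1VYW
    @tmp2 = grep /$tmp[0]/, @file2;     # lines of file2 containing that key
    @tmp3 = split /\s+/, $tmp2[0], 2;   # first match: the key, then the rest

    print "$line $tmp3[1]\n";
}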

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
I have tried this, but it is not working.

Code:
$output= "output.txt";
open (FILE, "$output");
@file1= <FILE>;
print @file1;

$out= "out.txt";
open (FIL, "$out");
@file2= <FIL>;
print @file2;

for $line (@file1) {
 @tmp = split /\,/, $line;
 @tmp2 = grep/$tmp[0]/, @file2;
 @tmp3 = split /\s+/, $tmp2[0],2;
 
 print "$line $tmp3[1]\n";
}
 
output.txt file

1VYW, 2.3, A, B, C, D,
1W98, 2.15, A, B, PHOSPHOTHREONINE
1FIN, 2.65, A, B, D, E,
1GII, 2.00, A,
1JST, 2.6, A, B, C, D, PHOSPHORYLATED
1GII, 2.00, A,

out.txt file

1VYW 605 e-174
1W98 605 e-174
1FIN 605 e-174
1OIT 603 e-173
1PF8 603 e-173
1OIR 602 e-173
1OGU 602 e-173

Here is pseudocode for what I want. I have tried to write actual code but am running into problems.

1. Split each line of output.txt into an array @temp, so that the four-character key is in $temp[0].

2. Split each line of out.txt into an array @temp2, so that the four-character key is in $temp2[0].

3. Now, for every pair where $temp[0] eq $temp2[0],

print $temp[all values, i.e. 0, 1, 2, 3, ...] $temp2[all values other than 0, i.e. 1, 2, 3, ...]

so that the result for 1VYW matching 1VYW will be:

1VYW, 2.3, A, B, C, D, 605 e-174
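
The pseudocode above could be sketched with a hash keyed on the identifier, so the second file is scanned only once. This is just one possible approach, not the code posted elsewhere in this thread. The sub name merge_by_key is made up, and the sketch assumes that the key is the first field in both files and that, if a key is repeated in the second file, the first occurrence wins.

```perl
use strict;
use warnings;

# Hypothetical helper: takes references to the lines of both files and
# returns the lines of the first file, each with the matching values from
# the second file appended. Lines without a match are returned unchanged.
sub merge_by_key {
    my ($file1, $file2) = @_;

    # Step 2 of the pseudocode: index the second file by its key
    # (the first whitespace-separated field).
    my %values_for;
    for my $line (@$file2) {
        chomp(my $copy = $line);
        my ($key, $rest) = split ' ', $copy, 2;
        next unless defined $key;
        # Assumption: keep the first occurrence if a key is repeated.
        $values_for{$key} = $rest // '' unless exists $values_for{$key};
    }

    # Steps 1 and 3: take the key of each line in the first file and
    # append the stored values when the key is known.
    my @merged;
    for my $line (@$file1) {
        chomp(my $copy = $line);
        my ($key) = split /\s*,\s*/, $copy;
        if (defined $key && exists $values_for{$key}) {
            push @merged, "$copy $values_for{$key}";
        }
        else {
            push @merged, $copy;
        }
    }
    return @merged;
}

my @file1 = ("1VYW, 2.3, A, B, C, D,\n", "1GII, 2.00, A,\n");
my @file2 = ("1VYW 605 e-174\n", "1OIT 603 e-173\n");
print "$_\n" for merge_by_key(\@file1, \@file2);
```

The hash lookup compares keys as whole strings, so a key only matches an identical key and never a substring of some other field.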
 
You need to chomp your arrays..
Code:
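$output= "output.txt";
open (FILE, '<', $output) or die "Cannot open $output: $!";
@file1= <FILE>;
chomp @file1;   # remove the trailing newline from every line
#print @file1;

$out= "out.txt";
open (FIL, '<', $out) or die "Cannot open $out: $!";
@file2= <FIL>;
chomp @file2;
#print @file2;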

for $line (@file1) {
 @tmp = split /\,/, $line;
 @tmp2 = grep/$tmp[0]/, @file2;
 @tmp3 = split /\s+/, $tmp2[0],2;
 
 print "$line $tmp3[1]\n";
}
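
The reason chomp matters here: each line read from a file handle still ends in a newline, so when a value is appended after the line, it lands at the start of the following line instead of at the end of the current one. A minimal sketch of the effect (the sub name append_value is invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: joins a record and a value with a single space.
sub append_value {
    my ($line, $value) = @_;
    return "$line $value\n";
}

my $raw = "1VYW, 2.3, A, B, C, D,\n";   # as read from a file handle
chomp(my $clean = $raw);                # same line, trailing newline removed

print "without chomp:\n", append_value($raw,   "605 e-174");
print "with chomp:\n",    append_value($clean, "605 e-174");
```

Without chomp the value is printed on its own line with a leading space; with chomp it follows the record on the same line.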


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
btw it outputs


1VYW, 2.3, A, B, C, D, 605 e-174
1W98, 2.15, A, B, PHOSPHOTHREONINE 605 e-174
1FIN, 2.65, A, B, D, E, 605 e-174
1GII, 2.00, A,
1JST, 2.6, A, B, C, D, PHOSPHORYLATED
1GII, 2.00, A,



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
what do you do with duplicate "keys"?

1FIN, 2.65, A, B, D, E,
1GII, 2.00, A,
1JST, 2.6, A, B, C, D, PHOSPHORYLATED
1GII, 2.00, A,
1OIR, 1.91, A,
1H1P, 2.1, A, B, C, D, PHOSPHOTHREONINE
1V0B, 2.2, A, B,
1OB3, 1.9, A, B,
1OGU, 2.6, A, B, C, D, PHOSPHOTHREONINE
2IW6, 2.3, A, B, C, D, PHOSPHOTHREONINE
1FIN, 2.3, A, B, C, D,
1FIN, 2.3, A, B, C, D,
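
One way to check this on the real files is to count how often each key appears before merging. The sketch below is only an illustration (the sub name duplicate_keys is made up), and it assumes the key is everything before the first comma.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the keys that occur more than once,
# in order of first appearance.
sub duplicate_keys {
    my @lines = @_;
    my (%count, @order);
    for my $line (@lines) {
        my ($key) = split /\s*,\s*/, $line;
        next unless defined $key && length $key;
        # Remember the key the first time it is seen, and count every sighting.
        push @order, $key unless $count{$key}++;
    }
    return grep { $count{$_} > 1 } @order;
}

my @file1 = (
    "1FIN, 2.65, A, B, D, E,",
    "1GII, 2.00, A,",
    "1JST, 2.6, A, B, C, D, PHOSPHORYLATED",
    "1GII, 2.00, A,",
    "1FIN, 2.3, A, B, C, D,",
);
print "duplicate key: $_\n" for duplicate_keys(@file1);
```

Whether a repeated key should be merged, skipped, or reported as an error depends on the data, so the sketch only lists the repeats.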




------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
This may be a typo on my part. In the actual files I think there will not be any duplicate keys. However, let me put all of our code together and compile it into a package. If there is a real problem with duplicate keys, I will discuss it here. I started Perl just a few days ago, and this forum is a great help to me. Many thanks to all of you.
 