I have a 'master' file whose records, in this example, are 20 bytes long. The data looks like this:
11111TestOne0005.10
22222TestTwo0006.50
The key in this file is in columns 1 to 12, so 11111TestOne and 22222TestTwo are the keys. (In reality, this file will be larger: approximately 7,000 records at 450 bytes each.)
I have a second file, which I need to search for those keys. (In reality, this file will be close to 1 million records at 600 bytes each.)
The data looks like this:
9999999999999999999911111TestOneXXXX
999999999999999999999999999999999999
The key in this file is in columns 21 to 32.
The process should produce a NEW file containing the records from the second file whose key matches a key in the master file. In this example, the new file should look like this:
9999999999999999999911111TestOneXXXX
Here is my code for this example:
key=`cut -c1-12 masterfile`
for i in $key
do
awk /"$i"/ second_file >> new_second_file
done
When I test on a 'small' sample of data, it works fine. BUT when I test it against the real data (1 million records), the process runs for a long time: after 90 minutes I cancelled the job, so I actually don't know whether it is working. Probably not.
Can someone suggest another approach? Thanks
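One possible approach, offered as a sketch rather than a tested solution for the real data: the loop above starts awk once per key, so with about 7,000 keys the 1-million-record file is read about 7,000 times. A single awk run can instead load the master keys into an array and read the second file only once. The version below assumes the key positions stated above (columns 1 to 12 in the master file, columns 21 to 32 in the second file) and reuses the file names from the original script. Unlike the original, it compares the key only at those columns instead of matching it anywhere in the record, and it overwrites the output file rather than appending to it.

```shell
# Sample data from the post (file names match the original script).
printf '%s\n' '11111TestOne0005.10' '22222TestTwo0006.50' > masterfile
printf '%s\n' '9999999999999999999911111TestOneXXXX' \
              '999999999999999999999999999999999999' > second_file

# First file (NR == FNR): remember the 12-character key in columns 1-12.
# Second file: print each record whose columns 21-32 are a remembered key.
awk 'NR == FNR { keys[substr($0, 1, 12)]; next }
     substr($0, 21, 12) in keys' masterfile second_file > new_second_file

cat new_second_file
```

With the sample data this should leave only the record containing 11111TestOne in new_second_file. The array holds just the keys (about 7,000 short strings for the real master file), so memory use should stay small.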