substituting fields in one file with same fields from another

Status
Not open for further replies.

starlite79

Technical User
Aug 15, 2008
89
US
Hi everyone.

I'm fairly new to AWK, but from what I've read I think it can do some powerful things if I know what I'm doing.

I would like to be able to substitute two fields in one file (a Fortran program I wrote) with the same fields from a C program that is very similar.

Here is one line from each file:

C file line:
vvd(test(-0.1), 6.183185307179586477, 1e-12, "test", "a", status);

Fortran file line:
CALL VVD ( test ( -0.1D0 ), 6.183185307179587D0,
: 1D-12, 'test', 'A', STATUS )

I began by writing a simple awk program to make the field separator a comma and print the second field, like so:

BEGIN { FS = "," # make comma the field separator
}
$1 ~ /VVD/ { print $2
}

The C file has records that take only one line. The Fortran program, on the other hand, has records that span multiple lines (marked by the continuation character :) to conform to the 72-character limit my colleague requires. I am only concerned with the second and third fields, and need to know how to make ; the record separator for one file and something else (maybe a newline?) for the other.

Any help would be very much appreciated.
 
I was wondering if there is a way to handle function calls like this one:

C file line:
vvd(test(-0.1), 6.183185307179586477, 1e-12, "test", "a", status);

Fortran file line:
CALL VVD ( test ( -0.1D0 ), 6.183185307179587D0,
: 1D-12, 'test', 'A', STATUS )

In this case, there are inner parentheses, which are also treated as field separators when they shouldn't be.

Thanks again for all of your help Annihilannic.
 
Geez, almost need to rewrite the whole thing to accommodate that! :) Because the test() occurs in the field we use to index the arrays it means a lot of data "massaging" to make it look like the C version so that we can look up the new values for the precision and tolerance...

Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split it up by commas
        split($0,a,", *")
        ind=tolower(a[1])
        # strip off the function name and bracket
        sub(".*vvd\(","",ind)
        prec[ind]=a[2]
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split it up by commas
        n=split($0,a,", *")
        ind=a[1]
        # remove the function call
        sub(".*CALL VVD \(","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", *")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

Annihilannic.
 
Hi, thanks for the new code. I tried to run it, and I get a warning and a fatal error:

awk: New_transfer:13: warning: escape sequence `\(' treated as plain `('
awk: New_transfer:13: (FILENAME=testc.data FNR=1) fatal: Unmatched ( or \:) /.*vvd(/

I tried it as is since I don't think I need any begin statements to be done before any of the files are read.

 
Correct, this version doesn't need a BEGIN clause any more.

Hmm... what OS and version of awk are we looking at here? Try:

Code:
        sub(".*vvd[(]","",ind)
...
        sub(".*CALL VVD [(]","",ind)
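A likely explanation for the warning, offered as an assumption rather than a confirmed diagnosis: inside a quoted string awk processes backslash escapes before the regex is compiled, so "\(" reaches sub() as a bare "(". The sketch below runs three spellings that avoid the problem on the C test line from earlier in the thread; the shell variable names are only for illustration.

```shell
line='vvd(test(-0.1), 6.183185307179586477, 1e-12, "test", "a", status);'

# 1. bracket expression, as in the fix above
out1=$(printf '%s\n' "$line" | awk '{ sub(".*vvd[(]", ""); print }')
# 2. doubled backslash, so that one backslash survives string processing
out2=$(printf '%s\n' "$line" | awk '{ sub(".*vvd\\(", ""); print }')
# 3. regex literal, where a single backslash is enough
out3=$(printf '%s\n' "$line" | awk '{ sub(/.*vvd\(/, ""); print }')

printf '%s\n' "$out1" "$out2" "$out3"
```

All three should print the line with the leading "vvd(" removed.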

Annihilannic.
 
No errors now but I get blanks in the updated file where the precision and tolerance fields should be...

My OS is RHEL 5.2 and I have gawk-3.1.5-14.
 
Seems to be working fine for me on the nearest system I could find (RHEL 4.6 / gawk 3.1.3) using the same test data I posted earlier. Anything different?

Annihilannic.
 
I am using the same test data that I wrote a few posts ago. I can't find anything different with the code. Here is what I have now which is not working.

Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split it up by commas
        split($0,a,", *")
        ind=tolower(a[1])
        # strip off the function name and bracket
#        sub(".*vvd\(","",ind)
        sub(".*vvd[(]","",ind)
        prec[ind]=a[2]
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split it up by commas
        n=split($0,a,", *")
        ind=a[1]
        # remove the function call
#        sub(".*CALL VVD \(","",ind)
        sub(".*CALL VVD [(]","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", *")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }
 
I just tried that code with the single lines of test data you posted on 29 Sep 08 14:48 and it seems to work perfectly (on HP-UX). I also tried it on Linux with gawk and that's fine too.

Can you post the output of cat -vet file.c and cat -vet file.fortran for the data that does not work, just to check for any unexpected characters?

Annihilannic.
 
Hi Annihilannic,

Here are the results when I use cat -vet on the files:

cat -vet testc.data
vvd(testAnp(-0.1), 6.183185307179586477, 1e-12, "testAnp", "a", status);$

cat -vet testf.data
CALL VVD ( test_ANP ( -0.1D0 ), 6.183185307179587D0,$
: 1D-12, 'test_ANP', 'A', STATUS )$

I have "$" at the end of line.
 
The $s are expected, they just mark the ends of the lines in cat -vet output. Looks like there are no funny characters in there.

Since we are using the first field of the function call as the index to identify the lines into which to substitute the changed parameters... it's no surprise that it's not working for that case, because even after removing spaces and decimal notation, and converting to lowercase, testanp(-0.1) is not equal to test_anp(-0.1).

Maybe you need to remove underscores too...?
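To make the matching rule concrete, here is a minimal sketch of reducing both spellings to one key. The function name norm is made up for this illustration, and it assumes "d0" never appears inside a routine name. It uses gsub() rather than sub() so that every underscore is removed, not just the first.

```shell
normalize='
function norm(s) {
    s = tolower(s)
    gsub(/[ _]/, "", s)     # drop spaces and underscores (all of them, hence gsub)
    sub(/d0/, "", s)        # drop the Fortran double-precision suffix
    return s
}
{ print norm($0) }'

out=$(printf '%s\n' 'testAnp(-0.1)' 'test_ANP ( -0.1D0 )' | awk "$normalize")
printf '%s\n' "$out"
```

Both input lines should come out as the same key, which is what the prec[] and tol[] lookups rely on.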

Annihilannic.
 
Removing the underscores did the trick! One tiny thing that did not print was the "D0" for the new Fortran precision. Everything else printed as it should with the precision and tolerance replaced by the C values. This is minor but I think it is needed in my code.

Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split it up by commas
        split($0,a,", *")
        ind=tolower(a[1])
        # strip off the function name and bracket
#        sub(".*vvd\(","",ind)
        sub(".*vvd[(]","",ind)
        prec[ind]=a[2]
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split it up by commas
        n=split($0,a,", *")
        ind=a[1]
        # remove the function call
#        sub(".*CALL VVD \(","",ind)
        sub(".*CALL VVD [(]","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        # remove underscore
        sub("_","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", *")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

The vvd/VVD function can also test matrix elements. In C, they are coded as "vvd(A[1][1], more args" and in Fortran they are coded as "CALL VVD ( A(1,1), more args".

Would I just need to add another line in the evaluate /^[ ]*vvd/ expression like this: sub(".*vvd[[]","",ind)?
 
This problem keeps throwing curveballs which completely invalidate the solutions proposed so far...

The fact that the FORTRAN array references contain commas means that we can no longer reliably split the line up by commas to perform substitution in the correct fields.

Are there any other scenarios we need to know about before we try and resolve this one?

Are you sure it's not possible to dump all of this data into a flat file and read that in for processing? It seems to me like it would be a more sensible approach than hacking it directly into source code.

Annihilannic.
 
Sorry about that. I see your point about not being able to use commas as field separators anymore.

I would have to say that it would not be possible to dump the values into flat files.

Maybe there is a way to tell awk that a character immediately followed by a comma immediately followed by a character is part of the field?

I promise there are no more curveballs!
 
That's a good idea - if the code is formatted tidily and always includes spaces after the comma where it is used as a function parameter separator that should work fine. Just change the '*' to '+' in the split() calls to make it match one-or-more spaces rather than zero-or-more.

You'll also need to do a couple more substitutions to make the C array syntax look like the FORTRAN syntax so that the indices match. i.e. replace "][" with ",", then "[" with "(" and "]" with ")".
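A minimal sketch of both ideas together, on two made-up lines (only the "vvd(A[1][1]," and "CALL VVD ( A(1,1)," fragments come from this thread; the remaining arguments are invented). It assumes every separating comma is followed by at least one space and that commas inside array references are not.

```shell
out=$(printf '%s\n' 'vvd(A[1][1], 2.5, 1e-12, "test", "a", status);' \
                    'CALL VVD ( A(1,1), 2.5D0, 1D-12, STATUS )' |
awk '{
    n = split($0, a, ", +")                 # comma followed by one or more spaces
    ind = a[1]
    sub(/.*[Vv][Vv][Dd] *[(] */, "", ind)   # strip the call prefix
    gsub(/\]\[/, ",", ind)                  # "][" becomes ","
    gsub(/\[/, "(", ind)                    # "["  becomes "("
    gsub(/\]/, ")", ind)                    # "]"  becomes ")"
    print n ": " ind
}')
printf '%s\n' "$out"
```

The number before the colon is the field count from split(); the comma inside A(1,1) is no longer treated as a separator, and both lines reduce to the same first field.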

Annihilannic.
 
Hello again.
I tried what you suggested on a larger piece of code that includes some array syntax. The updated FORTRAN file is okay until it gets to the array functions. Everything prints correctly but the prec and tol fields are empty.

I had to enclose the metacharacter as shown in my code.

Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split line up by commas
#        split($0,a,", *")
        split($0,a,", +")
        ind=tolower(a[1])
        # strip off the function name and bracket
        sub(".*vvd[(]","",ind)
        # match C array syntax with Fortran array syntax
        sub("[][]",",",ind)
        sub("[[]","(",ind)
        sub("[]]",")",ind)
        prec[ind]=a[2]
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split line up by commas
        n=split($0,a,", +")
        ind=a[1]
        # remove the function call
#        sub(".*CALL VVD \(","",ind)
        sub(".*CALL VVD [(]","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        # remove underscore
        sub("_","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", +")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

 
Add this line temporarily and you should see why it's not working:

Code:
        sub("[]]",")",ind)
        print "index is " ind
        prec[ind]=a[2]

[][] actually matches one of either closing or opening square brackets. You want to match both together, so you would either need to use []][[], or perhaps simpler, \]\[ (written as the regex literal /\]\[/, or as "\\]\\[" if it is inside a quoted string).
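A small demonstration of how these patterns behave on a made-up array reference. The pattern []][[] is a bracket expression matching "]" followed by one matching "[", and the quoted-string form needs doubled backslashes because awk processes string escapes before compiling the regex.

```shell
out=$(printf '%s\n' 'a[0][1]' | awk '{
    one = $0;   sub("[][]", ",", one)       # either bracket: hits the first "["
    two = $0;   sub("[]][[]", ",", two)     # "]" then "[": hits the "][" pair
    three = $0; sub("\\]\\[", ",", three)   # same pair, doubled backslashes
    print one; print two; print three
}')
printf '%s\n' "$out"
```

Only the last two forms replace the "][" pair as a unit.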

Annihilannic.
 
Hi. Unfortunately, \]\[ gives errors (I tried that method first!) so I have to stick with the messier notation.
I print the index after the first substitution. So, it recognizes that array[][] should be a field. But I can't figure out how to get that to look like array( , ).

Here is the code again:

Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split line up by commas
#        split($0,a,", *")
        split($0,a,", +")
        ind=tolower(a[1])
        # strip off the function name and bracket
        sub(".*vvd[(]","",ind)
        # match C array syntax with Fortran array syntax
        sub("[]][]]",",",ind)
        print "index is " ind
        sub("[][][]]]","[( , )]",ind)
        print "index is " ind
        prec[ind]=a[2]
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split line up by commas
        n=split($0,a,", +")
        ind=a[1]
        # remove the function call
#        sub(".*CALL VVD \(","",ind)
        sub(".*CALL VVD [(]","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        # remove underscore
        sub("_","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", +")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

I get a few ^M in my output file, but I'm willing to deal with those using sed or vim. I wish I knew why I can't use the simpler notation. I think it would make it easier to understand what I'm doing.
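For what it is worth, ^M is how cat -v and vim display a carriage return, which usually points to DOS-style line endings somewhere in the input (an assumption here, since the small test files showed none). A minimal sketch of stripping it inside awk instead of in a separate sed or vim pass, using a made-up line:

```shell
# A made-up line with a DOS line ending (\r\n)
clean=$(printf 'CALL VVD ( X, 1D0 )\r\n' | awk '{ sub(/\r$/, ""); print }')
printf '%s\n' "$clean"
```

Placed as the first rule of the script, the same sub() would clean every record before the other rules see it.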
 
I figured this part out :)

Now,
Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split line up by commas
#        split($0,a,", *")
        split($0,a,", +")
        ind=tolower(a[1])
        # strip off the function name and bracket
        sub(".*vvd[(]","",ind)
        # match C array syntax with Fortran array syntax
        sub("[]][]]",",",ind)
#        print "index is " ind
        sub("[[]","(",ind)
#        print "index is " ind
        sub("[]][[]",",",ind)
#        print "index is " ind
        sub("[]]",")",ind)
#        print "index is " ind
        prec[ind]=a[2]
#        print "prec is " a[2] 
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split line up by commas
        n=split($0,a,", +")
        ind=a[1]
        # remove the function call
#        sub(".*CALL VVD \(","",ind)
        sub(".*CALL VVD [(]","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        # remove underscore
        sub("_","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", +")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

Unfortunately, there is one thing I didn't think would matter: the C code arrays start at a[0][0] whereas the Fortran arrays start at a(1,1). It turns out to be a problem :(. I don't know if I should bother forcing elements to match or just throw in the towel at this point.
 
I managed to match 6 of the 9 elements (thank goodness it is only a 3x3 array!). Could you help with the problem of the overwriting of elements?

I also am attempting to get the "D0" back to the new Fortran precision values when it is missing. Could you help with that as well?

Code:
# match vvd preceded by white space
/^[ ]*vvd/ {
        # split line up by commas
        split($0,a,", +")
        ind=tolower(a[1])
        # strip off the function name and bracket
        sub(".*vvd[(]","",ind)
        # match C array syntax with Fortran array syntax
        sub("[]][]]",",",ind)
        sub("[[]","(",ind)
        sub("[]][[]",",",ind)
        sub("[]]",")",ind)
        # change C array indices to match Fortran indices
        sub("0,0","1,1",ind)
        sub("0,1","1,2",ind)
        sub("0,2","1,3",ind)
        sub("2,0","3,1",ind)
        sub("2,1","3,2",ind)
        sub("2,2","3,3",ind)
        print "index is " ind 
        prec[ind]=a[2]
        tol[ind]=a[3]
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        # split line up by commas
        n=split($0,a,", +")
        ind=a[1]
        # remove the function call
#        sub(".*CALL VVD \(","",ind)
        sub(".*CALL VVD [(]","",ind)
        # remove spaces
        gsub(" ","",ind)
        # remove decimal
        sub("D0","",ind)
        # remove underscore
        sub("_","",ind)
        ind=tolower(ind)
        sub(a[2],prec[ind])
        # if precision does not have "D" in it, insert "D0"
        if ("D" !~ prec[ind]) {
                sub(a[2],prec[ind]+"D0")
        }
        oldtol=a[3]
        # if the tolerance was not on this line, it
        # must be on the next
        if (oldtol == "") {
                print
                getline
                n=split($0,a,", +")
                oldtol=a[1]
                # remove line continuation and white space
                sub("[[:space:]:]*","",oldtol)
        }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

Thanks so much already.
 
Try doing the subs of the indices in reverse, that way they won't step on each other. i.e. start with sub("2,2","3,3",ind), and finish with sub("0,0","1,1",ind). Shame we didn't do this in perl, a tr/321/210/ would be really handy right about now! You could also do this in fewer steps by just doing the first array index, then the second.
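A different route to the same result, sketched under the assumption that every array key looks like name(i,j) with non-negative integer indices: rather than listing each pair, add 1 to each index numerically. The function name one_based is made up for this illustration. Keys such as test(-0.1) are left alone because their contents are not plain integers.

```shell
shift_idx='
# Hypothetical helper: turn a 0-based "name(i,j)" key into a 1-based one.
function one_based(s,    p, name, body, idx, n, i, num, sep, out) {
    p = index(s, "(")
    if (p == 0 || s !~ /\)$/) return s            # not of the form name(...)
    name = substr(s, 1, p - 1)
    body = substr(s, p + 1, length(s) - p - 1)    # text between ( and )
    if (body !~ /^[0-9]+(,[0-9]+)*$/) return s    # only plain integer indices
    n = split(body, idx, ",")
    out = name "("
    for (i = 1; i <= n; i++) {
        num = idx[i] + 1
        sep = (i < n) ? "," : ")"
        out = out num sep
    }
    return out
}
{ print one_based($0) }'

out=$(printf '%s\n' 'a(0,0)' 'a(1,2)' 'a(2,2)' 'test(-0.1)' | awk "$shift_idx")
printf '%s\n' "$out"
```

Because each index is computed from the original value, no substitution can step on an earlier one, and the 1,x elements are handled without extra lines.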

The following code won't work for a couple of reasons:

Code:
        sub(a[2],prec[ind])
        # if precision does not have "D" in it, insert "D0"
        if ("D" !~ prec[ind]) {
                sub(a[2],prec[ind]+"D0")
        }

The regex you are matching needs to be on the right hand side of the !~ expression, so just reverse it.

The second sub() won't find a match because the first one has already replaced the matching string with a new precision.

To concatenate strings in awk just put them one after the other; "+" will attempt a mathematical operation resulting in 0. Try awk 'BEGIN {print "a"+"b"; print "c" "d"}' to see what I mean.

So try this instead:

Code:
        ind=tolower(ind)
        # if precision does not have "D" in it, append "D0"
        if (prec[ind] !~ "D") {
                prec[ind]=prec[ind] "D0"
        }
        sub(a[2],prec[ind])
        oldtol=a[3]

I notice as well that due to our change of separator we are losing the comma after the precision in the FORTRAN output... I'll leave fixing that as an exercise for you!
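One possible way to keep the comma, as a sketch and not necessarily the fix intended above: with ", +" as the separator, the second element of the split still carries the trailing comma because no space follows it. Stripping that comma from the search pattern leaves it in place in the output. The variables old and new are illustrative, with new standing in for prec[ind].

```shell
out=$(printf '%s\n' 'CALL VVD ( test ( -0.1D0 ), 6.183185307179587D0,' |
awk '{
    split($0, a, ", +")
    old = a[2]
    sub(/,$/, "", old)                 # keep the trailing comma out of the pattern
    new = "6.183185307179586477"       # stand-in for prec[ind]
    if (new !~ /D/) new = new "D0"     # append the suffix when it is missing
    sub(old, new)                      # old is used as a regex here
    print
}')
printf '%s\n' "$out"
```

Note that old is treated as a regular expression, so its "." matches any character; that is harmless for this data but worth keeping in mind.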

Annihilannic.
 