substituting fields in one file with same fields from another

Status
Not open for further replies.

starlite79

Technical User
Aug 15, 2008
89
US
Hi everyone.

I'm fairly new to AWK, but from what I've read I think it can do some powerful things if I know what I'm doing.

I would like to be able to substitute two fields in one file (a Fortran program I wrote) with the same fields from a C program that is very similar.

Here is one line from each file:

C file line:
vvd(test(-0.1), 6.183185307179586477, 1e-12, "test", "a", status);

Fortran file line:
CALL VVD ( test ( -0.1D0 ), 6.183185307179587D0,
: 1D-12, 'test', 'A', STATUS )

I began by writing a simple awk program that makes the comma the field separator and prints the second field, like so:

BEGIN { FS = "," # make comma the field separator
}
$1 ~ /VVD/ { print $2
}

The C file has records which take only one line. On the other hand, the Fortran program (to conform to the 72-character line limit my colleague requires) has records that span multiple lines (indicated by the continuation character :). I am only concerned with the second and third fields, and need to know how to make ; the record separator for one file and something else (maybe a newline?) for the other.

Any help would be very much appreciated.
 
You can redefine the record separator at any time by changing the value of the special variable RS.

Presuming that you are supplying these two files as input files on the awk command-line, a common trick to determine which file you are currently processing is to compare NR (overall record count) with FNR (record count of current input file). When they differ you are processing the second (or a subsequent) file, e.g.

Code:
awk '
    FNR==NR {
        # do stuff for first input file here
        next # skip to next record
    }
    {
        RS="<new record separator>"
        # processing for second and subsequent input files here
    }
' c_file fortran_file > updated_fortran_file

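However, changing RS midway through processing can have surprising effects: a new value only applies to records read after the assignment, so the first record of the second file will already have been read with the old separator. Try it and see, I guess!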

I personally avoid changing RS and use logic and buffers to accumulate lines if I'm processing blocks of text over multiple lines, but it depends on the problem.
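
As a rough sketch of that buffering approach (not a tested solution for the real files): the script below joins each Fortran continuation line onto the statement it continues, so that a whole CALL can be handled as one record without touching RS. It assumes a continuation line is any line whose first non-blank character is ":", as in the examples in this thread; the sample input and the names buf and joined are made up for illustration.

```shell
joined=$(awk '
    # Assumption: a line whose first non-blank character is ":"
    # continues the previous statement.
    /^[ \t]*:/ {
        sub(/^[ \t]*:[ \t]*/, " ")   # drop the continuation marker
        buf = buf $0                 # append to the buffered statement
        next
    }
    # Any other line starts a new statement: flush the previous one first.
    { if (NR > 1) print buf; buf = $0 }
    END { if (NR > 0) print buf }
' <<'EOF'
      CALL VVD ( DRA, -0.7078279744199225811D-07, 1D-12,
     :           'BI00', ' ', STATUS )
      X = 1
EOF
)
printf '%s\n' "$joined"
```

Each joined statement comes out on a single line, so the original 72-character wrapping is lost and would have to be reapplied afterwards.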

Annihilannic.
 
Thanks for your reply. I attempted some of what you mentioned and made an awk program called sub_fields. Here is the code:
Code:
BEGIN {FS="," # make comma the field separator
  }
  FNR==NR {
    $1 ~ /VVD/ { # assign fields to global variables
       $2 = "prec"; $3 = "tol"
       }
    next # skip to the next record
  }
  {
    RS="STATUS )"
      $1 ~ /VVD/ {
         gsub(/:/, "" ); # ignore continuation in Fortran file
         gsub($2, "prec"); gsub($3,"tol")
         }
  }
From the command line I typed awk -f sub_fields c_file fortran_file > updated_fortran_file.

However, when I opened the updated_fortran_file, it was an empty file. Could anyone let me know what I am doing wrong either in the code and/or by attempting to run the program?

Thanks!
 
Realized an obvious mistake (forgot to use -f when I invoked awk on the sub_fields program). I corrected some syntax errors and changed some of the program.

Code:
BEGIN {FS="," # make comma the field separator
  }
  FNR==NR { RS=";"
  }
    $1 ~ /vvd/ { # assign fields to global variables
       $2 = prec; $3 = tol
    next # skip to the next record
       }
    RS="STATUS )"
      $1 ~ /CALL VVD/ {
         gsub(/:/, "" ); # ignore continuation in Fortran file
         gsub($2, prec); gsub($3, tol)
         }

This assumes the C file is read first and the Fortran file is read second. I would love some hints on a more general code. Also, I get a new file that is no longer empty, but all that happens is (if I'm understanding it correctly) that the new file has the same line as the Fortran file. I'm testing with just one line each until I can debug it (the real files are many lines long).
What I want to do is substitute just two fields (not the entire record) of the Fortran file when a record has "CALL VVD" at the beginning (probably need that ^ with /CALL VVD/, huh?) with the corresponding fields of the C file (which I tried to assign to the variables "prec" for precision and "tol" for tolerance).

Any help would be great!
 
You need to remember that the entire C file is processed before the Fortran file. I'm presuming that there are multiple function calls like this in each file (otherwise you may as well just update them by hand!), so you would need to store all of the values from the C file in an array of some kind so that you can recall them all when processing the Fortran file.

Is there some data in the Fortran file that needs to remain unchanged (i.e. the other parameters to the function calls)? If not, it would be simpler to just generate new Fortran code based on the C input. If you could give us a couple more example lines of input data from both files and the expected output it would help understand the problem better.

Note that you appear to have the assignments in the first part the wrong way around: $2 = prec overwrites the field with an empty variable, whereas saving the value would be prec = $2. Also, you have no print statement; the lines are only being printed implicitly at the moment because the RS="STATUS )" assignment is being treated as a pattern expression that always evaluates to "true".
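
The implicit printing can be reproduced in isolation. In this small sketch (the variable name marker and the two-line input are invented for the demonstration), an assignment written outside any braces is taken as a pattern; since its value is a non-empty string, it is true for every record and awk falls back to the default action of printing the record.

```shell
out=$(printf 'one\ntwo\n' | awk '
    # No braces, so this assignment is a pattern with no action.
    # Its value, the non-empty string "x", is true for every record,
    # and the default action (print the record) runs.
    marker = "x"
')
printf '%s\n' "$out"
```

Wrapping the assignment in braces, { marker = "x" }, turns it into an action, and then nothing is printed unless a print statement is added.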

Annihilannic.
 
I'm presuming that there are multiple function calls like this in each file
Yes, many function calls matched. Doing by hand would be a nightmare!!
Yes, there is data in the Fortran file that would need to remain unchanged. I am willing to accept that the tolerances, for example, would need to be changed back from 1e-12 to 1D-12.
Could you give an example of where I would put a print statement? I read most of an AWK manual written by the developers and yet feel lost in some respects.

There are other functions besides VVD in both test programs, but the vvd/VVD one is the only one I'm concerned about.

I'm attaching code snippets.
Code:
      CALL VVD ( DPSIBI, -0.2025309152835086622D-06, 1D-12,
     :           'BI00', ' ', STATUS )
      CALL VVD ( DEPSBI, -0.3306041454222147661D-07, 1D-12,
     :           'BI00', ' ', STATUS )
      CALL VVD ( DRA, -0.7078279744199225811D-07, 1D-12,
     :           'BI00', ' ', STATUS )

Code:
vvd(dpsibi, -0.2025309152835086613e-06, 1e-12,
      "Bi00", "dpsibi", status);
   vvd(depsbi, -0.3306041454222147847e-07, 1e-12,
      "Bi00", "depsbi", status);
   vvd(dra, -0.7078279744199225506e-07, 1e-12,
      "Bi00", "dra", status);

vvd and VVD do appear elsewhere (when they are defined), but not at the beginning of lines as the subroutine calls. There are several test routines that call this function. Also, in the Fortran program, sometimes the third field (separated by comma) is continued on the next line, but I don't want AWK to treat it as a new record.

I hope I've explained the problem better.
 
Try this:

Code:
BEGIN {
        FS="[(, ]+" # make any number of commas, brackets
                    # and spaces the field separator
}
# match vvd preceded by white space
/^[     ]*vvd/ {
        # assign fields to global arrays
        prec[$3]=$4; tol[$3]=$5
        # change exponent syntax
        sub("e","D",tol[$3])
        sub("e","D",prec[$3])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        ind=tolower($4)
        sub($5,prec[ind])
        oldtol=$6
        # if the tolerance is not on this line, it
        # must be on the next
        if (oldtol == "") { print; getline; oldtol=$3 }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the fortran input
FNR != NR { print }

The field indexes look a little odd because when you redefine the field separator to be one or more of a range of characters it considers white space at the beginning of a line to follow an initial empty field.

I'm assuming that the first parameter is unique for each function call, and using that as an index to arrays containing the precision and tolerance so they can be recalled later when processing the FORTRAN input.

I'm also making the big assumption that if the FORTRAN call does wrap the third parameter to the next line, it will be the first parameter on the next line. Of course this will break if it happens to wrap in other places as well. It's pretty messy, but hopefully it'll give you some ideas.

Annihilannic.
 
Thanks Annihilannic. I think I get the basic ideas and should be able to make it work for my programs. I really appreciate the assistance!
 
I had to put this project on the back burner for a while. I've tried the code as is, and I got a new Fortran file that looks like this:

CALL VVD ( iau_ANP ( -0.1D0 ), 6.183185307179587D0,
: 1D-12, 'iau_ANP',, STATUS )

So, it looks like the fifth field (if it is separated by comma) is changed to a blank. Nothing else changes. I tried a few modifications, but the end product was the same. Your code made sense to me, so I'm not sure why it didn't do the intended things.

Any ideas?
 
I suspect it's because of the mixed case of the first parameter, which would make the array lookup result in nothing. I assumed, based on the examples given, that all of them were in lower case in the C file, and upper case in the FORTRAN file. Is FORTRAN case sensitive? If not, just index the array by lower case names too.

Annihilannic.
 
It used to be that Fortran could only be written in uppercase. Now it doesn't matter, though I chose to use mostly uppercase. When you say "just index the array by lower case names too", I'm not entirely sure what you mean.

I see you've used ind = tolower($4)... for the second file parsed (the Fortran file). Do I need to write something like index (I imagine I cannot use ind more than once?) = tolower($5) as well? Or do I write ind = tolower($0) for the entire line in the Fortran file?

I don't understand what the program is treating as a field. Originally I thought I did. How is precision $4 and tolerance $5? Also, not every precision in the C file has the "e" in it. Will that be a problem?
 
What I meant was... when processing the C file, set ind=tolower($3) and then use ind as the index to the prec[] and tol[] arrays instead of $3.
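
A minimal sketch of the lowercase-key idea, assuming the names differ only in case between the two files. The stored value and the two spellings of the name are taken from or modelled on the examples in this thread; the variables exact, lower and result are hypothetical. The "in" test is used here only to show whether a lookup would find anything.

```shell
result=$(awk '
    BEGIN {
        # Value as it might be collected from the C file, stored
        # under a lower-case key.
        prec[tolower("iau_ANP")] = "6.183185307179586477"

        name = "IAU_ANP"    # spelling as it might appear on the Fortran side
        exact = (name in prec) ? "hit" : "miss"
        lower = (tolower(name) in prec) ? "hit" : "miss"
        print exact, lower
    }
')
printf '%s\n' "$result"
```

A lookup that misses yields an empty string, so a guard such as if (ind in prec) before the sub() calls would be one way to avoid blanking out a field when no matching C entry exists.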

Using this line as an example:

Code:
   vvd(depsbi, -0.3306041454222147847e-07, 1e-12,

Note that I set the field separator to be a regex matching "a sequence of one or more commas, brackets or spaces".

awk encounters the white space before "vvd" and considers it a field separator, therefore it decides that $1 is empty. "vvd" is $2. "(" is a field separator. "depsbi" is $3. ", " is a field separator. "-0.3306041454222147847e-07" is $4. ", " is again a field separator. "1e-12" is $5.

For some reason awk by default treats any amount of white space as a separator, ignoring any white space at the beginning of a line. Whenever you override the FS this behaviour stops... I'm not sure whether there is some way you can simulate this behaviour with a custom regex, I'd be glad to see if someone knows!
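
The field numbering described above can be checked with a one-line input. This sketch uses the same FS as the script and the sample C line from this thread; the variable names are made up. The second command shows one possible way to get fields that start at $1 again, by stripping the leading white space from the record (modifying $0 makes awk split it afresh). It is offered as a suggestion, not as the only approach.

```shell
line='   vvd(depsbi, -0.3306041454222147847e-07, 1e-12,'

# With the regex FS, the leading white space is a separator after an empty $1,
# so the interesting fields are $2 to $5.
with_regex_fs=$(printf '%s\n' "$line" |
    awk -F'[(, ]+' '{ print $2 "|" $3 "|" $4 "|" $5 }')

# After removing the leading white space, the same values are in $1 to $4.
stripped=$(printf '%s\n' "$line" |
    awk -F'[(, ]+' '{ sub(/^[ \t]+/, ""); print $1 "|" $2 "|" $3 "|" $4 }')

printf '%s\n%s\n' "$with_regex_fs" "$stripped"
```

Both commands print the same four values, only under different field numbers.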

If the precisions do not contain an "e" the sub() will do nothing.

Annihilannic.
 
Hi,

I made the changes as you suggested. I'll post them here.

Code:
BEGIN {
   FS = "[(,]+" # make any number of commas, brackets
         # and spaces the field separator
}
# match vvd preceded by white space
/^[ ]*vvd/ {
        # assign fields to global arrays
        ind=tolower($3)
        prec[ind]=$4; tol[ind]=$5
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        ind=tolower($4)
        sub($5,prec[ind])
        oldtol=$6
        # if the tolerance is not on this line, it
        # must be on the next
        if (oldtol == "") { print; getline; oldtol=$3 }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

Unfortunately, the updated Fortran file looks the same as the old Fortran file. So, is the definition of what the prec and tol are somehow getting overwritten? It would make my day to have this work. Can I ask for your help again?
 
I think you have just lost a space in the definition of the input field separator. Otherwise that script seems to work fine for me.

Annihilannic.
 
I inserted extra space for the input FS. But the updated Fortran file looks the same as the old Fortran file except it is missing the "STATUS )" on all lines and lost a ", STATUS )" on the last line. :(

May I see your output? My updated fortran file looks like this:

CALL VVD ( DPSIBI, -0.2025309152835086622D-06, 1D-12,
: 'BI00', ' ',
CALL VVD ( DEPSBI, -0.3306041454222147661D-07, 1D-12,
: 'BI00', ' ',
CALL VVD ( DRA, -0.7078279744199225811D-07, 1D-12,
: 'BI00', ' '

The precisions are what they were in the old Fortran file, not the C as they should be. In this case the tolerances were 1e-12 in both C and Fortran.


Here is the C file again,

vvd(dpsibi, -0.2025309152835086613e-06, 1e-12,
"Bi00", "dpsibi", status);
vvd(depsbi, -0.3306041454222147847e-07, 1e-12,
"Bi00", "depsbi", status);
vvd(dra, -0.7078279744199225506e-07, 1e-12,
"Bi00", "dra", status);

The last few digits are different. Doing this by hand doesn't sound so crazy anymore :(
 
Do you have tabs in the code? Maybe you just need to add a tab to the FS.

Code:
$ cat file.c
   vvd(dpsibi, -0.2025309152835086613e-06, 1e-12,
      "Bi00", "dpsibi", status);
   vvd(depsbi, -0.3306041454222147847e-07, 1e-12,
      "Bi00", "depsbi", status);
   vvd(dra, -0.7078279744199225506e-07, 1e-12,
      "Bi00", "dra", status);
   vvd(drp, -0.9078279744199225506e-07, 1e-13,
      "Bi00", "drp", status);
$ cat file.fortran
      SOME OTHER FORTRAN
      CALL VVD ( DPSIBI, -0.2025309152835086622D-06, 1D-12,
     :           'BI00', ' ', STATUS )
      CALL VVD ( DEPSBI, -0.3306041454222147661D-07, 1D-12,
     :           'BI00', ' ', STATUS )
      SOME MORE OTHER FORTRAN
      CALL VVD ( DRA, -0.7078279744199225811D-07, 1D-12,
     :           'BI00', ' ', STATUS )
      CALL VVD ( DRP, -0.7078279744199225811D-07,
     :           1D-12, 'BI00', ' ', STATUS )
$ awk -f starlite79.awk file.c file.fortran
      SOME OTHER FORTRAN
      CALL VVD ( DPSIBI, -0.2025309152835086613D-06, 1D-12,
     :           'BI00', ' ', STATUS )
      CALL VVD ( DEPSBI, -0.3306041454222147847D-07, 1D-12,
     :           'BI00', ' ', STATUS )
      SOME MORE OTHER FORTRAN
      CALL VVD ( DRA, -0.7078279744199225506D-07, 1D-12,
     :           'BI00', ' ', STATUS )
      CALL VVD ( DRP, -0.9078279744199225506D-07,
     :           1D-13, 'BI00', ' ', STATUS )
$

Annihilannic.
 
Thanks, I'll try that. One thing of note is that the Fortran code was created on a Linux workstation, whereas the C code was created on a Windows machine. Maybe I need to do a dos2unix conversion on the C code? Any ideas?
 
I tried including a tab in the FS, but now the updated Fortran file is identical to the old one. Here is my code in its entirety:

Code:
#!/usr/bin/awk

BEGIN {
   FS = "[(,]\t+" # make any number of commas, brackets, tabs,
         # and spaces the field separator
}
# match vvd preceded by white space
/^[     ]*vvd/ {
        # assign fields to global arrays
        ind=tolower($3)
        prec[ind]=$4; tol[ind]=$5
        # change exponent syntax
        sub("e","D",tol[ind])
        sub("e","D",prec[ind])
        next # skip to the next record
}
# match CALL VVD anywhere on a line
/CALL VVD/ {
        ind=tolower($4)
        sub($5,prec[ind])
        oldtol=$6
        # if the tolerance is not on this line, it
        # must be on the next
        if (oldtol == "") { print; getline; oldtol=$3 }
        sub(oldtol,tol[ind])
        print
        next
}
# print any other lines in the Fortran input
FNR != NR { print }

Did I include the tab correctly? Your output for the new fortran file looks exactly like what I want.

My awk file is called transfer_fields. I do not include an awk extension. At the command line, I type awk -f transfer_fields c.file f.file > newf.file.

I ran dos2unix on the c.file, but that made no difference.
 
The \t should be inside the [ ], otherwise it will try to match either a ( followed by a tab or a , followed by a tab. Also it looks like you've lost the space again. Try:

Code:
        FS="[(, \t]+"
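
The difference between the two separators can be seen by counting fields on a single line. This is only an illustration: the input is a shortened, tab-indented line modelled on the Fortran examples in this thread, and the variable names are invented.

```shell
# Hypothetical input: a leading tab, then a shortened CALL VVD line.
line=$(printf '\tCALL VVD ( DRA, -0.7D-07, 1D-12,')

# \t outside the brackets: a "(" or "," must be followed by tabs.
# Nothing in the line matches, so the whole record is one field.
outside=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[(,]\t+" } { print NF }')

# \t and a space inside the brackets: any run of "(", ",", space or tab
# separates fields.
inside=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[(, \t]+" } { print NF }')

printf 'outside: %s field(s), inside: %s field(s)\n' "$outside" "$inside"
```

With the second form the line splits into seven fields, counting the empty ones at the start and end, which puts the name in $4, the precision in $5 and the tolerance in $6 as the script expects.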

Annihilannic.
 
Thanks! It finally has worked for me. I'm going to test it on a bigger piece of the code, but thanks again!

Sorry, I was not listening when you said I was missing a space :)
 