reading missing values in array?

Status
Not open for further replies.

Ahn210

Technical User
Jun 18, 2013
11
Hello,

Can someone help me read arrays with missing values?

This is the data I have in a text file:

Code:
A     B               C           D              E           F
85337 20120531        23.2457     20.08000       148210.48   -0.017605
85337 20120629        23.2457     21.95000       162012.96    0.093128
85339 20101231                    13.46000       414500.70   -0.061960
85339 20110131                    13.05000       401874.76   -0.024522

I declared arrays a/b/c/d/e/f and read the data into them in a do loop.

When all columns are filled, the read is successful (e.g. 85337).

But for 85339, with missing data in column C, the values end up in the wrong columns:

Code:
A     B               C              D              E           F
85339 20110131        13.05000       401874.76      -0.024522    85339.0
(column D's data is read into column C, column F repeats column A's data, and everything is shifted to the left)

How can I read the missing value into the array?

thanks,
 
When there are missing values, you cannot read the data in the usual list-directed style; instead, you need to specify a format that tells READ exactly which columns each value comes from.

I have never needed this, so I don't remember how it behaves when a field contains no value whatsoever (only blank spaces). I do remember that some commercial applications that use this style of input can interpret blank fields as zeros. Read up and find out whether the READ statement does that out of the box or whether you need to request it somehow (a compile flag? another argument to READ?).

Otherwise, you first need to read the entire line of data into a single character string (something like character*130) and THEN perform internal reads from it, after making sure that there is data to be read in the target columns.

There are a couple of threads in this forum where this kind of thing has already been addressed, namely reading a variable number of items per row.
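
The "read the whole line into a string, then read from the string" idea could look roughly like the sketch below. It is only an illustration: the column positions are taken from the sample data in the question, and the sentinel value `missing` and the helper function `field` are made-up names, not anything from the original posts.

```fortran
program read_with_missing
    implicit none
    ! Column layout assumed from the sample data in the question:
    ! A: 1-5, B: 7-14, C: 23-29, D: 35-42, E: 50-58, F: 61-70
    real, parameter :: missing = -9999.0    ! hypothetical "no data" marker
    character(len=130) :: line
    integer :: a, b, ios, ierr
    real :: c, d, e, f

    do
        read(*, '(a)', iostat=ios) line
        if (ios /= 0) exit                   ! end of file or read error
        read(line(1:5), '(i5)', iostat=ierr) a
        if (ierr /= 0) cycle                 ! not a data line (e.g. the header)
        read(line(7:14), '(i8)', iostat=ierr) b
        if (ierr /= 0) cycle
        c = field(line(23:29))
        d = field(line(35:42))
        e = field(line(50:58))
        f = field(line(61:70))
        print '(i6,1x,i9,4(1x,g14.7))', a, b, c, d, e, f
    end do

contains

    ! Return the sentinel for an all-blank field, otherwise the value in it.
    function field(text) result(val)
        character(len=*), intent(in) :: text
        real :: val
        if (len_trim(text) == 0) then
            val = missing
        else
            read(text, *) val                ! internal read from the substring
        end if
    end function field

end program read_with_missing
```

Run it with the data file redirected to standard input. Unlike a plain formatted read, this keeps a missing value distinguishable from a genuine zero.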

 
do you know in advance :-

1) which column has missing values ? is it always column C ?
2) the value of A in each of the rows that have missing values ? is it always either 85337 or 85339 ?

if so, then why not have something like :-

Code:
DO I = 1 , N
   READ (1,*) A (I)
   IF ( A (I) == 85337 ) THEN
      READ (1,*) B (I) , C (I) , D (I) , E (I) , F (I)
   ELSE IF ( A (I) == 85339 ) THEN
      READ (1,*) B (I) ,         D (I) , E (I) , F (I)
   END IF
END DO

where (unit = 1) is the file containing your data
 
Bill, that won't really work because the first read will take the whole line and the next read will take the input on to the next line. If you want to do something like that, then read the whole line into a string first and then read from the string.
 
xwb -- you are right ! i completely forgot that little detail ... i'm a bit rusty :(

instead, try something like the following revised code :-

Code:
DO I = 1 , N
   READ (1,*) A (I)
   BACKSPACE 1

   IF ( A (I) == 85337 ) THEN
      READ (1,*) A (I), B (I) , C (I) , D (I) , E (I) , F (I)
   ELSE IF ( A (I) == 85339 ) THEN
      READ (1,*) A (I), B (I) ,         D (I) , E (I) , F (I)
   END IF
END DO

unless i'm missing something (please let me know if i am), that "should" work ... without the need for using a string ... although, the string idea should work, too
 
something else to add to the above

my suggestion AND the "string" suggestion should both work.

my suggestion might be less tedious -- if you know in advance WHICH column has blank entries, WHERE (in the column) those blank entries are, etc

otherwise, if you don't know that info in advance, then i agree that the "string" suggestion is probably the better option
 
If the data are separated by TABs, you could write a short script that replaces missing data with zeros. I tried it with awk:
Code:
$ cat Ahn210_tab.txt
A       B               C       D               E               F
85337   20120531        23.2457 20.08000        148210.48       -0.017605
85337   20120629        23.2457 21.95000        162012.96       0.093128
85339   20101231                13.46000        414500.70       -0.061960
85339   20110131                13.05000        401874.76       -0.024522 

$ awk -f missing_values_tab.awk Ahn210_tab.txt
A       B               C       D               E               F
85337   20120531        23.2457 20.08000        148210.48       -0.017605
85337   20120629        23.2457 21.95000        162012.96       0.093128
85339   20101231        0       13.46000        414500.70       -0.061960
85339   20110131        0       13.05000        401874.76       -0.024522
where the awk code I used is:
missing_values_tab.awk
Code:
# Run: awk -f missing_values_tab.awk Ahn210_tab.txt
BEGIN { 
  FS = OFS = "\t" 
}

NR !=1 { 
  for(i=1; i<=NF; i++) {
    if($i ~ /^ *$/) $i = 0
  }
  #print NF
}

{
  print $0
}
 
I thought about how to correct the data if it is separated not by tabs but by spaces, with the values aligned to the columns. Nothing simpler than this has occurred to me:

missing_values_space.awk
Code:
# Run: awk -f missing_values_space.awk Ahn210_space.txt
BEGIN { 
  FS = " "
  OFS = "\t"

  # Data file settings:
  # number of data columns
  nr_columns = 6
  # beginning position of all columns
  col[1] = 1
  col[2] = 7
  col[3] = 23
  col[4] = 35
  col[5] = 50
  col[6] = 62
}

NR !=1 {
  if (nr_columns != NF) {
    # some elements are absent and should be zero
    j = 1
    for (i=1; i<= nr_columns; i++) {
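      # examine 3 characters starting at col[i]-1; the start is clamped to 1
      # because substr() with a start of 0 is not handled the same way by every awk
      start = (col[i] > 1) ? col[i]-1 : 1
      substring = substr($0, start, 3)
      if (substring ~ /^ *$/) {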
        line_array[i] = 0
      }  
      else {
        line_array[i] = $j
        j++    
      }
    }
    # at end move array elements into awk-columns
    for (i=1; i <= nr_columns; i++) {
      $i = line_array[i]
    }
  }
}

{
  print $0
}

Output:
Code:
$ cat Ahn210_space.txt
A     B               C           D              E           F
85337 20120531        23.2457     20.08000       148210.48   -0.017605
85337 20120629        23.2457     21.95000       162012.96    0.093128
85339 20101231                    13.46000       414500.70   -0.061960
85339 20110131                    13.05000       401874.76   -0.024522 
$ awk -f missing_values_space.awk Ahn210_space.txt
A     B               C           D              E           F
85337   20120531        23.2457 20.08000        148210.48       -0.017605
85337   20120629        23.2457 21.95000        162012.96       0.093128
85339   20101231        0       13.46000        414500.70       -0.061960
85339   20110131        0       13.05000        401874.76       -0.024522
$ cat Ahn210_space_01.txt
A     B               C           D              E           F
85337 20120531        23.2457
85337                 23.2457                    162012.96
85339 20101231                    13.46000                   -0.061960
      20110131                    13.05000       401874.76
$ awk -f missing_values_space.awk Ahn210_space_01.txt
A     B               C           D              E           F
85337   20120531        23.2457 0       0       0
85337   0       23.2457 0       162012.96       0
85339   20101231        0       13.46000        0       -0.061960
0       20110131        0       13.05000        401874.76       0

I tried it with awk; something similar could surely be done in Fortran too. But I personally prefer to correct the data outside of the processing program and then feed the corrected data into it.
 
A formatted read would work just fine, with no pre-processing at all: blank fields are interpreted as zeros.
Code:
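program rrr
    implicit none
    integer i
    integer a(10), b(10)
    real    c(10), d(10), e(10), f(10)
    character*80 fmt
    ! field widths follow the sample layout in the question:
    ! A 1-5, B 7-14, C 23-29, D 35-42, E 50-58, F 61-70
    fmt='(i5,1x,i8,8x,f7.4,5x,f8.5,7x,f9.2,2x,f10.6)'
    
    do i = 1,6
        read(*,fmt) a(i), b(i), c(i), d(i), e(i), f(i)
    end do    
    write(*,fmt) (a(i), b(i), c(i), d(i), e(i), f(i), i=1,6)
    
end program rrr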

Compile this program and run it with input redirected from a file with at least 6 data lines (and no header line).
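
To check the blank-field behaviour without a data file, a small test along these lines may help. It is a sketch only: the 20-column record and its layout are invented for the demonstration and are not the layout of the data in the question.

```fortran
program blank_is_zero
    implicit none
    character(len=20) :: rec
    integer :: a
    real :: c, d

    ! Hypothetical record: A in columns 1-5, C in 7-13 (left blank), D in 15-20.
    rec = '85339' // repeat(' ', 9) // '13.460'

    ! Internal formatted read: the all-blank F7.4 field comes back as zero.
    read(rec, '(i5,1x,f7.4,1x,f6.3)') a, c, d

    print '(a,i5)',   'a = ', a
    print '(a,f8.4)', 'c = ', c
    print '(a,f8.4)', 'd = ', d
end program blank_is_zero
```

Here `c` is read as zero even though nothing is in its field. The flip side is that a missing value and a real 0.0 can no longer be told apart after the read.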
 
Really, I thought that it doesn't work in Fortran when OP asked. But it's a nice feature of Fortran to replace spaces with zeros.
:)
 
But is it possible to use formatted read, if the data were TAB delimited?
Is there a format specifier for TABulator - like nX for n SPACEs ?
 
mikrom said:
Really, I thought that it doesn't work in Fortran when OP asked
More often than not, that's the problem. People use computer programs to try different things with their model, but often overlook using the compiler to try different things with their program!

mikrom said:
But is it possible to use formatted read, if the data were TAB delimited?
Is there a format specifier for TABulator - like nX for n SPACEs ?
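Yes and yes. There is the T edit descriptor, with Tn for absolute positioning and TLn/TRn for moving n characters to the left/right of the current position. These count character positions in the record, so a single TAB character is skipped like any other single character.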

If the missing column is totally gone and, instead of blank spaces as wide as the format spec, there are just two consecutive tabs, then the read does not work.

If, on the other hand, the missing value is replaced by as many blank spaces as the field is wide, then the read does work and interprets the spaces as zeros (as before).
Code:
program rrr
    integer a(10), b(10)
    real    c(10), d(10), e(10), f(10)
    
    character*80 fmt
    fmt='(i5,TR1,i8,TR1,f7.4,TR1,f8.5,TR1,f9.2,TR1,f9.6)'
    
    do i = 1,6
        read(*,fmt) a(i), b(i), c(i), d(i), e(i), f(i)
    end do    
    write(*,fmt) (a(i), b(i), c(i), d(i), e(i), f(i), i=1,6)
    
end program rrr
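
For the case where a missing value leaves two consecutive tabs, one possible workaround is to split each line on the TAB character and read each piece separately. The following is a rough sketch under assumptions: six columns per line, all values stored as double precision, and a made-up sentinel `missing` for empty or non-numeric fields.

```fortran
program split_tabs
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: ncol = 6                 ! assumed number of columns
    real(dp), parameter :: missing = -9999.0_dp    ! hypothetical "no data" marker
    character(len=1), parameter :: tab = achar(9)
    character(len=200) :: line
    real(dp) :: vals(ncol)
    integer :: ios, ierr, i, start, pos, last

    do
        read(*, '(a)', iostat=ios) line
        if (ios /= 0) exit                         ! end of file or read error
        start = 1
        do i = 1, ncol
            vals(i) = missing
            if (start > len(line)) cycle           ! ran out of fields
            pos = index(line(start:), tab)
            if (pos > 0) then
                last = start + pos - 2             ! character before the TAB
            else
                last = len(line)                   ! final field on the line
            end if
            if (len_trim(line(start:last)) > 0) then
                read(line(start:last), *, iostat=ierr) vals(i)
                if (ierr /= 0) vals(i) = missing   ! non-numeric text, e.g. a header
            end if
            start = last + 2                       ! continue after the TAB
        end do
        print *, vals
    end do
end program split_tabs
```

An empty field between two tabs yields a zero-length substring, so it is reported as `missing` instead of shifting the remaining values to the left. A header line is printed as a row of sentinel values in this sketch.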
 
Hi salgerman,
I didn't know the TR descriptor before. I tried your example and it works.
Thank you very much!
 
Thank you to everyone who kindly provided suggestions.
I applied salgerman's formatted read method, as it seems to be the easiest, and it works!

Thank you!
 
Ahn210 -- i'm glad you found a method that works for you
salgerman -- thanks to you, too. just like mikrom, i also did not know about the TR descriptor. we learn something new every day :)
 