Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Intrpolating missing values in Access table 1

Status
Not open for further replies.

adfo

Programmer
Oct 23, 2009
30
US
Dear all,

I have an Access table contains a "column" field , "row" field and some values under "Value" field (example will follow below). I need to interoplate / extrapolate missing values within each COLUMN if enough minimum number of values exist to allow interpolation. Next, interpolate within each ROW for any missing values that havn't been filled from the prvious process. example below:

Group Column Row Value
1 1 A -
1 1 B 8
1 1 C -
1 1 D 9
1 1 E -
1 1 F -
1 1 G 12
1 2 A -
1 2 B 6
1 2 C -
1 2 D -
1 2 E -
1 2 F 9
1 3 G -
1 3 A 7
1 3 B -
1 3 C 8
1 3 D -
1 3 E -
1 3 F 12
1 3 G -
2 1 A -
2 1 B -

STEP 1:

- For each COLUMN within the group, if there's at least 3 values, interpolate/ extrapolate linearly all missing values as necessary (column 1 within Group 1 above meets the criteria of 3 values)

- Column 2 within only 2 values, doesn't meet the criteria then skip

- Column 3 meets the criteria( at least 3 values), then interpolate / extrapolate and so on

STEP 2:

Now, for each ROW across all columns within the group (row A, B, etc.) , interpolate / extrapolate all missing values. This should fill all values skiped from step 1 above. After finishing, move to the next group and so on

Thanks
 
How big is the table and how many times do you need to do this? If it's a one-off job then you might do better to export all the data to Excel rather than writing some special-purpose VBA code in Access.

Geoff Franklin
 
about 10,000 records, I'm looking for a VBA code to do this either in excel or access. Thanks
 
How are ya adfo . . .\

Wow! You may not know it, but you've embarked on [blue]one of the most mathamatical endeavors ever known to man![/blue] PHD's have problems in this area! To be sure, your [blue]extrapolating[/blue] as you have data points that define curves in question.

So what do you use to extrapolate? There are a number of solution plans available ... however ... I recommend [blue]Regression Analysis[/blue].

[blue]Regression Analysis[/blue] involves four specific states of data ... [blue]Linear, Logarimthic, Exponential, and Power.[/blue] In a nutshell its called [blue]curve fitting[/blue], and the data points will determine which of the four types will [blue]fit[/blue] best. I mention this because you say:
adfo said:
[blue]For each COLUMN within the group, if there's at least 3 values, interpolate/ extrapolate [purple]linearly[/purple] all missing values as necessary ... [/blue]
The datapoints you've revealed already show that altrnate analysis of data points you've presented could easily fit the other types besides [blue]linear![/blue] [surprise]

If the points dont fit linear you in for a surprise!

See Ya! . . . . . .

Be sure to see thread181-473997 [blue]Worthy Reading![/blue] [thumbsup2]
Also faq181-2886 [blue]Worthy Reading![/blue] [thumbsup2]
 
Yeah I have the same question as AceMan. Are you purely interpolating between points like in the referenced thread or are you doing a "best fit" analysis over the whole set? Either way can be done, even if you have to leverage Excel functions to do some regression. Any chance you could provide what you think is the answer for the following sets:

Group Column Row Value
1 1 A -
1 1 B 8
1 1 C -
1 1 D 9
1 1 E -
1 1 F -
1 1 G 12


1 3 A 7
1 3 B -
1 3 C 8
1 3 D -
1 3 E -
1 3 F 12
1 3 G -

I am especially curious about 1 1 A, and 1 3 G.

If you want a least squares solution, then for column 3 the values are:

A 7
B 7.63
C 8
D 9.68
E 10.71
F 12
G 12.76

Is that what you are looking for?

Coding this would be rather simple. Here is how.

1) Not sure if this is hypothetical or if your field names are actually:
Group
Column
Row
Value

Every single one of those names are Reserved words in Access/VBA. If you have not run into a problem yet, you will. Rename every one! From your data it looks like they are integer values and strings so you could do something like
intGroup
intColumn
strRow
snglValue

or add something more descriptive, or just myGroup, myColumn...

2) Make a query to return only the Groups and columns that have three or more values:
qryHasThree
Code:
SELECT 
   tblInterpolate.intGroup, 
   tblInterpolate.intColumn, 
FROM 
   tblInterpolate
WHERE 
   Not (tblInterpolate.snglValue) Is Null
GROUP BY 
   tblInterpolate.intGroup, tblInterpolate.intColumn
HAVING 
   Count(tblInterpolate.intColumn>2
ORDER BY 
   tblInterpolate.intGroup, tblInterpolate.intColumn;

Returns:
Code:
intGroup	intColumn
1                      1
1                      3

3) For each row and column in the above query open a recordset:
...
Do While Not rsGroupColumn.EOF
grp = rsGroupColumn!intGroup
col = rsGroupColumn!intColumn
strSql = "select snglValue from tblInterpolate where (intGroup = " & grp & " AND intColumn = " & col & ") ORDER BY strRow"
Debug.Print strSql
Set rsInterp = CurrentDb.OpenRecordset(strSql, dbOpenDynaset)
...

4) Now pull the values into an array as the Y's
5) Pull the absolute position's of the values into an array as the known X's
6) Open an Excel comm object
7) Use these arrays as the parameters for the Slope and Intercept functions, and return the slope and intercept
8)Now Simply read through the missing values and multiply the absolue postion by the slope and add the intercept.

If you are looking for something more exact solution than a linear regression, you will have to specify. But your data appears to be very linear. If interested, please post what you are looking for. Also what is the purpose? Maybe there is a better overall solution than "interpolating/extrapolating"?
 

Thanks all for your feedback!

Now let me expand a little bit to explain the actual problem in more details.

The table includes about 8000 records. It's basically a database for a group of concrete columns for an engineering site which are arranged in a grid layout. The "value" refers to the depth of the columns "concrete colums". I'll replace the older field name "Column" to "Y" and "row" to "X" for now to avoid terminolgy confusion. There's another field in the database, "offset", which I excluded before to simplify the problem but I'll include it here to show the bigger picture. Offset is the horizontal distance of 'Y' from a fixed datum, and is the same for each column (Y) within the group (logically, all columns in 'Y' are equidistant from that datum in the construction site!).

The first interpolation for each "Y" within the group should be performed in a very simple way, assuming the missing values for depth are located equidistant from the surrounding two values. (i.e. 8.5 if one value is missing between 8 & 9, and 9.33, 10.66 if 'two' values are missing between 8 & 12 for instance) no regression or complex approach is required here. A simple and separate linear interpolation between any two consecutive points. If the missing value at the edge "such as row "A" or "G" in this example, the extrapolation will be also linear based on the line slope consequent or precedes it". The only requirment for interpolation is to have at least 3 values in a column (just QA/QC requirment for accuracy, otherwise skip to later step)

After doing this on all columns within the group, then I look at the horizontal rows "X", and interpolate between everyt two given values in a similar way to the above concept. *The only difference here is that the missing points may not be equidistant* I'll consider the *offset* value to interpolate proportionally between any two points. (i.e. if value "8" at row A, column "1" has offset=150, and value "10" for row A, column "3" has offset=300, then row A , column "2" , which has an offset of 200 should be 1/3 of the incremental value between column 1 & 3 "8.66" (200 is located at 1/3 the distance between 150 & 300).

Having said that, I expect the output of this hypothetical example "for now" (the actual table has 8000 records) will be as follows:

values in "" are the interpolated ones, in ()are the extrapolated for each column. Values in []are the interpolated for each row.

Column 1 & 3:

Group Y X Value offset(Y)
1 1 A (7.5) 150
1 1 B 8 150
1 1 C "8.5" 150
1 1 D 9 150
1 1 E "10" 150
1 1 F "11" 150
1 1 G 12 150

1 3 A 7 300
1 3 B "7.5" 300
1 3 C 8 300
1 3 D "9.33" 300
1 3 E "10.66" 300
1 3 F 12 300
1 3 G (13.66) 300

and column 2, would be:

1 2 A [7.33] 200
1 2 B 6 200
1 2 C [8.33] 200
1 2 D [9.11] 200
1 2 E [10.22] 200
1 2 F 9 200
1 2 G [12.55] 200


Then the same should be performed on the next group until the end of the table. I appreciate a VBA code with either access or excel "if I export this into excel". Thanks
 
Ok I will take a look now that you provided the information. Should be easy considering here is the solution for a pure linear regression, assuming the best fit line for all points. Actually I have to admit this is pretty slick if I say so myself.

1) I added a new table that listed the letters and the corresponding row values
rowID intRow
A 1
B 2
C 3
D 4
E 5
F 6
G 7
..........
Z 26

2) Now build a query
qryBestFitData
Code:
SELECT tblInterpolate.ID, tblInterpolate.intGroup, tblInterpolate.intColumn, tblInterpolate.strRow, tblRowInteger.intRow, tblInterpolate.snglValue
FROM tblInterpolate INNER JOIN tblRowInteger ON tblInterpolate.strRow = tblRowInteger.rowID
ORDER BY tblInterpolate.intGroup, tblInterpolate.intColumn, tblRowInteger.intRow;

3)Run the following code
Code:
 Do While Not rsGroupColumn.EOF
     grp = rsGroupColumn!intGroup
     col = rsGroupColumn!intColumn
     strSql = "select intRow, snglValue from qryBestFitData where (intGroup = " & grp & " AND intColumn = " & col & ") ORDER BY strRow"
     Debug.Print strSql
     Set rsInterp = CurrentDb.OpenRecordset(strSql, dbOpenDynaset)
     Do While Not rsInterp.EOF
       If Not IsNull(rsInterp!snglValue) Then
          If blnXdimensioned Then
             ReDim Preserve Xvalues(0 To UBound(Xvalues) + 1)
             ReDim Preserve Yvalues(0 To UBound(Yvalues) + 1)
          Else
            ReDim Xvalues(0 To 0)
            ReDim Yvalues(0 To 0)
            blnXdimensioned = True
          End If
          Xvalues(UBound(Xvalues)) = rsInterp!intRow
          Yvalues(UBound(Yvalues)) = rsInterp!snglValue
       Else
          If blnMissingXdimensioned Then
             ReDim Preserve missingXvalues(0 To UBound(missingXvalues) + 1)
          Else
             ReDim missingXvalues(0 To 0)
             blnMissingXdimensioned = True
          End If
          missingXvalues(UBound(missingXvalues)) = rsInterp!intRow
       End If
           rsInterp.MoveNext
     Loop
     slope = EXCELapp.WorksheetFunction.slope(Yvalues, Xvalues)
     yIntercept = EXCELapp.WorksheetFunction.Intercept(Yvalues, Xvalues)
     strSql = "select intRow, snglValue from qryBestFitData where (intGroup = " & grp & " AND intColumn = " & col & " AND snglValue is NULL) ORDER BY strRow"
     Set rsInterp = CurrentDb.OpenRecordset(strSql, dbOpenDynaset)
     Do While Not rsInterp.EOF
       rsInterp.Edit
         rsInterp!snglValue = (slope * rsInterp!intRow) + yIntercept
         Debug.Print rsInterp!snglValue
       rsInterp.Update
       rsInterp.MoveNext
     Loop
     rsGroupColumn.MoveNext
     blnXdimensioned = False
     blnMissingXdimensioned = False
  Loop
End Sub

Results:
Code:
intGroup intColumn strRow intRow snglValue
1	1	A	1	6.947369
1	1	B	2	8
1	1	C	3	8.578947
1	1	D	4	9
1	1	E	5	10.21053
1	1	F	6	11.02632
1	1	G	7	12
1	2	A	1	
1	2	B	2	6
1	2	C	3	
1	2	D	4	
1	2	E	5	
1	2	F	6	9
1	3	A	1	7
1	3	B	2	7.631579
1	3	C	3	8
1	3	D	4	9.684211
1	3	E	5	10.71053
1	3	F	6	12
1	3	G	7	12.76316
2	1	A	1	
2	1	B	2
 
I believe this does part one. Can you run it against some real data. Obviously you will have to change all the field, query, and table names.

The way this works it finds the first and last values. It then does interpolation for each gaps between the first and last values.
Then it checks if there are missing values at the leading "edge", and fixes that. Then it checks for missingvalues at the tail end and fixes that
Code:
Public Sub interpExtrap()

  Dim rsGroupColumn As DAO.Recordset
  Dim rsInterp As DAO.Recordset
  Dim strSql As String
  Dim col As Integer
  Dim grp As Integer
  Dim beginValue As Single
  Dim beginRow As Integer
  Dim endValue As Single
  Dim EndRow As Integer
  Dim increment As Single
  Dim lastValuePosition As Integer
  Dim bk As Variant
  Set rsGroupColumn = CurrentDb.OpenRecordset("qryHasThree", dbReadOnly)
  
  Do While Not rsGroupColumn.EOF
     grp = rsGroupColumn!intGroup
     col = rsGroupColumn!intColumn
     strSql = "select intRow, snglValue from qryBestFitData where (intGroup = " & grp & " AND intColumn = " & col & ") ORDER BY strRow"
     'Debug.Print strSql
     Set rsInterp = CurrentDb.OpenRecordset(strSql, dbOpenDynaset)
     'Do the straight interpolation first
     rsInterp.FindLast "not snglValue is null"
     lastValuePosition = rsInterp.AbsolutePosition
     'Move to first value
     rsInterp.FindFirst "not snglValue is null"
     
     Do
       'Find first gap
       rsInterp.FindNext "snglValue is null"
      
       bk = rsInterp.Bookmark
       'Find beginning value
       rsInterp.FindPrevious "not snglValue is Null"
       beginValue = rsInterp!snglValue
       beginRow = rsInterp!intRow
       'Find ending value
       rsInterp.FindNext "Not snglValue is Null"
       endValue = rsInterp!snglValue
       EndRow = rsInterp!intRow
       'Return to first gap
       rsInterp.Bookmark = bk
       increment = getIncrement(beginValue, endValue, beginRow, EndRow)
       rsInterp.Edit
         rsInterp!snglValue = beginValue + increment
       rsInterp.Update
     Loop Until rsInterp.NoMatch Or rsInterp!intRow >= lastValuePosition
     
     'Check beginning and get following increment
     rsInterp.MoveFirst
     If IsNull(rsInterp!snglValue) Then
        rsInterp.FindFirst "not snglValue is Null"
         beginValue = rsInterp!snglValue
         rsInterp.MoveNext
         endValue = rsInterp!snglValue
         increment = getIncrement(beginValue, endValue, 1, 2)
         rsInterp.FindPrevious "snglValue is null"
         Do While Not rsInterp.BOF
            rsInterp.Edit
              rsInterp!snglValue = beginValue - increment
            rsInterp.Update
            beginValue = beginValue - increment
            rsInterp.MovePrevious
         Loop
     End If
     
     'Check the end and get Preceeding increment
     
     rsInterp.MoveLast
     If IsNull(rsInterp!snglValue) Then
        rsInterp.FindLast "not snglValue is Null"
         endValue = rsInterp!snglValue
         MsgBox "end " & endValue
         rsInterp.MovePrevious
         beginValue = rsInterp!snglValue
         MsgBox "begin " & beginValue
         increment = getIncrement(beginValue, endValue, 1, 2)
         MsgBox increment
         rsInterp.FindNext "snglValue is null"
         Do While Not rsInterp.EOF
            rsInterp.Edit
              rsInterp!snglValue = endValue + increment
            rsInterp.Update
            endValue = endValue + increment
            rsInterp.MoveNext
         Loop
     End If
     
     rsGroupColumn.MoveNext
  Loop
 End Sub

With this code I get
Code:
intGroup	intColumn	strRow	intRow	snglValue
1	A	1	7.5
1	1	B	2	8
1	1	C	3	8.5
1	1	D	4	9
1	1	E	5	10
1	1	F	6	11
1	1	G	7	12
1	2	A	1	
1	2	B	2	6
1	2	C	3	
1	2	D	4	
1	2	E	5	
1	2	F	6	9
1	3	A	1	7
1	3	B	2	7.5
1	3	C	3	8
1	3	D	4	9.333333
1	3	E	5	10.66667
1	3	F	6	12
1	3	G	7	13.33333
2	1	A	1	
2	1	B	2
See if you can get this part working, then I will demo the part two using the offset.
 
For part two "interpolotion of rows within groups", you show 3 columns in the group. Column 1 and 3 have 3 values to start with so end up with complete values after step 1. So you can get values for column 2 from column 1 and 3. That is easy if this is always the case. Is this always the case? Could you have many more than 3 columns? Can you have trailing and leading nulls? Here is the data sorted by Group, Row, Col. This looks very easy, but not sure if I had a lot of columns with big gaps and leading and trailing nulls how to handle it.
Code:
intGroup	intRow	intColumn	strRow	snglValue	offSet
1	1	1	A	7.5	150
1	1	2	A		200
1	1	3	A	7	300
1	2	1	B	8	150
1	2	2	B	6	200
1	2	3	B	7.5	300
1	3	1	C	8.5	150
1	3	2	C		200
1	3	3	C	8	300
1	4	1	D	9	150
1	4	2	D		200
1	4	3	D	9.3333	300
1	5	1	E	10	150
1	5	2	E		200
1	5	3	E	10.6667	300
1	6	1	F	11	150
1	6	2	F	9	200
1	6	3	F	12	300
1	7	1	G	12	150
1	7	3	G	13.3333	300
2	1	1	A		300
2	2	1	B		300

can you provide more details?
Could you have something like this?
intGroup intRow intColumn strRow snglValue offSet
1 1 1 A 150
1 1 2 A 7.5 200
1 1 3 A 300
1 1 4 A 400
1 1 5 A 7 600
1 1 6 A 800

If so what would you expect is the answers?
 
Thanks MajP,

I see the effort you put into this which I really appreciate it!

I added more data 8 columns in total(I'm trying to keep this post at reasonable length "not too long" since I can't use file attachment capability) to attach bigger file.

You may see the actual database has location field as well (another counter to loop on), you can ignore it if you want. I couldn't test it on my end yet, since the creator of the original file locked some features on the file which giving me some hard time to execute the code. I'll definitely let you know the outcome once I work this out. Please find more data below (I inserted '0.0' instead of the blanks to make it looks nicer! I hate this text editor!). Thanks again



location intGroup intRow IntCol strRow snglValue offset
4 15 1 1 A 0.0 218.9
4 15 2 1 B 0.0 218.9
4 15 3 1 C 0.0 218.9
4 15 4 1 D 4.2 218.9
4 15 5 1 E 0.0 218.9
4 15 6 1 F 4.9 218.9
4 15 7 1 G 0.0 218.9
4 15 8 1 H 0.0 218.9
4 15 10 1 J 6.0 218.9
4 15 11 1 K 7.0 218.9
4 15 12 1 L 0.0 218.9
4 15 13 1 M 7.6 218.9
4 15 1 2 A 3.5 224.1
4 15 2 2 B 0.0 224.1
4 15 3 2 C 0.0 224.1
4 15 4 2 D 5.0 224.1
4 15 5 2 E 0.0 224.1
4 15 6 2 F 5.3 224.1
4 15 7 2 G 0.0 224.1
4 15 8 2 H 0.0 224.1
4 15 10 2 J 0.0 224.1
4 15 11 2 K 0.0 224.1
4 15 12 2 L 8.2 224.1
4 15 13 2 M 0.0 224.1
4 15 1 3 A 0.0 228.7
4 15 2 3 B 4.0 228.7
4 15 3 3 C 0.0 228.7
4 15 4 3 D 0.0 228.7
4 15 5 3 E 0.0 228.7
4 15 6 3 F 0.0 228.7
4 15 7 3 G 6.8 228.7
4 15 8 3 H 0.0 228.7
4 15 10 3 J 0.0 228.7
4 15 11 3 K 0.0 228.7
4 15 12 3 L 9.1 228.7
4 15 13 3 M 0.0 228.7
4 15 1 4 A 0.0 234.6
4 15 2 4 B 0.0 234.6
4 15 3 4 C 0.0 234.6
4 15 4 4 D 0.0 234.6
4 15 5 4 E 5.3 234.6
4 15 6 4 F 0.0 234.6
4 15 7 4 G 0.0 234.6
4 15 8 4 H 0.0 234.6
4 15 10 4 J 0.0 234.6
4 15 11 4 K 0.0 234.6
4 15 12 4 L 0.0 234.6
4 15 13 4 M 8.8 234.6
4 15 1 5 A 0.0 239.4
4 15 2 5 B 0.0 239.4
4 15 3 5 C 0.0 239.4
4 15 4 5 D 0.0 239.4
4 15 5 5 E 0.0 239.4
4 15 6 5 F 0.0 239.4
4 15 7 5 G 0.0 239.4
4 15 8 5 H 0.0 239.4
4 15 10 5 J 0.0 239.4
4 15 11 5 K 0.0 239.4
4 15 12 5 L 10.0 239.4
4 15 13 5 M 0.0 239.4
4 15 1 6 A 3.6 242.8
4 15 2 6 B 0.0 242.8
4 15 3 6 C 5.1 242.8
4 15 4 6 D 0.0 242.8
4 15 5 6 E 6.2 242.8
4 15 6 6 F 6.5 242.8
4 15 7 6 G 0.0 242.8
4 15 8 6 H 7.6 242.8
4 15 10 6 J 8.2 242.8
4 15 11 6 K 0.0 242.8
4 15 12 6 L 9.2 242.8
4 15 13 6 M 10.1 242.8
4 15 1 7 A 4.1 249.6
4 15 2 7 B 5.2 249.6
4 15 3 7 C 0.0 249.6
4 15 4 7 D 0.0 249.6
4 15 5 7 E 7.3 249.6
4 15 6 7 F 0.0 249.6
4 15 7 7 G 8.2 249.6
4 15 8 7 H 0.0 249.6
4 15 10 7 J 8.6 249.6
4 15 11 7 K 0.0 249.6
4 15 12 7 L 0.0 249.6
4 15 13 7 M 9.7 249.6
4 15 1 8 A 0.0 252.9
4 15 2 8 B 0.0 252.9
4 15 3 8 C 0.0 252.9
4 15 4 8 D 0.0 252.9
4 15 5 8 E 0.0 252.9
4 15 6 8 F 0.0 252.9
4 15 7 8 G 0.0 252.9
4 15 8 8 H 0.0 252.9
4 15 10 8 J 0.0 252.9
4 15 11 8 K 0.0 252.9
4 15 12 8 L 0.0 252.9
4 15 13 8 M 10.9 252.9
 

Hi MajP,

Just a follow up on my last post.

I could test the code, couple of issues so far:
- getIncrement function ?
- "ORDER BY strRow" in the strSql. the code shows an error unless I remove the whole order by clause

Thanks
 
Hi MajP,

I appreciate if you forward:

- The declaration section for part one code
- What is "getIncrement" function in part two code?

Thanks a lot!
 

Here is my db. I added some improvements. Namely I added a field to your table "interpValue". This way you never mess with the original data "snglValue". All the interpolation results go into this new column. So you can clear this field out and overwrite it with new data. The chance of something going wrong and messing up your raw data is pretty high. This way you never lose or manipulate data in the field. I added some new queries that you would have to add

The procedure is called "interpExtrap"
my results
Code:
intColumn strRow snglValue interpValue
1	A	0	3.15
1	B	0	3.5
1	C	0	3.85
1	D	4.2	4.2
1	E	0	4.55
1	F	4.9	4.9
1	G	0	5.267
1	H	0	5.634
1	 J	6	6
1	 K	7	7
1	 L	0	7.3
1	 M	7.6	7.6
2	A	3.5	3.5
2	B	0	4
2	C	0	4.5
2	D	5	5
2	E	0	5.15
2	F	5.3	5.3
2	G	0	5.88
2	H	0	6.46
2	 J	0	7.04
2	 K	0	7.62
2	 L	8.2	8.2
2	 M	0	8.78
3	A	0	3.44
3	B	4	4
3	C	0	4.56
3	D	0	5.12
3	E	0	5.68
3	F	0	6.24
3	G	6.8	6.8
3	H	0	7.375
3	 J	0	7.95
3	 K	0	8.525
3	 L	9.1	9.1
3	 M	0	9.675
4	A	0	0
4	B	0	0
4	C	0	0
4	D	0	0
4	E	5.3	5.3
4	F	0	0
4	G	0	0
4	H	0	0
4	 J	0	0
4	 K	0	0
4	 L	0	0
4	 M	8.8	8.8
5	A	0	0
5	B	0	0
5	C	0	0
5	D	0	0
5	E	0	0
5	F	0	0
5	G	0	0
5	H	0	0
5	 J	0	0
5	 K	0	0
5	 L	10	10
5	 M	0	0
6	A	3.6	3.6
6	B	0	4.35
6	C	5.1	5.1
6	D	0	5.65
6	E	6.2	6.2
6	F	6.5	6.5
6	G	0	7.05
6	H	7.6	7.6
6	 J	8.2	8.2
6	 K	0	8.7
6	 L	9.2	9.2
6	 M	10.1	10.1
7	A	4.1	4.1
7	B	5.2	5.2
7	C	0	5.9
7	D	0	6.6
7	E	7.3	7.3
7	F	0	7.75
7	G	8.2	8.2
7	H	0	8.4
7	 J	8.6	8.6
7	 K	0	8.967
7	 L	0	9.333
7	 M	9.7	9.7
8	A	0	0
8	B	0	0
8	C	0	0
8	D	0	0
8	E	0	0
8	F	0	0
8	G	0	0
8	H	0	0
8	 J	0	0
8	 K	0	0
8	 L	0	0
8	 M	10.9	10.9

However, I have thought of a way cleaner solution.
1) Make a query that returns for each 0 value the previous non zero and its row, the following non zero and its row, the current row. so for row 2H I would have

snglValue currentRow beginValue beginRow EndValue EndRow
0 8 5.3 6 8.2 11

It is not a simple query, but it is not overly difficult.
2) in the same query you build a calculated field the does the interp
InterpValue: ((EndValue - BeginValue) / (endRow - BeginRow)) * (currentRow - beginRow) + beginValue

3)do an update query to write to the table
4) the above query would handle everything but the leading and trailing edges that have 0 values.

5)You would write a different query for the leading edge and another for the trailing edges. And do two more update queries.

It would be a more "bullet proof" method less likely to throw an error. The code I provided works fine, but the data needs to be perfect. One null field, one bad value, can cause it to crash.
 

Thanks alot MajP!

I very much appreciate your help on this.

I see you also added a field "intRow" to tblInterpolate instead of 'inner joining' with tblRowInteger in the updated qryInterpData. Is this a preferrable approach? I'll try it with real data (thousands of records) and see what will happen

I also see that 'mdlBestfit' module hasn't been used in your latest approach. I think we don't need it here. Am I right?

Any idea on how to do the rows now based on offset? I think it will be similar to columns but the new increment function should be proportional to the actual distances. Am I right?

Thanks

 
1) I added the introw to the table just for demo purposes and ease of use. You can do it either way. My code runs off of the query "qryInterpData"

" strSql = "select intRow, interpValue, interpValue from qryInterpData where (intGroup = " & grp & " AND intColumn = " & col & ") ORDER BY intRow" "

So as long as you can build that query, you are fine. One thing nice about queries is that you can alias all of your fields. I actually do not really even need that field as long as there is a field to sort the data. I use the absolute position of the record as my row.

2) However one question. You do not have a row I in your data set. Is that on purpose or a typing error? Assume the following
G 2
H 0
J 6

Do you want the result to be
G 2
H 4
J 6

or

G 2
H 3.33
J 6
(as if there is a row I not shown)

3) The mdlBestFit was for the linear regression method. That can go away, but you have to admit it was pretty slick.

4) Can you post your actual data? 4Shared works well, and it is free

 
After running the interpolation within a group. The data can be sorted by Group, Row, then column. This provides something like (group, :
[/code]
ID intRow intColumn strRow interpValue offset
1 1 1 A 3.15 218.9
13 1 2 A 3.5 224.1
25 1 3 A 3.44 228.7
37 1 4 A 0 234.6
49 1 5 A 0 239.4
61 1 6 A 3.6 242.8
73 1 7 A 4.1 249.6
85 1 8 A 0 252.9
2 2 1 B 3.5 218.9
14 2 2 B 4 224.1
26 2 3 B 4 228.7
38 2 4 B 0 234.6
50 2 5 B 0 239.4
62 2 6 B 4.35 242.8
74 2 7 B 5.2 249.6
86 2 8 B 0 252.9
[/code]

From your example, you said
After doing this on all columns within the group, then I look at the horizontal rows "X", and interpolate between every two given values in a similar way to the above concept.
So just to verify you only would do this for ID 85.?
(The only value "surrounded" by two known values)

If you want to do it for IDs 37 and 49, this is how I would do it in general.
Code:
ID intRow intColumn strRow interpValue	offset
25	1	3	A	3.44	228.7
37	1	4	A	0	234.6
49	1	5	A	0	239.4
61	1	6	A	3.6	242.8

x1 = 3.44
x2 = 3.6
x3 = x
y1 = 228.7
y2 = 242.8
y3 = 234.6

y2-y1 / x2 - x1 = y3-y1 / X - x1
14.1 / .16 = 5.9 / X - 3.44
X = 5.9/88.125 + 3.44

X3 = X = 3.467

Is that correct?

How about trailing and leading zeros such as 85?
 
Thanks MajP,


- The letter "I" is not used at all in the database, "J" follows "H" . So in the example you provided, the answer will be:

G 2
H 4
J 6


- Your concept for row interpolation is correct. The math is fine, however, there was a little typo in your final result (X3 =X = 3.507 "not 3.467"), and this should be for ID 37. Similarly, for ID 39, I expect the result 3.56 "same slope".

The original database file ,is huge, I created a snapshot of it. The original depths are not complete in the database, so I populated some for testing.

The original database file has few special cases. For example, in the intColums there may be non integer values (not a lot though), I ignore the decimals (i.e. if 3.75, I consider it 3). Also in the strRow, there could be a combination of two letters (such as JK). This indicates that it's located between J & K, and the interpolation will be made made between (J & K) at the column level only if the the whole column is valid for interpolation otherwise I would ignore it. I specified a definition for such cases in the tblRows so I think there should be no problem on looping on intRow. There may still few weird rows (i.e. FG1, FG2 etc.) I would ignore the whole record I didn't specify them in the tblRows database.

Just to give an idea on the nature of the database, each group is independent (no interpolation between the groups), all interpolations are performed and contained for each group. If we move to another group, then we extablish new interpolation relations within this new group. Similarly, each location is independent from other locations and so on. I understand from a previous post that snglValue shouldn't contain NULLs, I may haven't done that in this database, but will do may be it's a good idea.

I created 'qrytblInterpolation' and uploaded a testing file (hope it works!):


I appreciate you help, please let me know if you have any comment. Thanks!
 
1) I think where you have dual row designators you should list them as fractions in the row table. Then For the row values you can handle the interpolation of the dual row designators as follows. I would think your "row" table would have the following:

A 1
AB 1.5
B 2
BC 2.5
C 3

2) The database is huge because it is not compacted and is/was part of a replication set. I ran a make table query using the visible fields and compacted the data. The db went from 7MB to 336K

3)The orginal code and queries assumed that the table had null values where no depth data was available. I wrote the original code to support that. Then you provided a data set with 0 values, and I changed the code. Now you provided the real data that has nulls for the snglValue field. I do not feel like switching the code and queries back. So I will run an update query to update the null values into zeroes

4)You did not answer. When doing the row interpolation do you also do "leading edge / trailing edge" extrapolations? Or do you only do interpolation between known points.?

5) I wil get you the solution tomorrow. I figured out how to do this in SQL. The queries are somewhat involved, but overall it is a more understandable solution.
 

Thanks MajP

The original database has 'nulls'. I put zeros in a copnsequent post to overcome the "text editor's" misalignment problem. I apologize for the confusion, but either option you suggest would work.

The row interpolation may 'also' have the trailing or leading edges issue. In addition, the row interpolation will be proportional to the distance (offset difference) between columns.

Thanks
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top