Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Calculating Correlation Coefficient in SQL 1

Status
Not open for further replies.

IT4EVR

Programmer
Feb 15, 2006
462
US
A lot of reporting and statistical packages will have a correlation coefficient expression, but Access seems to be missing this one from its reportoire.

Below is an example that creates a correlation coefficient based on two fields of data. It is assumed that the data is normally distributed. Field1 and Field2 as well as table name can be substituted by your fields, table, etc.

Code:
SELECT (Count([Field1])*Sum([Field1]*[Field2])-Sum([Field1])*Sum([Field2]))/(Sqr((Count([Field1])*Sum([Field1]^2)-(Sum([Field1])^2))*(Count([Field2])*Sum([Field2]^2)-(Sum([Field2])^2)))) AS [Correlation Coefficient]
FROM tbl_Correlation;

The number you get will be the same as you would with the CORREL formula in Excel.
 
Nice bit of code. Thanks for posting.

One comment, I think the caveat regarding normal distribution would only be relevant to hypothesis testing using this statistic, the Pearson correlation coefficient.
Regardless of the interpretation of the value obtained, the calculation may always be performed.

Except when there is missing data. If a row should have a NULL value in one of the fields but not the other, the formula would not be correct because it requires pairs of values, one value from each column for every row. To guard against this a WHERE clause might be added.

Code:
SELECT (Count([Field1])*Sum([Field1]*[Field2])-Sum([Field1])*Sum([Field2]))/(Sqr((Count([Field1])*Sum([Field1]^2)-(Sum([Field1])^2))*(Count([Field2])*Sum([Field2]^2)-(Sum([Field2])^2)))) AS [Correlation Coefficient]
FROM tbl_Correlation
WHERE Field1 IS NOT NULL
  AND Field2 IS NOT NULL;
 
You're right, this is the Pearson's correlation. And thanks for your revision. I should have accounted for the possibility of having nulls.

I had actually gotten the statistical formula out of an old college text book, with all the wonderful greek symbols in it.

Then the next job was to convert it to a query.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top