stronger correlation formula

Status
Not open for further replies.

bardcan

Technical User
Nov 6, 2009
3
AU
I have two series of numbers. Series A contains either 1s or 0s, depending on whether a patient took a pill or not. Series B contains random numbers. All of the series B numbers that coincide with the patient taking a pill have an average of 100, whereas those that coincide with NOT taking a pill average 101. There is a HUGE amount of data, so I am trying to find the formula that will show that there is a strong correlation between the two - that if the patient takes the pill, the most likely result is that their B measurement will go up by 1 point. A standard correlation coefficient shows a low correlation... around 0.15. Any help would be greatly appreciated.
 
This is more of a statistical test than data mining. Actually, a paired t-test. You have two sets of data, one with the pill, one without. The null hypothesis is that the two groups have the same mean. You then (most likely) reject that hypothesis at, say, the 90% confidence level using the t-test.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Correction - this is not a paired t-test, it is an unpaired t-test.
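
As a rough illustration of the unpaired test, here is a minimal Python sketch using only the standard library. It computes Welch's version of the unpaired t statistic (which does not assume equal variances) and approximates the p-value with a normal distribution, which is only reasonable because the thread mentions a large amount of data. The function name `welch_t_test`, the sample sizes, and the spread of 3.0 are assumptions made up for the example; only the means of 100 and 101 come from the question.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(group_a, group_b):
    """Return (t, two-sided p) for an unpaired comparison of two means.

    The p-value uses a normal approximation, which is only reasonable
    when both groups are large.
    """
    n_a, n_b = len(group_a), len(group_b)
    standard_error = math.sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    t = (mean(group_a) - mean(group_b)) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Made-up data: means of 100 (pill) and 101 (no pill) as in the question;
# the standard deviation of 3.0 and the group sizes are assumptions.
rng = random.Random(0)
pill = [rng.gauss(100.0, 3.0) for _ in range(5000)]
no_pill = [rng.gauss(101.0, 3.0) for _ in range(5000)]

t_stat, p_value = welch_t_test(pill, no_pill)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

For small samples the normal approximation should be replaced by a proper t distribution, which most statistics packages provide.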

 
johnherman, it would have been better had the study been designed properly to allow a paired t-test. The usual strategy is to give the product or a placebo to everyone at day 1, do the analysis, wait until you are certain the effects of the product have gone, then give product to all the former placebo people, and the placebo to all the product people. This way you have paired measurements, and the analysis gets much more sensitive.
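
To show why the crossover design described above is more sensitive, here is a small Python sketch of a paired t-test on per-patient differences. All numbers are invented for illustration: patients are assumed to differ a lot from one another (standard deviation 10) while each patient's own reading is 1 point lower on the pill. The function name `paired_t_test` is hypothetical, and the p-value again relies on a normal approximation that assumes many patients.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_t_test(with_pill, with_placebo):
    """Return (t, two-sided p) for per-patient differences.

    Both lists must be in the same patient order. The p-value uses a
    normal approximation, so it assumes many patients.
    """
    if len(with_pill) != len(with_placebo):
        raise ValueError("paired data needs one value per patient in each list")
    diffs = [a - b for a, b in zip(with_pill, with_placebo)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Made-up crossover data: large spread between patients, small effect within.
rng = random.Random(1)
baseline = [rng.gauss(101.0, 10.0) for _ in range(2000)]
with_placebo = [b + rng.gauss(0.0, 1.0) for b in baseline]
with_pill = [b - 1.0 + rng.gauss(0.0, 1.0) for b in baseline]

t_stat, p_value = paired_t_test(with_pill, with_placebo)
print(f"paired t = {t_stat:.2f}, p = {p_value:.3g}")
```

Because each patient serves as their own control, the between-patient spread cancels out of the differences, and only the within-patient noise remains in the denominator.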
 
What formula would you enter to get the kind of result I'm looking for?
 
Data mining is used to find "unknown" trends and relationships in the data. You have a hypothesis regarding a relation in the data and are seeking to prove or disprove it, or, in other words, to determine the degree of confidence with which the data supports your hypothesis. I would venture to guess that every statistical package on the market supports the t-test.
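
One way to see how a low correlation can still carry high confidence: the standard t statistic for testing whether a Pearson correlation r from n pairs differs from zero is t = r * sqrt(n - 2) / sqrt(1 - r^2), so for a fixed r it grows with the amount of data. The Python sketch below applies that formula to the r = 0.15 reported in the question. The sample sizes are assumptions, since the thread only says the data set is huge, and the function name is made up for the example.

```python
import math


def t_from_correlation(r, n):
    """t statistic for testing whether a Pearson correlation r from n pairs is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# r = 0.15 is taken from the question; the sample sizes are assumptions.
for n in (100, 1_000, 10_000):
    print(n, round(t_from_correlation(0.15, n), 2))
```

The correlation coefficient describes how much of the spread in B the pill explains, while the t statistic describes how sure one can be that the effect is not zero. The two answer different questions.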

 
The most commonly used correlation measure (Pearson's correlation) is not well-suited to this problem. I suggest that you measure two things:

1. Magnitude: The difference between the mean of variable B where variable A is 0 and the mean of variable B where variable A is 1.

and...

2. Significance: Try a t-test or bootstrap to establish that the difference between the two means is unlikely to be zero.
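
The two measurements above can be sketched in Python with the standard library alone. This is an illustration under stated assumptions, not a definitive implementation: the function names, the percentile bootstrap, the 95% level, the number of resamples, and the simulated data (spread of 3.0, 2000 rows) are all choices made for the example. Only the A/B layout and the means of 100 and 101 come from the question.

```python
import random
from statistics import fmean


def split_groups(a_flags, b_values):
    """Split B into (values where A == 1, values where A == 0)."""
    took_pill = [b for a, b in zip(a_flags, b_values) if a == 1]
    no_pill = [b for a, b in zip(a_flags, b_values) if a == 0]
    return took_pill, no_pill


def mean_difference(a_flags, b_values):
    """Magnitude: mean of B where A == 1 minus mean of B where A == 0."""
    took_pill, no_pill = split_groups(a_flags, b_values)
    return fmean(took_pill) - fmean(no_pill)


def bootstrap_interval(a_flags, b_values, n_resamples=500, level=0.95, seed=0):
    """Significance: percentile bootstrap interval for the mean difference."""
    rng = random.Random(seed)
    took_pill, no_pill = split_groups(a_flags, b_values)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        resampled_pill = rng.choices(took_pill, k=len(took_pill))
        resampled_none = rng.choices(no_pill, k=len(no_pill))
        diffs.append(fmean(resampled_pill) - fmean(resampled_none))
    diffs.sort()
    tail = (1.0 - level) / 2.0
    lower = diffs[round(tail * n_resamples)]
    upper = diffs[round((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Made-up data in the layout from the question: A is 0/1, B is the measurement.
rng = random.Random(42)
series_a = [rng.randint(0, 1) for _ in range(2000)]
series_b = [rng.gauss(100.0 if flag == 1 else 101.0, 3.0) for flag in series_a]

diff = mean_difference(series_a, series_b)
low, high = bootstrap_interval(series_a, series_b)
print(f"difference = {diff:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

If the interval excludes zero, the data are hard to reconcile with "no difference" at the chosen level. The bootstrap has the advantage of not assuming that B is normally distributed.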


 
Right, but how would you write this as a formula in a spreadsheet format?
 
It's probably not worth the effort to write your own t-test within Excel. It has already been done, and statistical packages are relatively cheap. Some stat packages may offer Excel compatibility or plug-ins. Good luck.
