scoring records that were used in training/validation

Status
Not open for further replies.

woodlumn

IS-IT--Management
Oct 16, 2008
US
Hello,

I have a question pertaining to data partitioning, model training, and ultimately, scoring a data set (predictive modeling).

The heart of the question is this: If you have a population of 20,000 divided up into training/validating/testing (40/30/30) for modeling purposes, is it incorrect to use the resulting score code to score the same population of 20,000?
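To make the setup concrete, here is a minimal sketch of a 40/30/30 split in Python. It is not the score code or tool from my project; the function name `partition_ids`, the seed, and the integer record IDs are assumptions made for illustration. The point is only that 8,000 of the 20,000 records land in the training partition, so scoring the full population also scores records the model was fit on.

```python
import random

def partition_ids(record_ids, fractions=(0.4, 0.3, 0.3), seed=0):
    """Shuffle record IDs and split them into train/validation/test lists.

    Hypothetical helper: fractions and seed are illustrative defaults.
    """
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(fractions[0] * len(ids))
    n_valid = round(fractions[1] * len(ids))
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # whatever remains goes to test
    return train, valid, test

# Assumed population: 20,000 records identified by 0..19999.
train, valid, test = partition_ids(range(20000))

# Tag every record with its partition so that scores can later be
# summarized separately for in-sample (train) and held-out records.
partition_of = {rid: "train" for rid in train}
partition_of.update({rid: "valid" for rid in valid})
partition_of.update({rid: "test" for rid in test})

print(len(train), len(valid), len(test))
```

Keeping a partition tag next to each score makes it possible to look at the training records and the held-out records separately after scoring all 20,000.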

That was the way I did it, accidentally. So I went back and sampled the entire database (rather than using the very specific population of 20,000), reconstructed my modeling table, and went through the modeling/scoring process again. This time, I used my new score code to re-score the original 20,000, so that I could compare the results.

I compared the scores of 100 records. For each record I took the difference between the two scores, took the absolute value, and then averaged the results. My number was 0.05. This means that, on average, a probability score of 80% may be off by plus or minus 5 percentage points. So there was a difference, but that can be attributed to many things. All it really told me was that I need to ask the question!
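For anyone who wants to reproduce the comparison, the calculation is a mean absolute difference between two lists of scores for the same records. Below is a minimal Python sketch; the function name and the three pairs of sample scores are made up for illustration and are not my actual data.

```python
def mean_absolute_difference(scores_a, scores_b):
    """Average of |a - b| over paired scores for the same records."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same records")
    if not scores_a:
        raise ValueError("need at least one pair of scores")
    total = sum(abs(a - b) for a, b in zip(scores_a, scores_b))
    return total / len(scores_a)

# Hypothetical scores from the first and second score code.
first_run = [0.80, 0.55, 0.30]
second_run = [0.85, 0.50, 0.35]

mad = mean_absolute_difference(first_run, second_run)
print(round(mad, 2))
```

Because the differences are taken in absolute value, this number says how far apart the two sets of scores are on average, not which model scores higher.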

So, back to the question: what is considered "best practice" when it comes to scoring records that were used to train the model?

Many thanks!
 