Guest_imported
Hi everyone,
I hope someone can help me.
I'm not sure whether I should balance my data before fitting a prediction model or decision tree. Imagine you have 10000 cases (customers): 1000 answered your first direct mail and 9000 did not. For your next direct mail you want to select only profitable customers (those who are likely to answer your mail), using binary logistic regression or maybe a decision tree. The dependent variable is "Answer" vs. "No answer". Do you use the sample as it is (1000 customers vs. 9000 customers), or do you first balance it, i.e. take all 1000 customers who answered your first direct mail vs. an equally sized random sample drawn from the 9000 customers who didn't answer your first mail?
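To make the two options concrete, here is a minimal Python sketch of how I would build each sample. It uses only the standard library and toy records with a label and no real predictors; the function names (`make_customers`, `balance_by_undersampling`, `response_rate`) and the fixed seed are just my own illustration, not taken from any package.

```python
import random

def make_customers(n_answered=1000, n_not_answered=9000):
    """Build a toy customer list (labels only, no real predictors)."""
    answered = [{"id": i, "answer": 1} for i in range(n_answered)]
    not_answered = [
        {"id": n_answered + i, "answer": 0} for i in range(n_not_answered)
    ]
    return answered + not_answered

def balance_by_undersampling(customers, seed=0):
    """Keep the whole smaller class and an equally sized random
    sample (without replacement) from the larger class."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    answered = [c for c in customers if c["answer"] == 1]
    not_answered = [c for c in customers if c["answer"] == 0]
    minority, majority = sorted([answered, not_answered], key=len)
    return minority + rng.sample(majority, len(minority))

def response_rate(customers):
    """Share of customers with answer == 1."""
    return sum(c["answer"] for c in customers) / len(customers)

# Option A: use the sample as it is (1000 vs. 9000).
full_sample = make_customers()

# Option B: all 1000 responders vs. 1000 randomly drawn non-responders.
balanced_sample = balance_by_undersampling(full_sample)

print(len(full_sample), response_rate(full_sample))          # 10000 0.1
print(len(balanced_sample), response_rate(balanced_sample))  # 2000 0.5
```

Note that option B changes the share of responders in the training data from 10% to 50%, which is part of what I am unsure about.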
Thank you very much for your help, best regards
Markus