I have a data set that looks like this:
bill# seq# line # reason
123 1 1 18
123 1 1 R1
456 1 1 18
789 1 1 R1
789 1 2 18
789 1 2 18
789 2 1 18
Each bill and seq# is unique and is considered a separate bill. Each bill can have multiple lines. I want to find a list of all duplicates, based on the reason of R1. Each bill can have multiple reasons. If a bill has a reason of R1 for all lines, then it is a true duplicate (ie bill#123). If a bill has a reason of R1 for at least 1 line, but not all, it's a partial duplicate (ie bill# 789, sequence 1). If a bill does not have R1 as a reason on any lines, it is not a duplicate (ie #456). I'm not sure how to code this to determine the 3 different categories. Any help is greatly appreciated!
CJ
bill# seq# line # reason
123 1 1 18
123 1 1 R1
456 1 1 18
789 1 1 R1
789 1 2 18
789 1 2 18
789 2 1 18
Each bill and seq# is unique and is considered a separate bill. Each bill can have multiple lines. I want to find a list of all duplicates, based on the reason of R1. Each bill can have multiple reasons. If a bill has a reason of R1 for all lines, then it is a true duplicate (ie bill#123). If a bill has a reason of R1 for at least 1 line, but not all, it's a partial duplicate (ie bill# 789, sequence 1). If a bill does not have R1 as a reason on any lines, it is not a duplicate (ie #456). I'm not sure how to code this to determine the 3 different categories. Any help is greatly appreciated!
CJ