Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Delete (Mostly) Duplicate Records Based On Criteria 1

Status
Not open for further replies.

SgtJarrow

Programmer
Apr 12, 2002
2,937
US
I haven't posted a question in a while thanks to all the help I have gotten in the past here but I am now stuck on a problem and would like to ask for some help. Must be my end-of-week brain drain catching up with me. :)

The university I work for has a student population of about 10,000 students. The students take online classes using a software we did not write. A bug in the software is causing a major error in our reporting. When an instructor grades a student's assignment, if the assignment needs to be returned to the student for additional work, the instructor must give the student a grade of 0 for that assignment and then create a duplicate assignment for the student to resubmit the assignment through. What this does is creates two (or more as needed) records in our database for that assignment for the student.

I have a stored procedure that gathers up a bunch of data and places it into a table for reporting students' graded activities. Prior to publishing the report, I need to check this table and delete any records that have a STATUS of GRADED and a FINAL_VALUE of 0 if the student also has another assignment record for that assignment. I know...I have already expressed the fact that this will delete a valid record if the student truly earned a zero for the assignment, but management is willing to accept that risk.

I have been fumbling over this a day or so and just can seem to find the right code...Here is my sample data:

Code:
IF OBJECT_ID('tempdb..#GAR') IS NOT NULL
		DROP TABLE #GAR

--Create a temp table for working
CREATE TABLE #GAR
(
	RUN_TIMESTAMP			VARCHAR(50),
	JOB_NAME				VARCHAR(50),
	SYSTUDENTID				VARCHAR(50),
	FIRST_NAME				VARCHAR(50),
	LAST_NAME				VARCHAR(50),
	COURSE_CODE				VARCHAR(50),
	SECTION_NUM				VARCHAR(50),
	SECTION_CODE			VARCHAR(50),
	SECTION_START			VARCHAR(50),
	SECTION_END				VARCHAR(50),
	WEEK_NUM				INT,
	ASSIGNMENT_TYPE			VARCHAR(500),
	ASSIGNMENT_NAME			VARCHAR(500),
	FINAL_VALUE				NUMERIC(18, 1),
	MX_POINTS				NUMERIC(18, 1),
	GRADE					NUMERIC(18, 1),
	STATUS					VARCHAR(50)
)

--Insert a sample of data.  Actual record count in table is approx 79,000 (about 10,000 unique students spread out over multiple courses and assignments per student)
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 3 - Test', 'Week 3 - Test', 43.2, 90.0, 48.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 1 - Test', 'Week 1 - Test', 32.4, 90.0, 36.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 1 - TBLS Assign', 'Week 1 - TBLS Assignment', 6.0, 10.0, 60.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 4 - TBLS Assign', 'Week 4 - TBLS Assignment', NULL, 10.0, NULL, 'COMPLETE')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 2 - Test', 'Week 2 - Test', 54.0, 90.0, 60.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 3 - Assign', 'Week 3 - Assignment', 60.0, 70.0, 86.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 1 - Assign', 'Week 1 - Assignment', NULL, 70.0, NULL, 'IN PROGRESS')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 1 - Assign', 'Week 1 - Assignment', 0.0, 70.0, 0.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 2 - Assign', 'Week 2 - Assignment', NULL, 70.0, NULL, 'IN PROGRESS')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 2 - Assign', 'Week 2 - Assignment', 0.0, 70.0, 0.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 4 - Test', 'Week 4 - Test', NULL, 90.0, NULL, 'NOT COMPLETE')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 2 - TBLS Survey', 'Week 2 - TBLS Survey', NULL, 0.0, NULL, 'COMPLETE')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 4 - Assign', 'Week 4 - Assignment', NULL, 70.0, NULL,	'COMPLETE')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 3 - TBLS Assign', 'Week 3 - TBLS Assignment', 0.0, 10.0, 0.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 3 - TBLS Assign', 'Week 3 - TBLS Assignment', NULL, 10.0, NULL, 'IN PROGRESS')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 2 - TBLS Assign', 'Week 2 - TBLS Assignment', 0.0, 10.0, 0.0, 'GRADED')
INSERT INTO #GAR VALUES ('Aug  3 2012  7:00AM', 'GRADED_ACTIVITY', '123456', 'Robert', 'Johnson', 'ME2400', '2', '26785', '07/09/2012', '08/12/2012', 4, 'Wk 2 - TBLS Assign', 'Week 2 - TBLS Assignment', NULL, 10.0, NULL, 'IN PROGRESS')

--Delete records as follows:
--If two records exist where all columns up to ASSIGNMENT_NAME are identical and one of the records has STATUS = GRADED and FINAL_VALUE = 0, delete record with STATUS = 0 and FINAL_VALUE = 0
--In the above sample code, I expect only 13 of the 17 records to be returned.

--My attempt to find the duplicate records but I can't figure out the last bit to delete the specific records as defined above
SELECT
	RUN_TIMESTAMP
	, JOB_NAME
	, SYSTUDENTID
	, FIRST_NAME
	, LAST_NAME
	, COURSE_CODE
	, SECTION_NUM
	, SECTION_CODE
	, SECTION_START
	, SECTION_END
	, WEEK_NUM
	, ASSIGNMENT_TYPE
	, ASSIGNMENT_NAME
FROM #GAR
GROUP BY RUN_TIMESTAMP
	, JOB_NAME
	, SYSTUDENTID
	, FIRST_NAME
	, LAST_NAME
	, COURSE_CODE
	, SECTION_NUM
	, SECTION_CODE
	, SECTION_START
	, SECTION_END
	, WEEK_NUM
	, ASSIGNMENT_TYPE
	, ASSIGNMENT_NAME
--HAVING COUNT(*) > 1
ORDER BY RUN_TIMESTAMP
	, JOB_NAME
	, SYSTUDENTID
	, FIRST_NAME
	, LAST_NAME
	, COURSE_CODE
	, SECTION_NUM
	, SECTION_CODE
	, SECTION_START
	, SECTION_END
	, WEEK_NUM
	, ASSIGNMENT_TYPE
	, ASSIGNMENT_NAME

--Actual ouput should be return with this
SELECT
	RUN_TIMESTAMP,
	JOB_NAME,
	SYSTUDENTID,
	FIRST_NAME,
	LAST_NAME,
	COURSE_CODE,
	SECTION_NUM,
	SECTION_CODE,
	SECTION_START,
	SECTION_END,
	WEEK_NUM,
	ASSIGNMENT_TYPE,
	ASSIGNMENT_NAME,
	FINAL_VALUE,
	MX_POINTS,
	GRADE,
	STATUS
FROM #GAR

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
C#.NET Programmer
 
Here's what I would do...

First, add the having back in to the query so that we get just the dupes. Then, make this a CTE where we join back to the original table, which allows us to delete the rows.

[tt]
;With Duplicates As
(
Your query for the duplicates
)
Delete OriginalTable
From Duplicates
Inner Join OriginalTable
On -- All your join conditions here
Where OriginalTable.final_value = Convert(Numeric(18,1), 0.0)
and status = 'GRADED'
[/tt]

Based on your sample data...

Code:
; With Duplicates As
(
	SELECT
		RUN_TIMESTAMP
		, JOB_NAME
		, SYSTUDENTID
		, FIRST_NAME
		, LAST_NAME
		, COURSE_CODE
		, SECTION_NUM
		, SECTION_CODE
		, SECTION_START
		, SECTION_END
		, WEEK_NUM
		, ASSIGNMENT_TYPE
		, ASSIGNMENT_NAME
	FROM #GAR
	GROUP BY RUN_TIMESTAMP
		, JOB_NAME
		, SYSTUDENTID
		, FIRST_NAME
		, LAST_NAME
		, COURSE_CODE
		, SECTION_NUM
		, SECTION_CODE
		, SECTION_START
		, SECTION_END
		, WEEK_NUM
		, ASSIGNMENT_TYPE
		, ASSIGNMENT_NAME
	HAVING COUNT(*) > 1
) 
DELETE	G
From	Duplicates As D
		Inner Join #GAR As G
			On	D.RUN_TIMESTAMP = G.RUN_TIMESTAMP
			AND D.JOB_NAME = G.JOB_NAME
			AND D.SYSTUDENTID = G.SYSTUDENTID
			AND D.FIRST_NAME = G.FIRST_NAME
			AND D.LAST_NAME = G.LAST_NAME
			AND D.COURSE_CODE = G.COURSE_CODE
			AND D.SECTION_NUM = G.SECTION_NUM
			AND D.SECTION_CODE = G.SECTION_CODE
			AND D.SECTION_START = G.SECTION_START
			AND D.SECTION_END = G.SECTION_END
			AND D.WEEK_NUM = G.WEEK_NUM
			AND D.ASSIGNMENT_TYPE = G.ASSIGNMENT_TYPE
			AND D.ASSIGNMENT_NAME = G.ASSIGNMENT_NAME
WHERE	G.FINAL_VALUE = 0 AND G.STATUS = 'GRADED'
--Actual ouput should be return with this
SELECT
	RUN_TIMESTAMP,
	JOB_NAME,
	SYSTUDENTID,
	FIRST_NAME,
	LAST_NAME,
	COURSE_CODE,
	SECTION_NUM,
	SECTION_CODE,
	SECTION_START,
	SECTION_END,
	WEEK_NUM,
	ASSIGNMENT_TYPE,
	ASSIGNMENT_NAME,
	FINAL_VALUE,
	MX_POINTS,
	GRADE,
	STATUS
FROM #GAR 
ORDER BY RUN_TIMESTAMP
	, JOB_NAME
	, SYSTUDENTID
	, FIRST_NAME
	, LAST_NAME
	, COURSE_CODE
	, SECTION_NUM
	, SECTION_CODE
	, SECTION_START
	, SECTION_END
	, WEEK_NUM
	, ASSIGNMENT_TYPE
	, ASSIGNMENT_NAME



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
As always, George to the rescue. To me it looks correct. I have ran my report and sent it to the business for validation. But I think you hit is spot on. I'll give a final report back in a bit. THANKS.

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
C#.NET Programmer
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top