Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Grouping people on consumption

Status
Not open for further replies.

pjenei

Programmer
Sep 27, 2007
2
HU
Hi,
I have to create groups of people based on their drug consumption and then I have to track the movements between the groups over time.
So I have the following table (simplified for this task):
Code:
CONSUMPTION (
  SSN VARCHAR2(9),
  DRUG_ID VARCHAR2(9),
  SELL_DATE DATE)
(SSN is the social security number of the consumer.)

First, I have to create groups. Every group is a combination of drugs, that people consume together. I have to find all combinations.
Next, I have to find the group that the consumer belongs to.
I tried to do it with a cursor based solution. It worked, but it is too slow. I have to find a faster one, because we have to pay for this data and the price depends on the runtime of the batch. I'm trying to find a set based solution, but it's really hard.
Does anyone have any idea?

Thanks,
Peter

PS: sorry for my poor English.
 
Any chance you could post the code of the cursor based solution ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Of course. I tried to translate the original Hungarian names and comments to English for easier reading.
It is written in Oracle PL/SQL (it runs on 9i).
The tables:
PATIENT: the patient's ID's and the date of their death.
CONSUMPTION: drug consumption (who, what, when).
SEARCHED_DRUG: drug_id's of drugs we are currently colleting data about.
TMPGROUP: temporary table for the groups.
TMPMEMBER: temporary table for the members of the groups in a specific quarter.
RESULTGROUP: I collect the results here. (quarters, groups, the number of members)
There are columns named INSERTED in the last 3 tables to collect the time of inserts for measuring performance.

I use the PREPARE_QUARTER procedure to create groups and to put patints in groups for sepcific quarters. (Now it is the slowest part of the batch.) The COUNT_GROUP procedure counts the changes in the group membership accross quarters. (It runs fast enough if using indexes. Now I drop indexes before running PREPARE_QUARTER to speed up the inserts, then I recreate them before running COUNT_GROUP to speed up selects. It is much faster then before, but isn't fast enough.)

There are 187 drugs we collect data about, nearly 650000 patients and about 8.5 million records in the CONSUMPTION table (all reduced to contain only rows that we are interested in).

Code:
CREATE SEQUENCE TMPSEQGROUP INCREMENT BY 1 START WITH 1 MINVALUE 1 NOCYCLE CACHE 20 NOORDER;

CREATE TABLE PATIENT
(
  SSN        VARCHAR2(9),
  EXIT_DATE  DATE
);

CREATE TABLE CONSUMPTION
(
  SSN       VARCHAR2(9),
  DRUG_ID   VARCHAR2(9),
  SELL_DATE DATE
);

CREATE TABLE SEARCHED_DRUG
(
  DRUG_ID  VARCHAR2(9)
);

CREATE TABLE TMPGROUP
(
  GROUP_ID         NUMBER(5) NOT NULL,
  GROUP_DEFINITION VARCHAR2(1500),
  INSERTED         DATE
);

CREATE TABLE TMPMEMBER
(
  GROUP_ID     NUMBER(5),
  SSN          VARCHAR2(9),
  QUARTER      VARCHAR2(6),
  INSERTED     DATE
);

CREATE TABLE RESULTGROUP
(
  PREV_QUARTER     VARCHAR2(6),
  ACT_QUARTER      VARCHAR2(6),
  PREV_GROUP       VARCHAR2(1500),
  ACT_GROUP        VARCHAR2(1500),
  SSN_COUNT        NUMBER(8),
  INSERTED         DATE
);

  PROCEDURE PREPARE_QUARTER(ABEGIN IN DATE, AEND IN DATE, AQUARTER VARCHAR2) IS
  /*
	Creates all combinations for a specific quarter, and put each patient in a group for that quarter.
  */
    varSSN VARCHAR2(9);
	varDRUG_ID VARCHAR2(9);
	varPREV_SSN VARCHAR2(9);
	varGROUP_DEFINITION VARCHAR2(1500);
	varGROUP_ID NUMBER(5, 0);
	varEXISTS NUMBER(1, 0);
    CURSOR CUR_SSN_DRUG_ID IS
    SELECT SSN, CONSUMPTION.DRUG_ID 
    FROM CONSUMPTION
    INNER JOIN SEARCHED_DRUG ON CONSUMPTION.DRUG_ID = SEARCHED_DRUG.DRUG_ID
	WHERE (SELL_DATE BETWEEN ABEGIN AND AEND)
    GROUP BY SSN, CONSUMPTION.DRUG_ID
	ORDER BY SSN, CONSUMPTION.DRUG_ID; 
  BEGIN
	varGROUP_DEFINITION := '-';
	OPEN CUR_SSN_DRUG_ID;
	LOOP
		FETCH CUR_SSN_DRUG_ID INTO varSSN, varDRUG_ID;
		EXIT WHEN CUR_SSN_DRUG_ID%NOTFOUND;
		varEXISTS := -1;
		IF varPREV_SSN IS NULL THEN
		   varPREV_SSN := varSSN;
		END IF;
		--Same patient as previous
		IF varSSN = varPREV_SSN THEN
		    varGROUP_DEFINITION := varGROUP_DEFINITION || varDRUG_ID || '-';
		ELSE
			--Is it an existing group?
			SELECT CASE WHEN EXISTS(SELECT 1 FROM tmpGROUP WHERE GROUP_DEFINITION = varGROUP_DEFINITION) THEN 1 ELSE 0 END INTO varEXISTS FROM DUAL;
			IF varEXISTS > 0 THEN
			   SELECT GROUP_ID INTO varGROUP_ID
			   FROM tmpGROUP
			   WHERE GROUP_DEFINITION = varGROUP_DEFINITION;
			ELSE
				varGROUP_ID := NULL;
			END IF;
			IF varGROUP_ID IS NULL THEN
				--New group
				SELECT tmpseqGROUP.NEXTVAL INTO varGROUP_ID FROM DUAL;
				INSERT INTO	tmpGROUP VALUES (varGROUP_ID, varGROUP_DEFINITION, CURRENT_DATE);		
			END IF;
			--Inserting a member in the group			 
			INSERT INTO tmpMEMBER VALUES (varGROUP_ID, varPREV_SSN, AQUARTER, CURRENT_DATE);
			varPREV_SSN := varSSN;
			varGROUP_DEFINITION := '-' || varDRUG_ID || '-';
		END IF;
	END LOOP;
	--Maybe the last one wasn't inserted  
	IF varEXISTS = -1 THEN
		--Is it an existing group?
		SELECT CASE WHEN EXISTS(SELECT 1 FROM tmpGROUP WHERE GROUP_DEFINITION = varGROUP_DEFINITION) THEN 1 ELSE 0 END INTO varEXISTS FROM DUAL;
		IF varEXISTS > 0 THEN
		   SELECT GROUP_ID INTO varGROUP_ID
		   FROM tmpGROUP
		   WHERE GROUP_DEFINITION = varGROUP_DEFINITION;
		ELSE
			varGROUP_ID := NULL;
		END IF;
		IF varGROUP_ID IS NULL THEN
			--New group
			SELECT tmpseqGROUP.NEXTVAL INTO varGROUP_ID FROM DUAL;
			INSERT INTO	tmpGROUP VALUES (varGROUP_ID, varGROUP_DEFINITION, CURRENT_DATE);		
		END IF;
		--Inserting a member in the group			 
		INSERT INTO tmpMEMBER VALUES (varGROUP_ID, varPREV_SSN, AQUARTER, CURRENT_DATE);
	END IF;
	CLOSE CUR_SSN_DRUG_ID;
  END;

  PROCEDURE COUNT_GROUP(APREV_QUARTER VARCHAR2, APREV_BEGIN IN DATE, APREV_END IN DATE, AACT_QUARTER VARCHAR2, ABEGIN IN DATE, AEND IN DATE)IS
  /*
	Counts the patients' movements between groups for the sepcified quarters.
  */
    varACT_GROUP_DEFINITION VARCHAR2(1500);
    varPREV_GROUP_DEFINITION VARCHAR2(1500);
	varSSN_COUNT NUMBER(8, 0);
    CURSOR CUR_GROUP_OUTER IS
    SELECT GROUP_DEFINITION 
    FROM tmpGROUP
	ORDER BY GROUP_DEFINITION;
    CURSOR CUR_GROUP_INNER IS
    SELECT GROUP_DEFINITION
    FROM tmpGROUP
	ORDER BY GROUP_DEFINITION;
  BEGIN
    --Steps through all groups
	OPEN CUR_GROUP_OUTER;
	LOOP
		FETCH CUR_GROUP_OUTER INTO varACT_GROUP_DEFINITION;
		EXIT WHEN CUR_GROUP_OUTER%NOTFOUND;
		--Steps through all groups again to create all pairs for the movements
		OPEN CUR_GROUP_INNER;
		LOOP
			FETCH CUR_GROUP_INNER INTO varPREV_GROUP_DEFINITION;
			EXIT WHEN CUR_GROUP_INNER%NOTFOUND;
			--Counts the movements between the two groups
			--(if the two groups are the same, the count means patients who consume the same set of drugs as in the previous quarter)
			SELECT COUNT(DISTINCT ACT_MEMBER.SSN)
			INTO varSSN_COUNT
			FROM tmpMEMBER ACT_MEMBER
			INNER JOIN tmpMEMBER PREV_MEMBER ON ACT_MEMBER.SSN = PREV_MEMBER.SSN
			INNER JOIN tmpGROUP ACT_GROUP ON ACT_MEMBER.GROUP_ID = ACT_GROUP.GROUP_ID
			INNER JOIN tmpGROUP PREV_GROUP ON PREV_MEMBER.GROUP_IDE = PREV_GROUP.GROUP_ID
			WHERE (ACT_MEMBER.QUARTER = AACT_QUARTER) AND (PREV_MEMBER.QUARTER = APREV_QUARTER) AND
				  (ACT_GROUP.GROUP_DEFINITION = varACT_GROUP_DEFINITION) AND
				  (PREV_GROUP.GROUP_DEFINITION = varPREV_GROUP_DEFINITION);
			INSERT INTO RESULTGROUP VALUES (
				   APREV_QUARTER, AACT_QUARTER, varPREV_GROUP_DEFINITION, varACT_GROUP_DEFINITION, varSSN_COUNT, CURRENT_DATE);
		END LOOP;
		CLOSE CUR_GROUP_INNER;
		--Counts patients, who were in the group in the previous quarter, but now they aren't in any of the groups
		SELECT COUNT(DISTINCT SSN)
		INTO varSSN_COUNT
		FROM tmpMEMBER
		INNER JOIN tmpGROUP ON tmpMEMBER.GROUP_ID = tmpGROUP.GROUP_ID
		INNER JOIN PATIENT ON PATIENT.SSN = tmpMEMBER.SSN
		WHERE (tmpMEMBER.QUARTER = APREV_QUARTER) AND
			  (tmpGROUP.GROUP_DEFINITION = varACT_GROUP_DEFINITION) AND
			  (NOT EXISTS(SELECT 1 FROM tmpMEMBER INNER_MEMBER
			  	   				 WHERE (INNER_MEMBER.QUARTER = AACT_QUARTER) AND
								 	   (INNER_MEMBER.SSN = tmpMEMBER.SSN))) AND
				(PATIENT.EXIT_DATE IS NULL OR PATIENT.EXIT_DATE > AEND);
		INSERT INTO RESULTGROUP VALUES (
			   APREV_QUARTER, AACT_QUARTER, varACT_GROUP_DEFINITION, 'NOMORE', varSSN_COUNT, CURRENT_DATE);
		--Counts patients, who were in the group in the previous quarter, but now are dead
		SELECT COUNT(DISTINCT SSN)
		INTO varSSN_COUNT
		FROM tmpMEMBER
		INNER JOIN tmpGROUP ON tmpMEMBER.GROUP_ID = tmpGROUP.GROUP_ID
		INNER JOIN PATIENT ON PATIENT.SSN = tmpMEMBER.SSN
		WHERE (tmpMEMBER.QUARTER = APREV_QUARTER) AND
			  (tmpGROUP.GROUP_DEFINITION = varACT_GROUP_DEFINITION) AND
			  (NOT EXISTS(SELECT 1 FROM tmpMEMBER INNER_MEMBER
			  	   				 WHERE (INNER_MEMBER.QUARTER = AACT_QUARTER) AND
								 	   (INNER_MEMBER.SSN = tmpMEMBER.SSN))) AND
				(PATIENT.EXIT_DATE BETWEEN ABEGIN AND AEND);
		INSERT INTO RESULTGROUP VALUES (
			   APREV_QUARTER, AACT_QUARTER, varACT_GROUP_DEFINITION, 'EXIT', varSSN_COUNT, CURRENT_DATE);
		--Counts new patients in this group (they weren't in any of the groups in the previous quarter)
		SELECT COUNT(DISTINCT SSN)
		INTO varSSN_COUNT
		FROM tmpMEMBER
		INNER JOIN tmpGROUP ON tmpMEMBER.GROUP_ID = tmpGROUP.GROUP_ID
		WHERE (tmpMEMBER.QUARTER = AACT_QUARTER) AND
			  (tmpGROUP.GROUP_DEFINITION = varACT_GROUP_DEFINITION) AND
			  (NOT EXISTS(SELECT 1 FROM tmpMEMBER INNER_MEMBER
			  	   				 WHERE (INNER_MEMBER.QUARTER = APREV_QUARTER) AND
								 	   (INNER_MEMBER.SSN = tmpMEMBER.SSN)));
		INSERT INTO RESULTGROUP VALUES (
			   APREV_QUARTER, AACT_QUARTER, 'NEW', varACT_GROUP_DEFINITION, varSSN_COUNT, CURRENT_DATE);
	END LOOP;
	CLOSE CUR_GROUP_OUTER;
  END;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top