Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

MEDIAN THRU A SQL QUERY PROBLEM ... 2

Status
Not open for further replies.

msng

Programmer
Oct 14, 2003
27
US
how can i calculate median thru a SQL QUERY. To the query below, I need to add MEDIAN(TOTAL_HOURS) for each HRID.
Please help.

My query is as follows:

SELECT A.HRID,
LTRIM(RTRIM(A.LAST_NAME)) || ', '||LTRIM(RTRIM(A.FIRST_NAME)) || ' '||LTRIM(RTRIM(A.MIDDLE_NAME)) AS EMPLOYEE_NAME,
NVL(SUM(B.TOTAL_HOURS),0) AS TOTAL_HOURS,
COUNT(B.HRID) AS TOTAL_COUNT,
E.SUPERVISOR_HRID,
A.MRG,
E.MANAGER_HRID,
E.DISTRICT_HRID,
E.DIVISION_HRID
FROM A,
B,
E
WHERE A.HRID = B.HRID(+)
AND A.HRID = E.HRID
AND B.ATTENDANCE_START_DATE >= TO_DATE(FROM_DATE,'MM/DD/YYYY')
AND B.ATTENDANCE_START_DATE <= TO_DATE(T_DATE,'MM/DD/YYYY')
GROUP BY A.HRID,
LTRIM(RTRIM(A.LAST_NAME)) || ', '||LTRIM(RTRIM(A.FIRST_NAME)) || ' '||LTRIM(RTRIM(A.MIDDLE_NAME)),
E.SUPERVISOR_HRID,
A.MRG,
E.MANAGER_HRID,
E.DISTRICT_HRID,
E.DIVISION_HRID
ORDER BY EMPLOYEE_NAME;

Please help. Thanks.
 
Give a glance at PERCENTILE_CONT function, isn't it what you're looking for?

Regards, Dima
 
MSNG,

Sorry, but Oracle doesn't have a built-in MEDIAN function. You will need to build your own user-defined function, &quot;MEDIAN&quot;. I've thought it through and I believe the function will need to be either 1) rather specific to your table, or 2) it will need parameters from which your function will need to formulate and execute Dynamic SQL. (That's probably why Oracle hasn't created a built-in MEDIAN function.)

Any other ideas, Oracle Gurus?

Dave
Sandy, Utah, USA @ 17:45 GMT, 10:45 Mountain Time
 
SantaMufasa:

thanks for your response.

do you have an example of how i can create the median function. please let me know.
 
Dima,

Could you please explain how to use the &quot;PERCENTILE_CONT function&quot; with a coded example?

Thanks.
 
BJ and Dima,

Are you suggesting that MSNG can use the &quot;PERCENTILE_CONT function&quot; with Oracle 8i? If so, for clarity and correctness could you please illustrate its use in MSNG's code, above?
 
All:

DeepDiverBecca is right. I had tried using PERCENTILE_CONT function but it does not recognize this since we are using Oracle 8i. I am still struggling to determine how to find MEDIAN using my query (as mentioned in my first question above).

It will be very nice and helpful if anyone could give me suggestion related to Oracle 8i.

Thanks for help.
 
I have never had a need to use this function but there is enough info on this function in the 8i documentation that msng can probably work it out. Unfortunately I posted a link to 9i - here is another link:


[sup]Beware of false knowledge; it is more dangerous than ignorance.[/sup][sup] ~George Bernard Shaw[/sup]
Consultant/Custom Forms & PL/SQL - Oracle 8.1.7 - Windows 2000
 
Ok, PERCENTILE_CONT really appeared only in 9i. In 8i you may use CUME_DIST

select deptno, ename, sal, CUME_DIST()
over (partition by deptno order by sal) percentile
from emp


Regards, Dima
 
MSNG (and my other Tek-Tips buddies),

This has been a fascinating activity for me. MEDIAN is a
function that has been glaringly absent from Oracle's repertoire
of built-in functions since Oracle's beginnings. Until now,
I've summarily dismissed MEDIAN as impossible without some
higher horsepower in Oracle than it has. I'm
not aware that even in Oracle 9i's (or &quot;10g's&quot;) &quot;Analytical&quot;
features there is a reliable, non-estimating-based MEDIAN
function.

So, I was rather surprised when a reliable MEDIAN function
(generalised for character, number, and date median values)
jumped off the screen at me in only 7 lines of PL/SQL procedure
code. Since I haven't before seen such a procedure in Oracle,
I took the time to &quot;productise&quot; it (below) and offer it to
my Tek-Tips buddies first.

In return, I encourage your comments, observations,
criticisms, and suggestions.
Code:
REM **************************************************************
REM David L. Hunt (file author) distributes this and other 
REM files/scripts for educational purposes only, to illustrate the 
REM use or application of various computing techniques. Neither the 
REM author nor Dasages, LLC, makes any warranty regarding this 
REM script's fitness for any industrial application or purpose nor is 
REM there any claim that this or any similarly-distributed scripts 
REM are error free or should be used for any purpose other than
REM illustration.
REM **************************************************************
REM About this script/file:
REM
REM NAME: &quot;Median.sql&quot; - PL/SQL code to create a function that
REM	determines the MEDIAN (i.e., middle) value among an ordered
REM	set of values. MEDIAN is the value in the ordered set at
REM	which we find as many values &quot;above&quot; that point as we find
REM	below that point within the ordered set. This function can
REM	find the MEDIAN among any datatypes (NUMBER, VARCHAR2, DATE,
REM	et cetera) except LONG datatypes.
REM
REM AUTHOR: Dave Hunt
REM	Co-principal, Dasages, LLC
REM	dave@dasages.com
REM	1-801-733-5333
REM
REM VERSION: 1.0
REM
REM USAGE PRIVILEGES:
REM	The author, Dave Hunt, reserves all rights to this script,
REM	including common copyrights. Under his reserved rights,
REM	the author grants usage privileges to the public. Anyone
REM	may use or distribute this script freely provided 1) there
REM	is no fee associated with its use and 2) All REMARKS in
REM	this script remain intact. Any publication of this
REM	script or inclusion in another publication or body of work
REM	occurs only by express, written permission from the author.
REM
REM MAINTENANCE HISTORY:
REM	06-NOV-2003: Original Code
REM
REM SYNTAX:
REM
REM	MEDIAN (table_list,expr,WHERE_condition | NULL)
REM
REM USAGE NOTES:
REM	The MEDIAN function may appear in any location in a SQL
REM	statement or a PL/SQL block where any other single-row or
REM	or group function may appear. Since MEDIAN includes a
REM	source-table list as its first argument, any existing
REM	table (example: DUAL) may drive use of MEDIAN.
REM
REM	Although MEDIAN's arguments may originate from any SQL or
REM	bind variable, usually the arguments take the form of
REM	quoted literals. Example:
REM
REM	MEDIAN ('S_EMP','SALARY','WHERE dept_id in(10,20,35)')
REM
REM	Argument notes:
REM
REM	table_list	one or more tables, views, or synonyms (of
REM			a table or view) necessary to derive the
REM			ordered set of values. If there is more 
REM			than one table, then use comma separators:
REM			MEDIAN('s_emp,s_dept',...).
REM
REM	expr		any valid Oracle character, number, or date
REM			expression which ordered set of values
REM			represents the pool from which the MEDIAN
REM			function derives its &quot;middle&quot; RETURN value.
REM
REM	WHERE_condition an optional WHERE-clause specification that
REM	| NULL		specifies which rows from the table_list
REM			tables shall contribute values to the 
REM			ordered set. If argument-2 expression values
REM			come from all table_list table rows, then
REM			there is no need for the WHERE_condition and
REM			an un-quoted named literal, NULL, must
REM			appear as the MEDIAN function's third argument:
REM
REM			SELECT MEDIAN('s_emp','salary',NULL) from DUAL;
REM			
REM			...results in the MEDIAN salary from the entire
REM			s_emp table.
REM
REM	Dealing with quoted values inside MEDIAN's quoted arguments:
REM		Oracle is notorious for its difficulty working with
REM		single quotes inside of single-quoted literal strings.
REM		Although MEDIAN can handle Oracle's standard
REM		&quot;two quotes result in one quote&quot; methodology within
REM		single-quoted literals, MEDIAN makes it much easier:
REM		Within a single-quoted MEDIAN-argument, when you want
REM		a quoted literal, you may instead use a &quot;back-quote&quot;
REM		(`) (usually above the <left-tab> of most PC keyoards).
REM		MEDIAN translates the back-quote to a regular single-
REM		quote, as needed, as part of its internal processing.
REM
REM	RETURN value: To accommodate character, numeric, and date
REM		median values, MEDIAN returns a VARCHAR2 representation
REM		of argument-2. MEDIAN can then be an argument itself 
REM		to a data-type-conversion function such as
REM		TO_NUMBER(MEDIAN...), TO_DATE(MEDIAN...), et cetera.
REM	
REM EXAMPLES:
REM
REM	select median ('s_emp','last_name',null) name from dual;
REM
REM	Median Name
REM	-----------
REM	Newman
REM
REM	select median ('s_emp','last_name',
REM	'where last_name between `A` and `H`') name from dual;
REM
REM	Median Name
REM	-----------
REM	Chang
REM
REM	select median('s_emp','start_date',null) HDate from dual;
REM
REM	Median Hire Date
REM	----------------
REM	27-FEB-91
REM
REM	select dept_id,
REM	  to_number(median('s_emp','salary','where dept_id = '||dept_id)) sal
REM	from s_emp
REM	group by dept_id;
REM
REM	                Median
REM	   DEPT_ID      Salary
REM	---------- -----------
REM	        10    1,450.00
REM	        31    1,400.00
REM	        32    1,490.00
REM	        33    1,515.00
REM	        34      795.00
REM	        35    1,450.00
REM	        41    1,200.00
REM	        42    1,200.00
REM	        43      850.00
REM	        44      800.00
REM	        45    1,100.00
REM	        50    1,550.00
REM
REM	Important notes for above example:
REM		1) TO_NUMBER converts MEDIAN's return value from varchar2.
REM		2) Since the &quot;GROUP BY&quot; clause considers salaries
REM			BY DEPARTMENT, MEDIAN's WHERE_condition must
REM			also limit its ordered-set of salaries to
REM			just those in the current department.
REM			Therefore, MEDIAN's argument-3 WHERE clause is a
REM			concatenation of 'where dept_id = ' to whatever
REM			is the current value of DEPT_ID for the current
REM			grouped row.
REM **************************************************************
REM	MEDIAN function definition:
REM **************************************************************
Code:
create or replace function median
		( tab_name	varchar2
		, target_exp	varchar2
		, where_cond	varchar2
		) return varchar2
is
	rec_cnt			number;
	sql_stmt		varchar2(4000);
	mid_point		number;
	median_value		varchar2(4000);
begin
	sql_stmt	:= 'select count(*) from '||tab_name||' '||where_cond;
	sql_stmt	:= translate(sql_stmt,chr(96),chr(39)); /* This command
		turns back-quotes (`) into single-quotes ('). */
	execute immediate sql_stmt into rec_cnt;
	mid_point	:= ceil(rec_cnt/2);
	sql_stmt	:= 
		'select xxx from (select rownum rn, xxx'||
			' from (select '||target_exp||' xxx from '||tab_name||
			' '||where_cond||' order by xxx)) where rn = '||
			mid_point;
	sql_stmt	:= translate(sql_stmt,chr(96),chr(39)); /* This command
		turns back-quotes (`) into single-quotes ('). */
	execute immediate sql_stmt into median_value;
	return median_value;
end;
/
REM **************************************************************
REM	End of MEDIAN function definition
REM **************************************************************
Here are the pertinent table values with which I tested the code (First, in SALARY order to confirm MEDIAN(...SALARY...):

select start_date, last_name, salary from s_emp order by salary;
Code:
Start
Date      LAST_NAME            SALARY
--------- --------------- -----------
21-JUL-91 Newman               750.00
06-AUG-91 Patel                795.00
17-OCT-90 Patel                795.00
30-NOV-90 Chang                800.00
26-MAY-91 Markarian            850.00
17-MAR-91 Dancs                860.00
08-MAR-90 Smith                940.00
07-APR-90 Biri               1,100.00
09-MAY-91 Schwartz           1,100.00
18-JAN-91 Urguhart           1,200.00
09-FEB-91 Nozaki             1,200.00
14-MAY-90 Menchu             1,250.00
09-FEB-92 Catchpole          1,300.00 <-- Median Salary
27-FEB-91 Havel              1,307.00
17-JUN-91 Nagayama           1,400.00
07-FEB-92 Maduro             1,400.00
14-MAY-90 Magee              1,400.00
08-MAR-90 Ngao               1,450.00
09-OCT-91 Dumas              1,450.00
07-APR-90 Quick-To-See       1,450.00
18-JAN-92 Giljum             1,490.00
18-FEB-91 Sedeghi            1,515.00
22-JAN-92 Nguyen             1,525.00
04-MAR-90 Ropeburn           1,550.00
03-MAR-90 Velasquez         11,110.00

25 rows selected.

select median('s_emp','salary',null) from dual;

1300

1 row selected.
Second, in LAST_NAME order to confirm MEDIAN(...LAST_NAME...):
Code:
select start_date, last_name, salary from s_emp order by last_name;

Start
Date      LAST_NAME            SALARY
--------- --------------- -----------
07-APR-90 Biri               1,100.00
09-FEB-92 Catchpole          1,300.00
30-NOV-90 Chang                800.00
17-MAR-91 Dancs                860.00
09-OCT-91 Dumas              1,450.00
18-JAN-92 Giljum             1,490.00
27-FEB-91 Havel              1,307.00
07-FEB-92 Maduro             1,400.00
14-MAY-90 Magee              1,400.00
26-MAY-91 Markarian            850.00
14-MAY-90 Menchu             1,250.00
17-JUN-91 Nagayama           1,400.00
21-JUL-91 Newman               750.00 <-- Median Name
08-MAR-90 Ngao               1,450.00
22-JAN-92 Nguyen             1,525.00
09-FEB-91 Nozaki             1,200.00
06-AUG-91 Patel                795.00
17-OCT-90 Patel                795.00
07-APR-90 Quick-To-See       1,450.00
04-MAR-90 Ropeburn           1,550.00
09-MAY-91 Schwartz           1,100.00
18-FEB-91 Sedeghi            1,515.00
08-MAR-90 Smith                940.00
18-JAN-91 Urguhart           1,200.00
03-MAR-90 Velasquez         11,110.00

25 rows selected.

select median('s_emp','last_name',null) from dual;

Newman

1 row selected.
Lastly, in START_DATE order to confirm MEDIAN(...START_DATE...):
Code:
select start_date, last_name, salary from s_emp order by start_date;

Start
Date      LAST_NAME            SALARY
--------- --------------- -----------
04-MAR-90 Ropeburn           1,550.00
08-MAR-90 Ngao               1,450.00
08-MAR-90 Smith                940.00
07-APR-90 Quick-To-See       1,450.00
07-APR-90 Biri               1,100.00
14-MAY-90 Menchu             1,250.00
14-MAY-90 Magee              1,400.00
17-OCT-90 Patel                795.00
30-NOV-90 Chang                800.00
18-JAN-91 Urguhart           1,200.00
09-FEB-91 Nozaki             1,200.00
18-FEB-91 Sedeghi            1,515.00
27-FEB-91 Havel              1,307.00 <-- Median START_DATE
17-MAR-91 Dancs                860.00
09-MAY-91 Schwartz           1,100.00
26-MAY-91 Markarian            850.00
17-JUN-91 Nagayama           1,400.00
21-JUL-91 Newman               750.00
06-AUG-91 Patel                795.00
09-OCT-91 Dumas              1,450.00
18-JAN-92 Giljum             1,490.00
22-JAN-92 Nguyen             1,525.00
07-FEB-92 Maduro             1,400.00
09-FEB-92 Catchpole          1,300.00
03-MAR-90 Velasquez         11,110.00

25 rows selected.

select median('s_emp','start_date',null) from dual;

27-FEB-91

1 row selected.
Now, MSNG, Following is my &quot;hand-coded&quot; (but not Oracle-checked)
revision of your original code with the addition of my MEDIAN
function. (Since I don't have your A, B, and C tables, their
definitions, or their data, I really could not do any more than
a &quot;desk check&quot; of the code. Since you have a moderately complex
combination of &quot;WHERE&quot; conditions and &quot;GROUP BY&quot; clauses that
significantly affect the pool of &quot;ordered-set&quot; TOTAL_HOURS for
each employee, and must, therefore, translate into identical
filtering for the MEDIAN function, I've done the best I can
with my argument-3 WHERE_condition formulation. As a result,
you may need to &quot;tweak&quot; argument-3. Once the WHERE
condition is correct, the MEDIAN return value will be correct:
Code:
SELECT	A.HRID
	,LTRIM(RTRIM(A.LAST_NAME)) ||
	   ', '||LTRIM(RTRIM(A.FIRST_NAME)) || ' '||
	   LTRIM(RTRIM(A.MIDDLE_NAME)) AS EMPLOYEE_NAME
	,NVL(SUM(B.TOTAL_HOURS),0) AS TOTAL_HOURS
	,TO_NUMBER(MEDIAN('A,B,E','B.TOTAL_HOURS',
	 	'WHERE	A.HRID = B.HRID(+) '||
		'AND	A.HRID = E.HRID '||
		'AND	B.ATTENDANCE_START_DATE >= TO_DATE(FROM_DATE,`MM/DD/YYYY`) '||
		'AND	B.ATTENDANCE_START_DATE <= TO_DATE(T_DATE,`MM/DD/YYYY`) '||
		'AND	LTRIM(RTRIM(A.LAST_NAME)) = `'||LTRIM(RTRIM(A.LAST_NAME))||
		'` AND	LTRIM(RTRIM(A.FIRST_NAME)) = `'||LTRIM(RTRIM(A.FIRST_NAME))||
		'` AND	LTRIM(RTRIM(A.MIDDLE_NAME)) = `'||LTRIM(RTRIM(A.MIDDLE_NAME))||
		'` AND	E.SUPERVISOR_HRID = '||E.SUPERVISOR_HRID||
		'AND	E.DISTRICT_HRID = '||E.DISTRICT_HRID||
		'AND	E.DIVISION_HRID = '||E.DIVISION_HRID||'`')) Middle
	,COUNT(B.HRID) AS TOTAL_COUNT
	,E.SUPERVISOR_HRID
	,A.MRG
	,E.MANAGER_HRID
	,E.DISTRICT_HRID
	,E.DIVISION_HRID
FROM  	A,B,E
WHERE	A.HRID = B.HRID(+)
AND	A.HRID = E.HRID
AND	B.ATTENDANCE_START_DATE >= TO_DATE(FROM_DATE,'MM/DD/YYYY')
AND	B.ATTENDANCE_START_DATE <= TO_DATE(T_DATE,'MM/DD/YYYY')
GROUP BY A.HRID
	,LTRIM(RTRIM(A.LAST_NAME)) || ', '||LTRIM(RTRIM(A.FIRST_NAME)) || ' '||LTRIM(RTRIM(A.MIDDLE_NAME)),
	,E.SUPERVISOR_HRID
	,A.MRG
	,E.MANAGER_HRID
	,E.DISTRICT_HRID
	,E.DIVISION_HRID
ORDER BY EMPLOYEE_NAME
/

I know that the MEDIAN function works fine when the WHERE_condition is well done.
So, cross your fingers and say your prayers that I've properly constructed the
WHERE_condition from your out WHERE and GROUP BY clauses.

Let me know how it works for you, (I believe I'll go to bed for a couple of hours before tomorrow/today starts all over again.)

Cheers,

Dave
Sandy, Utah, USA @ 13:53 GMT, 06:53 Mountain Time
 
Santa,

Wow ! Your post was just like Christmas in November for me. I've been needing a good MEDIAN function in Oracle FOR YEARS ! But I've had to do without and make excuses to my users. NO MORE EXCUSES !!! If you don't mind my saying, you must be obsessive/compulsive or something to stay up all night to do this.

I believe I speak for all Oracle Tek-Tipsters when I say, &quot;Many Thanks&quot;. I've tried the code out and it works great. (Here comes a purple star your way.)

Bec
 
SantaMufasa:

Thanks a ton for all your help and efforts.
I really appreciate your hard work.

I was looking at your function MEDIAN.

When there is an odd number of items in the list, then the middle item is the median.

However, if there is an even number of items in the list, then the median is the average of the two middle items.

But in your case, it will only take into consideration one item no matter if the list is even or odd.
mid_point := ceil(rec_cnt/2);

Let me know what you think about this.
 
(Thanks, Ponerse !),
MSNG,

You are correct: the &quot;ceil(rec_cnt/2)&quot; opts for the row_value just beyond the middle in the case of even numbers of rows. Since the RETURN data_type is varchar2, I can adjust the code to return both &quot;middles&quot;, separated by a slash, or surrounded in brackets: &quot;[middle_1][middle_2]&quot;, but that might get a bit dicey if it's inside a TO_NUMBER or TO_DATE function. How would you like me to handle it?

Dave
Sandy, Utah, USA @ 18:01 GMT, 11:01 Mountain Time
 
SantaMufasa:

I will just go with whatever way you wanna implement.

Also, I am extremely grateful for your help. It is nice to know that we have such talented people on this board.
 
Ms. Ng,

I will try to make the modification as soon as I can, but your contribution is to do some homework for me: I want to resolve this based upon solid statistical theory. Your job is to post (here) a quote from a statistical text that gives the solid statistical basis for a resolution when there is an even number in the set population.

Thanks,

[santa]Mufasa
(aka Dave of Sandy, Utah, USA @ 19:27 (06Jan04) GMT, 12:27 (06Jan04) Mountain Time)
 
Dave's in general excellent function has a minor defect: it doesn't provide &quot;read-consistency&quot;. In fact the count(*) may be calculated on another source data than the value. If some records are changed between those commands the whole result may be incorrect.

I'd like to suggest an idea of improvement:

select * from (select row_number() over (order by sal) rn
, emp.* from emp)
where rn = (select round(count(*)/2) from emp)

This way has a number of advantages over Dave's one:
1. It does provide correct result regardless on the intensity of data changing.
2. It allows selecting the whole median row, not only median value.
3. It uses pure sql

Though its main drawback is that this is just an idea, not a &quot;reusable implementation&quot; :)

Regards, Dima
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top