Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

subquery 1

Status
Not open for further replies.

DonaZ

Technical User
Oct 10, 2007
41
US
The two queries below work; however, is there a way to combine the two queries?

proc sql;
select term, count(id)as Applicants ,avg(GPA)as AvgGPA
from
SAMPLE
where level="H"
group by term;
quit;

proc sql;
select term,
count(id)as Admits, /*this gives you the Admit count*/
avg(gpa) as AvgAdmitGPA /*this gives you the average Admit gpa*/
from
SAMPLE
where level="H" and (code like "A%" or
code like "X%")
group by term;
quit;

Below is my attempt to combine the two queries; however, it does not work. Any Suggestions? What am I doing wrong?


proc sql;
select term, a.Applicants, a.AvgGPA, b.admits, b.AvgAdmitGPA
from

(select count(id)as Applicants ,avg(GPA)as AvgGPA
from
SAMPLE
Where level="H"
group by term
order by applicants, avgGPA)as a,

(select term,
count(id)as Admits,
avg(gpa) as AvgAdmitGPA
from
SAMPLE
where level="H" and (code like "A%" or
code like "X%")
group by term
order by admits, AvgAdmitGPA)as b
group by term;
quit;
 
Here's the message.
The SAS System stopped processing this step because of errors.
 
sorry, this was the message:

NOTE: The execution of this query involves performing one or more Cartesian product joins that
can not be optimized.
NOTE: Table WORK.MARKED4 created, with 25 rows and 5 columns.

There should only be 5 rows not 25 rows.

I should have a comma after the first subquery not a semi-colon.
 
You've not told it what to join the queries together on, that's why you get a cartesian join (If you don't know, a cartesian join is where every record on table A gets joined to every record on table B, can be very dangerous with big tables).
My advice (I'm afraid I don't have time to go through all the code to work it all out) is to go back to the original queries and add in the "CREATE TABLE x AS" statements to the beginning of the queries so the data for each query gets written out to tables.
The write a final query to join them all together. My guess is that the original queries don't seem to create output which includes the key for joining them together.

Finally, once you've got this all working, you can replace the table names in the final query, with the queries that created the tables, enclosed in brackets, then get rid of the earlier queries.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
The data is in one table Not two.
 
The source data is.
But you write 1 query to do one part and write that data out to a table.
Then you write out the data from the seoncd query to another table etc.
Then you write a final query to join these tables together.

Logically, when performing subqueries against the same table, you need to think of it as if it is multiple tables containing the same data. This is how SQL treats it. When you do the joins you give each subquery a different table alias to distinguish between them.
EG.
Code:
proc sql;
  create table AX as
  select A.ID
        ,A.data
  from file1  A
  where cat = 'FIRST'
  ;

  create table BX as
  select A.ID
        ,A.data  as data2
  from file1  A
  where cat = 'SECOND'
  ;

  create table X  as
  select J.ID
        ,J.DATA
        ,K.data2
  from AX    J
     left join
       BX    K
       on  J.ID = K.ID
   ;
quit;

is the same as
Code:
proc sql;
  create table X  as
  select J.ID
        ,J.DATA
        ,K.data2
  from (select A.ID
              ,A.data
        from file1  A
        where cat = 'FIRST')    J
     left join
       (create table BX as
        select A.ID
              ,A.data  as data2
        from file1  A
        where cat = 'SECOND')    K
       on  J.ID = K.ID
   ;
quit;

Note, you have to select the field you are joining the 2 logical tables together on, even if they are both just subqueries on the same table

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Chris,

I very new to SQL and SAS and am trying to understand the rules.

How do I know when I must use joins like your example above?


For example this query worked without using joins.
Will I run into problems later on?
See below:

proc sql; create table WANT as select term,count(id)as appl,avg(gpa)as avggpa,
count(Aid)as adm,avg(Agpa)as avgadmgpa from
(select*,case when code like'A%'!code like'X%' then
1 else . end as Aid,
case when calculated Aid then gpa else . end as Agpa from HAVE where level='H')
group by term order by term;
quit;

*case when substr(code,1,1)in('A','X')then 1 else .
end as Aid;

Thank you again for taking the time to explain.

Dona
 
It looks like there's only 1 part to the from statement there, so only one logical table, therefore no need to do a join.
My advice for someone new to SQL is to keep things simple. Subqueries complicate things and make it harder to piece together what is happening. I would start by writing individual queries and leave subqueries alone. Personally, I rarely use subqueries myself. They can improve performance if used correctly, but most of the time it's probably best to just get comfortable with SQL first.
With subqueries, the subquery is treated as if it is a separate table, the subquery is resolved first, data put into temporary storage, then accessed as a table. The parent query then needs to know what to use to join the 2 tables together.
I hope this helps.
PS, I would also recommend careful use of indenting with SQL, it makes it alot easier to read, and alot easier to spot errors and typos.
Below is your query above laid out in the standard I follow. There are various standards different people use, I like to have my commas in front of each line for instance, others put them at the end. The layout though is pretty much the same in all standards I've seen.
Code:
proc sql;
  create table WANT as
  select term
        ,count(id)   as appl
        ,avg(gpa)    as avggpa
        ,count(Aid)  as adm
        ,avg(Agpa)   as avgadmgpa
 from ( select *
              ,case 
                  when code like 'A%' ! code like'X%' then 1
                  else . 
               end as Aid
              ,case
                  when calculated Aid then gpa 
                  else . 
               end as Agpa 
         from HAVE 
         where level='H'
       )
    group by term 
    order by term
    ;
quit;

I don't understand this line though:-
Code:
  when calculated Aid then gpa
can you explain that?

Oh, and you can probably do change this
Code:
when code like 'A%' ! code like'X%' then 1
to this
Code:
when code in:('A','X') then 1
The colon basically means "starts with" and can be used with = or in for comparison checking.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Chris,

In response to your question above:

I don't understand this line though:-
CODE
when calculated Aid then gpa

Correct me if I'm wrong?
The Aid is referring to

select *
,case
when code like 'A%' ! code like'X%' then 1
else .
end as Aid

The case statement is saying only calculate those GPA as identified as Aid.


I will take your advice about indenting and writing simple queries. I was concerned about performance. I guess I'm trying to do too much.

Thank you again for your assistance.

Dona
 
No worries. I've always been lucky when learning new technologies I've always had experts on hand to check over what I've done and then show me how to improve it.
My programming process has always been to build it up in stages, get the 1 thing working at a time. I would literally write those subqueries out with multiple steps and tables, then maybe go back afterwards and see how to shrink it down, rather than trying to write out the most concise code first time.
To put it succinctly - Get it right, then make it good.

Mostly I don't tend to use subqueries at all, I don't think there's much performance improvement in it.

And it was the "calculated" part of that line I didn't recognise, I see now that it's to let the process know it's a calculated field. I've never seen that before. You learn something new every day eh?
Good luck.

Chris
Business Analyst, Code Monkey, Data Wrangler.
SAS Guru.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top