Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Explanation of subqueries 1

Status
Not open for further replies.

nc297

Programmer
Apr 7, 2010
162
US
Hi could someone explain to me this sub query. The first select station is what I don't really understand why set it up this way?

Select n.doc, ISNULL(PendinDDS,0) as PendinDDS
from Nation n
Left Join

(Select t.Doc, ISNULL(Count(T.cossn), )) AS PendinDDS
From (select max(SeqNum) as SeqNum, clms
from T2Nation
group by clms)a
Left Join
T2Nation t on a.SeqNum = t.SeqNum and a.clms=t.clms
where dib_mvt = 'r' and t.SeqNum is not null
group by doc)b
on n.doc = b.doc

 
There seems to be errors in the way it's written. Can you correct them and re-post?

I suggest to start from the very inner subquery. Try running it stand alone and then work your way up.

PluralSight Learning Library
 
I'm sorry here's another one.

Truncate Table T2Counts
insert t2counts

Select n.doc, ISNULL(PendinDDS, 0) as PendinDDS,
ISNULL(PendDDSover300, 0) as PendDDSover300

From nation n
LEFT JOIN

--pending in DDS
(SELECT t.Doc, ISNULL(COUNT(t.clmns),0) AS PendinDDS
FROM (select max(dib_mvt_seq_num) as dib_mvt_seq_num, clms
from t2dibpend
group by clmns) a
left join
t2dibpend t on a.dib_mvt_seq_num=t.dib_mvt_seq_num and a.clmns=t.clmns
WHERE Dib_Mvt_Typ = 'R' and
(LOREC4 like 'R%' or LOREC4 like 'S%' or LOREC4 like 'V%') AND
t.clmns not IN (SELECT clms FROM seb2a21.specnew.dbo.people WHERE CompleteDt IS NULL)
and t.dib_mvt_seq_num is not null
group by doc)b
on n.doc = b.doc

left join


(Select t.doc, ISNULL(count(t.clmns),0) as PendDDSover300
from
(select max(dib_mvt_seq_num) as dib_mvt_seq_num, clmns
from t2dibpend t
group by clmns) f
left join
t2dibpend t on f.dib_mvt_seq_num=t.dib_mvt_seq_num and f.clmns=t.clmns
where DIB_MVT_TYP='R' and (LOREC4 like 'R%' or LOREC4 like 'S%' or LOREC4 like 'V%')
and app_rcpdt < dateadd(dd,-300,getdate()) and t.clmns not in (select clmns from seb2a21.specnew.dbo.people where completedt is null)
group by t.doc)g
on n.doc = g.doc
 
This query makes heavy use of derived tables. Derived tables are not the same thing as sub-queries. Let me explain.

A derived table is when you have a query that is acting like a table in your database. It's not really a table, but from the perspective of the larger query, it behaves that way.

[tt][blue]

From [!]([/!]
[green]select max(dib_mvt_seq_num) as dib_mvt_seq_num,
clms
from t2dibpend
group by clmns[/green]
[!]) a[/!]
left join t2dibpend t
on [!]a[/!].dib_mvt_seq_num=t.dib_mvt_seq_num
and [!]a[/!].clmns=t.clmns

[/blue][/tt]

Look at the code snippet shown above. The query in green is just that. You can run the green part all by itself and it will return results. The outer query can use the derived table as though it were an actual table.

The parts in red are the required syntax to make a derived table. You need to put parenthesis around the query and you need to give the query a name (called an alias). In this case, the derived table is named [!]a[/!]. Whenever the outside query references the inner query, you must reference it by the alias, just like you see in the on clause.

Does this make sense to you now?

I know you didn't specifically ask for help optimizing the query, but there are several changes I would suggest that would likely speed up the query and would certainly make it easier to understand. If you are interested in my other thoughts about this query, let me know.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
WOW thanks for the explanation!! YES I would like to learn more. Please tell me your changes you would make to the query.
 
ok. Let's start with this:

[tt][blue]OREC4 like 'R%' or LOREC4 like 'S%' or LOREC4 like 'V%'[/blue][/tt]

Those 3 conditions can be reduced to this:

[tt][blue]OREC4 like '[RSV]%'[/blue][/tt]

Basically, you only want rows where OREC4 starts with the letters R, S or V. My substitution like clause does exactly the same thing. In the like pattern, I added square brackets, which is necessary. The square brackets are basically saying, "A Single Character". So, you can read my like pattern as: The first character must be R, S or V.

I just checked on a table in my database, and from a performance perspective, there is no difference between my suggested like pattern and your 3 individual like patterns. The real difference (and the reason I encourage you to change it) is code readability. Glancing at my code, I can easily determine what is being searched for. With your original like pattern, I need to read 3 different criteria to see it. Not a big difference, but an important one (in my opinion).

In some respects, you can view this as "clutter". You have additional code that can be rewritten in a more concise manner. You can simply say, "It's less code", but I don't like that saying because less code does not always mean better code. In this situation, it hold true for me, though.

The next thing I would change (well... maybe it would have been the first [smile]), would be to the layout of the code. I find the query difficult to read because of the "lack of" indentation. Not everyone will agree with how I format code, but it works for me. I would format this code like this:

Code:
Truncate Table T2Counts

insert t2counts
Select n.doc, 
       ISNULL(PendinDDS, 0) as PendinDDS,
       ISNULL(PendDDSover300, 0) as PendDDSover300
From   nation n
       LEFT JOIN
         --pending in DDS
         (
         SELECT t.Doc, 
                ISNULL(COUNT(t.clmns),0) AS PendinDDS
         FROM   (
                  select max(dib_mvt_seq_num) as dib_mvt_seq_num, 
                         clms 
                  from   t2dibpend 
                  group by clmns) a
                left join t2dibpend t 
                  on  a.dib_mvt_seq_num=t.dib_mvt_seq_num 
                  and a.clmns=t.clmns
         WHERE  Dib_Mvt_Typ = 'R' 
                and (LOREC4 like 'R%' or LOREC4 like 'S%' or LOREC4 like 'V%') 
                AND t.clmns not IN 
                  (
                  SELECT clms 
                  FROM   seb2a21.specnew.dbo.people 
                  WHERE  CompleteDt IS NULL
                  )
                and t.dib_mvt_seq_num is not null
         group by doc)b
         on n.doc = b.doc
       left join
         (
         Select t.doc, 
                ISNULL(count(t.clmns),0) as PendDDSover300
         from   (
                  select max(dib_mvt_seq_num) as dib_mvt_seq_num, 
                         clmns 
                  from   t2dibpend t
                  group by clmns) f
                left join t2dibpend t 
                  on  f.dib_mvt_seq_num=t.dib_mvt_seq_num 
                  and f.clmns=t.clmns
         where  DIB_MVT_TYP='R' 
                and (LOREC4 like 'R%' or LOREC4 like 'S%' or LOREC4 like 'V%')
                and app_rcpdt < dateadd(dd,-300,getdate()) 
                and t.clmns not in 
                  (
                  select clmns 
                  from   seb2a21.specnew.dbo.people 
                  where  completedt is null)
         group by t.doc) g
         on n.doc = g.doc

Now that I have the code formatted the way I like to see (and can more easily understand it), I noticed something very significant here. Specifically, from the outer most query, you are left joining to two rather significant queries. The 2 left joined queries are EXTREMELY similar.

The derived table immediately following the comment "--pending in DDS" has an additional where clause for "and t.dib_mvt_seq_num is not null" and the last query has an additional where clause for "and app_rcpdt < dateadd(dd,-300,getdate())".

Given the similarities between the queryies, we can combine the 2 queries in to one. This would almost certainly cause the query to execute faster.

The following query should return identical results, but execute faster.

Code:
Truncate Table T2Counts

insert t2counts
Select n.doc, 
       ISNULL(PendinDDS, 0) as PendinDDS,
       ISNULL(PendDDSover300, 0) as PendDDSover300
From   nation n
       LEFT JOIN
         --pending in DDS
         (
         SELECT t.Doc, 
                ISNULL(COUNT(t.clmns),0) AS PendinDDS[!],[/!]
                [!]ISNULL(COUNT(Case When app_rcpdt < dateadd(dd,-300,getdate()) Then 1 End), 0) As PendDDSover300[/!]
         FROM   (
                  select max(dib_mvt_seq_num) as dib_mvt_seq_num, 
                         clms 
                  from   t2dibpend 
                  group by clmns) a
                left join t2dibpend t 
                  on  a.dib_mvt_seq_num=t.dib_mvt_seq_num 
                  and a.clmns=t.clmns
         WHERE  Dib_Mvt_Typ = 'R' 
                and LOREC4 like '[RSV]%' 
                AND t.clmns not IN 
                  (
                  SELECT clms 
                  FROM   seb2a21.specnew.dbo.people 
                  WHERE  CompleteDt IS NULL
                  )
         group by doc)b
         on n.doc = b.doc

For the remainder of this discussion, I will assume that this query works... in that it returns the correct results. You'll note that I previously mentioned that the first derived table had a where clause on "and t.dib_mvt_seq_num is not null". I think this criteria is not necessary. Let me explain.

Take a look at the inner most derived table (with the alias a). This derived table only returns 2 columns , clms and Max(dib_mvt_seq_num). This derived table is then left joined to the t2dibpend table. Then, there is a where clause with things like Dib_Mvt_Type = 'R'.

In this query, the left table is the derived table a, and the right table is the real table t2dibpend. When you have a left join with where clause criteria on columns in the right table, you effectively have an inner join. As such, I would recommend that you change the query to an inner join instead. Inner joins are a little more efficient (from a performance perspective) and left joins.

Now... you originally had a where clause on "and t.dib_mvt_seq_num is not null", but this was also used in the join clause between your derived table and the real table. Since your query was already behaving like an inner join, it's not possible for there to be null's returned by this query for that column. This situation is not likely to cause any performance issues, but including that where clause criteria has absolutely no effect on the data returned by the query, so... again... it's just clutter.

Lastly, you should also consider the outer most query. It is left joining to the outer most derived table (b). I can't really say, but I suspect this could also be changed to an inner join. Technically, I cannot say this for sure, but I suspect you wouldn't want any rows in the t2counts table where PendinDDS and PendDDSover300 are both 0's. With the left join, it is technically possible to get the 0's. If you inner join the nation table with the derived table, you will not get any rows with both columns being 0. I'm guessing this is what you want, but cannot say for sure.

You see... there are some SQL Programmers that are absolutely in love with the LEFT JOIN. They'll often use it instead of an inner join because they don't want to think too hard about their data. Unfortunately, the left joins end up hurting performance. Basically, you should only use LEFT JOIN when you need to.

I know I kinda threw a lot of things at you, for which I apologize. If there is any part of my explanation that doesn't make sense, let me know and I will explain further.



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
No this is perfect George this is exactly what I needed. I didn't know how to add the two queries together so I did them separately. When would I use a left join? I do tend to use that all of the time.

Thanks for the lesson regarding derived tables and formatting the query you are correct it is easily readable. I will begin to set mine up as such for now on.

I will try your query tomorrow at work and let you know how it goes. Thanks again.
 
THANKS George it works fine and much faster too. I have another one if you don't mind...

Truncate table psccounts

insert psccounts
select t.doc,
t.clms,
t.cos,
org_id

from
(select max(juris_mvt_seq_num) as juris_mvt_seq_num, cos,
clms, org_id, juris_mvt_typ
From t2pendjuris j
group by cos, clms,org_id, juris_mvt_typ)b
left join
t2dibpend t on b.cos=t.cos
where t.cos not in (select claim from
seb2a21.specnew.dbo.people
where completedt is null )and juris_mvt_typ='r'
and org_id like'P%'
group by t.doc, t.cos, t.clms, org_id
 
You see... there are some SQL Programmers that are absolutely in love with the LEFT JOIN. They'll often use it instead of an inner join because they don't want to think too hard about their data. Unfortunately, the left joins end up hurting performance. Basically, you should only use LEFT JOIN when you need to.
i know this isn't directed at nc297, but i just wanted to say that i, too, am annoyed at all the "sql programmers" who use LEFT JOINs without knowing what they're doing

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
I know and you see I used it again in the above query. Please explain when to use it. Thanks!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top