Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to join one to many accurately?

Status
Not open for further replies.

hanchilicious

Programmer
May 23, 2002
156
0
0
GB
Hi,

How do I join these two tables accurately?

I have three records in the FORECAST table:
Code:
Customer  Product  Wk04  Wk05  Wk06
-----------------------------------
TANDY     PLY001      4     3     2
ULYSSES   FEV999      5     2     1
NEWGUY    RUF101                  9
and I want to join them to the correct corresponding records in the SALES table:
Code:
Customer  Product  Area  Agent  Wk01  Wk02  Wk03
------------------------------------------------
TANDY     PLY001   GR31  10001    10    11    12
TANDY     PLY001   SP20  12302     0     0     0
ULYSSES   FEV999   BG27   9999     0     0     0
ULYSSES   FEV999   BG27  88882    15    15    15
OLDGUY    OLD001   ED11  10101     1     1     1
Here're the rules:
(a) I never want to assign any records from FORECAST to SALES where the Area = SP20.

(b) I have to support a FULL OUTER JOIN as there may be independent records in either FORECAST or SALES which need to be represented.

(c) In the case where there are multiple Agents per Customer and Product, only one of the Agents will have values in the SALES.WK* columns, and that is the row that should have the FORECAST row linked to it.

The end result should look something like this:
Code:
Customer  Product  Area  Agent  Wk01  Wk02  Wk03  Wk04  Wk05  Wk06
------------------------------------------------------------------
TANDY     PLY001   GR31  10001    10    11    12     4     3     2
TANDY     PLY001   SP20  12302     0     0     0  null  null  null
ULYSSES   FEV999   BG27   9999     0     0     0  null  null  null
ULYSSES   FEV999   BG27  88882    15    15    15     5     2     1
NEWGUY    RUF101   null  null   null  null  null  null  null     9
OLDGUY    OLD001   ED11  10101     1     1     1  null  null  null

I don't seem to be able to support all three rules at the same time without losing some records from the forecast table...

Can someone point me in the right direction?

Thanks a lot!
 
Instead of showing you code that can do this, why don't you show us what you have tried so far.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Hi George,

Well, I've tried a few approaches. I've tried like this:
Code:
FULL OUTER JOIN FORECAST
ON  (SALES.CUSTOMER = FORECAST.CUSTOMER) 
AND (SALES.PRODUCT = FORECAST.PRODUCT)
AND (SALES.AGENT = 
       (select agent from sales 
        where ts is not null and ts <> 0
        and area <> 'SP20'
        group by apple_hq_id,pl6_ppnid))
-- *ts is a total wk to date column capturing all sales.wks
Using this method, I am missing forecast records.

Also, I've tried this:
Code:
FROM TSR_Actuals_Split
FULL OUTER JOIN 
(select forecast.customer,
	forecast.product,
	forecast.area, --a function derives and populates this field from SALES values
	avg(wk01) as week01,
	avg(wk02) as week02,
	avg(wk03) as week03,
	avg(wk04) as week04,
	avg(wk05) as week05,
	avg(wk06) as week06,
	avg(wk07) as week07,
	avg(wk08) as week08,
	avg(wk09) as week09,
	avg(wk10) as week10,
	avg(wk11) as week11,
	avg(wk12) as week12,
	avg(wk13) as week13,
	avg(wk14) as week14,
	(select agent from sales 
        where ts is not null and ts <> 0
        and area <> 'SP20'
        group by apple_hq_id,pl6_ppnid) as fake_agent
	from forecast
	  full outer join sales on
          sales.product  = forecast.product and
          sales.customer = forecast.customer
	group by 
          forecast.customer,
          forecast.product,
          forecast.area) tableB
on sales.product = tableB.product and
sales.customer = tableB.customer and
sales.agent = tableB.fake_agent
In the second way, I am trying to insert the conclusion of the join into a third table, which is ultimately the goal. But using this technique, I miss some records in both tables - most notably where there is more than one agent in the SALES table.

I'm not sure how wide I am of the mark, being quite new to the game.
 
NB: "TSR_Actuals_Split" and "SALES" are the same table in the example.
 
Alright. I will show you some code. In the code I show, I am taking your original data (from the original question), and dummying it up in to table variables. Then, the query I show uses the data from the table variables to present the output.

Unfortunately, I am having a problem with this:
(c) In the case where there are multiple Agents per Customer and Product, only one of the Agents will have values in the SALES.WK* columns, and that is the row that should have the FORECAST row linked to it.

I think I handled it appropriately by converting 0's to NULL using the NullIf function. Anyway, you can copy/paste this to query analyzer and run it. If you are satisifed with it, then you can change the table names in the query so that it is using tables in your database. Let us know how it turns out.

Code:
[green]-- Dummy Data[/green]
Declare @Forecast Table(Customer varchar(20), Product varchar(20), Wk04 int, Wk05 int, Wk06 int)
-----------------------------------
Insert Into @Forecast Values('TANDY'     ,'PLY001',      4     ,3     ,2)
Insert Into @Forecast Values('ULYSSES'   ,'FEV999',      5     ,2     ,1)
Insert Into @Forecast Values('NEWGUY'    ,'RUF101',      null  ,null  ,9)

Declare @Sales Table(Customer varchar(20), Product varchar(20), Area varchar(20),Agent int, Wk01 int, Wk02 int,  Wk03 int)
------------------------------------------------
Insert Into @Sales Values('TANDY'     ,'PLY001'   ,'GR31',  10001,    10,    11,    12)
Insert Into @Sales Values('TANDY'     ,'PLY001'   ,'SP20',  12302,     0,     0,     0)
Insert Into @Sales Values('ULYSSES'   ,'FEV999'   ,'BG27',   9999,     0,     0,     0)
Insert Into @Sales Values('ULYSSES'   ,'FEV999'   ,'BG27',  88882,    15,    15,    15)
Insert Into @Sales Values('OLDGUY'    ,'OLD001'   ,'ED11',  10101,     1,     1,     1)

[green]-- Query starts here[/green]
Select Coalesce(F.Customer, S.Customer), 
       Coalesce(F.Product, S.Product),
       S.Area,
       S.Agent,
       NullIf(S.WK01, 0) As WK01,
       NullIf(S.WK02, 0) As WK02,
       NullIf(S.WK03, 0) As WK03,
       Case When S.Area = 'SP20' Then NULL Else F.WK04 End As WK04,
       Case When S.Area = 'SP20' Then NULL Else F.WK05 End As WK05,
       Case When S.Area = 'SP20' Then NULL Else F.WK06 End As WK06
From   @Forecast F
       Full Outer Join @Sales S
         On  F.Customer = S.Customer
         And F.Product = S.Product


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Hey there George,

Thank you very much for taking the time out to write some code up. But I don't think this will achieve what I'm after. You see, it's not the select that is the problem, it's the joining of SALES to FORECAST.

Let me clear up the obscurity around point (c) and it will probably help clear things up.

For the ULYSSES customer, you see how there are two SALES rows, but only one FORECAST. Both SALES rows have Areas <> SP20, but only agent 88882 has any values in the Wk columns. That's the row that the FORECAST row should link to, like in the 3rd example in the first post. The other row, combined with the SP20 rows should be ignored. I know I should be handling this in the join conditions, I'm just not sure how.

For the nulls, I never want to manually control nulls. They just appear where there is only data in one or the other of the tables.
 
Using the same dummy data (from above)...

Code:
Select Coalesce(F.Customer, S.Customer), 
       Coalesce(F.Product, S.Product),
       S.Area,
       S.Agent,
       S.WK01,
       S.WK02,
       S.WK03,
       Case When S.Area = 'SP20' Then NULL Else F.WK04 End As WK04,
       Case When S.Area = 'SP20' Then NULL Else F.WK05 End As WK05,
       Case When S.Area = 'SP20' Then NULL Else F.WK06 End As WK06
From   @Forecast F
       Full Outer Join @Sales S
         On  F.Customer = S.Customer
         And F.Product = S.Product
         [!]And S.wk01 + s.wk02 + s.wk03 <> 0[/!]

Also noticed that I removed the NullIf functions, and it still returns the same data as your expected output. Is this what you were looking for?

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Hi George,

Thank you so much for coming back. We're so close! That last line you added nearly cracked it, but threw up an issue that I didn't realise before.

Your code solves pretty much all the conditions I set, but it seems that I'm wrong about (c). :( In 95% of instances where there are two agents, one agent has no sales.wk* values - which satisfies the last line of your code. But it turns out that in 5% of the cases, the agent which should have 0 wk values has a very small amount. (1 unit compared to the other units 20 units, for example.)

This is causing the FORECAST numbers to inflate, because in this situation, the FORECAST row will be attached twice. Is there a way where we can replace the <> 0 logic for something like "just attach the forecast to the max agent where the area <> 'SP20'" or something like that?

Thanks again for all your time.
 
Do you have an primary keys in the sales and forecast table? The reason I ask is that we are going to want to get the record that has the largest week values. This will eventually be a subquery with another link etc...

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
The key of the FORECAST table is Customer + Product.

For SALES it's Customer + Product + Area + Agent.

I tried substituting your <> 0 line for SALES.TS = (select max(ts) from SALES group by <key fields as above>) where ts is the total of the SALES.Wk* columns, but I got an indistinct values error.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top