Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Curious question

Status
Not open for further replies.

LarrySteele

Programmer
May 18, 2004
318
US
Background first...

I had a table with fields such as
Code:
PERSON_ID    V
0001         A1
0001         A2
0001         A3
0002         A1
0002         B1
0003         A1
0003         A3
0003         B1

The desired output was
Code:
PERSON_ID    V
0001         (A1,A2,A3)
0002         (A1,B1)
0003         (A1,A3,B1)

I searched and found a number of solutions available for v 8.2 and higher using various XML related functions. We have 8.1.5 and sure enough the functions didn't appear to work right.

Since the output was going to Microsoft Access I chose to just let the application concatenate the values. Quick, efficient, and works well.

Now, the question...

On one of the websites I visited, a poster was adamant that data manipulation like this should be handled by the client application, not the db engine.

I've never heard anything like this before and wanted to see what others think - especially DBA's. Should I be concerned with amount of work I put on the db engine versus the amount I put on the client application? If yes, what's a good rule of thumb?
 
what's a good rule of thumb?" Don't dim the datacenter lights. . .<g>

If something really is a one-time shot (unfortunately, many OTS live forever), i believe it is less important to ensure the most efficient execution. If a query is to be run over and over (especially if multi-user, online) it is important to conserve resources. A bit more developer time is worth the investment.

Something to keep in mind is that a query that requires an excessive amount of system resources hurts the environment, not just the individual query.

I don't know any specific numbers to use to measure. If you work with your dba(s), they may have some more specific guidance.
 
I've always been told as a rule of thumb that sql should never be used to edit data.

Philippe
 
Larry, in the past I have had similar issues and using a little VBA in Access was MUCH more efficient than trying to work out a solution in the DB. Anything not set-based should be frowned upon and in your case you want a programming solution..

Ties Blom

 
Thanks for the responses. Good to hear how others approach this question of balance.
 
This is solution. How I promissed.

Code:
with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1 
union all
select '0001', 'A2'
from sysibm.sysdummy1 
union all
select '0001', 'A3'
from sysibm.sysdummy1 
union all
select '0002', 'A1'
from sysibm.sysdummy1 
union all
select '0002', 'B1'
from sysibm.sysdummy1 
union all
select '0003', 'A1'
from sysibm.sysdummy1 
union all
select '0003', 'A3'
from sysibm.sysdummy1 
union all
select '0003', 'B1'
from sysibm.sysdummy1 
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_in i1, table
(select count(*) + 1 seq from Person_in i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_in) mm
)
,
Person_Final (person_id, v, seq, Mseq) as 
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq 
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'), 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id 
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')', 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id 
  and f1.seq + 1   <= f1.Mseq
) 
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Result:

PERSON_ID V
0001 (A1, A2, A3)
0002 (A1, B1)
0003 (A1, A3, B1)


Lenny
 
Lenny, this is a solution for the exact set. It would require going over the code again and again for different sets. A generic solution would involve some kind of programming don't you think?

Ties Blom

 
Hi Ties Blom !

How I know this is most generic solution for this kind of task.

If you have duplicates for this two columns, or Nulls, it could make solution a bit different, but you have to think in same way.

Lenny
 
In case if we have DB2 V9, we can use function rowno over() and Order by in Person_Seq step.

If we have duplicates and Nulls we have to add Person_dst and change Person_Seq step.
See bellow:


Code:
with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1 
union all
select '0001', 'A2'
from sysibm.sysdummy1 
union all
select '0001', 'A3'
from sysibm.sysdummy1 
union all
select '0002', 'A1'
from sysibm.sysdummy1 
union all
select '0002', 'B1'
from sysibm.sysdummy1 
union all
select '0003', 'A1'
from sysibm.sysdummy1 
union all
select '0003', 'A3'
from sysibm.sysdummy1 
union all
select '0003', 'B1'
from sysibm.sysdummy1 
)
[b],
Person_dst (person_id, v) as 
(select distinct
person_id, ifnull(v, ' ')
from Person_in
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_dst i1, table
(select count(*) + 1 seq from Person_dst i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_in) mm
) [/b]
,
Person_Final (person_id, v, seq, Mseq) as 
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq 
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'), 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id 
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')', 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id 
  and f1.seq + 1   <= f1.Mseq
) 
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

Lenny
 
Correction:
Code:
with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1 
union all
select '0001', 'A2'
from sysibm.sysdummy1 
union all
select '0001', 'A3'
from sysibm.sysdummy1 
union all
select '0002', 'A1'
from sysibm.sysdummy1 
union all
select '0002', 'B1'
from sysibm.sysdummy1 
union all
select '0003', 'A1'
from sysibm.sysdummy1 
union all
select '0003', 'A3'
from sysibm.sysdummy1 
union all
select '0003', 'B1'
from sysibm.sysdummy1 
)
,
Person_dst (person_id, v) as 
(select distinct
person_id, ifnull(v, ' ')
from Person_in
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_dst i1, table
(select count(*) + 1 seq from Person_dst i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_dst) mm
) 
,
Person_Final (person_id, v, seq, Mseq) as 
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq 
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'), 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id 
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')', 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id 
  and f1.seq + 1   <= f1.Mseq
) 
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

Lenny
 
Lenny,

Interesting option for solving the problem, but the question still stands. Is it better to use the db engine's resources for data display manipulation like this, or is it better to let the client application manipulate the data display?

I do like that you provided the link and a workable solution in DB2. Your suggestion deserves a star.

Thanks,
Larry
 
Larry,
This is a very difficult question to answer as it depends on each different scenario. I'm sure that there are cases, dependent on data and table structure, where it is more efficient to use the DB engine, and other times when it is better to use a 3rd party programming language.

I'm going to stick my neck out though and plump for programming languages. As good as Lenny's SQL is, in a business environment, it is a non starter as it is just too complicated to support. Making amendments to it or trying to find an error in it would be a nightmare.

Far better to have a proper language that easily sets out, with comments if necessary, what it is trying to do and efficiently goes about it task. Even if there is little performance difference between a complicated piece of SQL and a proper language, I would head down the language path because the unseen cost of maintaining large complicated swathes of SQL would eventually add up.

Marc
 
I'm going to stick my neck out though and plump for programming languages. As good as Lenny's SQL is, in a business environment, it is a non starter as it is just too complicated to support. Making amendments to it or trying to find an error in it would be a nightmare.
Well said. IMHO, it is a technical hobbyist approach rather than a professional IT approach. Given that most work is maintenance, care must be taken that much later (when the author has moved on), the code is maintainable.

I would head down the language path because the unseen cost of maintaining large complicated swathes of SQL would eventually add up.
And/or reach a point where the "next requirement" could not be met with "only" SQL. . . Then watch the "fun" when someone must explain to management that the "really neat trick" has to be completely re-done.
 
Hi Larry !

Any client application manipulate the data faster then DB engine.

But anyway you have to declare cursor inside of the application and fetch rows one by one.

Therefore you can't make program without DB engine.

But if you want to show the result immediately, you have no choice.

Anyway, you have to know your real table would be involve only in the first step, then release, same like in an application program.

Lenny

 
I'm impressed of the Lenny's solution, but it doesn't work for me. When I try it, I get this error:
Code:
SQL State: 42999
Vendor Code: -255
Message: [SQL0255] Function not supported for query. Cause . . . . . :   The reason code is 5: -- Code 1 -- Scalar subselects and lateral correlation from a nested table expression are not allowed with distributed files. -- Code 2 -- Error occurred while using a temporary distributed file. -- Code 3 -- EXCEPT or INTERSECT not supported for this query. -- Code 4 -- A sequence reference is not supported with distributed files. -- Code 5 -- A recursive common table expression is not supported for this query. -- Code 6 -- An OLAP function is not supported for this query. -- Code 7 -- ORDER OF is not supported for this query. -- Code 8 -- Scalar fullselect is not supported for this query. -- Code 9 -- A distributed file is being processed in a multi-thread capable job or this is not the initial thread. Recovery  . . . :   A list of corrective actions follow: -- If code 1, change the query so it does not use scalar subselects or correlation from a nested table expression. -- If code 2, see the previous messages for more information. -- If code 3, remove EXCEPT or INTERSECT from the query. -- If code 4, remove the sequence reference from the query. -- If code 5, remove the recursive common table expression from the query. -- If code 6, remove the OLAP function from the query. -- If code 7, remove the ORDER OF from the query. -- If code 8, respecify the query without a fullselect. -- If code 9, do not use multiple threads to run the query.
I'm on DB2 UDB on IBM iSeries (OS version: V5R4), where recursion in SQL should be supported too, so I don't understand this error. Maybe differs DB2 on iSeries from DB2 on other systems?

I don't know how to figure out an error in Lenny's big SQL-query, but I can solve the above task quickly using an embedded SQL in an programming languages.

First I created the table
Code:
[COLOR=#804040][b]CREATE[/b][/color] [COLOR=#6a5acd]TABLE[/color] IBPDDB.PERSONS (
  PERSON_ID [COLOR=#6a5acd]DECIMAL[/color]([COLOR=#ff00ff]4[/color], [COLOR=#ff00ff]0[/color]) [COLOR=#6a5acd]DEFAULT[/color] [COLOR=#6a5acd]NULL[/color],
  V [COLOR=#2e8b57][b]CHAR[/b][/color]([COLOR=#ff00ff]30[/color]) CCSID [COLOR=#ff00ff]870[/color] [COLOR=#6a5acd]DEFAULT[/color] [COLOR=#6a5acd]NULL[/color])
then I filled it with the above data and then I have written this short REXX-program which does the task:
persons.rexx
Code:
[COLOR=#0000ff]/* Global Environment is EXECSQL*/[/color]
[COLOR=#804040][b]address[/b][/color] [COLOR=#ff00ff]'EXECSQL'[/color]

[COLOR=#0000ff]/* Declare cursor for select statement */[/color]
EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
  [COLOR=#ff00ff]'DECLARE C1 CURSOR FOR'[/color][highlight #ffff00][COLOR=#000000],[/color][/highlight]
  [COLOR=#ff00ff]' SELECT PERSON_ID, V FROM PERSONS'[/color]

[COLOR=#0000ff]/* Open Cursor */[/color]
EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
  [COLOR=#ff00ff]'OPEN C1'[/color]

[COLOR=#0000ff]/* Process records */[/color]
persons. [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]''[/color]
[COLOR=#804040][b]do[/b][/color][COLOR=#804040][b] while [/b][/color][COLOR=#008080]([/color]SQLCODE [COLOR=#804040][b]=[/b][/color] 0[COLOR=#008080])[/color]
  EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
         [COLOR=#ff00ff]'FETCH C1 INTO :pid, :val'[/color]

  [COLOR=#0000ff]/* If EOF, stop the loop and don't print out the fetched values */[/color]
  [COLOR=#804040][b]if[/b][/color] SQLCODE [COLOR=#804040][b]=[/b][/color] 100[COLOR=#804040][b] then [/b][/color]leave

  [COLOR=#0000ff]/* reformat pid from ' 0001.' to '0001' */[/color]
  pid [COLOR=#804040][b]=[/b][/color] [COLOR=#008080]substr(strip([/color]pid[COLOR=#008080])[/color][COLOR=#804040][b],[/b][/color]1[COLOR=#804040][b],[/b][/color][COLOR=#008080]length(strip([/color]pid[COLOR=#008080]))[/color][COLOR=#804040][b]-[/b][/color]1[COLOR=#008080])[/color]
  [COLOR=#0000ff]/* Add values to persons-stem */[/color]
[COLOR=#804040][b]  if persons.pid = '' then do[/b][/color]  [COLOR=#0000ff]/* add first value */[/color]
    persons.pid [COLOR=#804040][b]=[/b][/color] [COLOR=#008080]strip([/color]val[COLOR=#008080])[/color]
    persons.pids [COLOR=#804040][b]=[/b][/color] persons.pids [COLOR=#804040][b]||[/b][/color][COLOR=#ff00ff]' '[/color][COLOR=#804040][b]||[/b][/color] pid
  [COLOR=#804040][b]end[/b][/color]
[COLOR=#804040][b]  else [/b][/color]do [COLOR=#0000ff]/* append next value to the existing values */[/color]
    persons.pid [COLOR=#804040][b]=[/b][/color] persons.pid [COLOR=#804040][b]||[/b][/color] [COLOR=#ff00ff]', '[/color] [COLOR=#804040][b]||[/b][/color] [COLOR=#008080]strip([/color]val[COLOR=#008080])[/color]
  [COLOR=#804040][b]end[/b][/color]
[COLOR=#804040][b]end[/b][/color]

[COLOR=#0000ff]/* Print the resulting stem, i.e. the report */[/color]
[COLOR=#804040][b]say[/b][/color] PERSON_ID[COLOR=#804040][b]||[/b][/color][COLOR=#ff00ff]'    V'[/color]
[COLOR=#804040][b]do[/b][/color] i[COLOR=#804040][b]=[/b][/color]1[COLOR=#804040][b] to [/b][/color][COLOR=#008080]words([/color]persons.pids[COLOR=#008080])[/color]
  pid [COLOR=#804040][b]=[/b][/color] [COLOR=#008080]word([/color]persons.pids[COLOR=#804040][b],[/b][/color] i[COLOR=#008080])[/color]
  [COLOR=#804040][b]say[/b][/color] pid  [COLOR=#ff00ff]'       ('[/color][COLOR=#804040][b]||[/b][/color]persons.pid[COLOR=#804040][b]||[/b][/color][COLOR=#ff00ff]')'[/color]
[COLOR=#804040][b]end[/b][/color]

[COLOR=#0000ff]/* Close Cursor */[/color]
EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
   [COLOR=#ff00ff]'CLOSE C1'[/color]

[COLOR=#0000ff]/* End Pgm */[/color]
[COLOR=#804040][b]exit[/b][/color]
After running the REXX member with
Code:
STRREXPRC SRCMBR(PERSONS)
I get the resulting report
Code:
PERSON_ID    V                  
0001        (A1, A2, A3)        
0002        (A1, B1)            
0003        (A1, A3, B1)

I used in the example REXX, because it's historically oldest scripting language I know and it's natively on iSeries, but I could use compiled language like C, COBOL or RPG too.
Howevever, my favorites for such jobs are scripting languages like Perl, Python, Ruby,... These are more powerful than REXX, so coding would be simpler and would make more fun as in the example above.
IMHO, an source written in an programming language is better maintanable then long SQL-statements.
 
There are numerous limitations when using common table expressions in iSeries.

I remember reading a document explaining the limitations, but unable to find it now

I'm on DB2 UDB on IBM iSeries (OS version: V5R4), where recursion in SQL should be supported too, so I don't understand this error. Maybe differs DB2 on iSeries from DB2 on other systems?

For db2 resoruces visit More DB2 questions answered at &
 
Thanks everyone for posting. I was hoping to see a discussion. The example I posted was just to get the discussion going, I wasn't looking for a solution to a specific problem. Nevertheless, I'm glad to see Lenny's proposed solution got the discussion going and I find it useful, hope others do too.

Mikrom's post was an excellent example of Marc and papadba's posts. Trying to understand the sql error statement is hard enough, trying to understand how the sql & data caused the error is a real drain on developer resources. However, the REXX routine looks pretty straight forward, easy to understand, and would be much easier to troubleshoot errors. That's much like the approach I took, just different programming language/platform.

I still think Lenny's sql has merit and glad he posted. I think there's benefit in learning new methods, even if they're not used. Never know when something might be the right tool for the job.

Thanks again for the lively discussion!

Larry
 
Any problem has dozens solutions.

Programmer has to find the best way to solve problem.

One more solution.

If you know depth for person (in our case could be 3) you can use very simple query:


Code:
Select p1.person_id, 
'(' || p1.v   
|| ifnull(', ' || p2.v, '')
|| ifnull(', ' || p3.v, '') || ')' as V
from 
persons p1 left join persons p2
On p1.person_id = p2.person_id 
   and p2.v > p1.v
left join persons p3 
On p2.person_id = p3.person_id 
   and p3.v > p2.v
order by p1.person_id

Lenny Khiger, ADSPA&VP
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top