Curious question

LarrySteele · Sep 23, 2009

Background first...

I had a table with fields such as

Code:

PERSON_ID    V
0001         A1
0001         A2
0001         A3
0002         A1
0002         B1
0003         A1
0003         A3
0003         B1

The desired output was

Code:

PERSON_ID    V
0001         (A1,A2,A3)
0002         (A1,B1)
0003         (A1,A3,B1)

I searched and found a number of solutions available for v 8.2 and higher using various XML related functions. We have 8.1.5 and sure enough the functions didn't appear to work right.

Since the output was going to Microsoft Access I chose to just let the application concatenate the values. Quick, efficient, and works well.

Now, the question...

On one of the websites I visited, a poster was adamant that data manipulation like this should be handled by the client application, not the db engine.

I've never heard anything like this before and wanted to see what others think - especially DBA's. Should I be concerned with amount of work I put on the db engine versus the amount I put on the client application? If yes, what's a good rule of thumb?

papadba · Sep 23, 2009

what's a good rule of thumb?" Don't dim the datacenter lights. . .<g>

If something really is a one-time shot (unfortunately, many OTS live forever), i believe it is less important to ensure the most efficient execution. If a query is to be run over and over (especially if multi-user, online) it is important to conserve resources. A bit more developer time is worth the investment.

Something to keep in mind is that a query that requires an excessive amount of system resources hurts the environment, not just the individual query.

I don't know any specific numbers to use to measure. If you work with your dba(s), they may have some more specific guidance.

Mercury2 · Sep 23, 2009

I've always been told as a rule of thumb that sql should never be used to edit data.

Philippe

blom0344 · Sep 24, 2009

Larry, in the past I have had similar issues and using a little VBA in Access was MUCH more efficient than trying to work out a solution in the DB. Anything not set-based should be frowned upon and in your case you want a programming solution..

Ties Blom

LarrySteele · Sep 24, 2009

Thanks for the responses. Good to hear how others approach this question of balance.

LennyKhiger · Oct 31, 2009

Dear LarrySteele !

For solving your problem you have to use recursive common table expression technique.

This is the best way to solve your problem.

I can create this query to you, but would be much better if you will learn about this before:

http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/DSNSQK17/4.4.1?DT=20091010080512

Lenny

LennyKhiger · Nov 2, 2009

This is solution. How I promissed.

Code:

with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1 
union all
select '0001', 'A2'
from sysibm.sysdummy1 
union all
select '0001', 'A3'
from sysibm.sysdummy1 
union all
select '0002', 'A1'
from sysibm.sysdummy1 
union all
select '0002', 'B1'
from sysibm.sysdummy1 
union all
select '0003', 'A1'
from sysibm.sysdummy1 
union all
select '0003', 'A3'
from sysibm.sysdummy1 
union all
select '0003', 'B1'
from sysibm.sysdummy1 
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_in i1, table
(select count(*) + 1 seq from Person_in i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_in) mm
)
,
Person_Final (person_id, v, seq, Mseq) as 
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq 
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'), 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id 
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')', 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id 
  and f1.seq + 1   <= f1.Mseq
) 
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Result:

PERSON_ID V
0001 (A1, A2, A3)
0002 (A1, B1)
0003 (A1, A3, B1)

Lenny

blom0344 · Nov 3, 2009

Lenny, this is a solution for the exact set. It would require going over the code again and again for different sets. A generic solution would involve some kind of programming don't you think?

Ties Blom

LennyKhiger · Nov 3, 2009

Hi Ties Blom !

How I know this is most generic solution for this kind of task.

If you have duplicates for this two columns, or Nulls, it could make solution a bit different, but you have to think in same way.

Lenny

SkipVought · Nov 3, 2009

Huh?

Skip,
_{Just traded in my old subtlety...

for a NUANCE!}

LennyKhiger · Nov 3, 2009

In case if we have DB2 V9, we can use function rowno over() and Order by in Person_Seq step.

If we have duplicates and Nulls we have to add Person_dst and change Person_Seq step.
See bellow:

Code:

with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1 
union all
select '0001', 'A2'
from sysibm.sysdummy1 
union all
select '0001', 'A3'
from sysibm.sysdummy1 
union all
select '0002', 'A1'
from sysibm.sysdummy1 
union all
select '0002', 'B1'
from sysibm.sysdummy1 
union all
select '0003', 'A1'
from sysibm.sysdummy1 
union all
select '0003', 'A3'
from sysibm.sysdummy1 
union all
select '0003', 'B1'
from sysibm.sysdummy1 
)
[b],
Person_dst (person_id, v) as 
(select distinct
person_id, ifnull(v, ' ')
from Person_in
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_dst i1, table
(select count(*) + 1 seq from Person_dst i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_in) mm
) [/b]
,
Person_Final (person_id, v, seq, Mseq) as 
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq 
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'), 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id 
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')', 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id 
  and f1.seq + 1   <= f1.Mseq
) 
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

Lenny

LennyKhiger · Nov 3, 2009

Correction:

Code:

with Person_in (person_id, v) as
(
select varchar('0001', 50), varchar('A1', 50)
from sysibm.sysdummy1 
union all
select '0001', 'A2'
from sysibm.sysdummy1 
union all
select '0001', 'A3'
from sysibm.sysdummy1 
union all
select '0002', 'A1'
from sysibm.sysdummy1 
union all
select '0002', 'B1'
from sysibm.sysdummy1 
union all
select '0003', 'A1'
from sysibm.sysdummy1 
union all
select '0003', 'A3'
from sysibm.sysdummy1 
union all
select '0003', 'B1'
from sysibm.sysdummy1 
)
,
Person_dst (person_id, v) as 
(select distinct
person_id, ifnull(v, ' ')
from Person_in
)
,
Person_Seq (person_id, v, seq, Mseq) as
(
select person_id, v, seq, Mseq
from Person_dst i1, table
(select count(*) + 1 seq from Person_dst i2
  where i1.person_id || i1.v > i2.person_id || i2.v ) ss
, table
(select count(*) + 1 Mseq from Person_dst) mm
) 
,
Person_Final (person_id, v, seq, Mseq) as 
(
select person_id, varchar('(' || v || ')', 2000), 1, Mseq 
  from Person_Seq where seq = 1
Union All
select f1.person_id, replace(f1.v, ')', ', ' || p1.v || ')'), 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq       = f1.seq + 1
  and p1.person_id = f1.person_id 
  and f1.seq + 1  <= f1.Mseq
Union All
select p1.person_id, '(' || p1.v || ')', 
       f1.seq + 1, f1.Mseq 
  from Person_Seq p1, Person_Final f1
where p1.seq        = f1.seq + 1
  and p1.person_id <> f1.person_id 
  and f1.seq + 1   <= f1.Mseq
) 
,
Person_Result (person_id, v) as
(select  person_id, v from Person_Final f2  
   where f2.seq = (select max(f3.seq) from Person_Final f3
                    where f3.person_id = f2.person_id      )
)
select * from Person_Result

Lenny

LarrySteele · Nov 3, 2009

Lenny,

Interesting option for solving the problem, but the question still stands. Is it better to use the db engine's resources for data display manipulation like this, or is it better to let the client application manipulate the data display?

I do like that you provided the link and a workable solution in DB2. Your suggestion deserves a star.

Thanks,
Larry

MarcLodge · Nov 3, 2009

Larry,
This is a very difficult question to answer as it depends on each different scenario. I'm sure that there are cases, dependent on data and table structure, where it is more efficient to use the DB engine, and other times when it is better to use a 3rd party programming language.

I'm going to stick my neck out though and plump for programming languages. As good as Lenny's SQL is, in a business environment, it is a non starter as it is just too complicated to support. Making amendments to it or trying to find an error in it would be a nightmare.

Far better to have a proper language that easily sets out, with comments if necessary, what it is trying to do and efficiently goes about it task. Even if there is little performance difference between a complicated piece of SQL and a proper language, I would head down the language path because the unseen cost of maintaining large complicated swathes of SQL would eventually add up.

Marc

papadba · Nov 3, 2009

I'm going to stick my neck out though and plump for programming languages. As good as Lenny's SQL is, in a business environment, it is a non starter as it is just too complicated to support. Making amendments to it or trying to find an error in it would be a nightmare.

Well said. IMHO, it is a technical hobbyist approach rather than a professional IT approach. Given that most work is maintenance, care must be taken that much later (when the author has moved on), the code is maintainable.

I would head down the language path because the unseen cost of maintaining large complicated swathes of SQL would eventually add up.

And/or reach a point where the "next requirement" could not be met with "only" SQL. . . Then watch the "fun" when someone must explain to management that the "really neat trick" has to be completely re-done.

LennyKhiger · Nov 3, 2009

Hi Larry !

Any client application manipulate the data faster then DB engine.

But anyway you have to declare cursor inside of the application and fetch rows one by one.

Therefore you can't make program without DB engine.

But if you want to show the result immediately, you have no choice.

Anyway, you have to know your real table would be involve only in the first step, then release, same like in an application program.

Lenny

mikrom · Nov 4, 2009

I'm impressed of the Lenny's solution, but it doesn't work for me. When I try it, I get this error:

Code:

SQL State: 42999
Vendor Code: -255
Message: [SQL0255] Function not supported for query. Cause . . . . . :   The reason code is 5: -- Code 1 -- Scalar subselects and lateral correlation from a nested table expression are not allowed with distributed files. -- Code 2 -- Error occurred while using a temporary distributed file. -- Code 3 -- EXCEPT or INTERSECT not supported for this query. -- Code 4 -- A sequence reference is not supported with distributed files. -- Code 5 -- A recursive common table expression is not supported for this query. -- Code 6 -- An OLAP function is not supported for this query. -- Code 7 -- ORDER OF is not supported for this query. -- Code 8 -- Scalar fullselect is not supported for this query. -- Code 9 -- A distributed file is being processed in a multi-thread capable job or this is not the initial thread. Recovery  . . . :   A list of corrective actions follow: -- If code 1, change the query so it does not use scalar subselects or correlation from a nested table expression. -- If code 2, see the previous messages for more information. -- If code 3, remove EXCEPT or INTERSECT from the query. -- If code 4, remove the sequence reference from the query. -- If code 5, remove the recursive common table expression from the query. -- If code 6, remove the OLAP function from the query. -- If code 7, remove the ORDER OF from the query. -- If code 8, respecify the query without a fullselect. -- If code 9, do not use multiple threads to run the query.

I'm on DB2 UDB on IBM iSeries (OS version: V5R4), where recursion in SQL should be supported too, so I don't understand this error. Maybe differs DB2 on iSeries from DB2 on other systems?

I don't know how to figure out an error in Lenny's big SQL-query, but I can solve the above task quickly using an embedded SQL in an programming languages.

First I created the table

Code:

[COLOR=#804040][b]CREATE[/b][/color] [COLOR=#6a5acd]TABLE[/color] IBPDDB.PERSONS (
  PERSON_ID [COLOR=#6a5acd]DECIMAL[/color]([COLOR=#ff00ff]4[/color], [COLOR=#ff00ff]0[/color]) [COLOR=#6a5acd]DEFAULT[/color] [COLOR=#6a5acd]NULL[/color],
  V [COLOR=#2e8b57][b]CHAR[/b][/color]([COLOR=#ff00ff]30[/color]) CCSID [COLOR=#ff00ff]870[/color] [COLOR=#6a5acd]DEFAULT[/color] [COLOR=#6a5acd]NULL[/color])

then I filled it with the above data and then I have written this short REXX-program which does the task:
persons.rexx

Code:

[COLOR=#0000ff]/* Global Environment is EXECSQL*/[/color]
[COLOR=#804040][b]address[/b][/color] [COLOR=#ff00ff]'EXECSQL'[/color]

[COLOR=#0000ff]/* Declare cursor for select statement */[/color]
EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
  [COLOR=#ff00ff]'DECLARE C1 CURSOR FOR'[/color][highlight #ffff00][COLOR=#000000],[/color][/highlight]
  [COLOR=#ff00ff]' SELECT PERSON_ID, V FROM PERSONS'[/color]

[COLOR=#0000ff]/* Open Cursor */[/color]
EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
  [COLOR=#ff00ff]'OPEN C1'[/color]

[COLOR=#0000ff]/* Process records */[/color]
persons. [COLOR=#804040][b]=[/b][/color] [COLOR=#ff00ff]''[/color]
[COLOR=#804040][b]do[/b][/color][COLOR=#804040][b] while [/b][/color][COLOR=#008080]([/color]SQLCODE [COLOR=#804040][b]=[/b][/color] 0[COLOR=#008080])[/color]
  EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
         [COLOR=#ff00ff]'FETCH C1 INTO :pid, :val'[/color]

  [COLOR=#0000ff]/* If EOF, stop the loop and don't print out the fetched values */[/color]
  [COLOR=#804040][b]if[/b][/color] SQLCODE [COLOR=#804040][b]=[/b][/color] 100[COLOR=#804040][b] then [/b][/color]leave

  [COLOR=#0000ff]/* reformat pid from ' 0001.' to '0001' */[/color]
  pid [COLOR=#804040][b]=[/b][/color] [COLOR=#008080]substr(strip([/color]pid[COLOR=#008080])[/color][COLOR=#804040][b],[/b][/color]1[COLOR=#804040][b],[/b][/color][COLOR=#008080]length(strip([/color]pid[COLOR=#008080]))[/color][COLOR=#804040][b]-[/b][/color]1[COLOR=#008080])[/color]
  [COLOR=#0000ff]/* Add values to persons-stem */[/color]
[COLOR=#804040][b]  if persons.pid = '' then do[/b][/color]  [COLOR=#0000ff]/* add first value */[/color]
    persons.pid [COLOR=#804040][b]=[/b][/color] [COLOR=#008080]strip([/color]val[COLOR=#008080])[/color]
    persons.pids [COLOR=#804040][b]=[/b][/color] persons.pids [COLOR=#804040][b]||[/b][/color][COLOR=#ff00ff]' '[/color][COLOR=#804040][b]||[/b][/color] pid
  [COLOR=#804040][b]end[/b][/color]
[COLOR=#804040][b]  else [/b][/color]do [COLOR=#0000ff]/* append next value to the existing values */[/color]
    persons.pid [COLOR=#804040][b]=[/b][/color] persons.pid [COLOR=#804040][b]||[/b][/color] [COLOR=#ff00ff]', '[/color] [COLOR=#804040][b]||[/b][/color] [COLOR=#008080]strip([/color]val[COLOR=#008080])[/color]
  [COLOR=#804040][b]end[/b][/color]
[COLOR=#804040][b]end[/b][/color]

[COLOR=#0000ff]/* Print the resulting stem, i.e. the report */[/color]
[COLOR=#804040][b]say[/b][/color] PERSON_ID[COLOR=#804040][b]||[/b][/color][COLOR=#ff00ff]'    V'[/color]
[COLOR=#804040][b]do[/b][/color] i[COLOR=#804040][b]=[/b][/color]1[COLOR=#804040][b] to [/b][/color][COLOR=#008080]words([/color]persons.pids[COLOR=#008080])[/color]
  pid [COLOR=#804040][b]=[/b][/color] [COLOR=#008080]word([/color]persons.pids[COLOR=#804040][b],[/b][/color] i[COLOR=#008080])[/color]
  [COLOR=#804040][b]say[/b][/color] pid  [COLOR=#ff00ff]'       ('[/color][COLOR=#804040][b]||[/b][/color]persons.pid[COLOR=#804040][b]||[/b][/color][COLOR=#ff00ff]')'[/color]
[COLOR=#804040][b]end[/b][/color]

[COLOR=#0000ff]/* Close Cursor */[/color]
EXECSQL[highlight #ffff00][COLOR=#000000],[/color][/highlight]
   [COLOR=#ff00ff]'CLOSE C1'[/color]

[COLOR=#0000ff]/* End Pgm */[/color]
[COLOR=#804040][b]exit[/b][/color]

After running the REXX member with

Code:

STRREXPRC SRCMBR(PERSONS)

I get the resulting report

Code:

PERSON_ID    V                  
0001        (A1, A2, A3)        
0002        (A1, B1)            
0003        (A1, A3, B1)

I used in the example REXX, because it's historically oldest scripting language I know and it's natively on iSeries, but I could use compiled language like C, COBOL or RPG too.
Howevever, my favorites for such jobs are scripting languages like Perl, Python, Ruby,... These are more powerful than REXX, so coding would be simpler and would make more fun as in the example above.
IMHO, an source written in an programming language is better maintanable then long SQL-statements.

sathyarams · Nov 4, 2009

There are numerous limitations when using common table expressions in iSeries.

I remember reading a document explaining the limitations, but unable to find it now

I'm on DB2 UDB on IBM iSeries (OS version: V5R4), where recursion in SQL should be supported too, so I don't understand this error. Maybe differs DB2 on iSeries from DB2 on other systems?

For db2 resoruces visit

http://www.db2click.com

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Curious question

Programmer

MIS

Technical User

Technical User

Programmer

Programmer

Programmer

Technical User

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

MIS

Programmer

Programmer

IS-IT--Management

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor