Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Can I rely on table order in update command for concatenation?

Status
Not open for further replies.

RAS56

Programmer
Sep 28, 2001
24
US
Hello all,

I have a process where I am converting records from an old FoxPro program into MSSQL 2000. One of the FP tables has a structure similar to the one shown below in entryitems. The FP table has multiple lines used for paragraphs of information that need to be concatenated. I've also included some notes about the actual data in the records of the sample data.

I am selecting these records into an intermediate table (wrktbl) in a specific order and then updating that table with the update command shown below. So far, this works on my small example and seems to work on the actual table that has about 375k rows. The update statement relies on the records processing in the correct order (key1, key2, eline desc) to work correctly.

Can anyone tell me if I can rely on the update statement to process these rows in this order consistently?

I've also solved this problem using a stored procedure with a cursor but the cursor method takes hours to run on the 375k row table while this process takes less than a minute.

Thanks for looking at this.


Here's my sample code:

-- create sample table with data
create table entryitems (key1 char(4), key2 char(5), eline int, edate char(8), itemtext varchar(60))
insert into entryitems values ('2005','00001',1,'20050101','This is the first row of a multiline item')
insert into entryitems values ('2005','00001',2,' ','that continues on this row in paragraph form.')
insert into entryitems values ('2005','00001',3,' ','There is no fixed number of rows for any given')
insert into entryitems values ('2005','00001',4,' ','group of items.')
insert into entryitems values ('2005','00001',5,'20050101','The indication that a new group is beginning is')
insert into entryitems values ('2005','00001',6,' ','that the edate field will contain a string in')
insert into entryitems values ('2005','00001',7,' ','YYYYMMDD format.')
insert into entryitems values ('2005','00002',1,'20050101','When a new key value (key1+key2) changes or appears.')
insert into entryitems values ('2005','00002',2,' ','the first time, the eline value will start at 1 or')
insert into entryitems values ('2005','00002',2,' ','increment from the last eline value matching that key.')
insert into entryitems values ('2005','00001',8,'20050102','Some rows stand alone.')
insert into entryitems values ('2005','00002',3,'20050101','')
insert into entryitems values ('2005','00002',4,' ','')
insert into entryitems values ('2005','00003',1,'20050103','Some rows are blank.')

-- create working table with records ordered
create table wrktbl (key1 char(4), key2 char(5), eline int, edate char(8), itemtext varchar(60), combtext varchar(8000))
insert into wrktbl select key1,key2,eline,edate,itemtext,itemtext from entryitems order by key1,key2,eline desc

-- process records - concatanating the text fields on multiple lines
declare @combtext varchar(8000), @combtext2 varchar(8000)
update wrktbl
set @combtext=isnull(combtext,'')+' '+isnull(@combtext,''),
@combtext2=@combtext,
combtext=case when edate>'!' then @combtext2 else '' end,
@combtext=case when edate>'!' then '' else @combtext end

-- display results
select * from wrktbl order by key1,key2,eline
drop table wrktbl
 
> Can anyone tell me if I can rely on the update statement to process these rows in this order consistently?

Good question.

This is gray zone of SQL - tables are unordered sets, therefore UPDATE order is meaningless, depends on internals blah blah blah. So general answer is "not 100% reliable"

Typical "preventive" measure is to create cluster on PK/sequence columns (key1, key2, eline) and turn SMP off in UPDATE query with OPTION( MAXDOP 1 ).

You can always add check column (combeline or something) to see is everything OK after update.

And there are other ways to do the same, not so elegant (heh, I like hacks) and not much slower.

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Thanks for the tips. I had thought about creating a clustered index - thought that might help. I wasn't aware of the switch you mentioned - "and turn SMP off in UPDATE query with OPTION( MAXDOP 1 )." Will check that out.

I hadn't thought about some kind of check column - I guess I could do a len on all the fields when I copy to the working table, sum those as I'm concatenating and then check the sum of the original fields against the summed value created during concatenation. Is this what you meant by a check column?

Also very intrigued by your mention of other ways: "And there are other ways to do the same, not so elegant (heh, I like hacks) and not much slower." Could you elaborate?

This is not a process that I will be running often. It is a conversion and I should have exclusive use of the server while this is processing. Seems like that might make it a bit more reliable.

Thanks again for any insight into this puzzle.
 
375k rows is not much... I guess with maxdop 1 and cluster code is safe.

Like I said, alternative way is more complex, but does not depend on physical sort of data. Create this function first:
Code:
create function fn_concatentryitems( @key1 char(4), @key2 char(5), @elinefrom int, @elineto int)  
returns varchar(8000)  
as  
begin  
	declare @ret varchar(8000)  

	select @ret = coalesce(@ret+' ', '') + itemtext  
	from entryitems  
	where key1=@key1 and key2=@key2  
		and itemtext is not null and itemtext<>''  
		and eline between @elinefrom and @elineto
	order by eline asc  
  
	return @ret  
end
It simply returns concatenated string for specified composite key and eline range:
Code:
select dbo.fn_concatentryitems( '2005', '00001', 5, 7 )
So...:
Code:
-- extract only relevant rows (with filled elineto)
select key1, key2, eline as elinefrom, convert(int, null) as elineto, convert(varchar(7900), '') as combtext
into #blah
from entryitems
where edate <>'' -- and edate is not null

-- calculate max eline (where string ends)
update T
set elineto = 
	isnull(
		(	select min(elinefrom)
			from #blah T2
			where T.key1=T2.key1 and T.key2=T2.key2
			and T2.elinefrom > T.elinefrom
		), elineFrom+1)-1
from #blah T

-- concatenate strings
update #blah
set combtext = dbo.fn_concatentryitems( key1, key2, elinefrom, elineto )

select * from #blah
-- drop table #blah
Test:
Code:
select A.*,
	isnull(B.combText, '')
from entryitems A
left outer join #blah B on A.key1=B.key1 and A.key2=B.key2 and A.eline=B.elinefrom
order by A.key1, A.key2, A.eline

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Thank you very much for spending the time to give me such a clear example. I applied what you showed me here to my actual example and it worked beautifully.

My original solution with a stored procdure using a cursor looked like it would take several hours to run.

The solution I posted at the beginning of this thread took less than a minute but had the uncertainty of the order of the records in the update command.

The solution implemented with your suggestion took a little over 4 minutes to run but should be totally reliable.

Thanks again.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top