Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Update column based on sum value of another column 1

Status
Not open for further replies.

bmacbmac

IS-IT--Management
Jan 26, 2006
392
US
Hello. This is a little difficult to explain....

I have a table with the following table:


Code:
create table mytable1 (counter int, pages int, Folder varchar(100))
insert into mytable1 (counter, pages) values (1, 12)
insert into mytable1 (counter, pages) values (2, 13)
insert into mytable1 (counter, pages) values (3, 6)
insert into mytable1 (counter, pages) values (4, 2)
insert into mytable1 (counter, pages) values (5, 1)
insert into mytable1 (counter, pages) values (6, 9)
insert into mytable1 (counter, pages) values (7, 17)
insert into mytable1 (counter, pages) values (8, 10)
insert into mytable1 (counter, pages) values (9, 11)
insert into mytable1 (counter, pages) values (10, 1)


I am trying to place no more than 30 units into a folder name.

I would like to take the 'pages' value for row1 and if it is less than 30, update a folder name.

Then I would like to add in row2 pages and if the total for row 1 and 2 are less than 30 then update to the same folder name.

Then I would like to add in row3 pages. If the pages total for rows 1,2,3 exceed 30 then I need to start a new folder name for row 3.


I'm hoping to end up with something like this:

Counter[tab]pages [tab]Folder
1 [tab] 12 [tab] Folder1
2[tab] 13[tab] Folder1
3[tab] 6[tab] Folder2 [tab](This is a next folder because it would take my sum total for Folder1 over 30)
4[tab] 2[tab] Folder2
5[tab] 1[tab] Folder2
6[tab] 9[tab] Folder2
7[tab] 17[tab] Folder3 [tab](This is the next folder because it would take my sum total for Folder2 over 30)
8[tab] 10[tab] Folder3
9[tab] 11[tab] Folder4 [tab](This is the next folder because it would take my sum total for Folder3 over 30)
10[tab] 1[tab] Folder4


I am trying to do this with some while loops, but my I'm putting While scripts within while scripts and I am not getting it correctly.

Any ideas?
 
Bit more info - depending on the real data and on what you are trying to achieve the best solution may not be the same.

how many rows will you be processing?
Is this a once off or will it be ongoing?
will this be a full process or will it be a delta from previous run?
What are the real rules for ordering the buckets? I know you are using the counter on your example but this may not be the real rule on live data.
is the ordering field always incremented by 1 or will it be random increments?

And what version of SQL Server does this need to run on - depending on it solution may also be different as 2014/2016 have new functionality that may eventually be used.

It would help if you could give us real data and describe the whole process you are doing and what is the ultimate goal of this exercise as we may be able to give a better solution or even suggest alternative ways.

Just as an example that will not work if the input rows are more than 32k
Code:
;with src (counter, pages, sum_pages, group_set) as 
(select top 1  m.counter
              ,m.pages
              ,m.pages as sum_pages
              ,cast(1 as int) as group_set
 from mytable1 m
 union all
 select  m.counter
        ,m.pages
        ,case
         when s1.sum_pages + m.pages > 30 
            then m.pages
         else s1.sum_pages + m.pages
         end as sum_pages
        ,case
         when s1.sum_pages + m.pages > 30 
            then s1.group_set + 1
         else s1.group_set
         end as group_set
 from mytable1 m
 inner join src s1
 on s1.counter + 1 = m.counter
)
select  counter
       ,pages
       ,'Folder' + convert(varchar(20), group_set)
from src
option (maxrecursion 32767)

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Hello and thanks for your quick reply and help.

SQL 2008 R2

I will have about 200,000 rows to process. The counter column represents a document number in our database and the pages = the number of pages for that specific document.

The pages sum will = about 1.2 million for all 200,000 rows.

I have a conversion project where I have to place the documents (in counter order) in folders containing 7500-8000 pages. I can't copy a blanket 7500 pages into each folder because most likely the 7500th page will be the middle of a document that will be cut off. I need to have complete documents to end each folder.

So my thought was to sum the pages one row at a time (in order) until I hit 7500, assign a folder, and then start over for the next rows.

The folder names just need to go in order. I could even just assign a value 1,2,3,4,5 and then update them later to an actual 'Folder' + convert(varchar (20), folder) or something like that.

I was going to use the final results to create a quick batch file to run the full copy overnight.

I know there are better programming ways to go about this, but SQL is all I know... And I'm hoping to find a way to not copy/paste 1.2 million images in groups of 7,500.


Live data looks like this:

Code:
create table RecordedDocs (counter int identity(1,1), imageid int, sdocumentno varchar(50), Folder varchar(50), pagenumto int)

Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8312,'0000-000000',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8313,'1882-01274',8)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8314,'1972-01712',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8315,'1976-00013',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8316,'1976-00033',12)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8317,'1976-00047',2)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8318,'1976-00054',19)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5794,'1976-00065',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5795,'1976-00066',2)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5796,'1976-00067',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5797,'1976-00068',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5798,'1976-00069',21)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5799,'1976-00070',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5800,'1976-00085',4)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5801,'1976-00103',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5802,'1976-00104',3)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (5803,'1976-00105',7)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8319,'1976-00148',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8320,'1976-00165',1)
Insert into recordeddocs (imageid, sdocumentno, pagenumto) values (8321,'1976-00166',16)

Thanks again for your time.

 
This is what I have so far. You can probably imagine how slow this is going.... but it works! Hoping to find a faster way... I am inserting the data one row at a time into another table...

I added a 'Process' column to my original RecordedDocs table.


Code:
create table Tempcount (imageid int, pagenumto int, folder int)



while (select count(counter) from RecordedDocs where process is null) > 0
begin
declare @mincounter int
set @mincounter=(select min(counter) from recordeddocs where process is null)

insert into Tempcount (imageid, pagenumto) select imageid, pagenumto from RecordedDocs where counter = @mincounter

declare @folder int
set @folder = (select isnull(max(folder), 0) from #tempcount)

if (select sum(pagenumto) from tempcount where folder is null) > 7900
update tempcount set folder = @folder + 1 where folder is null

update RecordedDocs set process = 'x' where counter = @mincounter
end
 
try the following.
change the 5000 value to your desired value.

Assumption made that sdocumentno is never null - if it is then further coding will be required

Code:
declare @runningtotal int = 0
declare @runningtotal_ver int = 0
declare @groupno int = 1
declare @cutoff int = 5000

if object_id('tempdb..#RecordedDocsSum') is not null
    drop table #RecordedDocsSum;

select min(rd.counter) as counter
      ,rd.sdocumentno
      ,sum(rd.pagenumto) as pagenumto
      , cast(null as int) as runningtotal
      , cast(null as int) as groupno
    into #RecordedDocsSum
from RecordedDocs rd
group by rd.sdocumentno

create clustered index #RecordedDocsSum_ix1 on #RecordedDocsSum
(counter
, sdocumentno
)


update recs
    set @runningtotal = runningtotal = case
        when @runningtotal_ver + recs.pagenumto > @cutoff
            then recs.pagenumto
        else @runningtotal + recs.pagenumto
        end
       ,@groupno = groupno = case
        when @runningtotal_ver + recs.pagenumto > @cutoff
            then @groupno + 1
        else @groupno
        end
       ,@runningtotal_ver = @runningtotal
from #RecordedDocsSum recs
option (maxdop 1)

update recs
    set Folder = 'Folder' + convert(varchar(20), rds.groupno)
from RecordedDocs recs
inner join #RecordedDocsSum rds
on recs.sdocumentno = rds.sdocumentno

-- visually verify that the number of records is not higher than the max
select top 5 rd.Folder
      ,sum(rd.pagenumto) numpages
from RecordedDocs rd
group by rd.Folder
order by numpages desc


Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Wow that is fantastic! I can't believe how fast it ran. I was able to follow that logic nicely and really appreciate the hours you saved me.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top