Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Move data from one table to another

Status
Not open for further replies.

Geates

Programmer
Aug 25, 2009
1,566
US
I have a database that has been collecting data for several years. Only the past 90 days is used daily, the rest can be archived. I would like to do so by moving it into another table. I have written a query that returns rows older than 90 days:

Code:
SELECT table.A, table.B, table.C, table.D
FROM AllData AS table
WHERE dDATEADD(day, -90, table.D > '2013-09-11')

How can I take these results and insert them into another table

Code:
INSERT INTO ArchiveData (A, B, C, D)
VALUES (???)

-Geates

 
sure
Just
SQL:
INSERT INTO ArchiveData (A, B, C, D)
SELECT table.A, table.B, table.C, table.D
FROM AllData AS table
WHERE dDATEADD(day, -90, table.D > '2013-09-11')
 
Out of curiosity, why do you want to archive the data to another table in the same database? Doing so will not reduce the size of the database (it may actually increase the size).

I suspect the answer is performance, but would like confirmation on that. If this is the case, can you tell us how many rows are in the table? I ask because it may make more sense to improve the performance of your existing queries instead of separating the data in multiple tables.

Alternatively, if you truly want to separate the data, you may want to consider adding an insert trigger on the table to copies the data to the Archive table. Then you could write a sql job that runs daily to delete the old data from the original table.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
@gmmastros. My goal isn't to reduce the size of the DB, it is to reduce the size of the main table. I wrote an asset and logon auditing application that has been recording elements of each logon on a computer since 2/2009. We experience about 500 logons audits a day from 300+ assets. There are about 500,000 rows. I built the db to (what I thought) maximizes performance (initial entries are hefty but only the data that has changed is updated in the DB on subsequent entries)

@gk53. Thanks! that was easy.

-Geates

 
Why do you want to reduce the size of the main table? I ask because 500,000 rows in a table should not pose any problems for SQL server under normal circumstances. People don't normally change their database without having a good reason, so I am wondering what your reason is. I suspect the application is becoming sluggish or even downright slow.

If this is the case, you may want to take a closer look at your indexes and your code rather then trying to split the data in to multiple tables. I say this because you likely have reports with your app. Now that you have another table with archived data, you will have a more difficult time actually using that data. If the data was in the same table (not split out), but you optimized your indexes and queries, then you won't need to change any of your front end code.

In my opinion, this would be a better approach than separating the data.

If you are having performance problems, then post the code here and I (among others) will help you to improve the performance of the queries.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
to delete the rows...

Code:
DELETE
FROM AllData AS table
WHERE dDATEADD(day, -90, table.D > '2013-09-11')

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
something like that


SQL:
delete from d
from AllData d
join ArchiveData a
	on d.D = a.D
 
I don't know the correct terminology but I'll try to explain it as best I can.

The DB consists of 2 tables. The "Values" table has only 2 columns; pk and value (varchar(50). The "Assets" table has 6 columns containing a fk to the Values table. One row in the Assets describes one element in the audit (in my case, it's the serial number, as it is unique [so far :) ]). The most basic of my queries INNER JOINs the Values table for each column. Furthermore, this Application in an HTA coded with an amalgam of VBS, JS, HTML, and CSS. It might be easier to re-write the HTA (which I plan to do) with the optimizations - the methods used in web development has change a lot has since 2009.

Example.
Code:
Values:
pk    value
1     red
2     blue
3     Cindy
4     Max
5     3-speed
6     18-speed
7     Color
8     Speed

Bikes:
pk    Owner    Attribute   Values
1     3        7           2
2     3        8           6
3     4        7           1        
4     4        8           5

One row describes one element of a bike.

Code:
SELECT tblOwners.value, tblAttributes.value, tblValues.value
FROM bikes AS tblBikes INNER JOIN
     values AS tblOwners ON tblOwners.pk = tblBikes.owner INNER JOIN
     values AS tblAttributes ON tblAttributes.pk = tblBikes.attribute INNER JOIN
     values AS tblValues

Max has a red 3-speed. Cindy has a blue 18-speed.

You know, I just realized that I added and "Active" column that is currently unused. Couldn't I assign Active = 1 for the valid rows and Active = 0' to the other rows and them re-index?

-Geates

 
You are describing an Entity-Attribute-Value model.


Generally speaking, EAV's have relatively bad performance because you end up writing more code.

I assume the bikes table is the one with approximately 500,000 rows. I'm guessing the Values table has just a couple thousand rows. Is this correct?

In your original post, you were treating a column as a DateTime value. Is this actually a DateTime value in the database table (bikes) or is this part of the values table?

Using the Active column may help, but I wouldn't bet on it. Unless I am mistaken, almost all of the data would be considered inactive (active = 0). For SQL Server to use a column in an index, it's better to have high selectivity. Think of it this way... a phone book is sorted by last name. Some last names will only appear once which is very selective. Other names may appear many times (think aunts, uncles and cousins living in the same town). This is still very selective because that name (as a percentage of the total) appears only a few times. Now consider a phone book that is first sorted by gender. 1/2 of the book would be women and the other 1/2 would be men. This wouldn't be too helpful when trying to find someone in the phone book even if you know the gender.

Can you run the following queries and post the results here:

Code:
Select table_name, 
       column_name, 
       data_type, 
       character_maximum_length 
From   information_schema.columns 
order by table_name, ordinal_position

Code:
sp_helpindex 'bikes'

Code:
sp_helpindex 'values'

Once I have a better idea regarding your table structures, I should be able to give better advice.



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
This script produce result you are asking
SQL:
declare @Values as table
(
pk int,
value varchar(20)
)
insert into @Values
select 1,     'red'
union
select 2,     'blue'
union
select 3,     'Cindy'
union
select 4,     'Max'
union
select 5,     '3-speed'
union
select 6,     '18-speed'
union
select 7,     'Color'
union
select 8,     'Speed'

declare @Bikes as table
(
pk int,
Owner int,
Attribute int,
[Values] int
)

insert into @Bikes
select 1,     3,        7,           2
union
select 2,     3,        8,           6
union
select 3,     4,        7,           1        
union
select 4,     4,        8,           5

;with cte as
(		
select tblOwners.value as ownerName,
		tblAttributes.value as Attribute,
		tblValues.value as Value
from @Bikes tblBikes
	join @Values tblOwners
		on tblBikes.Owner = tblOwners.pk
	join @Values tblAttributes
		on tblBikes.Attribute =  tblAttributes.pk
	join @Values tblValues
		on tblBikes.[Values] = tblValues.pk	
)
SELECT ownerName, 
       STUFF((  SELECT ' ' + Value
          FROM cte T2
          WHERE T1.ownerName = T2.ownerName
          FOR XML PATH('')), 1, 1, '') AllNames
FROM cte T1
GROUP BY ownerName
:)
 
@gk53
Thanks, I'll review the code.

@gmmastros
"I assume the bikes table is the one with approximately 500,000 rows. I'm guessing the Values table has just a couple thousand rows. Is this correct?"
Sort of. The 'Assets' table (bikes) has about 500,000 rows and the 'Values' table has nearly 3 millions rows.

"In your original post, you were treating a column as a DateTime value. Is this actually a DateTime value in the database table (bikes) or is this part of the values table?"
The column I was treating as a DateTime value is actually a DateTime value from a 'TimeStamp' table with 2 columns - a PK and a DateTime value. These timestamps reference logon time, boot time, and net time. And "stringified" versions are stored on the 'Values' table for searching purposes (hence the 3 million rows). It was a quick solution to allow to a user request. However, I'd like to not include them on the 'Values' table (I would assume removing the need to add them to the 'Values' table will greatly improve performance).

"Unless I am mistaken, almost all of the data would be considered inactive (active = 0)."
About 90%

"Can you run the following queries and post the results here:"
Code:
select table_name, column_name, data_type, character_maximum_length 
from   information_schema.columns 
order by table_name, ordinal_position
Code:
Audits_Assets		pk		int		NULL
Audits_Assets		Host		int		NULL
Audits_Assets		Component	int		NULL
Audits_Assets		Attribute	int		NULL
Audits_Assets		Value		int		NULL
Audits_Assets		TimeStamp	int		NULL
Audits_Assets		Active		tinyint		NULL
Audits_TimeStamps	pk		int		NULL
Audits_TimeStamps	Value		datetime	NULL
Audits_Values		pk		int		NULL
Audits_Values		Value		varchar		75

I like this style of database because adding new elements to the audit is a breeze. But, as you pointed out, this method has a lot of cogs with a lot of teeth.

-Geates

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top