
Delete problem.... help?!


DrD123

IS-IT--Management
Jan 23, 2003
111
US
Hi... (I'm mainly an Oracle guy) working w/DB2 7.2.9 32-bit w/FP11 on Solaris 8, on a v440 w/2 CPUs and a 3310 array. Using VxVM 4.0 to manage the striping (8k stripe). Using 6 spindles for that table.

We have a table w/~16 mil rows covering ~15 days of data, ~1 mil a day (and the table will be growing, maybe doubling, in the very near future).

Just FYI I did a DEL unload and it was 1.6GB.

This table needs to house only 15 days, and I need to delete the 15th day every day, as new data is constantly streaming in 24 hours a day.

The developers created a delete script (ksh) that selects and creates files in /tmp for hourly lists; they do a SED and grab line by line from the list and delete from the dB. This is heavy on the CPU and adds extra I/O on the /tmp FS. During this time (several hours at times) locks occur in the dB, and the delete script and the program streaming in data fight over the table.

I know there has to be a better way using the functions within DB2.

I've looked at trying to create a C/C++ program that could use a cursor, but have only seen one used for updates and inserts, not deletes. There is an index on the date the record comes in and an ID# of the "customer".
Each row is only ~100bytes.

My goals:
1) delete the 15th day every day in the shortest period of time.
2) get rid of the deadlocks, as they cause the program doing the 24-hr streaming to time out and fall behind in its work.
3) get this done ASAP.

Any help would be greatly appreciated.
Thanx
DrD123
 
DrD123,
How often do you wish to delete the data? By that I mean: do you wish to delete each row as it becomes greater than 15 days old, or check for > 15 days every hour, every 2 hours, or just once a day?

Also, is there any possibility of altering the structure of the table by partitioning it? Lastly, you mention a date field: is this a date, or is it a timestamp?

(sorry for all the questions)

Marc
 
Hi Marc,

Thanx for the swift reply... and there are never too many questions...

The delete would check only once a day (during a less busy hour) for >15 days from the current date, and delete just that day.
The index is on a column that's a date, not a timestamp.

The developers have chopped other tables up into "daily tables" that the same socket programs write to. I've created views on those tables for reporting purposes, which would be an easy and good option here.

But here's the kicker: I've been assigned to find an option w/o having to slice and dice this table. That's why I was thinking of declaring a cursor for the delete, but in v7.2.9 I've only seen examples of using one for updating and inserting data, not deleting. Or maybe someone knows of a better solution. It can be a compiled program or a stored procedure... whichever will stop the deadlocks/rollbacks and the several-hour deletes.

This table I have to work with takes the above-mentioned data and summarizes it based on certain criteria. So there might be 5 million rows of data coming in each day, but only 1 million might get summarized and need to go into this table.

So this table has 2 socket programs checking its indexes and inserting or updating, the delete hitting it once a day, and daily reports for the previous day's data. There can also be some customer interaction if they want to run ad hoc reports against any of the 15 days.

When no other program is hitting the table, the delete can finish in well under an hour (which is more than acceptable), but with the sockets, reports, etc., a delete can take ~7 hours at times to delete >1 million rows (and this number will grow, as I mentioned in the first post).

Just a bit more info... the logs are set for recovery and I don't have a problem w/them at all. They're maintained and rotated fine w/o log-full problems.

Thanx again,
DrD123
 
This is just an idea. Split the process into two parts.
Code:
SELECT unique_key FROM big_table WHERE create_timestamp < CURRENT TIMESTAMP - 15 DAYS WITH UR;
The dirty read won't lock the table at all, so your regular processing can carry on as normal. Also, crucially, it won't interfere with the second process. Feed the returned keys (maybe via a pipeline) into another process that deletes the rows one at a time using the unique key. After a given number of deletions (make it parameter-driven, so you can tune it), take a syncpoint (a COMMIT) to apply the changes. This will prevent the lock escalation you are getting at the moment.
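To make the two-part idea concrete, here is a minimal runnable sketch of the pattern. It uses Python with SQLite purely for illustration (SQLite has no WITH UR; in DB2 the reader query would carry WITH UR and the writer would COMMIT between batches), and the table/column names (`big_table`, `unique_key`, `create_ts`) and the cutoff date are made up for the example:

```python
# Sketch of the two-part delete: (1) read the keys of expired rows,
# (2) delete them by key, committing every BATCH_SIZE rows so locks
# are released regularly and escalation never kicks in.
# SQLite stands in for DB2 here; names and dates are hypothetical.
import sqlite3

BATCH_SIZE = 500  # parameter-driven commit interval, tune as needed

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table (unique_key INTEGER PRIMARY KEY, create_ts TEXT)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    [(i, "2003-01-01" if i < 1200 else "2003-01-20") for i in range(2000)],
)
conn.commit()

# Part 1: collect the keys of expired rows (in DB2: SELECT ... WITH UR,
# a dirty read that takes no locks, so regular processing carries on).
expired = [row[0] for row in conn.execute(
    "SELECT unique_key FROM big_table WHERE create_ts < '2003-01-16'")]

# Part 2: delete one row at a time by unique key, committing in batches.
deleted = 0
for key in expired:
    conn.execute("DELETE FROM big_table WHERE unique_key = ?", (key,))
    deleted += 1
    if deleted % BATCH_SIZE == 0:
        conn.commit()  # syncpoint: releases locks taken so far
conn.commit()
print(deleted)  # number of expired rows removed
```

The batch size is the tuning knob: smaller batches hold fewer locks at once but commit more often.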

Hope this helps.
 
Hi...
Still looking for something... what I've been assigned is explicitly to use a cursor. Partially, I think, to show that the developers didn't do what they could've, but that's beside the point.
The unique index is on a date field. So what I need is a ksh script that: declares a cursor WITH HOLD, opens it, fetches, commits frequently (variable-driven if possible), moves the cursor to the next row(s) or the end, catches errors, and closes and deallocates the cursor when done. The number of days to keep is X from the current date (I'd also like to make that variable-driven).
I can declare the cursor (SELECT UQ_column FROM huge_table WHERE UQ_column < CURRENT DATE - X DAYS), but I can't get the syntax for the delete and the commits. I see inserts and updates using cursors in all types of scripts, docs, and examples, but not deletes.
Any help would be great...
Thnx again
DrD123
 
Does this table have many relationships with other tables?
The database might be spending a lot of time checking RI. Did you analyze your query (EXPLAIN)?
You might need to add indexes as well.
 
Hi... thnx for the response.
No FKs to or from any other tables. This whole thing is the result of poor development. They didn't even look at the DB2 Application Development or Building Guide initially; otherwise they would've built something based on a cursor. What they have going on is a fat query that sends the results to a file in the /tmp directory; then they go back and use SED to feed each line of the file into a delete statement.
Now Development wants to go with something that will require recoding 6 other programs to accommodate splitting this table up into "individual tables for each day" and then just inserting an empty DEL.unl file, instead of creating what I'm trying to come up with.
This should be doable through compiled SQL without touching /tmp or any filesystem at all, whereas with their approach they'll still have to dynamically create an empty DEL.unl file on a filesystem.
I'm resorting to a C program, as I see FETCH cannot be used from an interpreted language or in ksh.
So where I'm stuck is after the FETCH: I want to commit every X rows, with X a host variable defined in the EXEC SQL BEGIN DECLARE SECTION. That way I can control the logs as needed.
Or can I only create a loop that fetches and deletes one row at a time?
Any example(s) would be great too...
Thanx again
DrD123
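For what it's worth, the fetch-then-delete-with-periodic-commit loop asked about above can be sketched as follows. This is illustrated in Python over SQLite so it is self-contained and runnable; it is not DB2 syntax. In DB2 embedded C the same shape would be EXEC SQL FETCH from a cursor declared WITH HOLD, a DELETE per fetched row, and a COMMIT every X rows with X held in a host variable. The names `huge_table`, `uq_column`, and the cutoff date are hypothetical:

```python
# Sketch of the commit-every-X delete loop: fetch the target rows via a
# cursor, delete one row per fetch, commit every COMMIT_EVERY deletions.
# SQLite stands in for DB2; names and dates are made up for the example.
import sqlite3

COMMIT_EVERY = 100      # stands in for the host variable X
CUTOFF = "2003-01-16"   # i.e. current date - X days, precomputed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE huge_table (uq_column TEXT)")
conn.executemany(
    "INSERT INTO huge_table VALUES (?)",
    [("2003-01-01",)] * 250 + [("2003-01-20",)] * 50,
)
conn.commit()

# "Open the cursor": fetch the row identifiers of expired rows up front,
# then walk them, deleting one row at a time.
rows = conn.execute(
    "SELECT rowid FROM huge_table WHERE uq_column < ?", (CUTOFF,)
).fetchall()

deleted = 0
for (rid,) in rows:
    conn.execute("DELETE FROM huge_table WHERE rowid = ?", (rid,))
    deleted += 1
    if deleted % COMMIT_EVERY == 0:
        conn.commit()  # releases locks and keeps log usage bounded
conn.commit()
print(deleted)
```

So yes, the shape is a loop that deletes one row per fetch, but the commit only happens every X rows, which is what keeps the logs and the lock list under control.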
 