Sunday, July 13, 2014

[MSSQL] Deleting Large Number of Records

By Lynn Pettis, 2011/02/18 (first published: 2009/09/15)

Introduction

One of the frequent questions asked on forums is how to delete a large number of records from a table in SQL Server. The question can come from several directions. Sometimes it is the result of excess growth of the database's transaction log, or that the table must remain available to users during the delete process, or simply that the user is looking for an efficient means to delete records on a regular basis.
With this, I'd like to present one method that can be used to delete records from a table that will allow the table to remain available to users and will help keep the transaction log from growing excessively. I will show how this can be done in SQL Server 2000 and SQL Server 2005/2008.

The Setup

The first thing we need to do, however, is set up the table from which we will be deleting a large number of records. Most requests for help in this area come from individuals who are asking how to delete several million rows of data from a multi-million row table. For this code we will create a table with 20 million rows of data, and we will delete approximately 10%, or around 2 million, of those rows. The test data generated below will have transaction dates (TDate) from January 1, 2000 through December 31, 2009. We will purge all data from the year 2000.
USE [SandBox] -- This is my test bed database
GO
-- Create the test table
create table dbo.LAPTest (
     AccountID int,
     Amount money,
     TDate datetime
)
create clustered index IX_LAPTest_TDate on dbo.LAPTest (TDate asc)
GO
--===== Build the table 100000 rows at a time to "mix things up"
DECLARE @Counter INT
SET @Counter = 0
WHILE @Counter < 20000000
BEGIN
     --===== Add 100000 rows to the test table
     INSERT INTO dbo.LAPTest(
         AccountID,
         Amount,
         TDate)
     SELECT TOP 100000
         AccountID = ABS(CHECKSUM(NEWID()))%50000+1,
         Amount    = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),
         TDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)
     FROM
         master.dbo.syscolumns t1
         CROSS JOIN master.dbo.syscolumns t2
     --===== Increment the counter
     SET @Counter = @Counter + 100000
END
GO
Now that we have our test table and data, we are ready to start work on the code that will delete all records for the Year 2000.

SQL Server 2000

The first code snippet I will write will be for deleting the records from a table in SQL Server 2000. Although this code will also work in SQL Server 2005, the SET ROWCOUNT technique it relies on is deprecated for data modification statements and will not work in future versions of SQL Server. For that reason, I will later rewrite the code I develop for SQL Server 2000 to use the new capabilities available in SQL Server 2005.
I have made every effort to make this code generic, and it should work with minimal changes. Because it may be used in databases set to the SIMPLE, BULK LOGGED, or FULL recovery model, I have included conditional logic to perform a transaction log backup if the database is not using the SIMPLE recovery model.
36
37
declare @recoverymodel varchar(12),
        @dbname sysname,
        @backupdevice sysname,
        @batchsize int,
        @loopcnt int,
        @year smallint,
        @backupfile nvarchar(512)
select
    @recoverymodel = cast(DATABASEPROPERTYEX ( db_name() , 'Recovery') as varchar(12)),
    @dbname = db_name(),
    @backupdevice = 'D:\Backups\',
    @batchsize = 10000,
    @loopcnt = 0,
    @year = 2000
   
set rowcount @batchsize -- used for SQL Server 2000 to limit rows affected
   
while @batchsize <> 0
begin
    delete from
        dbo.LAPTest
    where
        TDate >= dateadd(yyyy, (@year - 1900), 0) and
        TDate <  dateadd(yyyy, (@year - 1900) + 1, 0)
    set @batchsize = @@rowcount
    if @batchsize > 0
       and @recoverymodel <> 'SIMPLE'
        begin
            set @loopcnt = @loopcnt + 1
            set @backupfile = @backupdevice + @dbname + N'_' + cast(@loopcnt as nvarchar(8)) + N'.trn'
            backup log @dbname
            to disk = @backupfile
            with description = N'T-Log Backup during purge'
        end
end
   
set rowcount 0 -- reset rows affected after processing, SQL Server 2000

SQL Server 2005

Now we will take the code above and modify it to use the new features introduced in SQL Server 2005. This also allows the code to be used in SQL Server 2008 and future versions. As you look at the code you will see very little difference between the two versions: the set rowcount statements are gone, and a TOP clause has been added to the delete statement.
declare @recoverymodel varchar(12),
        @dbname sysname,
        @backupdevice sysname,
        @batchsize int,
        @loopcnt int,
        @year smallint,
        @backupfile nvarchar(512)
select
    @recoverymodel = cast(DATABASEPROPERTYEX ( db_name() , 'Recovery') as varchar(12)),
    @dbname = db_name(),
    @backupdevice = 'D:\Backups\',
    @batchsize = 10000,
    @loopcnt = 0,
    @year = 2000
   
while @batchsize <> 0
begin
    delete top (@batchsize) from
        dbo.LAPTest
    where
        TDate >= dateadd(yyyy, (@year - 1900), 0) and
        TDate <  dateadd(yyyy, (@year - 1900) + 1, 0)
    set @batchsize = @@rowcount
    if @batchsize > 0
       and @recoverymodel <> 'SIMPLE'
        begin
            set @loopcnt = @loopcnt + 1
            set @backupfile = @backupdevice + @dbname + N'_' + cast(@loopcnt as nvarchar(8)) + N'.trn'
            backup log @dbname
            to disk = @backupfile
            with description = N'T-Log Backup during purge'
        end
end

Variations

Now that you have the basic generic code to delete large volumes of data from a table, you may decide that you don't want 200+ transaction log backup files to restore through should you need a point-in-time restore. If your database is not using the SIMPLE recovery model, you could modify the conditional in the IF statement that decides whether the transaction log backup should be run. One change you could make is to only run the transaction log backup if (@loopcnt modulo (some value) = 0 and @batchsize > 0) OR (@batchsize = 0). In addition to this logic, you could also add a differential backup after the purge is run.
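As a sketch of that variation (using the same variables as the scripts above; the interval of 20 batches is an arbitrary example, not a recommendation), the body of the loop could count every batch and back up the log only periodically, plus once more after the final empty batch:

```sql
    -- Count every batch, not just the ones we back up after
    set @loopcnt = @loopcnt + 1
    -- Back up the log every 20th batch (example interval), and once more
    -- when the delete returns zero rows, to capture the tail of the purge
    if @recoverymodel <> 'SIMPLE'
       and ((@batchsize > 0 and @loopcnt % 20 = 0) or @batchsize = 0)
        begin
            set @backupfile = @backupdevice + @dbname + N'_'
                            + cast(@loopcnt as nvarchar(8)) + N'.trn'
            backup log @dbname
            to disk = @backupfile
            with description = N'T-Log Backup during purge'
        end
```

Because the while loop exits once @batchsize reaches 0, the @batchsize = 0 branch fires exactly once, giving you a final log backup at the end of the purge.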

Conclusion

With these sample code snippets, I hope you can see how easily you can batch your deletes, allowing the table to remain relatively open for continued use while keeping your transaction log files manageable when your database is using the BULK LOGGED or FULL recovery model.
Source: http://www.sqlservercentral.com/articles/T-SQL/67898/
