Tuesday, April 7, 2009

[t-sql] What is the difference between SET and SELECT when assigning values to variables?

Traditionally, SQL Server database developers are accustomed to using SELECT for assigning values to variables. This was fine and a perfectly valid practice right until SQL Server 6.5. Microsoft released SQL Server 7.0 in 1999. SQL Server 7.0 introduced the new SET statement for initializing and assigning values to variables. SQL Server 7.0 Books Online also stated: "It is recommended that SET @local_variable be used for variable assignment rather than SELECT @local_variable."

This caused some confusion in the database developer community, as Microsoft never mentioned, why SET is recommended over SELECT for assigning values to variables. In this article, I will highlight all the major differences between SET and SELECT, and things you should be aware of, when using either SET or SELECT.

If you are completely new to T-SQL, then the following examples give you an idea of what I am talking about:
    /* Declaring variables */
DECLARE @Variable1 AS int, @Variable2 AS int

/* Setting @Variable1 to a value of 1 using SELECT */
SELECT @Variable1 = 1

/* Setting @Variable2 to a value of 2 using SET */
SET @Variable2 = 2

Now coming to the differences between SET and SELECT! Are standards important to you? If your answer is 'yes', then you should be using SET. This is because, SET is the ANSI standard way of assigning values to variables, and SELECT is not.

Another fundamental difference between SET and SELECT is that, you can use SELECT to assign values to more than one variable at a time. SET allows you to assign data to only one variable at a time. Here's how:
    /* Declaring variables */
DECLARE @Variable1 AS int, @Variable2 AS int

/* Initializing two variables at once */
SELECT @Variable1 = 1, @Variable2 = 2

/* The same can be done using SET, but two SET statements are needed */
SET @Variable1 = 1
SET @Variable2 = 2

So far so good. But if you ever wrote error handling code in T-SQL, you most probably are aware that, the system variables @@ERROR and @@ROWCOUNT must be captured in one statement, immediately after a data manipulation (DML) statement like INSERT, UPDATE, DELETE, or else, these system variables get reset to 0. So, if you want to stick to the standards and use SET in this scenario, you are out of luck. The following example demonstrates the problem:
    DECLARE @Error int, @RowCount int
SELECT price/0 FROM dbo.titles
SET @RowCount = @@ROWCOUNT
SET @Error = @@ERROR
SELECT @Error AS Error
GO

If you run the above piece of code in pubs database, the value of @@ERROR system variable will be displayed as 0, even though the 'division by zero' resulted in error 8134. So, in this particular scenario, forget about standards and use SELECT, as shown below:
    DECLARE @Error int, @RowCount int
SELECT price/0 FROM dbo.titles
SELECT @RowCount = @@ROWCOUNT, @Error = @@ERROR
SELECT @Error AS Error

But if you insist on using SET even in this scenario, there's always a way out. Here's one example, though not readable and recommended:
    DECLARE @ErrorAndRowcount AS varchar(25), @Error int, @RowCount int
SELECT price/0 FROM dbo.titles

/* Capturing @@ERROR and @@ROWCOUNT into a dot separated string */
SET @ErrorAndRowcount = CAST(@@ERROR AS varchar(12)) + '.' + CAST(@@ROWCOUNT AS varchar(12))

/* One way to separate the string into error and rowcount variables */
SET @Error = CAST(PARSENAME(@ErrorAndRowcount, 2) AS int)
SET @RowCount = CAST(PARSENAME(@ErrorAndRowcount, 1) AS int)
SELECT @Error AS Error, @RowCount AS Row_Count

/* Another way of splitting the string into error and rowcount variables */
SET @Error = CAST(LEFT(@ErrorAndRowcount, CHARINDEX('.', @ErrorAndRowcount)-1) AS int)
SET @RowCount = CAST(RIGHT(@ErrorAndRowcount, CHARINDEX('.', REVERSE(@ErrorAndRowcount))-1) AS int)
SELECT @Error AS Error, @RowCount AS Row_Count
GO

Moving on to other differences between SET and SELECT: When using a query to populate a variable, SET will fail with an error, if the query returns more than one value. But SELECT will assign one of the returned rows and mask the fact that the query returned more than one row. As a result, bugs in your code could go unnoticed with SELECT, and this type of bugs are hard to track down too. Here is an example:
    /* Consider the following table with two rows */
SET NOCOUNT ON
CREATE TABLE #Test (i int, j varchar(10))
INSERT INTO #Test (i, j) VALUES (1, 'First Row')
INSERT INTO #Test (i, j) VALUES (1, 'Second Row')
GO

/* Following SELECT will return two rows, but the variable
gets its value from one of those rows, without an error.
This may not be what you were expecting. Since no error is returned,
you will never know that two rows existed for the condition, WHERE i = 1 */
DECLARE @j varchar(10)
SELECT @j = j FROM #Test WHERE i = 1
SELECT @j
GO

/* If you rewrite the same query, but use SET instead,
for variable initialization, you will see the following error */
DECLARE @j varchar(10)
SET @j = (SELECT j FROM #Test WHERE i = 1)
SELECT @j

Server: Msg 512, Level 16, State 1, Line -1074284106
Subquery returned more than 1 value.
This is not permitted when the subquery follows =, !=, <, <= , >, >= or
when the subquery is used as an expression.

Based on the above results, when using a query to populate variables, I suggest you always use SET, if you want to be sure that only one row is returned. If you hate SET for some reason, you could get the same behavior of SET, using SELECT, as shown below:
    DECLARE @j varchar(10)
SELECT @j = (SELECT j FROM #Test WHERE i = 1)
SELECT @j

Here is another difference with respect to assigning values based on a query, especially when the query doesn't return any rows. Run the following example in the pubs database, and you will see what I mean:
    /* Returns NULL */
DECLARE @Title varchar(80)
SET @Title = 'Not Found'

SET @Title = (SELECT title
FROM dbo.titles
WHERE title_id = 'InvalitTitleID')

SELECT @Title
GO

/* Returns the string literal 'Not Found' */
DECLARE @Title varchar(80)
SET @Title = 'Not Found'

SELECT @Title = title
FROM dbo.titles
WHERE title_id = 'InvalitTitleID'

SELECT @Title
GO

Last, but not the least! Is there any performance difference between SET and SELECT? Is one faster or slower than the other? This is one question most database developers and DBAs are not so sure about. So I decided to conduct a test and come up with some conclusive results. I picked a development SQL Server for this test. Closed all applications, and stopped all unnecessary services running on that machine. Stopped SQL Server agent service, to make sure, no jobs kick in during the performance test. Also, unplugged the machine from the network. So, this is one isolated SQL Server box, with nothing but just SQL Server service running on it. Then I created a test script, that continuously assigns values to variables inside a loop (of configurable iterations) using SET, SELECT and measures the time taken to complete each loop.

Here are the results:

There is hardly any performance difference between SET and SELECT, when initializing/assigning values to variables. BUT, I made one startling discovery. As you all know, one single SELECT statement can be used to assign values to multiple variables. This very feature of SELECT makes it a winner over SET, when assigning values to multiple variables. A single SELECT statement assigning values to 3 different variables, is much faster than 3 different SET statements assigning values to 3 different variables. In this scenario, using a SELECT is at least twice as fast, compared to SET. So, the conclusion is, if you have a loop in your stored procedure that manipulates the values of several variables, and if you want to squeeze as much performance as possible out of this loop, then do all variable manipulations in one single SELECT statement (or group the related variables into few SELECT statements) as show below:
    SELECT @TestVar1 = @TestVar1 + 1, @TestVar2 = @TestVar2 - 1, @CTR = @CTR + 1

I ran this test on SQL Server versions 7.0, 2000 and SQL Server 2005 (Yukon), and the results were consistent. I even tested this on single and multi-processor boxes, and the results were the same. If you want to test this yourself, feel free to use the following test script. A word of caution though, do not run this script on a production SQL Server, as it could lead to 100% CPU utilization for the duration of the test. Also, if you think the test is taking too long, reduce the value of the variable @TimesToLoop2, to reduce the number of iterations. At the end of the test, the script displays how much time (in Seconds) it took to assign values to variables using SET, SELECT and SELECT with multiple assignments. Here's the script:
    DECLARE @Test1 int,  @Test2 int, @Test3 int, @TestVar1 int, @TestVar2 int
DECLARE @Loop int, @Start datetime, @CTR int, @TimesToLoop1 int, @TimesToLoop2 int

SET @Test1 = 0
SET @Test2 = 0
SET @Test3 = 0
SET @Loop = 0
SET @TestVar2 = 0
SET @TimesToLoop1 = 10
SET @TimesToLoop2 = 50000
WHILE @Loop < @TimesToLoop1
BEGIN
SET @Start = CURRENT_TIMESTAMP
SET @CTR = 0

/* Testing the performance of SET */
WHILE @CTR < @TimesToLoop2
BEGIN
SET @TestVar1 = 1
SET @TestVar2 = @TestVar2 - @TestVar1
SET @CTR = @CTR + 1
END

SET @Loop = @Loop + 1
SET @Test1 = @Test1 + DATEDIFF(ms, @Start, CURRENT_TIMESTAMP)
END

SET @Loop = 0
SET @TestVar2 = 0
WHILE @Loop < @TimesToLoop1
BEGIN
SELECT @Start = CURRENT_TIMESTAMP
SELECT @CTR = 0

/* Testing the performance of SELECT */
WHILE @CTR < @TimesToLoop2
BEGIN
SELECT @TestVar1 = 1
SELECT @TestVar2 = @TestVar2 - @TestVar1
SELECT @CTR = @CTR + 1
END

SELECT @Loop = @Loop + 1
SELECT @Test2 = @Test2 + DATEDIFF(ms, @Start, CURRENT_TIMESTAMP)
END

SET @Loop = 0
SET @TestVar2 = 0
WHILE @Loop < @TimesToLoop1
BEGIN
SELECT @Start = CURRENT_TIMESTAMP, @CTR = 0

/* Testing the performance of SELECT with multiple variable assignments */
WHILE @CTR < @TimesToLoop2
BEGIN
SELECT @TestVar1 = 1, @TestVar2 = @TestVar2 - @TestVar1, @CTR = @CTR + 1
END

SELECT @Loop = @Loop + 1, @Test3 = @Test3 + DATEDIFF(ms, @Start, CURRENT_TIMESTAMP)
END

SELECT
(@Test1/CAST(@TimesToLoop1 AS decimal(7,2)))/1000.00 AS [SET],
(@Test2/CAST(@TimesToLoop1 AS decimal(7,2)))/1000.00 AS [SELECT],
(@Test3/CAST(@TimesToLoop1 AS decimal(7,2)))/1000.00 AS [SELECT with Multiple Assignments]

No comments:

Post a Comment