SQLSENSE: SQL Server Practical Approaches

Thursday, September 4, 2008

A Tricky Scenario with T-SQL

The good part of T-SQL is that unlike most other areas of databases in general, one needs to think in order to write a good piece of valid code. Thinking not just to establish functionality but also to ensure performance. Now there are hundreds, even thousands of scenarios where the code can be better than what it has been already written as just to establish functionality. Let us take up one such scenario.

There is a supplier of Dell, HP, and IBM servers. Servers are shipped from the warehouse to three locations. Each location is represented by a location code. Each machine to be shipped to the warehouse has a ten digit Machine_X_Code that has leading zeroes for data that has less than ten digits in it. All this is represented roughly by the Main_Table below:

create table Main_Table
(Location_Code smallint not null
,Machine_Brand varchar(25) not null
,Machine_X_Code varchar(10) not null
,Number_Of_CPUs tinyint not null
,Procurement_Date Datetime not null default GETDATE()
constraint PK_LC_MC Primary Key Clustered (Location_Code, Machine_X_Code, Procurement_Date) with fillfactor = 70)
go

There is another table called Code_Generation_Table. This table determines what the Machine_X_Code for the next procured machine would be. For a given combination of store location code and a machine brand in this Code_Generation_Table, the Machine_X_Code for the Main_Table needs to be generated. The new machine_X_Code generated should be within the Start_code and End_Code range for that Location_Code and Machine_Brand. The table is as follows:

create table Code_Generation_Table
(Location_Code smallint not null
,Machine_Brand varchar(25) not null
,Start_Code int not null
,Current_Code int not null
,End_Code int not null
,Increment_Value tinyint not null
constraint PK_LC_MB Primary Key clustered (Location_Code, Machine_Brand) with fillfactor = 90)
go

Lets populate these two tables now with relevant data:

insert into Code_Generation_Table
select 1, 'DELL', 1001, 1001, 1005, 1
union all
select 2, 'DELL', 4001, 4001, 4005, 1
union all
select 3, 'DELL', 7001, 7001, 7005, 1
union all
select 1, 'HP', 2001, 2003, 2005, 1
union all
select 2, 'HP', 5001, 5003, 5005, 1
union all
select 1, 'IBM', 3001, 3001, 3005, 1
union all
select 2, 'IBM', 8001, 8003, 8005, 1
union all
select 3, 'IBM', 6001, 6004, 6005, 1
union all
select 3, 'SUN', 6011, 6011, 6015, 1

go

insert into Main_Table (Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs)
select 1, 'DELL', '0000001001', 2
union all
select 1, 'DELL', '0000001002', 4
union all
select 1, 'DELL', '0000001003', 8
union all
select 1, 'DELL', '0000001004', 16
union all
select 1, 'HP', '0000002002', 2
union all
select 1, 'HP', '0000002003', 4
union all
select 1, 'HP', '0000002004', 8
union all
select 1, 'HP', '0000002005', 32
union all
select 1, 'IBM', '0000003003', 2
union all
select 1, 'IBM', '0000003004', 8
union all
select 1, 'IBM', '0000003005', 32
union all
select 2, 'DELL', '0000004001', 2
union all
select 2, 'DELL', '0000004002', 4
union all
select 2, 'DELL', '0000004003', 8
union all
select 3, 'DELL', '0000007004', 16
union all
select 2, 'HP', '0000005002', 2
union all
select 2, 'HP', '0000005003', 4
union all
select 2, 'HP', '0000005004', 8
union all
select 2, 'HP', '0000005005', 32
union all
select 3, 'IBM', '0000006001', 2
union all
select 3, 'IBM', '0000006003', 2
union all
select 3, 'IBM', '0000006004', 8
union all
select 3, 'IBM', '0000006005', 32
Go

Here are the conditions that must be met for generating the Machine_X_Code:
1. Start from Current_Code value in the table, Code_Generation_Table.
2. Look for any values between Current_Code and End_Code that don't already exist in
the Machine_X_Code column of the Main_Table for the given Location_Code and
Machine_Brand. That becomes the new Machine_X_Code.
3. Given a scenario like the one below in the table Code_Generation_Table, the following conditions must also be met:
Location_Code, Machine_Brand, Start_Code, Current_Code, End_Code, Increment_Value
1 DELL 1001 1003 1005 1

Scenario 1:

If there is no record in Main_Table for that Machine Brand, then new Machine_X_Code needs to be 0000001003.

Scenario 2:

If Main_Table has records with Machine_X_Codes 0000001003, 0000001004, 0000001005 then the new Machine_X_Code will be 0000001001.

Scenario 3:

If Main_Table has all records with Machine_X_Code between and including 0000001001 to 0000001005 except 0000001002, then the next newly generated Machine_X_Code should be 0000001002.

The requirements having been made clear, let us see how a developer approached this problem. He chose to create a stored procedure (SP), usp_Generate_Machine_X_Code and a user-defined function (UDF), F_Validate_Machine_Code whihc is invoked from within the SP. The SP gives out the new Machine_X_Code. The unnctionality of the code below is explained in more granular detail just before the object definitions below. The code for both is given below.

Code for the user-defined function (UDF), F_Validate_Machine_Code:

--UDF
--This function checks that thet newly generated Machine_X_Code lies between
--the Start_Code and the End_Code. If it does, then the newly generated Machine_X_Code
--represented by the variable @l_Temp_Machine_X_Value is valid. If by chance, the newly
--generated Machine_X_Code lies outside the range of Start_Code and End_Code, then
--the new machine_x_code is initialized to the Start_Code generated Machine_X_Code
--reprsented by the variable, @l_Start_Machine_X_Value. The newely generated
--machine_X_code, the Start_Code and the ENd_Code are inputs to this function.
--A supposedly valid Machine_X_Code is returned by this function. The output however
--is not guarenteed to be valid unless it is re-checked for validity in the SP,
--usp_Generate_Machine_X_Code. If it is found to be invalid, then this
--user-defined function is invoked again and again as need be.

CREATE Function DBO.F_Validate_Machine_Code
(@l_Temp_Machine_X_Value Int,
@l_Start_Machine_X_Value int,
@l_End_Machine_X_Value int
) Returns Integer
AS
Begin
-- This function validates the whether the newly generated Machine_X_Code lies within the Range of Start_Code and End_Code.
If (@l_Temp_Machine_X_Value > @l_Start_Machine_X_Value)
or (@l_Temp_Machine_X_Value < @l_End_Machine_X_Value)
Set @l_Temp_Machine_X_Value = @l_Temp_Machine_X_Value

If (@l_Temp_Machine_X_Value < @l_Start_Machine_X_Value)
or (@l_Temp_Machine_X_Value > @l_End_Machine_X_Value)
Set @l_Temp_Machine_X_Value = @l_Start_Machine_X_Value

Return @l_Temp_Machine_X_Value
End
GO

Code for the stored procedure (SP), usp_Generate_Machine_X_Code:

--This SP's purpose is to generate a new Machine_X_Code. It generates a new
--Machine_X_Code from the Current_Code column of the table, Code_Generation_Table
--while checking that the newly generated Machine_X_Code does not already exist
--in the Machine_X_Code column of the table Main_Table. In addition it call the
--user-defined function F_Validate_Machine_Code to check that a newly generated
--Machine_X_Code lies between Start_Code and End_Code in the table, Code_Generation_Table.
--If the newly generated machin_x_code falls outside that range, then the machine_x_code
--is reset to Start_Code. In this whole process it keeps updating the Current_Code column
--of the table, Code_Generation_Table with the newly generated Machine_X_Code before its
--validation through the function, F_Validate_Machine_Code. It also updates the the
--Current_Code column of the table, Code_Generation_Table with the next possible
--Machine_X_Code value. Looping occurs till a valid Machine_X_Code is found. So,
--effectively there would be as many iterations of the loop as the number of attempts to
--find one good Machine_X_Number.

Create Procedure usp_Generate_Machine_X_Code
@i_Location_Code Smallint,
@i_Machine_Brand VarChar(10),
@o_Machine_X_Code Integer OutPut,
@o_Return_Value Smallint OutPut
As
Begin
Set nocount on
Set @o_Return_Value = 0

Declare @l_Increment_Value Smallint,
@l_Temp_Machine_X_Value Integer,
@l_Next_Machine_X_Value Integer,
@l_Start_Machine_X_Value Int,
@l_End_Machine_X_Value Integer,
@l_First_Machine_X_Value_Taken Integer,
@l_Affected_Rows Smallint,
@l_Execution_String VarChar(2000)

Select @l_Temp_Machine_X_Value = Current_Code
,@l_Increment_Value = Increment_Value
,@l_Start_Machine_X_Value = Start_Code
,@l_End_Machine_X_Value = End_Code
From DBO.Code_Generation_Table With (NoLock)
Where Location_Code = @i_Location_Code
And Machine_Brand = @i_Machine_Brand

Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)

Set @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value

Set @l_Affected_Rows = 0

While @l_Affected_Rows = 0
Begin

Set @l_Next_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value

Set @l_Next_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Next_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)

Set @l_Execution_String = 'Update A Set A.Current_Code = ' + Cast(@l_Next_Machine_X_Value as VarChar) +
' From Dbo.Code_Generation_Table A '+
+ ' Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) + ' And Machine_Brand = ' + ''''+ @i_Machine_Brand + '''' +
' And Not Exists (Select ' + ''''
+ 'x' + '''' + ' From Main_Table ' +
'Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) +
' AND Machine_Brand = ' + '''' + @i_Machine_Brand + '''' +
' And Machine_X_Code = ' + '''' + Replicate('0',10 - Len(@l_Temp_Machine_X_Value)) + Cast(@l_Temp_Machine_X_Value as Varchar(10)) + '''' + ')'

Exec (@l_Execution_String)

Set @l_Affected_Rows = @@Rowcount

If @l_Affected_Rows = 1
Begin
Select @o_Machine_X_Code = @l_Temp_Machine_X_Value, @o_Return_Value = 1
Break
End

Set @l_Temp_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value

Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)

If @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value
Begin
-- For the current day all the Machine X Codes have been used up.
Select @l_Affected_Rows = 2, @o_Return_Value = -1
Break
End

End
Set nocount off
End
GO

Before we proceed further, please familiarize yourself with the logic of the SP and UDF, and the data in the two tables so that we can try to execute his logic. Let us try to generate a new Machine_X_Code for the Location_Code = 1 and Brand_Name = 'Dell'.

TEST 1:
Execute the following:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 1
and Machine_Brand = 'Dell'

The result set is as follows:

Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
1 DELL 0000001001 2
1 DELL 0000001002 4
1 DELL 0000001003 8
1 DELL 0000001004 16

So the next generated machine_x_code to has to be 1005. Remember zeroes are suffixed later and that is not so important right now. What is important is generating the new Machine_X_Code that meets all the requirements above.

Now run the following code:

select * from Code_Generation_Table

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1
1 HP 2001 2003 2005 1
1 IBM 3001 3001 3005 1
2 DELL 4001 4001 4005 1
2 HP 5001 5003 5005 1
2 IBM 8001 8003 8005 1
3 DELL 7001 7001 7005 1
3 IBM 6001 6004 6005 1
3 SUN 6011 6011 6015 1

Since the codes (1001, 1002, 1003 and 1004) already exists in the Main_Table for Location_Code =1 and Machine_Brand = 'Dell', the next one has to be 1005 as it exists in the Code_Generation_Table between Start_Code and End_Code. Let's see if the SP generates this machine code. Execute the following code to execute the SP:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 1
,@i_Machine_Brand = 'Dell'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
1005 1

We got the exact machine code. Now let's check the table Code_Generation_Table for the column Current_Code by executing the following code:

select * from Code_Generation_Table
where Location_Code = 1
and Machine_Brand = 'Dell'

The result set is as follows:

Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1

You may note that the Current_Code has been updated. It has actually been initialized with the next possible Current_Code because 1005 has been taken by us via execution of the SP. So the next available value can be 1006. But since 1006 is outside the range of 1001 and 1005, the next possible value has to be re-initialized to the start_code which is 1001. So this condition has also been satisfied. Very well.

TEST 2:
Now let's test another condition:

Scenario 1:

For Location_Code = 2 and Machine_Brand = 'IBM', there is no record in the Main_Table. Let's check by executing the following code:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'IBM'

(0 rows affected)

Let's check in the Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8003 8005 1

So as per the given condition, the new machine_x_code shoudl be 8003 and the Current_Code column of the table, Code_Generation_Table should get updated with 8004. So let's execute the SP from the developer by executng the following code:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 2
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
8003 1

It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8004 8005 1

So the new Machine_X_Number generated in 8003 which is perfect. And the Crrent_Code column has also been updated to the next possible Machine_X_Number which is 8004. So the Scenario 1 has been satisfied.

TEST 3:
Let's check out if Scenario 2 is satisfied:

For Location_Code = 2 and Machine_Brand = 'HP', there are four records in the Main_Table with machine codes (0000005002, 0000005003, 0000005004 and 0000005005). Let's check by executing the following code:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'HP'

Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
2 HP 0000005002 2
2 HP 0000005003 4
2 HP 0000005004 8
2 HP 0000005005 32

Let's check in the Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5003 5005 1

So as per the given condition for Scenario 2, the new machine_x_code generated should be 5001 and the Current_Code column of the table, Code_Generation_Table should get updated with 8002. So let's execute the SP from the developer by executng the following code:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 2
,@i_Machine_Brand = 'HP'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
5001 1

It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5002 5005 1

So the new Machine_X_Number generated in 5001 which is perfect. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 5002. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exists within the range of Start_Code and End_Code. So the Scenario 2 has been satisfied.

TEST 4:
Let's check the Scenario 3, the final scenario:

For Location_Code = 3 and Machine_Brand = 'IBM', there are four records in the Main_Table with machine codes (0000006001, 0000006003, 0000006004 and 0000006005). Let's check by executing the following code:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 3
and Machine_Brand = 'IBM'

Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
3 IBM 0000006001 2
3 IBM 0000006003 2
3 IBM 0000006004 8
3 IBM 0000006005 32

Let's check in the Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6004 6005 1

So as per the given condition for Scenario 3, the new machine_x_code generated should be 6002 and the Current_Code column of the table, Code_Generation_Table should get updated with 6003. So let's execute the SP from the developer by executng the following code:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 3
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
6002 1

It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6003 6005 1

So the new Machine_X_Number generated in 6002 which was the missing code in Main_Table for Location_Code = 3 and Machine_Brand = 'IBM'. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 6003. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exist within the range of Start_Code and End_Code. So the Scenario 3 has been satisfied.

The developer has satisifed all the functional requirements. Good job. Now what are the drawbacks in his code? Let's enlist them:

1. He uses an SP and a UDF, two database objects for one purpose. This is a minor drawback though.
2. Looping is being used. In a rare case, it may have just one iteration. But in almost all cases it would have multiple iterations upto a maximum of five iterations. We have a simple set of scenarios here. If the range between Start_Code and Stop_Code runs into several hundreds, the looping will happen those many times most of the time. And the invocation of the UDF would take place double those many times for generating just one new Machine_X_Number.
3. The Update to Code_Generation_Table takes place as many times as the number of iterations which could run into several hundred times. So many UPDATE's just to generate one Machine_X_Number seems too much processing. It should be avoided.
4. The code is accessing the Main_Table as many times as the number of iterations. In our case the Main_Table has just 22 rows. In the real situation, it actually has hundreds of thousands of rows. Why scan that table so many times just to generate one valid Machine_X_Number?
5. The SP performs record-by-record operation or row-by-row. An attempt should be made to perform set based operation for faster and efficient execution. That would also reduce scans of the underlying tables.

These are only a few drawbacks and there are others like usage of dynamic T-SQL which is not needed in this case, etc. We won't go into those detailed aspects in this article. We will concentrate more on logical programming.

Let's try to satisfy the same scenarios after making some refinements to the SP and by eliminating the user-defined function altogether.

A refined version of the SP is given below:

--This SP's purpose is to generate a new Machine_X_Code. It generates a new
--Machine_X_Code from the Current_Code column of the table, Code_Generation_Table
--while checking that the newly generated Machine_X_Code does not already exist
--in the Machine_X_Code column of the table Main_Table. It also checks that a newly
--generated Machine_X_Code lies between Start_Code and End_Code in the table,
--Code_Generation_Table.
--If the newly generated machine_x_code falls outside that range, then the machine_x_code
--is reset to Start_Code. In this whole process it updates the Current_Code column
--of the table, Code_Generation_Table with the newly generated Machine_X_Code once
--or at the most twice. The function, F_Validate_Machine_Code has been eliminated altogether.
--It also updates the the Current_Code column of the table, Code_Generation_Table with
--the next possible Machine_X_Code value. If at all, Looping occurs only twice in a rare case.
--So effectively, there would be no looping except in a scenario where there would be two iterations
--of the code when a new machine_x_code does not exist between the Current_Code and
--the End_Code. In such a scenario, it would check the valid machine codes from Start_Code
--through Current_Code in the second iteration.

Create Procedure usp_Generate_Machine_X_Code
@i_Location_Code Smallint,
@i_Machine_Brand VarChar(10),
@o_Machine_X_Code Integer OutPut,
@o_Return_Value Smallint OutPut
As
Begin
Set nocount on
Set @o_Return_Value = 0
/* WHILE Loop eliminated. Main_Table is now being referenced only once. New code below.
Set @o_Return_Value = 0

Declare @l_Increment_Value Smallint,
@l_Temp_Machine_X_Value Integer,
@l_Next_Machine_X_Value Integer,
@l_Start_Machine_X_Value Int,
@l_End_Machine_X_Value Integer,
@l_First_Machine_X_Value_Taken Integer,
@l_Affected_Rows Smallint,
@l_Execution_String VarChar(2000)

Select @l_Temp_Machine_X_Value = Current_Code
,@l_Increment_Value = Increment_Value
,@l_Start_Machine_X_Value = Start_Code
,@l_End_Machine_X_Value = End_Code
From DBO.Code_Generation_Table With (NoLock)
Where Location_Code = @i_Location_Code
And Machine_Brand = @i_Machine_Brand

Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)

Set @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value

Set @l_Affected_Rows = 0

While @l_Affected_Rows = 0
Begin

Set @l_Next_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value

Set @l_Next_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Next_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)

Set @l_Execution_String = 'Update A Set A.Current_Code = ' + Cast(@l_Next_Machine_X_Value as VarChar) +
' From Dbo.Code_Generation_Table A '+
+ ' Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) + ' And Machine_Brand = ' + ''''+ @i_Machine_Brand + '''' +
' And Not Exists (Select ' + ''''
+ 'x' + '''' + ' From Main_Table ' +
'Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) +
' AND Machine_Brand = ' + '''' + @i_Machine_Brand + '''' +
' And Machine_X_Code = ' + '''' + Replicate('0',10 - Len(@l_Temp_Machine_X_Value)) + Cast(@l_Temp_Machine_X_Value as Varchar(10)) + '''' + ')'

Exec (@l_Execution_String)

Set @l_Affected_Rows = @@Rowcount

If @l_Affected_Rows = 1
Begin
Select @o_Machine_X_Code = @l_Temp_Machine_X_Value, @o_Return_Value = 1
Break
End

Set @l_Temp_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value

Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)

If @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value
Begin
-- For the current day all the Machine_X_Codes have been used up.
Select @l_Affected_Rows = 2, @o_Return_Value = -1
Break
End
*/

declare @l_next_Machine_X_Code int
,@l_Start_Machine_X_Code int
,@l_End_Machine_X_Code int
,@l_Current_Machine_X_Code int
,@l_Increment_Value tinyint

select @l_Start_Machine_X_Code = Start_Code
,@l_End_Machine_X_Code = End_Code
,@l_Increment_Value = Increment_Value
,@l_Current_Machine_X_Code = Current_Code
from Code_Generation_Table
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand

if ((@l_Current_Machine_X_Code < @l_Start_Machine_X_Code) or (@l_Current_Machine_X_Code > @l_End_Machine_X_Code))
begin
Start_Processing:
update Code_Generation_Table
set @l_Current_Machine_X_Code = Current_Code = @l_Start_Machine_X_Code
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand
end

if object_id ('tempdb.dbo.#Temp_Code_Table') is not null
drop table #Temp_Code_Table

create table #Temp_Code_Table
(Existing_Machine_Codes int
,Valid_machine_Codes int
,Serial_No int identity(0,1))

insert into #Temp_Code_Table (Existing_Machine_Codes, Valid_machine_Codes)
select convert(int, Machine_X_Code), @l_Current_Machine_X_Code from dbo.Main_Table
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand
order by Machine_X_Code

if ((select COUNT(1) from #Temp_Code_Table) = 0)
begin
select @l_next_Machine_X_Code = @l_Current_Machine_X_Code
goto Final_Validation
end

update #Temp_Code_Table
set Valid_machine_Codes = Valid_machine_Codes + (Serial_No * @l_Increment_Value)

select @l_next_Machine_X_Code = min(a.Valid_machine_Codes) from #Temp_Code_Table a
where a.Valid_machine_Codes not in (select b.Existing_Machine_Codes from #Temp_Code_Table b)
and a.Valid_machine_Codes between @l_Start_Machine_X_Code and @l_End_Machine_X_Code

if (@l_next_Machine_X_Code is null)
begin
select @l_next_Machine_X_Code = max(Valid_machine_Codes) + @l_Increment_Value from #Temp_Code_Table
end

Final_Validation:
if (@l_next_Machine_X_Code > @l_End_Machine_X_Code)
begin
goto Start_Processing
end

if (@l_next_Machine_X_Code < @l_Start_Machine_X_Code) or (@l_next_Machine_X_Code > @l_End_Machine_X_Code)
begin
select @l_next_Machine_X_Code = @l_Start_Machine_X_Code
,@o_Return_Value = -1
set nocount off
if object_id ('tempdb.dbo.#Temp_Code_Table') is not null
drop table #Temp_Code_Table
return
end

if (@l_next_Machine_X_Code >= @l_Start_Machine_X_Code) and ((@l_next_Machine_X_Code >= @l_Start_Machine_X_Code) and (@l_next_Machine_X_Code <= @l_End_Machine_X_Code))
begin
update Code_Generation_Table with (readpast, rowlock, xlock)
set Current_Code = case
when @l_next_Machine_X_Code = @l_End_Machine_X_Code then @l_Start_Machine_X_Code
else @l_next_Machine_X_Code + @l_Increment_Value
end
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand

if (@@ROWCOUNT <> 0)
begin
select @o_Machine_X_Code = @l_next_Machine_X_Code
,@o_Return_Value = 1
end
else
set @o_Return_Value = -1
end

if object_id ('tempdb.dbo.#Temp_Code_Table') is not null
drop table #Temp_Code_Table

set nocount off
return
End
GO

The new logic in the SP is given in the comments before the Create Procedure statement. The calls to function have been eliminated. This SP checks for valid Machine_X_Code values between Current_Code and End_Code. If not found, it checks for valid Machine_X_Code values between Start_Code and Current_Code. This is because there is a functional resttriction that the search should always start from the Current_Code only.

Let's now start from the beginning by emptynig (truncating) bith the tables, Code_Generation_Table and Main_Table and filling (inserting) the same data as above into them afresh by executing the following code:

truncate table Code_Generation_Table
truncate table Main_Table

Go

insert into Code_Generation_Table
select 1, 'DELL', 1001, 1001, 1005, 1
union all
select 2, 'DELL', 4001, 4001, 4005, 1
union all
select 3, 'DELL', 7001, 7001, 7005, 1
union all
select 1, 'HP', 2001, 2003, 2005, 1
union all
select 2, 'HP', 5001, 5003, 5005, 1
union all
select 1, 'IBM', 3001, 3001, 3005, 1
union all
select 2, 'IBM', 8001, 8003, 8005, 1
union all
select 3, 'IBM', 6001, 6004, 6005, 1
union all
select 3, 'SUN', 6011, 6011, 6015, 1

go

insert into Main_Table (Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs)
select 1, 'DELL', '0000001001', 2
union all
select 1, 'DELL', '0000001002', 4
union all
select 1, 'DELL', '0000001003', 8
union all
select 1, 'DELL', '0000001004', 16
union all
select 1, 'HP', '0000002002', 2
union all
select 1, 'HP', '0000002003', 4
union all
select 1, 'HP', '0000002004', 8
union all
select 1, 'HP', '0000002005', 32
union all
select 1, 'IBM', '0000003003', 2
union all
select 1, 'IBM', '0000003004', 8
union all
select 1, 'IBM', '0000003005', 32
union all
select 2, 'DELL', '0000004001', 2
union all
select 2, 'DELL', '0000004002', 4
union all
select 2, 'DELL', '0000004003', 8
union all
select 3, 'DELL', '0000007004', 16
union all
select 2, 'HP', '0000005002', 2
union all
select 2, 'HP', '0000005003', 4
union all
select 2, 'HP', '0000005004', 8
union all
select 2, 'HP', '0000005005', 32
union all
select 3, 'IBM', '0000006001', 2
union all
select 3, 'IBM', '0000006003', 2
union all
select 3, 'IBM', '0000006004', 8
union all
select 3, 'IBM', '0000006005', 32
Go

Having filled up the data, it's time to test this new version of the SP.

Let us try to generate a new Machine_X_Code for the Location_Code = 1 and Brand_Name = 'Dell'.

TEST 1:
Execute the following:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 1
and Machine_Brand = 'Dell'

The result set is as follows:

Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
1 DELL 0000001001 2
1 DELL 0000001002 4
1 DELL 0000001003 8
1 DELL 0000001004 16

So the next generated machine_x_code to has to be 1005. Remember zeroes are suffixed later and that is not so important right now. What is important is generating the new Machine_X_Code that meets all the requirements above.

Now run the following code:

select * from Code_Generation_Table

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1
1 HP 2001 2003 2005 1
1 IBM 3001 3001 3005 1
2 DELL 4001 4001 4005 1
2 HP 5001 5003 5005 1
2 IBM 8001 8003 8005 1
3 DELL 7001 7001 7005 1
3 IBM 6001 6004 6005 1
3 SUN 6011 6011 6015 1

Since the codes (1001, 1002, 1003 and 1004) already exist in the Main_Table for Location_Code = 1 and Machine_Brand = 'Dell', the next one has to be 1005 as it exists in the Code_Generation_Table between Start_Code and End_Code. Let's see if the SP generates this machine code. Execute the following code to execute the SP:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 1
,@i_Machine_Brand = 'Dell'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
1005 1

We got the exact machine code. Now let's check the table Code_Generation_Table for the column Current_Code by executing the following code:

select * from Code_Generation_Table
where Location_Code = 1
and Machine_Brand = 'Dell'

The result set is as follows:

Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1

You may note that the Current_Code has been updated. It has actually been initialized with the next possible Current_Code because 1005 has been taken by us via execution of the SP. So the next available value can be 1006. But since 1006 is outside the range of 1001 and 1005, the next possible value has to be re-initialized to the start_code which is 1001. So this condition has also been satisfied by the new version of the SP.

TEST 2:
Now let's test another condition:

Scenario 1:

For Location_Code = 2 and Machine_Brand = 'IBM', there is no record in the Main_Table. Let's check by executing the following code:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'IBM'

(0 rows affected)

Let's check in the Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8003 8005 1

So as per the given condition, the new machine_x_code shoudl be 8003 and the Current_Code column of the table, Code_Generation_Table should get updated with 8004. So let's execute the SP from the developer by executng the following code:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 2
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
8003 1

It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8004 8005 1

So the new Machine_X_Number generated in 8003 which is perfect. And the Crrent_Code column has also been updated to the next possible Machine_X_Number which is 8004. So the Scenario 1 has been satisfied by the new SP.

TEST 3:
Let's check out if Scenario 2 is satisfied:

For Location_Code = 2 and Machine_Brand = 'HP', there are four records in the Main_Table with machine codes (0000005002, 0000005003, 0000005004 and 0000005005). Let's check by executing the following code:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'HP'

Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
2 HP 0000005002 2
2 HP 0000005003 4
2 HP 0000005004 8
2 HP 0000005005 32

Let's check in the Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5003 5005 1

So as per the given condition for Scenario 2, the new machine_x_code generated should be 5001 and the Current_Code column of the table, Code_Generation_Table should get updated with 8002. So let's execute the SP from the developer by executng the following code:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 2
,@i_Machine_Brand = 'HP'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
5001 1

It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5002 5005 1

So the new Machine_X_Number generated in 5001 which is perfect. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 5002. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exists within the range of Start_Code and End_Code. So the Scenario 2 has been satisfied by the new SP.

TEST 4:
Let's check the Scenario 3, the final scenario:

For Location_Code = 3 and Machine_Brand = 'IBM', there are four records in the Main_Table with machine codes (0000006001, 0000006003, 0000006004 and 0000006005). Let's check by executing the following code:

select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 3
and Machine_Brand = 'IBM'

Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
3 IBM 0000006001 2
3 IBM 0000006003 2
3 IBM 0000006004 8
3 IBM 0000006005 32

Let's check in the Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6004 6005 1

So as per the given condition for Scenario 3, the new machine_x_code generated should be 6002 and the Current_Code column of the table, Code_Generation_Table should get updated with 6003. So let's execute the SP from the developer by executng the following code:

declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint

select @i_Location_Code = 3
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0

execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut

select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value

Here is the result set:

Machine_Code Return_Value
------------ ------------
6002 1

It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:

select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'

The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6003 6005 1

So the new Machine_X_Number generated in 6002 which was the missing code in Main_Table for Location_Code = 3 and Machine_Brand = 'IBM'. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 6003. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exist within the range of Start_Code and End_Code. So the Scenario 3 has been satisfied by the new SP.

We too have satisifed all the functional requirements with less scan of the tables Main_Table and Code_Generation_Table. We are not looping except in one scenario where we loop only twice at the most. No matter what the range of values between Start_Code and End_COde, we would scan any table once or at the most twice. Update to Code_Generation_Table also happenns only two times at the most. So the new code can handle larger volumes of data than the original code.

There is another method of optimiznig this SP further whereby there is only once scan of any table. May be in my next article, I would write about it as I am quite tired today. I feel any SP can be optimized no matter how optimized it already is. The dead end is always too distant. I hope you have enjoyed the article. Please let me know of your feedback which is always very welcome.

Thursday, March 13, 2008

TSQL and Excel - Data Exchanges

We are all aware of the applicability of "BCP" for transporting data between external files (text files, Excel files, etc) and database tables, and "Bulk Insert" for sending data one way from external files into database tables. I love BCP. Sometimes, situations arise where we need data from tables to be sent to well formatted Excel spreadsheets and vice-versa. With BCP, one could not send in the Column Headers into the Spreadsheet (it can be done with queryout option but in a round about way) . We always need simple solutions, don't we?

Let's complicate the requirement further. We have to write today's Top three stocks from the database on to a spreadsheet, i.e. the top three gainers in the Stock Market today. We have to also write the top three losing stocks in the Stock Market today into the same Excel spreadsheet but in a different tab. This single Excel spreadsheet with the two tabs, one for gainers and one for losers needs to be e-mailed to some of the investors daily. We would put the e-mailing issue aside as it is quite simple using SQL Server 2005 Database Mail or SQL Server 2000 SQL Mail.

The second requirement is that the data in both the Excel spreadsheet tabs needs to be overwritten with fresh data daily. The source of all data is just one database table which is fed daily. There are two more tabs in the spreadsheet that will have the consolidated gainers and losers respectively for all days as the days progress.

It might have been so simple if this overwritting on two tabs (Gainers and Losers) and appending onto two other tabs (AllGainers and AllLosers) was not required. It would have also been so simple if we could have had four separate spreadsheets instead of four tabs in one single spreadsheet. Then we could have simply used the simple BCP and added a little bit of complication in the bcp command to include the Column Headers in there. But that is not the case as we need to keep the solution simple. Some people say, use CLR and do it row by row. Some say, use VC++.

The issue here is to find a simple solution and not complicate the solution further. We don't want assemblies and DLL's to be prepared, compiled, registered and then modified whenever something needs to be changed. So CLR, VC++ and the like are not an option. We are talking about efficient programming here. And since we are talking about programming and not about packages, DTS and SSIS are also not an option. We need minimum maintenance too. What do we do?

We resort to our contemporary T-SQL database programming language. T-SQL is simple, neat, prolific, dynamic and very flexible. Why not explore OPENDATASOURCE and OPENROWSET? Let's go about solving the problem now.

So let me prepare the data first. Let's create a table called Table_Stock_Value and populate the same. This article is just about concepts. So please don't bother about constraints, indexes and fillfactors right now. All symbols in the data are hypothetical.

if exists (select 1 from sys.tables
where name = 'Table_Stock_Value'
and type = 'U')
drop table Table_Stock_Value
go
create table Table_Stock_Value
(SerialNo int identity (1,1) not null
,NASDAQ_SYMBOL varchar(4) not null
,Company_Name varchar(25) not null
,Change decimal(5,2)
,Date_of_Change datetime not null default getdate())
go
insert into Table_Stock_Value (NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change)

--TOP GAINERS
select 'ABCD', 'ABCD Inc', 2.50, getdate()-3
union all
select 'EFGH', 'EFGH Inc', 2.00, getdate()-3
union all
select 'IJKL', 'IJKL Inc', 1.50, getdate()-3
union all
select 'IJKL', 'IJKL Inc', 1.75, getdate()-2
union all
select 'MNOP', 'MNOP Inc', 1.60, getdate()-2
union all
select 'ABCD', 'ABCD Inc', 1.00, getdate()-2
union all
select 'QRST', 'QRST Inc', 3.50, getdate()-1
union all
select 'UVWX', 'UVWX Inc', 2.50, getdate()-1
union all
select 'ABCD', 'ABCD Inc', 1.25, getdate()-1
union all
select 'IJKL', 'IJKL Inc', 4.95, getdate()
union all
select 'ABCD', 'ABCD Inc', 2.50, getdate()
union all
select 'MNOP', 'MNOP Inc', 1.25, getdate()
union all
select 'ABCD', 'ABCD Inc', 1.90, getdate()+1
union all
select 'EFGH', 'EFGH Inc', 1.20, getdate()+1
union all
select 'QRST', 'QRST Inc', 1.10, getdate()+1
union all
select 'ABCD', 'ABCD Inc', 1.10, getdate()+2
union all
select 'UVWX', 'UVWX Inc', 1.00, getdate()+2
union all
select 'QRST', 'QRST Inc', 0.95, getdate()+2
union all
select 'ABCD', 'ABCD Inc', 1.50, getdate()+3
union all
select 'EFGH', 'EFGH Inc', 1.25, getdate()+3
union all
select 'IJKL', 'IJKL Inc', 1.00, getdate()+3

UNION ALL
--TOP LOSERS
select 'BBBB', 'BBBB Inc', -1.50, getdate()-3
union all
select 'CCCC', 'CCCC Inc', -1.20, getdate()-3
union all
select 'DDDD', 'DDDD Inc', -1.10, getdate()-3
union all
select 'EEEE', 'EEEE Inc', -1.75, getdate()-2
union all
select 'FFFF', 'FFFF Inc', -1.60, getdate()-2
union all
select 'YYYY', 'YYYY Inc', -1.00, getdate()-2
union all
select 'IJKL', 'IJKL Inc', -0.70, getdate()-1
union all
select 'HHHH', 'HHHH Inc', -0.62, getdate()-1
union all
select 'CCCC', 'CCCC Inc', -0.25, getdate()-1
union all
select 'UUUU', 'UUUU Inc', -4.95, getdate()
union all
select 'LLLL', 'LLLL Inc', -2.50, getdate()
union all
select 'AAAA', 'AAAA Inc', -1.25, getdate()
union all
select 'UVWX', 'UVWX Inc', -1.90, getdate()+1
union all
select 'FFFF', 'FFFF Inc', -1.20, getdate()+1
union all
select 'CCCC', 'CCCC Inc', -1.10, getdate()+1
union all
select 'CCCC', 'CCCC Inc', -0.80, getdate()+2
union all
select 'BBBB', 'BBBB Inc', -0.75, getdate()+2
union all
select 'EEEE', 'EEEE Inc', -0.65, getdate()+2
union all
select 'EEEE', 'EEEE Inc', -1.50, getdate()+3
union all
select 'YYYY', 'YYYY Inc', -1.25, getdate()+3
union all
select 'QRST', 'QRST Inc', -1.00, getdate()+3
go
create clustered index ci_date on Table_Stock_Value (Date_of_Change) with fillfactor = 70
go
select * from Table_Stock_Value
go

Next for one time only, we need to prepare an Excel spreadsheet with four tabs in it. The names of the tabs are, Gainers, Losers, AllGainers, and AllLosers. And each tab can have the column names same or different than in the table, Table_Stock_Value created above. For simplicity, let's keep the column names same in the tabs. I have created and kept the Excel spreadsheet in the folder, C:\SQLXL. I have named the Excel spreadsheet as Stock_Overview.xls.

Please remember to close the Excel sheet before the initial compilation of the SP and before each instance of execution of the SP.

Now, let's set out to do what we need to do i.e. writing into the Excel spreadsheet's different tabs as per the requirements stated above. I have written the SP as follows. Please note that the SQL Server advanced configuration option, 'Ad Hoc Distributed Queries' should be set to 1 to read from and write to the Excel Spreadsheet from within a database Stored Procedure:

set ansi_warnings on
go
set ansi_nulls on
Go
execute sp_configure 'show advanced options', 1
go
reconfigure
go
execute sp_configure 'Ad Hoc Distributed Queries', 1
go
reconfigure
go

if exists (select 1 from sys.procedures
where type = 'P'
and [NAME] = 'DBP_Prepare_Daily_Stock_Data')
drop procedure [DBP_Prepare_Daily_Stock_Data]
GO
create procedure DBP_Prepare_Daily_Stock_Data
@i_date datetime = null
as
begin
declare @l_File_Exists int
select @i_date = isnull(@i_date, getdate())
execute master.dbo.XP_fileexist 'C:\SQLXL\Stock_Overview.xls', @l_File_Exists output
if (@l_File_Exists = 1)
begin
--Gainers have to be overwritten everyday.
UPDATE OPENDATASOURCE ('Microsoft.Jet.OleDB.4.0',
'Data Source="C:\SQLXL\Stock_Overview.xls";
extended Properties=Excel 8.0')...Gainers$
SET SerialNo = ''
,NASDAQ_SYMBOL = ''
,Company_Name = ''
,Change = ''
,Date_of_Change = ''
WHERE SerialNo is not null
--Losers have to be overwritten everyday.
UPDATE OPENDATASOURCE ('Microsoft.Jet.OleDB.4.0',
'Data Source="C:\SQLXL\Stock_Overview.xls";
extended Properties=Excel 8.0')...Losers$
SET SerialNo = ''
,NASDAQ_SYMBOL = ''
,Company_Name = ''
,Change = ''
,Date_of_Change = ''
WHERE SerialNo is not null
--UPDATE/Overwrite the Gainers tab with top 3 gainers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [Gainers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change > 0.00
order by Change desc
--UPDATE/Overwrite the Losers tab with top 3 losers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [Losers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change < 0.00
order by Change desc
--UPDATE/APPEND the AllGainers tab with top 3 gainers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [AllGainers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change > 0.00
order by Change desc
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [AllLosers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change < 0.00
order by Change desc
PRINT 'RESULTS HAVE BEEN WRITTEN TO C:\SQLXL\Stock_Overview.xls'
end
else
Begin
Print 'The file Stock_Overview.xls should exist in the correct path, C:\SQLXL.'
PRINT 'RESULTS COULD NOT BE WRITTEN TO C:\SQLXL\Stock_Overview.xls. THE FILE IS OPEN OR DOES NOT EXIST.'
End
end
GO

The SP being ready now, let's check if the all the tabs in the Excel spreadsheet,
C:\SQLXL\Stock_Overview.xls are empty:

select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

They are empty.

Let's execute the SP for the earliest date in the table Table_Stock_Value, i.e. getdate()-3:

declare @i_date datetime
select @i_date = getdate()-3
exec DBP_Prepare_Daily_Stock_Data @i_date
go

Now open the spreadsheet, C:\SQLXL\Stock_Overview.xls and check it out. The data has been written to the respective tabs for the given input date to the SP, DBP_Prepare_Daily_Stock_Data.

You can also execute the following T-SQL code from your SQL Server database to check the same:

select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

The data in the spreadsheet and the result sets above would match.

Let's execute the SP for the next date in the table Table_Stock_Value, i.e. getdate()-2:

declare @i_date datetime
select @i_date = getdate()-3
exec DBP_Prepare_Daily_Stock_Data @i_date
go

Now open the spreadsheet, C:\SQLXL\Stock_Overview.xls and check it out. The data has been written to the respective tabs for the given input date to the SP, DBP_Prepare_Daily_Stock_Data. Please close the spreadsheet after reading its contents.

To verfiy the results using T-SQL itself without opening the spreadsheet, you can use the following code too and verify the results:

select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

Note that we have met the requirement.

Let's do it for getdate()-1 and getdate() as well.

declare @i_date datetime
select @i_date = getdate()-1
exec DBP_Prepare_Daily_Stock_Data @i_date

Now open the Excel Spreadsheet. You would find the Gainer and Losers tabs overwritten with the top 3 gainers and top 3 losers respectively for the date, getdate()-1. You would also notice that the same data has been appended respectively to AllGainers and AllLosers tabs within the same spreadsheet. So, the Gainers and Losers tabs always have three rows each and these rows pertain specifically to the given input date to the SP. The AllGainers and AllLosers tabs have their number of rows growing with each execution of the SP as new rows get appended behind the older rows thus building up a historical data for later analysis. And that's all we want.

Let's execute the SP for @i_date = getdate().

declare @i_date datetime
select @i_date = getdate()
exec DBP_Prepare_Daily_Stock_Data @i_date
go

Now open the Excel Spreadsheet. You would find the Gainer and Losers tabs overwritten with the top 3 gainers and top 3 losers respectively for the given input date, getdate(). You would also notice that the same data has been appended respectively to AllGainers and AllLosers tabs within the same spreadsheet. So, the Gainers and Losers tabs always have three rows each and these rows pertain specifically to the given input date for top three gainers and losers respectively for the respective tabs. The AllGainers and AllLosers tabs have their number of rows growing with each execution as new rows get appended behind the older rows thus building up a historical data for later analysis. We are done with verifying that we have satisfied the requirements that we initially started out with.

You can checkout the contents of the spreasheet by just opening it from its location. Alternatively, you can use the following code. Please close the spreadsheet after physically reading its contents.:

select * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

Please remember to close the Excel sheet before each execution of the SP.

Now let me throw a little bit of explanation behind overwriting and appending the Excel spreadsheet tabs:

First of all, in order to read from and write to an external data source like the Excel spreadsheet in this case while not using the BCP or BULK INSERT commands, the advanced SQL Server configuration option 'Ad Hoc Distributed Queries' must be set to 1. This way, writes to and reads from the Excel spreadsheet get enabled. This's what we did first.

Next, we UPDATE any existing records in the Gainers and Losers tabs with an empty string, ''. This is equivalent to DELETEing all existing records in both these tabs. Even then, the empty characher string remains in the existing cells although it is not visible.

The next step is to overwrite these existing cells in the Gainers and Losers tabs. For this purpose, we use the INSERT statement. Please note that we use the starting and ending cells of the first row in the Gainers and Losers tab in the OPENDATASOURCE statement (A1:E1). This will enable overwriting the existing cells. The smart thing here is that any empty strings '' are no more visible even if they exist somwhere else in the spreadsheet.

For the tabs AllGainers and AllLosers, since we don't want to overwrite existing data but instead want to append new data to any existing data, we omit mentioning the starting row A1:E1 in the OPENDATASOURCE function of the INSERT statement. That ensures that new rows are always appended to the existing rows in these two tabs.

You can use the spreadsheet and its tabs as if they were individual tables, for reading and performing calculations. You can apply all types of T-SQL constructs on that data to make aggregations, etc. Please note the sample T-SQL code above that we used to check the result sets in the tabs of the spreadsheet after executing the SP.

One important point to note is that before compiling the SP, the spreadsheet with the appropriate tabs in it should exist in the correct path mentioned in the SP. Otherwise the SP will not get compiled and will instead throw an error.

Another important point is that during compilation and execution of the SP, the spreadsheet itself should not be physically openend from the Windows Explorer.

Ofcourse, sufficient scope exists for making improvisations to the SP to calculate the top stock for the month or for generating some special reports, etc. I am limited myself to just explaining the manipulation of an Excel spreadseet and its tabs using T-SQL. And the example has been designed to just show that. Using Database Mail to e-mail the spreadsheet is outside the scope of this article.

I hope you like the technique of writing to and manipulating spreadsheets and their tabs using simple T-SQL. Please let me know of your comments and thoughts.

Monday, March 10, 2008

Simplifying Code: Updating Variables

A good programming practice means being able to use every available option one can to reduce code and at the same time increase speed of execution. A merit of blogging is that such methods can be explained but demontration of using the sample code as it is on large databases is not possible because of the obvious limitations on how much can be written inside one article. That fact however does not in any way takes away the applicability of such methods on large databases. In today's article, we will see how an unrelated construct/statement can help in reducing code and speeding up execution.

First let us build the base table, Order_Master:

create table Order_Master
(Order_ID smallint not null
,Begin_Date datetime not null
,End_Date datetime not null
,Delivery_Location varchar(50) not null)
go

Now, let's put in some data into the table, Order_Master:

Insert into Order_Master (Order_ID, Begin_Date, End_Date, Delivery_Location)
select 1, getdate()-100, getdate()+100, 'New York'
union all
select 2, getdate()-50, getdate()+50, 'Washington D.C'
union all
select 3, getdate()-25, getdate()+25, 'Princeton'
union all
select 4, getdate()-12, getdate()+12, 'Dallas'
union all
select 5, getdate()-6, getdate()+6, 'Los Angeles'
go

Let's read from the table, Order_Master:

select * from Order_Master
go

Order_ID Begin_Date End_Date Delivery_Location
-------- ----------------------- ----------------------- --------------------
1 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087 New York
2 2008-01-20 14:06:04.087 2008-04-29 14:06:04.087 Washington D.C
3 2008-02-14 14:06:04.087 2008-04-04 14:06:04.087 Princeton
4 2008-02-27 14:06:04.087 2008-03-22 14:06:04.087 Dallas
5 2008-03-04 14:06:04.087 2008-03-16 14:06:04.087 Los Angeles

Now let's come to the scenario.

1. For a given Order_ID and for a given input date, we need to find out what the begin date and end date for that Order_ID are.

2. We also need to find out whether the given input date falls within the begin date and end date for that order ID.

3. The output needs to be the Begin Date, the end date and a return value which would be 1 if the given input date falls within the begin date and end date for that order ID or 0 if it doesn't.

Given these requirements, a normal Stored Procedure would be like the following:

if exists (select 1 from sys.procedures
where type = 'P'
and user_name(schema_id) = 'dbo'
and [NAME] = 'DBP_ValidateOrderDates')
drop procedure [dbo].[DBP_ValidateOrderDates]
GO

CREATE PROCEDURE [dbo].[DBP_ValidateOrderDates]
@i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit = 0 output,
@o_BeginDate datetime = Null output,
@o_EndDate datetime = Null output
AS
BEGIN
SET NOCOUNT ON;
Select @o_BeginDate = Begin_Date
,@o_EndDate = End_Date
from dbo.Order_Master
where Order_ID = @i_OrderNo
IF @@rowcount = 0
Begin
Set @o_Return = 0
End
Else
If (@i_ValidationDate <= @o_EndDate and @i_ValidationDate >= @o_BeginDate)
Begin
set @o_Return = 1
End
Else
Begin
set @o_Return = 0
End
Set nocount off
END
Go

Now let's compile and then execute this SP for Order_ID = 1 and input date as today's date:

--Declaring the variables:
declare @i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit,
@o_BeginDate datetime,
@o_EndDate datetime

--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()
,@o_Return = null

--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output

--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'

Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 1 because the input date falls between the begin and end dates for that Order_ID:

Return Value Begin Date End Date
------------ ----------------------- -----------------------
1 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087

Now let's execute this SP for the same Order_ID (1) but with the input date not falling within the begin and end dates for that Order_ID.

--Declaring the variables:
declare @i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit,
@o_BeginDate datetime,
@o_EndDate datetime

--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()-200
,@o_Return = null

--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output

--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'

Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 0 because the input date does not fall between the begin and end dates for that Order_ID:

Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087

Now let's take another possible situation. If we were to input an Order_ID (101) that doesn't exist in the Order_Master table, we should be getting nulls for the begin and end dates but the return value should be 0. Let's execute this scenario.

--Declaring the variables:
declare @i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit,
@o_BeginDate datetime,
@o_EndDate datetime

--Assigning values to input variables/parameters:
select @i_OrderNo = 101
,@i_ValidationDate = getdate()
,@o_Return = null

--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output

--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'

The result set matches with the expected result:

Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 NULL NULL

Since the input Order_ID, 101 does not exist in the table, the Begin and End dates are null. Since they are null, the input date can not fall between null begin and end dates and hence is 0.

Issue:

You may observe that there are so many constructs in this SP:

One SELECT statement
Three set variable statements
One usage of @@rowcount to check existence of the record
Three Begin-End blocks
Two IF and one ELSE construct

The SP seems perfectly logical. It has a very logical flow to satisfy all our requirements. But
the question is "Do we really need so many constructs to staify our requirements?". The answer is "May be Not". Remember, all we have to do is assign proper value to the output variables/parameters based on the input variables and the data in the table, Order_Master.

Let's try to simplify the same SP. Below, I have replace the IF BEGIN-END, IF-ELSE and @@rowcount with one UPDATE CASE statement. Let's see how it looks now:

if exists (select 1 from sys.procedures
where type = 'P'
and user_name(schema_id) = 'dbo'
and [NAME] = 'DBP_ValidateOrderDates')
drop procedure [dbo].[DBP_ValidateOrderDates]
GO

CREATE PROCEDURE [dbo].[DBP_ValidateOrderDates]
@i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit = 0 output,
@o_BeginDate datetime = Null output,
@o_EndDate datetime = Null output
AS
BEGIN
SET NOCOUNT ON;
update a
set @o_Return = case
when @i_ValidationDate between a.Begin_Date and a.End_Date then 1
else 0
end
,@o_BeginDate = a.Begin_Date
,@o_EndDate = a.End_Date
from (Select Begin_Date, End_Date from dbo.Order_Master with (nolock)
where Order_ID = @i_OrderNo) a

Set nocount off
END
Go

Doesn't the SP look short now? One UPDATE statement takes care of all possibilities and reduces code even though in the original SP, we are not updating any table for that matter. Neither are we updating the table Order_Master in this refined version of the original SP. All we are doing is updating the output variables / parameters based on the input parameters and the corresponding data in the table Order_Master. And that's all that is expected from this SP. So why take the longer route as in the original version? Not necessary.

Now let's execute the refined SP:

Now let's compile and then execute this SP for Order_ID = 1 and input date as today's date. This is the first case above:

--Declaring the variables:
declare @i_OrderNo varchar(6)
,@i_ValidationDate datetime
,@o_Return bit
,@o_BeginDate datetime
,@o_EndDate datetime

--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()
,@o_Return = null

--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output

--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'

Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 1 because the input date falls between the begin and end dates for that Order_ID:

Return Value Begin Date End Date
------------ ----------------------- -----------------------
1 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087

Now let's execute this SP for the same Order_ID (1) but with the input date not falling within the begin and end dates for that Order_ID. This is the second case above.

--Declaring the variables:
declare @i_OrderNo varchar(6)
,@i_ValidationDate datetime
,@o_Return bit
,@o_BeginDate datetime
,@o_EndDate datetime

--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()-200
,@o_Return = null

--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output

--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'

Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 0 because the input date does not fall between the begin and end dates for that Order_ID(1):

Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087

Now let's take another possible situation, i.e. the third situation above. If we were to input an Order_ID (101) that doesn't exist in the Order_Master table, we should be getting nulls for the begin and end dates but the return value should be
0. Let's execute this scenario.

--Declaring the variables:
declare @i_OrderNo varchar(6)
,@i_ValidationDate datetime
,@o_Return bit
,@o_BeginDate datetime
,@o_EndDate datetime

--Assigning values to input variables/parameters:
select @i_OrderNo = 101
,@i_ValidationDate = getdate()
,@o_Return = null

--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output

--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'

The result set matches with the expected result:

Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 NULL NULL

Since the input Order_ID, 101 does not exist in the table, the Begin and End dates are null. Since they are null, the input date can not fall between null begin and end dates and hence is 0.

We get the same results with the refined version of the SP but in one statement that in multiple If's and ELSE's and @@rowcount.

The UPDATE statement in this case doesn't perform any updates to the table Order_Master itself. It just updates the output parameters to the correct values.

This is a very simple example of using unrelated operations (UPDATE) where SELECT seems the only solution and thus reducing the amount of code needed to be written and executed. This article just demonstrates that coding flexibility in Microsoft SQL Server's T-SQL is profound. Based on individual situation, there are numerous possibilities in T-SQL that can reduce code and execution time as well. We will follow some more of such techniques in the forthcoming articles.

Tuesday, December 11, 2007

MERGE statement in SQL Server 2008

Since the last few days, I have been spending some time on studying new features in the Katmai betas. One of the better coding introductions I like in Microsoft SQL Server 2008 is the MERGE statement. I like the way it provides option to UPDATE, DELETE and INSERT data simultaneously in a single T-SQL statement.

Using it for simple operations is a costly affair when compared to equivalent T-SQL code in 2000/2005. However using it for complex operations is simple and efficient compared to equivalent T-SQL code in 2000/2005. This is the paradox. I find most prolific coders prefer to write compact code even if is is very complex rather than using lengthy but simple code. The reason most often is that the shorter the code, the faster it performs in most cases. Here is where the paradox associated with MERGE statement comes in handy.

Let me create a simple situation first:

--Create a small table.
create table abc
(productid int identity (1,1)
,Quantity decimal(4,0) default 100
,Order_Date datetime default getdate())

--Populate the tables, abc.

insert into abc default values
insert into abc (Quantity) values (125)
insert into abc (Quantity) values (150)
insert into abc (Quantity) values (175)
insert into abc (Quantity) values (200)
insert into abc (Quantity) values (225)
insert into abc (Quantity) values (250)
insert into abc (Quantity) values (275)
insert into abc (Quantity) values (300)
insert into abc (Quantity) values (325)
insert into abc (Quantity) values (350)
insert into abc (Quantity) values (375)

--Check the contents of the table, abc.
select * from abc

--Add a nullable column to the table, abc. This one column is to be populated (updated) later.
alter table abc
add OrderID int null

--Now the MERGE construct updates the OrderID column of the table abc as follows.
--Divide the ProductID by 2. If remainder comes out to be zero, then update
--OrderID with 2. If remainder comes out to a non-zero value, then update
--OrderID as ProductID minus the Quotient.
--The code is given below:

merge abc as master_table
using (select ProductId, ProductId / @l_OrderID_value, ProductId % @l_OrderID_value
from abc) as input (ProductId, Quotient, Remainder)
ON master_Table.ProductId = Input.ProductID
When Matched
THEN UPDATE set master_table.OrderID = case
when input.Remainder = 0 then @l_OrderID_value
else master_Table.ProductId - Input.Quotient
end;

It was very structured but long. This was a simple operation and didn't need such length of code.

--Now let's backup the data in the tables abc.

select * into def from abc

Now, let's write the equivalent T-SQL code as a replacement of the MERGE statement above.

declare @l_OrderID_value int
set @l_OrderID_value = 2

update abc
set OrderID = case
when ProductID%@l_OrderID_value = 0 then @l_OrderID_value
else ProductId - (ProductID/@l_OrderID_value)
end;

This was short, simple and just one UPDATE statement was required to do the entire operation.

If you look for the query cost relative to batch when you run both the statements together and get actual execution plans, the MERGE costs 76% while the single UPDATE costs 24%. The MERGE statement had a SELECT from the table abc, then it uses a derived table, and then it does the UPDATE. In the second statement above, there is a direct UPDATE. So the second statement is expected to be faster. So in this example, MERGE was not really useful. The above example was a simple case, so we won't bother ourselves much with MERGE in simple scenarios.

Now, let's take up something slightly more involved where MERGE can be really useful.

--Create a new table.
create table abc_Summary
(OrderID int
,Order_Date datetime
,Quantity int)

--It is empty.
select * from abc_Summary

MERGE abc_summary as TARGET_Table
using (select OrderID, convert(varchar, Order_Date, 112), sum(Quantity) from abc
group by OrderID, convert(varchar, Order_Date, 112)) as Data (OrderID, Order_Date, Total_Quantity)
on TARGET_Table.OrderID = Data.OrderID
When NOT MATCHED THEN INSERT (OrderID, Order_Date, Quantity)
VALUES (Data.OrderID, Data.Order_Date, Data.Total_Quantity)
WHEN MATCHED THEN
UPDATE SET TARGET_Table.Quantity = Data.Total_Quantity;

--Verify that the empty table has gotten populated with aggregated data from the main table, abc.
select * from abc_Summary

--So this time the INSERT took place. Now lets UPDATE the column abc_Summary.Quantity to --zero. And then we will run the above code again.

update abc_summary
set Quantity = 0
GO

--Verify that Quantity is 0.
select * from abc_Summary
GO

--Run the same code again:
MERGE abc_summary as TARGET_Table
using (select OrderID, convert(varchar, Order_Date, 112), sum(Quantity) from abc
group by OrderID, convert(varchar, Order_Date, 112)) as Data (OrderID, Order_Date, Total_Quantity)
on TARGET_Table.OrderID = Data.OrderID
When NOT MATCHED THEN INSERT (OrderID, Order_Date, Quantity)
VALUES (Data.OrderID, Data.Order_Date, Data.Total_Quantity)
WHEN MATCHED THEN
UPDATE SET TARGET_Table.Quantity = Data.Total_Quantity;

--Verify that the Quantity column in the abc_Summarfy table is updated now with actual summed quantities per OrderID from the parent table, abc.

select * from abc_Summary

Lets do something different now, a bit more involved. UPDATE the table abc_Summary such that OrderID's 1,2 and 3 are removed and Quantity is set to zero for the remaining three rows. Also add one dirty row to the abc_summary table. Note that the table abc_summary should contain only those OrderID's that already exist in the parent table, abc. Since the newly inserted OrderID in the dirty row does not exist in the parent table, abc, it should be deleted from abc_summary. So in all, what we are trying to do here is following:
INSERT missing rows,
UPDATE Quantity for existing three rows, and
DELETE one dirty row.

All this is to be accomplished in one query.
Let's prepare the data now.

--Preparing data.
update abc_summary
set Quantity = 0
GO
delete from abc_summary
where OrderID in (1, 2, 3)
GO
insert into abc_summary (OrderID, Order_Date, Quantity)
values (2001, getdate(), 50000)
go

--Check the data. OrderID's 4,5 and 6 have their Quantity zero'ed out. OrderID's 1,2 and 3 no ---longer exist. A new OrderID 2001 has come in but it doesn't exist in parent table, abc.

select * from abc_Summary

--Now let's get the correct data into the table abc_summary in one single T-SQL query using the MERGE statement which issues INSERT, UPDATE and DELETE simultaneously.

MERGE abc_summary as TARGET_Table
using (select OrderID, convert(varchar, Order_Date, 112), sum(Quantity) from abc
group by OrderID, convert(varchar, Order_Date, 112)) as Data (OrderID, Order_Date, Total_Quantity)
on TARGET_Table.OrderID = Data.OrderID
When NOT MATCHED THEN INSERT (OrderID, Order_Date, Quantity)
VALUES (Data.OrderID, Data.Order_Date, Data.Total_Quantity)
WHEN MATCHED THEN
UPDATE SET TARGET_Table.Quantity = Data.Total_Quantity
WHEN SOURCE NOT MATCHED THEN
DELETE;

--Check the data. OrderID's 4,5 and 6 have their Quantity UPDATEd correctly. --Rows for OrderID's 1,2 and 3 have been INSERTed now. The dirty row for OrderID 2001 has been DELETed as it didn't exist in parent table, abc.

select * from abc_Summary

So this time the INSERT took place and brought in the three mmissing ORDERID's (1,2,3) from
the table abc into the table abc_Summary. Then it UPDATEd the abc_Summary.Quantity
column for ORDERID's 3,4,5 from zero to the correct aggregated quantity from the
parent table, abc. Lastly, it also DELETEd the erroneous row with ORDERID 2001 because this
ORDERID did not exist in the parent table, abc. All these three operations were
performed in one single query.

All this can be done without the MERGE statement as well. But it would be really very lengthy and expensive code. It would contain several IF-ELSE constructs, IF [NOT] EXISTS clauses, multiple instances of usage of the tables abc and abc_Summary, apart from the INSERT, UPDATE, DELETE statements. We don't want to do that, do we? The cost of that query would be much more than the MERGE statement above. Let's verify the same. Let me try to build as short a code as possible to replace the MERGE statement above. Please see the replacement code below:

--Prepare the data first.
update abc_summary
set Quantity = 0
GO
delete from abc_summary
where OrderID in (1, 2, 3)
GO
insert into abc_summary (OrderID, Order_Date, Quantity)
values (2001, getdate(), 50000)
go

--Check the data.
select * from abc_Summary

--Now let's get the correct data into the table abc_summary.
--UPDATE Quantity to the correct aggregated value.
update abc_summary
set abc_summary.Quantity = (select sum(abc.Quantity) from abc
where abc_Summary.OrderID = abc.OrderID)
where exists (select OrderID from abc
where abc_Summary.OrderID = abc.OrderID)

--Check the data. Quantity has been updated correctly.
select * from abc_Summary

--INSERT missing ORDERID's 1,2 and 3.
insert into abc_summary (OrderID, Order_Date, Quantity)
select OrderID, convert(varchar, Order_Date, 112), sum(Quantity) from abc
where OrderID not in (select OrderID from abc_Summary)
group by OrderID, convert(varchar, Order_Date, 112)

--Check the data. Missing ORDERID's have been re-inserted correctly with the right Quantity.
select * from abc_Summary

--DELETE dirty row.
delete from abc_Summary
where OrderID not in (select OrderID from abc)

--Check the data. The row with ORDERID 2001 has been deleted as required.
select * from abc_Summary

So multiple instances of joins between the parent table abc and the child table abc_Summary had to be used.

The execution plan showed 40% cost for the MERGE statement and 60% cost for the replacement statemenets above combined. Here we are talking about just va few rows of data. The benefits rise with increase in the number of rows. You may want to try it out.

So in all, MERGE statement in Microsoft SQL Server 2008 is really good. It reduces code, does multiple operations in a single statement and is less expensive or is more efficient when we need to perform a bit complex operations. Please note that the syntax for INSERT, UPDATE and DELETE constructs within the MERGE statement are slightly different from their original counterparts. The MERGE statement opens up a lot of possibilities. Imagine how interesting it would be to use within an AFTER TRIGGER to perform UPDATE, DELETE and INSERT operations on another table in one shot through the trigger. Although it is possible to do it without the MERGE statement as well, but it gets too lengthy and tedious as I had to implement at one of my clients for certain reasons. MERGE can make it compact, simple and efficient.

In my next article, we will solve some puzzles.

Wednesday, November 14, 2007

Some T-SQL code in 2005 and 2000.

I have sometimes been fazed by CTE (Common Table Expressions) and PIVOT in SQL Server 2005. I like the way they are structured but hate the way they perform. I tried comparing them with our old-style drived tables. And to my surprise, code written using derived tables performed much better that the code written using CTE or PIVOT or both combined. Let me share some code here using the SQL Server 2005 Adventureworks database.

--Shows Amount per customer:

WITH Value_from_Customer (CustomerID, SalesOrderID, Total_Value_of_Sale)
AS
(SELECT SOH.CustomerID, SOD.SalesOrderID, sum(SOD.LineTotal)
FROM Sales.SalesOrderDetail AS SOD INNER JOIN Sales.SalesOrderHeader SOH
ON (SOH.SalesOrderID = SOD.SalesOrderID)
GROUP BY SOH.CustomerID, SOD.SalesOrderID
)
SELECT CustomerID
,count(SalesOrderID) as Number_of_Sales
,sum(Total_Value_of_Sale) Value_from_Customer
FROM Value_from_Customer
Group by CustomerID
ORDER BY CustomerID
GO

--The same would have been done in SQL Server 2000 as follows:

SELECT CustomerID
,count(SalesOrderID) as Number_of_Sales
,sum(Total_Value_of_Sale) as Value_from_Customer
from
(SELECT SOH.CustomerID, SOD.SalesOrderID,
sum (SOD.LineTotal) as Total_Value_of_Sale
FROM Sales.SalesOrderDetail AS SOD INNER JOIN Sales.SalesOrderHeader SOH
ON (SOH.SalesOrderID = SOD.SalesOrderID)
GROUP BY SOH.CustomerID, SOD.SalesOrderID) abc
Group by CustomerID
ORDER BY CustomerID
GO

The CTE above looks very structured and systematic. The equivalent query with derived table looks more like a block of buildings. But I have found in all my tests that derived tables always outperform CTE's. In fact, I can share a real story about it. One of my colleagues had a terabyte-plus sized database for a data warehouse. Somebody wrote a 200 line CTE that would fetch a few rows from a combination of several multi-million row tables. It gave the results but after executing for eight hours. It is hard to withstand eight hours when you know that you need just a few rows at the end. I suggested that CTE's in the query be replaced by regular derived tables. It was a big task but when it was done, the query gave out the same results in 55 seconds. That's the kind of difference I am talking about. After that some more improvements were done to make it finally run in 10 seconds. However, we would not go into the details of that story here but would stick to differences in performance between new and old T-SQL constructs.

My sample queries for 2005 and 2000 above would show only marginal difference in execution times becuase the number of rows in the referenced tables is very small. When the number of rows are increased in the underlying tables, the difference in execution times keeps on rising with the 2000 style code outperforming the 2005 CTE code.

Now let's take up a PIVOT construct.

---Shows Sale per year by territory. PIVOT.

SELECT YEAR_of_SALE
, [1] AS Trtry_1
, [2] AS Trtry_2
, [3] AS Trtry_3
, [4] AS Trtry_4
, [5] AS Trtry_5
, [6] AS Trtry_6
FROM
(SELECT TerritoryID, TotalDue, year (OrderDate) AS YEAR_of_SALE
FROM Sales.SalesOrderHeader) SOH
PIVOT (SUM (TotalDue) FOR TerritoryID IN ( [1], [2], [3], [4], [5], [6] )) AS pvt
ORDER BY YEAR_of_SALE
GO

--The same would have been done in SQL 2000 as follows:

select YEAR_of_SALE
,sum(Trtry_1) as Trtry_1
,sum(Trtry_2) as Trtry_2
,sum(Trtry_3) as Trtry_3
,sum(Trtry_4) as Trtry_4
,sum(Trtry_5) as Trtry_5
,sum(Trtry_6) as Trtry_6
from
(select YEAR_of_SALE
,isnull((select Total_Revenue where TerritoryID = 1), 0) as Trtry_1
,isnull((select Total_Revenue where TerritoryID = 2), 0) as Trtry_2
,isnull((select Total_Revenue where TerritoryID = 3), 0) as Trtry_3
,isnull((select Total_Revenue where TerritoryID = 4), 0) as Trtry_4
,isnull((select Total_Revenue where TerritoryID = 5), 0) as Trtry_5
,isnull((select Total_Revenue where TerritoryID = 6), 0) as Trtry_6

from
(SELECT TerritoryID
,year (OrderDate) AS YEAR_of_SALE
,sum(TotalDue) Total_Revenue
FROM Sales.SalesOrderHeader
where TerritoryID between 1 and 6
GROUP BY TerritoryID, year (OrderDate)) abc) def

GROUP BY YEAR_of_SALE
ORDER BY YEAR_of_SALE
GO

Now as you can see, the 2005 query with PIVOT looks simple and neat. The equivalent query performing the same functionality but written using derived tables looks tedious. However, the second query with derived tables outperforms the first one with PIVOT. I agree that the first query with PIVOT requires much less coding effort and looks much less complex than the second one using derived tables. But what are clients interested in? Coding complexity or performance. It is always performance that everybody is interested in. When a customer clicks on a link, he/she expects results fast and doesn't really care what code is executing in the background. So we have to use code which performs better.

Now lets take another example where I combine a CTE with a PIVOT structure. Please note that below the 2005 style query with CTE and PIVOT combined, I have an equivalent for 2000 using derived tables:

--2005.
--Find number of physical sales by year and their total value.
--Break down the yearly value by territory upto six territories.

WITH Value_of_Sale (SalesOrderID, Transaction_Date, Total_Value_of_Sale)
AS
(SELECT SOD.SalesOrderID, SOH.ModifiedDate, sum(SOD.LineTotal)
FROM Sales.SalesOrderDetail AS SOD INNER JOIN Sales.SalesOrderHeader SOH
ON (SOH.SalesOrderID = SOD.SalesOrderID)
GROUP BY SOD.SalesOrderID, SOH.ModifiedDate
)
SELECT count(a.SalesOrderID) Total_Number_of_Sales
,b.YEAR_of_SALE
,sum(a.Total_Value_of_Sale) Total_Value_of_Sale
,b.Trtry_1
,b.Trtry_2
,b.Trtry_3
,b.Trtry_4
,b.Trtry_5
,b.Trtry_6
FROM Value_of_Sale a
inner join
(SELECT YEAR_of_SALE
,[1] AS Trtry_1
,[2] AS Trtry_2
,[3] AS Trtry_3
,[4] AS Trtry_4
,[5] AS Trtry_5
,[6] AS Trtry_6
FROM (SELECT TerritoryID, TotalDue, year (OrderDate) AS YEAR_of_SALE
FROM
Sales.SalesOrderHeader) SOH
PIVOT (SUM (TotalDue) FOR TerritoryID IN ( [1], [2], [3], [4], [5], [6] ) ) AS pvt ) b

on year(a.Transaction_Date) = b.YEAR_of_SALE
Group By b.YEAR_of_SALE
,b.Trtry_1
,b.Trtry_2
,b.Trtry_3
,b.Trtry_4
,b.Trtry_5
,b.Trtry_6
ORDER BY b.YEAR_of_SALE Asc
GO

--The same would have been done in SQL Server 2000 as follows:
---2000.

select count(a.SalesOrderID) Number_of_Sales
,b.YEAR_of_SALE
,sum(a.Total_Value_of_Sale) Total_Value_of_Sale
,b.Trtry_1
,b.Trtry_2
,b.Trtry_3
,b.Trtry_4
,b.Trtry_5
,b.Trtry_6
from

(SELECT SOD.SalesOrderID
,year(SOH.ModifiedDate) AS Transaction_Date
,sum(SOD.LineTotal) AS Total_Value_of_Sale
FROM Sales.SalesOrderDetail AS SOD INNER JOIN Sales.SalesOrderHeader SOH
ON (SOH.SalesOrderID = SOD.SalesOrderID)
GROUP BY SOD.SalesOrderID, year(SOH.ModifiedDate)
) a
INNER JOIN
(select YEAR_of_SALE
,sum(Trtry_1) as Trtry_1
,sum(Trtry_2) as Trtry_2
,sum(Trtry_3) as Trtry_3
,sum(Trtry_4) as Trtry_4
,sum(Trtry_5) as Trtry_5
,sum(Trtry_6) as Trtry_6
from
(select YEAR_of_SALE
,isnull((select Total_Revenue where TerritoryID = 1), 0) as Trtry_1
,isnull((select Total_Revenue where TerritoryID = 2), 0) as Trtry_2
,isnull((select Total_Revenue where TerritoryID = 3), 0) as Trtry_3
,isnull((select Total_Revenue where TerritoryID = 4), 0) as Trtry_4
,isnull((select Total_Revenue where TerritoryID = 5), 0) as Trtry_5
,isnull((select Total_Revenue where TerritoryID = 6), 0) as Trtry_6
from
(SELECT TerritoryID
,year (OrderDate) AS YEAR_of_SALE
,sum(TotalDue) Total_Revenue
FROM Sales.SalesOrderHeader
where TerritoryID between 1 and 6
GROUP BY TerritoryID, year (OrderDate)) abc) def
GROUP BY YEAR_of_SALE) b

on a.Transaction_Date = b.YEAR_of_SALE
Group By b.YEAR_of_SALE
,b.Trtry_1
,b.Trtry_2
,b.Trtry_3
,b.Trtry_4
,b.Trtry_5
,b.Trtry_6
ORDER BY b.YEAR_of_SALE Asc
GO

The 2000 code with derived tables involves much more coding and code complexity than the equivalent one in 2005. But the 2000 code outperforms the 2005 code everytime. When I incease the number of rows in the underlying referenced tables within the two queries, the different in execution times gets wider.

I am not denying SQL Server 2005 credit that it deserves. But I still love the SQL Server 2000 coding style becuase of its supeior performance. Please note that all the code was run on SQL Server 2005 Adventureworks database in 90 compatibility mode.

However, there are a few coding improvements that I love in SQL Server 2005. These are the NTILE and ROW_NUMBER() OVER constructs. The reason why I like them is that not only are they short and simple, they also perform better than their equivalnt coding constructs in SQL Server 2000. Let me share some examples below.

Let me create a situation first. An Engineering College conducts nationwide enterance exams. Thousands of students appear in the tests. Let's say only the top 12 scorers qualify for admission. The college first selects the top 12 scorers in the exam. The college then needs to rank these top 12 students based on their total score first. If for any two or more of the top 12 students the total scores match, then the one with the higher score in Mathematics is ranked higher. If total scores and score in Maths are also same, then the one with higher score in Physics is ranked higher. If the total score and individual scores in Maths, Physics and Chemistry also match, then the student with the higher age is to be ranked higher as per the College policy. Once ranks have bene allotted to the top 12 students, they are to be invited in groups/batches of four to permit them to elect their field of interest (Computer Engineering, Electronics Engineering, Electrical Engineering, Mechanical Engineering, Chemical Engineering, Civil Engineering, etc). Each group is based on the ranks of the students and needs to contain exactly four students.

Taking these requirements into consideration, lets design a small table with the top twleve scorers as follows:

--Create the table.
create table Student_Ranks
(Student_Name varchar(2)
,Maths smallint
,Physics smallint
,Chemistry smallint
,Total_Marks AS Maths + Physics + Chemistry
,Age tinyint)
GO

--Populate the table with marks.
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('A', 90, 50, 70, 17)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('B', 70, 60, 90, 16)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('C', 50, 70, 90, 17)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('D', 50, 90, 70, 16)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('E', 100, 90, 45, 16)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('F', 90, 50, 70, 16)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('G', 45, 80, 60, 16)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('H', 52, 49, 35, 17)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('I', 75, 35, 45, 17)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('J', 35, 35, 25, 17)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('K', 10, 45, 35, 17)
insert into Student_Ranks (Student_Name, Maths, Physics, Chemistry, Age)
values ('L', 5, 35, 56, 17)
GO

select * from Student_Ranks
GO

--SQL Server 2005 code for ranking students.
--Ranks students by Total Marks. If equal, then checks for Maths, then Physics, then
--Chemistry. If all equal, then ranks the older one up.

select Row_Number() Over (Order by Total_Marks desc
,Maths desc
,Physics desc
,Chemistry desc
,Age Desc)
as Student_Rank, Student_Name
from
Student_Ranks
GO

--The code otherwise in SQL 2000 that would do the ranking.
--Identity allowed only with SELECT INTO.
--To gaurentee order, Top 100 Percent, Maxdop 1 and Order By are required.

select top 100 percent Student_Name
,Student_Rank = identity(int, 1, 1)
into #Temp_Student_Ranks
from Student_Ranks
Order by Total_Marks desc, Maths desc, Physics desc, Chemistry desc, Age Desc
option (maxdop 1)

select * from #Temp_Student_Ranks
GO
--Clean up.
Drop table #Temp_Student_Ranks
GO

--The college has to call candidates for disciple allotment in batches of 4.
--Who is in which batch/group as per their rank? The first group is called first, the second next
--and so on.

select NTILE(3) Over(Order by Total_Marks desc
,Maths desc
,Physics desc
,Chemistry desc
,Age Desc)
as Batch, Student_Name from Student_Ranks
GO

--In 2000, the same grouping of batches would have been done in the following way:
select top 100 percent Student_Name
,Student_Rank = identity(int, 1, 1)
,0 as Batch
into #Temp_Student_Ranks
from Student_Ranks
Order by Total_Marks desc, Maths desc, Physics desc, Chemistry desc, Age Desc
option (maxdop 1)
GO

update #Temp_Student_Ranks
set Batch = case
when Student_Rank between 1 and 4 then 1
when Student_Rank between 5 and 8 then 2
when Student_Rank between 9 and 12 then 3
else 4
end

select Student_Name, Batch from #Temp_Student_Ranks
GO
--Clean up.
Drop table #Temp_Student_Ranks
GO

You may observe, how easy it is in 2005 to rank and group students using ROW_NUMBER OVER() and NTILE() respectively. The same holds good for other items as far as ranking and grouping is concerned and as far as I have tried. And it performs better with SQL Server 2005 code than with SQL Server 2000 style code.

In my next article, I will come up with some tricky T-SQL code. Please let me know your opinions for scope of improvement.