SQLSENSE - SQL situations and their Solutions - Suresh Kumar Maganti
Thursday, November 13, 2008
Database Mirroring in Microsoft SQL Server 2005: The finer points
Some drawbacks of log-shipping are following:
a.] You need frequent log backups of the production database.
b.] The recoverability of your DR database depends largely on the frequency of transaction log backups and the time taken to copy such backups from production to DR and the time taken to restore these backups.
c.] The log-shipped database remains dormant except having production transaction log backups getting restored on to it.
d.] From an application end-user perspective, the DR database is useful only after a disaster, not anytime else.
e.]The entire transaction log backup is restored at DR and then Uncommitted transactions are rolled back (which is fine). It is not that only committed transactions go from production to DR. So there is unnecessary avoidable network traffic.
f.] The physical transaction log backup file has to be copied to the DR causing avoidable network traffic and causing pressure on the DPC [NIC] CPU's.
h.] With log-shipping, there is always a scope for data loss in case of disasters. And companies do compromise on this front which is not needed.
i.] Too many jobs are in place to run log-shipping (to backup logs at source, copy them to DR and restore them at DR).
j.] Too many time-consuming processes involved when failure occurs especially if the database involved in huge (VLDB).
k.] Failover is always manual.
l.] Even if the data loss is of 10 minutes, the actual downtime would be atleast an hour at a minimum. This directly means loss of revenue for that period.
m.] Manual user redirection in case of failure of the main server/database is needed.
n.] Restored transaction log backup files on DR have to be deleted. Continuous copying and deleting of files make the DR look like a file server and of couse, causes the discs over there to get fragmented much faster.
Drawbacks of Replication:
a.] Complex process to set up involving snapshot replication, etc to begin with.
b.] In certain types of replication, an extra column (GUID) is created at the end of each replicated table causing an increase in disc space usage.
c.] Complex conflict resolution rules have to be put in place during setup.
d.] A failure in replication could actually mean reinitialization of subscriptions most of the time. That's a time-consuming process and means that DR would not be technically existing during that period if replication is being used for DR purposes.
e.] Two things to manage instead of one. You need to manage Production database as well as the replication for failures, conflicts, maintenance, etc.
f.] Needs some skill to manage replication and its own set of problems.
g.] Of course there are certain merits in replication like updatable subscriptions, merge replication, bi-directional transactional replication, availability of the subscriber database for reads/writes, etc. But the drawbacks above make replication a more tediuos and time-consuming operation/process than necessary.
h.] Manual failover and user redirection in case of failure of the main server/database.
What is the easiest alternative? Database Mirroring is. Why? Because of the following reasons:
a.] Easy to set up, easier still if log-shipping is already in place and needs to be replaced with database mirroring.
b.] Easy to monitor. Inbuilt Microsoft Database Mirroring Monitor in SQL Server Management Studio.
c.] Manual or automatic failover.
d.] Separate counters in Windows Performance Monitor for database mirroring.
e.] Dynamic Management Views for up to the point-in-time information on the principal and mirror databases, lag time between mirror and principal and so on.
f.] Minimal or no data loss even in asynchronous mode.
g.] Guaranteed zero-data-loss in synchronous mode.
h.] Possibility of automatic user redirection in case of failures.
i.] Hardly requires maintenance. Set it up and forget it.
j.] Easy to failover. It's a one line code to failover and failback.
k.] Makes other maintenance jobs like Windows patching and SQL Server patching easy with no downtime.
l.] Possibility to use mirror for reporting application using the snapshot feature.
m.] Requires least amount of maintenance of all the DR options.
n.] Easy to use.
o.] Easy to change between different modes of mirroring (synchronous and asynchronous).
p.] Only committed transactions flow to mirror. So less network banmndwidth consumed.
q.] No file transfers involved.
r.] No conflict resolutions needed.
s.] Miniam disc fragmentation as no copying and deleting fo files is involved.
t.] Maintenance actions on principal database automatically move over to mirror. I am talking of operations like shrinking log/data files, etc.
I am not going into the set up of mirroring as it is simple, easy to use, and easy to code. In roughly 10-15 lines of TSQL code, mirroring can be established in the desired mode.
Some points which I would consider important with respect to mirroring are as follows:
Do not install mirror database in a different node of the same cluster where principal resides. This would be a disastrous approach in an active-active cluster when one node has to failover to the other. Always keep you mirror database on a separate cluster or machine if you don't have a cluster.
Do not keep the transaction log file size unnecessarily large. If say your peak transaction log file consumption is 5GB for example, it would be good to have the actual file size within 10GB to 20GB. Don't keep it at say 50GB. That slows down writes to the transaction log file on principal and hence also on mirror. The point is don't keep too much empty space in the transaction log file on principal. Expand it as needed.
Don't create full or differential database backups on principal when you are in the process of restoring logs on to the mirror database to bring it upto speed with the principal database. If by mistake that does happen, stop any scheduled transaction log backups on principal and let the mirror sync up wth principal on its own although it would take a few minutes for the mirror to catch up and get synchronized with the principal this way. While the catching-up process is going on, it would be better to stop transaction log backups on the principal as well. That is okay. Patience helps in this avoidable scenario. After the synchronization is established, turn on the jobs for transaction log backups, full backups and differentail backups back on the principal. Please do consider this point during the setup process for database mirroring just in this avoidable case where somebody performs a full or differential database backup of the principal database by mistake while you are setting up database mirroring.
Start with asynchronous mode to begin with. Observe for a few days how the transactions are doing, whether there is any piling-up of log on mirror and how often the entities are not in sync per day. If you find synchonization 90% of the time, you can consider moving on to synchronous mode if need be.
I would suggest using TSQL code instead of the GUI for the setup. There is a lot more flexibility in TSQL coding than in the GUI. And it is a lot easier to debug/rollback/comeback/startover with TSQL during the setup process if something goes really bad due to circumstances beyond your control. Mirroring is so straightforward that it would not go wrong if you are following the 10-11 simple steps to setting up mirroring. The GUI is good for monitoring though and for setting up alerts.
Do not use synchronous mode if the CPU utilization is more than 50% on your principal server averaged for a given hour. This can potentially cause unwanted automatic failovers. In such a case, either set right you application / database code/architecture/design/indexes, etc first or use asynchronous mode of mirroring only.
suresh_m_kumar@hotmail.com
Thursday, September 4, 2008
A Tricky Scenario with T-SQL
There is a supplier of Dell, HP, and IBM servers. Servers are shipped from the warehouse to three locations. Each location is represented by a location code. Each machine to be shipped to the warehouse has a ten digit Machine_X_Code that has leading zeroes for data that has less than ten digits in it. All this is represented roughly by the Main_Table below:
create table Main_Table
(Location_Code smallint not null
,Machine_Brand varchar(25) not null
,Machine_X_Code varchar(10) not null
,Number_Of_CPUs tinyint not null
,Procurement_Date Datetime not null default GETDATE()
constraint PK_LC_MC Primary Key Clustered (Location_Code, Machine_X_Code, Procurement_Date) with fillfactor = 70)
go
There is another table called Code_Generation_Table. This table determines what the Machine_X_Code for the next procured machine would be. For a given combination of store location code and a machine brand in this Code_Generation_Table, the Machine_X_Code for the Main_Table needs to be generated. The new machine_X_Code generated should be within the Start_code and End_Code range for that Location_Code and Machine_Brand. The table is as follows:
create table Code_Generation_Table
(Location_Code smallint not null
,Machine_Brand varchar(25) not null
,Start_Code int not null
,Current_Code int not null
,End_Code int not null
,Increment_Value tinyint not null
constraint PK_LC_MB Primary Key clustered (Location_Code, Machine_Brand) with fillfactor = 90)
go
Lets populate these two tables now with relevant data:
insert into Code_Generation_Table
select 1, 'DELL', 1001, 1001, 1005, 1
union all
select 2, 'DELL', 4001, 4001, 4005, 1
union all
select 3, 'DELL', 7001, 7001, 7005, 1
union all
select 1, 'HP', 2001, 2003, 2005, 1
union all
select 2, 'HP', 5001, 5003, 5005, 1
union all
select 1, 'IBM', 3001, 3001, 3005, 1
union all
select 2, 'IBM', 8001, 8003, 8005, 1
union all
select 3, 'IBM', 6001, 6004, 6005, 1
union all
select 3, 'SUN', 6011, 6011, 6015, 1
go
insert into Main_Table (Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs)
select 1, 'DELL', '0000001001', 2
union all
select 1, 'DELL', '0000001002', 4
union all
select 1, 'DELL', '0000001003', 8
union all
select 1, 'DELL', '0000001004', 16
union all
select 1, 'HP', '0000002002', 2
union all
select 1, 'HP', '0000002003', 4
union all
select 1, 'HP', '0000002004', 8
union all
select 1, 'HP', '0000002005', 32
union all
select 1, 'IBM', '0000003003', 2
union all
select 1, 'IBM', '0000003004', 8
union all
select 1, 'IBM', '0000003005', 32
union all
select 2, 'DELL', '0000004001', 2
union all
select 2, 'DELL', '0000004002', 4
union all
select 2, 'DELL', '0000004003', 8
union all
select 3, 'DELL', '0000007004', 16
union all
select 2, 'HP', '0000005002', 2
union all
select 2, 'HP', '0000005003', 4
union all
select 2, 'HP', '0000005004', 8
union all
select 2, 'HP', '0000005005', 32
union all
select 3, 'IBM', '0000006001', 2
union all
select 3, 'IBM', '0000006003', 2
union all
select 3, 'IBM', '0000006004', 8
union all
select 3, 'IBM', '0000006005', 32
Go
Here are the conditions that must be met for generating the Machine_X_Code:
1. Start from Current_Code value in the table, Code_Generation_Table.
2. Look for any values between Current_Code and End_Code that don't already exist in
the Machine_X_Code column of the Main_Table for the given Location_Code and
Machine_Brand. That becomes the new Machine_X_Code.
3. Given a scenario like the one below in the table Code_Generation_Table, the following conditions must also be met:
Location_Code, Machine_Brand, Start_Code, Current_Code, End_Code, Increment_Value
1 DELL 1001 1003 1005 1
Scenario 1:
If there is no record in Main_Table for that Machine Brand, then new Machine_X_Code needs to be 0000001003.
Scenario 2:
If Main_Table has records with Machine_X_Codes 0000001003, 0000001004, 0000001005 then the new Machine_X_Code will be 0000001001.
Scenario 3:
If Main_Table has all records with Machine_X_Code between and including 0000001001 to 0000001005 except 0000001002, then the next newly generated Machine_X_Code should be 0000001002.
The requirements having been made clear, let us see how a developer approached this problem. He chose to create a stored procedure (SP), usp_Generate_Machine_X_Code and a user-defined function (UDF), F_Validate_Machine_Code whihc is invoked from within the SP. The SP gives out the new Machine_X_Code. The unnctionality of the code below is explained in more granular detail just before the object definitions below. The code for both is given below.
Code for the user-defined function (UDF), F_Validate_Machine_Code:
--UDF
--This function checks that thet newly generated Machine_X_Code lies between
--the Start_Code and the End_Code. If it does, then the newly generated Machine_X_Code
--represented by the variable @l_Temp_Machine_X_Value is valid. If by chance, the newly
--generated Machine_X_Code lies outside the range of Start_Code and End_Code, then
--the new machine_x_code is initialized to the Start_Code generated Machine_X_Code
--reprsented by the variable, @l_Start_Machine_X_Value. The newely generated
--machine_X_code, the Start_Code and the ENd_Code are inputs to this function.
--A supposedly valid Machine_X_Code is returned by this function. The output however
--is not guarenteed to be valid unless it is re-checked for validity in the SP,
--usp_Generate_Machine_X_Code. If it is found to be invalid, then this
--user-defined function is invoked again and again as need be.
CREATE Function DBO.F_Validate_Machine_Code
(@l_Temp_Machine_X_Value Int,
@l_Start_Machine_X_Value int,
@l_End_Machine_X_Value int
) Returns Integer
AS
Begin
-- This function validates the whether the newly generated Machine_X_Code lies within the Range of Start_Code and End_Code.
If (@l_Temp_Machine_X_Value > @l_Start_Machine_X_Value)
or (@l_Temp_Machine_X_Value < @l_End_Machine_X_Value)
Set @l_Temp_Machine_X_Value = @l_Temp_Machine_X_Value
If (@l_Temp_Machine_X_Value < @l_Start_Machine_X_Value)
or (@l_Temp_Machine_X_Value > @l_End_Machine_X_Value)
Set @l_Temp_Machine_X_Value = @l_Start_Machine_X_Value
Return @l_Temp_Machine_X_Value
End
GO
Code for the stored procedure (SP), usp_Generate_Machine_X_Code:
--This SP's purpose is to generate a new Machine_X_Code. It generates a new
--Machine_X_Code from the Current_Code column of the table, Code_Generation_Table
--while checking that the newly generated Machine_X_Code does not already exist
--in the Machine_X_Code column of the table Main_Table. In addition it call the
--user-defined function F_Validate_Machine_Code to check that a newly generated
--Machine_X_Code lies between Start_Code and End_Code in the table, Code_Generation_Table.
--If the newly generated machin_x_code falls outside that range, then the machine_x_code
--is reset to Start_Code. In this whole process it keeps updating the Current_Code column
--of the table, Code_Generation_Table with the newly generated Machine_X_Code before its
--validation through the function, F_Validate_Machine_Code. It also updates the the
--Current_Code column of the table, Code_Generation_Table with the next possible
--Machine_X_Code value. Looping occurs till a valid Machine_X_Code is found. So,
--effectively there would be as many iterations of the loop as the number of attempts to
--find one good Machine_X_Number.
Create Procedure usp_Generate_Machine_X_Code
@i_Location_Code Smallint,
@i_Machine_Brand VarChar(10),
@o_Machine_X_Code Integer OutPut,
@o_Return_Value Smallint OutPut
As
Begin
Set nocount on
Set @o_Return_Value = 0
Declare @l_Increment_Value Smallint,
@l_Temp_Machine_X_Value Integer,
@l_Next_Machine_X_Value Integer,
@l_Start_Machine_X_Value Int,
@l_End_Machine_X_Value Integer,
@l_First_Machine_X_Value_Taken Integer,
@l_Affected_Rows Smallint,
@l_Execution_String VarChar(2000)
Select @l_Temp_Machine_X_Value = Current_Code
,@l_Increment_Value = Increment_Value
,@l_Start_Machine_X_Value = Start_Code
,@l_End_Machine_X_Value = End_Code
From DBO.Code_Generation_Table With (NoLock)
Where Location_Code = @i_Location_Code
And Machine_Brand = @i_Machine_Brand
Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)
Set @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value
Set @l_Affected_Rows = 0
While @l_Affected_Rows = 0
Begin
Set @l_Next_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value
Set @l_Next_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Next_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)
Set @l_Execution_String = 'Update A Set A.Current_Code = ' + Cast(@l_Next_Machine_X_Value as VarChar) +
' From Dbo.Code_Generation_Table A '+
+ ' Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) + ' And Machine_Brand = ' + ''''+ @i_Machine_Brand + '''' +
' And Not Exists (Select ' + ''''
+ 'x' + '''' + ' From Main_Table ' +
'Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) +
' AND Machine_Brand = ' + '''' + @i_Machine_Brand + '''' +
' And Machine_X_Code = ' + '''' + Replicate('0',10 - Len(@l_Temp_Machine_X_Value)) + Cast(@l_Temp_Machine_X_Value as Varchar(10)) + '''' + ')'
Exec (@l_Execution_String)
Set @l_Affected_Rows = @@Rowcount
If @l_Affected_Rows = 1
Begin
Select @o_Machine_X_Code = @l_Temp_Machine_X_Value, @o_Return_Value = 1
Break
End
Set @l_Temp_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value
Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)
If @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value
Begin
-- For the current day all the Machine X Codes have been used up.
Select @l_Affected_Rows = 2, @o_Return_Value = -1
Break
End
End
Set nocount off
End
GO
Before we proceed further, please familiarize yourself with the logic of the SP and UDF, and the data in the two tables so that we can try to execute his logic. Let us try to generate a new Machine_X_Code for the Location_Code = 1 and Brand_Name = 'Dell'.
TEST 1:
Execute the following:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 1
and Machine_Brand = 'Dell'
The result set is as follows:
Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
1 DELL 0000001001 2
1 DELL 0000001002 4
1 DELL 0000001003 8
1 DELL 0000001004 16
So the next generated machine_x_code to has to be 1005. Remember zeroes are suffixed later and that is not so important right now. What is important is generating the new Machine_X_Code that meets all the requirements above.
Now run the following code:
select * from Code_Generation_Table
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1
1 HP 2001 2003 2005 1
1 IBM 3001 3001 3005 1
2 DELL 4001 4001 4005 1
2 HP 5001 5003 5005 1
2 IBM 8001 8003 8005 1
3 DELL 7001 7001 7005 1
3 IBM 6001 6004 6005 1
3 SUN 6011 6011 6015 1
Since the codes (1001, 1002, 1003 and 1004) already exists in the Main_Table for Location_Code =1 and Machine_Brand = 'Dell', the next one has to be 1005 as it exists in the Code_Generation_Table between Start_Code and End_Code. Let's see if the SP generates this machine code. Execute the following code to execute the SP:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 1
,@i_Machine_Brand = 'Dell'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
1005 1
We got the exact machine code. Now let's check the table Code_Generation_Table for the column Current_Code by executing the following code:
select * from Code_Generation_Table
where Location_Code = 1
and Machine_Brand = 'Dell'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1
You may note that the Current_Code has been updated. It has actually been initialized with the next possible Current_Code because 1005 has been taken by us via execution of the SP. So the next available value can be 1006. But since 1006 is outside the range of 1001 and 1005, the next possible value has to be re-initialized to the start_code which is 1001. So this condition has also been satisfied. Very well.
TEST 2:
Now let's test another condition:
Scenario 1:
For Location_Code = 2 and Machine_Brand = 'IBM', there is no record in the Main_Table. Let's check by executing the following code:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'IBM'
(0 rows affected)
Let's check in the Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8003 8005 1
So as per the given condition, the new machine_x_code shoudl be 8003 and the Current_Code column of the table, Code_Generation_Table should get updated with 8004. So let's execute the SP from the developer by executng the following code:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 2
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
8003 1
It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8004 8005 1
So the new Machine_X_Number generated in 8003 which is perfect. And the Crrent_Code column has also been updated to the next possible Machine_X_Number which is 8004. So the Scenario 1 has been satisfied.
TEST 3:
Let's check out if Scenario 2 is satisfied:
For Location_Code = 2 and Machine_Brand = 'HP', there are four records in the Main_Table with machine codes (0000005002, 0000005003, 0000005004 and 0000005005). Let's check by executing the following code:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'HP'
Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
2 HP 0000005002 2
2 HP 0000005003 4
2 HP 0000005004 8
2 HP 0000005005 32
Let's check in the Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5003 5005 1
So as per the given condition for Scenario 2, the new machine_x_code generated should be 5001 and the Current_Code column of the table, Code_Generation_Table should get updated with 8002. So let's execute the SP from the developer by executng the following code:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 2
,@i_Machine_Brand = 'HP'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
5001 1
It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5002 5005 1
So the new Machine_X_Number generated in 5001 which is perfect. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 5002. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exists within the range of Start_Code and End_Code. So the Scenario 2 has been satisfied.
TEST 4:
Let's check the Scenario 3, the final scenario:
For Location_Code = 3 and Machine_Brand = 'IBM', there are four records in the Main_Table with machine codes (0000006001, 0000006003, 0000006004 and 0000006005). Let's check by executing the following code:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 3
and Machine_Brand = 'IBM'
Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
3 IBM 0000006001 2
3 IBM 0000006003 2
3 IBM 0000006004 8
3 IBM 0000006005 32
Let's check in the Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6004 6005 1
So as per the given condition for Scenario 3, the new machine_x_code generated should be 6002 and the Current_Code column of the table, Code_Generation_Table should get updated with 6003. So let's execute the SP from the developer by executng the following code:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 3
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
6002 1
It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6003 6005 1
So the new Machine_X_Number generated in 6002 which was the missing code in Main_Table for Location_Code = 3 and Machine_Brand = 'IBM'. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 6003. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exist within the range of Start_Code and End_Code. So the Scenario 3 has been satisfied.
The developer has satisifed all the functional requirements. Good job. Now what are the drawbacks in his code? Let's enlist them:
1. He uses an SP and a UDF, two database objects for one purpose. This is a minor drawback though.
2. Looping is being used. In a rare case, it may have just one iteration. But in almost all cases it would have multiple iterations upto a maximum of five iterations. We have a simple set of scenarios here. If the range between Start_Code and Stop_Code runs into several hundreds, the looping will happen those many times most of the time. And the invocation of the UDF would take place double those many times for generating just one new Machine_X_Number.
3. The Update to Code_Generation_Table takes place as many times as the number of iterations which could run into several hundred times. So many UPDATE's just to generate one Machine_X_Number seems too much processing. It should be avoided.
4. The code is accessing the Main_Table as many times as the number of iterations. In our case the Main_Table has just 22 rows. In the real situation, it actually has hundreds of thousands of rows. Why scan that table so many times just to generate one valid Machine_X_Number?
5. The SP performs record-by-record operation or row-by-row. An attempt should be made to perform set based operation for faster and efficient execution. That would also reduce scans of the underlying tables.
These are only a few drawbacks and there are others like usage of dynamic T-SQL which is not needed in this case, etc. We won't go into those detailed aspects in this article. We will concentrate more on logical programming.
Let's try to satisfy the same scenarios after making some refinements to the SP and by eliminating the user-defined function altogether.
A refined version of the SP is given below:
--This SP's purpose is to generate a new Machine_X_Code. It generates a new
--Machine_X_Code from the Current_Code column of the table, Code_Generation_Table
--while checking that the newly generated Machine_X_Code does not already exist
--in the Machine_X_Code column of the table Main_Table. It also checks that a newly
--generated Machine_X_Code lies between Start_Code and End_Code in the table,
--Code_Generation_Table.
--If the newly generated machine_x_code falls outside that range, then the machine_x_code
--is reset to Start_Code. In this whole process it updates the Current_Code column
--of the table, Code_Generation_Table with the newly generated Machine_X_Code once
--or at the most twice. The function, F_Validate_Machine_Code has been eliminated altogether.
--It also updates the the Current_Code column of the table, Code_Generation_Table with
--the next possible Machine_X_Code value. If at all, Looping occurs only twice in a rare case.
--So effectively, there would be no looping except in a scenario where there would be two iterations
--of the code when a new machine_x_code does not exist between the Current_Code and
--the End_Code. In such a scenario, it would check the valid machine codes from Start_Code
--through Current_Code in the second iteration.
Create Procedure usp_Generate_Machine_X_Code
@i_Location_Code Smallint,
@i_Machine_Brand VarChar(10),
@o_Machine_X_Code Integer OutPut,
@o_Return_Value Smallint OutPut
As
Begin
Set nocount on
Set @o_Return_Value = 0
/* WHILE Loop eliminated. Main_Table is now being referenced only once. New code below.
Set @o_Return_Value = 0
Declare @l_Increment_Value Smallint,
@l_Temp_Machine_X_Value Integer,
@l_Next_Machine_X_Value Integer,
@l_Start_Machine_X_Value Int,
@l_End_Machine_X_Value Integer,
@l_First_Machine_X_Value_Taken Integer,
@l_Affected_Rows Smallint,
@l_Execution_String VarChar(2000)
Select @l_Temp_Machine_X_Value = Current_Code
,@l_Increment_Value = Increment_Value
,@l_Start_Machine_X_Value = Start_Code
,@l_End_Machine_X_Value = End_Code
From DBO.Code_Generation_Table With (NoLock)
Where Location_Code = @i_Location_Code
And Machine_Brand = @i_Machine_Brand
Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)
Set @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value
Set @l_Affected_Rows = 0
While @l_Affected_Rows = 0
Begin
Set @l_Next_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value
Set @l_Next_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Next_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)
Set @l_Execution_String = 'Update A Set A.Current_Code = ' + Cast(@l_Next_Machine_X_Value as VarChar) +
' From Dbo.Code_Generation_Table A '+
+ ' Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) + ' And Machine_Brand = ' + ''''+ @i_Machine_Brand + '''' +
' And Not Exists (Select ' + ''''
+ 'x' + '''' + ' From Main_Table ' +
'Where Location_Code = ' + Cast(@i_Location_Code AS VarChar) +
' AND Machine_Brand = ' + '''' + @i_Machine_Brand + '''' +
' And Machine_X_Code = ' + '''' + Replicate('0',10 - Len(@l_Temp_Machine_X_Value)) + Cast(@l_Temp_Machine_X_Value as Varchar(10)) + '''' + ')'
Exec (@l_Execution_String)
Set @l_Affected_Rows = @@Rowcount
If @l_Affected_Rows = 1
Begin
Select @o_Machine_X_Code = @l_Temp_Machine_X_Value, @o_Return_Value = 1
Break
End
Set @l_Temp_Machine_X_Value = @l_Temp_Machine_X_Value + @l_Increment_Value
Set @l_Temp_Machine_X_Value = DBO.F_Validate_Machine_Code (@l_Temp_Machine_X_Value, @l_Start_Machine_X_Value, @l_End_Machine_X_Value)
If @l_First_Machine_X_Value_Taken = @l_Temp_Machine_X_Value
Begin
-- For the current day all the Machine_X_Codes have been used up.
Select @l_Affected_Rows = 2, @o_Return_Value = -1
Break
End
*/
declare @l_next_Machine_X_Code int
,@l_Start_Machine_X_Code int
,@l_End_Machine_X_Code int
,@l_Current_Machine_X_Code int
,@l_Increment_Value tinyint
select @l_Start_Machine_X_Code = Start_Code
,@l_End_Machine_X_Code = End_Code
,@l_Increment_Value = Increment_Value
,@l_Current_Machine_X_Code = Current_Code
from Code_Generation_Table
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand
if ((@l_Current_Machine_X_Code < @l_Start_Machine_X_Code) or (@l_Current_Machine_X_Code > @l_End_Machine_X_Code))
begin
Start_Processing:
update Code_Generation_Table
set @l_Current_Machine_X_Code = Current_Code = @l_Start_Machine_X_Code
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand
end
if object_id ('tempdb.dbo.#Temp_Code_Table') is not null
drop table #Temp_Code_Table
create table #Temp_Code_Table
(Existing_Machine_Codes int
,Valid_machine_Codes int
,Serial_No int identity(0,1))
insert into #Temp_Code_Table (Existing_Machine_Codes, Valid_machine_Codes)
select convert(int, Machine_X_Code), @l_Current_Machine_X_Code from dbo.Main_Table
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand
order by Machine_X_Code
if ((select COUNT(1) from #Temp_Code_Table) = 0)
begin
select @l_next_Machine_X_Code = @l_Current_Machine_X_Code
goto Final_Validation
end
update #Temp_Code_Table
set Valid_machine_Codes = Valid_machine_Codes + (Serial_No * @l_Increment_Value)
select @l_next_Machine_X_Code = min(a.Valid_machine_Codes) from #Temp_Code_Table a
where a.Valid_machine_Codes not in (select b.Existing_Machine_Codes from #Temp_Code_Table b)
and a.Valid_machine_Codes between @l_Start_Machine_X_Code and @l_End_Machine_X_Code
if (@l_next_Machine_X_Code is null)
begin
select @l_next_Machine_X_Code = max(Valid_machine_Codes) + @l_Increment_Value from #Temp_Code_Table
end
Final_Validation:
if (@l_next_Machine_X_Code > @l_End_Machine_X_Code)
begin
goto Start_Processing
end
if (@l_next_Machine_X_Code < @l_Start_Machine_X_Code) or (@l_next_Machine_X_Code > @l_End_Machine_X_Code)
begin
select @l_next_Machine_X_Code = @l_Start_Machine_X_Code
,@o_Return_Value = -1
set nocount off
if object_id ('tempdb.dbo.#Temp_Code_Table') is not null
drop table #Temp_Code_Table
return
end
if (@l_next_Machine_X_Code >= @l_Start_Machine_X_Code) and ((@l_next_Machine_X_Code >= @l_Start_Machine_X_Code) and (@l_next_Machine_X_Code <= @l_End_Machine_X_Code))
begin
update Code_Generation_Table with (readpast, rowlock, xlock)
set Current_Code = case
when @l_next_Machine_X_Code = @l_End_Machine_X_Code then @l_Start_Machine_X_Code
else @l_next_Machine_X_Code + @l_Increment_Value
end
where Location_Code = @i_Location_Code
and Machine_Brand = @i_Machine_Brand
if (@@ROWCOUNT <> 0)
begin
select @o_Machine_X_Code = @l_next_Machine_X_Code
,@o_Return_Value = 1
end
else
set @o_Return_Value = -1
end
if object_id ('tempdb.dbo.#Temp_Code_Table') is not null
drop table #Temp_Code_Table
set nocount off
return
End
GO
The new logic in the SP is given in the comments before the Create Procedure statement. The calls to function have been eliminated. This SP checks for valid Machine_X_Code values between Current_Code and End_Code. If not found, it checks for valid Machine_X_Code values between Start_Code and Current_Code. This is because there is a functional resttriction that the search should always start from the Current_Code only.
Let's now start from the beginning by emptynig (truncating) bith the tables, Code_Generation_Table and Main_Table and filling (inserting) the same data as above into them afresh by executing the following code:
truncate table Code_Generation_Table
truncate table Main_Table
Go
insert into Code_Generation_Table
select 1, 'DELL', 1001, 1001, 1005, 1
union all
select 2, 'DELL', 4001, 4001, 4005, 1
union all
select 3, 'DELL', 7001, 7001, 7005, 1
union all
select 1, 'HP', 2001, 2003, 2005, 1
union all
select 2, 'HP', 5001, 5003, 5005, 1
union all
select 1, 'IBM', 3001, 3001, 3005, 1
union all
select 2, 'IBM', 8001, 8003, 8005, 1
union all
select 3, 'IBM', 6001, 6004, 6005, 1
union all
select 3, 'SUN', 6011, 6011, 6015, 1
go
insert into Main_Table (Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs)
select 1, 'DELL', '0000001001', 2
union all
select 1, 'DELL', '0000001002', 4
union all
select 1, 'DELL', '0000001003', 8
union all
select 1, 'DELL', '0000001004', 16
union all
select 1, 'HP', '0000002002', 2
union all
select 1, 'HP', '0000002003', 4
union all
select 1, 'HP', '0000002004', 8
union all
select 1, 'HP', '0000002005', 32
union all
select 1, 'IBM', '0000003003', 2
union all
select 1, 'IBM', '0000003004', 8
union all
select 1, 'IBM', '0000003005', 32
union all
select 2, 'DELL', '0000004001', 2
union all
select 2, 'DELL', '0000004002', 4
union all
select 2, 'DELL', '0000004003', 8
union all
select 3, 'DELL', '0000007004', 16
union all
select 2, 'HP', '0000005002', 2
union all
select 2, 'HP', '0000005003', 4
union all
select 2, 'HP', '0000005004', 8
union all
select 2, 'HP', '0000005005', 32
union all
select 3, 'IBM', '0000006001', 2
union all
select 3, 'IBM', '0000006003', 2
union all
select 3, 'IBM', '0000006004', 8
union all
select 3, 'IBM', '0000006005', 32
Go
Having filled up the data, it's time to test this new version of the SP.
Let us try to generate a new Machine_X_Code for the Location_Code = 1 and Brand_Name = 'Dell'.
TEST 1:
Execute the following:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 1
and Machine_Brand = 'Dell'
The result set is as follows:
Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
1 DELL 0000001001 2
1 DELL 0000001002 4
1 DELL 0000001003 8
1 DELL 0000001004 16
So the next generated machine_x_code to has to be 1005. Remember zeroes are suffixed later and that is not so important right now. What is important is generating the new Machine_X_Code that meets all the requirements above.
Now run the following code:
select * from Code_Generation_Table
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1
1 HP 2001 2003 2005 1
1 IBM 3001 3001 3005 1
2 DELL 4001 4001 4005 1
2 HP 5001 5003 5005 1
2 IBM 8001 8003 8005 1
3 DELL 7001 7001 7005 1
3 IBM 6001 6004 6005 1
3 SUN 6011 6011 6015 1
Since the codes (1001, 1002, 1003 and 1004) already exist in the Main_Table for Location_Code = 1 and Machine_Brand = 'Dell', the next one has to be 1005 as it exists in the Code_Generation_Table between Start_Code and End_Code. Let's see if the SP generates this machine code. Execute the following code to execute the SP:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 1
,@i_Machine_Brand = 'Dell'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
1005 1
We got the exact machine code. Now let's check the table Code_Generation_Table for the column Current_Code by executing the following code:
select * from Code_Generation_Table
where Location_Code = 1
and Machine_Brand = 'Dell'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
1 DELL 1001 1001 1005 1
You may note that the Current_Code has been updated. It has actually been initialized with the next possible Current_Code because 1005 has been taken by us via execution of the SP. So the next available value can be 1006. But since 1006 is outside the range of 1001 and 1005, the next possible value has to be re-initialized to the start_code which is 1001. So this condition has also been satisfied by the new version of the SP.
TEST 2:
Now let's test another condition:
Scenario 1:
For Location_Code = 2 and Machine_Brand = 'IBM', there is no record in the Main_Table. Let's check by executing the following code:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'IBM'
(0 rows affected)
Let's check in the Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8003 8005 1
So as per the given condition, the new machine_x_code shoudl be 8003 and the Current_Code column of the table, Code_Generation_Table should get updated with 8004. So let's execute the SP from the developer by executng the following code:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 2
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
8003 1
It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 IBM 8001 8004 8005 1
So the new Machine_X_Number generated in 8003 which is perfect. And the Crrent_Code column has also been updated to the next possible Machine_X_Number which is 8004. So the Scenario 1 has been satisfied by the new SP.
TEST 3:
Let's check out if Scenario 2 is satisfied:
For Location_Code = 2 and Machine_Brand = 'HP', there are four records in the Main_Table with machine codes (0000005002, 0000005003, 0000005004 and 0000005005). Let's check by executing the following code:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 2
and Machine_Brand = 'HP'
Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
2 HP 0000005002 2
2 HP 0000005003 4
2 HP 0000005004 8
2 HP 0000005005 32
Let's check in the Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5003 5005 1
So as per the given condition for Scenario 2, the new machine_x_code generated should be 5001 and the Current_Code column of the table, Code_Generation_Table should get updated with 8002. So let's execute the SP from the developer by executng the following code:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 2
,@i_Machine_Brand = 'HP'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
5001 1
It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 2
and Machine_Brand = 'HP'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
2 HP 5001 5002 5005 1
So the new Machine_X_Number generated in 5001 which is perfect. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 5002. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exists within the range of Start_Code and End_Code. So the Scenario 2 has been satisfied by the new SP.
TEST 4:
Let's check the Scenario 3, the final scenario:
For Location_Code = 3 and Machine_Brand = 'IBM', there are four records in the Main_Table with machine codes (0000006001, 0000006003, 0000006004 and 0000006005). Let's check by executing the following code:
select Location_Code, Machine_Brand, Machine_X_Code, Number_Of_CPUs from Main_Table
where Location_Code = 3
and Machine_Brand = 'IBM'
Location_Code Machine_Brand Machine_X_Code Number_Of_CPUs
------------- ------------------------- -------------- --------------
3 IBM 0000006001 2
3 IBM 0000006003 2
3 IBM 0000006004 8
3 IBM 0000006005 32
Let's check in the Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6004 6005 1
So as per the given condition for Scenario 3, the new machine_x_code generated should be 6002 and the Current_Code column of the table, Code_Generation_Table should get updated with 6003. So let's execute the SP from the developer by executng the following code:
declare @i_Location_Code Smallint
,@i_Machine_Brand VarChar(10)
,@o_Machine_X_Code Integer
,@o_Return_Value Smallint
select @i_Location_Code = 3
,@i_Machine_Brand = 'IBM'
,@o_Return_Value = 0
execute usp_Generate_Machine_X_Code
@i_Location_Code
,@i_Machine_Brand
,@o_Machine_X_Code OutPut
,@o_Return_Value OutPut
select @o_Machine_X_Code Machine_Code
,@o_Return_Value Return_Value
Here is the result set:
Machine_Code Return_Value
------------ ------------
6002 1
It is perfect. Let's check the Current_Code column of the table, Code_Generation_Table by executing the following code:
select * from Code_Generation_Table
where Location_Code = 3
and Machine_Brand = 'IBM'
The result set is as follows:
Location_Code Machine_Brand Start_Code Current_Code End_Code Increment_Value
------------- ------------ ----------- ------------ ----------- ---------------
3 IBM 6001 6003 6005 1
So the new Machine_X_Number generated in 6002 which was the missing code in Main_Table for Location_Code = 3 and Machine_Brand = 'IBM'. And the Current_Code column has also been updated to the next possible Machine_X_Number which is 6003. The next possible Machine_X_Number should be equal to the output of the SP + Increment_Value and should exist within the range of Start_Code and End_Code. So the Scenario 3 has been satisfied by the new SP.
We too have satisifed all the functional requirements with less scan of the tables Main_Table and Code_Generation_Table. We are not looping except in one scenario where we loop only twice at the most. No matter what the range of values between Start_Code and End_COde, we would scan any table once or at the most twice. Update to Code_Generation_Table also happenns only two times at the most. So the new code can handle larger volumes of data than the original code.
There is another method of optimiznig this SP further whereby there is only once scan of any table. May be in my next article, I would write about it as I am quite tired today. I feel any SP can be optimized no matter how optimized it already is. The dead end is always too distant. I hope you have enjoyed the article. Please let me know of your feedback which is always very welcome.
Thursday, March 13, 2008
TSQL and Excel - Data Exchanges
Let's complicate the requirement further. We have to write today's Top three stocks from the database on to a spreadsheet, i.e. the top three gainers in the Stock Market today. We have to also write the top three losing stocks in the Stock Market today into the same Excel spreadsheet but in a different tab. This single Excel spreadsheet with the two tabs, one for gainers and one for losers needs to be e-mailed to some of the investors daily. We would put the e-mailing issue aside as it is quite simple using SQL Server 2005 Database Mail or SQL Server 2000 SQL Mail.
The second requirement is that the data in both the Excel spreadsheet tabs needs to be overwritten with fresh data daily. The source of all data is just one database table which is fed daily. There are two more tabs in the spreadsheet that will have the consolidated gainers and losers respectively for all days as the days progress.
It might have been so simple if this overwritting on two tabs (Gainers and Losers) and appending onto two other tabs (AllGainers and AllLosers) was not required. It would have also been so simple if we could have had four separate spreadsheets instead of four tabs in one single spreadsheet. Then we could have simply used the simple BCP and added a little bit of complication in the bcp command to include the Column Headers in there. But that is not the case as we need to keep the solution simple. Some people say, use CLR and do it row by row. Some say, use VC++.
The issue here is to find a simple solution and not complicate the solution further. We don't want assemblies and DLL's to be prepared, compiled, registered and then modified whenever something needs to be changed. So CLR, VC++ and the like are not an option. We are talking about efficient programming here. And since we are talking about programming and not about packages, DTS and SSIS are also not an option. We need minimum maintenance too. What do we do?
We resort to our contemporary T-SQL database programming language. T-SQL is simple, neat, prolific, dynamic and very flexible. Why not explore OPENDATASOURCE and OPENROWSET? Let's go about solving the problem now.
So let me prepare the data first. Let's create a table called Table_Stock_Value and populate the same. This article is just about concepts. So please don't bother about constraints, indexes and fillfactors right now. All symbols in the data are hypothetical.
if exists (select 1 from sys.tables
where name = 'Table_Stock_Value'
and type = 'U')
drop table Table_Stock_Value
go
create table Table_Stock_Value
(SerialNo int identity (1,1) not null
,NASDAQ_SYMBOL varchar(4) not null
,Company_Name varchar(25) not null
,Change decimal(5,2)
,Date_of_Change datetime not null default getdate())
go
insert into Table_Stock_Value (NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change)
--TOP GAINERS
select 'ABCD', 'ABCD Inc', 2.50, getdate()-3
union all
select 'EFGH', 'EFGH Inc', 2.00, getdate()-3
union all
select 'IJKL', 'IJKL Inc', 1.50, getdate()-3
union all
select 'IJKL', 'IJKL Inc', 1.75, getdate()-2
union all
select 'MNOP', 'MNOP Inc', 1.60, getdate()-2
union all
select 'ABCD', 'ABCD Inc', 1.00, getdate()-2
union all
select 'QRST', 'QRST Inc', 3.50, getdate()-1
union all
select 'UVWX', 'UVWX Inc', 2.50, getdate()-1
union all
select 'ABCD', 'ABCD Inc', 1.25, getdate()-1
union all
select 'IJKL', 'IJKL Inc', 4.95, getdate()
union all
select 'ABCD', 'ABCD Inc', 2.50, getdate()
union all
select 'MNOP', 'MNOP Inc', 1.25, getdate()
union all
select 'ABCD', 'ABCD Inc', 1.90, getdate()+1
union all
select 'EFGH', 'EFGH Inc', 1.20, getdate()+1
union all
select 'QRST', 'QRST Inc', 1.10, getdate()+1
union all
select 'ABCD', 'ABCD Inc', 1.10, getdate()+2
union all
select 'UVWX', 'UVWX Inc', 1.00, getdate()+2
union all
select 'QRST', 'QRST Inc', 0.95, getdate()+2
union all
select 'ABCD', 'ABCD Inc', 1.50, getdate()+3
union all
select 'EFGH', 'EFGH Inc', 1.25, getdate()+3
union all
select 'IJKL', 'IJKL Inc', 1.00, getdate()+3
UNION ALL
--TOP LOSERS
select 'BBBB', 'BBBB Inc', -1.50, getdate()-3
union all
select 'CCCC', 'CCCC Inc', -1.20, getdate()-3
union all
select 'DDDD', 'DDDD Inc', -1.10, getdate()-3
union all
select 'EEEE', 'EEEE Inc', -1.75, getdate()-2
union all
select 'FFFF', 'FFFF Inc', -1.60, getdate()-2
union all
select 'YYYY', 'YYYY Inc', -1.00, getdate()-2
union all
select 'IJKL', 'IJKL Inc', -0.70, getdate()-1
union all
select 'HHHH', 'HHHH Inc', -0.62, getdate()-1
union all
select 'CCCC', 'CCCC Inc', -0.25, getdate()-1
union all
select 'UUUU', 'UUUU Inc', -4.95, getdate()
union all
select 'LLLL', 'LLLL Inc', -2.50, getdate()
union all
select 'AAAA', 'AAAA Inc', -1.25, getdate()
union all
select 'UVWX', 'UVWX Inc', -1.90, getdate()+1
union all
select 'FFFF', 'FFFF Inc', -1.20, getdate()+1
union all
select 'CCCC', 'CCCC Inc', -1.10, getdate()+1
union all
select 'CCCC', 'CCCC Inc', -0.80, getdate()+2
union all
select 'BBBB', 'BBBB Inc', -0.75, getdate()+2
union all
select 'EEEE', 'EEEE Inc', -0.65, getdate()+2
union all
select 'EEEE', 'EEEE Inc', -1.50, getdate()+3
union all
select 'YYYY', 'YYYY Inc', -1.25, getdate()+3
union all
select 'QRST', 'QRST Inc', -1.00, getdate()+3
go
create clustered index ci_date on Table_Stock_Value (Date_of_Change) with fillfactor = 70
go
select * from Table_Stock_Value
go
Next for one time only, we need to prepare an Excel spreadsheet with four tabs in it. The names of the tabs are, Gainers, Losers, AllGainers, and AllLosers. And each tab can have the column names same or different than in the table, Table_Stock_Value created above. For simplicity, let's keep the column names same in the tabs. I have created and kept the Excel spreadsheet in the folder, C:\SQLXL. I have named the Excel spreadsheet as Stock_Overview.xls.
Please remember to close the Excel sheet before the initial compilation of the SP and before each instance of execution of the SP.
Now, let's set out to do what we need to do i.e. writing into the Excel spreadsheet's different tabs as per the requirements stated above. I have written the SP as follows. Please note that the SQL Server advanced configuration option, 'Ad Hoc Distributed Queries' should be set to 1 to read from and write to the Excel Spreadsheet from within a database Stored Procedure:
set ansi_warnings on
go
set ansi_nulls on
Go
execute sp_configure 'show advanced options', 1
go
reconfigure
go
execute sp_configure 'Ad Hoc Distributed Queries', 1
go
reconfigure
go
if exists (select 1 from sys.procedures
where type = 'P'
and [NAME] = 'DBP_Prepare_Daily_Stock_Data')
drop procedure [DBP_Prepare_Daily_Stock_Data]
GO
create procedure DBP_Prepare_Daily_Stock_Data
@i_date datetime = null
as
begin
declare @l_File_Exists int
select @i_date = isnull(@i_date, getdate())
execute master.dbo.XP_fileexist 'C:\SQLXL\Stock_Overview.xls', @l_File_Exists output
if (@l_File_Exists = 1)
begin
--Gainers have to be overwritten everyday.
UPDATE OPENDATASOURCE ('Microsoft.Jet.OleDB.4.0',
'Data Source="C:\SQLXL\Stock_Overview.xls";
extended Properties=Excel 8.0')...Gainers$
SET SerialNo = ''
,NASDAQ_SYMBOL = ''
,Company_Name = ''
,Change = ''
,Date_of_Change = ''
WHERE SerialNo is not null
--Losers have to be overwritten everyday.
UPDATE OPENDATASOURCE ('Microsoft.Jet.OleDB.4.0',
'Data Source="C:\SQLXL\Stock_Overview.xls";
extended Properties=Excel 8.0')...Losers$
SET SerialNo = ''
,NASDAQ_SYMBOL = ''
,Company_Name = ''
,Change = ''
,Date_of_Change = ''
WHERE SerialNo is not null
--UPDATE/Overwrite the Gainers tab with top 3 gainers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [Gainers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change > 0.00
order by Change desc
--UPDATE/Overwrite the Losers tab with top 3 losers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [Losers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change < 0.00
order by Change desc
--UPDATE/APPEND the AllGainers tab with top 3 gainers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [AllGainers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change > 0.00
order by Change desc
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [AllLosers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change < 0.00
order by Change desc
PRINT 'RESULTS HAVE BEEN WRITTEN TO C:\SQLXL\Stock_Overview.xls'
end
else
Begin
Print 'The file Stock_Overview.xls should exist in the correct path, C:\SQLXL.'
PRINT 'RESULTS COULD NOT BE WRITTEN TO C:\SQLXL\Stock_Overview.xls. THE FILE IS OPEN OR DOES NOT EXIST.'
End
end
GO
The SP being ready now, let's check if the all the tabs in the Excel spreadsheet,
C:\SQLXL\Stock_Overview.xls are empty:
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go
They are empty.
Let's execute the SP for the earliest date in the table Table_Stock_Value, i.e. getdate()-3:
declare @i_date datetime
select @i_date = getdate()-3
exec DBP_Prepare_Daily_Stock_Data @i_date
go
Now open the spreadsheet, C:\SQLXL\Stock_Overview.xls and check it out. The data has been written to the respective tabs for the given input date to the SP, DBP_Prepare_Daily_Stock_Data.
You can also execute the following T-SQL code from your SQL Server database to check the same:
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go
The data in the spreadsheet and the result sets above would match.
Let's execute the SP for the next date in the table Table_Stock_Value, i.e. getdate()-2:
declare @i_date datetime
select @i_date = getdate()-3
exec DBP_Prepare_Daily_Stock_Data @i_date
go
Now open the spreadsheet, C:\SQLXL\Stock_Overview.xls and check it out. The data has been written to the respective tabs for the given input date to the SP, DBP_Prepare_Daily_Stock_Data. Please close the spreadsheet after reading its contents.
To verfiy the results using T-SQL itself without opening the spreadsheet, you can use the following code too and verify the results:
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go
Note that we have met the requirement.
Let's do it for getdate()-1 and getdate() as well.
declare @i_date datetime
select @i_date = getdate()-1
exec DBP_Prepare_Daily_Stock_Data @i_date
Now open the Excel Spreadsheet. You would find the Gainer and Losers tabs overwritten with the top 3 gainers and top 3 losers respectively for the date, getdate()-1. You would also notice that the same data has been appended respectively to AllGainers and AllLosers tabs within the same spreadsheet. So, the Gainers and Losers tabs always have three rows each and these rows pertain specifically to the given input date to the SP. The AllGainers and AllLosers tabs have their number of rows growing with each execution of the SP as new rows get appended behind the older rows thus building up a historical data for later analysis. And that's all we want.
Let's execute the SP for @i_date = getdate().
declare @i_date datetime
select @i_date = getdate()
exec DBP_Prepare_Daily_Stock_Data @i_date
go
Now open the Excel Spreadsheet. You would find the Gainer and Losers tabs overwritten with the top 3 gainers and top 3 losers respectively for the given input date, getdate(). You would also notice that the same data has been appended respectively to AllGainers and AllLosers tabs within the same spreadsheet. So, the Gainers and Losers tabs always have three rows each and these rows pertain specifically to the given input date for top three gainers and losers respectively for the respective tabs. The AllGainers and AllLosers tabs have their number of rows growing with each execution as new rows get appended behind the older rows thus building up a historical data for later analysis. We are done with verifying that we have satisfied the requirements that we initially started out with.
You can checkout the contents of the spreasheet by just opening it from its location. Alternatively, you can use the following code. Please close the spreadsheet after physically reading its contents.:
select * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go
Please remember to close the Excel sheet before each execution of the SP.
Now let me throw a little bit of explanation behind overwriting and appending the Excel spreadsheet tabs:
First of all, in order to read from and write to an external data source like the Excel spreadsheet in this case while not using the BCP or BULK INSERT commands, the advanced SQL Server configuration option 'Ad Hoc Distributed Queries' must be set to 1. This way, writes to and reads from the Excel spreadsheet get enabled. This's what we did first.
Next, we UPDATE any existing records in the Gainers and Losers tabs with an empty string, ''. This is equivalent to DELETEing all existing records in both these tabs. Even then, the empty characher string remains in the existing cells although it is not visible.
The next step is to overwrite these existing cells in the Gainers and Losers tabs. For this purpose, we use the INSERT statement. Please note that we use the starting and ending cells of the first row in the Gainers and Losers tab in the OPENDATASOURCE statement (A1:E1). This will enable overwriting the existing cells. The smart thing here is that any empty strings '' are no more visible even if they exist somwhere else in the spreadsheet.
For the tabs AllGainers and AllLosers, since we don't want to overwrite existing data but instead want to append new data to any existing data, we omit mentioning the starting row A1:E1 in the OPENDATASOURCE function of the INSERT statement. That ensures that new rows are always appended to the existing rows in these two tabs.
You can use the spreadsheet and its tabs as if they were individual tables, for reading and performing calculations. You can apply all types of T-SQL constructs on that data to make aggregations, etc. Please note the sample T-SQL code above that we used to check the result sets in the tabs of the spreadsheet after executing the SP.
One important point to note is that before compiling the SP, the spreadsheet with the appropriate tabs in it should exist in the correct path mentioned in the SP. Otherwise the SP will not get compiled and will instead throw an error.
Another important point is that during compilation and execution of the SP, the spreadsheet itself should not be physically openend from the Windows Explorer.
Ofcourse, sufficient scope exists for making improvisations to the SP to calculate the top stock for the month or for generating some special reports, etc. I am limited myself to just explaining the manipulation of an Excel spreadseet and its tabs using T-SQL. And the example has been designed to just show that. Using Database Mail to e-mail the spreadsheet is outside the scope of this article.
I hope you like the technique of writing to and manipulating spreadsheets and their tabs using simple T-SQL. Please let me know of your comments and thoughts.
Monday, March 10, 2008
Simplifying Code: Updating Variables
First let us build the base table, Order_Master:
create table Order_Master
(Order_ID smallint not null
,Begin_Date datetime not null
,End_Date datetime not null
,Delivery_Location varchar(50) not null)
go
Now, let's put in some data into the table, Order_Master:
Insert into Order_Master (Order_ID, Begin_Date, End_Date, Delivery_Location)
select 1, getdate()-100, getdate()+100, 'New York'
union all
select 2, getdate()-50, getdate()+50, 'Washington D.C'
union all
select 3, getdate()-25, getdate()+25, 'Princeton'
union all
select 4, getdate()-12, getdate()+12, 'Dallas'
union all
select 5, getdate()-6, getdate()+6, 'Los Angeles'
go
Let's read from the table, Order_Master:
select * from Order_Master
go
Order_ID Begin_Date End_Date Delivery_Location
-------- ----------------------- ----------------------- --------------------
1 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087 New York
2 2008-01-20 14:06:04.087 2008-04-29 14:06:04.087 Washington D.C
3 2008-02-14 14:06:04.087 2008-04-04 14:06:04.087 Princeton
4 2008-02-27 14:06:04.087 2008-03-22 14:06:04.087 Dallas
5 2008-03-04 14:06:04.087 2008-03-16 14:06:04.087 Los Angeles
Now let's come to the scenario.
1. For a given Order_ID and for a given input date, we need to find out what the begin date and end date for that Order_ID are.
2. We also need to find out whether the given input date falls within the begin date and end date for that order ID.
3. The output needs to be the Begin Date, the end date and a return value which would be 1 if the given input date falls within the begin date and end date for that order ID or 0 if it doesn't.
Given these requirements, a normal Stored Procedure would be like the following:
if exists (select 1 from sys.procedures
where type = 'P'
and user_name(schema_id) = 'dbo'
and [NAME] = 'DBP_ValidateOrderDates')
drop procedure [dbo].[DBP_ValidateOrderDates]
GO
CREATE PROCEDURE [dbo].[DBP_ValidateOrderDates]
@i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit = 0 output,
@o_BeginDate datetime = Null output,
@o_EndDate datetime = Null output
AS
BEGIN
SET NOCOUNT ON;
Select @o_BeginDate = Begin_Date
,@o_EndDate = End_Date
from dbo.Order_Master
where Order_ID = @i_OrderNo
IF @@rowcount = 0
Begin
Set @o_Return = 0
End
Else
If (@i_ValidationDate <= @o_EndDate and @i_ValidationDate >= @o_BeginDate)
Begin
set @o_Return = 1
End
Else
Begin
set @o_Return = 0
End
Set nocount off
END
Go
Now let's compile and then execute this SP for Order_ID = 1 and input date as today's date:
--Declaring the variables:
declare @i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit,
@o_BeginDate datetime,
@o_EndDate datetime
--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()
,@o_Return = null
--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output
--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'
Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 1 because the input date falls between the begin and end dates for that Order_ID:
Return Value Begin Date End Date
------------ ----------------------- -----------------------
1 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087
Now let's execute this SP for the same Order_ID (1) but with the input date not falling within the begin and end dates for that Order_ID.
--Declaring the variables:
declare @i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit,
@o_BeginDate datetime,
@o_EndDate datetime
--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()-200
,@o_Return = null
--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output
--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'
Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 0 because the input date does not fall between the begin and end dates for that Order_ID:
Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087
Now let's take another possible situation. If we were to input an Order_ID (101) that doesn't exist in the Order_Master table, we should be getting nulls for the begin and end dates but the return value should be 0. Let's execute this scenario.
--Declaring the variables:
declare @i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit,
@o_BeginDate datetime,
@o_EndDate datetime
--Assigning values to input variables/parameters:
select @i_OrderNo = 101
,@i_ValidationDate = getdate()
,@o_Return = null
--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output
--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'
The result set matches with the expected result:
Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 NULL NULL
Since the input Order_ID, 101 does not exist in the table, the Begin and End dates are null. Since they are null, the input date can not fall between null begin and end dates and hence is 0.
Issue:
You may observe that there are so many constructs in this SP:
One SELECT statement
Three set variable statements
One usage of @@rowcount to check existence of the record
Three Begin-End blocks
Two IF and one ELSE construct
The SP seems perfectly logical. It has a very logical flow to satisfy all our requirements. But
the question is "Do we really need so many constructs to staify our requirements?". The answer is "May be Not". Remember, all we have to do is assign proper value to the output variables/parameters based on the input variables and the data in the table, Order_Master.
Let's try to simplify the same SP. Below, I have replace the IF BEGIN-END, IF-ELSE and @@rowcount with one UPDATE CASE statement. Let's see how it looks now:
if exists (select 1 from sys.procedures
where type = 'P'
and user_name(schema_id) = 'dbo'
and [NAME] = 'DBP_ValidateOrderDates')
drop procedure [dbo].[DBP_ValidateOrderDates]
GO
CREATE PROCEDURE [dbo].[DBP_ValidateOrderDates]
@i_OrderNo varchar(6),
@i_ValidationDate datetime,
@o_Return bit = 0 output,
@o_BeginDate datetime = Null output,
@o_EndDate datetime = Null output
AS
BEGIN
SET NOCOUNT ON;
update a
set @o_Return = case
when @i_ValidationDate between a.Begin_Date and a.End_Date then 1
else 0
end
,@o_BeginDate = a.Begin_Date
,@o_EndDate = a.End_Date
from (Select Begin_Date, End_Date from dbo.Order_Master with (nolock)
where Order_ID = @i_OrderNo) a
Set nocount off
END
Go
Doesn't the SP look short now? One UPDATE statement takes care of all possibilities and reduces code even though in the original SP, we are not updating any table for that matter. Neither are we updating the table Order_Master in this refined version of the original SP. All we are doing is updating the output variables / parameters based on the input parameters and the corresponding data in the table Order_Master. And that's all that is expected from this SP. So why take the longer route as in the original version? Not necessary.
Now let's execute the refined SP:
Now let's compile and then execute this SP for Order_ID = 1 and input date as today's date. This is the first case above:
--Declaring the variables:
declare @i_OrderNo varchar(6)
,@i_ValidationDate datetime
,@o_Return bit
,@o_BeginDate datetime
,@o_EndDate datetime
--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()
,@o_Return = null
--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output
--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'
Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 1 because the input date falls between the begin and end dates for that Order_ID:
Return Value Begin Date End Date
------------ ----------------------- -----------------------
1 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087
Now let's execute this SP for the same Order_ID (1) but with the input date not falling within the begin and end dates for that Order_ID. This is the second case above.
--Declaring the variables:
declare @i_OrderNo varchar(6)
,@i_ValidationDate datetime
,@o_Return bit
,@o_BeginDate datetime
,@o_EndDate datetime
--Assigning values to input variables/parameters:
select @i_OrderNo = 1
,@i_ValidationDate = getdate()-200
,@o_Return = null
--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output
--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'
Here is the result set as expected. We got the begin and end dates for the given Order_ID (1) and the return value is 0 because the input date does not fall between the begin and end dates for that Order_ID(1):
Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 2007-12-01 14:06:04.087 2008-06-18 14:06:04.087
Now let's take another possible situation, i.e. the third situation above. If we were to input an Order_ID (101) that doesn't exist in the Order_Master table, we should be getting nulls for the begin and end dates but the return value should be
0. Let's execute this scenario.
--Declaring the variables:
declare @i_OrderNo varchar(6)
,@i_ValidationDate datetime
,@o_Return bit
,@o_BeginDate datetime
,@o_EndDate datetime
--Assigning values to input variables/parameters:
select @i_OrderNo = 101
,@i_ValidationDate = getdate()
,@o_Return = null
--Executing the SP with the given input parameters:
execute [dbo].[DBP_ValidateOrderDates]
@i_OrderNo
,@i_ValidationDate
,@o_Return output
,@o_BeginDate output
,@o_EndDate output
--Capturing the output:
select
@o_Return 'Return Value', @o_BeginDate 'Begin Date', @o_EndDate 'End Date'
The result set matches with the expected result:
Return Value Begin Date End Date
------------ ----------------------- -----------------------
0 NULL NULL
Since the input Order_ID, 101 does not exist in the table, the Begin and End dates are null. Since they are null, the input date can not fall between null begin and end dates and hence is 0.
We get the same results with the refined version of the SP but in one statement that in multiple If's and ELSE's and @@rowcount.
The UPDATE statement in this case doesn't perform any updates to the table Order_Master itself. It just updates the output parameters to the correct values.
This is a very simple example of using unrelated operations (UPDATE) where SELECT seems the only solution and thus reducing the amount of code needed to be written and executed. This article just demonstrates that coding flexibility in Microsoft SQL Server's T-SQL is profound. Based on individual situation, there are numerous possibilities in T-SQL that can reduce code and execution time as well. We will follow some more of such techniques in the forthcoming articles.