Thursday, March 13, 2008

TSQL and Excel - Data Exchanges

We are all aware of the applicability of "BCP" for transporting data between external files (text files, Excel files, etc) and database tables, and "Bulk Insert" for sending data one way from external files into database tables. I love BCP. Sometimes, situations arise where we need data from tables to be sent to well formatted Excel spreadsheets and vice-versa. With BCP, one could not send in the Column Headers into the Spreadsheet (it can be done with queryout option but in a round about way) . We always need simple solutions, don't we?


Let's complicate the requirement further. We have to write today's Top three stocks from the database on to a spreadsheet, i.e. the top three gainers in the Stock Market today. We have to also write the top three losing stocks in the Stock Market today into the same Excel spreadsheet but in a different tab. This single Excel spreadsheet with the two tabs, one for gainers and one for losers needs to be e-mailed to some of the investors daily. We would put the e-mailing issue aside as it is quite simple using SQL Server 2005 Database Mail or SQL Server 2000 SQL Mail.



The second requirement is that the data in both the Excel spreadsheet tabs needs to be overwritten with fresh data daily. The source of all data is just one database table which is fed daily. There are two more tabs in the spreadsheet that will have the consolidated gainers and losers respectively for all days as the days progress.



It might have been so simple if this overwritting on two tabs (Gainers and Losers) and appending onto two other tabs (AllGainers and AllLosers) was not required. It would have also been so simple if we could have had four separate spreadsheets instead of four tabs in one single spreadsheet. Then we could have simply used the simple BCP and added a little bit of complication in the bcp command to include the Column Headers in there. But that is not the case as we need to keep the solution simple. Some people say, use CLR and do it row by row. Some say, use VC++.


The issue here is to find a simple solution and not complicate the solution further. We don't want assemblies and DLL's to be prepared, compiled, registered and then modified whenever something needs to be changed. So CLR, VC++ and the like are not an option. We are talking about efficient programming here. And since we are talking about programming and not about packages, DTS and SSIS are also not an option. We need minimum maintenance too. What do we do?



We resort to our contemporary T-SQL database programming language. T-SQL is simple, neat, prolific, dynamic and very flexible. Why not explore OPENDATASOURCE and OPENROWSET? Let's go about solving the problem now.



So let me prepare the data first. Let's create a table called Table_Stock_Value and populate the same. This article is just about concepts. So please don't bother about constraints, indexes and fillfactors right now. All symbols in the data are hypothetical.




if exists (select 1 from sys.tables
where name = 'Table_Stock_Value'
and type = 'U')
drop table Table_Stock_Value
go
create table Table_Stock_Value
(SerialNo int identity (1,1) not null
,NASDAQ_SYMBOL varchar(4) not null
,Company_Name varchar(25) not null
,Change decimal(5,2)
,Date_of_Change datetime not null default getdate())
go
insert into Table_Stock_Value (NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change)

--TOP GAINERS
select 'ABCD', 'ABCD Inc', 2.50, getdate()-3
union all
select 'EFGH', 'EFGH Inc', 2.00, getdate()-3
union all
select 'IJKL', 'IJKL Inc', 1.50, getdate()-3
union all
select 'IJKL', 'IJKL Inc', 1.75, getdate()-2
union all
select 'MNOP', 'MNOP Inc', 1.60, getdate()-2
union all
select 'ABCD', 'ABCD Inc', 1.00, getdate()-2
union all
select 'QRST', 'QRST Inc', 3.50, getdate()-1
union all
select 'UVWX', 'UVWX Inc', 2.50, getdate()-1
union all
select 'ABCD', 'ABCD Inc', 1.25, getdate()-1
union all
select 'IJKL', 'IJKL Inc', 4.95, getdate()
union all
select 'ABCD', 'ABCD Inc', 2.50, getdate()
union all
select 'MNOP', 'MNOP Inc', 1.25, getdate()
union all
select 'ABCD', 'ABCD Inc', 1.90, getdate()+1
union all
select 'EFGH', 'EFGH Inc', 1.20, getdate()+1
union all
select 'QRST', 'QRST Inc', 1.10, getdate()+1
union all
select 'ABCD', 'ABCD Inc', 1.10, getdate()+2
union all
select 'UVWX', 'UVWX Inc', 1.00, getdate()+2
union all
select 'QRST', 'QRST Inc', 0.95, getdate()+2
union all
select 'ABCD', 'ABCD Inc', 1.50, getdate()+3
union all
select 'EFGH', 'EFGH Inc', 1.25, getdate()+3
union all
select 'IJKL', 'IJKL Inc', 1.00, getdate()+3

UNION ALL
--TOP LOSERS
select 'BBBB', 'BBBB Inc', -1.50, getdate()-3
union all
select 'CCCC', 'CCCC Inc', -1.20, getdate()-3
union all
select 'DDDD', 'DDDD Inc', -1.10, getdate()-3
union all
select 'EEEE', 'EEEE Inc', -1.75, getdate()-2
union all
select 'FFFF', 'FFFF Inc', -1.60, getdate()-2
union all
select 'YYYY', 'YYYY Inc', -1.00, getdate()-2
union all
select 'IJKL', 'IJKL Inc', -0.70, getdate()-1
union all
select 'HHHH', 'HHHH Inc', -0.62, getdate()-1
union all
select 'CCCC', 'CCCC Inc', -0.25, getdate()-1
union all
select 'UUUU', 'UUUU Inc', -4.95, getdate()
union all
select 'LLLL', 'LLLL Inc', -2.50, getdate()
union all
select 'AAAA', 'AAAA Inc', -1.25, getdate()
union all
select 'UVWX', 'UVWX Inc', -1.90, getdate()+1
union all
select 'FFFF', 'FFFF Inc', -1.20, getdate()+1
union all
select 'CCCC', 'CCCC Inc', -1.10, getdate()+1
union all
select 'CCCC', 'CCCC Inc', -0.80, getdate()+2
union all
select 'BBBB', 'BBBB Inc', -0.75, getdate()+2
union all
select 'EEEE', 'EEEE Inc', -0.65, getdate()+2
union all
select 'EEEE', 'EEEE Inc', -1.50, getdate()+3
union all
select 'YYYY', 'YYYY Inc', -1.25, getdate()+3
union all
select 'QRST', 'QRST Inc', -1.00, getdate()+3
go
create clustered index ci_date on Table_Stock_Value (Date_of_Change) with fillfactor = 70
go
select * from Table_Stock_Value
go



Next for one time only, we need to prepare an Excel spreadsheet with four tabs in it. The names of the tabs are, Gainers, Losers, AllGainers, and AllLosers. And each tab can have the column names same or different than in the table, Table_Stock_Value created above. For simplicity, let's keep the column names same in the tabs. I have created and kept the Excel spreadsheet in the folder, C:\SQLXL. I have named the Excel spreadsheet as Stock_Overview.xls.


Please remember to close the Excel sheet before the initial compilation of the SP and before each instance of execution of the SP.


Now, let's set out to do what we need to do i.e. writing into the Excel spreadsheet's different tabs as per the requirements stated above. I have written the SP as follows. Please note that the SQL Server advanced configuration option, 'Ad Hoc Distributed Queries' should be set to 1 to read from and write to the Excel Spreadsheet from within a database Stored Procedure:






set ansi_warnings on
go
set ansi_nulls on
Go
execute sp_configure 'show advanced options', 1
go
reconfigure
go
execute sp_configure 'Ad Hoc Distributed Queries', 1
go
reconfigure
go

if exists (select 1 from sys.procedures
where type = 'P'
and [NAME] = 'DBP_Prepare_Daily_Stock_Data')
drop procedure [DBP_Prepare_Daily_Stock_Data]
GO
create procedure DBP_Prepare_Daily_Stock_Data
@i_date datetime = null
as
begin
declare @l_File_Exists int
select @i_date = isnull(@i_date, getdate())
execute master.dbo.XP_fileexist 'C:\SQLXL\Stock_Overview.xls', @l_File_Exists output
if (@l_File_Exists = 1)
begin
--Gainers have to be overwritten everyday.
UPDATE OPENDATASOURCE ('Microsoft.Jet.OleDB.4.0',
'Data Source="C:\SQLXL\Stock_Overview.xls";
extended Properties=Excel 8.0')...Gainers$
SET SerialNo = ''
,NASDAQ_SYMBOL = ''
,Company_Name = ''
,Change = ''
,Date_of_Change = ''
WHERE SerialNo is not null
--Losers have to be overwritten everyday.
UPDATE OPENDATASOURCE ('Microsoft.Jet.OleDB.4.0',
'Data Source="C:\SQLXL\Stock_Overview.xls";
extended Properties=Excel 8.0')...Losers$
SET SerialNo = ''
,NASDAQ_SYMBOL = ''
,Company_Name = ''
,Change = ''
,Date_of_Change = ''
WHERE SerialNo is not null
--UPDATE/Overwrite the Gainers tab with top 3 gainers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [Gainers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change > 0.00
order by Change desc
--UPDATE/Overwrite the Losers tab with top 3 losers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [Losers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change < 0.00
order by Change desc
--UPDATE/APPEND the AllGainers tab with top 3 gainers' data for the given day, @i_date.
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [AllGainers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change > 0.00
order by Change desc
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\SQLXL\Stock_Overview.xls;',
'SELECT SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change FROM [AllLosers$A1:E1]')
select top 3 SerialNo, NASDAQ_SYMBOL, Company_Name, Change, Date_of_Change
from Table_Stock_Value with (nolock)
where convert(varchar, Date_of_Change, 112) = convert(varchar, @i_date, 112)
and Change < 0.00
order by Change desc
PRINT 'RESULTS HAVE BEEN WRITTEN TO C:\SQLXL\Stock_Overview.xls'
end
else
Begin
Print 'The file Stock_Overview.xls should exist in the correct path, C:\SQLXL.'
PRINT 'RESULTS COULD NOT BE WRITTEN TO C:\SQLXL\Stock_Overview.xls. THE FILE IS OPEN OR DOES NOT EXIST.'
End
end
GO

The SP being ready now, let's check if the all the tabs in the Excel spreadsheet,
C:\SQLXL\Stock_Overview.xls are empty:

select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

They are empty.

Let's execute the SP for the earliest date in the table Table_Stock_Value, i.e. getdate()-3:


declare @i_date datetime
select @i_date = getdate()-3
exec DBP_Prepare_Daily_Stock_Data @i_date
go

Now open the spreadsheet, C:\SQLXL\Stock_Overview.xls and check it out. The data has been written to the respective tabs for the given input date to the SP, DBP_Prepare_Daily_Stock_Data.

You can also execute the following T-SQL code from your SQL Server database to check the same:


select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go


The data in the spreadsheet and the result sets above would match.


Let's execute the SP for the next date in the table Table_Stock_Value, i.e. getdate()-2:

declare @i_date datetime
select @i_date = getdate()-3
exec DBP_Prepare_Daily_Stock_Data @i_date
go

Now open the spreadsheet, C:\SQLXL\Stock_Overview.xls and check it out. The data has been written to the respective tabs for the given input date to the SP, DBP_Prepare_Daily_Stock_Data. Please close the spreadsheet after reading its contents.


To verfiy the results using T-SQL itself without opening the spreadsheet, you can use the following code too and verify the results:


select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

Note that we have met the requirement.

Let's do it for getdate()-1 and getdate() as well.

declare @i_date datetime
select @i_date = getdate()-1
exec DBP_Prepare_Daily_Stock_Data @i_date

Now open the Excel Spreadsheet. You would find the Gainer and Losers tabs overwritten with the top 3 gainers and top 3 losers respectively for the date, getdate()-1. You would also notice that the same data has been appended respectively to AllGainers and AllLosers tabs within the same spreadsheet. So, the Gainers and Losers tabs always have three rows each and these rows pertain specifically to the given input date to the SP. The AllGainers and AllLosers tabs have their number of rows growing with each execution of the SP as new rows get appended behind the older rows thus building up a historical data for later analysis. And that's all we want.

Let's execute the SP for @i_date = getdate().

declare @i_date datetime
select @i_date = getdate()
exec DBP_Prepare_Daily_Stock_Data @i_date
go

Now open the Excel Spreadsheet. You would find the Gainer and Losers tabs overwritten with the top 3 gainers and top 3 losers respectively for the given input date, getdate(). You would also notice that the same data has been appended respectively to AllGainers and AllLosers tabs within the same spreadsheet. So, the Gainers and Losers tabs always have three rows each and these rows pertain specifically to the given input date for top three gainers and losers respectively for the respective tabs. The AllGainers and AllLosers tabs have their number of rows growing with each execution as new rows get appended behind the older rows thus building up a historical data for later analysis. We are done with verifying that we have satisfied the requirements that we initially started out with.

You can checkout the contents of the spreasheet by just opening it from its location. Alternatively, you can use the following code. Please close the spreadsheet after physically reading its contents.:

select * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Gainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [Losers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllGainers$]')
go
select *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;DATABASE=C:\SQLXL\Stock_Overview.xls', 'Select * from [AllLosers$]')
go

Please remember to close the Excel sheet before each execution of the SP.

Now let me throw a little bit of explanation behind overwriting and appending the Excel spreadsheet tabs:

First of all, in order to read from and write to an external data source like the Excel spreadsheet in this case while not using the BCP or BULK INSERT commands, the advanced SQL Server configuration option 'Ad Hoc Distributed Queries' must be set to 1. This way, writes to and reads from the Excel spreadsheet get enabled. This's what we did first.

Next, we UPDATE any existing records in the Gainers and Losers tabs with an empty string, ''. This is equivalent to DELETEing all existing records in both these tabs. Even then, the empty characher string remains in the existing cells although it is not visible.

The next step is to overwrite these existing cells in the Gainers and Losers tabs. For this purpose, we use the INSERT statement. Please note that we use the starting and ending cells of the first row in the Gainers and Losers tab in the OPENDATASOURCE statement (A1:E1). This will enable overwriting the existing cells. The smart thing here is that any empty strings '' are no more visible even if they exist somwhere else in the spreadsheet.

For the tabs AllGainers and AllLosers, since we don't want to overwrite existing data but instead want to append new data to any existing data, we omit mentioning the starting row A1:E1 in the OPENDATASOURCE function of the INSERT statement. That ensures that new rows are always appended to the existing rows in these two tabs.

You can use the spreadsheet and its tabs as if they were individual tables, for reading and performing calculations. You can apply all types of T-SQL constructs on that data to make aggregations, etc. Please note the sample T-SQL code above that we used to check the result sets in the tabs of the spreadsheet after executing the SP.

One important point to note is that before compiling the SP, the spreadsheet with the appropriate tabs in it should exist in the correct path mentioned in the SP. Otherwise the SP will not get compiled and will instead throw an error.

Another important point is that during compilation and execution of the SP, the spreadsheet itself should not be physically openend from the Windows Explorer.

Ofcourse, sufficient scope exists for making improvisations to the SP to calculate the top stock for the month or for generating some special reports, etc. I am limited myself to just explaining the manipulation of an Excel spreadseet and its tabs using T-SQL. And the example has been designed to just show that. Using Database Mail to e-mail the spreadsheet is outside the scope of this article.

I hope you like the technique of writing to and manipulating spreadsheets and their tabs using simple T-SQL. Please let me know of your comments and thoughts.

No comments: