Thursday, March 14, 2013

Index Keys & Properties as a step in Database Tuning

Introduction


As DBA's, we encounter long running queries that end-users compalin about. Indexing strategy is one area of database where developers usually tend to create indexes on the fly without much consideration for or analysis on how an index that helps one query marginally can slow down a bunch of other queries or stored procedures drastically. Some (not  all) of the situations that we usually come across are following:




  • Dbo.Orders table is being joined to dbo.Order_Details table but both of them exist on the same filegroup and on the same set of underlying discs. This would cause disc queuing and the consequent slowness. 
  • Dbo.Orders table is being joined to dbo.Order_Details table on the common column Product_ID but Product_ID is the leftmost colmn of a non-clustered index on dbo.Orders but is the third column in another non-clustered index on dbo.Order_Details. So these indexes don't help this join much. 
  • Updates are taking place on a column that has a non-clustered index that is compressed. It would be good not to have an index at all on a column being updated. In case such an index is really required, not compressing it would be beneficial. Otherwise the update is much slower than when the index is uncompressed.
  • A majority of values in a particular column used in the WHERE clause of a query have a value but the query is designed to fetch only such rows where the value is null. May be a filtered-index would have helped speed it up.  
In such situations and some others, we would like to have a quick gance at atleast the following to begin with:
  • Whether the two tables being joined are physically resident on different filegroups.
  • The columns being joined are the leftmost columns in their respective indexes.
  • The column being updated is not part of an index (first of all) that is compressed in any way (definitely not).
  • In case, a query is looking for a small subset of values from a column, is the column a part of an index that has the appropriate filter defined on the index.
  • Etc.
 Instead of trying to procure such relevant information painstakingly one at a time, an easier way would be to have a stored procedure return the same. This article is aimed at creating such a stored procedure that could come in handy in performance situations described above.

Details of the Stored procedure
The text of the store procedure is given below:

If exists (select 1 from sys.procedures              
               where USER_NAME(schema_id) = 'dbo'
               and name = 'dbp_Find_Index_Properties_by_Column'
               and type = 'P')
begin
     drop procedure dbo.dbp_Find_Index_Properties_by_Column
end
go
create procedure dbo.dbp_Find_Index_Properties_by_Column
  @i_Table_Name     varchar(64) = null
,@i_Column_Name varchar(24) = null
,@l_Column_Position_in_Index tinyint = null
as
*******************************************************************************************************************************
Created by: Suresh K Maganti
Purpose    : Find if a column is used in an index. View additional properties of the index.
                   For now, I have included only clustered and non-clustered indexes.
Input
Parameters: If provided, the result set is appropriately filtered. Else, the complete result set is returned.
********************************************************************************************************************************/
set  nocount on;

with CTE_Base (Table_Name, Index_Name, Leftmost_Column_in_Index, Position_in_Index, Columns_in_the_Index, Index_Type, Is_Primary_Key, Is_The_Index_Unique, Fill_Factor_When_Created, File_Group_Name, Is_It_A_Filtered_Index, Filter_Definition, Is_An_Included_Column, Is_Compressed, Type_Of_Compression_Used)
as
(    select C.Name Table_Name, b.name Index_Name
,col_name(b.[object_id], (select e.column_id from sys.index_columns  e
                                          where e.index_column_id = 1
                                          and     e.[object_id] = b.[object_id]
                                          and     e.index_id = b.index_id))                   Leftmost_Column
,case
    when @i_Column_Name is null then null
    else a.key_ordinal
end                                                                                                         Position_in_Index

,replace((select col_name(f.[object_id], f.column_id) + ' ###' 'data()'
               from sys.index_columns f
               where f.[object_id] = a.[object_id]
               and     f.index_id = a.index_id
               order by f.key_ordinal asc
               for XML Path ('')), '###', ',')                                                   Columns_in_the_Index

,case b.Type
      when 0 then 'Heap'
      when 1 then 'Clustered'
      when 2 then 'Nonclustered'
      when 3 then 'XML'
      when 4 then 'Spatial'
end                                                                                                         Index_Type

      ,case b.is_primary_key
       when 1 then 'Yes'
       else 'No'
end                                                                                                         Is_Primary_Key


,case b.is_unique
       when  1 then 'Yes'
       else 'No'
end                                                                                                        Is_The_Index_Unique

,b.fill_factor                                                                                          Fill_Factor_When_Created

,FILEGROUP_NAME (b.data_space_id)                                            File_Group_Name

,case b.has_filter
     when 1 then 'Yes'
     else 'No'
end                                                                                                        Is_It_A_Filtered_Index
    
,b.filter_definition

,case a.is_included_column
      when 1 then 'Yes'
      else 'No'
end                                                                                                        Is_An_Included_Column

,case
     when d.data_compression > 0 then 'Yes'
     else 'No'
end                                                                                                       Is_Compressed

,d.data_compression_desc Type_Of_Compression_Used

from                   sys.tables c left join sys.index_columns a

on c.[object_id] = a.[object_id]

inner join sys.indexes b

on    a.[object_id] = b.[object_id]
and  a.index_id = b.index_id

inner join sys.partitions d
on     b.[object_id] = d.[object_id]
and   b.index_id = d.index_id

where b.[type] in (1, 2)
and a.key_ordinal = isnull(@l_Column_Position_in_Index, a.key_ordinal)
and OBJECTPROPERTY (b.[object_id], 'IsSystemTable') = 0
and OBJECTPROPERTY (b.[object_id], 'IsTable') = 1
and col_name(b.[object_id], a.column_id) = ISNULL(@i_Column_Name, col_name(b.[object_id], a.column_id))
and c.name = ISNULL(@i_Table_Name, c.name))


select distinct Table_Name, Index_Name, Leftmost_Column_in_Index, Position_in_Index
,substring(Columns_in_the_Index, 1, LEN(Columns_in_the_Index) - 1) Columns_in_the_Index
,Index_Type, Is_The_Index_Unique, Fill_Factor_When_Created, File_Group_Name, Is_It_A_Filtered_Index, Filter_Definition, Is_An_Included_Column, Is_Compressed, Type_Of_Compression_Used

from CTE_Base
order by Table_Name

set nocount off;

go

For now, I have included only clustered and non-clustered index including filtered indexes.
This stored procedure basically uses a few catalogue views:

  • sys.partitions
  • sys.tables
  • sys.index_columns
  • sys.indexes

and a few system functions like:
  • OBJECTPROPERTY
  • FILEGROUP_NAME
  • COL_NAME
and a new construct FOR XML PATH that was introduced in Microsoft SQL Server 2008. XML PATH is being used in this stored procedure to generate a comma separated list of columns that are part of a given index ordered by their order of occurrence within the index. You may notice an interesting fact here. I am using ORDER BY inside the subquery. Normally ORDER BY is the last part of any SELECT statement. But when used with XML PATH, it comes before the FOR XML PATH construct. Another thing you may notice is that when ORDER BY is used inside a subquery, usage of TOP 100 PERCENT after the keyword SELECT, is mandatory in most cases. Not so in this case.

The rest are all basic joins.

Now we will get to the usage of this stored procedure.


Usage
For simplicity, I have used Microsoft's Adventureworks database in the examples below.

Execution 1:


Please execute the following TSQL code without any input parameters:

execute dbo.dbp_Find_Index_Properties_by_Column



                                                                                   Figure 1

Results for all clustered and non-clustered indexes in the Adventureworks database are displayed.



Execution 2:


Please execute the following TSQL code. It will display the results for all tables and their indexes in the Adventureworks database the column AddressID is a part of:

execute dbo.dbp_Find_Index_Properties_by_Column null, 'AddressID', null

                                                                                 Figure 2

Execution 3:

Now execute the following TSQL statements to change the fillfactor on the index Purchasing.VendorAddress.IX_VendorAddress_AddressID from 80 to 50:

use Adventureworks
go
alter index IX_VendorAddress_AddressID on Purchasing.VendorAddress rebuild with (statistics_norecompute = on, fillfactor = 50, Online = ON, maxdop = 8, sort_in_tempdb = on, allow_page_locks = on)
go

Now execute the statement below to see if the change in fill factor for this index has been captured:

execute dbo.dbp_Find_Index_Properties_by_Column null, 'AddressID', null

                                                                             Figure 3


As you have noticed in the screenshot above, the stored procedure displays 50 (Figure 3) intead of 80 (Figure 2) for the fill factor for this index.



Execution 4: 

 Now let us add a secondary filegroup named SecondaryFileGroup to the database Adventureworks, add a data file to this new filegroup, then drop the index Purchasing.VendorAddress.IX_VendorAddress_AddressID and finally recreate it on the new filegroup, SecondaryFileGroup using data compression and a filter on the AddressID column. The same is accomplished by executing the following code. Please note that you may have to change the path of the new data file in the code below as per you custom configured disc layout:
 

alter database Adventureworks add filegroup SecondaryFileGroup
go
alter database Adventureworks add file (Name = Adventureworks_2, filename = 'E:\Data3\MSSQL\Data\Adventureworks_2.ndf', size = 10MB, Maxsize = 100MB, filegrowth = 10%) to filegroup SecondaryFileGroup
go
drop index Purchasing.VendorAddress.IX_VendorAddress_AddressID
go
create index IX_VendorAddress_AddressID on Purchasing.VendorAddress (AddressID)
where AddressID is not null with
(statistics_norecompute = on, fillfactor = 50, data_compression = Page, Online = ON, maxdop = 8, sort_in_tempdb = on, allow_page_locks = on) on SecondaryFileGroup
go
 Now execute the statement below to see if the changes for this index have been captured:

execute dbo.dbp_Find_Index_Properties_by_Column null, 'AddressID', null


                                                                                 Figure 4

The changes have been captured.

Execution 5:

Lets check to see on which tables the column, AddressID is being used in an index while occurring at the second position.

Please execute the statement below to find this out:

execute dbo.dbp_Find_Index_Properties_by_Column null, 'AddressID', 2






                                                                                           Figure 5

AddressID occurs as the second column in the three indexes on the three tables shown in Figure 5 above. 

Summary:
The stored procedure dbo.dbp_Find_Index_Properties_by_Column can be used to find various aspects of how a column is involved within an index. This SP also provides details on various indexes which can then be used to refine existing indexing strategy based on various application stored procedures. I haven't delved deeper into performance optimization with respect to indexing strategy in this article. I have only attempted to provide an easier starting piont for a preliminary analysis.

As usual, please let me know of your thoughts and comments.