Introduction

Performance, performance, performance. While performance in SQL Server could be achieved with code improvement and proper infrastructure maintenance, both tasks require an effort. Code reviews and deep index analysis would take time, but sometimes the obvious is right in front of you. Let’s talk about HEAP tables, aka tables with no clustered indexes that somebody (hopefully not you) left you to maintain and use.

Problem

I’m in no way an expert on indexes, but when I think about tables and indexes, I always draw a parallel to books.

Virtually all books have a table of content (a.k.a. TOC) at the beginning of the book (see Fig #1) and index at the end of the book (see Fig #2) to make searching for content as quick and efficient as possible. Additionally, book TOC and index are also allowing the book author to better organize his/her thoughts when writing a book which is very similar to making SQL Server storage engine more efficient.

Fig #1

Screenshot used with permission from Apress from the book SQL Server 2019 Revealed (2019).

Fig #2

Screenshot used with permission from Apress from the book SQL Server 2019 Revealed (2019).

Obviously, while books and tables have similarities, they have some distinctive differences – while book index can co-exists with the book, clustered index is a part of the table. Of course, there are plenty of other differences.

A book with an index is akin to a table with clustered index while a book with a table of contents is akin to table with a non-clustered index. Having just a TOC without an index in a book is not efficient enough. I mean, you can find your stuff, but it will take much longer to find it. That is also true for tables with no clustered indexes. For the most part (unless it’s table with like less than a 100 rows), heap tables are bad for performance and rarely a good idea. This is perhaps why when I think about adding a clustered index to a HEAP table, I think about enabling a Turbo button. Tables with clustered indexes will help SQL Server to locate the data quickly.

Since we are in agreement that HEAP tables are usually bad and tables with clustered are usually good, let’s try to find and fix those heap tables, shall we?

Solution

The following script will detect all (tables with less than 100 records are excluded) the HEAP tables in a database and create a script to inject identity column into every table.

select mas.name + '.' + objs.name as table_name
, pars.row_count
, CASE
WHEN ISNULL(stas.last_user_scan, '01-01-1900')>ISNULL(stas.last_user_seek, '01-01-1900')
AND ISNULL(stas.last_user_scan, '01-01-1900')>ISNULL(stas.last_user_lookup, '01-01-1900')
AND ISNULL(stas.last_user_scan, '01-01-1900')>ISNULL(stas.last_user_update, '01-01-1900')
THEN stas.last_user_scan
WHEN ISNULL(stas.last_user_seek, '01-01-1900')>ISNULL(stas.last_user_scan, '01-01-1900')
AND ISNULL(stas.last_user_seek, '01-01-1900')>ISNULL(stas.last_user_lookup, '01-01-1900')
AND ISNULL(stas.last_user_seek, '01-01-1900')>ISNULL(stas.last_user_update, '01-01-1900')
THEN stas.last_user_seek
WHEN ISNULL(stas.last_user_lookup, '01-01-1900')>ISNULL(stas.last_user_scan, '01-01-1900')
AND ISNULL(stas.last_user_lookup, '01-01-1900')>ISNULL(stas.last_user_seek , '01-01-1900')
AND ISNULL(stas.last_user_lookup, '01-01-1900')>ISNULL(stas.last_user_update, '01-01-1900')
THEN stas.last_user_lookup
WHEN ISNULL(stas.last_user_update, '01-01-1900')>=ISNULL(stas.last_user_scan, '01-01-1900')
AND ISNULL(stas.last_user_update, '01-01-1900')>=ISNULL(stas.last_user_lookup, '01-01-1900')
AND ISNULL(stas.last_user_update, '01-01-1900')>ISNULL(stas.last_user_lookup, '01-01-1900')
THEN stas.last_user_update
ELSE '01-02-1900' END as latest_access
, stas.user_lookups + stas.user_scans + stas.user_seeks + stas.user_updates as total_access
, 'ALTER TABLE ' + mas.name + '.' + objs.name + ' ADD ' + objs.name + 'ID INT NOT NULL IDENTITY (1,1) ' as addid_script
from sys.indexes as inds
inner join sys.objects as objs on objs.object_id = inds.object_id
inner join sys.tables as tabs on tabs.object_id = objs.object_id
inner join sys.dm_db_partition_stats as pars on pars.object_id = objs.object_id
inner join sys.dm_db_index_usage_stats as stas on stas.object_id = inds.object_id AND stas.index_id = inds.index_id
inner join sys.schemas as mas on mas.schema_id = tabs.schema_id
where inds.type_desc = 'heap'
and pars.index_id=0
and pars.row_count>100
order by row_count desc, objs.name

All you need to do is to run the script above to discover HEAP tables (see Fig #3), take a ready to use script from addid_script column and run it to create an extra column that by default will add a clustered index. I also included last_access and total_access columns to add some context on which frequency and recency of indexes usage.

Fig #3

Disclaimer

This blog post is partially based on the following resources:

https://www.red-gate.com/simple-talk/sql/database-administration/sql-server-heaps-and-their-fragmentation/

https://www.apress.com/us/book/9781484254189

https://docs.microsoft.com/en-us/learn/modules/describe-concepts-of-relational-data/3-explore-structures