The main reason Microsoft introduced table variables in SQL Server 2000 was to reduce stored procedure recompilations (a recompilation occurs when the stored procedure execution plan is recreated). Table variables also have a well-defined scope (the current procedure or function) and they induce less logging and locking (because their transactions last for a single SQL statement). These are great advantages when dealing with short, simple OLTP-style queries and processes.
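As a minimal sketch of that scope rule (the name @T is just a placeholder), a table variable is only visible in the batch, procedure, or function that declares it:

-- @T exists only inside the batch that declares it
DECLARE @T TABLE (ID INT)
INSERT INTO @T VALUES (1)
SELECT * FROM @T   -- works: @T is still in scope
GO
-- SELECT * FROM @T -- would fail here: @T no longer exists in the new batch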
However, table variables have huge drawbacks when you process a lot of rows. For a large table, using a table variable is very often a bad practice…
Statistics
First, they do not have any statistics (statistics are used by the query optimizer to produce the most efficient query plan based on data distribution). The following example demonstrates that the query optimizer has no clue about how many rows a table variable has when building the query plan:
SET NOCOUNT ON

-- Declare table variable
DECLARE @TABLE_VARIABLE TABLE (ID INT PRIMARY KEY CLUSTERED)
DECLARE @I INT = 0

-- Insert 10K rows
BEGIN TRAN
WHILE @I < 10000
BEGIN
    INSERT INTO @TABLE_VARIABLE VALUES (@I)
    SET @I = @I + 1
END
COMMIT TRAN

-- Display all rows and output execution plan
SET STATISTICS PROFILE ON
SELECT * FROM @TABLE_VARIABLE
SET STATISTICS PROFILE OFF
Result:
Rows  | StmtText                                          | … | EstimateRows
10000 | |--Clustered Index Scan(OBJECT:(@TABLE_VARIABLE)) | … | 1
The optimizer does not recompile queries that use table variables. In our example, although SQL Server performs a clustered index scan, it assumes the index contains only one row because the engine has no access to statistics for the table variable or its clustered index. Of course, such an assumption can have a huge impact on performance when a suboptimal query plan is used on a large table. A workaround is to use the OPTION (RECOMPILE) hint.
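For instance, rewriting the SELECT from the example above with the hint forces a statement-level recompile at execution time, at which point the optimizer can see the table variable's actual row count:

-- Forces a recompile so the optimizer sees the real cardinality (10000 rows)
SELECT * FROM @TABLE_VARIABLE OPTION (RECOMPILE)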
Indexes
You can’t add indexes to a table variable. Creating specific indexes obviously helps to improve query performance. The workaround is to specify constraints when declaring the table: a PRIMARY KEY CLUSTERED constraint creates a clustered index, and a UNIQUE constraint creates a nonclustered index. However, you won’t always get the flexibility that explicit indexes offer. For example, it is not possible to create a non-unique clustered index or a nonclustered index with included columns.
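A minimal sketch of the constraint workaround (the table and column names here are made up for illustration):

DECLARE @ORDERS TABLE
(
    ORDER_ID    INT PRIMARY KEY CLUSTERED,  -- constraint creates the clustered index
    ORDER_REF   VARCHAR(20) UNIQUE,         -- constraint creates a nonclustered index
    CUSTOMER_ID INT                         -- no way to index this column on its own
)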
Parallel plans
When executing INSERT, UPDATE, or DELETE statements on a table variable, SQL Server never generates a parallel execution plan. This is a huge handicap and heavily affects query performance when working with large data sets.
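A quick way to observe this is to run the same large INSERT … SELECT into a table variable and into a temporary table and compare the actual execution plans (dbo.SOME_LARGE_TABLE is a hypothetical source table):

-- The plan for this INSERT is always serial, end to end
DECLARE @T TABLE (ID INT)
INSERT INTO @T (ID)
SELECT ID FROM dbo.SOME_LARGE_TABLE  -- hypothetical large table

-- The same INSERT into a temporary table can benefit from parallelism,
-- at least for the SELECT part of the plan
CREATE TABLE #T (ID INT)
INSERT INTO #T (ID)
SELECT ID FROM dbo.SOME_LARGE_TABLE
DROP TABLE #T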
You’re right to point out that table variables prevent query recompiles, lack data distribution statistics, and are limited regarding indexing.
But even knowing these limits, a table variable can still be a good candidate for a huge table. For some processing, only a clustered index is required, and the nature of the processing makes index seeks attractive and more efficient than other access methods.
The real problem is being able to see whether these limitations will hurt the kind of processing you are doing with your huge table. So it is right to point out that in some cases this could be troublesome with huge tables, but the size of the table is not really the problem.
Very good point, Maurice.
I guess we can’t expect bad practices or best practices to provide optimal results 100% of the time. I agree that table variables may sometimes perform slightly better than temporary tables for large tables, especially in transactions, since there is less logging involved. I think it is fair to say that, in general and particularly for complex queries, the fact that SQL Server never generates a parallel plan, added to the reduced flexibility of table variables, makes temporary tables a better choice for large tables.