The vast majority of the time, you will use SSIS Lookup component in Full Cache mode. This mode is the fastest because it queries the database only once (before the data flow starts) and apply hashing in order to do high-performance comparisons.
Sometimes however you will have to use Non Cached lookups. For example if your reference table doesn’t fit in memory or if you wish to lookup rows that you just inserted in your reference table at the beginning of your data flow. You might also run into cases where you need to do inequalities lookups or where you have very few rows at the source and you wish to lookup a table which has several million rows.
Because the Non Cached mode will query the database for each row, it has to be fast.
If your reference table is large and you omit to create a covering index, you will get very poor performance. A covering index, by definition, is an index that contains each column that is used by your query. Note that only the columns in your WHERE statement (the Lookup join columns) must be in the index, the other columns (Lookup reference columns) need only to be in the “Included Columns” of the index. Including your reference columns in the index makes it easy for SQL Server to get your reference values without going back to the table pages.
Let’s create a lookup table:
IF NOT EXISTS(select 1 from INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='LOOKUP_TABLE') CREATE TABLE LOOKUP_TABLE( [LOOKUP] [varchar](36) NULL, [REFERENCE] [int] NOT NULL PRIMARY KEY CLUSTERED)
And add 1 million rows:
SET NOCOUNT ON TRUNCATE TABLE LOOKUP_TABLE DECLARE @I INT = 0 BEGIN TRAN WHILE @I < 1000000 BEGIN INSERT INTO LOOKUP_TABLE(LOOKUP,REFERENCE) SELECT NEWID(), @I SET @I=@I+1 END COMMIT TRAN
Select No Cache or Partial Cache in the SSIS Lookup Component dialog box:
The SSIS Lookup query should be defined like this:
And the SSIS Lookup Columns should be defined like this:
The covering index is then created this way:
CREATE NONCLUSTERED INDEX [IX_NON_CLUSTERED_LOOKUP_INCLUDE_LOOKUP] ON [LOOKUP_TABLE] ([LOOKUP] ASC) INCLUDE ([REFERENCE])
We can appreciate the performance gain for 1000 rows:
Index | Elapsed Time |
Clustered Index Scan | 37.3 sec |
Covering Index | 0.109 sec |
Another way of judging the performance of the lookup is to display SQL Server query statistics for a single query:
Force SQL Server to use clustered index instead of covering index:
-- Clear memory cache and show some stats DBCC DROPCLEANBUFFERS set statistics io on set statistics time on -- Use default SSIS Lookup syntax select * from ( SELECT [LOOKUP],[REFERENCE] FROM [LOOKUP_TABLE] WITH (INDEX (0)) -- ...but tell SQL to scan the table instead of using the covering index ) [refTable] where [refTable].[LOOKUP] = '77EE7751-6BC1-4088-874E-D8F4440BB01A'
Results:
Table ‘LOOKUP_TABLE’. Scan count 9, logical reads 6699, physical reads 152
CPU time = 296 ms, elapsed time = 624 ms.
Here SQL Server will use the covering index:
-- Clear memory cache and show some stats DBCC DROPCLEANBUFFERS set statistics io on set statistics time on -- Use default SSIS Lookup syntax select * from ( SELECT [LOOKUP],[REFERENCE] FROM [LOOKUP_TABLE] ) [refTable] where [refTable].[LOOKUP] = '77EE7751-6BC1-4088-874E-D8F4440BB01A'
Results:
Table ‘LOOKUP_TABLE’. Scan count 1, logical reads 3, physical reads 2
CPU time = 0 ms, elapsed time = 31 ms.
RSS