Ever so often, queries include some aggregates data. In a datawarehouse environment, they are frequently used for metric and Key Performance Indicator (KPI) calculations. Aggregates are also used a lot for reporting purposes and for statistical computation.
When a query perform poorly, our first instinct is to add an index to speed it up. Creating a view and indexing that view is often an overlooked solution. However, an indexed view might perform much better with less performance hit (on INSERT, UPDATE & DELETE) compared with an index. Enterprise edition of SQL Server is able to automatically use indexed views even when they are not referenced, just like indexes.
Let’s create a typical web log table with two million entries. This table contains the cookie_id of the visitor, the visit date and the transaction value if the user made a purchase during the visit.
-- Create table IF OBJECT_ID('WEBLOG','U') IS NULL BEGIN CREATE TABLE [WEBLOG]( [COOKIE_ID] [int] NOT NULL, [VISIT_DATE] [date] NOT NULL, [TRANSACTION_VALUE] [money] NOT NULL ) ON [PRIMARY] CREATE CLUSTERED INDEX [IX_VISIT_DATE] ON [WEBLOG] ( [VISIT_DATE] ASC ) ON [PRIMARY] END -- Empty the table DELETE FROM WEBLOG -- Insert 2 million rows DECLARE @I INT = 0 DECLARE @HITS INT = 2000000 BEGIN TRANSACTION WHILE @I<@HITS BEGIN INSERT INTO WEBLOG(COOKIE_ID,VISIT_DATE,TRANSACTION_VALUE) SELECT RAND()*@HITS/100 -- COOKIE_ID ,DATEADD(day,CONVERT(INT,(RAND()*365)),'20110101') -- VISIT DATE ,CASE WHEN RAND()<0.01 THEN RAND()*10.0 ELSE 0.0 END -- TRANSACTION_VALUE SET @I=@I+1 END COMMIT TRANSACTION
We wish to report the number of visits, the lifetime value (total purchases) and revenue per visit of each cookie:
SELECT COOKIE_ID , COUNT(*) AS FREQUENCY , SUM(TRANSACTION_VALUE) AS LIFETIME_VALUE , AVG(TRANSACTION_VALUE) AS REVENUE_PER_VISIT FROM [DBO].WEBLOG GROUP BY COOKIE_ID
On average, this query takes 600.4 milliseconds on my system. We can speed it up using a nonclustered index:
CREATE NONCLUSTERED INDEX [IX_RFM] ON [WEBLOG] ([COOKIE_ID]) INCLUDE ( [TRANSACTION_VALUE]) ON [PRIMARY]
It now takes 336 milliseconds to run the query, a performance improvement of 44%. We can increase this performance gain by using an indexed view instead of creating the previous index:
IF OBJECT_ID('RFM','V') IS NOT NULL DROP VIEW [RFM] GO -- Create view CREATE VIEW [RFM] WITH SCHEMABINDING AS SELECT COOKIE_ID, SUM(TRANSACTION_VALUE) AS MONETARY ,COUNT_BIG(*) AS FREQUENCY FROM [DBO].WEBLOG GROUP BY COOKIE_ID GO -- Create clustered index on view, making it a indexed view CREATE UNIQUE CLUSTERED INDEX IDX_RFM_V ON [RFM] (COOKIE_ID);
The query now runs in 53.8 ms on average: a 91% performance gain. The thing is, fetching the data is almost instantaneous because the result of the view is materialized: the majority of the elapsed time is spent displaying the query results. Performance do not depend on the underlying table but rather on what can be fetched from the materialized results of the view (everything, in our case):
Here, even if the average aggregate (AVG) is not defined in the view, the query optimizer is able to derive the result from the COUNT and the SUM aggregates. If the view gets big, you can also create nonclustered indexes on your view to speed-up access to subsets of your view. You get the best query performance gains if your underlying tables are large and your query results stay small (hence the benefit with aggregations).
The improved query speed comes with additional overhead when modifying table data (just like indexes). The following table displays a summary of my test results:
Indexed view is also a great way to improve INNER JOINS performance. When two or more table are prejoined in an indexed view, the query optimizer can choose to retrieve the materialized view data instead of performing a costly join operation.
For more information on indexed views, see the following Microsoft article: http://technet.microsoft.com/en-us/library/cc917715.asp.
This is an outstanding page with excellent and detail rich information – It should be required reading for all aspiring database developers!
Thanks!
This really helped me out.
I disagree with the praising of indexed views. They have many limitations; for example you can’t use MAX, MIN, can’t use OUTER JOINs and can’t reference other views.
Creating an indexed view on an aggregate query that references multiple table (which is very common in the industry) makes the query very fragile. You may be able to speed the query up today and make the users happy but you’re living on a prayer that your requirements don’t change tomorrow because if they do, you will either be hard pressed to find an acceptable solution or you will experience huge performance degradation.