Incremental statistics… How to Update Statistics on 100TB Database


I’ve recovered this post using (Google cache) from a post I made a long time ago from retired blog of brasilian SQL PFEs

https://blogs.msdn.microsoft.com/pfebrasilsql/2016/03/18/estatisticas-incrementais-ou-como-atualizar-estatisticas-em-base-de-100tb-vldb/

And the english version also retired, translated by (Chris Lound / Alexandre Mendeiros)

https://blogs.technet.microsoft.com/dataplatform/2016/04/03/incremental-statistics-how-to-update-statistics-on-100tb-database/

Find below this article

I really like the idea of Incremental Statistics and what it can offer customers, specifically around reduced maintenance times and I’ve been trying to find a real life example of how a customer has used this feature to help them manage super huge databases.

I am here to comment a case where the statistics were not updated frequently in a 100TB database on a risk analysis I made ​​in a client server. So it was suggested the use of incremental statistics (SQL 2014 or higher).

According to the documentation itself (https://docs.microsoft.com/en-us/sql/relational-databases/statistics/statistics )

When new partitions are added to a large table, statistics should be updated to include the new partitions. However the time required to scan the entire table (FULLSCAN or SAMPLE option) might be quite long. Also, scanning the entire table isn’t necessary because only the statistics on the new partitions might be needed. The incremental option creates and stores statistics on a per partition basis, and when updated, only refreshes statistics on those partitions that need new statistics.

So, I do not need to update using FULLSCAN in a giant table every day, you can, for example, update only the last two partitions.

We will see below how to apply this in your environment.

The first thing to do is to change the object definitions, so that new statistics are created as incremental.

ALTER DATABASE SANDBOX
SET AUTO_CREATE_STATISTICS ON ( INCREMENTAL = ON )

After that, for all partitioned tables where you want to make this change, you need to update the complete statistics in the object statistics.

Although this step is too heavy it will pay off later.

UPDATE STATISTICS [PartitionTable] ( [PK__Partitio__357D0D3ED0CC792E] ) 
WITH FULLSCAN , INCREMENTAL = ON

To ease a bit, I created some scripts to help in this process.

One of them just to apply in a table, another one to pick up all partitioned tables in a database.

-------------------------------------------------------------------------- 
--GENERATE INCREMENTAL UPDATE STATISTICS FOR 1 TABLE
SELECT 'UPDATE STATISTICS ' + QUOTENAME(object_name(object_id)) + ' (' + QUOTENAME(name) + ') WITH FULLSCAN, INCREMENTAL = ON'
FROM sys.stats
WHERE object_id = object_id('[dbo].[PartitionTable]')
AND is_incremental = 0


--------------------------------------------------------------------------
--GENERATE INCREMENTAL UPDATE STATISTICS FOR ALL PARTITIONED TABLES
SELECT 'UPDATE STATISTICS ' + QUOTENAME(object_name(S.object_id)) + ' (' + QUOTENAME(s.name) + ') WITH FULLSCAN, INCREMENTAL = ON'
FROM sys.tables T
INNER JOIN sys.indexes I
ON T.object_id = I.object_id
INNER JOIN sys.data_spaces DS
ON I.data_space_id = DS.data_space_id
INNER JOIN sys.stats S
ON I.object_id = S.object_id
WHERE
DS.type = 'PS'
AND S.is_incremental = 0

After this first slower step, you just need to plan how your partitions will be updated.

In my scenario partitioning was by year-month and older partitions are rarely updated.

Then, we will update in a daily basis only the two last partitions.

This script will generate 3 types of update:

  1. UPDATE for non-partitioned tables.
  2. UPDATE for partitioned tables, but no incremental statistics
  3. UPDATE for partitioned tables with incremental statistics (last two partitions)
DECLARE @RESAMPLE BIT = 1
DECLARE @PERCENT_SAMPLE INT = 100 -- IF @RESAMPLE = 0 SET @PERCENT_SAMPLE
DECLARE @PROCESS_LAST_X_NONEMPTY_PARTITIONS INT = 2

--------------------------------------------------------------------------

SELECT
SQL_COMMAND = 'UPDATE STATISTICS ' + QUOTENAME(T.Name) + ' (' + QUOTENAME(S.name) + ') '
+ IIF (@RESAMPLE = 1,
'WITH RESAMPLE',
'WITH SAMPLE ' + CONVERT(VARCHAR(3), @PERCENT_SAMPLE) + ' PERCENT')
+ CHAR(10)
,INFO = ' — NON PARTITIONED TABLES'
FROM sys.tables T
INNER JOIN sys.indexes I
ON T.object_id = I.object_id
INNER JOIN sys.data_spaces DS
ON I.data_space_id = DS.data_space_id
INNER JOIN sys.stats S
ON I.object_id = S.object_id
WHERE
DS.type = 'FG' -- ROWS_FILEGROUP — NON PARTITION TABLE
AND S.stats_id IS NOT NULL -- FILTER TABLES WITH NO STATISTICS

--------------------------------------------------------------------------

SELECT
SQL_COMMAND = 'UPDATE STATISTICS ' + QUOTENAME(T.Name) + ' (' + QUOTENAME(S.name) + ') '
+ IIF (@RESAMPLE = 1,
'WITH RESAMPLE',
'WITH SAMPLE ' + CONVERT(VARCHAR(3), @PERCENT_SAMPLE) + ' PERCENT')
+ CHAR(10)
,INFO = ' — PARTITIONED TABLES NON INCREMENTAL'
FROM sys.tables T
INNER JOIN sys.indexes I
ON T.object_id = I.object_id
INNER JOIN sys.data_spaces DS
ON I.data_space_id = DS.data_space_id
INNER JOIN sys.stats S
ON I.object_id = S.object_id
WHERE
DS.type = 'PS' -- PARTITION_SCHEME — PARTITION TABLE
AND S.is_incremental = 0


--------------------------------------------------------------------------

IF (OBJECT_ID ('tempdb..#TEMP_LAST2PARTITIONS') IS NOT NULL)
DROP TABLE #TEMP_LAST2PARTITIONS

--------------------------------------------------------------------------

;WITH AUX_LAST2PARTITIONS AS
(
SELECT
T.object_id
,TableName = T.Name
,I.index_id
,IX_Name = COALESCE(I.Name,'[HEAP]')
,P.partition_number
,P.rows
,i.data_space_id
,ROW_NUMBER = ROW_NUMBER() OVER ( PARTITION BY T.object_id, I.index_id ORDER BY P.partition_number DESC)
FROM sys.tables T
INNER JOIN sys.indexes I
ON T.object_id = I.object_id
INNER JOIN sys.data_spaces DS
ON I.data_space_id = DS.data_space_id
INNER JOIN sys.partitions P
ON I.object_id = P.object_id
AND I.index_id = P.index_id
WHERE
DS.type = 'PS' -- PARTITION_SCHEME — PARTITION TABLE
AND P.rows > 0 -- filter empty partitions
)
SELECT *
INTO #TEMP_LAST2PARTITIONS
FROM AUX_LAST2PARTITIONS
WHERE [ROW_NUMBER] <= @PROCESS_LAST_X_NONEMPTY_PARTITIONS
CREATE CLUSTERED INDEX #IX_TEMP_LAST2PARTITIONS ON #TEMP_LAST2PARTITIONS (object_id, index_id, partition_number)

--------------------------------------------------------------------------

;WITH AUX AS
(
SELECT
AUX.object_id
,AUX.TableName
,AUX.index_id
,AUX.IX_Name
,StatsName = S.name
,AUX.partition_number
,AUX.rows
,PARTITION_VALUE =
ISNULL(
CAST(left_prv.value AS VARCHAR(MAX)) +
CASE
WHEN pf.boundary_value_on_right = 0 THEN ' > '
ELSE ' >= '
END
, '-INF > '
)
+ 'X' +
ISNULL(
CASE
WHEN pf.boundary_value_on_right = 0 THEN ' >= '
ELSE ' > '
END +
CAST(right_prv.value AS NVARCHAR(MAX)
), ' > INF'
)
FROM #TEMP_LAST2PARTITIONS AUX
INNER JOIN sys.stats S
ON aux.object_id = S.object_id
LEFT JOIN sys.partition_schemes ps
ON aux.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_functions pf
ON ps.function_id = pf.function_id
LEFT JOIN sys.partition_range_values left_prv
ON left_prv.function_id = ps.function_id
AND left_prv.boundary_id + 1 = aux.partition_number
LEFT JOIN sys.partition_range_values right_prv
ON right_prv.function_id = ps.function_id
AND right_prv.boundary_id = aux.partition_number
WHERE S.is_incremental = 1
)
SELECT
SQL_COMMAND = 'UPDATE STATISTICS ' + QUOTENAME(TableName) + ' (' + QUOTENAME(StatsName) + ') WITH RESAMPLE ON PARTITIONS (' +CONVERT(VARCHAR(20), partition_number) + ')' + CHAR(10)
,INFO = ' — PARTITIONED TABLES INCREMENTAL LAST (' + CONVERT(VARCHAR, @PROCESS_LAST_X_NONEMPTY_PARTITIONS) + ') PARTITIONS ON (' +PARTITION_VALUE + ')'
FROM AUX
ORDER BY TableName, IX_Name, StatsName, partition_number desc

Below is an example of the commands:

clip_image001

Conclusion

Remember that this was an example where I just wanted to update the last 2 partitions, but each case is a case.

Just pick up these commands into a variable and schedule it to run with [ sp_executesql ]

In my scenario for one of the tables with 9 TB data + 18TB indexes (117 partitions) there was a reduction of time from ~ 3 days (Update statistics FULLSCAN applied to all statistics of the table) to ~ 1 hour (Update statistics with FULLSCAN for all statistics in the 2 last partitions of the table).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s