Read more: http://www.blogsmonetize.com/2010/10/how-to-use-syntax-highlighter-3083-in.html#ixzz1DHzvEgBA

Friday, March 25, 2011

Statistics - An Important Factor

Update statistics are important for SQL server database as the query plan generation heavily depends on the histogram of the data been passed for the very first execution. The DBA should have given with sysadmin fixed server role or ownership of the database.

When do I need a statistics update?

- If we see some drastic difference between Estimated number of records in Estimated and Actual plans, it shows a statistics update is much adviced.

(Caught very good stuff from Kim L Tripp's blog thought of worth mentioning...) Statistics are traditionally updated when roughly 20% (+ a minimum of 500 rows) of the data has changed. In SQL Server 200x (2000, 2005 and 2008) statistics are NOT immediately updated when the threshold is reached, instead they are invalidated. It's not until someone needs the statistic that SQL Server updates it. This reduces thrashing that occurred in SQL Server 7.0 when stats were updated immediately instead of just being invalidated. Another interesting point is what is meant by "20% of the data has changed?"... How is that defined? Is it based on updates to columns or inserts of rows? Of course the answer is... it depends - here, it depends on the version of SQL Server that you're using: SQL Server 2000 defines 20% as 20% of the ROWS have changed. You can see this in sysindexes.rcmodctr. SQL Server 2005/8 defines 20% as 20% of the COLUMN data has changed. You cannot see this unless you are accessing SQL Server through the DAC as it's in a base system table (2005: sysrowsetcolumns.rcmodified and for 2008: sysrscols.rcmodified).

There are two ways we can acheive the updates on statistics: a. Update Statistics b. sp_updatestats

Whats the difference between the above two?

Update statistics


- Runs statistics updates for the table given for all indexes and columns.


sp_updatestats

- Runs update statistics for all user defined and internal tables.

- Uses rowmodcntr to update the statistics using the 20% data changed.

It avoids unnecessary updates of statistics for unchanged rows. column statistics are never being updated by sp_updatestats unless it reaches its threshold.Column statistics have a major role in the creation of a query plan. Sometimes optimizer looks at the column statistics histogram when generates a query plan.So if we see any performance degradation in our production, it is always look at the statistics of a table and update if you feel any of cloumn statistics are a way out of up-to-date albeit the index statistics are up to date.


Select *,STATS_DATE(object_id,stats_id) 
From sys.stats where object_id = OBJECT_ID('')
Notes:

Update statistics uses tempdb heavily.As tempdb is a shared resorce and is often a bottleneck for large system, it would be great if we could run the update statistics at traffic time.When we do update statistics, internally it does the following generation which is causing lots of sort warnings:



SELECT StatMan([SC0], [SB0000]) FROM
(SELECT TOP 100 PERCENT [SC0], step_direction([SC0]) over (order by NULL) AS [SB0000]
FROM (SELECT [cEndDateTime] AS [SC0] FROM [dbo].[Tablename] WITH (READUNCOMMITTED,SAMPLE 5.949706e+001 PERCENT) )
AS _MS_UPDSTATS_TBL_HELPER ORDER BY [SC0], [SB0000] ) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 1)

--to find all the Tables and Index with number of days statistics being old



SELECT OBJECT_NAME(A.id) AS Object_Name, A.name AS index_name, STATS_DATE(A.id, indid) AS StatsUpdated , DATEDIFF(d,STATS_DATE(A.id, indid),getdate()) DaysOld FROM sysindexes A INNER JOIN sysobjects B ON A.id = B.id and Xtype='U' WHERE A.name IS NOT NULL ORDER BY DATEDIFF(d,STATS_DATE(A.id, indid),getdate()) DESC

No comments: