About Me

My photo
Colorado
Paul has 18 years experience with Microsoft SQL Server. He has worked in the roles of production DBA, database developer, database architect, applications developer, business intelligence and data warehouse developer, and instructor for students aspiring for MCDBA certification. He has performed numerous data migrations and supported large databases (3 Terabyte, 1+ billion rows) with high transactions. He is a member of PASS, blogs about lessons learned from a developer’s approach to SQL Server administration, and has been the president of the Boulder SQL Server Users’ Group from January 2009 to present.

Friday, December 10, 2010

Row Count for All Tables in a Given Database

At the October 2010 meeting of the Boulder SQL Server Users’ Group, a question was raised as to how to get a row count of all tables in a given database. 

I used to do this by doing a simple SELECT COUNT(*), generating the code for doing this on all the tables with this:

SELECT 'SELECT Count(*) FROM [' + SCHEMA_NAME(t.schema_id) + '].[' + t.name + ']'
FROM sys.tables t
ORDER BY name

I would then copy/paste the results of the above query into a new query window and run it. 

But the problem with this method is that each SELECT COUNT(*) query will either be a Table Scan or Clustered Index Scan.  This means that SQL Sever will have to pass through all the rows of the table, which is time-consuming and inefficient.  This method might be acceptable for smaller databases, but it is not a good idea for tables with millions or billions of rows, or databases in a production environment.

At that same meeting, Kate Totten offered a script which is far more efficient.  After making some minor cosmetic changes, I present her script here:

Use AdventureWorks

-- Low-footprint row count (without table or index scan) for a given database
SELECT  Convert(varchar, ServerProperty('ServerName')) as Server,
            DB_NAME() AS DB,
            SCHEMA_NAME(t.schema_id) AS SchemaName,
            t.name AS TableName,
            '[' + SCHEMA_NAME(t.schema_id) + '].[' + t.name + ']' AS FullTableName,
            i.Rows,
            t.object_id
FROM sys.tables t
JOIN sys.sysindexes i
      ON  t.object_id = i.id
      AND i.indid < 2         -- 0 = heap, 1 = has clustered index, >1 is non-clustered index
ORDER BY rows desc

This method is far superior.  The row count for each table is already present in sys.sysindexes, so the tables themselves do not need to be accessed.  We only want where indid = 0 (heap) and indid = 1 (clustered) because a given table will always have exactly one row here for either a heap or the clustered index, both of which necessarily include all rows in a table.  Note that a non-clustered index might have all rows in a table, but it cannot be guaranteed.

Drawbacks to the above method (will be addressed in future blog posts)
  • You have to run it in each database where you want to study the row counts.  
  • Row counts for each partition in a partitioned table are not included. 

No comments:

Post a Comment