
Get row counts of all tables in a Microsoft Fabric warehouse

I loaded some built-in sample data from Wide World Importers into a Fabric warehouse. You get an option to load sample data when you create a new pipeline in Fabric.

It says the data is 352MB in size, but after loading it I was curious how many rows were actually in that sample data set. Unfortunately, getting the row counts isn’t as straightforward as with a “normal” SQL Server database. First of all, when you connect to the database with SSMS, there’s sadly no option to get the row counts report:

For contrast, this is what the report looks like for a regular SQL Server database (though Azure SQL DB doesn’t have it either):

So let’s turn to some SQL querying. I googled a bit and got this article as one of the first hits: SQL Server Row Count for all Tables in a Database. Let’s go over the different options.

sys.partitions

This system view returns an approximate row count.

SELECT
     TableName  = QUOTENAME(SCHEMA_NAME(sOBJ.schema_id)) + '.' + QUOTENAME(sOBJ.name)
    ,[RowCount] = SUM(sPTN.[Rows])
FROM sys.objects    sOBJ
JOIN sys.partitions sPTN ON sOBJ.object_id = sPTN.object_id
WHERE   sOBJ.[type]         = 'U'
    AND sOBJ.is_ms_shipped  = 0x0
    AND index_id            < 2 -- 0:Heap, 1:Clustered
GROUP BY sOBJ.schema_id
         ,sOBJ.name
ORDER BY TableName;

The result:

Not really that useful.

sys.dm_db_partition_stats

SELECT
     TableName  = QUOTENAME(SCHEMA_NAME(sOBJ.schema_id)) + '.' + QUOTENAME(sOBJ.[name])
    ,[RowCount] = SUM(sdmvPTNS.row_count)
FROM sys.objects                sOBJ
JOIN sys.dm_db_partition_stats  sdmvPTNS ON sOBJ.object_id = sdmvPTNS.object_id
WHERE   sOBJ.type = 'U'
    AND sOBJ.is_ms_shipped = 0x0
    AND sdmvPTNS.index_id < 2
GROUP BY sOBJ.schema_id, sOBJ.[name]
ORDER BY [TableName];

But alas…

If you’re wondering, the DMVs sys.dm_db_index_physical_stats and sys.dm_db_stats_properties are also not supported.
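For reference, this is a sketch of the kind of query you could run on a regular SQL Server database (or Azure SQL DB) to read a row count out of the statistics metadata. It's not from the Fabric docs; in a Fabric warehouse the sys.dm_db_stats_properties DMF just isn't available:

-- Works on a regular SQL Server database / Azure SQL DB, but not in a Fabric warehouse.
SELECT
     TableName      = QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.[name])
    ,StatsName      = s.[name]
    ,sp.last_updated
    ,sp.[rows]          -- row count at the time the statistics were last updated
    ,sp.rows_sampled
FROM sys.stats   s
JOIN sys.objects o ON o.object_id = s.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE   o.[type]        = 'U'
    AND o.is_ms_shipped = 0x0;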

statistics

The following query returns the statistics objects on our (user) tables:

SELECT
    object_name        = OBJECT_NAME(s.object_id)
    ,stats_name        = s.[name]
    ,s.stats_id
    ,stats_update_date = STATS_DATE(s.object_id, s.stats_id)
    ,s.auto_created
    ,s.user_created
    ,s.stats_generation_method_desc
    ,s.*
FROM sys.stats         s
JOIN sys.objects       o  ON o.object_id = s.object_id            
WHERE   o.[type] = 'U';

Unfortunately, DBCC SHOW_STATISTICS isn’t that helpful:

The ACE-Cardinality statistic holds a row count according to the docs, but it’s not queryable:

We can however create a manual statistic with a full scan:

CREATE STATISTICS DimEmployee_EmployeeKey
ON dbo.dimension_employee(EmployeeKey) WITH FULLSCAN;

And get the information using DBCC SHOW_STATISTICS:

DBCC SHOW_STATISTICS('dimension_employee','DimEmployee_EmployeeKey') WITH STAT_HEADER

However, creating a statistics object on each table on a column that has a unique value for each row (this might be hard to find for a fact table) and then querying it with DBCC seems a bit cumbersome. Also, I would like the results in a format I can store in a table.

So let’s try the brute-force method where we just count the rows.

SELECT COUNT(1)

The stored proc sp_MSforeachtable is not available in Fabric (it seems we have no luck today):

So let’s use some plain old dynamic SQL. From that same MSSQLTips.com article:

DECLARE @QueryString NVARCHAR(MAX);
SELECT @QueryString = COALESCE(@QueryString + ' UNION ALL ','')
                      + 'SELECT '
                      + '''' + QUOTENAME(SCHEMA_NAME(sOBJ.schema_id))
                      + '.' + QUOTENAME(sOBJ.[name]) + '''' + ' AS [TableName]
                      , COUNT(*) AS [RowCount] FROM '
                      + QUOTENAME(SCHEMA_NAME(sOBJ.schema_id))
                      + '.' + QUOTENAME(sOBJ.[name])
FROM sys.objects AS sOBJ
WHERE   sOBJ.[type] = 'U'
    AND sOBJ.is_ms_shipped = 0x0
ORDER BY SCHEMA_NAME(sOBJ.schema_id), sOBJ.[name];

EXEC sp_executesql @QueryString;

The results:

Conclusion

At the moment, it seems the only method to get reliable row counts for all tables is to do a SELECT COUNT(1) on each table. You can use statistics, but that method seems more work than it is worth. The upside of using SELECT COUNT(1) is that you might trigger automatic data compaction for the table.

With some dynamic SQL, you can create a reusable script that you can run on each warehouse (or SQL Analytics endpoint). You can modify it to store the results in a table.
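As a starting point, here's a minimal sketch of that modification. The table dbo.table_row_counts and its columns are just names I picked for illustration; the idea is to wrap the generated UNION ALL query in an INSERT INTO so the counts are persisted:

-- Target table for the snapshots; name and columns are an example, adjust as needed.
CREATE TABLE dbo.table_row_counts
(
     TableName      VARCHAR(500)
    ,[RowCount]     BIGINT
    ,CollectedOn    DATETIME2(6)
);

DECLARE @QueryString NVARCHAR(MAX);
SELECT @QueryString = COALESCE(@QueryString + ' UNION ALL ','')
                      + 'SELECT '
                      + '''' + QUOTENAME(SCHEMA_NAME(sOBJ.schema_id))
                      + '.' + QUOTENAME(sOBJ.[name]) + '''' + ' AS [TableName]'
                      + ', COUNT_BIG(*) AS [RowCount] FROM '
                      + QUOTENAME(SCHEMA_NAME(sOBJ.schema_id))
                      + '.' + QUOTENAME(sOBJ.[name])
FROM sys.objects AS sOBJ
WHERE   sOBJ.[type] = 'U'
    AND sOBJ.is_ms_shipped = 0x0;

-- Wrap the UNION ALL query in an INSERT so the counts end up in the table,
-- with a timestamp so you can keep a history of the runs.
SET @QueryString = 'INSERT INTO dbo.table_row_counts (TableName, [RowCount], CollectedOn) '
                 + 'SELECT TableName, [RowCount], SYSUTCDATETIME() FROM (' + @QueryString + ') AS cnt;';

EXEC sp_executesql @QueryString;

Each run appends a new snapshot, so you can track how the row counts evolve over time.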
