DAX: partitioned conditional aggregations

We want to find how often customers are behind in their payments or calculate their average debt when they are in debt.  The question might be

  1. How many customer-months of debt am I experiencing (counting 1 for each month that each customer is in debt).
  2. What is the average monthly debt of customers (aggregating only months where a customer is in debt).

And similar questions, such as profit in profitable-months.

The aggregations (i.e. COUNT or AVERAGE) should be partitioned by

  1. customer
  2. time interval (e.g. month)

and there will need to be some form of conditional clause to return a value only when the customer’s balance is above zero (debt).

Debt in indebted-months = SUMX(
	CROSSJOIN( 
		VALUES( 'Date'[Month] ), 
		VALUES( 'Customer'[CustomerId] ) 
	),
	VAR customer_balance = [Customer balance]
	RETURN
	IF (
		customer_balance > 0,
		customer_balance
	)
)

Performance

How can performance be improved? 

  1. The most obvious optimisation is already in place:  [Customer balance] is assigned to a variable, which avoids calculating it twice for indebted customers.
  2. The number of iterations will be the product of Customers and Months, so if these dimensions are small (a restricted date range is in context) then performance might be OK.
  3. If there is no filter on date in context then it might help to filter the date table to include only months where there is at least one customer balance.

Anything else?

  • Can we avoid using IF? Is there an expression using only mathematic functions that will return only positive values?  (A limitation here that I can see is that math expression will return zero where the balance is <= 0 which will be OK for some uses but not when calculating average.)
  • Is the CROSSJOIN optimal?  And the use of VALUES to return tables? For Date, the VALUES clause returns a table containing the set of months, but for Customer the VALUES clause returns the table grain, so the table itself could have been used: do those extra columns have a cost where they are not referred to? Something to investigate.

Referential integrity checks

This script is aimed at data warehouse relational databases which don’t have foreign key constraints.  Using column names it matches key column values between fact and dimension tables to identify orphans.

/*
	Referential integrity checker

	Aimed at data warehouses where dimension tables are prefixed 'dim', fact tables 'fact'
	and where the dimension primary key column name is part or all of the fact table foreign key column name.
	And there are no foreign key constraints:(

	@Template holds a dynamic sql template for counting number of rows in fact table, the number of
	orphaned foreign keys in the fact table, and the number of distinct orphaned values, etc.  It 
	contains placeholders ('{DimensionTable}', '{PrimaryKey}', '{FactTable}', '{ForeignKey}')  
	which are all that are requried to make it run a test on a pair of tables.

	The CTE "Dimensions" is a list of tables beginning "dim" together with the name of the 
	primary key column.

	The CTE "FactColumns" is all tables beginning "fact" that have a column name that contains the
	primary key column name.  ** These are identified by their names rather than by foreign key constraint.

	These two tables are joined to produce a set of pairs of dimension and fact tables that is
	used to populate a cursor.

	The cursor loop substitutes the four placeholders in the sql template, and the resulting 
	sql statement is executed.

	OUTPUT
		DimensionTable	:	the dimension table e.g. [dbo].[dimCustomer]
		PrimaryKey		:   the primary key of the dimension table  e.g. [CustomerKey]
		FactTable		:	a fact table that has a foreign key to a dimension e.g. [dbo].[factSales]
		ForeignKey		:	the foreign key of the fact table e.g. [CustomerKey] or [DeliveryCustomerKey]
		FactRows		:   total number of rows in the fact table
		NbrOfOrphans	:	number of rows in fact table where the foreign key does not match the dimension primary key
		NbrOfOrphanedValues : number of distinct values that do not match a dimension primary key value
		MaxOrphanedValue	: the highest orphaned foreign key value (including Null).  'n/a' if there are no orphaned rows.
		NbrOfSpecialRows	: number of rows in fact table where foreign key is < 0. "Unknown" and suchlike.

	Change log
		18-08-2020 AI version 1.

*/



CREATE TABLE
	#ReferentialIntegrityTests(
		RunTime				datetime NOT NULL,
		DimensionTable			nvarchar(300) NOT NULL,
		PrimaryKey				nvarchar(300) NOT NULL,
		FactTable				nvarchar(300) NOT NULL,
		ForeignKey				nvarchar(300) NOT NULL,
		FactRows				int,
		NbrOfOrphans			int,
		NbrOfOrphanedValues		int,
		MaxOrphanedValue		varchar(20),
		NbrOfSpecialRows		int
	)


DECLARE @sql nvarchar(max),
		@Template nvarchar(max),
	    @DimensionTable nvarchar(300),
		@FactTable nvarchar(300),
		@PrimaryKey nvarchar(300),
		@ForeignKey nvarchar(300),
		@cur CURSOR;


SET @Template = N';WITH RI AS
(
	SELECT 
		d.{PrimaryKey} AS PrimaryKey,
		f.{ForeignKey} AS ForeignKey,
		Count(*) AS NbrOfRows
	FROM 
		{FactTable} f
		LEFT JOIN {DimensionTable} d ON f.{ForeignKey} = d.{PrimaryKey} 
	GROUP BY
		d.{PrimaryKey},
		f.{ForeignKey}
)
INSERT #ReferentialIntegrityTests (
	RunTime				,
	DimensionTable		,
	PrimaryKey			,
	FactTable			,
	ForeignKey			,
	FactRows			,
	NbrOfOrphans		,
	NbrOfOrphanedValues	,
	MaxOrphanedValue	,
	NbrOfSpecialRows
)
SELECT 
	RunTime = GETDATE(),
	DimensionTable	= ''{DimensionTable}'',
	PrimaryKey		= ''{PrimaryKey}'',
	FactTable		= ''{FactTable}'',
	ForeignKey		= ''{ForeignKey}'',
	FactRows			= ISNULL(( SELECT SUM(NbrOfRows) FROM RI ), 0),
	NbrOfOrphans		= ISNULL(( SELECT SUM(NbrOfRows) FROM RI WHERE PrimaryKey IS NULL), 0),
	NbrOfOrphanedValues = ISNULL(( SELECT COUNT(*) FROM RI WHERE PrimaryKey IS NULL GROUP BY PrimaryKey), 0),
	MaxOrphanedValue	= ISNULL((SELECT MAX( ISNULL( CAST(ForeignKey AS varchar(25)), ''Null'')) FROM RI WHERE PrimaryKey IS NULL), ''n/a''),
	NbrOfSpecialRows	= ISNULL( (SELECT SUM(NbrOfRows) FROM RI WHERE ForeignKey < 0), 0)';
	
SET @cur = CURSOR STATIC FOR
WITH Dimensions AS
(
	SELECT 
		 QUOTENAME(S.NAME) + '.' + QUOTENAME(t.[name]) AS TableName
		,QUOTENAME(c.name) AS PrimaryKey

	FROM sys.objects t
		INNER JOIN sys.schemas S ON S.schema_id =t.schema_id
		INNER JOIN sys.columns c ON c.object_id = t.object_id
		INNER JOIN sys.index_columns  ic ON ic.object_id = t.object_id
				AND ic.column_id = c.column_id
		INNER JOIN sys.key_constraints kc ON kc.parent_object_id = ic.object_id
				AND ic.index_id = kc.unique_index_id
	WHERE 
		t.[type] = 'U'
		AND kc.[type] = 'PK'
		AND t.[name] LIKE 'dim%'

), 
FactColumns AS
(
	SELECT 
		QUOTENAME(S.NAME) + '.' + QUOTENAME(t.[name]) AS TableName
		,QUOTENAME(c.name) AS ColumnName

	FROM sys.objects t
		INNER JOIN sys.schemas S ON S.schema_id =t.schema_id
		INNER JOIN sys.columns c ON c.object_id = t.object_id
	WHERE 
		t.[type] = 'U'
		AND t.[name] LIKE 'fact%'
)

	SELECT
		d.TableName AS DimensionTable,
		d.PrimaryKey AS PrimaryKey,
		f.TableName AS FactTable,
		f.ColumnName As ForeignKey
	FROM
		Dimensions d
		CROSS JOIN FactColumns f 
	WHERE
		CHARINDEX(REPLACE(d.PrimaryKey, '[', ''), f.ColumnName) > 0;
OPEN @cur

WHILE 1 = 1
BEGIN
	FETCH @cur INTO @DimensionTable,
					@PrimaryKey,
					@FactTable,
					@ForeignKey
	IF @@fetch_status <> 0 BREAK
	SET @sql = REPLACE(@Template, '{DimensionTable}', @DimensionTable)
	SET @sql = REPLACE(@sql, '{FactTable}', @FactTable)
	SET @sql = REPLACE(@sql, '{PrimaryKey}', @PrimaryKey)
	SET @sql = REPLACE(@sql, '{ForeignKey}', @ForeignKey)

	EXEC sp_executesql @sql
END
		
SELECT * FROM #ReferentialIntegrityTests ORDER BY NbrOfOrphans desc, FactRows desc
DROP TABLE #ReferentialIntegrityTests