We want to find how often customers are behind in their payments or calculate their average debt when they are in debt. The question might be
- How many customer-months of debt am I experiencing (counting 1 for each month that each customer is in debt).
- What is the average monthly debt of customers (aggregating only months where a customer is in debt).
And similar questions, such as profit in profitable-months.
The aggregations (i.e. COUNT or AVERAGE) should be partitioned by
- time interval (e.g. month)
and there will need to be some form of conditional clause to return a value only when the customer’s balance is above zero (debt).
Debt in indebted-months = SUMX( CROSSJOIN( VALUES( 'Date'[Month] ), VALUES( 'Customer'[CustomerId] ) ), VAR customer_balance = [Customer balance] RETURN IF ( customer_balance > 0, customer_balance ) )
How can performance be improved?
- The most obvious optimisation is already in place: [Customer balance] is assigned to a variable, which avoids calculating it twice for indebted customers.
- The number of iterations will be the product of Customers and Months, so if these dimensions are small (a restricted date range is in context) then performance might be OK.
- If there is no filter on date in context then it might help to filter the date table to include only months where there is at least one customer balance.
- Can we avoid using IF? Is there an expression using only mathematic functions that will return only positive values? (A limitation here that I can see is that math expression will return zero where the balance is <= 0 which will be OK for some uses but not when calculating average.)
- Is the CROSSJOIN optimal? And the use of VALUES to return tables? For Date, the VALUES clause returns a table containing the set of months, but for Customer the VALUES clause returns the table grain, so the table itself could have been used: do those extra columns have a cost where they are not referred to? Something to investigate.