Populating Date Dimension with Fiscal Periods

A standard method for populating a date table with fiscal periods will always come to grief over exceptions.  This post includes a procedure that should be customisable enough to cater for most variations on fiscal periods – provided all periods contain a whole number of weeks.

Fiscal Periods

… do vary.

  • A 4-4-5 week pattern is standard… except when adjustments are made for busy months.
  • Normally a quarter contains three months… except when a company is adjusting its financial year end.

The procedure detailed below takes a few parameters which specify the fiscal period schema:

  • FirstDayOfFiscalYear  : the day it all starts, which also determines the first-day-of-week.
  • WeeksPerMonthPattern : a csv list stating the number of weeks in each month.  e.g.  4,4,5,4,4,5,4,4,5,4,5,4.    The procedure defaults to 4-4-5 throughout the year.
  • MonthsPerQuarterPattern : a csv list stating the number of months per quarter.  defaults to 3,3,3,3 so only needs overriding in those exceptional circumstances.

There is some validation to ensure that the numbers of months, quarters and weeks align.  There’s a “checksum” NbrOfWeeks parameter which defaults to 52; set a value if 53 weeks applies – or for fiscal years with other numbers of weeks.

Columns to Update

A date table might have any number of specialised fiscal columns to update – week and month numbers, start and end date keys, overall period counts.  Therefore the procedure provides a simple but flexible way of specifying an update clause using placeholders for values to be inserted.

  • TableToUpdate : name of table
  • DateKeyColumn : The Date-key column of the TableToUpdate. Column must be integer conversion of date format 112 (yyyymmdd).
  • ColumnUpdatePattern :  The contents of a SET clause as it would appear in a an UPDATE statement, with placeholders.  E.g.  “FiscalMonthNumber={m}, FiscalQuarterNumber={q},… FiscalMonthStart={mst},…”. Available placeholders:
    • {year} : the value of @FiscalYearName
    • {d}, {w}, {m}, {q}, {s} : day, week, month, quarter and semester numbers.
    • {wst}, {mst}, {qst}, {sst} : week, month, quarter, semester start date key
    • {wend}, {mend}, {qend}, {send} : week, month, quarter, semester end date key

Procedure ai_GenerateFiscalCalendar

This procedure generates table update statements for a single financial year.

Fiscal Calendar Update Generator.sql

Example Call

This call uses a specialised week-per-month pattern.

DECLARE @ColumnUpdatePattern varchar(1000) = 'Fiscal_year={year}, Fiscal_quarter={q}, ' +
    'Fiscal_period={m}, Fiscal_Week={w}, ' +
	'Fiscal_week_overall_nbr=(1161 + {w}), Fiscal_period_overall_nbr=(1037 + {m}), ' +
	'[quarter_Fiscal_of_year_end_date_key] = {qend}';
EXEC ai_GenerateFiscalCalendar    
	@FirstDayOfFiscalYear='2017-04-30',
	@FiscalYearName = '2017',
	@TableToUpdate = 'dbo.DimDate',
	@DateKeyColumn = 'date_key',
	@ColumnUpdatePattern = @ColumnUpdatePattern,
	@WeeksPerMonthPattern = '4,4,5,4,4,5,4,5,4,4,4,5';

A summary is output to the messages tab so that the periods can be checked.

FISCAL YEAR: 2017
START DATE: 2017-04-30 Sunday
TOTAL DAYS: 364
TOTAL WEEKS: 52
MONTHS:
 1  2017-04-30 Sun  28 days  4 weeks  Qtr: 1  Sem: 1
 2  2017-05-28 Sun  28 days  4 weeks  Qtr: 1  Sem: 1
 3  2017-06-25 Sun  35 days  5 weeks  Qtr: 1  Sem: 1
 4  2017-07-30 Sun  28 days  4 weeks  Qtr: 2  Sem: 1
 5  2017-08-27 Sun  28 days  4 weeks  Qtr: 2  Sem: 1
 6  2017-09-24 Sun  35 days  5 weeks  Qtr: 2  Sem: 1
 7  2017-10-29 Sun  28 days  4 weeks  Qtr: 3  Sem: 2
 8  2017-11-26 Sun  35 days  5 weeks  Qtr: 3  Sem: 2
 9  2017-12-31 Sun  28 days  4 weeks  Qtr: 3  Sem: 2
10  2018-01-28 Sun  28 days  4 weeks  Qtr: 4  Sem: 2
11  2018-02-25 Sun  28 days  4 weeks  Qtr: 4  Sem: 2
12  2018-03-25 Sun  35 days  5 weeks  Qtr: 4  Sem: 2

The update statements (output as rows) are created by simple concatenation of the TableToUpdate, ColumnUpdatePattern and DateKeyColumns specified.   One update is created for each day in the fiscal year.

UPDATE dbo.DimDate SET Fiscal_year=2017, Fiscal_quarter=1, Fiscal_period=1, Fiscal_Week=1, Fiscal_week_overall_nbr=(1161 + 1), Fiscal_period_overall_nbr=(1037 + 1), [quarter_Fiscal_of_year_end_date_key] = 20170729 WHERE date_key = 20170430;

-- 364 update statements generated.

 

A Procedure for generating Data Warehouse Update code

Background

Transactional tables exist in Database A.   These tables have partner archive tables in the same or another database.  For each insert or change of a value in the transactional database a timestamped new row is entered into the counterpart history table.  Standard archiving, similar to the temporal archiving offered in SQL Server 2016.

The difficulty is that writing the code to populate the archive table (or a Type 2 Slowly Changing Dimension data warehouse table) is complex and error prone.  Particularly since a Merge statement cannot on its own achieve this, so a temporary table has to populated from the Merge Output clause, and so on.

Code Generator

The procedure UTIL_GenerateArchiveProcedureDDL will generate the code required to populate an archive table.  The procedure is in the file attached at the end of this article.

It can generate code for a single specified table or all tables in a schema or the database.

The source table(s) must have an explicit primary key.

The archive table(s) must

  • have same name as the source tables (TODO add option to add _Archive to the name)
  • have temporal date columns named StartDate and EndDate (TODO allow other naming schemes to be used)
  • have the schema as the source tables with the addition of two date columns StartDate (not null) and EndDate (nullable).

The generator and generated code requires this Error reporting procedure.

Examples

Usage and description.

Standard Table (single Primary Key)

-- Typical table with a single primary key.
CREATE TABLE dbo.Products(
	ProductId int NOT NULL IDENTITY PRIMARY KEY,
	ProductName varchar(30) NOT NULL,
	AvailableFromDate smalldatetime NOT NULL,
	IntroducerId int NULL);

Run this with results to grid (it may be necessary to set the SSMS option to display carriage returns in the output grid view).

-- Generate Archive procedure for this table only by specifying table name
-- (if there is a similarly named table in another schema, provide @SourceSchema too)
EXEC UTIL_GenerateArchiveProcedureDDL
	@WarehouseDatabaseName = Archive,
	@SourceTable = 'Products';

This is the generated DDL.

  • The main action occurs in the MERGE statement. This joins all rows in the LIVE table with all current rows, i.e. rows where the EndDate is Null, in the Warehouse table (selected by CTE TargetCTE).
  • The Live and Warehouse tables are joined on all columns of the Primary Key (this example has only one).
  • If there is a match it means the same record (“same” here means having the same Primary Key value) exists in live and warehouse databases.
    • The procedure then compares the sets of non-PK columns in each table using the EXCEPT set operator. If a row is returned, there is a difference in one of the non-PK columns.
    • The THEN clause UPDATES the EndDate column of the warehouse table to obsolete the row in the warehouse table.
    • Note that at this stage no new row can be inserted into the warehouse to represent the new value; that is handled after the MERGE statement.
  • The remaining two MERGE clauses are simpler
    • Not matched by SOURCE:  the record no longer exists in live so the warehouse row is obsoleted.
    • Not matched by TARGET: the record doesn’t exist in warehouse, so a row is inserted.
  • All the SOURCE columns are OUTPUT-ed to a temporary table, together with the MERGE action key.
  • We still need to insert rows for records that have changed value since the warehouse was last updated.  The Merge Output will contain all rows from the full join of the source and target tables.  These can be filtered to only the records that resulted in an UPDATE operation, but even that is too wide since it includes records which have been deleted from the source; we don’t want to insert those again.  Therefore we filter to insert only rows that exist in the merge Source – that is, those where the ouput-ed Source columns are not null.  To test this, the code chooses the primary key (or the first PK column of a compound key) since this cannot be null if the row exists.
  • The Merge and changed-row Insert statements are wrapped in a transaction.
  • Row counts follow.
CREATE PROCEDURE [dbo].[UpdateDataWarehouse]
/*
	** DO NOT EDIT. 
	** Auto-generated by procedure UTIL_GenerateArchiveProcedureDDL ON Jan 22 2017  3:35AM.

	PURPOSE
		This procedure updates an archive warehouse database with the current values in the source database.
		The warehouse database must contain the same tables with the same columns plus temporal (StartDate, 
		EndDate) columns.		

	CODE GENERATOR HISTORY
		18/01/2017 Andrew Ing 1.02.

*/
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN TRY

DECLARE @EffectiveDate datetime;
DECLARE @ObsoletedRowCount int;
DECLARE @InsertedRowCount int;
DECLARE @UpdatedRowCount int;
DECLARE @TableActionSummary varchar(1000);
DECLARE @DebugInfo varchar(2000);

SET @EffectiveDate = GETDATE();


-------------------------------------------------------------
-- TABLE: [dbo].[Products]

  SET @DebugInfo = 'Create Temp Table [#MergeOutput_Products]';
  CREATE TABLE [#MergeOutput_Products] (
    MergeAction nvarchar(10)
   ,[ProductId] int NULL
   ,[ProductName] varchar(30) NULL
   ,[AvailableFromDate] smalldatetime NULL
   ,[IntroducerId] int NULL
  );

  SET @DebugInfo = 'Merge [dbo].[Products]';
  BEGIN TRANSACTION;
  -- Select only Current (EndDate IS NULL) warehouse rows for the comparison.
  WITH TargetCTE AS
    (SELECT * 
     FROM Archive.[dbo].[Products]
     WHERE EndDate IS NULL
    )
  MERGE  TargetCTE AS [Target]
  USING  [dbo].[Products] AS [Source]
    ON
         [Target].[ProductId] = [Source].[ProductId]
  WHEN MATCHED
    AND EXISTS (
      SELECT
          [Target].[ProductName]
         ,[Target].[AvailableFromDate]
         ,[Target].[IntroducerId]
      EXCEPT
      SELECT
          [Source].[ProductName]
         ,[Source].[AvailableFromDate]
         ,[Source].[IntroducerId]
    )
    THEN  -- Record has changed.
      UPDATE
      SET [EndDate] = @EffectiveDate
  WHEN NOT MATCHED BY SOURCE
    THEN  -- Record deleted from source.
      UPDATE
      SET [EndDate] = @EffectiveDate
  WHEN NOT MATCHED BY TARGET
    THEN  -- New Record.
      INSERT (
        [StartDate]
       ,[ProductId]
       ,[ProductName]
       ,[AvailableFromDate]
       ,[IntroducerId]
       ) VALUES (
        @EffectiveDate
       ,[Source].[ProductId]
       ,[Source].[ProductName]
       ,[Source].[AvailableFromDate]
       ,[Source].[IntroducerId]
       )
  OUTPUT
     $Action AS MergeAction
     ,[Source].[ProductId]
     ,[Source].[ProductName]
     ,[Source].[AvailableFromDate]
     ,[Source].[IntroducerId]
  INTO [#MergeOutput_Products];

  -- Insert new rows for records that have changed (old rows were marked as expired by
  -- the first MATCHED clause. #MergeOutput_ has been populated only with Source columns
  -- but MERGE is a full outer join so exclude rows with a Null primary key column.
  SET @DebugInfo = 'Insert for changed rows [dbo].[Products]';
  INSERT INTO Archive.[dbo].[Products]
    (
      [StartDate]
     ,[ProductId]
     ,[ProductName]
     ,[AvailableFromDate]
     ,[IntroducerId]
    )
    SELECT
      @EffectiveDate
     ,[ProductId]
     ,[ProductName]
     ,[AvailableFromDate]
     ,[IntroducerId]
    FROM [#MergeOutput_Products]
    WHERE
      MergeAction = 'UPDATE'
      AND [ProductId] IS NOT NULL;
    SELECT @UpdatedRowCount = @@ROWCOUNT;

  COMMIT TRANSACTION

  SET @DebugInfo = 'Get Row Counts [dbo].[Products]';
  SELECT
    @ObsoletedRowCount = SUM(CASE WHEN MergeAction = 'UPDATE' THEN 1 ELSE 0 END),
    @InsertedRowCount = SUM(CASE WHEN MergeAction = 'INSERT' THEN 1 ELSE 0 END)
  FROM
    [#MergeOutput_Products];

  SET @TableActionSummary = CONVERT(varchar(19), GETDATE(), 121) + ' [dbo].[Products]
			 ' + CAST(ISNULL(@ObsoletedRowCount, 0) AS varchar(10)) + ' Rows Obsoleted, ' + 
			 CAST(ISNULL(@InsertedRowCount, 0) AS varchar(10)) + ' New Rows Inserted, ' +
			 CAST(@UpdatedRowCount AS varchar(10)) + ' Update Rows Inserted.';
  PRINT @TableActionSummary;
  D ROP TABLE [#MergeOutput_Products];

-- END OF TABLE [dbo].[Products]


END TRY
BEGIN CATCH
  EXECUTE dbo.ERROR_HandleCaughtError @DebugInfo;
  RETURN 5555;
END CATCH
/*
  END OF GENERATED SCRIPT
*/
GO

Many to Many tables

Or any table with just primary key columns.

CREATE TABLE dbo.StudentCourses(
	StudentId int NOT NULL,
	CourseId int NOT NULL,
	CONSTRAINT PK_StudentCourses PRIMARY KEY (StudentId, CourseId)
	);

If the table only has a primary key then it is not possible to have changed values in the same record. The record either exists in both live and warehouse tables (in which case no action is needed) or it exists in only one, in which case it must be inserted or obsoleted in the warehouse accordingly.

If no non-primary key columns are found, the WHEN MATCHED AND EXISTS… clause is omitted from generated code.  Note in the section below that both primary key columns are used in the Merge join statement, and that the Matched And Exists clause is not generated.

...
  WITH TargetCTE AS
    (SELECT * 
     FROM Archive.[dbo].[StudentCourses]
     WHERE EndDate IS NULL
    )
  MERGE  TargetCTE AS [Target]
  USING  [dbo].[StudentCourses] AS [Source]
    ON
         [Target].[StudentId] = [Source].[StudentId]
     AND [Target].[CourseId] = [Source].[CourseId]
  WHEN NOT MATCHED BY SOURCE
    THEN  -- Record deleted from source.
      UPDATE
      SET [EndDate] = @EffectiveDate
  WHEN NOT MATCHED BY TARGET
    THEN  -- New Record.
      INSERT (
...

Generating Updates for all Tables

Omitting the SourceTable and SourceSchema parameters will generate DDL for a procedure that will include all tables in the database.   This can be restricted by providing SourceSchema (which will include all tables in the named schema) or SourceTable.

 

UTIL_GenerateArchiveProcedure.sql (3)

Duration Dimension

For measuring short durations, such as telephone calls or handling times.

Q1 Linear Grain?

The question I haven’t answered is whether the dimension should be at the finest grain for all durations.  The grain I am using is seconds but probably after 30 minutes that’s finer than is needed. That’s 86,400 rows for a day.   An alternative would be using a log (ish) scale for DimDurationKey:  seconds up to 1 hour; minutes up to 24 hours, hours up to ? 7 days, then days… weeks… months.  This would allow a single Duration Dimension to cover a wide range of periods without becoming huge.  DimDimensionKey would have to be calculated during load.

Q2 Range Titles

What is a clear way of creating range titles:  0-1 minutes, 1-5 minutes, etc.   Strictly these are 0-59 secs, 1min – 1:59, 2min – 4:59, although users probably think of them as 1min, 2min, 5min.

  • 0:00 – 0:59
  • 1:00 – 1:59
  • 2:00 – 4:59
  • 5:00 – 59:59

Or

  • 0 – 59 secs
  • 1 min – 1:59
  • 2 min – 4:59
  • 5 min – 59:59

Neither of those is clear or elegant.  The second emphasises the start of each range whereas the information is that it’s in the “5 minute” band, whereas “2 min – 4:59” most salient value is “2 min”.

Based on this I looked at rounding the end of each range.

  • 0 – 60 secs
  • 1:01- 2 min
  • 2:01 – 5 min
  • 5:01 – 1 hour

That looks better but it gets complicated once you move beyond 1 hour.  How to express the next segment up to say 1.5 hours:

  • 1:00:01 to 1 hour 3o mins
  • 1:00:01 to 1:30:00
  • 1:00:01 to 1 and a half hours

All are ugly mixes of units.

I think for short durations I am resigned to accept the one-second inaccuracy of “0-1 minute”, “2-5 minutes” for the sake of clarity and fitting in with the way that people actually look at these figures.  So long as the second in question is not double counted I don’t think it matters that it’s location is indeterminate. ( In contrast this wouldn’t be acceptable for age ranges (0-65, 65-70).)

In doing this I revert to rounding the start of each round. That accords then with the MinuteOfHour value which looks more logical when viewing the table as a whole.

So that leaves

  • 0 – 59 secs
  • 1 – 2 mins   [60 – 119 secs]
  • 2 – 5 mins
  • 45 – 60 mins
  • 60 – 90 mins
  • 90 mins – 1 hour

Creation Script

IF ( (SELECT COUNT(*) AS C FROM dbo.DimDuration) = 0 ) 
BEGIN
	print 'Seed DimDuration...';

		INSERT DimDuration
		(
			[Seconds],
			[Minutes],
			Hours,
			MinuteOfHour,
			DurationAnalysis1,
			DurationAnalysis1_MaxSeconds
         
		)
		VALUES
		(
			0, 0, 0, 0, '0 secs', 0
		);


	WITH
	  Pass0 as (select 1 as C union all select 1), --2 rows
	  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
	  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
	  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
	  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
	  Tally as (select row_number() over(order by C) as Number from Pass4)

	INSERT DimDuration
		(
			[Seconds],
			[Minutes],
			Hours,
			MinuteOfHour,
			DurationAnalysis1,
			DurationAnalysis1_MaxSeconds
         
		)

		SELECT TOP(15000) 
			Number,
			Number / 60,
			Number / (60 * 60),
			(Number / 60) % 60,
			CASE 
				WHEN (Number)       < 60	    THEN '1 - 59 secs'
				WHEN (Number)       < 120	    THEN '1 - 2 mins'
				WHEN (Number / 60)  < 3		    THEN '2 - 3 mins'
				WHEN (Number / 60)  < 4		    THEN '3 - 4 mins'
				WHEN (Number / 60)  < 5		    THEN '4 - 5 mins'
				WHEN (Number / 60)  < 6		    THEN '5 - 6 mins'
				WHEN (Number / 60)  < 7		    THEN '6 - 7 mins'
				WHEN (Number / 60)  < 8		    THEN '7 - 8 mins'
				WHEN (Number / 60)  < 9		    THEN '8 - 9 mins'
				WHEN (Number / 60)  < 10	    THEN '9 - 10 mins'
				WHEN (Number / 60)  < 15	    THEN '10 - 15 mins'
				WHEN (Number / 60)  < 20	    THEN '15 - 20 mins'
				WHEN (Number / 60)  < 30	    THEN '20 - 30 mins'
				WHEN (Number / 60)  < 40	    THEN '30 - 40 mins'
				WHEN (Number / 60)  < 50	    THEN '40 - 50 mins'
				WHEN (Number / 60)  < 60	    THEN '50 - 60 mins'
				WHEN (Number / 60)  < 90	    THEN '60 - 90 mins'
				WHEN (Number / 60)  < 120	    THEN '90 mins - 2 hours'
				WHEN (Number / (60 * 60)) < 3	THEN '2 - 3 hours'
				WHEN (Number / (60 * 60)) < 4	THEN '3 - 4 hours'
				WHEN (Number / (60 * 60)) < 10	THEN '4 - 10 hours'
				WHEN (Number / (60 * 60)) < 15	THEN '10 - 15 hours'
				WHEN (Number / (60 * 60)) < 24	THEN '15 - 24 hours'
				ELSE  'over 24 hours'
			END
			,0
			FROM Tally;

	-- Add a column that can be used as a sort-key for ranges.

		WITH CTE AS
		(
			SELECT DurationAnalysis1,
				DurationAnalysis1_MaxSeconds  = MAX(Seconds)
			FROM dbo.DimDuration
			GROUP BY DurationAnalysis1
		)
	
		UPDATE DD 
		SET
			DurationAnalysis1_MaxSeconds = CTE.DurationAnalysis1_MaxSeconds
		FROM
			DimDuration DD 
			INNER JOIN CTE ON DD.DurationAnalysis1 = CTE.DurationAnalysis1;

	print 'Seed DimDuration complete.';