Data Obfuscation

Some functions for obfuscating data to protect personally identifiable data.  Aims:

  1. fast enough to process a large table
  2. resulting data is “sufficiently” obfuscated
  3. resulting data retains enough characteristics that it makes some sense in context.
  4. transformations are repeatable so that the same values in different datasets continue to match after transformation.

So this is some form of one-way hash, but not one that results in an entirely meaningless string of alphanumerics.

Telephone Number Obfuscation

The resulting number will have the same shape – including spaces and other punctuation – and retains the first character, so UK numbers are still recognisable.

Method (first draft):  Add first digit to last digit, modulo 10;  then add second digit to second-last digit modulo 10; and so on.  Skip first digit and skip non-numeric characters.

I should probably create a pair of functions, for unicode and non-unicode source columns.

/*
	PURPOSE:
		Efficiently obfuscate a telephone number in a repeatable manner.
		There should be collisions in the output so it cannot be reliably
		reverse engineered.
		Preserve the first digit of the number so it resembles a UK
		telephone number.
		Non-numeric characters reproduced verbatim.

	HISTORY

	30/07/2015 AI created.
*/
ALTER FUNCTION dbo.fn_ObfuscateTelNum
(
	@input nvarchar(100)
)
	RETURNS nvarchar(100)
AS
BEGIN
	declare @i int = 1
	declare @CurrentNum int = 0
	declare @SecondNum int = 0
	declare @PrevNum int = 0
	declare @output nvarchar(100) = ''

	IF (@input IS NULL) RETURN NULL
	WHILE (@i <= LEN(@input))
	BEGIN
		IF ( SUBSTRING(@input, @i, 1)  LIKE '[0-9]')
		BEGIN
			SET @CurrentNum = CAST(SUBSTRING(@input, @i, 1) AS int)  + @PrevNum
			IF (@i > 1 AND (SUBSTRING(@input, LEN(@input) - @i, 1) LIKE '[0-9]'))
				SET @SecondNum = CAST(SUBSTRING(@input, LEN(@input) - @i, 1) AS int)
				
			SET @output += CAST(((@CurrentNum + @SecondNum)  % 10) AS nvarchar(1))
			SET @PrevNum = @CurrentNum
		END
		ELSE
		BEGIN
			SET @output += SUBSTRING(@input, @i, 1)
		END
		SET @i = @i + 1
	END
	RETURN @output
END
GO

-- Two similar numbers produce quite different results:
select dbo.fn_ObfuscateTelNum('02072342357')
-- 05434650585
select dbo.fn_ObfuscateTelNum('02072345327')
-- 05734653885
-- but punctuation of number affects output:
select dbo.fn_ObfuscateTelNum('0207 2345327')
-- 0573 3653885
select dbo.fn_ObfuscateTelNum(null)
-- NULL
select dbo.fn_ObfuscateTelNum('+44 207 3248234')
-- +66 230 0488037