Data Obfuscation
Some functions for obfuscating data to protect personally identifiable data. Aims:
- fast enough to process a large table
- resulting data is “sufficiently” obfuscated
- resulting data retains enough characteristics that it makes some sense in context.
- transformations are repeatable so that the same values in different datasets continue to match after transformation.
So this is some form of one-way hash, but not one that results in an entirely meaningless string of alphanumerics.
Telephone Number Obfuscation
The resulting number will have the same shape – including spaces and other punctuation – and retains the first character, so UK numbers are still recognisable.
Method (first draft): Add first digit to last digit, modulo 10; then add second digit to second-last digit modulo 10; and so on. Skip first digit and skip non-numeric characters.
I should probably create a pair of functions, for unicode and non-unicode source columns.
/* PURPOSE: Efficiently obfuscate a telephone number in a repeatable manner. There should be collisions in the output so it cannot be reliably reverse engineered. Preserve the first digit of the number so it resembles a UK telephone number. Non-numeric characters reproduced verbatim. HISTORY 30/07/2015 AI created. */ ALTER FUNCTION dbo.fn_ObfuscateTelNum ( @input nvarchar(100) ) RETURNS nvarchar(100) AS BEGIN declare @i int = 1 declare @CurrentNum int = 0 declare @SecondNum int = 0 declare @PrevNum int = 0 declare @output nvarchar(100) = '' IF (@input IS NULL) RETURN NULL WHILE (@i <= LEN(@input)) BEGIN IF ( SUBSTRING(@input, @i, 1) LIKE '[0-9]') BEGIN SET @CurrentNum = CAST(SUBSTRING(@input, @i, 1) AS int) + @PrevNum IF (@i > 1 AND (SUBSTRING(@input, LEN(@input) - @i, 1) LIKE '[0-9]')) SET @SecondNum = CAST(SUBSTRING(@input, LEN(@input) - @i, 1) AS int) SET @output += CAST(((@CurrentNum + @SecondNum) % 10) AS nvarchar(1)) SET @PrevNum = @CurrentNum END ELSE BEGIN SET @output += SUBSTRING(@input, @i, 1) END SET @i = @i + 1 END RETURN @output END GO -- Two similar numbers produce quite different results: select dbo.fn_ObfuscateTelNum('02072342357') -- 05434650585 select dbo.fn_ObfuscateTelNum('02072345327') -- 05734653885 -- but punctuation of number affects output: select dbo.fn_ObfuscateTelNum('0207 2345327') -- 0573 3653885 select dbo.fn_ObfuscateTelNum(null) -- NULL select dbo.fn_ObfuscateTelNum('+44 207 3248234') -- +66 230 0488037