Intransigent Data

Data, eh? Never where you left it, loves to mix with friends, able to appear in many places at once, can increase or diminish at will; what can you do with it?  It needs a firm hand.

After a long period of trying every mistake in the book I have at last come to some conclusions about how I think data should be treated – by which I mean how one should design databases and control the quality of the data that goes in and out of them.   I am now prepared to be dogmatic on these subjects (there’s no use shilly-shallying around these questions – give data an inch and it will take an undefined amount).  Dogmatism leads fairly naturally to blogging – hence this site.

It’s customary to set out the ideological baggage one brings to the discussion.  Here are some likes and dislikes.

My Idées Fixes

  • Error Handling   “Why do you spend all that time working on error handling, Andrew?”   — “Because it gives me so much more time for everything else.”   Planning for what can go wrong is a good way to start.
  • Naming   Taking ten minutes to decide on a name for something is time well spent.   If you can’t find a convincing name for something, perhaps it shouldn’t exist (the ontological argument).
  • Conventions  It doesn’t matter what they are, so long as you have them.  You can pick and choose them from various sources but you need to record them somewhere: this is a reasonable place.
  • Elegance    Chiefly, visual clarity.  Humans aren’t natural code readers (it’s not a natural language) – we need all the help we can get.  One strength we do have is an ability to see patterns and appreciate form to an aesthetic degree.  We can probably spot a poorly constructed brick wall or a badly typeset book at a glance.  I think code should be equally revealing of its shortcomings.
  • Documentation of course; but of the right sort.  Put away Word; don’t even think about SharePoint.  I like to write what things are on the things themselves, but you also need somewhere to record the outcome of discussions and arguments, since these are always forgotten.  I’ve yet to find a better place than the wiki.

My Bêtes Noires

  • Technology  Frankly, I’m not a great fan of it.  Of course, I’m as delirious as anyone about the latest Merge and Windowing functions.  But they’re not irreplaceable, and they don’t address the problem that you and I have.  The problem that we have is poor quality data lurking in badly constructed and named objects whose purpose in many cases has been forgotten.  Upgrading won’t fix this.  You might get a little value from the latest features, but it’ll be as nothing compared to what you gain from tackling the – difficult – data problems.  I’d far rather have a Sql 2000 database that is well designed than a mess in Sql 2012.
  • Root and Branch or Scorched Earth approach  “This database is badly designed; let’s create a new one properly from scratch.”    The following axioms show why this is impossible.
    • all databases are badly designed.
    • no database remains static long enough for a functional successor to be built.
    • big-bang updates are the hardest.

Starting from scratch is hugely tempting – the term greenfield is in vogue – but it’s wasteful of resources and is very likely to fail.  Wasteful because the bad old database you want to discard is the fruit of many man-hours’ work.  Even if the chance of failure is just fifty percent you’re still taking an even bet that at the end of it all you’ll have spent all that energy and still only have the bad old database.

But the saddest aspect is that it isn’t even necessary: databases are mutable.  They have a transformational grammar that can change any Database A into any Database B.  This column is of the wrong type and in the wrong table – just write a transformation to change it!  The data in this table should really be in three – just write a script to create the tables and move the data.  Those changes have invalidated a lot of other objects – then write a script to find the objects and alter them, or create a backwardly-compatible alternative view.  It is not a trivial operation, but it is scriptable, testable, and can be broken into manageable chunks: you needn’t do more than you’re comfortable with, and there doesn’t have to be a big bang.
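A sketch of what one of those transformation scripts might look like – every table and column name below is invented for illustration, and TRY_CONVERT needs Sql Server 2012 or later:

```sql
-- Hypothetical: OrderHeader.DeliveryDate is a varchar in the wrong table;
-- move it to OrderDelivery as a properly typed date column.
ALTER TABLE dbo.OrderDelivery ADD DeliveryDate date NULL;
GO

UPDATE d
SET    d.DeliveryDate = TRY_CONVERT(date, h.DeliveryDate, 103)  -- dd/mm/yyyy
FROM   dbo.OrderDelivery AS d
JOIN   dbo.OrderHeader   AS h ON h.OrderId = d.OrderId;

ALTER TABLE dbo.OrderHeader DROP COLUMN DeliveryDate;
GO

-- A backwardly-compatible view, so code that expects the old column
-- keeps working while it is migrated at leisure.
CREATE VIEW dbo.OrderHeader_Compat
AS
SELECT h.OrderId,
       CONVERT(varchar(10), d.DeliveryDate, 103) AS DeliveryDate
FROM   dbo.OrderHeader   AS h
JOIN   dbo.OrderDelivery AS d ON d.OrderId = h.OrderId;
GO
```

Each step is an ordinary script that can be rehearsed on a restored backup before being run in anger.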

The Toolkit

The instruments of subjection that I use on data include:

  • Microsoft Access  “it gets the job done”
  • Sql Server “its mother loves it”
  • SSIS (Sql Server Integration Services) “takes precisely no prisoners”  Invaluable for straightening out one’s (and other people’s) earlier data mistakes.
  • SSRS (Sql Server Reporting Services) “not even its mother loves it”
  • SSAS (Sql Server Analysis Services)
  • SSDT (Sql Server Data Tools – I think)  a Visual Studio design environment for databases – you write pure T-SQL; the best way.
  • dbatools powershell module
  • Sentry One Plan Explorer
  • OpenWikiNG “greatly missed”
  • Codecharge Studio.  I’ve used it since 1999, and it’s probably done me a little harm in that time.
  • Greenshot 
  • ASCII text generator (ideal for script headers): http://www.patorjk.com/software/taag  Use one of the Big Money variants, near the top of the list.