Column Naming Conventions

Column Naming Convention

I like lower case with underscores:  company_idfirst_name, telephone_nbr, booking_ref

This is Uncle Joe Celko’s convention as stated in SQL Programming Style (Morgan Kaufmann, 2005), and I accept his justification that Invoices.invoice_nbr  is more readable than Invoices.InvoiceNbr in a dense melee of Sql code.

However, pragmatically, PascalCase is more popular and I find I use it more commonly from choice.

Id/Key column Names

  • Use the singular version of the table followed by Id / Key / Nbr
  • e.g.  table Customers has column CustomerId .

There are a number of reasons for this:  having a bunch of columns all named ‘Id’ (as one popular approach has it) is never helpful.  Another reason is that it makes it practically impossible to search for references.   Searching for ‘CustomerId’ will find definite references; searching for ‘Id’ in a schema full of Ids  is useless.  The same applies to Name, Date, Cost etc used as column names.

Column Name Suffixes

The suffix is a type-and-content indicator.  Probably all columns with the same suffix will be of the similar data type and size and have similar constraints. This is reasonably easy to assert and check using a script or even by eye in a data dictionary.

Column Name Suffixes
Suffix Type Remarks
_abbrv short char or varchar an abbreviation for the item; normally for display where space is limited.
_flag char(1) Constrained to accept only uppercase Y or N.  (if an undefined value is required use _triflag)
_date any date type date or date and time
_id int numeric surrogate key – often the primary key. based on table name: table Companies has company_id
_name varchar proper name e.g. first_name. Use sparingly. name should never be used on its own for a column!
_key normally short varchar or char if natural key is of fixed length (e.g. ISO country codes) natural (normally textual) key. Normally unique except for versioned tables (i.e. tables with effective and obsolete dates).
_ref short char or varchar short textual reference – e.g. initials; company reference code.  Need not be unique.
_perc decimal(5 4) percentage scale 1.  (type specification allows maximum of 999.99%)
_token GUID a row identifier – e.g. case_guid. used for security in HTML links; etc
_triflag char(1) Constrained to accept only uppercase Y or  N or  U  (= Unknown).

 Alternative Conventions

Some other conventions that I have advocated or endured.

The Table-Prefix (c_name) Convention

This was the first convention I adopted (largely in MSAccess) and it has some attractive qualities.

  • All column names are prefixed by (normally) a single letter plus underscore.  Thus all columns in table tblCompany are prefixed with c_ e.g. c_idc_name.
  • Sometimes it isn’t possible to keep the single letter convention (what do you do once you have tblProperty, tblPerson, tblProduct?), so two or even three letter prefixes are acceptable: pn_FirstName etc.
  • Foreign key columns are named differently to the column they relate to.  E.g. tblPerson.pn_CompanyId is foreign key to tblCompany.c_Id.


  1. Every column has a unique name.  If you don’t use fully qualified column names, adding a column can break a view if the column name already exists in one of the other tables referenced – or if someone has used Select *.
  2. It’s easy to see which table you’re referring to if you don’t use fully qualified column names:  c_id is the Company ID and p_id is the Product ID.
  3. You rarely have to alias similarly named columns:  pn_address is distinct from c_address.

It works satisfactorily in MS Access and is quick to use.   Working with recordsets in code it is clear which column is being referenced – you never see an anonymous rst!Address.  Because no aliasing is necessary there is no risk of inconsistent aliases: the column name you see in the code is the name it has in the table.

I wouldn’t choose this convention for a new database; however it has caused no problems in any database where I have used it.  With the advent of Sql Data Tools it is now possible to enforce fully qualified column names and prohibit Select * so database-unique column names are less important.  Intellisense in SSDT and SSMS makes full names (with table aliases) easier to write.  Neither of these are available for Sql in Access.

Prefixing keys with PK_ and FK_

I don’t think this helps.  The primary key is indicated in metadata. The naming convention is that it’s the singularised name of the table with a suffix. Customers.customer_id.  The foreign key is again indicated in metadata.  I don’t think confusing them is common.

I think it has disadvantages

  • It burns column properties into the column. Foreign keys might be created or dropped; the table name is unlikely to change.
  • When including the column in a view the output will be FK_customer_id or PK_customer_id – which you might be tempted to alias.  It’s not relevant to the output that the column is a FK or PK.  In fact, PK_customer_id would be confusing if it were not the PK of the rowset being returned.
  • Many to Many tables:  columns can be both PK and FK at the same time.  The conention just confuses here.
  • It obscures the fact that these columns represent the same attribute.

ORM-Generated Tables with Standard Column Names

I have worked with databases where every lookup table had exactly the same column names: twenty plus tables with the columns NameDescriptionKeyEffectiveDate, and so on.  (It was generated using ORM modelling in Visio for Enterprise Architects.  Which of us wouldn’t be flattered by the name of that program?  Unfortunately it put visual effects above good design – another bete noir; thankfully Microsoft dropped Visio for EA after 2005 and put its energies into the grown-up SSDT, which has no visuals at all!)

In addition to the lookup tables, most fact tables in the database also had a Name and Description column.  Almost every Select from more than one table would require the designer to alias columns  (or forget to – Sql Server  is happy to output two columns with the same name; applications are rarely happy to receive them!).

It seems obvious that there will be less confusion if a column name seen by the application is the same as in the table, and if a column always has the same name wherever it appears.    If a column regularly has to be aliased there is a high risk that different aliases will be used.  Why even raise the possibility?

  • Give a unique name to all columns that are likely to appear together in a Select statement.
  • Never use Name as a column name!  (there are a number of other prohibited names).

I’m happy for multiple tables to have effective_date and obsolete_date, for example, because these will normally only be used in where clauses, where full names will be enforced; they won’t often be returned for multiple tables in a view.

Naturally enough it was this database that suffered an ambiguous column reference following the addition of a column to a table.  This could never have happened with the Table-Prefix convention: perhaps that is a good reason to use it, in spite of the aversion to its Access appearance!


Making Email Address Columns Behave

It’s a familiar situation: your email address columns contain phone numbers, important notes, credit card details…  It’s not a problem until you want to use the addresses in bulk – such as for a mailshot – and by that time it is extremely costly to deal with.

The user interface should validate at the time of entry, of course, but we need a backstop validation at database level: just strict enough to stop completely off-the-ball values being entered.  Format validation, in other words.  Non-existent domains and typos are not our concern, and don’t cause us too much trouble (that’s for a different area of data quality); but try sending a telephone number to a mail transport and it will probably raise an exception.

Here is a specification.

Email Column Specification

 email_address nvarchar(254) NULL

Email Address Validation at database layer


  • No attempt to check that the valid address exists: that’s a job for higher tiers.
  • Don’t be too strict.  SQL is not the best language for writing the complex validator that email addresses seem to need.  It would be embarrassing to reject an address that a carefully crafted UI validator has allowed through.
Database Validation rules for Email
Rule Remarks
At least 1 @ symbol must be at least one @ symbol.  Further @s are allowed in local part if quoted  – "local@part"
Allow ! # $ % & ' * + – / = ? ^ _ ` { | } ~  . (dot)
Allow "():;<>@[\] comma
At least one other character before and after the @ symbol There could be more than one @ symbol.  Maybe not worth putting too much energy into handling this very rare case.
Minimum 5 characters overall a@com is possible.
All other characters prohibited

That would satisfy me at the database layer.  The UI is the place to make more stringent checks – when there is an opportunity to correct invalid entries.  Many invalid email addresses can get by these validations but (in a cooperative environment) they are likely to be typos: it is better to preserve the slight mistakes and submit them to correction by post-editing if desired.   Of course, if we were selling addresses to a third party we would want stricter validation, but for that it might be better to purchase a tool or use a bureau service.

UDF : FUNC_IsValidEmailAddress

This function is fairly lenient. Few characters are refused in accordance with the broad specification (£, ¬, ¦ are refused).  Therefore it should be reasonably straightforward to synchronise with higher tiers’ validation.  Nevertheless it will still be difficult to slip a textual comment or telephone number into the column.

The string operations will make this a slow function.  OK for an individual inserts and updates; maybe a problem for batches.

CREATE FUNCTION [dbo].[FUNC_IsValidEmailAddress]
    Performs minimal database-level validation on email addresses.

    Returns Y if email_address meets following criteria, else N.
		- contains @ with at least one character before and three characters after (i.e. 5 chars minimum).
		- all characters must be in this string (case insensitive; includes space):
			'abcdefghijklmnopqrstuvwxyz01234567890@ !#$%&''*+–/=?^_`{|}~."():;<>@[\],'

	NULL email_address returns Y.  

    @email_address VARCHAR(254)
    DECLARE @success_flag CHAR(1) = 'Y'
    DECLARE @allowed_chars VARCHAR(255)
    DECLARE @searched_char VARCHAR(3)
    DECLARE @email_length SMALLINT
    DECLARE @position SMALLINT

    SET @allowed_chars = 'abcdefghijklmnopqrstuvwxyz01234567890@ !#$%&''*+-/=?^_`{|}~."():;<>@[\],'

	IF (@email_address IS NULL) RETURN 'Y'

    SELECT @email_length = LEN(@email_address)
        ,@position = 0

    IF (@email_address NOT LIKE '%_@___%')
        SET @success_flag = 'N'

    WHILE @position < @email_length
        AND @success_flag = 'Y'
        SET @position = @position + 1
        SET @searched_char = SUBSTRING(@email_address, @position, 1)

        SELECT @searched_char = CASE @searched_char
                WHEN '_'
                    THEN '[_]'
                WHEN '%'
                    THEN '[%]'
				WHEN '['
					THEN '[[]'

                ELSE @searched_char

        IF ( NOT @allowed_chars LIKE '%' + @searched_char + '%')
            SET @success_flag = 'N'

    RETURN @success_flag


Script should return the values indicated.

-- Tests
print 'Test dbo.FUNC_IsValidEmailAddress().'
print 'NULl.  (Should return Y)'
print dbo.FUNC_IsValidEmailAddress(null)
print 'Missing @ symbol, or @ only at start or end  (N)'
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('no.symbol')
print 'Invalid characters: £ or ¦  (N)'
print dbo.FUNC_IsValidEmailAddress('test@£')
print dbo.FUNC_IsValidEmailAddress('andrew¦')
print 'Telephone number or comment!  (N)'
print dbo.FUNC_IsValidEmailAddress('Michelle on 01233 12333444')
print dbo.FUNC_IsValidEmailAddress('Check this record!!')
print 'But we don''t catch an email address plus a comment   (Y)'
print dbo.FUNC_IsValidEmailAddress(' Check before emailing.')
print 'Valid examples from Wikipedia article  (Y)'
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('user@[IPv6:2001:db8:1ff::a0b:dbd0]')
print dbo.FUNC_IsValidEmailAddress('"much.more unusual"')
print dbo.FUNC_IsValidEmailAddress('""')
print dbo.FUNC_IsValidEmailAddress('"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"')
print dbo.FUNC_IsValidEmailAddress('postbox@com')
print dbo.FUNC_IsValidEmailAddress('admin@mailserver1')
print dbo.FUNC_IsValidEmailAddress('!#$%&''*+-/=?^_`{}|')
print dbo.FUNC_IsValidEmailAddress('" "')
print 'Not accepting unicode yet.  (N)'
print dbo.FUNC_IsValidEmailAddress('üñîçøðé ')
print 'Invalid examples from Wikipedia article (N)'
print dbo.FUNC_IsValidEmailAddress('')
print 'Invalid examples that we don''t intend to catch with this lenient test  (Y)'
print dbo.FUNC_IsValidEmailAddress('')
print dbo.FUNC_IsValidEmailAddress('a"b(c)d,e:f;g<h>i[j\k] ')
print dbo.FUNC_IsValidEmailAddress('just"not"')
print dbo.FUNC_IsValidEmailAddress('this is"not\ ')
print dbo.FUNC_IsValidEmailAddress('this\ still\"not\\')



It’s not very fast.  Test:  insert 10,000 rows into A a 3-column table with no constraint on the email_address column;  B a similar table, but with a constraint on the email_address column.

  • A: 10,000 row insert, no constraint: CPU time = 15 ms,  elapsed time = 30 ms.
  • B: 10,000 row insert, constraint on email column: CPU time = 1139 ms,  elapsed time = 1214 ms.

An increase of  4,000%, mostly CPU time as you would expect – there is no additional IO.

Let’s break down the function to see what is taking the time.  Start by commenting all in the function except the length check:

1.  Length check only  310% increase:

  • A: CPU time = 32 ms,  elapsed time = 37 ms.
  • B: CPU time = 124 ms,  elapsed time = 116 ms.

2.  “email_address LIKE ‘%_@_%'”  check only  590% increase:

  • A: CPU time = 32 ms,  elapsed time = 30 ms.
  • B: CPU time = 156 ms,  elapsed time = 178 ms.

3.  valid character check only  4,000% increase:

  • A:  CPU time = 16 ms,  elapsed time = 30 ms.
  • B: CPU time = 1138 ms,  elapsed time = 1203 ms.

4. All checks.

  • A: CPU time = 15 ms,  elapsed time = 30 ms.
  • B: CPU time = 1139 ms,  elapsed time = 1214 ms.

Clearly we need an optimisation of that expensive allowed-character check.   Using CLR would give us better string manipulation and regular expressions.


  • Speed  I’d be worried about the effect on bulk inserts or updates  (bulk updates that don’t touch the validated column won’t be affected).  In normal circumstances the customers table is not affected in bulk all that often so it might not be a concern.
  • That string of allowed characters (@allowed_chars) is hidden in the function.  It should be centralised and reportable.  It could be wrapped within another function (func_email_address_allowed_chars()).  Other options are table row (not tempting), or database property (how lightweight is that?)


Error Handler for T-SQL

This is the error handler created E. Sommarskog, with some slight alterations by me.  See

ERROR_HandleCaughtError stored procedure

This is  ES’s error handler in a form that suits me: I usually want to add some additional information to aid debugging – e.g. the values of parameters.

  • I seem to be thinking of an “error” schema with the prefix I’ve given it.  Probably dbo is the appropriate location (since little else would ideally end up there).
  • I think I have solved the question of whether the LOG option of RAISERROR is always allowed; I’ll update this post.
  • I’ve gone with ES’s 5555 return value.   Mostly I don’t use the return value, but I’ve yet to form a complete stance on them.  I wouldn’t trust them to indicate errors, largely because of the ambivalence with which they are treated.
  • I am no longer certain why I added Execute As Caller.  It might be important if I were capturing the login running the procedure (and indeed, that might be what appears in the LOG).  On the other hand, it might expose the procedure to the vagaries of Caller’s permissions.  Again, I will confirm and update the post.
CREATE PROCEDURE [dbo].[ERROR_HandleCaughtError] 

        @DebugInfo varchar(2000) = NULL


        Reports details of an error to user.
        Logs error to SQL Server Log.

                @DebugInfo varchar(200) - optional.  Additional 
                                          debug information to append to error message.

		2014-08-19 includes database name.

DECLARE @errmsg   nvarchar(2048),
        @severity tinyint,
        @state    tinyint,
        @errno    int,
        @proc     sysname,
        @lineno   int,
		@dbname   varchar(50)

SELECT @errmsg = error_message(), @severity = error_severity(),  
       @state  = error_state(), @errno = error_number(),
       @proc   = error_procedure(), @lineno = error_line(),
	   @dbname = db_name()

IF @errmsg NOT LIKE N'***%'                                         
                -- This is the first time the error has been handled in the stack.
                SELECT @errmsg = '*** Proc. ' + @dbname + '..' + coalesce(quotename(@proc), '<dynamic SQL>') + 
                                        ', Ln. ' + ltrim(str(@lineno)) + '. Err. ' + 
                                        ltrim(str(@errno)) + ': ' + @errmsg +
                                        ISNULL(' [' + @DebugInfo + ']', '')
                RAISERROR(@errmsg, @severity, @state) WITH LOG
          -- The error information has already been concatenated; rethrow error. No logging required.
           RAISERROR(@errmsg, @severity, @state)


Implementing the Error Handler

Here is a pattern showing the error handler in use in an outline stored procedure.  Copy and Paste.

CREATE PROCEDURE schema.StoredProcedure_Action

                This is what it does.



                0/0/0000  Initials : created        


DECLARE @DebugInfo varchar(2000)


        SELECT @DebugInfo = 'int_parameter=' + ISNULL(CONVERT(varchar(8), @int_parameter), 'NULL') +
                           ' date_parameter=' + ISNULL(CONVERT(varchar(20), @date_parameter, 120), 'NULL');

        -- Raise errors for any parameters that must not be Null.
        IF (@int_parameter IS NULL) RAISERROR('@int_parameter is required', 16, 1)
        -- Procedure Actions Here --

        EXECUTE dbo.ERROR_HandleCaughtError @DebugInfo
        RETURN 5555


All exceptions are re-thrown and take the error number of 50000.   Sometimes I would prefer to raise more specific errors to help the application using the procedure – e.g. differentiating between invalid parameters (which could be corrected by the application) and invalid data like a missing lookup value (which couldn’t).     Custom error numbers have to be added to sys.messages: it allows you to specify a custom formatted error message (good), but the custom entries would apply to all databases on the instance, and look like a feature that just hasn’t caught on.  What would happen if you loaded a third party database that defined its own conflicting custom errors?


1. Aug 2014.   Knowing the database that the error came from is undeniably useful when trawling logs.  Added to this version.  I haven’t yet found a way of returning the schema of the error procedure.


Syntax Formatting for blogs

I imagine that it’s a well trodden path seeking out a satisfactory code or syntax highlighter for wordpress.

You can, just about, paste colour-formatted code from some editors into the wp text editor  (e.g. SSMS to Outlook Email to WordPress), but it creates a mess of the HTML.  Not pretty at all.

Current preference:

  • Crayon Syntax Highlighter.   Nice to use;  line numbers; expansion to new window; code-block size limits.  The main drawback seems to be that the colouring is rather random.


Conversion to INT failed where no conversion expected

I have twice now been wrong-footed by this SQL Server error message:

“Microsoft SQL Server Native Client 10.0” Hresult: 0x80040E07 Description: “Conversion failed when converting the varchar value ‘abcd123’ to data type int.”.

The statement causing this distress was:

ISNULL(View1.long_nameView1.short_nameAS output_column

Reasonably, ISNULL will expect both arguments to be the same type will attempt conversion if they aren’t.   But in this example both columns long_name and short_name are the same: varchar – aren’t they?

Well, no they aren’t actually.   View1 was hiding some changes that were made to the source table:  long_name was dropped, and to keep things working View1 merely aliased NULL as long_name.     With no other clue, SQL Server interprets the literal NULL as type int, hence the attempted conversion.

T-SQL Pattern: No un-cast literals in views

It’s necessary for numeric literals, fairly common-sense for string literals, but less natural for literal NULL to be cast to the appropriate data type – though a moment’s thought makes it clear why it must.

           NULL as short_name,  -- wrong : what data type will it be? 
           CAST(NULL AS varchar(50)) as long_name  -- the correct pattern

The first line should stand out as unacceptably vague about the data type.