Preferring NULL as a "No Value" Indicator
Some practical reasons why NULL should be used as a no-value indicator in a column.
These examples arise from a situation where a column will eventually contain a unique value. Until that value is known the column holds a "no value" indicator. I would use NULL. Others may use:
- a zero-length string (ZLS)
- 0
- 1900-01-01
- 00000000-0000-0000-0000-000000000000
Indexing
To enforce uniqueness we must use a filtered index that excludes the "no value" indicator, since that indicator may occur many times. My expectation is that an index filtered on "WHERE col IS NOT NULL" will still be used for equality predicates and equi-joins, whereas an index filtered on "WHERE col <> '00000000-0000-0000-0000-000000000000'" may not. The observations below concern the second kind of filter; a sketch of both queries follows the list.
- sp_executesql '… WHERE col = @x', '@x uniqueidentifier', '9B534305-16B2-4DE5-B063-A83B90B87C62' does not use the index filtered on "WHERE col <> '00000000-0000-0000-0000-000000000000'".
- sp_executesql '… WHERE col = @x AND col <> ''000…000''', '@x uniqueidentifier', '9B534305-16B2-4DE5-B063-A83B90B87C62' does use the index.
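Here is a minimal sketch of those two calls. The table, column, and index names (dbo.Widget, ExternalId, UX_Widget_ExternalId) are my own, hypothetical choices; only the shape of the index and the queries matters.

    -- Hypothetical table: ExternalId is unique once known; the all-zero GUID means "not yet known".
    CREATE TABLE dbo.Widget
    (
        WidgetId   int              NOT NULL PRIMARY KEY,
        ExternalId uniqueidentifier NOT NULL
    );

    -- Unique filtered index that excludes the sentinel, so the many "no value" rows do not collide.
    CREATE UNIQUE INDEX UX_Widget_ExternalId
        ON dbo.Widget (ExternalId)
        WHERE ExternalId <> '00000000-0000-0000-0000-000000000000';

    -- Prepared query: the plan must be valid even if @x is the sentinel,
    -- so the filtered index is not matched.
    EXEC sp_executesql
        N'SELECT WidgetId FROM dbo.Widget WHERE ExternalId = @x',
        N'@x uniqueidentifier',
        '9B534305-16B2-4DE5-B063-A83B90B87C62';

    -- Repeating the filter predicate in the statement text lets the optimiser match the filtered index.
    EXEC sp_executesql
        N'SELECT WidgetId FROM dbo.Widget
          WHERE ExternalId = @x
            AND ExternalId <> ''00000000-0000-0000-0000-000000000000''',
        N'@x uniqueidentifier',
        '9B534305-16B2-4DE5-B063-A83B90B87C62';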
My contention, still to be confirmed, is that an index filtered on "IS NOT NULL" would be used, because an equality predicate can never match the excluded value (NULL).
Prepared Queries
However, the real problem here is that the prepared query plan rules out the filtered index. Notice that an ad-hoc query can use it:
- "SELECT … FROM table WHERE col = '9B534305-16B2-4DE5-B063-A83B90B87C62'" does use the index filtered on WHERE col <> '00000000-0000-0000-0000-000000000000'.
This is because the ad-hoc query is planned against the literal value in the WHERE clause. That value is not excluded by the filter on the index, so the index can be used.
The prepared query plan generated by sp_executesql must be valid for "col = @x" with any value of @x. Because @x could be the "no value" '00000000-0000-0000-0000-000000000000', which the index excludes, the optimiser cannot use the filtered index.
My assertion is that a filtered index excluding only NULL could still be used in the analogous situation (at least with ANSI_NULLS ON). Is this true, though? The sketch below shows the setup I have in mind.
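Again the names (dbo.WidgetNullable, UX_WidgetNullable_ExternalId) are hypothetical; the column is made nullable so that NULL is the "no value" marker.

    -- Same shape of table, but NULL marks "not yet known".
    CREATE TABLE dbo.WidgetNullable
    (
        WidgetId   int              NOT NULL PRIMARY KEY,
        ExternalId uniqueidentifier NULL
    );

    -- Unique filtered index over the known values only.
    CREATE UNIQUE INDEX UX_WidgetNullable_ExternalId
        ON dbo.WidgetNullable (ExternalId)
        WHERE ExternalId IS NOT NULL;

    -- The claim to verify: with ANSI_NULLS ON, "ExternalId = @x" can never return a NULL row,
    -- whatever @x holds, so this filtered index should remain a candidate for the prepared plan.
    EXEC sp_executesql
        N'SELECT WidgetId FROM dbo.WidgetNullable WHERE ExternalId = @x',
        N'@x uniqueidentifier',
        '9B534305-16B2-4DE5-B063-A83B90B87C62';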
Statistics
If the column is otherwise unique, the statistics should reflect that distribution, so a search on a single value should always indicate an index seek. But the presence of a possibly large number of 00000000-0000-0000-0000-000000000000 rows means that, for at least that one value, a table scan might be cheaper, even though it is not a search the application would ever make. The sentinel therefore keeps open a plan option (the scan) that ideally would never be considered.
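As a rough illustration, using the hypothetical dbo.Widget table from the earlier sketch, the two searches below hit very different proportions of the table; that is exactly the choice the optimiser is forced to keep open.

    -- A real value: essentially one row, so a seek on the column's index is the obvious plan.
    SELECT WidgetId
    FROM dbo.Widget
    WHERE ExternalId = '9B534305-16B2-4DE5-B063-A83B90B87C62';

    -- The sentinel: potentially a large share of the table qualifies, so a scan may be
    -- estimated as cheaper, even though the application would never run this search.
    SELECT WidgetId
    FROM dbo.Widget
    WHERE ExternalId = '00000000-0000-0000-0000-000000000000';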
Unwanted Joins
Since these "no value" entries are intended to represent a lack of information, the designer never wants them to join to anything. However, because they are ordinary, valid values, they do join, with undesired results: two rows that each mean "unknown" will happily match each other.
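A small, contrived example (hypothetical Orders and Shipments tables) shows the effect:

    CREATE TABLE dbo.Orders    (OrderId    int PRIMARY KEY, ExternalId uniqueidentifier NOT NULL);
    CREATE TABLE dbo.Shipments (ShipmentId int PRIMARY KEY, ExternalId uniqueidentifier NOT NULL);

    INSERT dbo.Orders    VALUES (1,  '00000000-0000-0000-0000-000000000000'),
                                (2,  '9B534305-16B2-4DE5-B063-A83B90B87C62');
    INSERT dbo.Shipments VALUES (10, '00000000-0000-0000-0000-000000000000'),
                                (20, '9B534305-16B2-4DE5-B063-A83B90B87C62');

    -- The two sentinel rows match each other even though each one means "unknown".
    SELECT o.OrderId, s.ShipmentId
    FROM dbo.Orders o
    JOIN dbo.Shipments s ON s.ExternalId = o.ExternalId;
    -- Returns (1, 10) as well as (2, 20). Had the marker been NULL, the (1, 10) row would
    -- not appear, because NULL = NULL evaluates to UNKNOWN under ANSI_NULLS.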
Some Workarounds
One solution has been indicated above: add a WHERE clause that matches the filter on the index (in the literal text of the statement passed to sp_executesql).
However, that is not an (easy) option for generated code (e.g. Entity Framework). In the real case behind this example, creating an additional non-unique index on the column did enable all queries to use an index seek, with a considerable improvement in performance since the table was large. Unfortunately this means the table carries two indexes on the same column: one to enforce uniqueness (where the value is not 00000000-0000-0000-0000-000000000000) and the other for lookups on the column.
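In terms of the earlier sketch (hypothetical names again), the workaround looks like this: the filtered index enforces the constraint, and a second, unfiltered index gives prepared and generated queries something they can always seek on.

    -- Unique over known values only (as before).
    CREATE UNIQUE INDEX UX_Widget_ExternalId
        ON dbo.Widget (ExternalId)
        WHERE ExternalId <> '00000000-0000-0000-0000-000000000000';

    -- Non-unique, unfiltered duplicate, purely so that parameterised lookups can always seek.
    CREATE INDEX IX_Widget_ExternalId
        ON dbo.Widget (ExternalId);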
The correct solution is to refactor the column, replacing 00000000-0000-0000-0000-000000000000 with NULL for the "no value" entries.
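A sketch of that refactoring, against the hypothetical dbo.Widget table used above:

    -- Drop the existing index(es) on the column so it can be altered.
    DROP INDEX UX_Widget_ExternalId ON dbo.Widget;

    -- Allow NULL, then convert the sentinel rows.
    ALTER TABLE dbo.Widget ALTER COLUMN ExternalId uniqueidentifier NULL;

    UPDATE dbo.Widget
    SET ExternalId = NULL
    WHERE ExternalId = '00000000-0000-0000-0000-000000000000';

    -- Enforce uniqueness over the real values only; NULLs may repeat freely.
    CREATE UNIQUE INDEX UX_Widget_ExternalId
        ON dbo.Widget (ExternalId)
        WHERE ExternalId IS NOT NULL;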