Error flows in Azure Data Factory
Azure Data Factory (ADF) pipelines consist of activities whose execution precedence and flow are controlled by constructs similar to SSIS precedence constraints.
However, unlike SSIS, ADF has no pipeline-wide error event that is raised on any and all errors. Adding an activity to respond to an error clears the error, the rest of the pipeline continues to run, and the pipeline can report Success on completion even if individual activities within it failed.
One solution is to use a “crowbar” activity (an activity that is guaranteed to fail) to expose an error condition.
The following screenshots illustrate the problem and the crowbar solution, and I try to formulate a rule to explain the error flow behaviour in pipelines.
Practical examples
In the final examples above, a logging or clean-up activity that is set to run on completion (rather than on success) of the preceding activities will effectively mask an error from any of them. Although the failure appears in the logs, any subsequent action will run as if no error had occurred.
Crowbar activity
A solution is to “re-throw” the error by capturing the Failure output of each activity and terminating it on a dummy activity. The dummy activity, even if not executed (which it isn’t in the first example below), creates a second vector which transitively causes the failed activity to fail the pipeline.
(f) The Crowbar activity (e.g. divide 1 by zero) does not execute but “exposes” the error vector. Pipeline reports failure.
(g) The Crowbar activity only executes, in this configuration, if all the preceding activities fail.
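As a rough sketch of how such a crowbar might be wired up, the Python snippet below builds the JSON for a dummy activity that depends on the Failed outcome of each of the three activities. The `dependsOn` / `dependencyConditions` layout follows the ADF pipeline JSON format. The choice of a Set Variable activity, the variable name `crowbar` and the `div(1, 0)` expression are assumptions made for illustration, and are not necessarily what the screenshots use.

```python
import json

# Activity names as used in the examples; adjust to match the real pipeline.
UPSTREAM = ["Activity 1", "Activity 2", "Activity 3"]

crowbar = {
    "name": "Crowbar",
    # Assumption: any activity that is certain to fail at run time would do.
    "type": "SetVariable",
    "dependsOn": [
        # Multiple dependencies are ANDed, so the crowbar itself only runs
        # when all three upstream activities have failed (example (g)).
        {"activity": name, "dependencyConditions": ["Failed"]}
        for name in UPSTREAM
    ],
    "typeProperties": {
        "variableName": "crowbar",  # hypothetical pipeline variable
        # Assumption: dividing by zero in an expression makes the activity fail.
        "value": {"value": "@string(div(1, 0))", "type": "Expression"},
    },
}

print(json.dumps(crowbar, indent=2))
```

The printed fragment would sit in the pipeline's `activities` array alongside the three activities and the logging activity.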
The rule
A pipeline can contain multiple paths from a start activity to a final activity. There are three paths in example (a). The error reporting rule (I propose) is:
A pipeline fails if the last activity to be executed on any path fails.
In example (d) all three paths terminate on the “log outcome” activity – which runs on completion (success or failure) of the previous activities and which itself succeeds. Therefore all three paths end in success, and the pipeline reports success.
In example (f) the additional crowbar activity creates a further three potential execution paths. The error in Activity 1 causes the (red) error path to be followed. There are now four paths being followed in (f). In three of them the last activity to execute is “log outcome”, which succeeds. The fourth path is from Activity 1 to the Crowbar activity. The crowbar activity does not execute, so the last activity to execute in this path is Activity 1 – which failed. Thus the pipeline returns failure.
Why doesn’t the Crowbar activity execute in (f)? Because its precedence constraints restrict it to executing only when all three of Activities 1, 2 and 3 fail (which happens in (g)). But that doesn’t stop the pathway being followed, which effectively exposes the failed Activity 1 as a path-final error.
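The proposed rule can be sketched as a small simulation. The Python below is a toy model, not ADF itself. It assumes that Activities 1, 2 and 3 are independent start activities, that multiple dependencies on one activity are ANDed, and that an activity whose dependencies are not met is skipped. The `Activity` class and the function names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    fails: bool = False  # outcome if the activity actually runs
    # upstream name -> required condition: "Succeeded", "Failed" or "Completed"
    depends_on: dict = field(default_factory=dict)


def run_statuses(activities):
    """Return {name: status}; activities must be listed upstream-first."""
    status = {}
    for act in activities:
        # An activity runs only if every dependency condition is met.
        ready = all(
            status[up] != "Skipped" and cond in (status[up], "Completed")
            for up, cond in act.depends_on.items()
        )
        if not ready:
            status[act.name] = "Skipped"
        else:
            status[act.name] = "Failed" if act.fails else "Succeeded"
    return status


def pipeline_fails(activities):
    """Proposed rule: fail if the last executed activity on any path failed."""
    status = run_statuses(activities)
    children = {a.name: [] for a in activities}
    for a in activities:
        for up in a.depends_on:
            children[up].append(a.name)
    roots = [a.name for a in activities if not a.depends_on]

    def path_final_statuses(node, last_executed):
        # Walk every path, remembering the last activity that actually ran.
        if status[node] != "Skipped":
            last_executed = node
        if not children[node]:
            yield status[last_executed]
            return
        for child in children[node]:
            yield from path_final_statuses(child, last_executed)

    return any(
        s == "Failed" for r in roots for s in path_final_statuses(r, r)
    )


def example(fail_1, fail_2, fail_3, with_crowbar):
    names = ["Activity 1", "Activity 2", "Activity 3"]
    acts = [Activity(n, fails=f) for n, f in zip(names, (fail_1, fail_2, fail_3))]
    acts.append(Activity("Log outcome", depends_on={n: "Completed" for n in names}))
    if with_crowbar:
        acts.append(
            Activity("Crowbar", fails=True, depends_on={n: "Failed" for n in names})
        )
    return acts


print(pipeline_fails(example(True, False, False, with_crowbar=False)))  # (d)
print(pipeline_fails(example(True, False, False, with_crowbar=True)))   # (f)
print(pipeline_fails(example(True, True, True, with_crowbar=True)))     # (g)
```

Under these assumptions the model reports success for (d) and failure for (f) and (g). In (f) the Crowbar is skipped, so Activity 1 is the last executed activity on the Activity 1 to Crowbar path.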