SSISDB getting large – failure of Retention Window Cleanup
The SSIS Catalog Retention Period (days) defaults to 365. The SSISDB database can quickly become very large (as a function of SSIS activity and logging levels).
The stored procedure SSISDB.internal.cleanup_server_retention_window is executed by the SQL Agent job "SSIS Server Maintenance Job" on a nightly schedule at 00:00:00.
(All defaults observed on SQL Server 2012, v11.0.5058.0.)
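If 365 days is more history than is needed, the retention window can be inspected and changed through the catalog's public interface. A minimal sketch follows; the 100-day value is only an illustration, not a recommendation:

USE SSISDB;

-- Current value of the catalog retention window (days)
SELECT property_name, property_value
FROM [catalog].[catalog_properties]
WHERE property_name = N'RETENTION_WINDOW';

-- Change it, e.g. to 100 days (example value only)
EXEC [catalog].[configure_catalog]
     @property_name  = N'RETENTION_WINDOW',
     @property_value = 100;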
cleanup_server_retention_window
This stored procedure deletes rows from the internal.operations table that are older than the retention period. One row in internal.operations represents a package (?) invocation – this table won’t grow too large, but related detail tables can; internal.event_messages, internal.event_message_context, internal.operation_messages and internal.executable_statistics are the largest.
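To see which of the internal tables are actually carrying the bulk of the rows, something like the following can be run against the catalog (approximate counts from sys.partitions; the filter on the internal schema is the only assumption):

USE SSISDB;

-- Approximate row counts for the internal.* tables, largest first
-- (index_id 0 = heap, 1 = clustered index, so each table is counted once)
SELECT t.name        AS table_name,
       SUM(p.rows)   AS approx_rows
FROM sys.tables t
JOIN sys.schemas s    ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE s.name = N'internal'
GROUP BY t.name
ORDER BY approx_rows DESC;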
The cleanup stored procedure deletes rows from the internal.operations table. Cascading foreign key constraints delete rows from the other tables – some indirectly via internal.executions. This is the operational part of the procedure:
WHILE (@rows_affected = @delete_batch_size)
BEGIN
    DELETE TOP (@delete_batch_size)
    FROM [internal].[operations]
    WHERE ([end_time] <= @temp_date
           OR ([end_time] IS NULL AND [status] = 1 AND [created_time] <= @temp_date))
    SET @rows_affected = @@ROWCOUNT
END
Procedure continually fails to complete
The default for @delete_batch_size is 1,000 – and herein lies the problem. The deletion of a single internal.operations row cascades to many thousands of rows in the detail tables, so a single batch of 1,000 can cascade to millions of rows in one transaction, while internal.operations itself has relatively low cardinality. Assembling such a large deletion set is a time-consuming, log-expanding and blocking operation which can fail – and, once it has failed, keep failing thereafter. Although the deletion is sensibly in a loop, 1,000 is a large proportion of the internal.operations rows: as can be seen from the 16GB example above, there are only 2.8K rows in the table to start with, so the loop would barely exceed a couple of iterations. Naturally all this depends on the activity and retention period of the SSIS instance, but 1,000 seems a lot in any case.
Some posts state that @delete_batch_size is 10, though not in the versions I have checked. If the procedure continually fails, the solution is to run a copy of the procedure's code with a smaller batch size to clear the backlog. I am tempted to reduce the setting in the stored procedure itself; otherwise, should a backlog of deletions accrue again for whatever reason, the deletions will again fail.
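A minimal sketch of such a manual run is shown below. It reuses the loop from the catalog procedure with a smaller batch size; the batch size of 10 and the derivation of @temp_date from catalog.catalog_properties are assumptions on my part (the real procedure computes the cut-off date through internal helper functions):

USE SSISDB;

-- Manual clean-up sketch: same loop as internal.cleanup_server_retention_window,
-- but with a much smaller batch size so each DELETE cascades to a manageable
-- number of detail rows. Run in a maintenance window and repeat as needed.
DECLARE @retention_window_length INT =
    (SELECT CONVERT(INT, property_value)
     FROM [catalog].[catalog_properties]
     WHERE property_name = N'RETENTION_WINDOW');

DECLARE @temp_date DATETIMEOFFSET =
    DATEADD(DAY, -@retention_window_length, SYSDATETIMEOFFSET());

DECLARE @delete_batch_size INT = 10;   -- assumed value; the procedure's default is 1,000
DECLARE @rows_affected INT = @delete_batch_size;

WHILE (@rows_affected = @delete_batch_size)
BEGIN
    DELETE TOP (@delete_batch_size)
    FROM [internal].[operations]
    WHERE ([end_time] <= @temp_date
           OR ([end_time] IS NULL AND [status] = 1 AND [created_time] <= @temp_date));

    SET @rows_affected = @@ROWCOUNT;
END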
The following chart shows the table sizes once the retention period has been reached – in this case, the 2.8K rows in internal.operations were reduced to 827. The projected effect on the other large tables is significant.