Configure data purge
Data purging helps maintain a reasonable storage volume for the data location and the repository by pruning the history of data changes and job logs.
Introduction to data purge
The data location stores the lineage and history of the certified golden data—that is, the data that led to the current state of the golden data.
Preserving the lineage and history is a master data governance requirement and is key to regulatory compliance. However, keeping this information can also create a large volume of data in the hub storage.
To make sure lineage and history are preserved according to the data governance and compliance requirements, model designers define a data retention policy for the model.
When a model is deployed to a data location, a purge job is automatically created. This job prunes lineage and history data according to the retention policy. Optionally, it also prunes repository artifacts (job logs, batches, loads, direct authoring, duplicate manager, and workflow instances) once all their associated data has been purged.
To keep a reasonable volume of information, data location managers must schedule regular executions of this job.
Configure a purge schedule
To create a purge schedule:
- In the Management view, expand the Data Locations node.
- Expand the data location for which you want to configure a purge.
- Double-click the Purge node. The Purge Schedule editor opens.
- Select or clear the Active checkbox to make the purge schedule active or inactive.
- Click the Edit button, and set the schedule for the purge with a purge frequency (monthly, weekly, or daily) or using a cron expression.
- Click OK to save the schedule.
- Select the Purge Repository Artifacts option to prune the job logs, batches, loads, direct authoring, duplicate manager, and workflow instances when all their data is purged.
  Repository artifacts are retained according to the longest retention policy of the entities associated with them. If one associated entity is configured with a Forever retention policy, the related repository artifacts are not purged, even when other associated entities have expired.
- Press Control+S (Command+S on macOS) to save the editor.
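For readers who prefer a cron expression to the predefined frequencies, the sketch below shows how the three frequencies might map to cron strings. It assumes the classic five-field Unix layout (minute, hour, day of month, month, day of week). The expression format that the Purge Schedule editor actually accepts is not stated here, and some schedulers use a different layout (for example, with a leading seconds field), so check the expected format before reusing these strings. The `cron_for` helper and its default run time of 02:00 are hypothetical.

```python
def cron_for(frequency: str, hour: int = 2, minute: int = 0) -> str:
    """Build a five-field Unix cron expression for a purge frequency.

    Hypothetical helper for illustration only; the field layout is an
    assumption and may differ from what the scheduler expects.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    if frequency == "daily":
        # Every day at hour:minute.
        return f"{minute} {hour} * * *"
    if frequency == "weekly":
        # Every Sunday (day of week 0) at hour:minute.
        return f"{minute} {hour} * * 0"
    if frequency == "monthly":
        # The first day of every month at hour:minute.
        return f"{minute} {hour} 1 * *"
    raise ValueError(f"unknown frequency: {frequency!r}")


if __name__ == "__main__":
    for freq in ("daily", "weekly", "monthly"):
        print(freq, "->", cron_for(freq))
```

Scheduling the purge outside business hours, as the default here does, is a common choice, but the right time depends on when the data location is least busy.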
Regardless of the frequency of the purges scheduled by the data location manager, the data history retained is as defined by the model designer in the data retention policies.
Enabling Purge Repository Artifacts does not purge repository artifacts independently for each expired entity. When a repository artifact is associated with multiple entities, it is purged only when all associated data is eligible for purge.
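The retention rule for repository artifacts can be illustrated with a small model. This is only a sketch of the rule as described on this page, not the purge job's actual implementation: the `Entity` class, the `retention_days` field, and the idea of counting retention from an artifact's creation date are all simplifying assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for an entity and its retention policy."""
    name: str
    retention_days: Optional[int]  # None models a Forever retention policy


def artifact_purge_date(created: date, entities: List[Entity]) -> Optional[date]:
    """Return the earliest date the artifact may be purged, or None if never.

    The artifact follows the longest retention policy among its associated
    entities; a single Forever policy keeps it indefinitely.
    """
    if not entities:
        raise ValueError("an artifact must be associated with at least one entity")
    if any(e.retention_days is None for e in entities):
        return None
    longest = max(e.retention_days for e in entities)
    # Assumption: retention is counted from the artifact's creation date.
    return created + timedelta(days=longest)


def is_purgeable(created: date, entities: List[Entity], today: date) -> bool:
    """True only once the data of every associated entity has expired."""
    purge_on = artifact_purge_date(created, entities)
    return purge_on is not None and today >= purge_on


if __name__ == "__main__":
    created = date(2024, 1, 1)
    batch_entities = [Entity("Customer", 30), Entity("Contact", 90)]
    print(artifact_purge_date(created, batch_entities))
    print(is_purgeable(created, batch_entities, date(2024, 2, 15)))
```

In this example, a batch touching both `Customer` (30 days) and `Contact` (90 days) stays in the repository until the 90-day policy has elapsed, even though the `Customer` data expired earlier.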