Duplicate managers

Overview

A Duplicate Manager defines the user interface in which data stewards review, merge, and split groups of matching records, and override the values of the consolidated golden records.

A duplicate manager is used when the user triggers a duplicate management action with a selection of records. Actions include:

  • Review and Confirm Duplicates to confirm groups of fuzzy matching records after reviewing these groups in detail.

  • Merge or Split Duplicates to reorganize groups of fuzzy matching records.

  • Review Duplicates Suggestions to accept, reject, or reorganize suggestions made on groups of fuzzy matching records.

In a duplicate manager:

  • Actions on match groups and suggestions are performed using two alternate views:

    • A Graph view, in which master records, golden records, and suggestions appear as graph nodes. This view helps data stewards understand groups and suggestions, as well as the match rules that relate master records. The content of the nodes is defined by a Display Card.

    • A Table view, in which master records, golden records, and suggestions appear as rows in a tree table. This view helps data stewards understand groups and suggestions, even very large ones, but it does not expose the relationships between master records (match rules). The appearance of the table is defined by a Table View Collection.

  • The user may select and add records to the duplicate management operation. These records are selected from a Collection, optionally sorted and filtered according to search configurations.

  • The user can visualize the details of a record as defined in a selected Form Tab.

  • The same form tab is also used in the Explain Record view, which shows the master record values that are consolidated to compose a golden record.

  • The form tab is also used as an authoring form to override the values of the consolidated golden record.
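The difference between the two views can be pictured with a small sketch. The following Python model is purely illustrative: the class, function, record, and match rule names are hypothetical and are not part of any Semarchy xDM API, and suggestions are left out for brevity. It only shows that a graph representation keeps the match rules that relate master records, whereas a tree-table representation keeps the hierarchy but drops them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MatchGroup:
    """Hypothetical in-memory model of one match group (not an xDM API)."""
    golden_id: str
    master_ids: List[str]
    # Each edge is (master_a, master_b, match_rule_name).
    match_edges: List[Tuple[str, str, str]] = field(default_factory=list)


def graph_view(group: MatchGroup) -> Dict[str, list]:
    """Nodes plus labeled edges: the match rules between masters stay visible."""
    nodes = [group.golden_id] + group.master_ids
    return {"nodes": nodes, "edges": list(group.match_edges)}


def table_view(group: MatchGroup) -> List[Tuple[int, str]]:
    """Tree-table rows as (indent_level, record_id): no match rule information."""
    rows = [(0, group.golden_id)]
    rows += [(1, master_id) for master_id in group.master_ids]
    return rows


# Illustrative data only: the record IDs and rule names are made up.
group = MatchGroup(
    golden_id="G1",
    master_ids=["CRM.12", "ERP.7", "WEB.3"],
    match_edges=[
        ("CRM.12", "ERP.7", "ExactEmail"),
        ("ERP.7", "WEB.3", "FuzzyName"),
    ],
)
```

In this sketch, `graph_view(group)` still carries the two rule-labeled edges, while `table_view(group)` returns one golden row followed by three indented master rows, which is why a table scales better to very large groups.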

Duplicate managers work at the entity level, and do not necessarily affect the child records referencing those modified by the duplicate manager:

  • When merging records in a duplicate manager, the child records can also be merged automatically if their matching rule uses the ID of the referenced golden record, and if the job executed after the duplicate manager takes the child entity into account.

  • When splitting a match group into multiple golden records, only the records in the entity managed by the duplicate manager are affected by the split. To propagate the split to child records, the reference linking the child record to the records being split must be configured with Split Duplicates Propagation set to Reset Matching, and the job executed after the duplicate manager must process the child entity.
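The two conditions for split propagation can be read as a simple conjunction. The sketch below is a hypothetical Python model, not Semarchy xDM code: the `ChildReference` class, the function name, the entity names, and the `"(other)"` placeholder value are assumptions made for illustration. Only the Reset Matching value comes from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildReference:
    """Hypothetical description of a reference from a child entity."""
    child_entity: str
    # Value of Split Duplicates Propagation on the reference.
    # Only "Reset Matching" is modeled; any other value is a placeholder.
    split_propagation: str


def children_affected_by_split(
    references: List[ChildReference], entities_in_job: List[str]
) -> List[str]:
    """Return the child entities to which a split would propagate.

    Both conditions must hold: the reference is set to Reset Matching,
    and the job run after the duplicate manager processes the child entity.
    """
    return [
        ref.child_entity
        for ref in references
        if ref.split_propagation == "Reset Matching"
        and ref.child_entity in entities_in_job
    ]


# Illustrative configuration: entity names are made up.
references = [
    ChildReference("Contact", "Reset Matching"),
    ChildReference("Order", "Reset Matching"),
    ChildReference("Address", "(other)"),
]
affected = children_affected_by_split(
    references, entities_in_job=["Customer", "Contact", "Address"]
)
```

Here only `Contact` meets both conditions: `Order` is not processed by the job, and the `Address` reference is not set to Reset Matching.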

The features that a duplicate manager makes available to the user depend on the configuration of the actions in the Action Sets that use this duplicate manager.

Create duplicate managers

To create a duplicate manager:

  1. Right-click the Duplicate Managers node under an entity and select Add Duplicate Manager. The Create New Duplicate Manager wizard opens.

  2. In the Create New Duplicate Manager wizard, enter the following values:

    • Name: Internal name of the object.

    • Label: User-friendly label for this object.

    • On Finish Job: Normally this should be left empty. Semarchy xDM automatically generates a job for you. In some advanced cases, you may need the ability to specify a particular job to execute. In these cases, you may select the job to execute when the user completes the duplicate manager and submits the changes.

    • Collection: Select the collection used to select records to add to the duplicate manager.

    • Display Card: Select the display card used to represent the records in the graph.

    • Form Tab: Select the form tab used to show or edit values of a record.

  3. Click Finish to close the wizard. The Duplicate Manager editor opens.

  4. Configure how value overrides take place for golden records:

    • Enable Master Value Picking: This option allows users who override a consolidated value to pick the override from the master records, in addition to entering their own values.

  5. By default, only the Graph view is enabled for managing duplicates. To enable the Table view, select a Table View Collection in the Display section. Note that the selected collection must have the Allow Table and Display Card Column options selected.

  6. Configure in the Golden Record Selection section how records are selected and added to the duplicate manager:

    • Collection: Select the collection used to show the list of records.

    • Search Configuration: Click the Edit button to select and order search methods available to filter the list of records. The user will be able to filter the records using the selected search methods before seeing them in the collection.

    • Select Customize Sort and enter a SemQL Sort Expression to sort the records in the collection. You can also enable User-Defined Sort for this collection.

    • Select the available display types for the collection using the Allow Table, Allow List and Allow Grid properties, and select one of these as the Default Display Type.
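As a recap of the settings above, the following Python sketch models a duplicate manager configuration and checks the one constraint stated in step 5. It is a hypothetical illustration only: Semarchy xDM is configured through the wizard and editor, and the class names, field names, and sample values below are assumptions that merely mirror the property labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Collection:
    """Hypothetical stand-in for a collection and two of its options."""
    name: str
    allow_table: bool = False
    display_card_column: bool = False


@dataclass
class DuplicateManagerSettings:
    """Hypothetical mirror of the wizard and editor properties."""
    name: str
    label: str
    collection: Collection
    display_card: str
    form_tab: str
    on_finish_job: Optional[str] = None  # normally left empty
    enable_master_value_picking: bool = False
    table_view_collection: Optional[Collection] = None  # None: Graph view only


def check_settings(settings: DuplicateManagerSettings) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    tvc = settings.table_view_collection
    if tvc is not None and not (tvc.allow_table and tvc.display_card_column):
        problems.append(
            "Table View Collection '%s' needs Allow Table and "
            "Display Card Column selected" % tvc.name
        )
    return problems


# Illustrative values only: all names are made up.
settings = DuplicateManagerSettings(
    name="CustomerDuplicates",
    label="Customer Duplicates",
    collection=Collection("CustomerList"),
    display_card="CustomerCard",
    form_tab="CustomerDetails",
    table_view_collection=Collection("CustomerTable", allow_table=True),
)
problems = check_settings(settings)
```

In this example `problems` contains one entry, because the table view collection has Allow Table selected but not Display Card Column.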