Manage duplicate records using SQL

Semarchy xDM supports duplicates management using SQL to perform programmatically actions typically available to users in applications' duplicate managers.

This document provides the methods to perform various duplicates management operations using SQL with the UM table.

All these methods use a Load ID (in a vLoadID variable), which is initialized and submitted using a process similar to the one used when publishing data.

Make sure to use a job that processes the entities for which the match and merge will run, as well as the related (child) entities.
You can also use continuous loads to perform the duplicate management operations.

Confirm matches

This method automatically confirms programmatically certain matches. It reproduces the behavior of the user interface.

Columns to load into the UM table:

  • B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, or with the ID of the Continuous Load. Use the GET_CONTINUOUS_LOADID function to retrieve a continuous load ID from the continuous load name.

  • B_SOURCEID, B_PUBID: Load these columns with the ID and the publisher of the master records to confirm.

  • B_CLASSNAME: Load this column with the name of the entity of these records.

  • <GoldenID>: Load this column with the ID of the golden record to which the master record is attached to.

  • B_CONFIRMEDSDPK: Load this column with the same value as <GoldenID>.

Example 1. Confirm all unconfirmed master records
INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	CUSTOMER_ID,
	CUSTOMER_ID FROM MD_CUSTOMER
WHERE
  B_CONFIRMATIONSTATUS = 'NOT_CONFIRMED';

Force a new record to join a golden record

When loading a source record, use this method to load at the same time a user decision forcing this record to:

  • Either join a golden record with a known ID,

  • Or create a new golden record with this ID.

To perform such an operation, combine data loading with the previous technique.

Example 2. Load a new customer record and force it to merge with the existing golden record with CUSTOMER_ID 2
/* Load source record */

INSERT into SD_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  -- Data fields are below
  CUSTOMER_NAME,
  ...

) values (
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  -- Values are below
  'GADGETRON',
  ...

)

/* Force a merge of this source record with a known golden ID */

INSERT INTO UM_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  CUSTOMER_ID,
  B_CONFIRMEDSDPK
)
SELECT
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  2,   -- Target Golden ID
  2    -- Confirm the golden ID FROM MD_CUSTOMER;

COMMIT;

Move master and golden records to golden records

This method automatically moves master records to golden records. It reproduces the behavior of the user interface for moving masters into golden records, or for merging golden records into golden records

Columns to load into the UM table:

  • B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, of with the ID of the Continuous Load.

  • B_SOURCEID, B_PUBID: Load these columns with the ID and the publisher of the master records to move.

  • B_CLASSNAME: Load this column with the name of the entity of these records.

  • <GoldenID>: Load this column with the ID of the golden record to which the master record should be moved to.

  • B_CONFIRMEDSDPK: Set this column to the <GoldenID> value if you want to flag the match as confirmed

  • B_XGRP: Load this column with a UUID value, representing the exclusion group. This UUID value should be same for all the records that belong to the same exclusion group, that is records that previously matched and that should no longer match. The value to load in this column depends on the exclusion groups of the master and golden records:

    • If neither the master record being moved nor the target golden record is part of an exclusion group, this value should be null.

    • If only the target golden record has an exclusion group, this value should be the UUID of this exclusion group.

    • If only the master record being moved has an exclusion group, this value should be the UUID of this exclusion group.

    • If the master and golden have different exclusion groups, then both these exclusion groups need to be merged in a new exclusion group. This value should be a new UUID, generated for example using the SYS_GUID() Oracle function. In addition, all the records from the two original exclusion groups should be loaded into the UM table with this new exclusion group.

Example 3. Merging the golden record with CUSTOMER_ID 1 into the golden record with CUSTOMER_ID 2 by moving all its master records
INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK,
	B_XGRP
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	2,          -- Target golden ID
	2,          -- Confirm the golden ID
	NULL        -- Neither master nor golden are in an exclusion group
FROM MD_CUSTOMER
WHERE
  CUSTOMER_ID = 1;

Similarly, to split a golden record by moving its master records to new golden records, you do not provide an existing golden ID but seed a new golden ID using the golden ID generation method for the entity.

Example 4. Splitting all the master records from a golden record with CUSTOMER_ID 5 into different golden records
-- Store a new SYS_GUID value in a raw (16) variable which is used to create a new exclusion group.
select SYS_GUID() into vSysguidXGrp from dual;

-- Split the master records using this new exclusion group.

INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK,
	B_XGRP
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	SEQ_CUSTOMER.NEXTVAL, -- New golden ID generated by a sequence.
	SEQ_CUSTOMER.CURRVAL, -- The same golden ID is used to confirm.
	vSysguidXGrp            -- New exclusion group ID.
FROM MD_CUSTOMER WHERE
  CUSTOMER_ID = 5;

Reset user decisions

This method gets rid of user decisions previously taken with duplicate managers, forcing the master records to merge according to the matcher’s decisions.

Columns to load into the UM table:

  • B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, or with the ID of the Continuous Load.

  • B_SOURCEID, B_PUBID: Load these columns with the ID and the publisher of the master records to reconsider for the matching.

  • B_CLASSNAME: Load this column with the name of the entity of these records.

  • This operation is not possible in the data hub application user interface.

  • All user decisions are lost. Exclusion groups and match groups are recomputed for these records.

  • All golden IDs are regenerated in the process.

Example 5. Resetting all user decisions
INSERT INTO UM_CUSTOMER (
    B_LOADID,
    B_SOURCEID,
    B_PUBID,
    B_CLASSNAME
)
SELECT
    vLoad_id,
    B_SOURCEID,
    B_PUBID,
    'Customer'
FROM MD_CUSTOMER;

Replay matching

The previous method does not force the match rules to run again. It only removes user decisions. To force the entire matching process to replay, you must reset of the user decisions, as described in the previous method, and reload the data to re-match into the SD table.

Example 6. Replaying the matching process
-- Remove User Decisions

INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer'
FROM MI_CUSTOMER
-- add below a clause to filter the records to re-process.
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

-- Re-load the data from the latest master values.
INSERT into SD_CUSTOMER (
    B_LOADID,
    B_PUBID,
    B_SOURCEID,
    B_CLASSNAME,
	-- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
	...
)
SELECT
	vLoad_id,
	B_PUBID,
	B_SOURCEID
	-- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
	...
FROM MI_CUSTOMER
-- add below a clause to filter the records to re-process.
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

COMMIT;