Welcome to Semarchy xDM.
This guide contains information about publishing and consuming data in an MDM hub generated by Semarchy xDM.

Preface

Audience

This document is intended for integration architects and developers setting up an MDM hub as part of their enterprise integration architecture.

If you want to learn about MDM or discover Semarchy xDM, you can watch our tutorials.
The Semarchy xDM Documentation Library, including the development, administration and installation guides is available online.

Document Conventions

This document uses the following formatting conventions:

Convention Meaning

boldface

Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept.

italic

Italic type indicates special emphasis or placeholder variable that you need to provide.

monospace

Monospace type indicates code example, text or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.

Obtaining Help

There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Overview

Using this guide, you will:

  • Understand the Integration Component and Jobs of Semarchy xDM.

  • Learn how to publish data into an MDM hub through an external load.

  • Learn how to consume data from an MDM hub.

  • Learn how to interact with applications and workflows programmatically.

Introduction to Integration

What is Semarchy xDM?

Semarchy xDM is designed to support any kind of Enterprise Master Data Management initiative. It brings an extreme flexibility for defining and implementing master data models and releasing them to production. The platform can be used as the target deployment point for all master data of your enterprise or in conjunction with existing data hubs to contribute to data transparency and quality with federated governance processes. Its powerful and intuitive environment covers all use cases for setting up a successful master data governance strategy.

Semarchy xDM is based on a coherent set of features for all Master Data Management projects.

Integration Capabilities

Semarchy xDM certifies golden data from source applications’ data and allows applications to consume this golden data.

Integration with Semarchy xDM is performed in several ways:

  • Publishing source data in an MDM hub deployed by Semarchy xDM, and have the hub certify golden records from this data.

  • Consuming golden or master data from the MDM hub.

  • Interacting with Workflows Instances to manage data.

Integration Methods

Semarchy xDM provides several integration services, including:

  • REST API, accessible from the Semarchy xDM application.

  • SQL Interfaces, that is a set of functions stored in the repository schema, plus a set of tables stored in the Data Location Schema.

Using these components, applications and middleware products can consume and publish data in the hub.

Integration Concepts

Publishers and Consumers

In the context of integration, we refer to applications publishing source data into the hub as Publishers.
We also refer to application consuming golden data from the hub as Consumers.

These roles are not exclusive. An application can publish data into the MDM hub and consume certified golden data from the hub to update its records.

For example, a business intelligence application is typically a consumer-only as it consumes only golden records for reporting purposes. An operational application may publish its contact information to the MDM hub for certification and update its contact information with the certified golden records.

Publishers

This term refers to the original application from which the data originates, and not necessarily to the middleware tool or user-designed program actually doing the data movement. The publisher is typically an operational application such as a Customer Relationship Management (CRM), a Product Lifecycle Management (PLM) or an Enterprise Resource Planner (ERP).

A middleware tool may be an Extract-Transform-Load (ETL), Enterprise Service Bus (ESB) or any other data integration product. It may also be a user-designed program or script implemented in SQL, Java, etc. The middleware tool or user-designed program communicates with the publisher to extract data and communicates with the MDM Hub using the publishing methods described in this guide to load this data.

Consumers

Similarly to the publisher, this term refers to the applications consuming data from the hub. This consumption usually takes place via a middleware tool or a user-designed program, and uses the consumption methods described in this guide.

For the rest of this guide, we will refer to the middleware tool or user-defined product used for publishing or consuming data as the middleware.

Data Consumption

Data consumption is available via the REST API or SQL through the tables stored in the Data Location Schema.

Various views on the data can be used for consumption, including:

  • Golden Data enriched, standardized, validated, de-duplicated and certified in the hub.

  • Master Data pushed in the hub by source systems.

  • Errors raised by records pushed in the hub when violating the data quality rules defined in the model.

Data Consumption is always done on a specific Data Location.

Data Publishing

Publishing Concepts

Publishing source data for certification into golden data is performed as a transaction. Such publisher transaction is called an External Load. It is a Semarchy xDM transaction identified by a sequential Load ID.

An External Load represents a source data load transaction.

When an External Load is submitted with an Integration Job Name, a Batch - identified by a Batch ID - is created, and the Job starts processing the data published in this load.

A Batch represents a transaction certifying loaded data and writing in the hub the resulting golden data.

Both Loads and Batches can be reviewed from the Data Locations perspective in the Semarchy Workbench.

External Load Lifecycle

An external load lifecycle is described below:

  1. Initialize the External Load

    • The middleware uses the SQL Interface or the REST API to initialize an external load.

    • It receives from the platform a Load ID identifying the external load.

    • At that stage, an external load transaction is open with the platform.

  2. Load Data

    • The middleware inserts data into the landing tables in the data location schema. This done using the SQL Interface or the REST API.

    • When loading data, the middleware provides both the Load ID and a Publisher Code corresponding to the publisher application.

  3. Submit the External Load

    • The middleware uses the SQL Interface or the REST API to submit the external load.

    • It provides the Load ID as well as the name of the Integration Job to trigger with this submission.

    • The platform creates a Batch to process the data published in this external load.

    • It receives from the platform a Batch ID identifying the batch that is processed by the platform for this external load.

    • At that stage, the external load transaction is closed.

The middleware can also Cancel the External Load to abort the external load instead of submitting it.

Data locations may be moved to a Maintenance status by their administrator. When a data location is in that state, it is not possible to initialize external loads.

Continuous Loads

Continuous loads enable integration developers to push data into the MDM hub in a continuous way without having to take care of Load Initialization or Load Submission.

With continuous loads:

  • Integration developers do not need to initialize and submit individual external loads. They directly load data into the hub using the Load ID of the continuous load.

  • At regular intervals, Semarchy xDM automatically creates then submits an external load with the data loaded in the continuous load. This external load is submitted with a program name, a job, and a submitter name.

  • The continuous load remains, with the same Load ID. Subsequent data loads made with this Load ID are processed at the next interval.

Continuous loads are configured and managed by the administrator in a data location. Unlike external loads, they cannot be created, submitted or canceled via integration methods.

The SQL Interface or the REST API can use continuous loads to push data into the hub without having to perform the load ID management operation.

When using a continuous load, make sure that it is configured with a job that processes the entities that are loaded.

Batch Lifecycle

When an external load is submitted, the following operations take place:

  1. The platform creates a batch and returns to the submitter the Batch ID

  2. The integration batch poller picks up the batch on its schedule:

    1. It creates a Job instance using the Job Definition which name is provided in the submit action.

    2. It moves the job into the Queue specified in the job definition

  3. The Execution engine processes this job in the queue.

  4. When the job completes, the batch is considered finished.

Even when multiple loads take place simultaneously, the sequence into which the external loads are submitted defines the order into which the data is processed by the integration jobs and golden data certified from this source data.

Interfaces for Integration

This section describes the interfaces available for integration.

REST API

This interface is available for application to consume or publish data from the hub in a programmatic way.

SQL Interface

This interface is available for data integration and ETL products to publish or consume data. It is composed of the Integration Load Functions and the Data Location Database Schema.

The Integration Load Functions are stored in the repository schema, contains function to manage the External Load Lifecycle, for the purpose of publishing data in batch mode.

The Data Location Database Schema stores a set of tables that contain the hub data.
This schema contains the landing (or staging) tables used to publish data into the hub. It contains also the golden records tables and the intermediate tables handled by the integration job that create golden records from the source records. There is a single data structure for the entire hub, and a single set of tables for each entity regardless of the model edition.

The data location schema is accessed for integration purposes to:

  • Publish data in batch mode in the landing tables.

  • Consume data from the golden data and master data tables.

The structure of the tables stored in the data location schema is detailed in the Table Structures section.

Data Certification

This section explains the artifacts involved when certify data published into the hub.

Integration Job

The integration job processes the data submitted in an external load and runs this data through the Certification Process, which is a series of steps to create and certify golden data out of this source data.

This job is generated from the certification rules defined at design time, and it uses the data structures automatically created in the MDM hub when deploying the model edition.

Although understanding the details of the process is not needed for publishing source data or consuming golden data, it is necessary to have complete understanding of this process to drill down into the various structures between the source and the golden data. For example, to review the rejects or the duplicates detected by the integration job for a given golden record.

An integration job is a sequence of tasks used to certify golden data for a group of entities. The model edition deployed in the data location brings several integration jobs definitions with it. Each of these job definitions is designed to certify data for a group of entities.

Integration jobs definitions as well as integration job logs are stored in the repository

For example, a multi-domain hub contains entities for the PARTY domain and for the PRODUCTS domain, and has two integration jobs definition:

  • INTEGRATE_CUSTOMERS certifies data for the Party, Location, etc… entities.

  • INTEGRATE_PRODUCTS certifies data for the Brand, Product, Part, etc… entities.

Integration jobs are started when source data has been loaded in the landing tables and is submitted for golden data certification.

Each integration job is the implementation of the overall certification process template. It may contain all or some of the steps of this process. The following section details the structure of the certification process.

Certification Process

Introduction to the Certification Process

The Certification Process creates consolidated and certified Golden Records from various sources:

  • Source Records, pushed into the hub by middleware systems on behalf of upstream applications (known as the Publishers).
    Depending on the type of the entity, these records are either converted to golden records directly (basic entities) or matched and consolidated into golden records (ID and fuzzy matched entities). When matched and consolidated, these records are referred to as Master Records the golden record they have contributed to create are referred to as Master Based golden records.

  • Source Authoring Record, authored by users in the MDM applications.
    When a user authors data in an MDM application, depending on the entity type and the application design, he performs one of the following operations:

    • He creates new golden records or updates existing golden records that exist only for the hub, and do not exist in any of the publishers. These records are referred to as Data Entry Based golden records This pattern is allowed for all entities, but basic entities support only this pattern.

    • He creates or updates master records on behalf of publishers, submitting these records to matching and consolidation. This pattern is allowed only for ID and fuzzy matched entities.

    • He overrides golden values resulting from the consolidation of records pushed by publishers. This pattern is allowed only for ID and fuzzy matched entities.

  • Delete Operations, made by users on golden records from entities with delete enabled.

  • Matching Decisions, taken by data stewards for fuzzy matched entities, using duplicates managers. Such decisions include confirming, merging or splitting groups of matching records as well as accepting/rejecting suggestions.

The certification process takes these various sources, applies the rules and constraints defined in the model in order to create, update or delete the golden data that business users browse using the MDM applications and that downstream application consume from the hub.

This process is automated and involves several phases, automatically generated from the rules and constraints, which are defined in the model based on the functional knowledge of the entities and the publishers involved.

The following sections describe the details of the certification process for ID, fuzzy matched and basic entities, and the delete process for all entities.

Certification Process for ID and Fuzzy Matched Entities

The following figure describes the certification process and the various Table Structures involved in this process.

Certification Process for Matched Entities

The certification process involves the following steps:

  1. Enrich and Standardize Source Data: Source Authoring Records (in the SA tables) created or updated on behalf of publishers and Source Records (in the SD tables) are enriched and standardized using the SemQL and Plug-in Enrichers executed Pre-Consolidation.

  2. Validate Source Data: The enriched and standardized records are checked against the various Constraints executed Pre-Consolidation. Erroneous records are ignored for the rest of the processing and the errors are logged into the SE - source errors - and AE - authoring errors - tables.

    Note that source authoring records are enriched and validated only for basic entities. For ID and fuzzy matched, source authoring records are not enriched and validated.

  3. Match and Find Duplicates: For fuzzy matched entities, this step matches pairs of records using a Matcher and creates groups of matching records (match groups). For ID matched entities, matching is simply made on the ID value.
    The matcher works as follows:

    • It runs a set of Match Rules. Each rule has two phases: first, a binning phase creates small bins of records. Then a matching phase compares each pair of records within these small bins to detects duplicates.

    • Each match rule has a Match Score that expresses how strongly the pair of records matches. A pair of records that match according to one or more rules is given the highest Match Score of all these rules. Match pairs with scores and rules are stored in the DU table.

    • When a match group is created, an overall Confidence Score is computed for that group. According to this score, the group is marked as a suggestion or immediately merged, and possibly confirmed. These automated actions are configured in the Merge Policy and Auto-Confirm Policy of the matcher.

    • Matching Decisions taken by users on match groups (stored in the UM table) are applied at that point, superseding the matcher’s choices.

  4. Consolidate Data: This step consolidates match group duplicates into single consolidated records. The Consolidation Rules created in the Survivorship Rules defines how the attributes consolidate. Integration master records and integration golden (consolidated) records are stored at that stage in the GI and MI tables

  5. Enrich Consolidated Data: The SemQL and Plug-in Enrichers executed Post-Consolidation run to standardize or add data to the consolidated records.

  6. Publish Certified Golden Data: This step finally publishes the Golden Records for consumption. The final master and golden records are stored in the GD and MD tables

    • This step applies possible overrides from Source Authoring Record (Involving the SA, SF, GA and GF tables) , according to the Override Rules defined in the Survivorship Rules.

    • This step also creates or updates Data Entry Based golden records (that exist only in the MDM), from Source Authoring Records.

  7. Validate Golden Data: The quality of the golden records is checked against the various Constraints executed on golden records (Post-Consolidation). Note that unlike the pre-consolidation validation, it does not remove erroneous golden records from the flow but flags them as erroneous. The errors are also logged (in the GE tables) .

  8. Historize Data: Golden and master data changes are added to their history (stored in the GH and MH tables) if historization is enabled.

Source Authoring Records are not enriched or validated for ID and fuzzy matched entities as part of the certification process. These records should be enriched and validated as part of the steppers into which users author the data.

Certification Process for Basic Entities

The following figure describes the certification process and the various Table Structures involved in this process.

Certification Process for Matched Entities

The certification process involves the following steps:

  1. Enrich and Standardize Source Data: During this step, the Source Records and Source Authoring Records (both stored in the SA tables) are enriched and standardized using SemQL and Plug-in Enrichers executed Pre-Consolidation.

  2. Validate Source Data: The quality of the enriched source data is checked against the various Constraints executed Pre-Consolidation. Erroneous records are ignored for the rest of the processing and the errors are logged (in the AE tables) .

  3. Publish Certified Golden Data: This step finally publishes the Golden Records for consumption (in the GD tables) .

  4. Historize Data: Golden data changes are added to their history (stored in the GH table) if historization is enabled.

Note that:

  • Basic entities do not separate Source Records from Source Authoring Records. Both follow the same process.

  • Source data for basic entities does not pass through enrichers or validations executed post-consolidation.

Deletion Process

A Delete Operation (for basic, ID or fuzzy matched entities) involves the following steps:

  1. Propagate through Cascade: Extends the deletion to the child records directly or indirectly related to the deleted ones with a Cascade configuration for Delete Propagation.

  2. Propagate through Nullify: Nullifies child records related to the deleted ones with a Nullify configuration for the Delete Propagation.

  3. Compute Restrictions: Removes from deletion the records having related child records and a Restrict configuration for the Delete Propagation. If restrictions are found, the entire delete is canceled as a whole.

  4. Propagate Delete to owned Master Records to propagate deletion to the master records attached to deleted golden records. This step only applies to ID and fuzzy matched entities.

  5. Publish Deletion: Tracks record deletion (in the GX and MX tables), with the record values for soft deletes only, and then removes the records from the golden and master data (in the GD and MD tables). When doing a hard delete, this step deletes any trace of the records in every table (SA, SD, UM, MH, GH, etc.). The only trace of a hard delete is the ID (without data) of the deleted master and golden records, in the GX and MX tables. Deletes are tracked in the history for golden and master records (in the MH and GH tables), if historization is configured.

Is is not necessary to configure in the job all the entities deletion should cascade to. The job generation automatically detects the entities that must be included for deletion based on the entities managed by the job.

For more information about these various phases and how the certification process is designed, refer to the Certification Process Design chapter in the Semarchy xDM Developer’s Guide.

Table Structures

This section describes the general structure of the tables involved in the certification process.

Tables

For each entity, a pre-defined set of tables is created. These tables are named with the following convention: <Prefix>_<Physical Table Name>

The prefix is automatically generated depending on the table nature (Source Data, Source Error, etc.), and the Physical Table Name is set in the entity definition.

The following list describes the tables created for a given entity.

Table Name Applicable Entity Type Access Description

SD_<Physical Table Name>

Source Data

Fuzzy and ID Matched

Read/Write

This table contains source data loaded by publishers, using the middleware.

SA_<Physical Table Name>

Source Authoring

All

Read (Write for Basic Entities)

For matching entities, this table contains both the records created in the hub by the user as well as possible overrides performed on consolidated records. In that case, it is associated to a SF_<Physical Table Name>, storing flags that indicate which field was overriden.
For basic entities, it contains the records created by the user, and can be used to load data into the hub instead of the SD table. This table also used to trace deletion operations.

SE_<Physical Table Name>
AE_<Physical Table Name>

Source Errors
Source Authoring Error

All

Read

This table contains the errors detected during the pre-consolidation validation phase. It is named AE_<Physical Table Name> for basic entities.

DU_<Physical Table Name>

Duplicates

Fuzzy Matched

Read

This table stores references to pairs of records detected as duplicates in the matching phase, with the match rule and match score.

UM_<Physical Table Name>

Matching Groups

Fuzzy Matched

Read

Table containing the master records being re-grouped by a user during a duplicates management operation.

MI_<Physical Table Name>

Master Integration

Fuzzy and ID Matched

Internal

This table contains the records that have passed the pre-consolidation validation phase and that have been grouped by the matching phase.

MD_<Physical Table Name>

Master Data

Fuzzy and ID Matched

Read

This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data.

MX_<Physical Table Name>

Deleted Master

Fuzzy and ID Matched

Read

Table containing deleted master records logs, plus the deleted data for soft deletes.

MH_<Physical Table Name>

Master History

Fuzzy and ID Matched

Read

Master records history.

GI_<Physical Table Name>

Golden Integration

Fuzzy and ID Matched

Internal

This table contains the records that have been consolidated but not yet passed through the post-consolidation validation phase.

GA_<Physical Table Name>

Golden Authoring

Fuzzy and ID Matched

Internal

This table contains data overrides performed on golden records . It is associated to a GF_<Physical Table Name>, storing flags that indicate which field was overriden, and a GP_<Physical Table Name> storing values consolidated before an override.

GX_<Physical Table Name>

Deleted Golden

All

Read

Table containing deleted golden records logs, plus the deleted data for soft deletes.

GH_<Physical Table Name>

Golden History

Fuzzy and ID Matched

Read

Golden records history.

GD_<Physical Table Name>

Golden Data

All

Read

This table contains the golden records, including those flagged as erroneous by post-consolidation validations.

GE_<Physical Table Name>

Golden Errors

Fuzzy and ID Matched

Read

This table contains the errors detected during the post-consolidation validation phase on golden records.

For publishing and consuming data, internal tables are not needed. The structure of these tables is not explained in this document.

Columns

About Date Values and Time Zones
Date values are stored, read and written in the time zone of the application server. When consuming dates and timestamp values, time zone conversion may be necessary if the client time zone differs from the application server time zone. Similarly, the certification process must convert dates and timestamp values in the application server’s time zone.
Attribute Columns

Attributes appear in the tables’ structure as follows:

  • Attributes using list of values, built-in and user-defined types are mapped to single columns with a database type and length corresponding to the attribute type. The column name for such a simple attribute is the Physical Column Name value specified in the simple attribute definition.

  • Complex attributes are mapped on columns named with the following convention: <Complex Attribute Physical Prefix><Definition Attribute Physical Column Name>

For example: The Country (COUNTRY) and City (CITY) definition attributes of the Geocoded Address complex attribute (Prefix: GEO) will be mapped to columns named GEOCOUNTRY and GEOCITY.

Built-in Columns

In addition to the attribute columns, built-in columns are added to the tables’ structure, and are used in the certification process.

They track for example:

  • Batch information: B_BATCHID

  • Cause of a rejected record: B_CONSTRAINTNAME, B_CONSTRAINTTYPE

  • Class information: B_CLASSNAME

  • Match/Merge information: B_MATCHGRP, B_CONFSCORE, B_HASSUGGMERGE, B_SUGGMERGEID, B_SUGGMERGECONFSCORE, B_SUGGMERGEMASTERSCOUNT, B_CONFIRMATIONSTATUS, B_MASTERSCOUNT, B_ISCONFIRMED

  • Delete operation information: B_DELETETYPE, B_DELETEDATE, B_DELETEAUTHOR, B_DELETEOPERATION,

The following list describes these columns.

Column Name Exists in Tables Datatype Description

B_AUTHORINGTYPE

GA, SA

VARCHAR2(30 CHAR)

For matched entities, type of authoring operation:

  • OVERRIDE for override

  • DATA_ENTRY for data creation

B_BATCHID

AE, GA, GD, GE, GF, GI, GP, GX, MD, MI, MX, SE

NUMBER(38, 0)

ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected.

B_CHECKOUTINCLUSION

UM

VARCHAR2(30 CHAR)

Cause that made the record part of the duplicate management transaction. Possible causes are:

  • User (USER): the record was checked out by the user.

  • Same Golden (SAME_GOLDEN): the record is part of the same matching group than another master that was checked out.

  • Same Suggestion (SAME_SUGG): the record is part of the same suggestion than another master that was checked out.

  • Same Exclusion Group (SAME_XGRP): the record is part of the same exclusion group than another master that was checked out.

  • Other (OTHER): Another reason.

B_CLASSNAME

AE, GA, GD, GE, GF, GH, GI, GP, GX, MD, MH, MI, MX, SA, SD, SE, SF, UM

VARCHAR2(128 CHAR)

Unique name of class / entity to which this record belongs. See Class Name for more information.

B_CONFIRMATIONSTATUS

GD, GH, GI, GX, MD, MH, MI, MX, UM

VARCHAR2(30 CHAR)

Confirmation status for duplicate management:

  • Confirmed (CONFIRMED): Indicates that a master is confirmed in a golden or that a golden has all its masters confirmed.

  • Not-confirmed (NOT_CONFIRMED): Indicates that a master is not confirmed or that a golden is entirely made of unconfirmed masters.

  • Historically confirmed (WAS_CONFIRMED, for master records): If a master was confirmed into to a golden but this golden was fused into another golden.

  • Partially confirmed (PARTIALLY_CONFIRMED for golden records only): Indicates that a golden has part of his masters confirmed.

B_CONFIRMEDSDPK

MD, MI, UM

Varies

ID of the confirmed golden record this master record is attached to.

B_CONFSCORE

GD, GH, GI, GX, UM

NUMBER(38, 0)

Confidence Score of the golden record. It is the average of the match scores in the match group.

B_CONSTRAINTNAME

AE, GE, SE

VARCHAR2(128 CHAR)

For error records, name of the constraint causing this error.

B_CONSTRAINTTYPE

AE, GE, SE

VARCHAR2(30 CHAR)

For error records, type of the constraint causing this error.

B_COPIEDFROM

SA

NUMBER(38, 0)

ID of the original record copied into this record.

B_CREATOR

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

VARCHAR2(128 CHAR)

For golden data (GD, GI, GX, GH), the submitter of the batch into which for record was created. For source and master data, the user who has created the record in a workflow, stepper, duplicate manager or a user name loaded by the data integration process.
This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.

B_CREDATE

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

TIMESTAMP(6)

For golden and master data, the submit date (timestamp) of the batch into which the record was created. For source data (SD, SA, UM), the date (timestamp) at which the source record was created.
Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit date (timestamp) of the batch.

B_DELETEAUTHOR

GX, MX, SA, SD

VARCHAR2(128 CHAR)

Author of the delete operation.

B_DELETEDATE

GX, MX, SA, SD

TIMESTAMP(6)

Deletion date of a record

B_DELETEOPERATION

GX, MX, SA, SD

VARCHAR2(128 CHAR)

Delete operation ID.

B_DELETEPATH

GX, MX, SA, SD

VARCHAR2(4000 CHAR)

Cascade path through which the record was reached during a delete. Null for record directly selected for deletion.

B_DELETETYPE

GX, MX, SA, SD

VARCHAR2(30 CHAR)

Delete Type (SOFT_DELETE or HARD_DELETE for golden and master deletion). The type of delete can be LEGLESS_DELETE for golden records deleted when they loose all their master records.

B_ERROR_STATUS

GD, GH, GX, SA, SD

VARCHAR2(30 CHAR)

Error Status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are:

  • VALID if the record has no error.

  • ERROR if the record has errors

  • RECYCLED if the record was recycled and considered valid

  • OBSOLETE_ERROR if the record had errors but a newer version of the record fixes them.

  • a <NULL> value also indicates a record with no error.  

B_FROMBATCHID

GH, MH

NUMBER(38, 0)

Batch at which the history record was created.

B_GOLDENTYPE

GD, GH, GX

VARCHAR2(30 CHAR)

For fuzzy matching and ID matching entities, indicates whether the golden record was created and authored only in the MDM (DE_BASED) or was consolidated from publishers'' data and possibly overriden (MASTER_BASED)

B_HASOVERRIDE

GD, GH, GX

CHAR(1 CHAR)

For fuzzy matching and ID matching entities, this flag (0 or 1) indicates whether the golden record has override values.

B_HASSUGGMERGE

GD, GH, GI, GX, MD, MH, MI, MX, UM

CHAR(1 CHAR)

Flag (0 or 1) indicating that match and merge suggestions are available for this record.

B_ISCONFIRMED

GD, GH, GI, GX

CHAR(1 CHAR)

Flag (0 or 1) indicated whether this golden record has been confirmed (Fuzzy Matched entities only).

B_LOADID

AE, SA, SD, SE, SF, UM

NUMBER(38, 0)

Load Identifier used as the unique transaction ID for external application pushing data to the platform

B_MASTERSCOUNT

GD, GH, GI, GX, UM

NUMBER(38, 0)

Number of master records contributing to the golden record.

B_MATCHGRP

MD, MH, MI, MX

NUMBER(38, 0)

ID of the match group for the master record. This column is set when matching takes place.

B_MATCHRULE

DU

VARCHAR2(128 CHAR)

Match rule having cause the pair of records to match.

B_MATCHSCORE

DU

NUMBER(38, 0)

Score of the matched pair.

B_OLDMATCHGRP

MI

NUMBER(38, 0)

Previous identifier of the match group for the master record.

B_OLDSDPK

MD, MH, MI, MX

Varies

ID of the previous golden record the master record was attached to.

B_ORIGINALBATCHID

SA, SD, UM

NUMBER(38, 0)

Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager.

B_ORIGINALCONFIRMATIONSTATUS

UM

VARCHAR2(30 CHAR)

Original Confirmation Status in a duplicate management operation.

B_ORIGINALCONFIRMEDSDPK

UM

Varies

ID of the original confirmed golden record to which the master record was attached to.

B_ORIGINALCONFSCORE

UM

NUMBER(38, 0)

Confidence Score of the original golden in a duplicate management operation.

B_ORIGINALMASTERSCOUNT

UM

NUMBER(38, 0)

Number of master records in the original golden in a duplicate management operation.

B_ORIGINALSDPK

UM

Varies

ID of the original golden record to which the master record was attached to.

B_ORIGINALXGRP

UM

RAW(16)

Original exclusion group to which the master record belonged to.

B_PUBID

DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM

VARCHAR2(30 CHAR)

For matching entities, code of the publisher that published the record.

B_PUBID2

DU

VARCHAR2(30 CHAR)

For duplicate pairs, code of the publisher that published the second record.

B_SOURCEID

DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM

VARCHAR2(128 CHAR)

ID of the source record in the source publisher system (Fuzzy Matched entities only). See below for a detailed explanation of the primary key options.

B_SOURCEID2

DU

VARCHAR2(128 CHAR)

For duplicate pairs, ID of the second source record in the source publisher system (Fuzzy Matched entities only).

B_SUGGMERGECONFSCORE

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

Confidence Score for the suggested match group.

B_SUGGMERGEID

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

ID of the merge suggested by the automated matching.

B_SUGGMERGEMASTERSCOUNT

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

Number of master records in the suggested merge.

B_TOBATCHID

GH, MH

NUMBER(38, 0)

Batch at which history record stopped being current or null if the records is still current.

B_UPDATOR

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

VARCHAR2(128 CHAR)

For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was updated. For source and master data, the latest user who has updated the record in a workflow, stepper, duplicate manager or the user name loaded by the data integration process.
This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.

B_UPDDATE

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

TIMESTAMP(6)

For golden and master data, the submit date (timestamp) of the batch into which the record was updated. For source data (SD, SA, UM), the date (timestamp) at which the source record was created or updated.
Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit date (timstamp) of the batch.

B_XGRP

MD, MH, MI, MX, UM

RAW(16)

Exclusion group ID. An exclusion group represents a group of records for which a user has taken split decisions.

Primary Keys Columns

The primary key to load depends on the Entity Type:

Basic

For basic entities, Semarchy xDM uses a single identifier, stored in a column named after the Physical Column Name specified in the primary key attribute definition. This column will exist in all tables. This identifier is simply propagated into the hub from the source records.

ID Matched

When using ID Matching, Semarchy xDM assumes a common identifier across all systems. In this case, this common identifier is stored in a column named after the Physical Column Name specified in the primary key attribute definition.

This column will exist in all tables. When publishing data into the hub, the middleware loads this column with the primary key from the publishing system. This identifier is simply propagated into the hub, and matching is done using this primary key.

Fuzzy Matched

When using Fuzzy Matching, Semarchy xDM assumes no common identifier across publishers. There may be two different records with the same ID in different systems. There is a need to match the records and consolidate them under a golden record having a primary key generated by the system.

In this case, the source identifier is stored in the column named B_SOURCEID, along with the publisher code stored in the B_PUBID. This is the case for the SD, SE and MD tables.

When publishing data into the hub, the middleware loads this B_SOURCEID column with a primary key value from the publishing system. If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into B_SOURCEID.

When the records are consolidated in a golden record (GD and GE tables), a System Defined Primary Key is generated and stored in a column named after the Physical Column Name specified in the primary key attribute definition. This key is referred to as the Golden Record ID.

The MD table makes the bridge between the Source ID and the Golden ID as it contains both these values.

Reference Columns

When a reference exists in the source publisher and need to be expressed in the landing table, this reference mapping in the table structure depends on the Entity Type of the referenced entity.

Reference to a Basic Entity

For a reference to a Basic entity, the referenced key stored is simply the primary key of the basic entity. As a consequence, the referenced value is stored in a single column. This column is named after the Physical Name provided in the reference definition and is prefixed with F_. For example, F_COUNTRIES.

Reference to an ID Matched Entity

For a reference to an ID Matched entity, the referenced key is the same for all systems. As a consequence, the referenced value is stored in a single column. This column is named after the Physical Name provided in the reference definition and is prefixed with F_. For example, F_EMPLOYEE.

For example: if Customer references Employee and this entity uses ID Matching, SD_CUSTOMER will contain the following information.

CUSTOMER_NAME F_EMPLOYEE

Gadgetron

11

Roxxon

56

This information means that the customer Gadgetron references the employee number 11, and Roxxon references employee number 56. This employee number is the same in all systems.

Reference to a Fuzzy Matched Entity

For a reference to a Fuzzy Matched entity, the referenced may point to a master record from a publisher, or directly to a golden record in the hub:

  • For a reference to a master record:

    • The referenced value is stored in a column named after the Physical Name provided in the definition of the reference and prefixed with FS_ (for Foreign Source ID). For example, FS_CUSTOMER.

    • The referenced publisher is also stored, in a column named after the Physical Name provided in the definition of the reference and prefixed with FP_ (for Foreign Publisher). For example, FP_CUSTOMER. For a reference to a golden record:

  • The reference value is stored in a column named after the Physical Name provided in the definition of the reference and prefixed with F_ (for Foreign ID).

Example 1. Reference to a master record

Contact references Customer and this entity uses Fuzzy Matching, SD_CONTACT contains the following information.

FIRST_NAME LAST_NAME FP_CUSTOMER FS_CUSTOMER

John

Doe

CRM

1235

Jane

Smith

MKT

A3251

This information means that the contact John Doe references the customer with the primary key 1235 in the CRM publisher, and that Jane Smith references the customer with the primary key A3251 in the MKT publisher.

Example 2. Reference to a golden record

Contact references Customer and this entity uses Fuzzy Matching, SD_CONTACT contains the following information.

FIRST_NAME LAST_NAME F_CUSTOMER

John

Doe

6598

Jane

Smith

6556

This information means that the contact John Doe references the golden customer record with the golden ID 6598, and that Jane Smith references the golden customer with the golden ID 6556.

If both the reference to the master and to the golden are loaded, only the reference to the golden is taken into account.
To clear a reference, you must push a null value in the three FS_, FP_ and F_ columns.

Class Name

Several entities involved in an inheritance relation have their data stored in the same set of tables. These tables store the superset of the attributes of the parent and all its child entities. The B_CLASSNAME column is used to identify the class (entity) of a record in a table.

For example, when Person and Company inherit from the Party entity, the resulting table is named after the parent entity (Party), and will contain all the attributes of Person and Company as well. In the GD_PARTY table for example, records representing persons will have B_CLASSNAME='Person'. When publishing person or company information in the SD_PARTY table, the middleware must set B_CLASSNAME='Person' or B_CLASSNAME='Company' accordingly.

Constraints

The only constraints phyisically enforced in the tables are the primary keys and the not null columns. These constraints apply to system columns only.

For example, on an SD_ table, the following constraints are enforced:

  • primary key on B_LOADID, B_PUBID, B_SOURCEID (or the primary key column for ID Matched entities)

  • B_LOADID, B_CLASSNAME, B_PUBID and B_SOURCEID (or the primary key column for ID Matched entities) are not null

Other constraints defined in the model (mandatory attributes, references, etc.) are not enforced in the physical model but checked during the validation phase.

Publishing Data Using SQL

Publishing data is done using function calls to initialize/submit the external load and DML statements to load the data into the landing (SD for matching entities, SA for basic entities) tables.

Overview

In this approach, external loads are handled using the and SQL interface. It works as follows:

  1. The external load is initialized using a function call to the GET_NEW_LOADID function on the repository schema. This function call returns a Load ID.

  2. SD (SA for Basic Entities) tables are loaded using SQL inserts issued on the data location schema.

  3. The external load is submitted or cancelled using function calls on the repository schema:

    • SUBMIT_LOAD to submit a load identified by its Load ID. This function call returns a Batch ID.

    • CANCEL_LOAD to cancel a load identified by its Load ID.

This approach is recommended when the middleware is an ETL or data integration product.

Alternately, you can use Continuous Loads to load data into the SD tables with the Load ID of the Continuous Load, without having to explicitly initialize and submit individual external loads.

Initializing a Load

Initializing an external load uses the following parameters:

  • data_location_name: name of the data location. This data location is the one hosting the MDM hub into which the external load is performed.

  • program_name: This variable is for information only. It is used to identify the middleware performing the external load. For example ’ETL Custom Script’.

  • load_description: This variable is for information only. It is used to describe the nature of this load. For example: ’Daily updates for Customers and Contacts’.

  • user_name: name of the user initializing the external load. This user may or may not be a user defined in the security realm of the application server.

This function call to GET_NEW_LOADID initializes an external load and returns a Load ID. It is performed on the repository database schema.

In Oracle, the integration load functions are part of a INTEGRATION_LOAD package. You must prefix the function name with the name of this package.
Initializing and External Load (Oracle)
vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
        '<data_location_name>'  /* Data Location Name (As in the UI) */
        '<program_name>'        /* Informational. Identifies the Middleware*/
        '<load_description>'    /* Informational. Describes the load. */
        '<user_name>'           /* User initializing the load. */
          );
Initializing and External Load (PostgreSQL)
vLoad_id := <repository_schema>.get_new_loadid(
        '<data_location_name>'  /* Data Location Name (As in the UI) */
        '<program_name>'        /* Informational. Identifies the Middleware*/
        '<load_description>'    /* Informational. Describes the load. */
        '<user_name>'           /* User initializing the load. */
          );

The following example performs a function call initializing an external load on the CustomerMDMHub data location. The repository schema is REPO. The returned Load ID is stored in the vLoad_id variable.

Example: Initializing an external load (Oracle).
vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
        'CustomerMDMHub',
        'Custom ETL',
        'Initial Load for the Hub',
        'John Doe' );

Loading Data

Loading data consists in inserting new source records in the SD (and SA for Basic Entities) tables for the set of entities taken into account in the integration job.

When loading data in the SD or SA tables:

  • You use SQL insert commands or your regular ETL/Data Integration Platform.

  • The insert commands are issued on the SD or SA tables stored in the data location schema.

Tables to Load

Make sure to load all the SD and SA tables for the entities that will be taken into account in the integration job. For example, if the integration job processes the Customer and Contact entities, then you should load the SD_CUSTOMER and SD_CONTACT tables.
If you insert data into the SD table of an entity that is not taken into account by the integration job, this data will be ignored. For example, if the integration job processes the Customer and Contact entities, data loaded the SD_EMPLOYEE table will not be taken into account by the integration job to certify golden records for the Employee entity.

Referential Integrity and Load Order

There is no required order to load the SD and SA tables, as no foreign keys are implemented on these tables.
Reference validation is performed by the integration job, as a consequence references between entities must be loaded as indicated in the Columns to Load section.

Using Transactions

It is recommended to use a database transaction when writing to the SD and SA tables.

  • For external loads: Canceling an external load cancels the load but does not delete records from the tables. Writing in an auto-commit transaction in the tables then canceling the load leaves useless information in the tables.
    Using a transaction gives you the capability to rollback all changes on the tables when canceling a load.

  • For continuous loads, it is strongly recommended to use a transaction and commit only when all tables are loaded: as the continuous load consumes data committed in the table on its own schedule, loading tables without a transaction may cause the continuous load to start processing a data batch before it is completed.

Columns to Load

This section provides some guidance for loading the columns of the SD and SA tables.

System Columns (Mandatory)

The following system columns must be loaded as indicated:

  • B_LOADID: This column must be loaded with the Load ID provided by GET_NEW_LOADID function call, of with the ID of the Continuous Load.

  • B_CLASSNAME: Name of the entity (or class) being loaded. When inheritance is used, the same table stores data for all parent and child classes. Set explicitly B_CLASSNAME to the name of the entity for which data is being published. For example: Person, Party, Company. See the Class Name section for more details.

  • B_PUBID: This column must be loaded with a Publisher Code for ID Matched and Fuzzy Matched entities. For example: CRM, MKT, etc. This publisher code identifies the publisher (application that publishes the data) and should be declared in the model edition. The list of publisher codes is available by double-clicking the Publishers node in the Model Edition view in the Model Design perspective. If the publisher code is unknown to the model, data from this publisher is processed, but this publisher will have the lowest ranking in a Preferred Publisher consolidation strategy.

Basic entities assume a single source, and do not use the B_PUBID column.
Publisher codes are case sensitive. Make sure to load B_PUBID with the publisher code as defined in the model. Publisher codes may contain uppercase letters, digits and underscores.
Within a single load, you can load the records from various publishers, using the B_PUBID column to identify each publisher
Primary Keys (Mandatory)

For matching entities, the primary key that you load into the SD table allows identifying the source record from the publisher (identified by Publisher Code B_PUDIB). This primary key will allow taking into account the creation or update of a source record and report through the consolidation process the corresponding data changes to the golden record.

For basic entities, the primary key that you provide in the SA table is the one of the golden record and will allow updating this record.

The primary key column to load depends on the Entity Type of the entity.

Basic Entity

If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

ID Matched Entity

If the entity uses ID Matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

Fuzzy Matched Entity

If the entity uses Fuzzy Matching, then you must load into the B_SOURCEID column the value of the primary key from the source system. If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them in the B_SOURCEID column.

The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.
References (Mandatory)

When loading data for entities that are related by a reference relationship, you must load the referencing entity with the value of the referenced primary key. The columns to load differ depending on the Entity Type of the referenced entity.

Reference to a Basic Entity

If the referenced entity is an Basic entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

Reference to an ID Matched Entity

If the referenced entity is an ID Matched entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

For example, if Customer references Employee and this entity uses ID Matching, then you must load into SD_CUSTOMER the F_EMPLOYEE column with the primary key of the source employee record referenced by each customer record.

Reference to a Fuzzy Matched Entity

If the referenced entity is a Fuzzy Matched entity, then you need to load two columns:

  • FS_<Physical Name of the Referenced To Role Name>: Load this column with the Source ID of the referenced record.

  • FP_<Physical Name of the Referenced To Role Name>: Code of the publisher of the referenced record.

Note that these columns should be considered together. You should not load the FP_ column with a publisher code and leave FS_ to a null value, and vice versa.

For example, if Contact references Customer and this entity use Fuzzy Matching, you must load the following columns into the SD_CONTACT table:

  • FP_CUSTOMER: Code of the publisher providing the customer referenced by the given contact, e.g., MKT.

  • FS_CUSTOMER: Source ID of the customer referenced by the given contact, e.g., 81239.

Attribute Columns

You should load the attribute columns relevant for the entity you are loading.

Make sure to load:

  • The attribute columns that make sense for the entity class (B_CLASSNAME) you are loading.

  • The mandatory attribute columns. If not, pre-consolidation validation may reject source records with null values.

  • The columns for attributes using a list of values type. If these are loaded with values out of the LOV range of values, pre-consolidation validation may reject source records.

Attributes may be loaded with null or incorrect values if the values are set or modified by the enrichers. Enrichers are executed before any validation.
Other Columns (Optional)

The following columns do not need to be loaded or can be optionally loaded:

  • B_ORIGINALBATCHID: This column is not used for external loads and should not be loaded.

  • B_CREATOR, B_UPDATOR: These columns can be optionally loaded to store the users who have created or updated the records in the source systems. If left null, these columns are automatically set to the name of the user who submits the batch.

  • B_CREDATE, B_UPDDATE: These columns can be optionally loaded to store the creation and update date (timestamp) of the records in the source systems. If left null, these columns are set to the submit date (timestamp) of the batch. Note that values loaded here are not propagated beyond the SD table.

Submitting a Load

Submitting an external load uses the following parameters:

  • load_id: Load ID returned by the load initialization.

  • integration_job: Name of the integration job to process this load.

  • user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The SUBMIT_LOAD function call submits an external load identified by its Load ID and returns a Batch ID. It is performed on the repository database schema.

Submitting an External Load (Oracle)
vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
        <load_id>                /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        '<integration_job>'        /* Name of the Integration Job to trigger. */
        '<user_name>'                /* User who has initialized the load. */
          );
Submitting an External Load (PostgreSQL)
vBatch_id := <repository_schema>.submit_load(
        <load_id>                /* Load ID returned by get_new_loadid */
        '<integration_job>'        /* Name of the Integration Job to trigger. */
        '<user_name>'                /* User who has initialized the load. */
          );

The following example performs a function call to submit an external load identified by the Load ID 22. It submits it with the job name INTEGRATE_DATA. The repository schema is REPO. The returned Batch ID is stored in the vBatch_id variable.

Submitting an external load identified by the Load ID 22. (Oracle)
vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
        22,
        'INTEGRATE_DATA',
        'John Doe' );

Canceling a Load

Canceling a load is performed using the CANCEL_LOAD function with the following parameters:

  • load_id: Load ID returned by the load initialization.

  • user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The CANCEL_LOAD procedure cancels an external load identified by its Load ID. It is performed on the repository database schema.

This procedure does not flush the content of the SD_% tables loaded during the external load. This must be taken care of separately.
Canceling an External Load (Oracle)
<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
        <load_id>                /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        '<user_name>'                /* User who has initialized the load. */
          );
Canceling an External Load (PostgreSQL)
<repository_schema>.cancel_load(
        <load_id>                /* Load ID returned by get_new_loadid */
        '<user_name>'                /* User who has initialized the load. */
          );
Call to INTEGRATION_LOAD.CANCEL_LOAD to cancel an external load identified by the Load ID 22. (Oracle)
REPO.INTEGRATION_LOAD.CANCEL_LOAD(
        22,
        'John Doe' );

Publishing Record Deletions

Semarchy xDM supports publishing golden records deletion using SQL for all entities using the same method.

To publish a golden record deletion, you must load the following columns into the SA table (even for ID and fuzzy matched entities):

  • B_LOADID: This column must be loaded with the Load ID provided by the GET_NEW_LOADID function call, of with the ID of the Continuous Load.

  • B_DELETETYPE: load this column with the SOFT_DELETE or HARD_DELETE value depending on the type of delete that you want to trigger.

  • B_DELETEOPERATION Delete operation ID. This ID must be a unique value for each deleted record. For example, use a UUID that you convert to a string (e.g.: Using the functions RAWTOHEX(SYS_GUID())).

You can optionally load the B_DELETEAUTHOR and B_DELETEDATE columns to provide an author and date for the delete operation.

When pushing a delete, the delete (and the Delete Operation ID) is automatically propagated to the child records according to the delete propagation configured on the references.
If a delete fails (for example because a child record prevents the delete), the deletion error is traced in the AE - authoring error - table for the entity.
For more information about deleting records, see the Entity Records Hard and Soft Delete in the section in the Semarchy xDM Developer’s Guide.

Managing Duplicates Using SQL

It is possible to manage duplicates using SQL, and perform programmatically actions typically available to users in applications' duplicate managers. The following sections provide the methods to perform various duplicates management operations using SQL with the UM table.

All these methods use a Load ID (in a vLoadID variable), which is initialized and submitted using a process similar to the one used when Publishing Data Using SQL.

Make sure to use a job that processes the entities for which the match and merge will run, as well as the related (child) entities.
You can also use the load ID of a continuous load to perform the duplicate management operations.

Confirming Matches

This method automatically confirms programmatically certain matches. It reproduces the behavior of the user interface.

Columns to load into the UM table:

  • B_LOADID: This column must be loaded with the Load ID.

  • B_SOURCEID, B_PUBID: These columns must be loaded with the ID and the publisher of the master records to confirm.

  • B_CLASSNAME: This column must be loaded with the name of the entity of these records.

  • <GoldenID>: This column must be loaded with the ID of the golden record to which the master record is attached to.

  • B_CONFIRMEDSDPK: This column must be loaded with the same value as <GoldenID>.

Confirm all unconfirmed master records
INSERT INTO UM_CUSTOMER (
        B_LOADID,
        B_SOURCEID,
        B_PUBID,
        B_CLASSNAME,
        CUSTOMER_ID,
        B_CONFIRMEDSDPK
)
SELECT
        vLoad_id,
        B_SOURCEID,
        B_PUBID,
        'Customer',
        CUSTOMER_ID,
        CUSTOMER_ID
FROM MD_CUSTOMER
WHERE
  B_CONFIRMATIONSTATUS = 'NOT_CONFIRMED';

Forcing a New Record to Join a Golden Record

When loading a source record, use this method to load at the same time a user decision forcing this record to:

  • Either join a golden record with a known ID,

  • Or create a new golden record with this ID.

To perform such an operation, combine data loading with the previous technique.

Load a new customer record and force it to merge with the existing golden record with CUSTOMER_ID 2.
/* Load source record */

INSERT into SD_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  -- Data fields are below
  CUSTOMER_NAME,
  ...

) values (
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  -- Values are below
  'GADGETRON',
  ...

)

/* Force a merge of this source record with a known golden ID */

INSERT INTO UM_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  CUSTOMER_ID,
  B_CONFIRMEDSDPK
)
SELECT
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  2,   -- Target Golden ID
  2    -- Confirm the golden ID
FROM MD_CUSTOMER;

COMMIT;

Moving Master and Golden Records to Golden Records

This method automatically moves master records to golden records. It reproduces the behavior of the user interface for moving masters into golden records, or for merging golden records into golden records

Columns to load into the UM table:

  • B_LOADID: This column must be loaded with the Load ID.

  • B_SOURCEID, B_PUBID: These columns must be loaded with the ID and the publisher of the master records to move.

  • B_CLASSNAME: This column must be loaded with the name of the entity of these records.

  • <GoldenID>: This column must be loaded with the ID of the golden record to which the master record should be moved to.

  • B_CONFIRMEDSDPK: This column must be set to `<GoldenID> if you want to flag the match as confirmed

  • B_XGRP: This column must be loaded with a UUID value. This UUID value should be same for all the records that were previously matched and that should no longer match. You can generate UUIDs using the SYS_GUID() Oracle function.

Merging the golden record with CUSTOMER_ID 1 into the golden record with CUSTOMER_ID 2 by moving all its master records.
INSERT INTO UM_CUSTOMER (
        B_LOADID,
        B_SOURCEID,
        B_PUBID,
        B_CLASSNAME,
        CUSTOMER_ID,
        B_CONFIRMEDSDPK,
        B_XGRP
)
SELECT
        vLoad_id,
        B_SOURCEID,
        B_PUBID,
        'Customer',
        2,          -- Target golden ID
        2,          -- Confirm the golden ID
        SYS_GUID()  -- New exclusion group ID
FROM MD_CUSTOMER
WHERE
  CUSTOMER_ID = 1;

Similarly, to split a golden record by moving its master records to new golden records, you do not provide an existing golden ID but seed a new golden ID using the golden ID generation method for the entity.

Splitting all the master records from golden record with CUSTOMER_ID 5 into different golden records.
INSERT INTO UM_CUSTOMER (
        B_LOADID,
        B_SOURCEID,
        B_PUBID,
        B_CLASSNAME,
        CUSTOMER_ID,
        B_CONFIRMEDSDPK,
        B_XGRP
)
SELECT
        vLoad_id,
        B_SOURCEID,
        B_PUBID,
        'Customer',
        SEQ_CUSTOMER.NEXTVAL, -- New golden ID generated by a sequence.
        SEQ_CUSTOMER.CURRVAL, -- The same golden ID is used to confirm.
        SYS_GUID()            -- New exclusion group ID
FROM MD_CUSTOMER
WHERE
  CUSTOMER_ID = 5;

Reset User Decisions

This methods gets rid of user decisions previously taken with duplicate managers, forcing the master records to merge according to the matcher’s decisions.

Columns to load into the UM table:

  • B_LOADID: This column must be loaded with the Load ID.

  • B_SOURCEID, B_PUBID: These columns must be loaded with the ID and the publisher of the master records to reconsider for the matching.

  • B_CLASSNAME: This column must be loaded with the name of the entity of these records.

  • This operation is not possible in the MDM user interface.

  • All user decisions are lost. Exclusion groups and match groups are recomputed for these records.

  • All golden IDs are regenerated in the process.

Resetting all user decisions
INSERT INTO UM_CUSTOMER (
    B_LOADID,
    B_SOURCEID,
    B_PUBID,
    B_CLASSNAME
)
SELECT
    vLoad_id,
    B_SOURCEID,
    B_PUBID,
    'Customer'
FROM MD_CUSTOMER;

Replaying the Matching

The previous method does not force the match rules to run again. It only removes user decisions. To force the entire matching process to replay, you must reset of the user decisions, as described in the previous method, and reload the data to re-match into the SD table.

Replaying the matching process
-- Remove User Decisions

INSERT INTO UM_CUSTOMER (
        B_LOADID,
        B_SOURCEID,
        B_PUBID,
        B_CLASSNAME
)
SELECT
        vLoad_id,
        B_SOURCEID,
        B_PUBID,
        'Customer'
FROM MI_CUSTOMER
-- add below a clause to filter the records to re-process.
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

-- Re-load the data from the latest master values.
INSERT into SD_CUSTOMER (
    B_LOADID,
    B_PUBID,
    B_SOURCEID,
    B_CLASSNAME,
        -- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
        ...
)
SELECT
        vLoad_id,
        B_PUBID,
        B_SOURCEID
        -- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
        ...
FROM MI_CUSTOMER
-- add below a clause to filter the records to re-process.
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

COMMIT;

Consuming Data Using SQL

Consuming Data from Semarchy xDM is done in SQL using the tables of the hubs. This chapter covers this type of consumption.

Overview

Consuming the data mainly involves:

  • The Golden Data. Contains the enriched, consolidated, validated and certified golden records

  • The Master Data. Contains the master records linked to the golden records. Master records contain the references to the source records.

Using the golden data in conjunction with the master data allows cross referencing to source data and re-integrating golden data into source systems.

Consuming Golden Data

Golden Data Table Structure

The complete list of system columns available is provided in the Table Structures section of this guide. The following table lists the system columns used when consuming information from the hub:

Column Name Datatype Description

B_BATCHID

NUMBER(38,0)

ID of the batch that created this record.

B_CLASSNAME

VARCHAR2(128 CHAR)

Class name of this record.

B_CREDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was created.

B_UPDDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was updated.

B_CREATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was created.

B_UPDATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was updated.

  • B_CREDATE, B_UPDDATE: These timestamps do not necessarily correspond to the moment when the records were created or updated in a source system or in the hub using a stepper, duplicate manager, etc. They give the moment when the record was submitted to the hub. Note that a record, even unmodified, it considered updated as soon as it is submitted to the hub.

  • B_CREDATOR, B_UPDATOR: For records created or modified using direct authoring or duplicate managers, this corresponds to the user who has performed the changes. For records created or modified in workflows, this corresponds to the user who has submitted the workflow.

Accessing Golden Data Using SQL

To access golden data using SQL, you query on the GD table. The queries are filtered to access data for one or more entity classes,

The following code sample gives a query to access golden records in the GD table.

Selecting the golden data.
select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <classname_list> ) (3)

Explanation of the query:

1 We select all columns from the golden record. You can select specific columns from the GD record.
2 The GD table accessed is named after the entity. Replace <Physical Table Name> with the physical table name defined for the entity.
3 Provide the list of entity classes stored in this table you want to access. For example, Person,Company. Note that if you are no using inheritance, only one class is stored in the table and this clause is not required.
Example: Select the CONTACT_ID, FIRST_NAME and LAST_NAME golden data for the Contact entity.
select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where G.B_CLASSNAME = 'Contact'

Golden Data Primary Key

The primary key for a golden data table depends on the Entity Type and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:

  • For ID Generation - Sequence: NUMBER(38,0)

  • For ID Generation - UUID_: RAW

  • For ID Generation - Manual, the datatype is the one defined for the primary key attribute.

Consuming Master Data

Accessing master data from fuzzy and ID matched entities uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.

There is no master data for basic entities. Only source data, source errors and golden data are stored for this type of entities.

Depending on the Entity Type (ID Matched, Fuzzy Matched), access to the master data differs:

Accessing Master Data Using SQL (ID Matched Entity)

With ID Matched Entity, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute (<primary_key_column> in the sample below)

Select the golden data corresponding to a given source record ID (ID Matched Entity)
1
2
3
4
5
6
7
8
9
10
select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M
inner join GD_<Physical_Table_Name> G on (
         G.<primary_key_column> = M.<primary_key_column>
  and        G.B_CLASSNAME in ( <classname_list> )
)
where
  and        M.B_CLASSNAME in ( <classname_list> )
  and         M.B_PUBID = '<publisher_code>'
  and         M.<primary_key_column> = '<searched_source_id>'

In this code, access is filtered with the class name for both the golden data and the master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #9) and ID of the source record (line #10).

Example: Select the golden data for the Employee source record having the EMPLOYEE_NUMBER=100 in the HR system (ID Matched Entity).
select         M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME
from         MD_EMPLOYEE M
inner join GD_EMPLOYEE G on (
         G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
  and        G.B_CLASSNAME = 'Employee'
)
where
  and        M.B_CLASSNAME = 'Employee'
  and         M.B_PUBID = 'HR'
  and         M.EMPLOYEE_NUMBER = '100'

The resulting information can be used to update the source record with golden data.

Accessing Master Data Using SQL (Fuzzy Matched Entity)

With Fuzzy Matched Entity, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:

  • B_PUBID (VARCHAR2(30 CHAR)) contains the code of the publisher that published this record.

  • B_SOURCEID (VARCHAR2(128 CHAR)) contains the ID of the source record in that publisher

As a consequence, the link between golden and master records is done using the primary key column as in an ID Matched Entity, but the link to the source is done using the B_SOURCEID column.

Select the golden data for a given source record ID (Fuzzy Matched Entity)
1
2
3
4
5
6
7
8
9
10
select M.B_SOURCEID, G.*
from MD_<entity> M
inner join GD_<entity> G on (
         G. <primary_key_column> = M. <primary_key_column>
  and        G.B_CLASSNAME in ( <classname_list> )
)
where
  and        M.B_CLASSNAME in ( <classname_list> )
  and         M.B_PUBID = '<publisher_code>'
  and         M.B_SOURCEID = '<searched_source_id>'

In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #9) and ID of the source record (line #10), using the B_SOURCEID column.

Example: Select the golden values for the source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity).
1
2
3
4
5
6
7
8
9
10
select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME
from         MD_CONTACT M
inner join GD_CONTACT G on (
         G.CONTACT_ID = M.CONTACT_ID
  and        G.B_CLASSNAME = 'Contact'
)
where
  and        M.B_CLASSNAME = 'Contact'
  and         M.B_PUBID = 'CRM'
  and         M.B_SOURCEID = '27030'

Example: Select side by side the duplicates detected for a given source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity). In this example, the master data table is used twice (aliased as M and MM) to retrieve the two sides of a duplicate pair.

Example: Side-by-side duplicates
select
  M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
  MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
  G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME
from         MD_CONTACT M
inner join GD_CONTACT G on
 (
         G.CONTACT_ID = M.CONTACT_ID
  and        G.B_CLASSNAME = 'Contact'
 )
inner join MD_CONTACT MM on
 (
         MM.CONTACT_ID = M.CONTACT_ID
  and        MM.B_CLASSNAME = 'Contact'
 )
where
  and        M.B_CLASSNAME = 'Contact'
  and         M.B_PUBID = 'CRM'       /* Publisher ID */
  and         M.B_SOURCEID = '27030'  /* Source ID */
  /* and M.B_PUBID = MM.B_PUBID */
  /* Uncomment the previous line to restrict the duplicates
     to those within the CRM application */

Consuming Errors

Accessing Pre-Consolidation Errors Using SQL

Pre-consolidation errors can be accessed via the Source Errors (SE, or AE for basic entities) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

SE tables do not store the erroneous data itself, but they contain the identifiers to the source (SD or SA for basic entities) records in error.

Note that the B_ERROR_STATUS column on the SD table provides a simple way to detect the records with errors.

Example: Select the source errors for the Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER and FS_CUSTOMER.

Select the contact records in error
  select * from SD_CONTACT from B_ERROR_STATUS = 'ERROR';
Select the errors as well as the data for the source contacts in error
select SE.B_BATCHID, SD.B_LOADID,
  SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
  SE.B_PUBID, SE.B_SOURCEID,
  SD.FIRST_NAME, SD.LAST_NAME,
  SD.FP_CUSTOMER, SD.FS_CUSTOMER
from SE_CONTACT SE, SD_CONTACT SD
where
  SE.B_CLASSNAME = 'Contact'
  and SD.B_CLASSNAME = 'Contact'
  and SD.B_PUBID = SE.B_PUBID
  and SD.B_SOURCEID = SE.B_SOURCEID
  and SD.B_LOADID = SE.B_LOADID
  and SD.B_ERROR_STATUS = 'ERROR';

Accessing Post-Consolidation Errors Using SQL

Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

GE tables do not store the erroneous data itself, but they contain the identifier to the golden (GD) records in error.

Note that the B_ERROR_STATUS column on the GD table provides a simple way to detect the records with errors.

Example: Select the errors and data for the golden contacts in error.

Select the golden contacts in error
select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';
Select the errors as well as the data for the golden contacts in error
select GE.B_BATCHID,
  GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
  GE.CONTACT_ID,
  GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER
from GE_CONTACT GE, GD_CONTACT GD
where
  GE.B_CLASSNAME = 'Contact'
  and GD.B_CLASSNAME = 'Contact'
  and GD.CONTACT_ID = GE.CONTACT_ID
  and GD.B_BATCHID = GE.B_BATCHID
  and GD.B_ERROR_STATUS = 'ERROR';

Consuming Data Using the REST API

The REST API provides programmatic access to read data in Semarchy xDM. The REST API identifies users using authentication methods supported by Semarchy xDM; responses are in JSON format.

Overview

The REST API is available at the following base URL (referred to as [base_url] in this section):

http://<host>:<port>/{wars-base-name}/api/rest/

Each operation in the REST API is available with a parameterized URL under this base URL. The URL may also take additionnal parameters.

URL parameters names are case sensitive. Make sure to use them with the case described in this documentation. For example $baseExprs is a valid parameter, but $baseexprs or $BASEXPRS are invalid and will be ignored.

The operations available in the REST API are described below.

Querying Records

Query Multiple Records

Method

GET

Base URL

URL

[base_url]/query/[data_location_name]/[entity_name]/[view_type]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to query

  • [entity_name]: Name of the entity to query

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value is required for certain view types only.

  • [batch_id or as_of_date]: ID or date of the batch into which the records were created, modified or deleted. This value is required for certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

Supported Parameters

Response Format

The response contains the list of records with the query expressions.

Multiple records query: sample response.
{
  "records": [
        {
          "CustomerName": "Gadgetron",
          "City": "Honolulu"
        },
        {
          "CustomerName": "StayPaft",
          "City": "Honolulu"
        }
  ]
}

Query a Single Record

Method

GET

Base URL

URL

[base_url]/query/[data_location_name]/[entity_name]/[view_type]/[record_id]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])/[record_id]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])/[record_id]

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to query

  • [entity_name]: Name of the entity to query

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value is required for certain view types only.

  • [batch_id or as_of_date]: ID or date of the batch into which the record was created, modified or deleted. This value is required for certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

  • [record_id]: ID of the record to query.

Record ID Structure

The record_id value depends on the view_type used. The IDs for the various view types are listed below:

  • GD, GH4B, GI, GDWE, SA4L and SA4LK: <id_attribute>

  • MD, MH4B, MI and UM:

    • Fuzzy matched entity: PublisherID.SourceID

    • ID matched entity: PublisherID.<id_attribute>

  • GH: FromBatchID.<id_attribute>

  • MH:

    • Fuzzy matched entity: FromBatchID.PublisherID.SourceID

    • ID matched entity: FromBatchID.PublisherID.<id_attribute>

  • GX: BatchID.<id_attribute>

  • MX:

    • Fuzzy matched entity: BatchID.PublisherID.SourceID

    • ID matched entity: BatchID.PublisherID.<id_attribute>

  • SD and SDWE:

    • Fuzzy: LoadID.PublisherID.SourceID

    • ID matched entity: LoadID.PublisherID.<id_attribute>

  • SA and SAWE: LoadID.<id_attribute>

  • GE: <id_attribute>$ContrainstName.ConstraintType

  • SE:

    • Fuzzy: LoadID.PublisherID.SourceID$ContrainstName.ConstraintType

    • ID matched entity: LoadID.PublisherID.<id_attribute>$ContrainstName.ConstraintType

  • AE: LoadID.<id_attribute>$ContrainstName.ConstraintType

  • SA: LoadID.<id_attribute>

For example, to query a source data record (SD view), for matching entity you must provide the concatenation of LoadID.PublisherID.SourceID for the record_id

Supported Parameters

Response Format

The response contains the list of query expressions.

Single record query: sample response.
{
    "CustomerName": "Gadgetron",
    "City": "Honolulu",
    "UPPER(CustomerName)": "GADGETRON"
}

Counting Records

Method

GET

Base URL

URL

[base_url]/count/[data_location_name]/[entity_name]/[view_type]
[base_url]/count/[data_location_name]/[entity_name]/[view_type]([load_id])
[base_url]/count/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to count

  • [entity_name]: Name of the entity to count

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value is required for certain view types only.

  • [batch_id or as_of_date]: ID or date of the batch into which the records were created, modified or deleted. This value is required for certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

Supported Parameters

Response Format

The response contains the count of records.

Record count query: sample response.
{
  "recordCount": "42",
}

Named Queries

Named queries provide customized REST endpoints to consume data with a predefined structure and query parameters.

For more information about Named Queries, see the Creating Named Queries section in the Semarchy xDM Developer’s Guide.

Method

GET

Base URL

URL

[base_url]/named-query/[data_location_name]/[named_query]/[view_type]
[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([load_id])
[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([batch_id or as_of_date])

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to query

  • [named_query]: Name of the named query

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value is required for certain view types only.

  • [batch_id or as_of_date]: ID or date of the batch into which the records were created, modified or deleted. This value is required for certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

Supported Parameters

In addition, query parameters are passed to the query using their name. For example: ?param1=value1&param2=value2

Sorting applies before the sort expression defined in the name query itself. Pagination and Filtering only apply to the root object of the named query.

Response Format

The response contains a hierarchy of objects corresponding to the named query definition.

Query Parameters

View Types

Certain URLs support a view type parameter. This view parameter uses the alias of the view.

The most common view types are listed below:

  • GD: Golden consolidated and certified records.

  • GE: Errors detected after master data consolidation (post-consolidation), for matching entities.

  • GDWE: Golden consolidated and certified records, filtered to only show those with errors (matching entities only)

  • GH: Golden History.

  • GH4B: Golden data history as of batch or golden data if not historized. This view type requires a batch_id or as_of_date value in the URL.

  • GX: Deleted golden records logs (and data for soft delete).

  • MD: Enriched, validated and cleansed master records.

  • MH: Master records history.

  • MH4B: Master data history as of batch or master data if not historized. This view type requires a batch_id or as_of_date value in the URL.

  • MX: Deleted master records logs (and data for soft delete).

  • SD: Source records from finished loads,

  • SE: Errors on source records, returned with the data from the erroneous record,

  • SDWE: Source records from finished loads, filtered to only show those with errors,

  • SD4L Source data loaded by the publishers for a given load. This view type requires a load_id value in the URL.

  • SA: Source data authored by users.

  • AE: Errors detected on source authoring data.

  • SAWE: Source data authored by users with errors.

  • SA4L for source data authored by users for a given load. This view type requires a load_id value in the URL.
    Note that this view automatically resolves references to source/golden records (for basic entities) or source/master records (for matching entities), as performed when authoring the records in the applications.

All the views and their aliases are detailed in the Semarchy xDM SemQL Reference Guide.

Pagination

When calls return a large number of records, it is recommended to page the record set. By requesting smaller subsets of data, you will get a response much faster than when requesting the entire, potentially large, record set.

Parameter Default Value Description

$offset

0

Defines the record offset for pagination.

$limit

100

Defines the maximum number of returned records.

Example: Request the Customers between position 20 and 29
/query/CustomerAndFinancialMDM/Customer/GD?$offset=20&$limit=10

Query Expressions

Query expressions define which values are returned by the query for each record.

Parameter Default Value Description

$baseExprs

USER_ATTRS

Defines the set of base attributes to include in the records, in addition to those specified using the expr parameter. Possible values:

  • NONE: No attributes.

  • USER_ATTRS: All entity attributes, except the built-in attributes and references.

  • VIEW_ATTRS: All entity attributes, except references. Include the built-in attributes.

$expr

N/A

Expression to include to the record in addition to, or to remove from, the base attributes (baseExprs). You can put as many expressions as you need as query parameters. These expressions may be one of the following:

  • [semql_expression]: A SemQL expression to add to the base attributes. The expression string is used as the alias for the value.

  • [alias]:[semql_expression]: A SemQL expression to add to the base attributes, preceded by its alias.

  • $exclude:[semql_expression]: A SemQL expression to remove from the base attributes.

Example: Query all Customer attributes, except the built-in ones. Add the Creator and CreationDate built-in attributes.
/query/CustomerAndFinancialMDM/Customer/GD?$baseExprs=USER_ATTRS&$expr=Creator&$expr=CreationDate
Example: Query only the CustomerName (in uppercase, aliased as 'Name') and TotalRevenue (aliased as 'Revenue').
/query/CustomerAndFinancialMDM/Customer/GD?$baseExprs=NONE&$expr=Name:Upper(CustomerName)&$expr=Revenue:TotalRevenue
Example: Query all Customer attributes, except the built-in ones and the TotalRevenue.
/query/CustomerAndFinancialMDM/Customer/GD?$baseExprs=USER_ATTRS&$$exclude:TotalRevenue
For more information about SemQL expressions and attributes, refer to the Semarchy xDM SemQL Reference Guide.

Sorting

Sort expressions define the order of the records in the response.

Parameter Default Value Description

$orderBy

None

SemQL Order By Clause defining the order of the records in the response.

Example: Query Customers sorted by TotalRevenue and CustomerName
/query/CustomerAndFinancialMDM/Customer/GD?$orderBy=TotalRevenue%20DESC,CustomerName%20ASC
For more information about sorting with SemQL, refer to the Semarchy xDM SemQL Reference Guide.

Filtering

Use filters to request only specific records, that match the criteria you choose.

Parameter Default Value Description

$f

SemQL filter. It is a SemQL condition applied to filter records. Only records matching this condition are returned. You can combine several several SemQL filters in the same query.

Example: Query Contacts whose FirstName starts with 'Joe'
/query/CustomerAndFinancialMDM/Contact/GD?$f=FirstName%20LIKE%20%27Joe%25%27
For more information about SemQL conditions and expressions, refer to the Semarchy xDM SemQL Reference Guide.

Publishing Data Using the REST API

The REST API provides also provides programmatic management of data loads in Semarchy xDM, using requests as well as responses in JSON format.

Overview

The REST API provides the capabilities to manage loads, which includes querying, creating, submitting and canceling loads. It also supports persisting records in existing loads. Finally, it provides a shortcut to Load and Submit data in a single request.

Querying Existing Loads using REST

Method

GET

Base URL

URL

[base_url]/loads/[data_location_name]

Supported Parameters

  • $offset: Defines the results offset for pagination.

  • $limit Defines the maximum number of returned results.

  • $batchId: Limit results to the loads with given Batch ID.

  • $loadStatus: Limit results to the loads in a given status.

  • $jobNamePattern: Limit results to the loads with a job name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.

  • $programNamePatter: Limit results to the loads created with program name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.

  • $loadCreator: Limit results to the loads with this creator.

  • $batchSubmitter: Limit results to the loads with this submitter.

  • $loadDescriptionPattern: Limit results to the load with a description matching the pattern. Use the _ and % wildcards to represent one or any number of characters.

Response Format

The response contains the list of loads meeting the criteria. The information returned for each load depends on the status of the load.

Query Existing Loads: sample response.
{
    "loads": [
        {
            "loadId": 1135,
            "loadStatus": "RUNNING",
            "loadCreator": "semadmin",
            "loadCreationDate": "2017-12-29T16:19:16.860Z",
            "numberOfJobExecutions": 0,
            "submitInterval": -1,
            "submittable": true
        },
        {
            "loadId": 1134,
            "loadStatus": "CANCELED",
            "loadCreator": "semadmin",
            "loadCreationDate": "2017-12-29T16:00:16.114Z",
            "numberOfJobExecutions": 0,
            "submitInterval": -1,
            "submittable": true
        },
        ...
   ]
}

Load Status

The possible values for the Load Status are listed in the table below.

Load Status Description

RUNNING

The load is currently running.

CANCELED

The load was canceled (CancelLoad)

PENDING

The load was submitted. A batch was created and is waiting for the batch poller to pick it.

SCHEDULED

The batch was taken into account by the batch poller. The job is queued by the engine.

PROCESSING

The batch’s job is currently being processed by the engine.

SUSPENDED

The job is suspended, either by an administrator or due to an error. It awaits for administrator intervention.

WARNING

The job completed successfully, but some records have caused validation errors.

DONE

The job completed successfully with no validation errors.

ERROR

The job did not complete successfully, it was canceled by an administrator.

Querying a Load by ID using REST

Method

GET

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Response Format

The response contains the load identified by [load_id]. The information returned for the load depends on its status.

Query one Load: sample response.
{
    "loadId": 1135,
    "loadStatus": "RUNNING",
    "loadCreator": "semadmin",
    "loadCreationDate": "2017-12-29T16:19:16.860Z",
    "numberOfJobExecutions": 0,
    "submitInterval": -1,
    "submittable": true
}

Initializing a Load using REST

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]

Request Payload

The request contains the CREATE_LOAD action, as well as the information required to create a new load.

Load creation: sample request.
{
        "action":"CREATE_LOAD",
        "programName": "curl",
        "loadDescription": "Load Customers"
}

Response Format

The response contains the load information, including the load ID, load type, and an indication of the status.

Load creation: sample response.
{
    "loadId": 1150,
    "loadStatus": "RUNNING",
    "loadCreator": "semadmin",
    "loadCreationDate": "2018-01-05T14:41:26.693Z",
    "programName": "curl",
    "loadDescription": "Load Customers",
    "numberOfJobExecutions": 0,
    "submitInterval": -1,
    "submittable": true
}

Loading Data using REST

To load data in a given load, the URL must contain the Load ID that was returned at load creation time.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains the PERSIST_DATA action, as well as the information required to load data. This information includes:

  • The persistOptions, which define how records are published into the hub. See Configuring Data Loading in REST for more information.

  • The persistRecords, which contain the records to be loaded into the hub.

Load data: sample request.
{
  "action":"PERSIST_DATA",
  "persistOptions": {
    "defaultPublisherId": "CRM",
    "optionsPerEntity": {
      "Customer": {
        "enrichers":[],
        "validations":[],
        "queryPotentialMatches": true
        }
      },
    "missingIdBehavior": "GENERATE",
    "persistMode": "IF_NO_ERROR_OR_MATCH"
    },
  "persistRecords": {
    "Customer": [
      {
        "CustomerName": "Gadgetron"
      }
    ]
  }
}

Response Format

The response contains, in the records element, the enriched records, the failedValidations as well as the potentialMatches (if any).
The response also contains the records' status as well as the load information..

Load submission: sample response.
{
    "status": "PERSISTED",   (1)
    "load": {                (2)
        "loadId": 1159,
        "loadStatus": "RUNNING",
        "loadCreator": "semadmin",
        "loadCreationDate": "2018-01-05T14:53:58.181Z",
        "programName": "curl",
        "loadDescription": "Load Customers",
        "numberOfJobExecutions": 0,
        "submitInterval": -1,
        "submittable": true
    },
    "records": {
        "Customer": [
            {
                "entityName": "Customer",
                "recordValues": {
                    "PublisherID": "CRM",
                    "SourceID": "5",
                    ...
                    "CustomerName": "Gadgetron",
                    "TotalRevenue": null,
                    "InputAddress.Address": null,
                    ...
                    "GeocodedAddress.Quality": null,
                    "FID_AccountManager": null
                },
                "failedValidations": [],
                "potentialMatches": []
            }
        ]
    }
}
1 The status is PERSISTED if the data was persisted, or or PERSIST_CANCELLED, if the data has not been persisted, for example if validations failed, or if a match was found.
2 Load information, similar to the information returned when Querying a Load by ID using REST

Configuring Data Loading in REST

Persist Options

When loading one or more records you can configure the following elements in the persistOptions element:

  • For each entity:

    • enrichers: List of enrichers that should be executed before persisting the records.

    • validations: List of validations that should be executed after the enrichers.

    • queryPotentialMatches: Detect and report potential matches, using the Matcher defined for the entity.

  • For the entire load:

    • missingIdBehavior: Option to define whether to generate IDs when they are not provided in the payload. Possible values are GENERATE to generate the ID or FAIL to fail loading if the ID is missing.

    • persistMode: Flag to define whether the records should be persisted or not.

Enrich, Validate and Detect Matches

When loading data, you can choose to execute enrichers, validations, and matchers for each entity.

You must create, for each entity you want to configure, under the optionsPerEntity element, one element named after the entity. For each entity, you can:

  • give a list of enrichers to run, with their enricher names,

  • give a list of validations to execute, with their validationType and validationName. Possible validationType values are CHECK, PLUGIN, MANDATORY, LOV, FOREIGN or UNIQUE,

  • indicate with the queryPotentialMatches boolean element whether the platform should check for duplicates according to the matching rules defined for the entity.

Data Loading Behavior

When invoked with a payload, the REST operation will run the enrichers, validations and matchers for each record, depending on the entity configuration.

It will then return:

  • The enriched data

  • a list of validation errors if any

  • a list of potential matches detected by the matching rules.

The record may be persisted or not at that stage, depending on the persistMode option:

  • If set to ALWAYS, then the records are persisted even if they have errors and potential matches.

  • If set to NEVER, then the records are not persisted. Use this option to perform a dry-run to test your records.

  • If set to IF_NO_ERROR_OR_MATCH (default value), then the records are persisted if no validation error was raised and no potential match was found.

Submitting a Load using REST

To submit a load, the URL must contain the Load ID that was returned at load creation time.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains the SUBMIT action, as well as the job to use when submitting the data.

Load submission: sample request.
{
          "action":"SUBMIT",
        "jobName": "INTEGRATE_DATA"
}

Response Format

The response contains the load information, including the load ID, batch ID, and an indication of the status.

Load submission: sample response.
{
    "loadId": 1149,
    "loadStatus": "PENDING",
    "loadCreator": "semadmin",
    "loadCreationDate": "2018-01-05T14:40:32.838Z",
    "programName": "curl",
    "loadDescription": "Load Customers",
    "loadSubmitDate": "2018-01-05T14:40:45.824Z",
    "batchId": 563,
    "numberOfJobExecutions": 0,
    "batchSubmitter": "semadmin",
    "submitInterval": -1,
    "submittable": true
}

Canceling a Load using REST

To cancel a load, the URL must contain the Load ID that was returned at load creation time.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains only the CANCEL action.

Load cancellation: sample request.
{
          "action":"CANCEL"
}

Response Format

The response contains load ID as well as an indication of the status.

Load cancellation: sample response.
{
    "loadId": 1151,
    "loadStatus": "CANCELED",
    "loadCreator": "semadmin",
    "loadCreationDate": "2018-01-05T14:43:07.069Z",
    "programName": "curl",
    "loadDescription": "Load Customers",
    "numberOfJobExecutions": 0,
    "submitInterval": -1,
    "submittable": true
}

Load and Submit Data using REST

Using the REST API, it is possible to create a load, load data and submit the load in a single request.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]

Request Payload

The request contains the CREATE_LOAD_AND_SUBMIT action, as well as the information required to create a new load, load data and submit the load. This includes the the persistOptions and persistRecords elements.

Load and Submit: sample request.
{
          "action":"CREATE_LOAD_AND_SUBMIT",
        "programName": "curl",
        "loadDescription": "Customer Load",
        "jobName": "INTEGRATE_DATA",
        "persistOptions": (1)
        "persistRecords": (2)
}
1 See Configuring Data Loading in REST for more information about the persistOptions.
2 See Loading Data using REST for more information about the persistRecords.

Response Format

The response contains load ID as well as an indication of the status.

Load and Submit: sample response.
{
    "status":  (1)
    "records": (2)
    "load":    (3)
}
1 PERSISTED or PERSIST_CANCELLED, if the data has not been persisted, for example if a match was found.
2 See Loading Data using REST for more information about this payload.
3 Load information, similar to the information returned when Querying a Load by ID using REST.