Welcome to Semarchy xDM.
This guide contains information about publishing and consuming data in an MDM hub generated by Semarchy xDM.

Preface

Audience

This document is intended for integration architects and developers setting up an MDM hub as part of their enterprise integration architecture.

If you want to learn about MDM or discover Semarchy xDM, you can watch our tutorials.
The Semarchy xDM Documentation Library, including the development, administration, and installation guides, is available online.

Document Conventions

This document uses the following formatting conventions:

Convention   Meaning

boldface     Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept.

italic       Italic type indicates special emphasis, or a placeholder variable that you need to provide.

monospace    Monospace type indicates code examples, text, or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.

Obtaining Help

There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Overview

Using this guide, you will:

  • Understand the Integration Component and Jobs of Semarchy xDM.

  • Learn how to publish data into an MDM hub through an external load.

  • Learn how to consume data from an MDM hub.

  • Learn how to interact with applications and workflows programmatically.

Introduction to Integration

What is Semarchy xDM?

Semarchy xDM is designed to support any kind of Enterprise Master Data Management initiative. It brings extreme flexibility for defining and implementing master data models and releasing them to production. The platform can be used as the target deployment point for all master data of your enterprise or in conjunction with existing data hubs to contribute to data transparency and quality with federated governance processes. Its powerful and intuitive environment covers all use cases for setting up a successful master data governance strategy.

Semarchy xDM is based on a coherent set of features for all Master Data Management projects.

Integration Capabilities

Semarchy xDM certifies golden data from source applications’ data and allows applications to consume this golden data.

Integration with Semarchy xDM is performed in several ways:

  • Publishing source data in an MDM hub deployed by Semarchy xDM, and having the hub certify golden records from this data.

  • Consuming golden or master data from the MDM hub.

  • Interacting with Workflow Instances to manage data.

Integration Methods

Semarchy xDM provides several integration services, including:

  • REST API, accessible from the Semarchy xDM application.

  • SQL Interfaces, that is, a PL/SQL package and a set of tables stored in the Data Location Schema.

Using these components, applications and middleware products can consume and publish data in the hub.

Integration Concepts

Publishers and Consumers

In the context of integration, we refer to applications publishing source data into the hub as Publishers.
We also refer to applications consuming golden data from the hub as Consumers.

These roles are not exclusive. An application can publish data into the MDM hub and consume certified golden data from the hub to update its records.

For example, a business intelligence application is typically a consumer only, as it consumes golden records solely for reporting purposes. An operational application may publish its contact information to the MDM hub for certification and update this information with the certified golden records.

Publishers

This term refers to the original application from which the data originates, and not necessarily to the middleware tool or user-designed program actually doing the data movement. The publisher is typically an operational application such as a Customer Relationship Management (CRM), a Product Lifecycle Management (PLM) or an Enterprise Resource Planner (ERP).

A middleware tool may be an Extract-Transform-Load (ETL), Enterprise Service Bus (ESB) or any other data integration product. It may also be a user-designed program or script implemented in SQL, Java, etc. The middleware tool or user-designed program communicates with the publisher to extract data and communicates with the MDM Hub using the publishing methods described in this guide to load this data.

Consumers

Similarly to the publisher, this term refers to the applications consuming data from the hub. This consumption usually takes place via a middleware tool or a user-designed program, and uses the consumption methods described in this guide.

For the rest of this guide, we refer to the middleware tool or user-designed program used for publishing or consuming data as the middleware.

Data Consumption

Data consumption is available via the REST API or SQL through the tables stored in the Data Location Schema.

Various views on the data can be used for consumption, including:

  • Golden Data enriched, standardized, validated, de-duplicated and certified in the hub.

  • Master Data pushed in the hub by source systems.

  • Errors raised by records pushed in the hub when violating the data quality rules defined in the model.

Data Consumption is always done on a specific Data Location.

Data Publishing

Publishing Concepts

Publishing source data for certification into golden data is performed as a transaction. Such a publisher transaction is called an External Load. It is a Semarchy xDM transaction identified by a sequential Load ID.

An External Load represents a source data load transaction.

When an External Load is submitted with an Integration Job Name, a Batch - identified by a Batch ID - is created, and the Job starts processing the data published in this load.

A Batch represents a transaction certifying loaded data and writing in the hub the resulting golden data.

Both Loads and Batches can be reviewed from the Data Locations perspective in the Semarchy Workbench.

External Load Lifecycle

An external load lifecycle is described below:

  1. Initialize the External Load

    • The middleware uses the SQL Interface or the REST API to initialize an external load.

    • It receives from the platform a Load ID identifying the external load.

    • At that stage, an external load transaction is open with the platform.

  2. Load Data

    • The middleware inserts data into the landing tables in the data location schema. This is done using the SQL Interface or the REST API.

    • When loading data, the middleware provides both the Load ID and a Publisher Code corresponding to the publisher application.

  3. Submit the External Load

    • The middleware uses the SQL Interface or the REST API to submit the external load.

    • It provides the Load ID as well as the name of the Integration Job to trigger with this submission.

    • The platform creates a Batch to process the data published in this external load.

    • It receives from the platform a Batch ID identifying the batch that is processed by the platform for this external load.

    • At that stage, the external load transaction is closed.

The middleware can also Cancel the External Load to abort the external load instead of submitting it.

Data locations may be moved to a Maintenance status by their administrator. When a data location is in that state, it is not possible to initialize external loads.
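The lifecycle above can be sketched in SQL using the INTEGRATION_LOAD PL/SQL package described later in this guide. This is a minimal sketch: the data location name, program name, user name, publisher code, and the SD_CUSTOMER table with its columns are illustrative assumptions, and the exact function signatures should be checked against the INTEGRATION_LOAD package specification in your repository schema.

```sql
-- Sketch of the external load lifecycle (names are illustrative).
DECLARE
  v_load_id  NUMBER;
  v_batch_id NUMBER;
BEGIN
  -- 1. Initialize the external load: the platform returns a Load ID.
  v_load_id := INTEGRATION_LOAD.GET_NEW_LOADID(
      'CustomerMDMHub',          -- data location name (assumed)
      'ETL_NIGHTLY',             -- program name (assumed)
      'Nightly customer load',   -- load description (assumed)
      'INTEG_USER');             -- user initializing the load (assumed)

  -- 2. Load data into the landing table, tagging each row with the
  --    Load ID and the Publisher Code.
  INSERT INTO SD_CUSTOMER
    (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CUSTOMER_NAME)
  VALUES
    (v_load_id, 'Customer', 'CRM', '1235', 'Gadgetron');

  -- 3. Submit the external load with the Integration Job name: the
  --    platform creates a Batch and returns its Batch ID.
  v_batch_id := INTEGRATION_LOAD.SUBMIT_LOAD(
      'CustomerMDMHub', v_load_id, 'INTEGRATE_CUSTOMERS', 'INTEG_USER');
  COMMIT;
END;
/
```

To abort instead of submitting, a corresponding cancel function of the same package would be called with the Load ID in place of step 3.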

Continuous Loads

Continuous loads enable integration developers to push data into the MDM hub in a continuous way without having to take care of Load Initialization or Load Submission.

With continuous loads:

  • Integration developers do not need to initialize and submit individual external loads. They directly load data into the hub using the Load ID of the continuous load.

  • At regular intervals, Semarchy xDM automatically creates then submits an external load with the data loaded in the continuous load. This external load is submitted with a program name, a job, and a submitter name.

  • The continuous load remains, with the same Load ID. Subsequent data loads made with this Load ID are processed at the next interval.

Continuous loads are configured and managed by the administrator in a data location. Unlike external loads, they cannot be created, submitted or canceled via integration methods.

The SQL Interface or the REST API can use continuous loads to push data into the hub without having to perform the load ID management operation.

When using a continuous load, make sure that it is configured with a job that processes the entities that are loaded.
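With a continuous load, only the data-loading step remains; a sketch, assuming a continuous load whose Load ID (here 123) is displayed in the data location, and an illustrative SD_CUSTOMER table:

```sql
-- Sketch: pushing data through a continuous load (names and the
-- Load ID value are illustrative assumptions).
INSERT INTO SD_CUSTOMER
  (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CUSTOMER_NAME)
VALUES
  (123, 'Customer', 'CRM', '1236', 'Roxxon');
COMMIT;
-- No initialize/submit calls are made: at the next interval, the
-- platform automatically creates and submits an external load with
-- the rows accumulated under this Load ID.
```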

Batch Lifecycle

When an external load is submitted, the following operations take place:

  1. The platform creates a batch and returns the Batch ID to the submitter.

  2. The integration batch poller picks up the batch on its schedule:

    1. It creates a Job instance using the Job Definition whose name was provided in the submit action.

    2. It moves the job into the Queue specified in the job definition.

  3. The Execution engine processes this job in the queue.

  4. When the job completes, the batch is considered finished.

Even when multiple loads take place simultaneously, the sequence in which the external loads are submitted defines the order in which the integration jobs process the data and certify golden data from this source data.

Interfaces for Integration

This section describes the interfaces available for integration.

REST API

This interface is available for applications to consume or publish hub data programmatically.

SQL Interface

This interface is available for data integration and ETL products to publish or consume data. It is composed of the INTEGRATION_LOAD PL/SQL Package and the Data Location Database Schema.

The INTEGRATION_LOAD package, stored in the repository schema, contains functions to manage the External Load Lifecycle, for the purpose of publishing data in batch mode.

The Data Location Database Schema stores a set of tables that contain the hub data.
This schema contains the landing (or staging) tables used to publish data into the hub. It also contains the golden record tables and the intermediate tables handled by the integration jobs that create golden records from the source records. There is a single data structure for the entire hub, and a single set of tables for each entity, regardless of the model edition.

The data location schema is accessed for integration purposes to:

  • Publish data in batch mode in the landing tables.

  • Consume data from the golden data and master data tables.

The structure of the tables stored in the data location schema is detailed in the Table Structures section.
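Consuming golden data then amounts to querying the golden tables of the data location schema. A sketch, assuming a Customer entity (GD_CUSTOMER table, CUSTOMER_ID and CUSTOMER_NAME columns are illustrative) and a consumer that tracks the last batch it has already read:

```sql
-- Sketch: incremental consumption of golden records (names and the
-- batch ID value are illustrative assumptions).
SELECT CUSTOMER_ID, CUSTOMER_NAME, B_BATCHID, B_UPDDATE
FROM   GD_CUSTOMER
WHERE  B_BATCHID > 42             -- last batch already consumed (example)
AND    B_ERROR_STATUS = 'VALID'   -- skip records flagged by validations
ORDER  BY B_BATCHID;
```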

Data Certification

This section explains the artifacts involved when certifying data published into the hub.

Integration Job

The integration job processes the data submitted in an external load and runs this data through the Certification Process, which is a series of steps to create and certify golden data out of this source data.

This job is generated from the certification rules defined at design time, and it uses the data structures automatically created in the MDM hub when deploying the model edition.

Although understanding the details of the process is not needed for publishing source data or consuming golden data, a complete understanding of this process is necessary to drill down into the various structures between the source and the golden data. For example, to review the rejects or the duplicates detected by the integration job for a given golden record.

An integration job is a sequence of tasks used to certify golden data for a group of entities. The model edition deployed in the data location brings several integration job definitions with it. Each of these job definitions is designed to certify data for a group of entities.

Integration job definitions, as well as integration job logs, are stored in the repository.

For example, a multi-domain hub contains entities for the PARTY domain and for the PRODUCTS domain, and has two integration job definitions:

  • INTEGRATE_CUSTOMERS certifies data for the Party, Location, and other entities.

  • INTEGRATE_PRODUCTS certifies data for the Brand, Product, Part, and other entities.

Integration jobs are started when source data has been loaded in the landing tables and is submitted for golden data certification.

Each integration job is the implementation of the overall certification process template. It may contain all or some of the steps of this process. The following section details the structure of the certification process.

Certification Process

The Certification Process creates consolidated and certified Golden Records from source values. The source values are either Source Records, pushed in the hub by applications (the Publishers), or Source Authoring Records, authored by users in the MDM applications.

When a user authors data in the MDM, they either:

  • Creates or edits data that only exists in the MDM hub (for all entities).

  • Overrides values consolidated from the Source Records (for ID and fuzzy matching entities).

This process is automated and involves several phases, generated from the rules and constraints defined in the model. The rules and constraints are defined in the model based on the functional knowledge of the entities and the publishers involved.

Certification Process for ID and Fuzzy Matching Entities

The following figure describes the certification process and the various Table Structures involved in this process.

Certification Process for Matched Entities

The certification process involves the following steps:

  1. Enrich and Standardize Source Data: During this step, the Source Records (in the SD tables) are enriched and standardized using SemQL and Plug-in Enrichers executed Pre-Consolidation.

  2. Validate Source Data: The quality of the enriched source records is checked against the various Constraints executed Pre-Consolidation. Erroneous records are ignored for the rest of the processing and the errors are logged (in the SE tables).

  3. Match and Find Duplicates: For Fuzzy Matched Entities, this step matches pairs of records using a Matcher and creates groups of matching records (match groups). For ID Matched entities, matching is simply made on the ID value.
    The matcher works as follows:

    • It runs a set of Match Rules. Each rule has two phases: first, a binning phase creates small bins of records. Then a matching phase compares each pair of records within these small bins and detects duplicates.

    • Each match rule has a corresponding Match Score that expresses how strongly the pair of records matches. A pair of records that matches according to one or more rules is given the highest Match Score of all the rules that have caused these records to match. Match pairs with scores and rules are stored in the DU table.

    • When a match group is created, an overall Confidence Score is computed for that group. According to this score, the group is marked as a suggestion or immediately merged, and possibly confirmed. These automated actions are configured in the Merge Policy and Auto-Confirm Policy of the matcher.

    • User decisions made on match groups (stored in the UM table) are applied at that phase, superseding the matcher’s choices.

  4. Consolidate Data: This step consolidates (and possibly confirms) match group duplicates into single consolidated records. The Consolidation Rules defined in the Survivorship Rules define how the attributes are consolidated. Integration master records and integration golden (consolidated) records are stored at that stage in the GI and MI tables.

  5. Enrich Consolidated Data: The SemQL and Plug-in Enrichers executed Post-Consolidation run to standardize or add data to the consolidated records.

  6. Publish Certified Golden Data: This step finally publishes the Golden Records for consumption. The final master and golden records are stored in the GD and MD tables.

    • This process applies possible overrides from Source Authoring Records (involving the SA, SF, GA, and GF tables), according to the Override Rules defined in the Survivorship Rules.

    • It also creates or updates records that only exist in the MDM from Source Authoring Records.

  7. Validate Golden Data: The quality of the golden records is checked against the various Constraints executed Post-Consolidation. Note that unlike the pre-consolidation validation, it does not remove erroneous golden records from the flow but flags them as erroneous. The errors are also logged (in the GE tables).

Source Authoring Records are not enriched or validated for ID and fuzzy matching entities as part of the certification process. These records should be enriched and validated as part of the steppers in which users author the data.
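The rejects produced by the validation steps can be reviewed directly in the error tables. A sketch, assuming an entity with physical table name CUSTOMER and illustrative Load and Batch ID values:

```sql
-- Sketch: reviewing pre-consolidation rejects for a given load
-- (table names and filter values are illustrative assumptions).
SELECT B_CONSTRAINTNAME, B_CONSTRAINTTYPE, B_PUBID, B_SOURCEID
FROM   SE_CUSTOMER        -- source errors, logged pre-consolidation
WHERE  B_LOADID = 1001;

-- Sketch: reviewing post-consolidation errors flagged on golden
-- records for a given batch.
SELECT B_CONSTRAINTNAME, B_CONSTRAINTTYPE
FROM   GE_CUSTOMER        -- golden errors, logged post-consolidation
WHERE  B_BATCHID = 2001;
```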

Certification Process for Basic Entities

The following figure describes the certification process and the various Table Structures involved in this process.

Certification Process for Basic Entities

The certification process involves the following steps:

  1. Enrich and Standardize Source Data: During this step, the Source Records and Source Authoring Records (both stored in the SA tables) are enriched and standardized using SemQL and Plug-in Enrichers executed Pre-Consolidation.

  2. Validate Source Data: The quality of the enriched source data is checked against the various Constraints executed Pre-Consolidation. Erroneous records are ignored for the rest of the processing and the errors are logged (in the AE tables).

  3. Publish Certified Golden Data: This step finally publishes the Golden Records for consumption (in the GD tables).

Basic entities do not separate Source Records from Source Authoring Records. Both follow the same process.

Source data for basic entities does not pass through enrichers or validations executed post-consolidation.
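Since basic entities do not separate source records from source authoring records, data for them is published through the SA table, which is writable for basic entities. A sketch, assuming a Country basic entity (SA_COUNTRY and its columns are illustrative) and a previously initialized external load:

```sql
-- Sketch: publishing a basic entity record through its SA table
-- (table, columns and the Load ID value are illustrative assumptions).
INSERT INTO SA_COUNTRY
  (B_LOADID, B_CLASSNAME, COUNTRY_ID, COUNTRY_NAME)
VALUES
  (1001, 'Country', 'FR', 'France');
COMMIT;
```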

For more information about these various phases and how the certification process is designed, refer to the Certification Process Design chapter in the Semarchy xDM Developer’s Guide.

Table Structures

This section describes the general structure of the tables involved in the certification process.

Tables

For each entity, a pre-defined set of tables is created. These tables are named with the following convention: <Prefix>_<Physical Table Name>

The prefix is automatically generated depending on the table nature (Source Data, Source Error, etc.), and the Physical Table Name is set in the entity definition.

The following list describes the tables created for a given entity.

SD_<Physical Table Name>: Source Data
Applies to: Fuzzy and ID Matched entities. Access: Read/Write.
This table contains source data loaded by publishers, using the middleware.

SA_<Physical Table Name>: Source Authoring
Applies to: All entities. Access: Read (Write for Basic entities).
For matching entities, this table contains both the records created in the hub by the user and possible overrides performed on consolidated records. In that case, it is associated with a SF_<Physical Table Name> table, storing flags that indicate which field was overridden.
For basic entities, it contains the records created by the user, and can be used to load data into the hub instead of the SD table. This table is also used to trace deletion operations.

SE_<Physical Table Name> / AE_<Physical Table Name>: Source Errors / Source Authoring Errors
Applies to: All entities. Access: Read.
This table contains the errors detected during the pre-consolidation validation phase. It is named AE_<Physical Table Name> for basic entities.

DU_<Physical Table Name>: Duplicates
Applies to: Fuzzy Matched entities. Access: Read.
This table stores references to pairs of records detected as duplicates in the matching phase, with the match rule and match score.

UM_<Physical Table Name>: Matching Groups
Applies to: Fuzzy Matched entities. Access: Read.
This table contains the master records re-grouped by a user during a duplicate management operation.

MI_<Physical Table Name>: Master Integration
Applies to: Fuzzy and ID Matched entities. Access: Internal.
This table contains the records that have passed the pre-consolidation validation phase and have been grouped by the matching phase.

MD_<Physical Table Name>: Master Data
Applies to: Fuzzy and ID Matched entities. Access: Read.
This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data.

GI_<Physical Table Name>: Golden Integration
Applies to: Fuzzy and ID Matched entities. Access: Internal.
This table contains the records that have been consolidated but have not yet passed through the post-consolidation validation phase.

GA_<Physical Table Name>: Golden Authoring
Applies to: Fuzzy and ID Matched entities. Access: Internal.
This table contains data overrides performed on golden records. It is associated with a GF_<Physical Table Name> table, storing flags that indicate which field was overridden, and a GP_<Physical Table Name> table, storing the values consolidated before an override.

GX_<Physical Table Name>: Deleted Golden
Applies to: Basic entities. Access: Read.
This table contains deleted record logs, plus the deleted data for soft deletes.

GD_<Physical Table Name>: Golden Data
Applies to: All entities. Access: Read.
This table contains the golden records, including those flagged as erroneous by post-consolidation validations.

GE_<Physical Table Name>: Golden Errors
Applies to: Fuzzy and ID Matched entities. Access: Read.
This table contains the errors detected during the post-consolidation validation phase on golden records.

For publishing and consuming data, internal tables are not needed. The structure of these tables is not explained in this document.

Columns

About Date Values and Time Zones
Date values are stored, read and written in the time zone of the application server. When consuming date and timestamp values, time zone conversion may be necessary if the client time zone differs from the application server time zone. Similarly, the certification process must convert date and timestamp values to the application server’s time zone.
Attribute Columns

Attributes appear in the tables’ structure as follows:

  • Attributes using list of values, built-in and user-defined types are mapped to single columns with a database type and length corresponding to the attribute type. The column name for such a simple attribute is the Physical Column Name value specified in the simple attribute definition.

  • Complex attributes are mapped on columns named with the following convention: <Complex Attribute Physical Prefix><Definition Attribute Physical Column Name>

For example: The Country (COUNTRY) and City (CITY) definition attributes of the Geocoded Address complex attribute (Prefix: GEO) will be mapped to columns named GEOCOUNTRY and GEOCITY.

Built-in Columns

In addition to the attribute columns, built-in columns are added to the tables’ structure, and are used in the certification process.

They track for example:

  • Batch information: B_BATCHID

  • Cause of a rejected record: B_CONSTRAINTNAME, B_CONSTRAINTTYPE

  • Class information: B_CLASSNAME

  • Match/Merge information: B_MATCHGRP, B_CONFSCORE, B_HASSUGGMERGE, B_SUGGMERGEID, B_SUGGMERGECONFSCORE, B_SUGGMERGEMASTERSCOUNT, B_CONFIRMATIONSTATUS, B_MASTERSCOUNT, B_ISCONFIRMED

  • Delete operation information: B_DELETETYPE, B_DELETEDATE, B_DELETEAUTHOR, B_DELETEOPERATION

The following list describes these columns.

B_LOADID: NUMBER(38,0). Tables: SD, SA, SE, AE, UM.
ID of the load in which new data, data changes, overrides or duplicate decisions were sent, or in which source data errors were detected.

B_BATCHID: NUMBER(38,0). Tables: SE, MD, GD, GE.
ID of the batch in which the new data, data changes, overrides or duplicate decisions were applied, or in which source data errors were detected.

B_PUBID: VARCHAR2(30 CHAR). Tables: SD, SE, UM, MD, GD.
For matching entities, code of the publisher that published the record.

B_SOURCEID: VARCHAR2(128 CHAR). Tables: SD, SE, UM, MD, GD.
ID of the source record in the source publisher system (Fuzzy Matched entities only). See the Primary Keys Columns section for a detailed explanation of the primary key options.

B_CLASSNAME: VARCHAR2(128 CHAR). Tables: SD, SA, SE, UM, MD, GD, GE, GX.
Class name of this record. See Class Name for more information.

B_CONSTRAINTNAME: VARCHAR2(128 CHAR). Tables: SE, GE.
Name of the constraint causing this error (error records).

B_CONSTRAINTTYPE: VARCHAR2(30 CHAR). Tables: SE, GE.
Type of the constraint causing this error (error records).

B_ERROR_STATUS: VARCHAR2(30 CHAR). Tables: SD, SA, GD.
Flag indicating whether the source or golden record passed validations. Possible values are VALID (the record has no errors), ERROR (the record has errors), RECYCLED (the record was recycled and is considered valid) or OBSOLETE_ERROR (the record had errors, but a newer version of the record fixed them).

B_DELETETYPE: VARCHAR2(30 CHAR). Tables: SA, GX.
Type of delete operation (SOFT_DELETE or HARD_DELETE).

B_DELETEDATE: TIMESTAMP(6). Tables: SA, GX.
Date of the delete operation.

B_DELETEAUTHOR: VARCHAR2(128 CHAR). Tables: SA, GX.
Author of the delete operation.

B_DELETEOPERATION: VARCHAR2(128 CHAR). Tables: SA, GX.
ID of the delete operation.

B_DELETEPATH: VARCHAR2(4000 CHAR). Tables: SA, GX.
Cascade path through which the record was reached during the delete. Null for records directly selected for deletion.

B_MATCHGRP: NUMBER(38,0). Tables: MD.
ID of the match group for the master record. This column is set when matching takes place.

B_CONFSCORE: NUMBER(38,0). Tables: UM, GD.
Confidence score of the golden record. It is the average of the match scores in the match group.

B_XGRP: RAW(16). Tables: UM, MD.
Exclusion group ID. An exclusion group represents a group of records for which a user has taken split decisions.

B_HASSUGGMERGE: VARCHAR2(1). Tables: UM, MD, GD.
Flag indicating that match and merge suggestions are available for this record.

B_SUGGMERGEID: NUMBER(38,0). Tables: UM, MD, GD.
New group ID suggested by the automated matching for the record.

B_SUGGMERGECONFSCORE: NUMBER(38,0). Tables: UM, MD, GD.
Confidence score for the suggested group.

B_SUGGMERGEMASTERSCOUNT: NUMBER(38,0). Tables: UM, MD, GD.
Number of masters in the suggested group.

B_CONFIRMATIONSTATUS: VARCHAR2(30). Tables: UM, MD, GD.
Confirmation status for duplicate management:
- Confirmed (CONFIRMED): Indicates that a master is confirmed in a golden, or that a golden has all its masters confirmed.
- Not confirmed (NOT_CONFIRMED): Indicates that a master is not confirmed, or that a golden is entirely made of unconfirmed masters.
- Historically confirmed (WAS_CONFIRMED, for master records): Indicates that a master was confirmed into a golden, but this golden was fused into another golden.
- Partially confirmed (PARTIALLY_CONFIRMED, for golden records only): Indicates that a golden has part of its masters confirmed.
See the Confirming Golden Record section in the Semarchy xDM Developer’s Guide for more information about these statuses.

B_MASTERSCOUNT: NUMBER(38,0). Tables: UM, GD.
Number of master records contributing to the golden record.

B_ISCONFIRMED: VARCHAR2(1). Tables: GD.
Flag indicating whether this golden record has been confirmed (Fuzzy Matched entities only).

B_ORIGINALBATCHID: NUMBER(38,0). Tables: SA, UM.
Batch ID of the record when it was originally edited in a stepper or a duplicate manager.

B_GOLDENTYPE: VARCHAR2(30 CHAR). Tables: GD.
For fuzzy matching and ID matching entities, indicates whether the golden record was created and authored only in the MDM (DE_BASED) or was consolidated from publishers' data and possibly overridden (MASTER_BASED).

B_HASOVERRIDE: VARCHAR2(1). Tables: GD.
For fuzzy matching and ID matching entities, this flag indicates whether the golden record has override values.

B_CREDATE: TIMESTAMP(6). Tables: SD, SA, UM, DU, MD, GD, GX.
Date at which the data creation, overrides or duplicate decisions were made. Note that this date may reflect internal data changes and processing, which does not make it suitable for consumption as a business date.

B_UPDDATE: TIMESTAMP(6). Tables: SD, SA, UM, DU, MD, GD, GX.
Last update date of the record. Note that this date may reflect internal data changes and processing, which does not make it suitable for consumption as a business date.

B_CREATOR: VARCHAR2(128 CHAR). Tables: SD, SA, UM, DU, MD, GD, GX.
Creator of the new data, override or duplicate decision.

B_UPDATOR: VARCHAR2(128 CHAR). Tables: SD, SA, UM, DU, MD, GD, GX.
Latest updator of the data, override or duplicate decision.

Primary Keys Columns

The primary key to load depends on the Entity Type:

Basic

For basic entities, Semarchy xDM uses a single identifier, stored in a column named after the Physical Column Name specified in the primary key attribute definition. This column will exist in all tables. This identifier is simply propagated into the hub from the source records.

ID Matched

When using ID Matching, Semarchy xDM assumes a common identifier across all systems. In this case, this common identifier is stored in a column named after the Physical Column Name specified in the primary key attribute definition.

This column will exist in all tables. When publishing data into the hub, the middleware loads this column with the primary key from the publishing system. This identifier is simply propagated into the hub, and matching is done using this primary key.

Fuzzy Matched

When using Fuzzy Matching, Semarchy xDM assumes no common identifier across publishers. There may be two different records with the same ID in different systems. There is a need to match the records and consolidate them under a golden record having a primary key generated by the system.

In this case, the source identifier is stored in the column named B_SOURCEID, along with the publisher code stored in the B_PUBID. This is the case for the SD, SE and MD tables.

When publishing data into the hub, the middleware loads this B_SOURCEID column with a primary key value from the publishing system. If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into B_SOURCEID.

When the records are consolidated in a golden record (GD and GE tables), a System Defined Primary Key is generated and stored in a column named after the Physical Column Name specified in the primary key attribute definition. This key is referred to as the Golden Record ID.

The MD table makes the bridge between the Source ID and the Golden ID as it contains both these values.
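Because the MD table carries both identifiers, the golden record behind a given source record can be resolved with a single lookup. A sketch, assuming a fuzzy matched Customer entity where CUSTOMER_ID is the golden record ID column (table, column, and the publisher code and source ID values are illustrative):

```sql
-- Sketch: mapping a source record to its golden record via the
-- MD bridge table (names and values are illustrative assumptions).
SELECT m.CUSTOMER_ID AS golden_id
FROM   MD_CUSTOMER m
WHERE  m.B_PUBID    = 'CRM'     -- publisher code
AND    m.B_SOURCEID = '1235';   -- primary key in the source system
```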

Reference Columns

When a reference exists in the source publisher and needs to be expressed in the landing table, the mapping of this reference in the table structure depends on the Entity Type of the referenced entity.

Reference to a Basic Entity

For a reference to a Basic entity, the referenced key stored is simply the primary key of the basic entity. As a consequence, the referenced value is stored in a single column. This column is named after the Physical Name provided in the reference definition and is prefixed with F_. For example, F_COUNTRIES.

Reference to an ID Matched Entity

For a reference to an ID Matched entity, the referenced key is the same for all systems. As a consequence, the referenced value is stored in a single column. This column is named after the Physical Name provided in the reference definition and is prefixed with F_. For example, F_EMPLOYEE.

For example: if Customer references Employee and this entity uses ID Matching, SD_CUSTOMER will contain the following information.

CUSTOMER_NAME   F_EMPLOYEE
Gadgetron       11
Roxxon          56

This information means that the customer Gadgetron references the employee number 11, and Roxxon references employee number 56. This employee number is the same in all systems.

Reference to a Fuzzy Matched Entity

For a reference to a Fuzzy Matched entity, the reference may point to a master record from a publisher, or directly to a golden record in the hub:

  • For a reference to a master record:

    • The referenced value is stored in a column named after the Physical Name provided in the definition of the reference and prefixed with FS_ (for Foreign Source ID). For example, FS_CUSTOMER.

    • The referenced publisher is also stored, in a column named after the Physical Name provided in the definition of the reference and prefixed with FP_ (for Foreign Publisher). For example, FP_CUSTOMER.

  • For a reference to a golden record:

    • The reference value is stored in a column named after the Physical Name provided in the definition of the reference and prefixed with F_ (for Foreign ID).

Example 1. Reference to a master record

If Contact references Customer and this entity uses Fuzzy Matching, SD_CONTACT contains the following information.

FIRST_NAME  LAST_NAME  FP_CUSTOMER  FS_CUSTOMER
John        Doe        CRM          1235
Jane        Smith      MKT          A3251

This information means that the contact John Doe references the customer with the primary key 1235 in the CRM publisher, and that Jane Smith references the customer with the primary key A3251 in the MKT publisher.

Example 2. Reference to a golden record

If Contact references Customer and this entity uses Fuzzy Matching, SD_CONTACT contains the following information.

FIRST_NAME  LAST_NAME  F_CUSTOMER
John        Doe        6598
Jane        Smith      6556

This information means that the contact John Doe references the golden customer record with the golden ID 6598, and that Jane Smith references the golden customer with the golden ID 6556.

If both the reference to the master record and the reference to the golden record are loaded, only the reference to the golden record is taken into account.
To clear a reference, you must push a null value in the three FS_, FP_ and F_ columns.

Class Name

Several entities involved in an inheritance relation have their data stored in the same set of tables. These tables store the superset of the attributes of the parent and all its child entities. The B_CLASSNAME column is used to identify the class (entity) of a record in a table.

For example, when Person and Company inherit from the Party entity, the resulting table is named after the parent entity (Party), and will contain all the attributes of Person and Company as well. In the GD_PARTY table for example, records representing persons will have B_CLASSNAME='Person'. When publishing person or company information in the SD_PARTY table, the middleware must set B_CLASSNAME='Person' or B_CLASSNAME='Company' accordingly.
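This pattern can be illustrated with a throwaway in-memory table. SQLite merely stands in for the actual hub database, and the non-system columns (PARTY_NAME, FIRST_NAME, REGISTRATION_NO) are hypothetical attributes of the Party, Person and Company entities.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified SD_PARTY: superset of the Person and Company attributes.
con.execute("""CREATE TABLE SD_PARTY (
    B_LOADID INTEGER, B_CLASSNAME TEXT, B_PUBID TEXT, B_SOURCEID TEXT,
    PARTY_NAME TEXT,       -- common attribute (hypothetical)
    FIRST_NAME TEXT,       -- Person-only attribute (hypothetical)
    REGISTRATION_NO TEXT)  -- Company-only attribute (hypothetical)
""")
rows = [
    # The middleware sets B_CLASSNAME according to the record's entity.
    (1, "Person",  "CRM", "P-1", "John Doe",  "John", None),
    (1, "Company", "CRM", "C-7", "Gadgetron", None,   "REG-774"),
]
con.executemany("INSERT INTO SD_PARTY VALUES (?,?,?,?,?,?,?)", rows)

# Consumers filter on B_CLASSNAME to read one entity class at a time.
persons = con.execute(
    "SELECT PARTY_NAME FROM SD_PARTY WHERE B_CLASSNAME = 'Person'"
).fetchall()
print(persons)  # [('John Doe',)]
```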

Constraints

The only constraints physically enforced in the tables are the primary keys and the not-null columns. These constraints apply to system columns only.

For example, on an SD_ table, the following constraints are enforced:

  • primary key on B_LOADID, B_PUBID, B_SOURCEID (or the primary key column for ID Matched entities)

  • B_LOADID, B_CLASSNAME, B_PUBID and B_SOURCEID (or the primary key column for ID Matched entities) are not null

Other constraints defined in the model (mandatory attributes, references, etc.) are not enforced in the physical model but checked during the validation phase.

Publishing Data Using SQL

Publishing data can be done using PL/SQL to initialize and submit the external load, and SQL to load the data into the landing tables (SD for matching entities, SA for basic entities).

Overview

In this approach, external loads are handled via PL/SQL and SQL interface. It works as follows:

  1. The external load is initialized using a PL/SQL call to the INTEGRATION_LOAD.GET_NEW_LOADID function on the repository schema. This function call returns a Load ID.

  2. SD (SA for Basic Entities) tables are loaded using SQL inserts issued on the data location schema.

  3. The external load is submitted or cancelled using PL/SQL function calls on the repository schema:

    • INTEGRATION_LOAD.SUBMIT_LOAD to submit a load identified by its Load ID. This function call returns a Batch ID.

    • INTEGRATION_LOAD.CANCEL_LOAD to cancel a load identified by its Load ID.

This approach is recommended when the middleware is an ETL or data integration product.

Alternatively, you can use Continuous Loads to load data into the SD tables with the Load ID of the Continuous Load, without having to explicitly initialize and submit individual external loads.

Initializing a Load

Initializing an external load uses the following parameters:

  • data_location_name: name of the data location. This data location is the one hosting the MDM hub into which the external load is performed.

  • program_name: This variable is for information only. It is used to identify the middleware performing the external load. For example: 'ETL Custom Script'.

  • load_description: This variable is for information only. It is used to describe the nature of this load. For example: 'Daily updates for Customers and Contacts'.

  • user_name: name of the user initializing the external load. This user may or may not be a user defined in the security realm of the application server.

The function call to INTEGRATION_LOAD.GET_NEW_LOADID initializes an external load and returns a Load ID. It is performed on the repository database schema.

Initializing an External Load
vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
        '<data_location_name>', /* Data Location Name (as in the UI) */
        '<program_name>',       /* Informational. Identifies the middleware. */
        '<load_description>',   /* Informational. Describes the load. */
        '<user_name>'           /* User initializing the load. */
          );

The following example performs a function call initializing an external load on the CustomerMDMHub data location. The repository schema is REPO. The returned Load ID is stored in the vLoad_id variable.

Example: Initializing an external load.
vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
        'CustomerMDMHub',
        'Custom ETL',
        'Initial Load for the Hub',
        'John Doe' );

Loading Data

Loading data consists of inserting new source records in the SD (and SA for Basic Entities) tables for the set of entities taken into account in the integration job.

When loading data in the SD or SA tables:

  • You use SQL insert commands or your regular ETL/Data Integration Platform.

  • The insert commands are issued on the SD or SA tables stored in the data location schema.

Tables to Load

Make sure to load all the SD and SA tables for the entities that will be taken into account in the integration job. For example, if the integration job processes the Customer and Contact entities, then you should load the SD_CUSTOMER and SD_CONTACT tables.
If you insert data into the SD table of an entity that is not taken into account by the integration job, this data will be ignored. For example, if the integration job processes the Customer and Contact entities, data loaded into the SD_EMPLOYEE table will not be taken into account by the integration job to certify golden records for the Employee entity.

Referential Integrity and Load Order

There is no required order to load the SD and SA tables, as no foreign keys are implemented on these tables.
Reference validation is performed by the integration job; as a consequence, references between entities must be loaded as indicated in the Columns to Load section.

Using Transactions

It is recommended to use a database transaction when writing to the SD and SA tables.

  • For external loads: Canceling an external load does not delete records already written to the tables. Writing to the tables in auto-commit mode and then canceling the load leaves useless information in the tables.
    Using a transaction gives you the capability to roll back all changes on the tables when canceling a load.

  • For continuous loads, it is strongly recommended to use a transaction and commit only when all tables are loaded: as the continuous load consumes data committed in the table on its own schedule, loading tables without a transaction may cause the continuous load to start processing a data batch before it is completed.
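The rollback behavior can be demonstrated with any transactional database. In this sketch, an in-memory SQLite table stands in for the SD table of a hypothetical Customer entity; the real load runs against the data location schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SD_CUSTOMER (B_LOADID INT, B_CLASSNAME TEXT, "
            "B_PUBID TEXT, B_SOURCEID TEXT, CUSTOMER_NAME TEXT)")
con.commit()

# Load all the records of the external load inside one transaction.
con.execute("INSERT INTO SD_CUSTOMER VALUES "
            "(22, 'Customer', 'CRM', '1001', 'Gadgetron')")
con.execute("INSERT INTO SD_CUSTOMER VALUES "
            "(22, 'Customer', 'CRM', '1002', 'Roxxon')")

# If the load is canceled (INTEGRATION_LOAD.CANCEL_LOAD), roll back
# instead of committing: no useless rows are left in the table.
con.rollback()
count = con.execute("SELECT count(*) FROM SD_CUSTOMER").fetchone()[0]
print(count)  # 0: the canceled load left no rows behind
```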

Columns to Load

This section provides some guidance for loading the columns of the SD and SA tables.

System Columns (Mandatory)

The following system columns must be loaded as indicated:

  • B_LOADID: This column must be loaded with the Load ID provided by INTEGRATION_LOAD.GET_NEW_LOADID() or getNewLoadID() call.

  • B_CLASSNAME: Name of the entity (or class) being loaded. When inheritance is used, the same table stores data for all parent and child classes. Set explicitly B_CLASSNAME to the name of the entity for which data is being published. For example: Person, Party, Company. See the Class Name section for more details.

  • B_PUBID: This column must be loaded with a Publisher Code for ID Matched and Fuzzy Matched entities. For example: CRM, MKT, etc. This publisher code identifies the publisher (application that publishes the data) and should be declared in the model edition. The list of publisher codes is available by double-clicking the Publishers node in the Model Edition view in the Model Design perspective. If the publisher code is unknown to the model, data from this publisher is processed, but this publisher will have the lowest ranking in a Preferred Publisher consolidation strategy.

Basic entities assume a single source, and do not use the B_PUBID column.
Publisher codes are case sensitive. Make sure to load B_PUBID with the publisher code as defined in the model. Publisher codes may contain uppercase letters, digits and underscores.
Within a single load, you can load the records from various publishers, using the B_PUBID column to identify each publisher.
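A defensive check on publisher codes before loading B_PUBID can be sketched as follows. The helper is illustrative; the character rule mirrors the note above (uppercase letters, digits and underscores, case sensitive), and the 30-character limit mirrors the VARCHAR2(30 CHAR) datatype of the B_PUBID column.

```python
import re

# Allowed characters per the model definition: uppercase letters,
# digits and underscores. B_PUBID is a VARCHAR2(30 CHAR) column.
_PUBLISHER_CODE = re.compile(r"[A-Z0-9_]+")

def check_publisher_code(code: str) -> str:
    """Raise if the code cannot be a valid publisher code."""
    if not _PUBLISHER_CODE.fullmatch(code) or len(code) > 30:
        raise ValueError(f"invalid publisher code: {code!r}")
    return code
```

For example, `check_publisher_code("MKT_2")` passes, while a lowercase `"crm"` is rejected before it silently falls to the lowest publisher ranking.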
Primary Keys (Mandatory)

For matching entities, the primary key that you load in the SD table identifies the source record from the publisher (identified by the publisher code in B_PUBID). This primary key allows taking into account the creation or update of a source record and propagating the corresponding data changes to the golden record through the consolidation process.

For basic entities, the primary key that you provide in the SA table is the one of the golden record and will allow updating this record.

The primary key column to load depends on the Entity Type of the entity.

Basic Entity

If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

ID Matched Entity

If the entity uses ID Matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

Fuzzy Matched Entity

If the entity uses Fuzzy Matching, then you must load in the B_SOURCEID column the value of the primary key from the source system. If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them in the B_SOURCEID column.

The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.
References (Mandatory)

When loading data for entities that are related by a reference relationship, you must load the referencing entity with the value of the referenced primary key. The columns to load differ depending on the Entity Type of the referenced entity.

Reference to a Basic Entity

If the referenced entity is a Basic entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

Reference to an ID Matched Entity

If the referenced entity is an ID Matched entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

For example, if Customer references Employee and this entity uses ID Matching, then you must load in SD_CUSTOMER the F_EMPLOYEE column with the primary key of the source employee record referenced by each customer record.

Reference to a Fuzzy Matched Entity

If the referenced entity is a Fuzzy Matched entity, then you need to load two columns:

  • FS_<Physical Name of the Reference To Role Name>: Load this column with the Source ID of the referenced record.

  • FP_<Physical Name of the Reference To Role Name>: Load this column with the code of the publisher of the referenced record.

Note that these columns should be considered together: do not load the FP_ column with a publisher code and leave FS_ null, or vice versa.
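Since the FP_ and FS_ columns must be set or cleared together, preparing rows through a small guard avoids inconsistent pairs. This helper is illustrative, not part of the product.

```python
def fuzzy_reference(publisher_code, source_id):
    """Return the (FP_, FS_) column pair for a reference to a fuzzy
    matched entity. Both values must be set together, or both be null:
    a half-set pair cannot be resolved by the integration job."""
    if (publisher_code is None) != (source_id is None):
        raise ValueError("FP_ and FS_ columns must be set or cleared together")
    return publisher_code, source_id
```

For example, `fuzzy_reference("MKT", "81239")` yields the pair to load into FP_CUSTOMER and FS_CUSTOMER, and `fuzzy_reference(None, None)` clears the reference.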

For example, if Contact references Customer and this entity uses Fuzzy Matching, you must load in SD_CONTACT the following columns:

  • FP_CUSTOMER: Code of the publisher providing the customer referenced by the given contact, e.g., MKT.

  • FS_CUSTOMER: Source ID of the customer referenced by the given contact, e.g., 81239.

Attribute Columns

You should load the attribute columns relevant for the entity you are loading.

Make sure to load:

  • The attribute columns that make sense for the entity class (B_CLASSNAME) you are loading.

  • The mandatory attribute columns. If not, pre-consolidation validation may reject source records with null values.

  • The columns for attributes using a list of values type. If these are loaded with values out of the LOV range of values, pre-consolidation validation may reject source records.

Attributes may be loaded with null or incorrect values if the values are set or modified by the enrichers. Enrichers are executed before any validation.
Other Columns (Optional)

The following columns do not need to be loaded or can be optionally loaded:

  • B_ORIGINALBATCHID: This column is not used for external loads and should not be loaded.

  • B_CREATOR, B_UPDATOR: These columns can be optionally loaded. Their value will default to the name of the user submitting the load.

  • B_CREDATE, B_UPDDATE: These columns can be optionally loaded.
    If B_CREDATE is not loaded, it will remain null. It is recommended to load this column with the current date.

Submitting a Load

Submitting an external load uses the following parameters:

  • load_id: Load ID returned by the load initialization.

  • integration_job: Name of the integration job to process this load.

  • user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The INTEGRATION_LOAD.SUBMIT_LOAD function call submits an external load identified by its Load ID and returns a Batch ID. It is performed on the repository database schema.

Submitting an External Load
vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
        <load_id>,           /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        '<integration_job>', /* Name of the integration job to trigger. */
        '<user_name>'        /* User who has initialized the load. */
          );

The following example performs a function call to submit an external load identified by the Load ID 22. It submits it with the job name INTEGRATE_DATA. The repository schema is REPO. The returned Batch ID is stored in the vBatch_id variable.

Submitting an external load identified by the Load ID 22.
vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
        22,
        'INTEGRATE_DATA',
        'John Doe' );

Canceling a Load

Canceling a load is performed using the CANCEL_LOAD function with the following parameters:

  • load_id: Load ID returned by the load initialization.

  • user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The INTEGRATION_LOAD.CANCEL_LOAD procedure cancels an external load identified by its Load ID. It is performed on the repository database schema.

This procedure does not flush the content of the SD_% tables loaded during the external load. This must be taken care of separately.
Canceling an External Load
<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
        <load_id>,    /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        '<user_name>' /* User who has initialized the load. */
          );
Call to INTEGRATION_LOAD.CANCEL_LOAD to cancel an external load identified by the Load ID 22.
REPO.INTEGRATION_LOAD.CANCEL_LOAD(
        22,
        'John Doe' );

Consuming Data Using SQL

Consuming data from Semarchy xDM is done in SQL using the tables of the hub. This chapter covers this type of consumption.

Overview

Consuming the data mainly involves:

  • The Golden Data. Contains the enriched, consolidated, validated and certified golden records

  • The Master Data. Contains the master records linked to the golden records. Master records contain the references to the source records.

Using the golden data in conjunction with the master data allows cross referencing to source data and re-integrating golden data into source systems.

Consuming Golden Data

Golden Data Table Structure

The complete list of system columns available is provided in the Table Structures section of this guide. The following table lists the system columns used when consuming information from the hub:

Column Name Datatype Description

B_BATCHID

NUMBER(38,0)

ID of the batch that created this record.

B_CLASSNAME

VARCHAR2(128 CHAR)

Class name of this record.

B_CREDATE

TIMESTAMP(6)

Creation date of the record. Note that this date may reflect internal data changes and processing, which does not make it suitable for consumption as a business date.

B_UPDDATE

TIMESTAMP(6)

Last update date of the record. Note that this date may reflect internal data changes and processing, which does not make it suitable for consumption as a business date.

B_CREATOR

VARCHAR2(128 CHAR)

Creator of the record.

B_UPDATOR

VARCHAR2(128 CHAR)

Last updater of this record.

Accessing Golden Data Using SQL

To access golden data using SQL, you query the GD table. The queries are filtered to access data for one or more entity classes.

The following code sample gives a query to access golden records in the GD table.

Selecting the golden data.
select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <classname_list> ) (3)

Explanation of the query:

1 We select all columns from the golden record. You can select specific columns from the GD record.
2 The GD table accessed is named after the entity. Replace <Physical Table Name> with the physical table name defined for the entity.
3 Provide the list of entity classes stored in this table that you want to access. For example, Person,Company. Note that if you are not using inheritance, only one class is stored in the table and this clause is not required.
Example: Select the CONTACT_ID, FIRST_NAME and LAST_NAME golden data for the Contact entity.
select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where G.B_CLASSNAME = 'Contact'

Golden Data Primary Key

The primary key for a golden data table depends on the Entity Type and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:

  • For ID Generation - Sequence: NUMBER(38,0)

  • For ID Generation - UUID: RAW

  • For ID Generation - Manual, the datatype is the one defined for the primary key attribute.

Consuming Master Data

Accessing master data from fuzzy and ID matched entities uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.

There is no master data for basic entities, as only the source data, source errors and golden data tables are stored.

Depending on the Entity Type (ID Matched, Fuzzy Matched), access to the master data differs:

Accessing Master Data Using SQL (ID Matched Entity)

With an ID Matched entity, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute (<primary_key_column> in the sample below).

Select the golden data corresponding to a given source record ID (ID Matched Entity)
 1   select M.B_SOURCEID, G.*
 2   from MD_<Physical_Table_Name> M
 3   inner join GD_<Physical_Table_Name> G on (
 4            G.<primary_key_column> = M.<primary_key_column>
 5     and    G.B_CLASSNAME in ( <classname_list> )
 6   )
 7   where
 8            M.B_CLASSNAME in ( <classname_list> )
 9     and    M.B_PUBID = '<publisher_code>'
 10    and    M.<primary_key_column> = '<searched_source_id>'

In this code, access is filtered with the class name for both the golden data and the master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #9) and ID of the source record (line #10).

Example: Select the golden data for the Employee source record having the EMPLOYEE_NUMBER=100 in the HR system (ID Matched Entity).
select M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME
from MD_EMPLOYEE M
inner join GD_EMPLOYEE G on (
         G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
  and    G.B_CLASSNAME = 'Employee'
)
where
         M.B_CLASSNAME = 'Employee'
  and    M.B_PUBID = 'HR'
  and    M.EMPLOYEE_NUMBER = '100'

The resulting information can be used to update the source record with golden data.

Accessing Master Data Using SQL (Fuzzy Matched Entity)

With Fuzzy Matched Entity, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:

  • B_PUBID (VARCHAR2(30 CHAR)) contains the code of the publisher that published this record.

  • B_SOURCEID (VARCHAR2(128 CHAR)) contains the ID of the source record in that publisher

As a consequence, the link between golden and master records is done using the primary key column as in an ID Matched Entity, but the link to the source is done using the B_SOURCEID column.

Select the golden data for a given source record ID (Fuzzy Matched Entity)
 1   select M.B_SOURCEID, G.*
 2   from MD_<entity> M
 3   inner join GD_<entity> G on (
 4            G.<primary_key_column> = M.<primary_key_column>
 5     and    G.B_CLASSNAME in ( <classname_list> )
 6   )
 7   where
 8            M.B_CLASSNAME in ( <classname_list> )
 9     and    M.B_PUBID = '<publisher_code>'
 10    and    M.B_SOURCEID = '<searched_source_id>'

In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #9) and ID of the source record (line #10), using the B_SOURCEID column.

Example: Select the golden values for the source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity).
select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME
from MD_CONTACT M
inner join GD_CONTACT G on (
         G.CONTACT_ID = M.CONTACT_ID
  and    G.B_CLASSNAME = 'Contact'
)
where
         M.B_CLASSNAME = 'Contact'
  and    M.B_PUBID = 'CRM'
  and    M.B_SOURCEID = '27030'
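The same cross-reference can be exercised end to end against a throwaway in-memory database. The schema below is a minimal stand-in for the real MD_CONTACT and GD_CONTACT tables, keeping only the columns used by the join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE GD_CONTACT (CONTACT_ID INT, B_CLASSNAME TEXT,
                         FIRST_NAME TEXT, LAST_NAME TEXT);
CREATE TABLE MD_CONTACT (CONTACT_ID INT, B_CLASSNAME TEXT,
                         B_PUBID TEXT, B_SOURCEID TEXT);
-- One golden contact, consolidated from two publishers.
INSERT INTO GD_CONTACT VALUES (6598, 'Contact', 'John', 'Doe');
INSERT INTO MD_CONTACT VALUES (6598, 'Contact', 'CRM', '27030');
INSERT INTO MD_CONTACT VALUES (6598, 'Contact', 'MKT', 'A-112');
""")

# Golden values for the source record 27030 published by CRM.
row = con.execute("""
    SELECT G.FIRST_NAME, G.LAST_NAME
    FROM MD_CONTACT M
    JOIN GD_CONTACT G
      ON G.CONTACT_ID = M.CONTACT_ID AND G.B_CLASSNAME = 'Contact'
    WHERE M.B_CLASSNAME = 'Contact'
      AND M.B_PUBID = 'CRM'
      AND M.B_SOURCEID = '27030'
""").fetchone()
print(row)  # ('John', 'Doe')
```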

Example: Select side by side the duplicates detected for a given source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity). In this example, the master data table is used twice (aliased as M and MM) to retrieve the two sides of a duplicate pair.

Example: Side-by-side duplicates
select
  M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
  MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
  G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIRST_NAME, G.LAST_NAME GOLD_LAST_NAME
from MD_CONTACT M
inner join GD_CONTACT G on (
         G.CONTACT_ID = M.CONTACT_ID
  and    G.B_CLASSNAME = 'Contact'
)
inner join MD_CONTACT MM on (
         MM.CONTACT_ID = M.CONTACT_ID
  and    MM.B_CLASSNAME = 'Contact'
)
where
         M.B_CLASSNAME = 'Contact'
  and    M.B_PUBID = 'CRM'       /* Publisher ID */
  and    M.B_SOURCEID = '27030'  /* Source ID */
  /* and M.B_PUBID = MM.B_PUBID */
  /* Uncomment the previous line to restrict the duplicates
     to those within the CRM application */

Consuming Errors

Accessing Pre-Consolidation Errors Using SQL

Pre-consolidation errors can be accessed via the Source Errors (SE, or AE for basic entities) tables.
These tables store the error information, that is, the constraints that caused the records to fail validation. This information is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

SE tables do not store the erroneous data itself, but they contain the identifiers of the source (SD or SA for basic entities) records in error.

Note that the B_ERROR_STATUS column on the SD table provides a simple way to detect the records with errors.

Example: Select the source errors for the Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER and FS_CUSTOMER.

Select the contact records in error
  select * from SD_CONTACT where B_ERROR_STATUS = 'ERROR';
Select the errors as well as the data for the source contacts in error
select SE.B_BATCHID, SD.B_LOADID,
  SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
  SE.B_PUBID, SE.B_SOURCEID,
  SD.FIRST_NAME, SD.LAST_NAME,
  SD.FP_CUSTOMER, SD.FS_CUSTOMER
from SE_CONTACT SE, SD_CONTACT SD
where
  SE.B_CLASSNAME = 'Contact'
  and SD.B_CLASSNAME = 'Contact'
  and SD.B_PUBID = SE.B_PUBID
  and SD.B_SOURCEID = SE.B_SOURCEID
  and SD.B_LOADID = SE.B_LOADID
  and SD.B_ERROR_STATUS = 'ERROR';

Accessing Post-Consolidation Errors Using SQL

Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the error information, that is, the constraints that caused the records to fail validation. This information is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

GE tables do not store the erroneous data itself, but they contain the identifiers of the golden (GD) records in error.

Note that the B_ERROR_STATUS column on the GD table provides a simple way to detect the records with errors.

Example: Select the errors and data for the golden contacts in error.

Select the golden contacts in error
select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';
Select the errors as well as the data for the golden contacts in error
select GE.B_BATCHID,
  GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
  GE.CONTACT_ID,
  GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER
from GE_CONTACT GE, GD_CONTACT GD
where
  GE.B_CLASSNAME = 'Contact'
  and GD.B_CLASSNAME = 'Contact'
  and GD.CONTACT_ID = GE.CONTACT_ID
  and GD.B_BATCHID = GE.B_BATCHID
  and GD.B_ERROR_STATUS = 'ERROR';

Consuming Data Using the REST API

The REST API provides programmatic access to read data in Semarchy xDM. The REST API identifies users using authentication methods supported by Semarchy xDM; responses are in JSON format.

Overview

The REST API is available at the following base URL (referred to as [base_url] in this section):

http://<host>:<port>/semarchy/api/rest/

Querying Records

Query Multiple Records

Method

GET

Base URL

URL

[base_url]/query/[data_location_name]/[entity_name]/[view_type]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to query

  • [entity_name]: Name of the entity to query

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value must be provided for the SDE view only.

Supported Parameters

Response Format

The response contains the list of records with the query expressions.

Multiple records query: sample response.
{
  "records": [
        {
          "CustomerName": "Gadgetron",
          "City": "Honolulu"
        },
        {
          "CustomerName": "StayPaft",
          "City": "Honolulu"
        }
  ]
}

Query a Single Record

Method

GET

Base URL

URL

[base_url]/query/[data_location_name]/[entity_name]/[view_type]/[record_id]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])/[record_id]

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to query

  • [entity_name]: Name of the entity to query

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value must be provided for the SDE view only.

  • [record_id]: ID of the record to query.

Supported Parameters

Response Format

The response contains the list of query expressions.

Single record query: sample response.
{
    "CustomerName": "Gadgetron",
    "City": "Honolulu",
    "UPPER(CustomerName)": "GADGETRON"
}

Counting Records

Method

GET

Base URL

URL

[base_url]/count/[data_location_name]/[entity_name]/[view_type]
[base_url]/count/[data_location_name]/[entity_name]/[view_type]([load_id])

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to count

  • [entity_name]: Name of the entity to count

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value must be provided for the SDE view only.

Supported Parameters

Response Format

The response contains the count of records.

Record count query: sample response.
{
  "recordCount": "42"
}

Named Queries

Named queries provide customized REST endpoints to consume data with a predefined structure and query parameters.

For more information about Named Queries, see the Creating Named Queries section in the Semarchy xDM Developer’s Guide.

Method

GET

Base URL

URL

[base_url]/named-query/[data_location_name]/[named_query]/[view_type]
[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([load_id])

URL Structure

The URL is configured using the following values:

  • [data_location_name]: Name of the data location to query

  • [named_query]: Name of the named query

  • [view_type]: Type of data view to query. See View Types for more information.

  • [load_id]: ID of the load containing the source records. This value must be provided for the SDE view only.

Supported Parameters

In addition, query parameters are passed to the query using their name. For example: ?param1=value1&param2=value2

Sorting applies before the sort expression defined in the named query itself. Pagination and filtering only apply to the root object of the named query.

Response Format

The response contains a hierarchy of objects corresponding to the named query definition.

Query Parameters

View Types

Certain URLs support a view type parameter. The following view types are available:

  • GD for golden records,

  • GE for errors on golden records (matching entities only), returned with the data from the erroneous record,

  • GDWE for golden records, filtered to only show those with errors (matching entities only),

  • MD for master records (matching entities only)

  • SD for source records from finished loads,

  • SE for errors on source records, returned with the data from the erroneous record,

  • SDWE for source records from finished loads, filtered to only show those with errors,

  • SDE for source records in a load. This view type must be followed by the ([load_id]) value. This view automatically resolves references to source/golden records (for basic entities) or source/master records (for matching entities), as performed when authoring records in the applications.

Pagination

When calls return a large number of records, it is recommended to page the record set. By requesting smaller subsets of data, you will get a response much faster than when requesting the entire, potentially large, record set.

Parameter Default Value Description

$offset

0

Defines the record offset for pagination.

$limit

100

Defines the maximum number of returned records.

Example: Request the Customers between position 200 and 299
/query/CustomerAndFinancialMDM/Customer/GD?$offset=200&$limit=100
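
A paging loop over such a record set might be sketched as follows. The `fetch_page` callable stands in for the actual HTTP GET with `$offset`/`$limit` parameters; the simulated backend is only there to make the sketch self-contained:

```python
def fetch_all(fetch_page, limit=100):
    """Collect a full record set page by page. fetch_page(offset, limit)
    stands in for a GET with $offset/$limit query parameters."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        records.extend(page)
        if len(page) < limit:   # short page: no more records
            return records
        offset += limit

# Simulated backend holding 250 records:
data = list(range(250))
result = fetch_all(lambda off, lim: data[off:off + lim], limit=100)
```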
Query Expressions

Query expressions define which values are returned by the query for each record.

Parameter Default Value Description

$baseexprs

USER_ATTRS

Defines the set of base attributes to include in the records, in addition to those specified using the $expr parameter. Possible values:

  • NONE: No attributes.

  • USER_ATTRS: All entity attributes, except the built-in attributes and references.

  • VIEW_ATTRS: All entity attributes, except references. Includes the built-in attributes.

$expr

N/A

Expression to include in the record in addition to, or to remove from, the base attributes ($baseexprs). You can pass as many expressions as you need as query parameters. These expressions may be one of the following:

  • [semql_expression]: A SemQL expression to add to the base attributes. The expression string is used as the alias for the value.

  • [alias]:[semql_expression]: A SemQL expression to add to the base attributes, preceded by its alias.

  • $exclude:[semql_expression]: A SemQL expression to remove from the base attributes.

Example: Query all Customer attributes, except the built-in ones. Add the Creator and CreationDate built-in attributes.
/query/CustomerAndFinancialMDM/Customer/GD?$baseexprs=USER_ATTRS&$expr=Creator&$expr=CreationDate
Example: Query only the CustomerName (in uppercase, aliased as 'Name') and TotalRevenue (aliased as 'Revenue').
/query/CustomerAndFinancialMDM/Customer/GD?$baseexprs=NONE&$expr=Name:Upper(CustomerName)&$expr=Revenue:TotalRevenue
Example: Query all Customer attributes, except the built-in ones and the TotalRevenue.
/query/CustomerAndFinancialMDM/Customer/GD?$baseexprs=USER_ATTRS&$expr=$exclude:TotalRevenue
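
Since the $expr parameter can be repeated, a query string with several expressions might be built from a list of (name, value) pairs so that duplicate names are preserved; here `safe="$"` keeps the `$` prefix literal:

```python
from urllib.parse import urlencode

# Repeated $expr parameters, passed as pairs so duplicates survive:
params = [
    ("$baseexprs", "NONE"),
    ("$expr", "Name:Upper(CustomerName)"),
    ("$expr", "Revenue:TotalRevenue"),
]
query_string = urlencode(params, safe="$")
print(query_string)
```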
Sorting

Sort expressions define the order of the records in the response.

Parameter Default Value Description

$orderby

None

SemQL Order By Clause defining the order of the records in the response.

Example: Query Customers sorted by TotalRevenue and CustomerName
/query/CustomerAndFinancialMDM/Customer/GD?$orderby=TotalRevenue%20DESC,CustomerName%20ASC
Filtering

Use filters to request only the records that match the criteria you choose.

Parameter Default Value Description

$f

SemQL filter. It is a SemQL condition applied to filter records. Only records matching this condition are returned. You can combine several SemQL filters in the same query.

Example: Query Contacts whose FirstName starts with 'Joe'
/query/CustomerAndFinancialMDM/Contact/GD?$f=FirstName%20LIKE%20%27Joe%25%27
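
The percent-encoding in the example above can be produced with a standard URL-quoting function; `safe=""` ensures every reserved character in the SemQL condition is encoded:

```python
from urllib.parse import quote

# URL-encode a SemQL filter for the $f parameter:
semql_filter = "FirstName LIKE 'Joe%'"
encoded = quote(semql_filter, safe="")
print(encoded)  # FirstName%20LIKE%20%27Joe%25%27
```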

Publishing Data Using the REST API

The REST API also provides programmatic management of data loads in Semarchy xDM, using JSON-formatted requests and responses.

Overview

The REST API provides capabilities to manage loads, including querying, creating, submitting, and canceling loads. It also supports persisting records in existing loads. Finally, it provides a shortcut to load and submit data in a single request.

Querying Existing Loads using REST

Method

GET

Base URL

URL

[base_url]/loads/[data_location_name]

Supported Parameters

  • $offset: Defines the results offset for pagination.

  • $limit: Defines the maximum number of returned results.

  • $batchId: Limits results to the loads with the given Batch ID.

  • $loadStatus: Limits results to the loads in a given status.

  • $loadType: Limits results to the loads of a given type.

  • $jobNamePattern: Limits results to the loads with a job name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.

  • $programNamePattern: Limits results to the loads created with a program name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.

  • $loadCreator: Limits results to the loads with this creator.

  • $batchSubmitter: Limits results to the loads with this submitter.

  • $loadDescriptionPattern: Limits results to the loads with a description matching the pattern. Use the _ and % wildcards to represent one or any number of characters.
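
A filtered query on this endpoint might be assembled as follows; the base URL and data location name are hypothetical:

```python
from urllib.parse import urlencode

# Query the last finished loads of a hypothetical data location:
base_url = "https://mdm.example.com/api/rest"
params = {"$loadStatus": "DONE", "$limit": 10}
url = f"{base_url}/loads/CustomerMDM?{urlencode(params, safe='$')}"
print(url)
```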

Response Format

The response contains the list of loads meeting the criteria. The information returned for each load depends on the status of the load.

Query Existing Loads: sample response.
{
  "loads": [
    {
      "loadId":55,
      "loadType": "INTEGRATION_LOAD",
      ...
    },
    {
      ...
    }
  ]
}

Load Status

The possible values for the Load Status are listed in the table below.

Load Status Description

RUNNING

The load is currently running.

CANCELED

The load was canceled (CancelLoad).

PENDING

The load was submitted. A batch was created and is waiting for the batch poller to pick it up.

SCHEDULED

The batch was picked up by the batch poller. The job is queued by the engine.

PROCESSING

The batch’s job is currently being processed by the engine.

SUSPENDED

The job is suspended, either by an administrator or due to an error, and awaits administrator intervention.

WARNING

The job completed successfully, but some records have caused validation errors.

DONE

The job completed successfully with no validation errors.

ERROR

The job did not complete successfully and was canceled by an administrator.
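
A client waiting for a load to finish might poll until one of the terminal statuses above is reached. The sketch below assumes a `get_status` callable that performs the GET on the load and reads its loadStatus field:

```python
import time

TERMINAL_STATUSES = {"DONE", "WARNING", "ERROR", "CANCELED"}

def wait_for_load(get_status, poll_interval=5.0, timeout=300.0):
    """Poll a load until it reaches a terminal status. get_status() stands
    in for a GET on /loads/[data_location_name]/[load_id] that reads the
    loadStatus field of the response."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("load did not reach a terminal status in time")
```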

Querying a Load by ID using REST

Method

GET

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Response Format

The response contains the load identified by [load_id]. The information returned for the load depends on its status.

Query one Load: sample response.
{
  "loadId": "55",
  "loadType": "INTEGRATION_LOAD",
  ...
}

Initializing a Load using REST

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]

Request Payload

The request contains the CREATE_LOAD action, as well as the information required to create a new load.

Load creation: sample request.
{
        "action":"CREATE_LOAD",
        "programName": "curl",
        "loadDescription": "Load Customers"
}

Response Format

The response contains the load information, including the load ID, load type, and an indication of the status.

Load creation: sample response.
{
  "loadId": "55",
  "loadType": "INTEGRATION_LOAD",
  "loadStatus": "RUNNING",
  ...
}

Loading Data using REST

To load data in a given load, the URL must contain the Load ID that was returned at load creation time.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains the PERSIST_DATA action, as well as the information required to load data. This information includes:

  • The persistOptions, which define how records are published into the hub. See Configuring Data Loading in REST for more information.

  • The persistRecords, which contain the records to be loaded into the hub.

Load data: sample request.
{
  "action":"PERSIST_DATA",
  "persistOptions": {
    "defaultPublisherId": "CRM",
    "optionsPerEntity": {
      "Customer": {
        "enrichers":[],
        "validations":[],
        "queryPotentialMatches": true
        },
      "Contact": {
        "enrichers":["AddressEnricher"],
        "validations":[
          {
          "validationType": "CHECK",
          "validationName": "ValidateAddressCompleteness"
          }
        ],
        "queryPotentialMatches": true
        }
      },
    "missingIdBehavior": "GENERATE",
    "persistMode": "IF_NO_ERROR_OR_MATCH"
    },
  "persistRecords": {
    "Customer": [
      {
        "Name": "Gadgetron"
      }
    ]
  }
}

Response Format

The response contains, in the records element, the enriched records, the failedValidations, and the potentialMatches (if any).
The response also contains the records' status as well as the load information.

Load submission: sample response.
{
  "records": [
      {
        "entityName": "Customer",
        "recordValues": {
            "PublisherID": "CRM",
            "SourceID": "4",
            "CustomerName": "GADGETRON"
        },
        "failedValidations": [
            {
              "validationType": "CHECK",
              "validationName": "ValidateAddressCompleteness"
            }
        ],
        "potentialMatches": [
            {
              "matchRuleName": "MatchByName",
              "matchScore": "80",
              "matchedRecordLocation": "GD",
              "matchedRecordId": {
                  "SourceID": "45",
                  "PublisherID": "HR"
              }
            }
        ]
      },
      ...
  ],
  "status": "PERSIST_CANCELLED", (1)
  "load": { (2)
      "loadId": "55",
      "loadType": "INTEGRATION_LOAD",
      ...
  }
}
1 The status is PERSISTED if the data was persisted, or PERSIST_CANCELLED if the data has not been persisted, for example if validations failed or if a match was found.
2 Load information, similar to the information returned when Querying a Load by ID using REST.

Configuring Data Loading in REST

Persist Options

When loading one or more records you can configure the following elements in the persistOptions element:

  • For each entity:

    • enrichers: List of enrichers that should be executed before persisting the records.

    • validations: List of validations that should be executed after the enrichers.

    • queryPotentialMatches: Detect and report potential matches, using the Matcher defined for the entity.

  • For the entire load:

    • missingIdBehavior: Option to define whether to generate IDs when they are not provided in the payload. Possible values are GENERATE to generate the ID or FAIL to fail loading if the ID is missing.

    • persistMode: Flag that defines whether the records should be persisted. See Data Loading Behavior for the possible values.

Enrich, Validate and Detect Matches

When loading data, you can choose to execute enrichers, validations, and matchers for each entity.

You must create, for each entity you want to configure, under the optionsPerEntity element, one element named after the entity. For each entity, you can:

  • give a list of enrichers to run, with their enricher names,

  • give a list of validations to execute, with their validationType and validationName. Possible validationType values are CHECK, PLUGIN, MANDATORY, LOV, FOREIGN or UNIQUE,

  • indicate with the queryPotentialMatches boolean element whether the platform should check for duplicates according to the matching rules defined for the entity.

Data Loading Behavior

When invoked with a payload, the REST operation will run the enrichers, validations and matchers for each record, depending on the entity configuration.

It will then return:

  • the enriched data,

  • a list of validation errors, if any,

  • a list of potential matches detected by the matching rules.

The record may be persisted or not at that stage, depending on the persistMode option:

  • If set to ALWAYS, then the records are persisted even if they have errors or potential matches.

  • If set to NEVER, then the records are not persisted. Use this option to perform a dry-run to test your records.

  • If set to IF_NO_ERROR_OR_MATCH (default value), then the records are persisted if no validation error was raised and no potential match was found.
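
A dry-run payload might therefore look like the sketch below: persistMode set to NEVER runs the enrichers, validations, and match detection without persisting anything. The publisher, entity, and record values are illustrative:

```python
import json

# Dry-run sketch: test the records, persist nothing.
payload = {
    "action": "PERSIST_DATA",
    "persistOptions": {
        "defaultPublisherId": "CRM",
        "optionsPerEntity": {
            "Customer": {
                "enrichers": [],
                "validations": [],
                "queryPotentialMatches": True,
            }
        },
        "missingIdBehavior": "GENERATE",
        "persistMode": "NEVER",  # dry-run: never persist
    },
    "persistRecords": {"Customer": [{"Name": "Gadgetron"}]},
}
body = json.dumps(payload)
```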

Submitting a Load using REST

To submit a load, the URL must contain the Load ID that was returned at load creation time.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains the SUBMIT action, as well as the job to use when submitting the data.

Load submission: sample request.
{
  "action": "SUBMIT",
  "jobName": "INTEGRATE_ALL"
}

Response Format

The response contains the load information, including the load ID, batch ID, and an indication of the status.

Load submission: sample response.
{
  "loadId": "55",
  "loadType": "INTEGRATION_LOAD",
  "batchId": "56",
  "loadStatus": "PENDING",
  ...
}
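
Putting the create, persist, and submit steps together, the three calls of a load lifecycle might be sketched as request triples. In a real client, the load ID comes from the CREATE_LOAD response before the next two calls are issued; all names and URLs here are placeholders:

```python
def load_lifecycle_requests(base_url, data_location, load_id, job_name, records):
    """Sketch of the three calls in a load lifecycle: create the load,
    persist records into it, then submit it with a job."""
    loads_url = f"{base_url}/loads/{data_location}"
    load_url = f"{loads_url}/{load_id}"
    return [
        ("POST", loads_url, {"action": "CREATE_LOAD",
                             "programName": "python-client",
                             "loadDescription": "Load Customers"}),
        ("POST", load_url, {"action": "PERSIST_DATA",
                            "persistOptions": {"defaultPublisherId": "CRM"},
                            "persistRecords": records}),
        ("POST", load_url, {"action": "SUBMIT", "jobName": job_name}),
    ]
```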

Canceling a Load using REST

To cancel a load, the URL must contain the Load ID that was returned at load creation time.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains only the CANCEL action.

Load cancellation: sample request.
{
  "action": "CANCEL"
}

Response Format

The response contains the load ID as well as an indication of the status.

Load cancellation: sample response.
{
  "loadId": "55",
  "loadType": "INTEGRATION_LOAD",
  "loadStatus": "CANCELED",
  ...
}

Load and Submit Data using REST

Using the REST API, it is possible to create a load, load data and submit the load in a single request.

Method

POST

Base URL

URL

[base_url]/loads/[data_location_name]

Request Payload

The request contains the CREATE_LOAD_AND_SUBMIT action, as well as the information required to create a new load, load data, and submit the load. This includes the persistOptions and persistRecords elements.

Load and Submit: sample request.
{
  "action": "CREATE_LOAD_AND_SUBMIT",
  "programName": "curl",
  "loadDescription": "Customer Load",
  "jobName": "INTEGRATE_DATA",
  "persistOptions": (1)
  "persistRecords": (2)
}
1 See Configuring Data Loading in REST for more information about the persistOptions.
2 See Loading Data using REST for more information about the persistRecords.

Response Format

The response contains the load ID as well as an indication of the status.

Load and Submit: sample response.
{
    "status":  (1)
    "records": (2)
    "load":    (3)
}
1 PERSISTED if the data was persisted, or PERSIST_CANCELLED if the data has not been persisted, for example if a match was found.
2 See Loading Data using REST for more information about this payload.
3 Load information, similar to the information returned when Querying a Load by ID using REST.
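
A complete one-shot payload might look like the sketch below; the job, publisher, and entity names are illustrative:

```python
import json

# One-shot sketch: create a load, persist records, and submit in a
# single request.
payload = {
    "action": "CREATE_LOAD_AND_SUBMIT",
    "programName": "python-client",
    "loadDescription": "Customer Load",
    "jobName": "INTEGRATE_DATA",
    "persistOptions": {
        "defaultPublisherId": "CRM",
        "missingIdBehavior": "GENERATE",
        "persistMode": "IF_NO_ERROR_OR_MATCH",
    },
    "persistRecords": {"Customer": [{"Name": "Gadgetron"}]},
}
body = json.dumps(payload)
```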