This tutorial teaches you how to query data in Semarchy xDM using the SQL API. You will learn basic and advanced queries as well as important integration concepts.

This tutorial is based on PostgreSQL and pgAdmin 4. You can run the same queries on SQL Developer for Oracle or SQL Server Management Studio for SQL Server, or use a universal client such as DBeaver.

What you'll learn

Before you start

This tutorial is the second SQL-based unit within the Integration track. Before following this unit, you must:

The estimated duration of this unit is about 1 hour.

Knowing how to query data using SQL is a critical step to build a successful MDM program.

While you ordinarily would rely on a middleware tool or a user-designed program to consume data from xDM, it is still important for you, as the integration developer, to understand the consumption methods described in this tutorial unit, starting with simple queries.

What you'll learn

Get all golden customer records

The most basic query selects Person records from the golden data (GD) table. You will also select specific columns which is a more useful query for analyzing data.

  1. Run the following query in pgAdmin (or another SQL client) to select all columns from the golden person data (GD_PERSON) table:
select
    gd.*
from
    gd_person gd;

These are all the golden customer records, with all available columns.

  1. Run the following query to select only the golden ID, first name, and last name columns:
select
    gd.id,
    gd.first_name,
    gd.last_name
from
    gd_person gd;

These are all the golden customer records with only the golden ID, first name, and last name columns.


Get master and golden records side-by-side

When looking at matched customer records, it is useful to analyze groups of master records alongside the resulting golden record, in order to compare how the master records were de-duplicated. To get this view, you must join the master data (MD) table with the golden data (GD) table:

  1. Run the following query to select the Publisher ID, Source ID,first name and last name from the master records and the golden ID, first name, and last name from the golden records.
select
    md.b_pubid publisherid,
    md.b_sourceid sourceid,
    md.first_name masterfirstname,
    md.last_name masterlastname,
    gd.id goldenid,
    gd.first_name goldenfirstname,
    gd.last_name goldenlastname
from
    md_person md
    inner join gd_person gd on gd.id = md.id
order by gd.id asc;

The join is on the golden ID, which appears on both the master and the golden tables.

  1. Observe the differences between master (in blue) versus golden (in yellow) data.

Have a closer look at customer Jass Ellerbusch (golden ID 10017):

In the next section, you will get a golden record given the source ID.


Query golden records with a given source ID

Sometimes, you may want to look up a golden record but all you have is an ID from the source system. To query the golden records, when you know their source ID, proceed as follows:

  1. Run the following query to select the golden record related to a source record from the CRM system having source ID 1483060:
select
    md.b_pubid publisherid,
    md.b_sourceid sourceid,
    md.first_name masterfirstname,
    md.last_name masterlastname,
    gd.id goldenid,
    gd.first_name goldenfirstname,
    gd.last_name goldenlastname
from
    md_person md
    inner join gd_person gd on (
        gd.id = md.id
    )
where
        md.b_pubid = 'CRM'
    and
        md.b_sourceid = '1483060'; /* SourceID */
  1. Have a look at the result of the query:

Congratulations

You have successfully run basic queries via SQL to get information in xDM.

What we've covered

In the next part, you will execute advanced queries to retrieve information, such as customer records and the products they purchased.

Moving forward from basic queries using SQL, you will learn in this step how to use advanced queries to broaden the analyses you are able to do.

What you'll learn

Query customers and their products

You can query customers and see the products they have purchased. This is useful for analytics and BI tools to answer questions like "which customers have purchased a certain product?" or "who are our top-paying customers?".

  1. Run the following query to select the customer's golden ID, first name, last name, and the products each customer has bought.
select
    pe.id,
    pe.first_name,
    pe.last_name,
    pr.product_name
from gd_person pe
    inner join gd_person_product pp on pe.id = pp.f_person
    inner join gd_product pr on pr.id = pp.f_product
order by pe.first_name desc;
  1. Have a look at results:

In the next section, you will query master records to see the corresponding duplicate masters.


Side-by-side duplicates

In this step, you will learn to query a specific master record and see the corresponding master records that are considered duplicates side-by-side. The query will also show the corresponding golden record that the master records consolidate into:

  1. Run the following query:
select
   md.b_pubid dup1_publisher,
   md.b_sourceid dup1_id,
   md.first_name dup1_first_name,
   md.last_name dup1_last_name,
   md2.b_pubid dup2_publisher,
   md2.b_sourceid dup2_id,
   md2.first_name dup2_first_name,
   md2.last_name dup2_last_name,
   gd.id gold_id,
   gd.first_name golden_first_name,
   gd.last_name golden_last_name
from md_person md
   inner join gd_person gd on (
           gd.id = md.id
   )
   inner join md_person md2 on (
           md.id = md2.id
   )
where   md.b_pubid = 'CRM'           /* Publisher ID */
   and md.b_sourceid = '1419728'    /* Source ID */
/* and md.b_pubid = md2.b_pubid */   /* uncomment this line to restrict the duplicates to those within the CRM application */;
  1. Note that the dup2_publisher and dup2_id columns show the master records that matched with the record of Antonia Mattos with ID 1419728 from CRM.

Congratulations

You have successfully run advanced queries via SQL to get information in xDM.


What we've covered

In the next part, you will learn how integration works to understand the concepts to allow you to write your own queries.

Now you have explored a data location with several SQL queries, it is time to learn more about some integration concepts.

What you'll learn

Publishers

When you load data into xDM, we refer to the source applications where the data came from as publishers.

An application that publishes data into xDM is assigned a publisher code in the Application Builder. This code is known as the Publisher ID and is loaded into the column b_pubid.

The publisher ID is necessary for loading data. It is also useful when you query data to trace which applications the master data came from. xDM displays the publisher ID in the master record's ID in the Customer B2C application.

Types of records

Semarchy xDM deals with several types of records that corresponds to different steps of the integration process:

The transformation process from source records to golden data within Semarchy xDM is called the Data Certification process, or Integration process.

Focus on matching and survivorship

Matching is a key step within the Data Certification process. Its role is to detect duplicates to consolidate them into a golden record. There are 2 different types of matching:

When records are matched together, survivorship rules select for a given attribute the most appropriate value from the various source records.

Entity types

Each entity you design in Semarchy Application Builder has a given entity type that defines the entity capabilities for match/merge and authoring. Entity types are:

In the next steps, we will have a more detailed look at each type.

If you have two records with the same ID in different systems, duplicate data within the same system (common with customer data), or no common identifier across systems, you will need to match records and consolidate them to become a "best-of-breed" golden record.

Fuzzy Matched Entity

The Fuzzy Matched Entity is designed to handle the scenario where there is no common identifier across systems. A golden ID will then be generated when records are consolidated. As such, you will define match and survivorship rules with a Fuzzy Matched Entity.

Let's now look at how primary keys work in Fuzzy Matched Entities.

Fuzzy primary keys

The primary key is very useful for filtering data in your queries, as you learned when querying golden records based on a given source ID.

In practice

Let's look at an example of a Fuzzy Matched Entity:

  1. Navigate to the Customers view under Browse Data. Customer data is modeled using a Fuzzy Matched Entity.
  2. Look for the Person record with the name Callie Aldaba.
  3. Click on this customer to see more information. Note the value of the golden ID (in our example, 10039):

  1. Navigate to the MASTER RECORDS tab.
  2. Note the two master records that contributed to this customer's golden records: "CRM.1386237" and "MKT.1438732". We will retrieve these values in the database later on.

Let's know switch to your SQL client and have a look at the database:

  1. Start by querying the GD_PERSON table:
select
    gd.*
from
    gd_person gd;

The golden data customer table (GD_PERSON) contains the golden ID (ID column).
b_pubid and b_sourceid are the publisher ID and Source ID which won the survivorship process.

  1. Let's carry on our exploration by scrolling right.

b_mastercount is the number of Master records underlying the golden record.
b_confscore is the confidence score of the matching and b_confirmationstatus indicates whether master records have been confirmed within a golden record or not.

  1. Scroll to the right again:

These are the golden data fields of the values that have won the survivorship process.

  1. Let's query the master table MD_PERSON:
select
    md.*
from
    md_person md;

Observe that the master data table stores the Publisher and Source IDs as primary key.

  1. Scroll to the right until you find the ID column: this column provides the golden ID associated with the master record.

Also note that the MD table stores information about the matching process, such as match group (b_matchgroup), confirmation status (b_confirmationstatus), confidence score (b_confscore) and suggested ID for merge (b_suggmergeid):

  1. Now run the following query to retrieve all master records related to the golden customer Callie Aldaba - substitute /* golden ID for Callie Aldaba */ with the ID you retrieved in the Customer B2C application (10039 in our example):
select
   gd.id golden_id,
   gd.first_name golden_first_name,
   gd.last_name golden_last_name,
   md.b_pubid md_pub_id,
   md.b_sourceid md_sourceid,
   md.first_name md_first_name,
   md.last_name md_last_name
from md_person md
   inner join gd_person gd on (
           gd.id = md.id
   )
where gd.id=10039  /* Golden ID */

This query returns the same master records as the ones observed using the user interface:

Fuzzy-matching foreign key

Because Fuzzy Matched Entities separate the publisher and source IDs in their primary keys, the references to fuzzy entities, likewise, have foreign keys that separate publisher and source when it comes to source data.

Let's take the example of the SD_COMM_CHAN_PREF table : the foreign key to Person source data in the SD_COMM_CHAN_PREF table is composed of 2 columns:

For golden data, the foreign key references the xDM system-generated ID and therefore uses a single column for foreign key. For example in SD_COMM_CHAN_PREF, F_PERSON references the Customer golden ID.

To illustrate that, let's start by reviewing references among entities. To do so, open your model diagram.

Now let's compare it to how the references are created in the database foreign keys for Communication Channel Preferences. To do so look at the table GD_COMM_CHAN_PREF in pgAdmin.

In the GD_COMM_CHAN_PREF table scroll to the far right.

The F_PERSON foreign key is the parent Person golden ID. In the same way, FP_PERSON is the parent publisher ID and FS_PERSON is the parent source ID.

Now let's do the same with GD_PERSON_PRODUCT.

Remember that Person Product is a child to fuzzy-matching Person and Basic Entity Product.

You can see that it contains both Product and Person Foreign keys.

Finally, notice that the primary key for Person Product is concatenated from the columns FP_PRODUCT, FP_PERSON, and FS_PERSON.

ID Matched entities

ID Matched entities assume that data comes from several applications that share a common ID.

Records in entities using ID Matching are matched if they have the same ID and then merged into golden records. For such entities, the golden ID is the common source ID (no ID is generated).

This entity type is well suited when there is a true unique identifier for all the applications communicating with the MDM hub. Learn more about ID Matching entities in the Semarchy documentation.

In the next step, you will learn about Basic entities.

Definition

In contrast to Fuzzy Matched Entities, Basic Entities are designed to handle data coming from a unique data source and thus do not support match and merge. This is suitable for simple reference data entities or when data is authored exclusively in the hub. In the latter case, xDM serves as a source system for authoring data.

Querying data in a Basic Entity is simpler than in a Fuzzy Matched Entity because the ID is the same across source and golden (SD and GD) tables. Also, Basic Entities do not have the concept of a Publisher or MD/MI tables. Therefore, you can use the same identifier in the source system to query Basic Entity records in the SD and GD tables and bypass joining with the master data (MD) table.

  1. In order to learn about Basic Entities, let's go to SA_NICKNAME.

  1. Now go to sa_product, another Basic Entity.

SA_PRODUCT uses the default ID physical name for its primary key. The product ID in this table remains the same as in the golden data GD_PRODUCT table.

  1. Let's now see how the Basic Entity primary key is configured in Semarchy xDM Application Builder:

With Physical Column Name you can control the name of the primary key in the database. Manual ID generation means you must load the ID and do not rely on xDM to generate it.

Notice that entity type (Basic, Fuzzy Matched, ID matched) can be selected on this page and that Product type is a Basic entity.

The column name of Primary Key for the Product entity is ID (default name).

Also observe that the ID can be entered manually or generated using a sequence, a Universal Unique Identifier (UUID) or a custom rule via a SemQL expression. The method for ID generation is selected when entity is created.

In practice

Let's look at an example of a Basic Entity:

  1. Navigate to the Products view under Browse Data. Product data is modeled using a Basic Entity.
  2. Look for the product record with the name "Carbonite Helmet".
  3. Click on this product to see more information on the PRODUCT tab.

Note the product record's ID "CARBON4861OK".

  1. Now switch to your SQL client and execute the following query:
select
   gd.id golden_id,
   gd.product_name golden_product_name,
   gd.description golden_desc,
   sa.id source_id,
   sa.product_name sa_product_name
from sa_product sa
   inner join gd_product gd on (
           gd.id = sa.id
   )
where gd.id='CARBON4861OK'  /* Golden ID */

Observe that golden_id and source_id are the same.

The demo data assumes that product IDs are consistent across all systems and therefore is suitable for a Basic Entity. This same ID flows from the source system to the Source Authoring (SA) table and eventually to the Golden Data (GD) table.

Basic foreign keys

In contrast to Fuzzy Matched Entities, Basic Entities do not separate the publisher and source IDs neither at the source nor golden level. Therefore, foreign keys that reference a Basic Entity use that parent entity's original foreign key.

For example, the SA_PERSON_PRODUCT table references both a Fuzzy Matched Entity (Customer) and Basic Entity (Product):


Congratulations

You have successfully learned the basics about entity types in Semarchy xDM.

Here is a cheat sheet to remind you of the differences between fuzzy-matching, ID-matched, and Basic Entities:

It will be handy during development time when you are building your model to understand which entity type to use. Similarly, this cheat sheet will be useful when you query data from the Oracle or PostgreSQL databases to remember the tables available to you and how to use the primary keys.

In the next and last step, you will learn to query errors that were raised during the integration process.

Two types of errors can be raised during the integration process:

The SE, AE, and GE tables serve as error queues and inform you why records failed integration. xDM saves the name of the constraint in b_constraintname and the type of the constraint in the b_constrainttype columns.

Get all customer errors

You can view pre-consolidation errors via the source error (SE) table.

  1. Run this first Query.
select  *
from    se_person;
  1. Have a look at results:

  1. Run this second Query and compare the SE_PERSON view with the SD_PERSON..
select  *
from    sd_person
where   b_error_status = 'ERROR';
  1. Notice that this second query shows you that the SD table has a b_error_status column which provides a simple way to detect records with errors.

The SD_PERSON table flags records as errors and provides customers information to guide troubleshooting. But there is no information explaining why the records are in the error queue..

In the next section, you will learn how to join the SE and SD tables so you can see the data alongside the constraint violation.


Source data and errors

In this step, you will see a useful view when you join the SE and SD tables.

  1. Run this Query and join SD_PERSON and SE_PERSON tables to view the reason for errors and the customer information required for troubleshooting:
select
   se.b_batchid,
   sd.b_loadid,
   se.b_constraintname,
   se.b_constrainttype,
   se.b_pubid,
   se.b_sourceid,
   sd.first_name,
   sd.last_name,
   sd.source_email
from se_person se
inner join sd_person sd on (
       sd.b_pubid = se.b_pubid
   and sd.b_sourceid = se.b_sourceid
   and sd.b_loadid = se.b_loadid
)
where   sd.b_error_status = 'ERROR';
  1. You can now see the record errors with the record's data values (coming from the SD table) alongside the constraint violation information (coming from the SE table):

The records are in the error queue because of the email. Note the source_email column is indeed missing emails. A validation rule in the Customer B2C model requires all customers to have an email address, hence why these records were flagged as errors.

You also obtain Business-relevant fields to aid troubleshooting.


Congratulations

You successfully queried xDM to check for errors and their causes using the SQL API.

You are now familiar with different types of queries for xDM that will prove to be useful in work using the SQL API.

What we've covered

What's next ?

In the next part of the Integration tutorial track, Load Data via SQL, you will learn how to load data in xDM via the SQL API. You will learn basic and advanced loading queries as well as more integration concepts, such as how the integration process works.