This tutorial teaches you how to query data in Semarchy xDM using the SQL API. You will learn basic and advanced queries as well as important integration concepts.
This tutorial is based on PostgreSQL and pgAdmin 4. You can run the same queries on SQL Developer for Oracle or SQL Server Management Studio for SQL Server, or use a universal client such as DBeaver.
This tutorial is the second SQL-based unit within the Integration track. Before following this unit, you must:
The estimated duration of this unit is about 1 hour.
Knowing how to query data using SQL is a critical step to build a successful MDM program.
While you ordinarily would rely on a middleware tool or a user-designed program to consume data from xDM, it is still important for you, as the integration developer, to understand the consumption methods described in this tutorial unit, starting with simple queries.
The most basic query selects Person records from the golden data (GD) table. You will also select specific columns which is a more useful query for analyzing data.
select gd.* from gd_person gd;
These are all the golden customer records, with all available columns.
select gd.id, gd.first_name, gd.last_name from gd_person gd;
These are all the golden customer records with only the golden ID, first name, and last name columns.
When looking at matched customer records, it is useful to analyze groups of master records alongside the resulting golden record, in order to compare how the master records were de-duplicated. To get this view, you must join the master data (MD) table with the golden data (GD) table:
select md.b_pubid publisherid, md.b_sourceid sourceid, md.first_name masterfirstname, md.last_name masterlastname, gd.id goldenid, gd.first_name goldenfirstname, gd.last_name goldenlastname from md_person md inner join gd_person gd on gd.id = md.id order by gd.id asc;
The join is on the golden ID, which appears on both the master and the golden tables.
Have a closer look at customer Jass Ellerbusch (golden ID 10017):
In the next section, you will get a golden record given the source ID.
Sometimes, you may want to look up a golden record but all you have is an ID from the source system. To query the golden records, when you know their source ID, proceed as follows:
select md.b_pubid publisherid, md.b_sourceid sourceid, md.first_name masterfirstname, md.last_name masterlastname, gd.id goldenid, gd.first_name goldenfirstname, gd.last_name goldenlastname from md_person md inner join gd_person gd on ( gd.id = md.id ) where md.b_pubid = 'CRM' and md.b_sourceid = '1483060'; /* SourceID */
You have successfully run basic queries via SQL to get information in xDM.
In the next part, you will execute advanced queries to retrieve information, such as customer records and the products they purchased.
Moving forward from basic queries using SQL, you will learn in this step how to use advanced queries to broaden the analyses you are able to do.
You can query customers and see the products they have purchased. This is useful for analytics and BI tools to answer questions like "which customers have purchased a certain product?" or "who are our top-paying customers?".
select pe.id, pe.first_name, pe.last_name, pr.product_name from gd_person pe inner join gd_person_product pp on pe.id = pp.f_person inner join gd_product pr on pr.id = pp.f_product order by pe.first_name desc;
In the next section, you will query master records to see the corresponding duplicate masters.
In this step, you will learn to query a specific master record and see the corresponding master records that are considered duplicates side-by-side. The query will also show the corresponding golden record that the master records consolidate into:
select md.b_pubid dup1_publisher, md.b_sourceid dup1_id, md.first_name dup1_first_name, md.last_name dup1_last_name, md2.b_pubid dup2_publisher, md2.b_sourceid dup2_id, md2.first_name dup2_first_name, md2.last_name dup2_last_name, gd.id gold_id, gd.first_name golden_first_name, gd.last_name golden_last_name from md_person md inner join gd_person gd on ( gd.id = md.id ) inner join md_person md2 on ( md.id = md2.id ) where md.b_pubid = 'CRM' /* Publisher ID */ and md.b_sourceid = '1419728' /* Source ID */ /* and md.b_pubid = md2.b_pubid */ /* uncomment this line to restrict the duplicates to those within the CRM application */;
1419728from the CRM system.
You have successfully run advanced queries via SQL to get information in xDM.
In the next part, you will learn how integration works to understand the concepts to allow you to write your own queries.
Now you have explored a data location with several SQL queries, it is time to learn more about some integration concepts.
When you load data into xDM, we refer to the source applications where the data came from as publishers.
An application that publishes data into xDM is assigned a publisher code in the Application Builder. This code is known as the Publisher ID and is loaded into the column b_pubid.
The publisher ID is necessary for loading data. It is also useful when you query data to trace which applications the master data came from. xDM displays the publisher ID in the master record's ID in the Customer B2C application.
Semarchy xDM deals with several types of records that corresponds to different steps of the integration process:
The transformation process from source records to golden data within Semarchy xDM is called the Data Certification process, or Integration process.
Matching is a key step within the Data Certification process. Its role is to detect duplicates to consolidate them into a golden record. There are 2 different types of matching:
When records are matched together, survivorship rules select for a given attribute the most appropriate value from the various source records.
Each entity you design in Semarchy Application Builder has a given entity type that defines the entity capabilities for match/merge and authoring. Entity types are:
In the next steps, we will have a more detailed look at each type.
If you have two records with the same ID in different systems, duplicate data within the same system (common with customer data), or no common identifier across systems, you will need to match records and consolidate them to become a "best-of-breed" golden record.
The Fuzzy Matched Entity is designed to handle the scenario where there is no common identifier across systems. A golden ID will then be generated when records are consolidated. As such, you will define match and survivorship rules with a Fuzzy Matched Entity.
Let's now look at how primary keys work in Fuzzy Matched Entities.
The primary key is very useful for filtering data in your queries, as you learned when querying golden records based on a given source ID.
Let's look at an example of a Fuzzy Matched Entity:
Let's know switch to your SQL client and have a look at the database:
select gd.* from gd_person gd;
The golden data customer table (GD_PERSON) contains the golden ID (ID column).
b_pubid and b_sourceid are the publisher ID and Source ID which won the survivorship process.
b_mastercount is the number of Master records underlying the golden record.
b_confscore is the confidence score of the matching and b_confirmationstatus indicates whether master records have been confirmed within a golden record or not.
These are the golden data fields of the values that have won the survivorship process.
select md.* from md_person md;
Observe that the master data table stores the Publisher and Source IDs as primary key.
Also note that the MD table stores information about the matching process, such as match group (b_matchgroup), confirmation status (b_confirmationstatus), confidence score (b_confscore) and suggested ID for merge (b_suggmergeid):
/* golden ID for Callie Aldaba */with the ID you retrieved in the Customer B2C application (10039 in our example):
select gd.id golden_id, gd.first_name golden_first_name, gd.last_name golden_last_name, md.b_pubid md_pub_id, md.b_sourceid md_sourceid, md.first_name md_first_name, md.last_name md_last_name from md_person md inner join gd_person gd on ( gd.id = md.id ) where gd.id=10039 /* Golden ID */
This query returns the same master records as the ones observed using the user interface:
Because Fuzzy Matched Entities separate the publisher and source IDs in their primary keys, the references to fuzzy entities, likewise, have foreign keys that separate publisher and source when it comes to source data.
Let's take the example of the SD_COMM_CHAN_PREF table : the foreign key to Person source data in the SD_COMM_CHAN_PREF table is composed of 2 columns:
For golden data, the foreign key references the xDM system-generated ID and therefore uses a single column for foreign key. For example in SD_COMM_CHAN_PREF, F_PERSON references the Customer golden ID.
To illustrate that, let's start by reviewing references among entities. To do so, open your model diagram.
Now let's compare it to how the references are created in the database foreign keys for Communication Channel Preferences. To do so look at the table GD_COMM_CHAN_PREF in pgAdmin.
In the GD_COMM_CHAN_PREF table scroll to the far right.
The F_PERSON foreign key is the parent Person golden ID. In the same way, FP_PERSON is the parent publisher ID and FS_PERSON is the parent source ID.
Now let's do the same with GD_PERSON_PRODUCT.
Remember that Person Product is a child to fuzzy-matching Person and Basic Entity Product.
You can see that it contains both Product and Person Foreign keys.
Finally, notice that the primary key for Person Product is concatenated from the columns FP_PRODUCT, FP_PERSON, and FS_PERSON.
ID Matched entities assume that data comes from several applications that share a common ID.
Records in entities using ID Matching are matched if they have the same ID and then merged into golden records. For such entities, the golden ID is the common source ID (no ID is generated).
This entity type is well suited when there is a true unique identifier for all the applications communicating with the MDM hub. Learn more about ID Matching entities in the Semarchy documentation.
In the next step, you will learn about Basic entities.
In contrast to Fuzzy Matched Entities, Basic Entities are designed to handle data coming from a unique data source and thus do not support match and merge. This is suitable for simple reference data entities or when data is authored exclusively in the hub. In the latter case, xDM serves as a source system for authoring data.
Querying data in a Basic Entity is simpler than in a Fuzzy Matched Entity because the ID is the same across source and golden (SD and GD) tables. Also, Basic Entities do not have the concept of a Publisher or MD/MI tables. Therefore, you can use the same identifier in the source system to query Basic Entity records in the SD and GD tables and bypass joining with the master data (MD) table.
SA_PRODUCT uses the default ID physical name for its primary key. The product ID in this table remains the same as in the golden data GD_PRODUCT table.
With Physical Column Name you can control the name of the primary key in the database. Manual ID generation means you must load the ID and do not rely on xDM to generate it.
Notice that entity type (Basic, Fuzzy Matched, ID matched) can be selected on this page and that Product type is a Basic entity.
The column name of Primary Key for the Product entity is ID (default name).
Also observe that the ID can be entered manually or generated using a sequence, a Universal Unique Identifier (UUID) or a custom rule via a SemQL expression. The method for ID generation is selected when entity is created.
Let's look at an example of a Basic Entity:
Note the product record's ID "CARBON4861OK".
select gd.id golden_id, gd.product_name golden_product_name, gd.description golden_desc, sa.id source_id, sa.product_name sa_product_name from sa_product sa inner join gd_product gd on ( gd.id = sa.id ) where gd.id='CARBON4861OK' /* Golden ID */
Observe that golden_id and source_id are the same.
The demo data assumes that product IDs are consistent across all systems and therefore is suitable for a Basic Entity. This same ID flows from the source system to the Source Authoring (SA) table and eventually to the Golden Data (GD) table.
In contrast to Fuzzy Matched Entities, Basic Entities do not separate the publisher and source IDs neither at the source nor golden level. Therefore, foreign keys that reference a Basic Entity use that parent entity's original foreign key.
For example, the SA_PERSON_PRODUCT table references both a Fuzzy Matched Entity (Customer) and Basic Entity (Product):
You have successfully learned the basics about entity types in Semarchy xDM.
Here is a cheat sheet to remind you of the differences between fuzzy-matching, ID-matched, and Basic Entities:
It will be handy during development time when you are building your model to understand which entity type to use. Similarly, this cheat sheet will be useful when you query data from the Oracle or PostgreSQL databases to remember the tables available to you and how to use the primary keys.
In the next and last step, you will learn to query errors that were raised during the integration process.
Two types of errors can be raised during the integration process:
The SE, AE, and GE tables serve as error queues and inform you why records failed integration. xDM saves the name of the constraint in b_constraintname and the type of the constraint in the b_constrainttype columns.
You can view pre-consolidation errors via the source error (SE) table.
select * from se_person;
select * from sd_person where b_error_status = 'ERROR';
The SD_PERSON table flags records as errors and provides customers information to guide troubleshooting. But there is no information explaining why the records are in the error queue..
In the next section, you will learn how to join the SE and SD tables so you can see the data alongside the constraint violation.
In this step, you will see a useful view when you join the SE and SD tables.
select se.b_batchid, sd.b_loadid, se.b_constraintname, se.b_constrainttype, se.b_pubid, se.b_sourceid, sd.first_name, sd.last_name, sd.source_email from se_person se inner join sd_person sd on ( sd.b_pubid = se.b_pubid and sd.b_sourceid = se.b_sourceid and sd.b_loadid = se.b_loadid ) where sd.b_error_status = 'ERROR';
The records are in the error queue because of the email. Note the source_email column is indeed missing emails. A validation rule in the Customer B2C model requires all customers to have an email address, hence why these records were flagged as errors.
You also obtain Business-relevant fields to aid troubleshooting.
You successfully queried xDM to check for errors and their causes using the SQL API.
You are now familiar with different types of queries for xDM that will prove to be useful in work using the SQL API.
In the next part of the Integration tutorial track, Load Data via SQL, you will learn how to load data in xDM via the SQL API. You will learn basic and advanced loading queries as well as more integration concepts, such as how the integration process works.