Consume Data Using SQL
Consuming data from the data hub using SQL is the optimal method for outbound batch data integration.
Consuming Data from Semarchy xDM is done in SQL using the tables of the hubs. This document explains the various data consumption patterns.
Overview
Consuming the data mainly involves:
-
The Golden Data. Contains the enriched, consolidated, validated and certified golden records
-
The Master Data. Contains the master records linked to the golden records. Master records contain the references to the source records.
Using the golden data in conjunction with the master data allows cross referencing to source data and re-integrating golden data into source systems.
Consume Golden Data
Golden Data Table Structure
The complete list of system columns available is detailed in Data Hub Table Structures. The following table lists the system columns used when consuming information from the hub:
Column Name | Datatype | Description |
---|---|---|
|
|
ID of the batch that created this record. |
|
|
Class name of this record. |
|
|
Error Status of this record. The |
|
|
Submit date (timestamp) of the batch into which the record was created. |
|
|
Submit date (timestamp) of the batch into which the record was updated. |
|
|
Submitter of the batch into which the record was created. |
|
|
Submitter of the batch into which the record was updated. |
|
Access Golden Data
To access golden data using SQL, you query on the GD table. The queries are filtered to access data for one or more entity classes.
The following code sample gives a query to access golden records in the GD table.
select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <Classname_List> ) (3)
and (G.B_ERROR_STATUS is null or G.B_ERROR_STATUS <> 'ERROR') (4)
Explanation of the query:
1 | We select all columns from the golden record. You can select specific columns from the GD record. |
2 | The GD table accessed is named after the entity. Replace <Physical Table Name> with the physical table name defined for the entity. |
3 | Provide the list of entity classes stored in this table you want to access. For example, Person ,Company . Note that if you are not using inheritance, only one class is stored in the table and this clause is not required. |
4 | Use this filter to exclude the records with errors. To query errors, refer to Consume Errors. |
CONTACT_ID
, FIRST_NAME
and LAST_NAME
golden data for the Contact entity.select CONTACT_ID, FIRST_NAME, LAST_NAME from GD_CONTACT G where G.B_CLASSNAME = 'Contact'
Golden Data Primary Key
The primary key for a golden data table depends on the Entity Type and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:
-
For ID Generation - Sequence:
NUMBER(38,0)
-
For ID Generation - UUID_:
RAW
-
For ID Generation - Manual, the datatype is the one defined for the primary key attribute.
Consume Master Data
Accessing master data from fuzzy and ID matched entities uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.
There is no master data for basic entities. Only source data, source errors and golden data are stored for this type of entities. |
Depending on the Entity Type (ID Matched, Fuzzy Matched), access to the master data differs:
ID Matched Entity
With ID Matched Entity, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute (<primary_key_column>
in the sample below)
select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M inner join GD_<Physical_Table_Name> G on (
G.<primary_key_column> = M.<primary_key_column>
and G.B_CLASSNAME in ( <classname_list> )
)
where
M.B_CLASSNAME in ( <classname_list> )
and M.B_PUBID = '<publisher_code>'
and M.<primary_key_column> = '<searched_source_id>'
In this code, access is filtered with the class name for both the golden data and the master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #9) and ID of the source record (line #10).
EMPLOYEE_NUMBER=100
in the HR
system (ID Matched Entity).select M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME from MD_EMPLOYEE M inner join GD_EMPLOYEE G on (
G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
and G.B_CLASSNAME = 'Employee'
)
where
M.B_CLASSNAME = 'Employee'
and M.B_PUBID = 'HR'
and M.EMPLOYEE_NUMBER = '100'
The resulting information can be used to update the source record with golden data.
Fuzzy Matched Entity
With Fuzzy Matched Entity, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:
-
B_PUBID
(VARCHAR2(30 CHAR)
) contains the code of the publisher that published this record. -
B_SOURCEID
(VARCHAR2(128 CHAR)
) contains the ID of the source record in that publisher
As a consequence, the link between golden and master records is done using the primary key column as in an ID Matched Entity, but the link to the source is done using the B_SOURCEID
column.
select M.B_SOURCEID, G.*
from MD_<entity> M inner join GD_<entity> G on (
G. <primary_key_column> = M. <primary_key_column>
and G.B_CLASSNAME in ( <classname_list> )
)
where
M.B_CLASSNAME in ( <classname_list> )
and M.B_PUBID = '<publisher_code>'
and M.B_SOURCEID = '<searched_source_id>'
In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #9) and ID of the source record (line #10), using the B_SOURCEID
column.
27030
in the CRM
system (Fuzzy Matched Entity).select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME from MD_CONTACT M inner join GD_CONTACT G on (
G.CONTACT_ID = M.CONTACT_ID
and G.B_CLASSNAME = 'Contact'
)
where
M.B_CLASSNAME = 'Contact'
and M.B_PUBID = 'CRM'
and M.B_SOURCEID = '27030'
Example: Select side by side the duplicates detected for a given source Contact record with the ID 27030
in the CRM
system (Fuzzy Matched Entity). In this example, the master data table is used twice (aliased as M
and MM
) to retrieve the two sides of a duplicate pair.
select
M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME from MD_CONTACT M inner join GD_CONTACT G on
(
G.CONTACT_ID = M.CONTACT_ID
and G.B_CLASSNAME = 'Contact'
)
inner join MD_CONTACT MM on
(
MM.CONTACT_ID = M.CONTACT_ID
and MM.B_CLASSNAME = 'Contact'
)
where
M.B_CLASSNAME = 'Contact'
and M.B_PUBID = 'CRM' /* Publisher ID */
and M.B_SOURCEID = '27030' /* Source ID */
/* and M.B_PUBID = MM.B_PUBID */
/* Uncomment the previous line to restrict the duplicates
to those within the CRM application */
Consume Errors
Pre-Consolidation Errors
Pre-consolidation errors can be accessed via the Source Errors (SE, or AE for basic entities) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME
(name of the constraint) and B_CONSTRAINTTYPE
(type of the constraint) columns.
SE tables do not store the erroneous data itself, but they contain the identifiers to the source (SD or SA for basic entities) records in error.
Note that the B_ERROR_STATUS
column on the SD table provides a simple way to detect the records with errors.
Example: Select the source errors for the Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER
and FS_CUSTOMER
.
select * from SD_CONTACT where B_ERROR_STATUS = 'ERROR';
select SE.B_BATCHID, SD.B_LOADID,
SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
SE.B_PUBID, SE.B_SOURCEID,
SD.FIRST_NAME, SD.LAST_NAME,
SD.FP_CUSTOMER, SD.FS_CUSTOMER from SE_CONTACT SE, SD_CONTACT SD
where
SE.B_CLASSNAME = 'Contact'
and SD.B_CLASSNAME = 'Contact'
and SD.B_PUBID = SE.B_PUBID
and SD.B_SOURCEID = SE.B_SOURCEID
and SD.B_LOADID = SE.B_LOADID
and SD.B_ERROR_STATUS = 'ERROR';
Post-Consolidation Errors
Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME
(name of the constraint) and B_CONSTRAINTTYPE
(type of the constraint) columns.
GE tables do not store the erroneous data itself, but they contain the identifier to the golden (GD) records in error.
Note that the B_ERROR_STATUS
column on the GD table provides a simple way to detect the records with errors.
Example: Select the errors and data for the golden contacts in error.
select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';
select GE.B_BATCHID,
GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
GE.CONTACT_ID,
GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER from GE_CONTACT GE, GD_CONTACT GD
where
GE.B_CLASSNAME = 'Contact'
and GD.B_CLASSNAME = 'Contact'
and GD.CONTACT_ID = GE.CONTACT_ID
and GD.B_BATCHID = GE.B_BATCHID
and GD.B_ERROR_STATUS = 'ERROR';