Consuming Data from the Hub

Consuming Data from Semarchy can be done from several interfaces:

SQL Consumption – Overview

Consuming the data mainly involves:

Using the golden data in conjunction with the master data allows cross referencing to source data and re-integrating golden data into source systems.

Tip: The examples provided in this section can be executed on the MDM hub deployed in the demonstration and evaluation environment used with the "Semarchy Convergence for MDM Getting Started Guide".

Golden Data

Golden Data Table Structure

The complete list of system columns available is provided in the Tables Structures section of this guide. The following table lists the system columns used when consuming information from the hub:

Column Name Datatype Description
B_BATCHID NUMBER(38,0) ID of the batch that created this record.
B_BRANCHID NUMBER(38,0) Data branch containing this record.
B_CLASSNAME VARCHAR2(128 CHAR) Class name of this record.
B_FROMEDITION NUMBER(38,0) Edition from which the Master/Golden record exists.
B_TOEDITION NUMBER(38,0) Edition until which the Master/Golden record exists.
B_CREDATE TIMESTAMP(6) Creation date of the record.
B_UPDDATE TIMESTAMP(6) Last update date of the record.
B_CREATOR VARCHAR2(128 CHAR) Creator of the record.
B_UPDATOR VARCHAR2(128 CHAR) Last updater of this record.

Accessing Golden Data Using SQL

To access golden data using SQL, you query on the GD table. The queries are filtered to access data:

Accessing Golden Data in a Given Data Edition

The following code sample gives a query to access golden records in the GD table in a given data edition.

Sample Code: Select the golden data in a given data edition.

select G.*
from GD_<Physical_Table_Name> G
where
	G.B_BRANCHID = <branch_id>
and	G.B_CLASSNAME in 
           ( <classname_list> )
and	G.B_FROMEDITION< = <edition>
and	(
	G.B_TOEDITION is null
	or
	G.B_TOEDITION > <edition>
	)

Explanation of the query:

Example: Select the CONTACT_ID, FIRST_NAME and LAST_NAME golden data for the Contact entity in the data branch 0 and in the data edition 1.

select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where
	G.B_BRANCHID = 0
and	G.B_CLASSNAME = 'Contact'
and	G.B_FROMEDITION <= 1
and	(
	G.B_TOEDITION is null
	or
	G.B_TOEDITION > 1
	)

Accessing Golden Data in the Latest Data Edition

The following code sample gives a query to access the latest version of the data, that is the data in the latest data edition.

Sample Code: Select the golden data in the latest data edition.

select G.*
from GD_<Physical_Table_Name> G
where
	G.B_BRANCHID = <branch_id>
and	G.B_CLASSNAME in 
           ( <classname_list> )
and	G.B_TOEDITION is null

Explanation of the query:

Example: Select the CONTACT_ID, FIRST_NAME and LAST_NAME golden data for the Contact entity in latest data edition of data branch 0.

select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where
	G.B_BRANCHID = 0
and	G.B_CLASSNAME = 'Contact'
and	G.B_TOEDITION is null

Golden Data Primary Key

The primary key for a golden data table depends on the matching behavior of the entity and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:

Master Data

Accessing master data uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.

Depending on the entity matching behavior (ID Matching or Fuzzy Matching), access to the master data differs:

Accessing Master Data Using SQL (ID Matching)

With ID Matching, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute ( <primary_key_column> in the sample below)

Sample Code: Select the golden data corresponding to a given source record ID (ID Matching)

select M.B_SOURCEID, G.* 
from MD_<Physical_Table_Name> M
inner join GD_<Physical_Table_Name> G on
  (	G. <primary_key_column> = M. <primary_key_column>
  and	G.B_BRANCHID = <branch_id>
  and	G.B_CLASSNAME in ( <classname_list> )
  and	G.B_FROMEDITION <= <edition>
  and (	G.B_TOEDITION is null or
	G.B_TOEDITION > <edition> )
  )
where
	M.B_BRANCHID = <branch_id>
  and	M.B_CLASSNAME in ( <classname_list> )
  and	M.B_FROMEDITION <= <edition>
  and (	M.B_TOEDITION is null or
	M.B_TOEDITION > <edition> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.<primary_key_column> = '<searched_source_id>'

In this code, access is filtered with the branch ID, the class name and edition number for both the golden data (lines #5 to #9) and the master data (lines #13 to #16). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #17) and ID of the source record (line #18).

Example: Select the golden data for the Employee source record having the EMPLOYEE_NUMBER=100 in the HR system (ID Matching).

select 	M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME
from 	MD_EMPLOYEE M
inner join GD_EMPLOYEE G on
 (	G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
  and	G.B_BRANCHID = 0
  and	G.B_CLASSNAME = 'Employee'
  and	G.B_FROMEDITION <= 1
  and (	G.B_TOEDITION is null or G.B_TOEDITION > 1 )
  ) 	 
where
	M.B_BRANCHID = 0
  and	M.B_CLASSNAME = 'Employee'
  and	M.B_FROMEDITION <= 1
  and (	M.B_TOEDITION is null or M.B_TOEDITION > 1 )
  and 	M.B_PUBID = 'HR'
  and 	M.EMPLOYEE_NUMBER = '100'

The resulting information can be used to update the source record with golden data.

Accessing Master Data Using SQL (Fuzzy Matching)

With Fuzzy Matching, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key (for a given edition) consists of two columns:

As a consequence, the link between golden and master records is done using the primary key column as in ID Matching, but the link to the source is done using the B_SOURCEID column.

Sample Code: Select the golden data for a given source record ID (Fuzzy Matching)

select M.B_SOURCEID, G.* 
from MD_<entity> M
inner join GD_<entity> G on
  (	G. <primary_key_column> = M. <primary_key_column>
  and	G.B_BRANCHID = <branch_id>
  and	G.B_CLASSNAME in ( <classname_list> )
  and	G.B_FROMEDITION <= <edition>
  and (	G.B_TOEDITION is null or
	G.B_TOEDITION > <edition> )
  )
where
	M.B_BRANCHID = <branch_id>
  and	M.B_CLASSNAME in ( <classname_list> )
  and	M.B_FROMEDITION <= <edition>
  and (	M.B_TOEDITION is null or
	M.B_TOEDITION > <edition> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.B_SOURCEID = '<searched_source_id>'

In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #17) and ID of the source record (line #18), using the B_SOURCEID column.

Example: Select the golden values for the source Contact record with the ID 27030 in the CRM system (Fuzzy Matching).

select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME
from 	MD_CONTACT M
inner join GD_CONTACT G on
 (	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_BRANCHID = 0
  and	G.B_CLASSNAME = 'Contact'
  and	G.B_FROMEDITION <= 1
  and (	G.B_TOEDITION is null or G.B_TOEDITION > 1 )
  ) 	 
where
	M.B_BRANCHID = 0
  and	M.B_CLASSNAME = 'Contact'
  and	M.B_FROMEDITION <= 1
  and (	M.B_TOEDITION is null or M.B_TOEDITION > 1 )
  and 	M.B_PUBID = 'CRM'
  and 	M.B_SOURCEID = '27030'

Example: Select side by side the duplicates detected for a given source Contact record with the ID 27030 in the CRM system (Fuzzy Matching). In this example, the master data table is used twice (aliased as M and MM) to retrieve the two sides of a duplicate pair.

select 
  M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
  MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
  G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME
from 	MD_CONTACT M
inner join GD_CONTACT G on
 (	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_BRANCHID = 0
  and	G.B_CLASSNAME = 'Contact'
  and	G.B_FROMEDITION <= 1
  and (	G.B_TOEDITION is null or G.B_TOEDITION > 1 )  ) 	 
inner join MD_CONTACT MM on
 (	MM.CONTACT_ID = M.CONTACT_ID
  and	MM.B_BRANCHID = 0
  and	MM.B_CLASSNAME = 'Contact'
  and	MM.B_FROMEDITION <= 1
  and (	MM.B_TOEDITION is null or MM.B_TOEDITION > 1 )  ) 	 
where
	M.B_BRANCHID = 0
  and	M.B_CLASSNAME = 'Contact'
  and	M.B_FROMEDITION <= 1
  and (	M.B_TOEDITION is null or M.B_TOEDITION > 1 )
  and 	M.B_PUBID = 'CRM'       /* Publisher ID */
  and 	M.B_SOURCEID = '27030'  /* Source ID */
  /* and M.B_PUBID = MM.B_PUBID */
  /* Uncomment the previous line to restrict the duplicates 
     to those within the CRM application */

Errors

Accessing Pre-Consolidation Errors Using SQL

Pre-consolidation errors can be accessed via the Source Errors (SE) tables.
These tables store the erroneous source data information, as well as the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

Example: Select the errors for the latest batch in the branch 0 for the Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER and FS_CUSTOMER.

select SE.B_BATCHID, 
  SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE, 
  SE.B_PUBID, SE.B_SOURCEID, SE.FIRST_NAME, SE.LAST_NAME, 
  SE.FP_CUSTOMER, SE.FS_CUSTOMER
from SE_CONTACT SE  
where
  SE.B_BRANCHID = 0
  and SE.B_CLASSNAME = 'Contact'
  and SE.B_BATCHID = ( select max(B_BATCHID) 
                       from SE_CONTACT 
                       where SE.B_BRANCHID = 0 )

Accessing Post-Consolidation Errors Using SQL

Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the erroneous records that were consolidated but not certified as golden data, as well as the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

Example: Select the errors for the latest batch in the branch 0 for the Employee entity. In this example, wrong values for the CostCenter would appear. We retrieve these values from the F_COST_CENTER column for review.

select GE.B_BATCHID, 
  GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE, 
  GE.EMPLOYEE_NUMBER, GE.FIRST_NAME, GE.LAST_NAME, 
  GE.F_COST_CENTER, GE.F_MANAGER
from GE_EMPLOYEE GE
where
  GE.B_BRANCHID = 0
  and GE.B_CLASSNAME = 'Employee'
  and GE.B_BATCHID = ( select max(B_BATCHID) 
                       from GE_EMPLOYEE 
                       where GE.B_BRANCHID = 0 );