Consume data using SQL

Consuming data from the data hub using SQL is the optimal method for outbound batch data integration.

Consuming Data from Semarchy xDM is done in SQL using the tables of the hubs. This page explains the various data consumption patterns.

Overview

Consuming the data mainly involves:

  • The Golden Data. Contains the enriched, consolidated, validated and certified golden records

  • The Master Data. Contains the master records linked to the golden records. Master records contain the references to the source records.

Using the golden data in conjunction with the master data allows cross-referencing to source data and re-integrating golden data into source systems.

Consume golden data

Golden data table structure

The complete list of system columns available is detailed in Data hub table structures. The following table lists the system columns used when consuming information from the hub:

Column Name Datatype Description

B_BATCHID

NUMBER(38,0)

ID of the batch that created this record.

B_CLASSNAME

VARCHAR2(128 CHAR)

Class name of this record.

B_ERROR_STATUS

VARCHAR2(30 CHAR)

Error Status of this record. The ERROR value indicates a golden record that has failed post-consolidation validation.

B_CREDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was created.

B_UPDDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was updated.

B_CREATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was created.

B_UPDATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was updated.

  • B_ERROR_STATUS: This value indicates whether the golden record has passed successfully or not post-consolidation validations. Adding a filter on this column allows to include or exclude the records in error.

  • B_CREDATE, B_UPDDATE: These timestamps do not necessarily correspond to the moment when the records were created or updated in a source system or in the hub using a stepper, duplicate manager, etc. They give the moment when the record was submitted to the hub. Note that a record, even unmodified, is considered updated as soon as it is submitted to the hub.

  • B_CREDATOR, B_UPDATOR: For records created or modified using direct authoring or duplicate managers, this corresponds to the user who has performed the changes. For records created or modified in workflows, this corresponds to the user who has submitted the workflow.

Access golden data

To access golden data using SQL, you query on the GD table. The queries are filtered to access data for one or more entity classes.

The following code sample gives a query to access golden records in the GD table.

Example 1. Selecting the golden data, excluding the records having post-consolidation errors.
select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <Classname_List> ) (3)
and (G.B_ERROR_STATUS is null or G.B_ERROR_STATUS <> 'ERROR') (4)

Explanation of the query:

1 We select all columns from the golden record. You can select specific columns from the GD record.
2 The GD table accessed is named after the entity. Replace <Physical Table Name> with the physical table name defined for the entity.
3 Provide the list of entity classes stored in this table you want to access. For example, Person,Company. Note that if you are not using inheritance, only one class is stored in the table and this clause is not required.
4 Use this filter to exclude the records with errors. To query errors, refer to Consume Errors.
Example 2. Example: Select the CONTACT_ID, FIRST_NAME and LAST_NAME golden data for the Contact entity.
select CONTACT_ID, FIRST_NAME, LAST_NAME from GD_CONTACT G where G.B_CLASSNAME = 'Contact'

Golden data primary key

The primary key for a golden data table depends on the Entity Type and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:

  • For ID Generation - Sequence: NUMBER(38,0)

  • For ID Generation - UUID_: RAW

  • For ID Generation - Manual, the datatype is the one defined for the primary key attribute.

Consume master data

Accessing master data from fuzzy and ID matched entities uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.

There is no master data for basic entities. Only source data, source errors and golden data are stored for this type of entities.

Depending on the Entity Type (ID Matched, Fuzzy Matched), access to the master data differs:

ID-matched entity

With ID Matched Entity, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute (<primary_key_column> in the sample below)

Example 3. Select the golden data corresponding to a given source record ID (ID Matched Entity)
select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M inner join GD_<Physical_Table_Name> G on (
 	G.<primary_key_column> = M.<primary_key_column>
  and	G.B_CLASSNAME in ( <classname_list> )
)
where
      	M.B_CLASSNAME in ( <classname_list> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.<primary_key_column> = '<searched_source_id>'

In this code, access is filtered with the class name for both the golden data and the master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #9) and ID of the source record (line #10).

Example 4. Example: Select the golden data for the Employee source record having the EMPLOYEE_NUMBER=100 in the HR system (ID Matched Entity).
select 	M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME from 	MD_EMPLOYEE M inner join GD_EMPLOYEE G on (
 	G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
  and	G.B_CLASSNAME = 'Employee'
)
where
      	M.B_CLASSNAME = 'Employee'
  and 	M.B_PUBID = 'HR'
  and 	M.EMPLOYEE_NUMBER = '100'

The resulting information can be used to update the source record with golden data.

Fuzzy-matched entity

With Fuzzy Matched Entity, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:

  • B_PUBID (VARCHAR2(30 CHAR)) contains the code of the publisher that published this record.

  • B_SOURCEID (VARCHAR2(128 CHAR)) contains the ID of the source record in that publisher

As a consequence, the link between golden and master records is done using the primary key column as in an ID Matched Entity, but the link to the source is done using the B_SOURCEID column.

Example 5. Select the golden data for a given source record ID (fuzzy-matched entity)
select M.B_SOURCEID, G.*
from MD_<entity> M inner join GD_<entity> G on (
 	G. <primary_key_column> = M. <primary_key_column>
  and	G.B_CLASSNAME in ( <classname_list> )
)
where
      	M.B_CLASSNAME in ( <classname_list> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.B_SOURCEID = '<searched_source_id>'

In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #9) and ID of the source record (line #10), using the B_SOURCEID column.

Example 6. Select the golden values for the source Contact record with the ID 27030 in the CRM system (fuzzy-matched entity)
select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME from 	MD_CONTACT M inner join GD_CONTACT G on (
 	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_CLASSNAME = 'Contact'
)
where
      	M.B_CLASSNAME = 'Contact'
  and 	M.B_PUBID = 'CRM'
  and 	M.B_SOURCEID = '27030'

Example: Select side by side the duplicates detected for a given source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity). In this example, the master data table is used twice (aliased as M and MM) to retrieve the two sides of a duplicate pair.

Example 7. Side-by-side duplicates
select
  M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
  MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
  G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME from 	MD_CONTACT M inner join GD_CONTACT G on
 (
 	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_CLASSNAME = 'Contact'
 )
inner join MD_CONTACT MM on
 (
 	MM.CONTACT_ID = M.CONTACT_ID
  and	MM.B_CLASSNAME = 'Contact'
 )
where
      	M.B_CLASSNAME = 'Contact'
  and 	M.B_PUBID = 'CRM'       /* Publisher ID */
  and 	M.B_SOURCEID = '27030'  /* Source ID */
  /* and M.B_PUBID = MM.B_PUBID */
  /* Uncomment the previous line to restrict the duplicates
     to those within the CRM application */

Consume errors

Pre-consolidation errors

Pre-consolidation errors can be accessed via the Source Errors (SE, or AE for basic entities) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

SE tables do not store the erroneous data itself, but they contain the identifiers to the source (SD or SA for basic entities) records in error.

Note that the B_ERROR_STATUS column on the SD table provides a simple way to detect the records with errors.

Example: Select the source errors for the Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER and FS_CUSTOMER.

Example 8. Select the contact records in error
  select * from SD_CONTACT where B_ERROR_STATUS = 'ERROR';
Example 9. Select the errors as well as the data for the source contacts in error
select SE.B_BATCHID, SD.B_LOADID,
  SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
  SE.B_PUBID, SE.B_SOURCEID,
  SD.FIRST_NAME, SD.LAST_NAME,
  SD.FP_CUSTOMER, SD.FS_CUSTOMER from SE_CONTACT SE, SD_CONTACT SD
where
  SE.B_CLASSNAME = 'Contact'
  and SD.B_CLASSNAME = 'Contact'
  and SD.B_PUBID = SE.B_PUBID
  and SD.B_SOURCEID = SE.B_SOURCEID
  and SD.B_LOADID = SE.B_LOADID
  and SD.B_ERROR_STATUS = 'ERROR';

Post-consolidation errors

Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

GE tables do not store the erroneous data itself, but they contain the identifier to the golden (GD) records in error.

Note that the B_ERROR_STATUS column on the GD table provides a simple way to detect the records with errors.

Example: Select the errors and data for the golden contacts in error.

Example 10. Select the golden contacts in error
select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';
Example 11. Select the errors as well as the data for the golden contacts in error
select GE.B_BATCHID,
  GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
  GE.CONTACT_ID,
  GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER from GE_CONTACT GE, GD_CONTACT GD
where
  GE.B_CLASSNAME = 'Contact'
  and GD.B_CLASSNAME = 'Contact'
  and GD.CONTACT_ID = GE.CONTACT_ID
  and GD.B_BATCHID = GE.B_BATCHID
  and GD.B_ERROR_STATUS = 'ERROR';