Consuming Data from the Hub | ||
---|---|---|
Previous | Next | |
Publishing Data into the Hub | Web Services |
Consuming Data from Semarchy can be done from several interfaces:
Consuming the data mainly involves:
Using the golden data in conjunction with the master data allows cross referencing to source data and re-integrating golden data into source systems.
Tip: The examples provided in this section can be executed on the MDM hub deployed in the demonstration and evaluation environment used with the "Semarchy Convergence for MDM Getting Started Guide".
The complete list of system columns available is provided in the Tables Structures section of this guide. The following table lists the system columns used when consuming information from the hub:
Column Name | Datatype | Description |
---|---|---|
B_BATCHID
|
NUMBER(38,0)
|
ID of the batch that created this record. |
B_BRANCHID
|
NUMBER(38,0)
|
Data branch containing this record. |
B_CLASSNAME
|
VARCHAR2(128 CHAR)
|
Class name of this record. |
B_FROMEDITION
|
NUMBER(38,0)
|
Edition from which the Master/Golden record exists. |
B_TOEDITION
|
NUMBER(38,0)
|
Edition until which the Master/Golden record exists. |
B_CREDATE
|
TIMESTAMP(6)
|
Creation date of the record. |
B_UPDDATE
|
TIMESTAMP(6)
|
Last update date of the record. |
B_CREATOR
|
VARCHAR2(128 CHAR)
|
Creator of the record. |
B_UPDATOR
|
VARCHAR2(128 CHAR)
|
Last updater of this record. |
To access golden data using SQL, you query on the GD table. The queries are filtered to access data:
The following code sample gives a query to access golden records in the GD table in a given data edition.
Sample Code: Select the golden data in a given data edition.
select G.*
from GD_<Physical_Table_Name> G
where
G.B_BRANCHID = <branch_id>
and G.B_CLASSNAME in
( <classname_list> )
and G.B_FROMEDITION< = <edition>
and (
G.B_TOEDITION is null
or
G.B_TOEDITION > <edition>
)
Explanation of the query:
<Physical Table Name>
with the physical table name defined for the entity.
Person
,
Company
.
B_FROMEDITION
and
B_TOEDITION
.
Example: Select the
CONTACT_ID
,
FIRST_NAME
and
LAST_NAME
golden data for the
Contact entity in the data branch
0
and in the data edition
1
.
select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where
G.B_BRANCHID = 0
and G.B_CLASSNAME = 'Contact'
and G.B_FROMEDITION <= 1
and (
G.B_TOEDITION is null
or
G.B_TOEDITION > 1
)
The following code sample gives a query to access the latest version of the data, that is the data in the latest data edition.
Sample Code: Select the golden data in the latest data edition.
select G.*
from GD_<Physical_Table_Name> G
where
G.B_BRANCHID = <branch_id>
and G.B_CLASSNAME in
( <classname_list> )
and G.B_TOEDITION is null
Explanation of the query:
<Physical Table Name>
with the physical table name defined for the entity.
Person
,
Company
.
B_TOEDITION is null
.
Example: Select the
CONTACT_ID
,
FIRST_NAME
and
LAST_NAME
golden data for the
Contact entity in latest data edition of data branch
0
.
select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where
G.B_BRANCHID = 0
and G.B_CLASSNAME = 'Contact'
and G.B_TOEDITION is null
The primary key for a golden data table depends on the matching behavior of the entity and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:
NUMBER(38,0)
RAW
Accessing master data uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.
Depending on the entity matching behavior (ID Matching or Fuzzy Matching), access to the master data differs:
With ID Matching, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute (
<primary_key_column>
in the sample below)
Sample Code: Select the golden data corresponding to a given source record ID (ID Matching)
select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M
inner join GD_<Physical_Table_Name> G on
( G. <primary_key_column> = M. <primary_key_column>
and G.B_BRANCHID = <branch_id>
and G.B_CLASSNAME in ( <classname_list> )
and G.B_FROMEDITION <= <edition>
and ( G.B_TOEDITION is null or
G.B_TOEDITION > <edition> )
)
where
M.B_BRANCHID = <branch_id>
and M.B_CLASSNAME in ( <classname_list> )
and M.B_FROMEDITION <= <edition>
and ( M.B_TOEDITION is null or
M.B_TOEDITION > <edition> )
and M.B_PUBID = '<publisher_code>'
and M.<primary_key_column> = '<searched_source_id>'
In this code, access is filtered with the branch ID, the class name and edition number for both the golden data (lines #5 to #9) and the master data (lines #13 to #16). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #17) and ID of the source record (line #18).
Example: Select the golden data for the
Employee source record having the
EMPLOYEE_NUMBER=100
in the
HR
system (ID Matching).
select M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME
from MD_EMPLOYEE M
inner join GD_EMPLOYEE G on
( G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
and G.B_BRANCHID = 0
and G.B_CLASSNAME = 'Employee'
and G.B_FROMEDITION <= 1
and ( G.B_TOEDITION is null or G.B_TOEDITION > 1 )
)
where
M.B_BRANCHID = 0
and M.B_CLASSNAME = 'Employee'
and M.B_FROMEDITION <= 1
and ( M.B_TOEDITION is null or M.B_TOEDITION > 1 )
and M.B_PUBID = 'HR'
and M.EMPLOYEE_NUMBER = '100'
The resulting information can be used to update the source record with golden data.
With Fuzzy Matching, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key (for a given edition) consists of two columns:
B_PUBID
(
VARCHAR2(30 CHAR)
) contains the code of the publisher that published this record.
B_SOURCEID
(
VARCHAR2(128 CHAR)
) contains the ID of the source record in that publisher
As a consequence, the link between golden and master records is done using the primary key column as in ID Matching, but the link to the source is done using the
B_SOURCEID
column.
Sample Code: Select the golden data for a given source record ID (Fuzzy Matching)
select M.B_SOURCEID, G.*
from MD_<entity> M
inner join GD_<entity> G on
( G. <primary_key_column> = M. <primary_key_column>
and G.B_BRANCHID = <branch_id>
and G.B_CLASSNAME in ( <classname_list> )
and G.B_FROMEDITION <= <edition>
and ( G.B_TOEDITION is null or
G.B_TOEDITION > <edition> )
)
where
M.B_BRANCHID = <branch_id>
and M.B_CLASSNAME in ( <classname_list> )
and M.B_FROMEDITION <= <edition>
and ( M.B_TOEDITION is null or
M.B_TOEDITION > <edition> )
and M.B_PUBID = '<publisher_code>'
and M.B_SOURCEID = '<searched_source_id>'
In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #17) and ID of the source record (line #18), using the
B_SOURCEID
column.
Example: Select the golden values for the source
Contact record with the ID
27030
in the
CRM
system (Fuzzy Matching).
select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME
from MD_CONTACT M
inner join GD_CONTACT G on
( G.CONTACT_ID = M.CONTACT_ID
and G.B_BRANCHID = 0
and G.B_CLASSNAME = 'Contact'
and G.B_FROMEDITION <= 1
and ( G.B_TOEDITION is null or G.B_TOEDITION > 1 )
)
where
M.B_BRANCHID = 0
and M.B_CLASSNAME = 'Contact'
and M.B_FROMEDITION <= 1
and ( M.B_TOEDITION is null or M.B_TOEDITION > 1 )
and M.B_PUBID = 'CRM'
and M.B_SOURCEID = '27030'
Example: Select side by side the duplicates detected for a given source
Contact record with the ID
27030
in the
CRM
system (Fuzzy Matching). In this example, the master data table is used twice (aliased as
M
and
MM
) to retrieve the two sides of a duplicate pair.
select
M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME
from MD_CONTACT M
inner join GD_CONTACT G on
( G.CONTACT_ID = M.CONTACT_ID
and G.B_BRANCHID = 0
and G.B_CLASSNAME = 'Contact'
and G.B_FROMEDITION <= 1
and ( G.B_TOEDITION is null or G.B_TOEDITION > 1 ) )
inner join MD_CONTACT MM on
( MM.CONTACT_ID = M.CONTACT_ID
and MM.B_BRANCHID = 0
and MM.B_CLASSNAME = 'Contact'
and MM.B_FROMEDITION <= 1
and ( MM.B_TOEDITION is null or MM.B_TOEDITION > 1 ) )
where
M.B_BRANCHID = 0
and M.B_CLASSNAME = 'Contact'
and M.B_FROMEDITION <= 1
and ( M.B_TOEDITION is null or M.B_TOEDITION > 1 )
and M.B_PUBID = 'CRM' /* Publisher ID */
and M.B_SOURCEID = '27030' /* Source ID */
/* and M.B_PUBID = MM.B_PUBID */
/* Uncomment the previous line to restrict the duplicates
to those within the CRM application */
Pre-consolidation errors can be accessed via the Source Errors (SE) tables.
These tables store the erroneous source data information, as well as the information about the constraints that caused the records to fail the validation. The latter is stored in the
B_CONSTRAINTNAME
(name of the constraint) and
B_CONSTRAINTTYPE
(type of the constraint) columns.
Example: Select the errors for the latest batch in the branch
0
for the
Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from
FP_CUSTOMER
and
FS_CUSTOMER
.
select SE.B_BATCHID,
SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
SE.B_PUBID, SE.B_SOURCEID, SE.FIRST_NAME, SE.LAST_NAME,
SE.FP_CUSTOMER, SE.FS_CUSTOMER
from SE_CONTACT SE
where
SE.B_BRANCHID = 0
and SE.B_CLASSNAME = 'Contact'
and SE.B_BATCHID = ( select max(B_BATCHID)
from SE_CONTACT
where SE.B_BRANCHID = 0 )
Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the erroneous records that were consolidated but not certified as golden data, as well as the information about the constraints that caused the records to fail the validation. The latter is stored in the
B_CONSTRAINTNAME
(name of the constraint) and
B_CONSTRAINTTYPE
(type of the constraint) columns.
Example: Select the errors for the latest batch in the branch
0
for the
Employee entity. In this example, wrong values for the
CostCenter would appear. We retrieve these values from the
F_COST_CENTER
column for review.
select GE.B_BATCHID,
GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
GE.EMPLOYEE_NUMBER, GE.FIRST_NAME, GE.LAST_NAME,
GE.F_COST_CENTER, GE.F_MANAGER
from GE_EMPLOYEE GE
where
GE.B_BRANCHID = 0
and GE.B_CLASSNAME = 'Employee'
and GE.B_BATCHID = ( select max(B_BATCHID)
from GE_EMPLOYEE
where GE.B_BRANCHID = 0 );
Previous | Top | Next |
Publishing Data into the Hub | Web Services |