Integration Job | ||
---|---|---|
Previous | Next | |
External Load | Publishing Data into the Hub |
The integration job processes the data submitted in an external load and runs this data through a series of steps to create and certify golden data out of this source data.
This job is generated from the integration rules defined at design time, and it uses the data structures automatically created in the MDM hub when deploying the model edition.
Tip: Although understanding the details of the process is not needed for publishing source data or consuming golden data, it is necessary to have complete understanding of this process to drill down into the various structures between the source and the golden data. For example, to review the rejects or the duplicates detected by the integration job for a given golden record.
Each integration job is the implementation of the overall certification process template. It may contain all or some of the phases of this process. The following section details the various phases of the certification process.
The following figure describes the Golden Data certification process and the various data structures involved in this process.
For more information about these various phase and how the certification process is designed, refer to the "Integration Process Design" chapter in the "Semarchy Convergence for MDM Developer’s Guide".
This section describes the general structure of the tables involved in the integration process.
For each entity, a pre-defined set of tables is created. These tables are named with the following convention:
<Prefix>_<Physical Table Name>
. The
prefix
is automatically generated depending on the table nature (Source Data, Source Error, etc.), and the
Physical Table Name
is set in the entity definition.
The following list describes the tables created for a given entity.
Table | Name | Description |
---|---|---|
SD_<Physical Table Name>
|
Source Data | This table is loaded by the middleware to publish data into the hub. |
SE_<Physical Table Name>
|
Source Errors | This table contains the errors detected during the pre-consolidation validation phase. |
MI_<Physical Table Name>
|
Master Integration | This table contains the records that have passed the pre-consolidation validation phase and that have been grouped by the matching phase. |
GI_<Physical Table Name>
|
Golden Integration | This table contains the records that have been consolidated but not yet passed through the post-consolidation validation phase. |
GE_<Physical Table Name>
|
Golden Errors | This table contents the errors detected during the post-consolidation validation phase. |
MD_<Physical Table Name>
|
Master Data | This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data. |
GD_<Physical Table Name>
|
Golden Data | This table contains the golden records consolidated from the master records. |
For publishing and consuming data, the MI and GI tables are not needed. The structure of these tables is not explained in this document.
Attributes appear in the tables' structure as follows:
<Complex Attribute Physical Prefix><Definition Attribute Physical Column Name>
For example: The
Country (
COUNTRY
) and
City (
CITY
) definition attributes of the
Geocoded Address complex attribute (Prefix:
GEO
) will be mapped to columns named
GEOCOUNTRY
and
GEOCITY
.
In addition to the attribute columns, built-in columns are added to the tables' structure, and are used in the integration process.
They track for example:
B_BRANCHID
,
B_FROMEDITION
,
B_TOEDITION
B_BATCHID
B_CONSTRAINTNAME
,
B_CONSTRAINTTYPE
B_CLASSNAME
The following list describes these columns.
Column Name | Datatype | Exists in Tables | Description |
---|---|---|---|
B_LOADID
|
NUMBER(38,0)
|
SD | ID of the load that created this record. |
B_BATCHID
|
NUMBER(38,0)
|
SD, SE, MD, GD, GE | ID of the batch that created this record. |
B_PUBID
|
VARCHAR2(30 CHAR)
|
SD, SD, MD | Code of the publisher that published this record. |
B_SOURCEID
|
VARCHAR2(128 CHAR)
|
SD, SD, MD | ID of the source record in the source publisher system (Fuzzy Matching entities only). See below for a detailed explanation of the primary key options. |
B_BRANCHID
|
NUMBER(38,0)
|
SD, SE, MD, GD, GE | Data Branch containing the data edition(s) storing this record. |
B_CLASSNAME
|
VARCHAR2(128 CHAR)
|
SD, SE, MD, GD, GE | Class name of this record. See below for a detailed explanation of class name and inheritance. |
B_CONSTRAINTNAME
|
VARCHAR2(128 CHAR)
|
SE, GE | Name of the constraint causing this error (Error Records) |
B_CONSTRAINTTYPE
|
VARCHAR2(30 CHAR)
|
SE, GE | Type of the constraint causing this error (Error Records) |
B_FROMEDITION
|
NUMBER(38,0)
|
MD, GD | Edition from which the Master/Golden record exists. |
B_TOEDITION
|
NUMBER(38,0)
|
MD, GD | Edition until which the Master/Golden record exists. If it is null, then the record exists until the latest data edition. |
B_MATCHGRP
|
NUMBER(38,0)
|
MD | ID of the match group for the master record. This column is set when matching takes place. |
B_MASTERSCOUNT
|
NUMBER(38,0)
|
GD | Number of master records contributing to this golden record. |
B_IS_VALIDATED
|
VARCHAR2(1)
|
GD | Flag indicated whether this golden record has been validated and the matching confirmed by a user (Fuzzy Matching entities only). |
B_CREDATE
|
TIMESTAMP(6)
|
SD, SE, MD, GD, GE | Creation date of the record. |
B_UPDDATE
|
TIMESTAMP(6)
|
SD, SE, MD, GD, GE | Last update date of the record. |
B_CREATOR
|
VARCHAR2(128 CHAR)
|
SD, SE, MD, GD, GE | Creator of the record. |
B_UPDATOR
|
VARCHAR2(128 CHAR)
|
SD, SE, MD, GD, GE | Last updater of this record. |
The primary key to load depends on the type of the matching behavior of the entity:
When using ID Matching, we assume a common identifier across all systems. In this case, this common identifier is stored in a column named after the Physical Column Name specified in the primary key attribute definition.
This column will exist in all tables. When publishing data into the hub, the middleware loads this column with the primary key from the publishing system. This identifier is simply propagated into the hub, and matching is done using this primary key.
When using Fuzzy Matching, we assume no common identifier across publishers. There may be two different records with the same ID in different systems. There is a need to match the records and consolidate them under a golden record having a primary key generated by the system.
In this case, the source identifier is stored in the column named
B_SOURCEID
, along with the publisher code stored in the
B_PUBID
. This is the case for the SD, SE and MD tables.
When publishing data into the hub, the middleware loads this
B_SOURCEID
column with a primary key value from the publishing system. If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into
B_SOURCEID
.
When the records are consolidated in a golden record (GD and GE tables), a System Defined Primary Key is generated and stored in a column named after the Physical Column Name specified in the primary key attribute definition. This key is referred to as the Golden Record ID.
The MD table makes the bridge between the Source ID and the Golden ID as it contains both these values.
When a reference exists in the source publisher and need to be expressed in the landing table, this reference mapping in the table structure depends on the matching behavior of the referenced entity.
For a reference to an
ID Matching entity, the referenced key is the same for all systems. As a consequence, the referenced value is stored in a single column. This column is named after the
Physical Name provided in the reference definition and is prefixed with
F_
. For example,
F_EMPLOYEE
.
For example: if
Customer references
Employee and this entity uses ID Matching,
SD_CUSTOMER
will contain the following information.
CUSTOMER_NAME | F_EMPLOYEE |
---|---|
Gadgetron | 11 |
Roxxon | 56 |
This information means that the customer “Gadgetron” references the employee number 11, and “Roxxon” references employee number 56. This employee number is the same in all systems.
For a reference to a Fuzzy Matching entity, the referenced key is per publisher.
As a consequence:
FS_
(for Foreign Source ID). For example,
FS_CUSTOMER
.
FP_
(for Foreign Publisher). For example,
FP_CUSTOMER
.
For example: if
Contact references
Customer and this entity uses Fuzzy Matching,
SD_CONTACT
will contain the following information.
FIRST_NAME | LAST_NAME | FP_CUSTOMER | FS_CUSTOMER |
---|---|---|---|
John | Doe | CRM | 1235 |
Jane | Smith | MKT | A3251 |
This information means that the contact “John Doe” references the customer with the primary key “1235” in the “CRM” publisher, and that “Jane Smith” references the customer with the primary key “A3251” in the “MKT” publisher.
Several entities involved in an inheritance relation have their data stored in the same set of tables. These tables store the superset of the attributes of the parent and all its child entities. The
B_CLASSNAME
column is used to identify the class (entity) of a record in a table.
For example, when
Person and
Company inherit from the
Party entity, the resulting table is named after the parent entity (
Party), and will contain all the attributes of
Person and
Company as well. In the
GD_PARTY
table for example, records representing persons will have
B_CLASSNAME='Person'
. When publishing person or company information in the
SD_PARTY
table, the middleware must set
B_CLASSNAME='Person'
or
B_CLASSNAME='Company'
accordingly.
The only constraints enforced in the tables are the primary keys and the not null columns. These constraints apply to system columns only.
For example, on an SD_ table, the following constraints are enforced:
B_LOADID
,
B_PUBID
,
B_SOURCEID
(or the primary key column for ID Matching entities)
B_LOADID
,
B_CLASSNAME
,
B_BRANCHID
,
B_PUBID
and
B_SOURCEID
(or the primary key column for ID Matching entities) are not null
Other constraints defined in the model (mandatory attributes, references, etc.) are not enforced in the physical model but checked during the validation phase.
Previous | Top | Next |
External Load | Publishing Data into the Hub |