Data hub table structures

This page describes the general structure of the tables involved in the certification process.

Tables

For each entity, a predefined set of tables is created. These tables are named with the following convention: <Prefix>_<Physical Table Name>

The prefix is automatically generated depending on the table nature (source data, source error, etc.), and the Physical Table Name is set in the entity definition.

The following list describes the tables created for a given entity.

Table Name Applicable entity type Access Description

SD_<Physical Table Name>

Source Data

Fuzzy- and ID-matched

Read/Write

This table contains source data loaded by publishers, using the middleware.

SA_<Physical Table Name>

Source Authoring

All

Read (Write for basic entities)

For matching entities, this table contains both the records created in the hub by the user as well as possible overrides performed on consolidated records. In that case, it is associated with an SF_<Physical Table Name> table, storing flags that indicate which field was overridden.
For basic entities, it contains the records created by the user, and can be used to load data into the hub instead of the SD table. This table is also used to trace deletion operations.

SE_<Physical Table Name>
AE_<Physical Table Name>

Source Errors
Source Authoring Error

All

Read

This table contains the errors detected during the pre-consolidation validation phase. It is named AE_<Physical Table Name> for basic entities.

DU_<Physical Table Name>

Duplicates

Fuzzy-matched

Read

This table stores references to pairs of records detected as duplicates in the matching phase, with the match rule and match score.

UM_<Physical Table Name>

User Matches

Fuzzy-matched

Read/Write

This table contains the master records that have been manually re-grouped by users using duplicate managers. You can use this table to perform similar operations programmatically. For examples of duplicate management operations you can perform using this table, see Manage duplicate records using SQL.

MI_<Physical Table Name>

Master Integration

Fuzzy- and ID-matched

Internal

This table contains the records that have passed the pre-consolidation validation phase and that have been grouped by the matching phase.

MD_<Physical Table Name>

Master Data

Fuzzy- and ID-matched

Read

This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data.

MX_<Physical Table Name>

Deleted Master

Fuzzy- and ID-matched

Read

This table contains the deleted master records logs, plus the deleted data for soft deletes.

MH_<Physical Table Name>

Master History

Fuzzy- and ID-matched

Read

This table contains the master records history.

GI_<Physical Table Name>

Golden Integration

Fuzzy- and ID-matched

Internal

This table contains the records that have been consolidated but not yet passed through the post-consolidation validation phase.

GA_<Physical Table Name>

Golden Authoring

Fuzzy- and ID-matched

Internal

This table contains data overrides performed on golden records . It is associated with a GF_<Physical Table Name>, storing flags that indicate which field was overridden, and a GP_<Physical Table Name> table storing values consolidated before an override.

GX_<Physical Table Name>

Deleted Golden

All

Read

This table contains the deleted golden records logs, plus the deleted data for soft deletes.

GH_<Physical Table Name>

Golden History

All

Read

This table contains the golden records history.

GD_<Physical Table Name>

Golden Data

All

Read

This table contains the golden records, including those flagged as erroneous by post-consolidation validations.

GE_<Physical Table Name>

Golden Errors

Fuzzy- and ID-matched

Read

This table contains the errors detected during the post-consolidation validation phase on golden records.

For publishing and consuming data, internal tables are not needed. The structure of these tables is not explained in this document.

Columns

Attribute columns

Attributes appear in the tables' structure as follows:

  • Attributes using lists of values, built-in and user-defined types are mapped to single columns with a database type and length corresponding to the attribute type. The column name for such a simple attribute is the Physical Column Name value specified in the simple attribute definition.

  • Complex attributes are mapped on columns named with the following convention: <Complex Attribute Physical Prefix><Definition Attribute Physical Column Name>

For example, the Country (COUNTRY) and City (CITY) definition attributes of the Geocoded Address complex attribute (prefix: GEO) will be mapped to columns named GEOCOUNTRY and GEOCITY.

Built-in columns

In addition to the attribute columns, built-in columns are added to the table’s structure and are used in the certification process.

They track for example:

  • Batch information: B_BATCHID

  • Cause of a rejected record: B_CONSTRAINTNAME, B_CONSTRAINTTYPE

  • Class information: B_CLASSNAME

  • Match/Merge information: B_MATCHGRP, B_CONFSCORE, B_HASSUGGMERGE, B_SUGGMERGEID, B_SUGGMERGECONFSCORE, B_SUGGMERGEMASTERSCOUNT, B_CONFIRMATIONSTATUS, B_MASTERSCOUNT, B_ISCONFIRMED

  • Delete operation information: B_DELETETYPE, B_DELETEDATE, B_DELETEAUTHOR, B_DELETEOPERATION,

The following list describes these columns.

Column name Exists in tables Datatype Description

B_AUTHORINGTYPE

GA, SA

VARCHAR2(30 CHAR)

For matched entities, type of authoring operation:

  • OVERRIDE for override

  • DATA_ENTRY for data creation

B_BATCHID

AE, GA, GD, GE, GF, GI, GP, GX, MD, MI, MX, SE

NUMBER(38, 0)

ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected.

B_CHECKOUTINCLUSION

UM

VARCHAR2(30 CHAR)

Cause that made the record part of the duplicate management transaction. Possible causes are:

  • User (USER): the record was checked out by the user.

  • Same golden record (SAME_GOLDEN): the record is part of the same match group as another master that was checked out.

  • Same suggestion (SAME_SUGG): the record is part of the same suggestion as another master that was checked out.

  • Same exclusion group (SAME_XGRP): the record is part of the same exclusion group as another master that was checked out.

  • Other (OTHER): Another reason.

B_CLASSNAME

AE, GA, GD, GE, GF, GH, GI, GP, GX, MD, MH, MI, MX, SA, SD, SE, SF, UM

VARCHAR2(128 CHAR)

Unique name of class/entity to which this record belongs. For more information, see Class name.

B_CONFIRMATIONSTATUS

GD, GH, GI, GX, MD, MH, MI, MX, UM

VARCHAR2(30 CHAR)

Confirmation status for duplicate management:

  • Confirmed (CONFIRMED): indicates that a master record is confirmed in a golden record or that a golden record has all its master records confirmed.

  • Not-confirmed (NOT_CONFIRMED): indicates that a master record is not confirmed or that a golden record is entirely made of unconfirmed master records.

  • Historically confirmed (WAS_CONFIRMED, for master records): if a master record was confirmed into a golden record, but this golden record was fused into another golden record.

  • Partially confirmed (PARTIALLY_CONFIRMED for golden records only): indicates that a golden record has part of its master records confirmed.

B_CONFIRMEDSDPK

MD, MI, UM

Varies

ID of the confirmed golden record to which this master record is attached.

B_CONFSCORE

GD, GH, GI, GX, UM

NUMBER(38, 0)

Confidence score of the golden record. It is the average of the match scores in the match group.

B_CONFSCORETYPE

GD, GH, GI, GX, UM

VARCHAR2(30 CHAR)

Algorithm used to compute the confidence score:

  • Direct scoring (AVG_DIRECT_MATCH_SCORE): this method only takes into account the direct matches found with the rules. This group score is the average of these match scores. Pairs in the group that have not been directly matched by any rule are considered to have a score of zero.

  • Transitive match score (AVG_TRANS_MATCH_SCORE): this method takes into account the direct matches found with the rules, plus indirect transitive matches, which are computed. The group score is also the average of the match scores in the group.

B_CONSTRAINTNAME

AE, GE, SE

VARCHAR2(128 CHAR)

For error records, name of the constraint causing the error.

B_CONSTRAINTTYPE

AE, GE, SE

VARCHAR2(30 CHAR)

For error records, type of the constraint causing the error.

B_COPIEDFROM

SA

NUMBER(38, 0)

ID of the original record copied into this record.

B_CREATOR

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

VARCHAR2(128 CHAR)

For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was created. For source and master data, the user who has created the record in a workflow, stepper, duplicate manager, or a user name loaded by the data integration process.
This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.

B_CREDATE

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

TIMESTAMP(6)

For golden and master data, the submit timestamp of the batch into which the record was created. For source data (SD, SA, UM), the timestamp at which the source record was created.
Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit timestamp of the batch.

B_DELETEAUTHOR

GX, MX, SA, SD

VARCHAR2(128 CHAR)

Author of the delete operation.

B_DELETEDATE

GX, MX, SA, SD

TIMESTAMP(6)

Deletion timestamp of a record

B_DELETEOPERATION

GX, MX, SA, SD

VARCHAR2(128 CHAR)

Delete operation ID.

B_DELETEPATH

GX, MX, SA, SD

VARCHAR2(4000 CHAR)

Cascade path through which the record was reached during a delete. Null for record directly selected for deletion.

B_DELETETYPE

GX, MX, SA, SD

VARCHAR2(30 CHAR)

Delete type (SOFT_DELETE or HARD_DELETE for golden and master deletion). The type of delete can be LEGLESS_DELETE for golden records deleted when they lose all their master records.

B_ERROR_STATUS

GD, GH, GX, SA, SD

VARCHAR2(30 CHAR)

Error status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are:

  • VALID if the record has no error.

  • ERROR if the record has errors

  • RECYCLED if the record was recycled and considered valid

  • OBSOLETE_ERROR if the record had errors but a newer version of the record fixes them.

  • a <NULL> value also indicates a record with no error.

B_FROMBATCHID

GH, MH

NUMBER(38, 0)

Batch at which the history record was created.

B_GOLDENTYPE

GD, GH, GX

VARCHAR2(30 CHAR)

For fuzzy-matching and ID-matching entities, indicates the type of golden record—i.e., whether the golden record was created and authored only in the data hub (DE_BASED), or consolidated from master data and possibly overridden (MASTER_BASED).

B_HASOVERRIDE

GD, GH, GX

CHAR(1 CHAR)

For fuzzy-matching and ID-matching entities, this flag (0 or 1) indicates whether the golden record has override values.

B_HASSUGGMERGE

GD, GH, GI, GX, MD, MH, MI, MX, UM

CHAR(1 CHAR)

Flag (0 or 1) indicating that match and merge suggestions are available for this record.

B_ISCONFIRMED

GD, GH, GI, GX

CHAR(1 CHAR)

Flag (0 or 1) indicated whether this golden record has been confirmed (fuzzy-matched entities only).

B_LOADID

AE, SA, SD, SE, SF, UM

NUMBER(38, 0)

Load identifier used as the unique transaction ID for external application pushing data to the platform

B_MASTERSCOUNT

GD, GH, GI, GX, UM

NUMBER(38, 0)

Number of master records contributing to the golden record.

B_MATCHGRP

MD, MH, MI, MX

NUMBER(38, 0)

ID of the match group for the master record. This column is set when matching takes place.

B_MATCHRULE

DU

VARCHAR2(128 CHAR)

Match rule having caused the pair of records to match.

B_MATCHSCORE

DU

NUMBER(38, 0)

Score of the matched pair.

B_OLDMATCHGRP

MI

NUMBER(38, 0)

Previous identifier of the match group for the master record. This column is deprecated.

B_OLDSDPK

MD, MH, MI, MX

Varies

ID of the previous golden record the master record was attached to.

B_ORIGINALBATCHID

SA, SD, UM

NUMBER(38, 0)

Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager.

B_ORIGINALCONFIRMATIONSTATUS

UM

VARCHAR2(30 CHAR)

Original confirmation status in a duplicate management operation.

B_ORIGINALCONFIRMEDSDPK

UM

Varies

ID of the original confirmed golden record to which the master record was attached.

B_ORIGINALCONFSCORE

UM

NUMBER(38, 0)

Confidence score of the original golden in a duplicate management operation.

B_ORIGINALCONFSCORETYPE

UM

VARCHAR2(30 CHAR)

Algorithm used to compute the original confidence Ssore:

  • Direct scoring (AVG_DIRECT_MATCH_SCORE): this method only takes into account the direct matches found with the rules. This group score is the average of these match scores. Pairs in the group that have not been directly matched by any rule are considered to have a score of zero.

  • Transitive match score (AVG_TRANS_MATCH_SCORE): this method takes into account the direct matches found with the rules, plus indirect transitive matches, which are computed. The group score is also the average of the match scores in the group.

B_ORIGINALMASTERSCOUNT

UM

NUMBER(38, 0)

Number of master records in the original golden in a duplicate management operation.

B_ORIGINALSDPK

UM

Varies

ID of the original golden record to which the master record was attached.

B_ORIGINALXGRP

UM

RAW(16)

Original exclusion group to which the master record belonged.

B_PUBID

DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM

VARCHAR2(30 CHAR)

For matching entities, code of the publisher that published the record.

B_PUBID2

DU

VARCHAR2(30 CHAR)

For duplicate pairs, code of the publisher that published the second record.

B_SOURCEID

DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM

VARCHAR2(128 CHAR)

ID of the source record in the source publisher system (fuzzy-matched entities only). See below for a detailed explanation of the primary key options.

B_SOURCEID2

DU

VARCHAR2(128 CHAR)

For duplicate pairs, ID of the second source record in the source publisher system (fuzzy-matched entities only).

B_SUGGMERGECONFSCORE

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

Confidence Score for the suggested match group.

B_SUGGMERGECONFSCORETYPE

GD, GH, GI, GX, MD, MH, MI, MX, UM

VARCHAR2(30 CHAR)

Algorithm used to compute Confidence Score for the suggested match group:

  • Direct scoring (AVG_DIRECT_MATCH_SCORE): this method only takes into account the direct matches found with the rules. This group score is the average of these match scores. Pairs in the group that have not been directly matched by any rule are considered to have a score of zero.

  • Transitive match score (AVG_TRANS_MATCH_SCORE): this method takes into account the direct matches found with the rules, plus indirect transitive matches, which are computed. The group score is also the average of the match scores in the group.

B_SUGGMERGEID

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

ID of the merge suggested by the automated matching.

B_SUGGMERGEMASTERSCOUNT

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

Number of master records in the suggested merge.

B_TOBATCHID

GH, MH

NUMBER(38, 0)

Batch at which history record stopped being current or null if the records are still current.

B_UPDATOR

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

VARCHAR2(128 CHAR)

For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was updated. For source and master data, the latest user who has updated the record in a workflow, stepper, duplicate manager or the user name loaded by the data integration process.
This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.

B_UPDDATE

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

TIMESTAMP(6)

For golden and master data, the submit timestamp of the batch into which the record was updated. For source data (SD, SA, UM), the timestamp at which the source record was created or updated.
Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit timestamp of the batch.

B_XGRP

MD, MH, MI, MX, UM

RAW(16)

Exclusion group ID. An exclusion group represents a group of records for which a user has made split decisions.

Primary key columns

The primary key to load depends on the entity type.

Basic entities

For basic entities, Semarchy xDM uses a single identifier, stored in a column named after the physical column name specified in the primary key attribute definition. This column will exist in all tables. The identifier is simply propagated into the hub from the source records.

ID-matched entities

When using ID matching, Semarchy xDM assumes a common identifier across all systems. In this case, this common identifier is stored in a column named after the physical column name specified in the primary key attribute definition.

This column will exist in all tables. When publishing data into the hub, the middleware loads this column with the primary key from the publishing system. This identifier is simply propagated into the hub, and matching is done using this primary key.

Fuzzy-matched entities

When using fuzzy matching, Semarchy xDM assumes no common identifier across publishers. There may be two different records with the same ID in different systems. There is a need to match the records and consolidate them under a golden record having a primary key generated by the system.

In this case, the source identifier is stored in the column named B_SOURCEID, along with the publisher code stored in the B_PUBID. This is the case for the SD, SE and MD tables.

When publishing data into the hub, the middleware loads this B_SOURCEID column with a primary key value from the publishing system. If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into B_SOURCEID.

When the records are consolidated in a golden record (GD and GE tables), a system-defined primary key is generated and stored in a column named after the physical column name specified in the primary key attribute definition. This key is referred to as the golden record ID.

The MD table makes the bridge between the source ID and the golden record ID, as it contains both these values.

Reference columns

When a reference exists in the source publisher and needs to be expressed in the landing table, this reference mapping in the table structure depends on the entity type of the referenced entity.

Reference to a basic entity

For a reference to a basic entity, the referenced key stored is simply the primary key of the basic entity. As a consequence, the referenced value is stored in a single column. This column is named after the physical name provided in the reference definition and is prefixed with F_. For example, F_COUNTRIES.

Reference to an ID-matched entity

For a reference to an ID-matched entity, the referenced key is the same for all systems. As a consequence, the referenced value is stored in a single column. This column is named after the physical name provided in the reference definition and is prefixed with F_ (e.g., F_EMPLOYEE).

For example, if the Customer entity references Employee and this entity uses ID matching, SD_CUSTOMER will contain the following information.

CUSTOMER_NAME F_EMPLOYEE

Gadgetron

11

Roxxon

56

This information means that the customer Gadgetron references employee number 11, and Roxxon references employee number 56. This employee number is the same in all systems.

Reference to a fuzzy-matched entity

For a reference to a fuzzy-matched entity, the referenced may point to a master record from a publisher, or directly to a golden record in the hub:

  • For a reference to a master record:

    • The referenced value is stored in a column named after the physical name provided in the definition of the reference and prefixed with FS_ (for foreign source ID). For example, FS_CUSTOMER.

    • The referenced publisher is also stored, in a column named after the physical name provided in the definition of the reference and prefixed with FP_ (for foreign publisher). For example, FP_CUSTOMER.

  • For a reference to a golden record:

    • The reference value is stored in a column named after the physical name provided in the definition of the reference and prefixed with F_ (for foreign ID).

Reference to a master record

The Contact entity references Customer and this entity uses fuzzy matching, SD_CONTACT contains the following information.

FIRST_NAME LAST_NAME FP_CUSTOMER FS_CUSTOMER

John

Doe

CRM

1235

Jane

Smith

MKT

A3251

This information means that the contact John Doe references the customer with the primary key 1235 in the CRM publisher, and that Jane Smith references the customer with the primary key A3251 in the MKT publisher.

Reference to a golden record

Contact references Customer and this entity uses fuzzy matching, SD_CONTACT contains the following information.

FIRST_NAME LAST_NAME F_CUSTOMER

John

Doe

6598

Jane

Smith

6556

This information means that the contact John Doe references the customer golden record with the golden record ID 6598, and that Jane Smith references the customer golden record with the golden record ID 6556.

If both the reference to the master and golden records are loaded, only the reference to the golden record is taken into account.
To clear a reference, you must push a null value in the three FS_, FP_ and F_ columns.

Class name

Several entities involved in an inheritance relationship have their data stored in the same set of tables. These tables store the superset of the attributes of the parent and all its child entities. The B_CLASSNAME column is used to identify the class (entity) of a record in a table.

For example, when Person and Company inherit from the Party entity, the resulting table is named after the parent entity (i.e., Party), and will contain all the attributes of Person and Company as well. In the GD_PARTY table, records representing persons will have B_CLASSNAME='Person'. When publishing person or company information in the SD_PARTY table, the middleware must set B_CLASSNAME='Person' or B_CLASSNAME='Company' accordingly.

Constraints

The only constraints physically enforced in the tables are the primary keys and the not null columns. These constraints apply to system columns only.

For example, on an SD_ table, the following constraints are enforced:

  • primary key on B_LOADID, B_PUBID, B_SOURCEID (or the primary key column for ID-matched entities)

  • B_LOADID, B_CLASSNAME, B_PUBID and B_SOURCEID (or the primary key column for ID-matched entities) are not null

Other constraints defined in the model (mandatory attributes, references, etc.) are not enforced in the physical model, but verified during the validation phase.