Publish Data Using SQL
Publishing data in the data hub using SQL is the optimal method for inbound batch data integration.
Publishing data is done using function calls to initialize/submit the external load and DML statements to load the data into the landing (SD for matching entities, SA for basic entities) tables.
Overview
In this approach, external loads are handled using the SQL interface. It works as follows:
-
The external load is initialized using a function call to the
GET_NEW_LOADID
function on the repository schema. This function call returns a Load ID. -
SD (SA for Basic Entities) tables are loaded using SQL inserts issued on the data location schema.
-
The external load is submitted or cancelled using function calls on the repository schema:
-
SUBMIT_LOAD
to submit a load identified by its Load ID. This function call returns a Batch ID. -
CANCEL_LOAD
to cancel a load identified by its Load ID.
-
This approach is recommended when the middleware is an ETL or data integration product. |
Alternately, you can use continuous loads to load data into the SD tables with the Load ID of the Continuous Load, without having to explicitly initialize and submit individual external loads.
Initialize a Load
Initializing an external load uses the following parameters:
-
data_location_name
: name of the data location. This data location is the one hosting the data hub into which the external load is performed. -
program_name
: This variable is for information only. It is used to identify the middleware performing the external load. For example 'ETL Custom Script'. -
load_description
: This variable is for information only. It is used to describe the nature of this load. For example: 'Daily updates for Customers and Contacts'. -
user_name
: name of the user initializing the external load. This user may or may not be a user defined in the security realm of the application server.
This function call to GET_NEW_LOADID
initializes an external load and returns a Load ID. It is performed on the repository database schema.
In Oracle, the integration load functions are part of a INTEGRATION_LOAD package. You must prefix the function name with the name of this package.
|
In SQL Server, the integration load functions are procedures in the repository database that return values such as the Load ID. |
vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
'<data_location_name>', /* Data Location Name (as in the UI) */
'<program_name>', /* Informational. Identifies the Middleware*/
'<load_description>', /* Informational. Describes the load. */
'<user_name>' /* User initializing the load. */
);
vLoad_id := <repository_schema>.get_new_loadid(
'<data_location_name>', /* Data Location Name (as in the UI) */
'<program_name>', /* Informational. Identifies the Middleware*/
'<load_description>', /* Informational. Describes the load. */
'<user_name>' /* User initializing the load. */
);
DECLARE
@v_load_id int
BEGIN
EXEC dbo.GET_NEW_LOADID
@return_value_argument=@v_load_id OUTPUT,
@p_data_location_name = 'SEMARCHY_53DL', /* Data Location Name (as in the UI) */
@p_program_name = 'manual_etl_script', /* Informational. Identifies the Middleware */
@p_load_description = N'load_data_update',/* Informational. Describes the load. */
/* Note the N to allow accented characters */
@p_username = 'semadmin' /* User initializing the load. */
PRINT @v_load_id
END
The following example performs a function call initializing an external load on the CustomerMDMHub data location. The repository schema is REPO
. The returned Load ID is stored in the vLoad_id
variable.
vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
'CustomerMDMHub',
'Custom ETL',
'Initial Load for the Hub',
'John Doe' );
Load Data
Loading data consists of inserting new source records in the SD (and SA for Basic Entities) tables for the set of entities taken into account in the integration job.
When loading data in the SD or SA tables:
-
You use SQL insert commands or your regular ETL/Data Integration Platform.
-
The insert commands are issued on the SD or SA tables stored in the data location schema.
Tables to Load
Make sure to load all the SD and SA tables for the entities that will be taken into account in the integration job. For example, if the integration job processes the Customer and Contact entities, then you should load the SD_CUSTOMER
and SD_CONTACT
tables.
If you insert data into the SD table of an entity that is not taken into account by the integration job, this data will be ignored. For example, if the integration job processes the Customer and Contact entities, data loaded the SD_EMPLOYEE
table will not be taken into account by the integration job to certify golden records for the Employee entity.
Referential Integrity and Load Order
There is no required order to load the SD and SA tables, as no foreign keys are implemented on these tables.
Reference validation is performed by the integration job, as a consequence references between entities must be loaded as indicated in the Columns to Load section.
Using Transactions
It is recommended to use a database transaction when writing to the SD and SA tables.
-
For external loads: Canceling an external load cancels the load but does not delete records from the tables. Writing in an auto-commit transaction in the tables then canceling the load leaves useless information in the tables.
Using a transaction gives you the capability to rollback all changes on the tables when canceling a load. -
For continuous loads, it is strongly recommended to use a transaction and commit only when all tables are loaded: as the continuous load consumes data committed in the table on its own schedule, loading tables without a transaction may cause the continuous load to start processing a data batch before it is completed.
Columns to Load
This section provides some guidance for loading the columns of the SD and SA tables.
System Columns (Mandatory)
The following system columns must be loaded as indicated:
-
B_LOADID
: Load this column with the Load ID provided byGET_NEW_LOADID
function call, of with the ID of the Continuous Load. Note that you can use theGET_CONTINUOUS_LOADID
function to retrieve a continuous load ID from the continuous load name. -
B_CLASSNAME
: Name of the entity (or class) being loaded. When inheritance is used, the same table stores data for all parent and child classes. Set explicitlyB_CLASSNAME
to the name of the entity for which data is being published. For example:Person
,Party
,Company
. See Class Name for more details. -
B_PUBID
: Load this column with a Publisher Code for ID Matched and Fuzzy Matched entities. For example:CRM
,MKT
, etc. This publisher code identifies the publisher (application that publishes the data) and should be declared in the model edition. The list of publisher codes is available by double-clicking the Publishers node in the Model Design view in the Model Design perspective. If the publisher code is unknown to the model, data from this publisher is processed, but this publisher will have the lowest ranking in a Preferred Publisher consolidation strategy.
Basic entities assume a single source, and do not use the B_PUBID column.
|
Publisher codes are case sensitive. Make sure to load B_PUBID with the publisher code as defined in the model. Publisher codes may contain uppercase letters, digits and underscores.
|
Within a single load, you can load the records from various publishers, using the B_PUBID column to identify each publisher
|
Primary Keys (Mandatory)
For matching entities, the primary key that you load into the SD table allows identifying the source record from the publisher (identified by Publisher Code B_PUDIB
). This primary key will allow taking into account the creation or update of a source record and report through the consolidation process the corresponding data changes to the golden record.
For basic entities, the primary key that you provide in the SA table is the one of the golden record and will allow updating this record.
The primary key column to load depends on the Entity Type of the entity.
Basic Entity
If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.
ID Matched Entity
If the entity uses ID Matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.
Fuzzy Matched Entity
If the entity uses Fuzzy Matching, then you must load into the B_SOURCEID
column the value of the primary key from the source system.
If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them in the B_SOURCEID
column.
The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.
|
References (Mandatory)
When loading data for entities that are related by a reference relationship, you must load the referencing entity with the value of the referenced primary key. The columns to load differ depending on the Entity Type of the referenced entity.
Reference to a Basic Entity
If the referenced entity is an Basic entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>
.
Reference to an ID Matched Entity
If the referenced entity is an ID Matched entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>
.
For example, if Customer references Employee and this entity uses ID Matching, then you must load into SD_CUSTOMER
the F_EMPLOYEE
column with the primary key of the source employee record referenced by each customer record.
Reference to a Fuzzy Matched Entity
If the referenced entity is a Fuzzy Matched entity, then you need to load two columns:
-
FS_<Physical Name of the Referenced To Role Name>
: Load this column with the Source ID of the referenced record. -
FP_<Physical Name of the Referenced To Role Name>
: Code of the publisher of the referenced record.
Note that these columns should be considered together. You should not load the FP_ column with a publisher code and leave FS_ to a null value, and vice versa.
|
For example, if Contact references Customer and this entity use Fuzzy Matching, you must load the following columns into the SD_CONTACT
table:
-
FP_CUSTOMER
: Code of the publisher providing the customer referenced by the given contact, e.g.,MKT
. -
FS_CUSTOMER
: Source ID of the customer referenced by the given contact, e.g.,81239
.
Attribute Columns
You should load the attribute columns relevant for the entity you are loading.
Make sure to load:
-
The attribute columns that make sense for the entity class (
B_CLASSNAME
) you are loading. -
The mandatory attribute columns. If not, pre-consolidation validation may reject source records with null values.
-
The columns for attributes using a list of values type. If these are loaded with values out of the LOV range of values, pre-consolidation validation may reject source records.
Attributes may be loaded with null or incorrect values if the values are set or modified by the enrichers. Enrichers are executed before any validation. |
Loading Date and Timestamp Columns Date attributes convert to Date columns in the database. The behavior of these date columns differ depending on the database:
Timestamp attributes convert to Timestamp columns that contain a timezone portion. Semarchy xDM stores and expects in the database timestamps in the timezone of the application server hosting the Semarchy xDM application. When loading timestamps, integration specialists should take into account the timezone differences and possible timezone conversion between their integration component and the data location’s database server. |
Other Columns (Optional)
The following columns do not need to be loaded or can be optionally loaded:
-
B_ORIGINALBATCHID
: This column is not used for external loads and should not be loaded. -
B_CREATOR
,B_UPDATOR
: These columns can be optionally loaded to store the users who have created or updated the records in the source systems. If left null, these columns are automatically set to the name of the user who submits the batch. -
B_CREDATE
,B_UPDDATE
: These columns can be optionally loaded to store the creation and update date (timestamp) of the records in the source systems. If left null, these columns are set to the submit date (timestamp) of the batch. Note that values loaded here are not propagated beyond the SD table.
Submit a Load
Submitting an external load uses the following parameters:
-
load_id
: Load ID returned by the load initialization. -
integration_job
: Name of the integration job to process this load. -
user_name
: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.
The SUBMIT_LOAD
function call submits an external load identified by its Load ID and returns a Batch ID. It is performed on the repository database schema.
vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
<load_id> /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
'<integration_job>' /* Name of the Integration Job to trigger. */
'<user_name>' /* User who has initialized the load. */
);
vBatch_id := <repository_schema>.submit_load(
<load_id> /* Load ID returned by get_new_loadid */
'<integration_job>' /* Name of the Integration Job to trigger. */
'<user_name>' /* User who has initialized the load. */
);
DECLARE
@return_value int,
@return_value_argument numeric (18,0)
EXEC @return_value = dbo.SUBMIT_LOAD
@return_value_argument=@return_value_argument OUTPUT,
@P_LOAD_ID=<LoadID from GetNewLoad>,
@P_INTEGRATION_JOB_NAME ='JobName',
@P_USERNAME ='semadmin'
SELECT @return_value_argument
The following example performs a function call to submit an external load identified by the Load ID 22. It submits it with the job name INTEGRATE_DATA
. The repository schema is REPO
. The returned Batch ID is stored in the vBatch_id
variable.
vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
22,
'INTEGRATE_DATA',
'John Doe' );
Cancel a Load
Canceling a load is performed using the CANCEL_LOAD
function with the following parameters:
-
load_id
: Load ID returned by the load initialization. -
user_name
: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.
The CANCEL_LOAD
procedure cancels an external load identified by its Load ID. It is performed on the repository database schema.
This procedure does not flush the content of the SD_% tables loaded during the external load. This must be taken care of separately.
|
<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
<load_id> /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
'<user_name>' /* User who has initialized the load. */
);
<repository_schema>.cancel_load(
<load_id> /* Load ID returned by get_new_loadid */
'<user_name>' /* User who has initialized the load. */
);
EXEC <repository_database>.dbo.CANCEL_LOAD
@P_LOAD_ID = @LOAD_ID,
@P_USERNAME = '<user_name>' /* User who has initialized the load. */
INTEGRATION_LOAD.CANCEL_LOAD
to cancel an external load identified by the Load ID 22. (Oracle)REPO.INTEGRATION_LOAD.CANCEL_LOAD(
22,
'John Doe' );