Publish data using SQL

Publishing data into the data hub using SQL is the recommended method for inbound batch data integration.

Publishing data involves function calls to initialize and submit the external load, plus DML statements that load the data into the landing tables (SD tables for matching entities, SA tables for basic entities).

Overview

In this approach, external loads are handled using the SQL interface. It works as follows:

  1. The external load is initialized using a function call to the GET_NEW_LOADID function on the repository schema. This function call returns a Load ID.

  2. SD (SA for Basic Entities) tables are loaded using SQL inserts issued on the data location schema.

  3. The external load is submitted or cancelled using function calls on the repository schema:

    • SUBMIT_LOAD to submit a load identified by its Load ID. This function call returns a Batch ID.

    • CANCEL_LOAD to cancel a load identified by its Load ID.

This approach is recommended when the middleware is an ETL or data integration product.

Alternatively, you can use continuous loads to load data into the SD tables with the Load ID of the continuous load, without having to explicitly initialize and submit individual external loads.
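The steps above can be sketched end to end in Oracle PL/SQL. This is a minimal sketch: the repository schema (REPO), data location (CustomerMDMHub), job name (INTEGRATE_DATA), publisher code (CRM), and user name match the examples later in this section, while the SD_CUSTOMER attribute column CUSTOMER_NAME is a hypothetical illustration.

```sql
DECLARE
  vLoad_id  NUMBER;
  vBatch_id NUMBER;
BEGIN
  -- 1. Initialize the external load on the repository schema.
  vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
      'CustomerMDMHub', 'Custom ETL', 'Daily updates', 'John Doe');

  -- 2. Load the landing tables on the data location schema.
  INSERT INTO SD_CUSTOMER (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CUSTOMER_NAME)
  VALUES (vLoad_id, 'Customer', 'CRM', '81239', 'Acme Corp');
  COMMIT;

  -- 3. Submit the load; this triggers the integration job and returns a Batch ID.
  vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
      vLoad_id, 'INTEGRATE_DATA', 'John Doe');
END;
/
```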

Initialize a load

Initializing an external load uses the following parameters:

  • data_location_name: name of the data location. This data location is the one hosting the data hub into which the external load is performed.

  • program_name: this parameter is informational only. It identifies the middleware performing the external load. For example: 'ETL Custom Script'.

  • load_description: this parameter is informational only. It describes the nature of this load. For example: 'Daily updates for Customers and Contacts'.

  • user_name: name of the user initializing the external load. This user may or may not be a user defined in the security realm of the application server.

The GET_NEW_LOADID function call initializes an external load and returns a Load ID. It is issued on the repository database schema.

In Oracle, the integration load functions are part of an INTEGRATION_LOAD package. You must prefix the function name with the name of this package.
In SQL Server, the integration load functions are procedures in the repository database that return values such as the Load ID.
Example 1. Syntax to initialize an external load for Oracle
vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
	'<data_location_name>',  /* Data Location Name (as in the UI) */
	'<program_name>',        /* Informational. Identifies the Middleware */
	'<load_description>',    /* Informational. Describes the load. */
	'<user_name>'            /* User initializing the load. */
	  );
Example 2. Syntax to initialize an external load for PostgreSQL
vLoad_id := <repository_schema>.get_new_loadid(
	'<data_location_name>',  /* Data Location Name (as in the UI) */
	'<program_name>',        /* Informational. Identifies the Middleware */
	'<load_description>',    /* Informational. Describes the load. */
	'<user_name>'            /* User initializing the load. */
	  );
Example 3. Syntax to initialize an external load for SQL Server
EXEC <repository_database>.dbo.GET_NEW_LOADID
    @RETURN_VALUE_ARGUMENT OUTPUT,
    @P_DATA_LOCATION_NAME = '<data_location_name>', /* Data Location Name (as in the UI) */
    @P_PROGRAM_NAME = '<program_name>',		/*  Informational. Identifies the Middleware */
    @P_LOAD_DESCRIPTION = N'<load_description>',/* Informational. Describes the load. */
    						/* Note the N to allow accented characters */
    @P_USERNAME = '<user_name>' 		/* User initializing the load. */

The following example performs a function call initializing an external load on the CustomerMDMHub data location. The repository schema is REPO. The returned Load ID is stored in the vLoad_id variable.

Example 4. Initialize an external load (Oracle)
vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
	'CustomerMDMHub',
	'Custom ETL',
	'Initial Load for the Hub',
	'John Doe' );

Load data

Loading data consists of inserting new source records in the SD (and SA for Basic Entities) tables for the set of entities taken into account in the integration job.

When loading data in the SD or SA tables:

  • You use SQL insert commands or your regular ETL/Data Integration Platform.

  • The insert commands are issued on the SD or SA tables stored in the data location schema.

Tables to load

Make sure to load all the SD and SA tables for the entities that will be taken into account in the integration job. For example, if the integration job processes the Customer and Contact entities, then you should load the SD_CUSTOMER and SD_CONTACT tables.
If you insert data into the SD table of an entity that is not taken into account by the integration job, this data is ignored. For example, if the integration job processes the Customer and Contact entities, data loaded into the SD_EMPLOYEE table is not taken into account to certify golden records for the Employee entity.

Referential integrity and load order

There is no required order for loading the SD and SA tables, as no foreign keys are implemented on these tables.
Reference validation is performed by the integration job; as a consequence, references between entities must be loaded as indicated in the Columns to load section.

Using transactions

It is recommended to use a database transaction when writing to the SD and SA tables.

  • For external loads: canceling an external load stops the load but does not delete records from the tables. Writing in auto-commit mode and then canceling the load leaves useless records in the tables.
    Using a transaction gives you the ability to roll back all changes on the tables when canceling a load.

  • For continuous loads: it is strongly recommended to use a transaction and commit only when all tables are loaded. The continuous load consumes data committed in the tables on its own schedule; loading tables without a transaction may cause it to start processing a data batch before that batch is complete.
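As a sketch, an Oracle PL/SQL flow along these lines keeps the load atomic. The staging tables (STG_CUSTOMERS, STG_CONTACTS), their source ID columns, and the vLoad_id variable (assumed declared and initialized by GET_NEW_LOADID in an enclosing block) are illustrative assumptions:

```sql
BEGIN
  -- Load all SD tables within a single transaction.
  INSERT INTO SD_CUSTOMER (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID)
  SELECT vLoad_id, 'Customer', 'CRM', CUST_ID FROM STG_CUSTOMERS;

  INSERT INTO SD_CONTACT (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID)
  SELECT vLoad_id, 'Contact', 'CRM', CONTACT_ID FROM STG_CONTACTS;

  COMMIT;  -- commit only once all tables are loaded
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;  -- undo all inserts from this load, leaving no orphan rows
    REPO.INTEGRATION_LOAD.CANCEL_LOAD(vLoad_id, 'John Doe');
    RAISE;
END;
/
```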

Columns to load

This section provides some guidance for loading the columns of the SD and SA tables.

System columns (mandatory)

The following system columns must be loaded as indicated:

  • B_LOADID: load this column with the Load ID provided by the GET_NEW_LOADID function call, or with the ID of the continuous load. Note that you can use the GET_CONTINUOUS_LOADID function to retrieve a continuous load ID from the continuous load name.

  • B_CLASSNAME: name of the entity (or class) being loaded. When inheritance is used, the same table stores data for all parent and child classes. Explicitly set B_CLASSNAME to the name of the entity for which data is being published. For example: Person, Party, Company. See Class Name for more details.

  • B_PUBID: Load this column with a Publisher Code for ID Matched and Fuzzy Matched entities. For example: CRM, MKT, etc. This publisher code identifies the publisher (application that publishes the data) and should be declared in the model edition. The list of publisher codes is available by double-clicking the Publishers node in the Model Design view in the Model Design perspective. If the publisher code is unknown to the model, data from this publisher is processed, but this publisher will have the lowest ranking in a Preferred Publisher consolidation strategy.

Basic entities assume a single source, and do not use the B_PUBID column.
Publisher codes are case-sensitive. Make sure to load B_PUBID with the publisher code as defined in the model. Publisher codes may contain uppercase letters, digits and underscores.
Within a single load, you can load records from various publishers, using the B_PUBID column to identify each publisher.
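A minimal sketch of an insert covering these system columns, for a fuzzy-matched Contact entity; the bind variable and the CONTACT_NAME attribute column are illustrative assumptions:

```sql
INSERT INTO SD_CONTACT (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CONTACT_NAME)
VALUES (
  :vLoad_id,    -- Load ID from GET_NEW_LOADID or GET_CONTINUOUS_LOADID
  'Contact',    -- entity (class) name being published
  'CRM',        -- publisher code as declared in the model (case-sensitive)
  'C-00412',    -- primary key of the record in the source system
  'Jane Smith');
```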

Primary keys (mandatory)

For matching entities, the primary key that you load into the SD table identifies the source record from the publisher (identified by the B_PUBID publisher code). This primary key allows tracking the creation or update of a source record, and propagating the corresponding data changes to the golden record through the consolidation process.

For basic entities, the primary key that you provide in the SA table is that of the golden record, and allows updating this record.

The primary key column to load depends on the Entity Type of the entity.

Basic entity

If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

ID-matched entity

If the entity uses ID Matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

Fuzzy-matched entity

If the entity uses Fuzzy Matching, then you must load into the B_SOURCEID column the value of the primary key from the source system. If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them in the B_SOURCEID column.

The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.
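A sketch of such a concatenation for a composite source key. The SD_ORDER_LINE table, the STG_ORDER_LINES staging table, and the ORDER_NO/LINE_NO columns are hypothetical; the separator is a convention you must apply consistently across all publishers:

```sql
INSERT INTO SD_ORDER_LINE (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID)
SELECT :vLoad_id, 'OrderLine', 'ERP',
       ORDER_NO || '|' || LINE_NO   -- composite key concatenated into VARCHAR(128)
FROM  STG_ORDER_LINES;
```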

References (mandatory)

When loading data for entities that are related by a reference relationship, you must load the referencing entity with the value of the referenced primary key. The columns to load differ depending on the Entity Type of the referenced entity.

Reference to a basic entity

If the referenced entity is a basic entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

Reference to an ID-matched entity

If the referenced entity is an ID Matched entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

For example, if Customer references Employee and this entity uses ID Matching, then you must load into SD_CUSTOMER the F_EMPLOYEE column with the primary key of the source employee record referenced by each customer record.

Reference to a fuzzy-matched entity

If the referenced entity is a fuzzy-matched entity, then you need to load two columns:

  • FS_<Physical Name of the Reference To Role Name>: Load this column with the Source ID of the referenced record.

  • FP_<Physical Name of the Reference To Role Name>: Load this column with the code of the publisher of the referenced record.

Note that these columns must be considered together. Do not load the FP_ column with a publisher code and leave FS_ null, or vice versa.

For example, if Contact references Customer and the Customer entity uses fuzzy matching, you must load the following columns into the SD_CONTACT table:

  • FP_CUSTOMER: Code of the publisher providing the customer referenced by the given contact, e.g., MKT.

  • FS_CUSTOMER: Source ID of the customer referenced by the given contact, e.g., 81239.
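As a sketch, the insert for such a contact record could look as follows, reusing the MKT publisher code and 81239 source ID above; the bind variable and the 'C-00412' contact source ID are illustrative:

```sql
INSERT INTO SD_CONTACT (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID,
                        FP_CUSTOMER, FS_CUSTOMER)
VALUES (:vLoad_id, 'Contact', 'CRM', 'C-00412',
        'MKT',      -- publisher of the referenced customer record
        '81239');   -- source ID of the referenced customer record
```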

Attribute columns

You should load the attribute columns relevant for the entity you are loading.

Make sure to load:

  • The attribute columns that make sense for the entity class (B_CLASSNAME) you are loading.

  • The mandatory attribute columns. Otherwise, pre-consolidation validation may reject source records with null values.

  • The columns for attributes using a list of values (LOV) type. If these are loaded with values outside the LOV range, pre-consolidation validation may reject the source records.

Attributes may be left null or loaded with incorrect values if these values are set or fixed by enrichers, as enrichers are executed before any validation.

Loading Date and Timestamp Columns

Date attributes convert to date columns in the database. The behavior of these date columns differs depending on the database:

  • Dates in PostgreSQL and SQL Server do not contain a time or timestamp portion and do not require specific transformation.

  • Oracle dates include a time portion. This portion is automatically truncated by the certification job using the TRUNC(date) function. Integration specialists should be aware of this automated transformation when loading a date, and may consider performing this truncation in their integration flows.

Timestamp attributes convert to timestamp columns that contain a timezone portion. Semarchy xDM stores timestamps in the database in the timezone of the application server hosting the Semarchy xDM application, and expects loaded timestamps in that timezone. When loading timestamps, integration specialists should take into account timezone differences and possible timezone conversions between their integration component and the data location's database server.
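For Oracle, the truncation mentioned above can be performed in the integration flow itself. A sketch, where the BIRTH_DATE column and STG_CUSTOMERS staging table are hypothetical:

```sql
INSERT INTO SD_CUSTOMER (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, BIRTH_DATE)
SELECT :vLoad_id, 'Customer', 'CRM', CUST_ID,
       TRUNC(BIRTH_DATE)  -- drop the time portion, mirroring the certification job
FROM  STG_CUSTOMERS;
```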

Other columns (optional)

The following columns do not need to be loaded or can be optionally loaded:

  • B_ORIGINALBATCHID: This column is not used for external loads and should not be loaded.

  • B_CREATOR, B_UPDATOR: These columns can be optionally loaded to store the users who have created or updated the records in the source systems. If left null, these columns are automatically set to the name of the user who submits the batch.

  • B_CREDATE, B_UPDDATE: These columns can be optionally loaded to store the creation and update date (timestamp) of the records in the source systems. If left null, these columns are set to the submit date (timestamp) of the batch. Note that values loaded here are not propagated beyond the SD table.

Submit a load

Submitting an external load uses the following parameters:

  • load_id: Load ID returned by the load initialization.

  • integration_job: Name of the integration job to process this load.

  • user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The SUBMIT_LOAD function call submits an external load identified by its Load ID and returns a Batch ID. It is performed on the repository database schema.

Example 5. Syntax to submit a load for Oracle
vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
	<load_id>,		/* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
	'<integration_job>',	/* Name of the Integration Job to trigger. */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 6. Syntax to submit a load for PostgreSQL
vBatch_id := <repository_schema>.submit_load(
	<load_id>,		/* Load ID returned by get_new_loadid */
	'<integration_job>',	/* Name of the Integration Job to trigger. */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 7. Syntax to submit a load for SQL Server
EXEC <repository_database>.dbo.SUBMIT_LOAD
	@RETURN_VALUE_ARGUMENT OUTPUT,
	@P_LOAD_ID = @LOAD_ID,
	@P_INTEGRATION_JOB_NAME = '<integration_job>',/* Name of the Integration Job to trigger. */
	@P_USERNAME = '<user_name>'			/* User who has initialized the load. */

The following example performs a function call to submit an external load identified by the Load ID 22. It submits it with the job name INTEGRATE_DATA. The repository schema is REPO. The returned Batch ID is stored in the vBatch_id variable.

Example 8. Submit an external load identified by the load ID 22 (Oracle)
vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
	22,
	'INTEGRATE_DATA',
	'John Doe' );

Cancel a load

Canceling a load is performed using the CANCEL_LOAD function with the following parameters:

  • load_id: Load ID returned by the load initialization.

  • user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The CANCEL_LOAD procedure cancels an external load identified by its Load ID. It is performed on the repository database schema.

This procedure does not flush the content of the SD_% tables loaded during the external load. This must be taken care of separately.
Example 9. Syntax to cancel a load for Oracle
<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
	<load_id>,		/* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 10. Syntax to cancel a load for PostgreSQL
<repository_schema>.cancel_load(
	<load_id>,		/* Load ID returned by get_new_loadid */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 11. Syntax to cancel a load for SQL Server
EXEC <repository_database>.dbo.CANCEL_LOAD
	@P_LOAD_ID = @LOAD_ID,
	@P_USERNAME = '<user_name>'	/* User who has initialized the load. */
Example 12. Call INTEGRATION_LOAD.CANCEL_LOAD to cancel an external load identified by the load ID 22 (Oracle)
REPO.INTEGRATION_LOAD.CANCEL_LOAD(
	22,
	'John Doe' );