This tutorial teaches you how to load data into Semarchy xDM using SQL. You will learn basic and advanced queries as well as important integration concepts. This tutorial is based on the Customer B2C demo application, so you will gain experience loading fuzzy-matching entities and learn how integration jobs work.

What you'll learn

The tutorial videos use Oracle and SQL Developer. If you are using PostgreSQL, you can run the same queries in pgAdmin 4.

Enjoy this tutorial!

Before you start

Before following this tutorial, complete the first two tutorials in the Integration track: "01. Configuring Jobs, Continuous Loads" and "02. Query Data via SQL". You can find these tutorials from the main menu.

GO TO TUTORIALS

If you have not already done so, click the DOWNLOAD TUTORIAL RESOURCES button to download the resources, such as SQL helper files, used in this tutorial.

DOWNLOAD TUTORIAL RESOURCES

The estimated duration of this unit is about 1 hour.

You can watch a video demonstrating the steps you will follow in this tutorial by clicking the "Watch video" link below:

Watch video: Load Data via SQL

In each section of this tutorial, you will find links that take you directly to the part of the video demonstrating that step. Although each link jumps to the pertinent section, the video then continues playing to the end, so stop it after each segment.

Knowing how to load data using SQL is a critical step in building a successful MDM program. While you would ordinarily rely on a middleware tool or a custom program to load data into xDM, it is still important for you, as the integration developer, to understand the underlying methods described in this tutorial unit.

A core understanding of how to load data by running scripts manually will inform how you set up your ETL tool to load data into xDM.


What you'll learn

Review original customer data

Before loading updated customer data, let's review the existing customer data in the Customer B2C application:

  1. Navigate to the Customers view under Browse Data.
  2. Look for the person record with the name Abe Corbett. You may need to sort the First Name column from A-Z. Note this customer's first and last name.

    You should see "Abe Corbett".
  3. Look for the person record with the name Carroll Spark. Note this record's Person Type.

    You should see Carroll's Person Type is "Prospect".

Next, you will set up your staging environment to load customer updates. Ensure you have access to the database and a SQL client.

Video: Review original customer data

Set up staging data to load

Let's set up the staging table with the original customer data and customer updates. You can think of this staging table as a simulation of a live data feed where you typically get new data from source systems to load into xDM for updating customer information.

Let's set up the staging table in the SEMARCHY_STG schema and load the dataset.

  1. Connect to the SEMARCHY_STG schema.
  2. Run the create table statement to set up a landing table, SOURCE_PERSON in the staging schema SEMARCHY_STG.

Oracle

2-update-1-create-source-person-staging-table.sql

CREATE TABLE SEMARCHY_STG.SOURCE_PERSON (
     B_CLASSNAME varchar(128)
    ,B_PUBID varchar(128)
    ,B_SOURCEID varchar(128)
    ,FIRST_NAME varchar(128)
    ,LAST_NAME varchar(128)
    ,DATE_OF_BIRTH date
    ,ADDSTREET varchar(128)
    ,ADDCITY varchar(128)
    ,ADDSTATE varchar(128)
    ,ADDPOSTAL_CODE varchar(128)
    ,ADDCOUNTRY varchar(128)
    ,SOURCE_EMAIL varchar(128)
    ,SOURCE_PHONE varchar(128)
    ,PERSON_TYPE varchar(128)
    ,VALUE_STATUS varchar(128)
)
;

PostgreSQL

2-update-1-create-source-person-staging-table.sql

create table semarchy_stg.source_person (
     b_classname varchar(128)
    ,b_pubid varchar(128)
    ,b_sourceid varchar(128)
    ,first_name varchar(128)
    ,last_name varchar(128)
    ,date_of_birth date
    ,addstreet varchar(128)
    ,addcity varchar(128)
    ,addstate varchar(128)
    ,addpostal_code varchar(128)
    ,addcountry varchar(128)
    ,source_email varchar(128)
    ,source_phone varchar(128)
    ,person_type varchar(128)
    ,value_status varchar(128)
)
;
  3. Run the insert statement to load records into the landing table SOURCE_PERSON created in the previous step. Click the link to get the full insert statement.

Oracle

2-update-2-insert-source-person-staging-table.sql

INSERT INTO SEMARCHY_STG.SOURCE_PERSON (B_CLASSNAME, B_PUBID, B_SOURCEID, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH, ADDSTREET, ADDCITY, ADDSTATE, ADDPOSTAL_CODE, ADDCOUNTRY, SOURCE_EMAIL, SOURCE_PHONE, PERSON_TYPE, VALUE_STATUS) VALUES ('Person', 'CRM', '1388210', 'Abe', 'Corbett-Lopéz', to_date('10/06/1986', 'MM-DD-YYYY'), '12 S 12th St', 'Philadelphia', 'PA', '19107', 'US', 'abraham@corbett.com', '215-627-9067', 'PROSPECT', '');

INSERT INTO SEMARCHY_STG.SOURCE_PERSON (B_CLASSNAME, B_PUBID, B_SOURCEID, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH, ADDSTREET, ADDCITY, ADDSTATE, ADDPOSTAL_CODE, ADDCOUNTRY, SOURCE_EMAIL, SOURCE_PHONE, PERSON_TYPE, VALUE_STATUS) VALUES ('Person', 'CRM', '1392270', 'Adele', 'Richardson', to_date('01/09/1981', 'MM-DD-YYYY'), '15 Pleasant Ave E', 'Paramus', 'NJ', '07652', 'US', 'adelaide@mackson.com', '201-845-4885', 'CUSTOMER', 'NORMAL');

/* Get the full insert statement at https://github.com/semarchy/xdm-tutorials/blob/master/Track3-Resources/HelperFiles/Track3-Unit3/Oracle/2-update-2-insert-source-person-staging-table.sql */

COMMIT;

PostgreSQL

2-update-2-insert-source-person-staging-table.sql

insert into semarchy_stg.source_person (b_classname, b_pubid, b_sourceid, first_name, last_name, date_of_birth, addstreet, addcity, addstate, addpostal_code, addcountry, source_email, source_phone, person_type, value_status) values ('Person', 'CRM', '1388210', 'Abe', 'Corbett-Lopéz', to_date('10/06/1986', 'MM-DD-YYYY'), '12 S 12th St', 'Philadelphia', 'PA', '19107', 'US', 'abraham@corbett.com', '215-627-9067', 'PROSPECT', NULL);

insert into semarchy_stg.source_person (b_classname, b_pubid, b_sourceid, first_name, last_name, date_of_birth, addstreet, addcity, addstate, addpostal_code, addcountry, source_email, source_phone, person_type, value_status) values ('Person', 'CRM', '1392270', 'Adele', 'Richardson', to_date('01/09/1981', 'MM-DD-YYYY'), '15 Pleasant Ave E', 'Paramus', 'NJ', '07652', 'US', 'adelaide@mackson.com', '201-845-4885', 'CUSTOMER', 'NORMAL');


/* Get the full insert statement at https://github.com/semarchy/xdm-tutorials/blob/master/Track3-Resources/HelperFiles/Track3-Unit3/PostgreSQL/2-update-2-insert-source-person-staging-table.sql */

commit;
  4. Be sure to commit the insert statement.
  5. Check that you inserted 255 records into the SOURCE_PERSON table.
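You can verify the row count directly in your SQL client. A minimal check, runnable on both Oracle and PostgreSQL, assuming you are still connected to the staging database:

```sql
-- Count the staged rows; expect 255 if all inserts committed successfully.
SELECT COUNT(*) AS row_count
FROM SEMARCHY_STG.SOURCE_PERSON;

-- Optionally, break the count down by publisher to spot a partial load.
SELECT B_PUBID, COUNT(*) AS rows_per_publisher
FROM SEMARCHY_STG.SOURCE_PERSON
GROUP BY B_PUBID;
```

If the total is lower than expected, re-run the insert script and make sure the final COMMIT executed.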

In the next step, you will grant the Customer B2C data location schema permission to select the records from the staging SOURCE_PERSON table.

Video: Set up staging data to load

Grant permission to load into xDM

Because your staging data is in the SEMARCHY_STG schema and you intend to load data into the SEMARCHY_CUSTOMER_B2C_MDM xDM data location schema, you must grant SELECT privileges on the staging table to SEMARCHY_CUSTOMER_B2C_MDM. Without this grant, the insert will fail for lack of privileges.

To perform this grant, you will need to connect as the admin user, typically the DBA or another user with the rights to perform grants. If you followed the Quick Install tutorials, you will use the same user that created the initial schemas during setup.

To review the Quick Install tutorials, visit the Tutorials main menu.

GO TO TUTORIALS

  1. If you followed the Quick Install on-premises tutorial for Oracle database, then you should connect as the SYSTEM user.
  2. Run the following grant to allow the SEMARCHY_CUSTOMER_B2C_MDM schema to select from the SOURCE_PERSON table in the SEMARCHY_STG schema.

Oracle

2-update-3-grant-select-source-person.sql

grant SELECT on SEMARCHY_STG.SOURCE_PERSON to SEMARCHY_CUSTOMER_B2C_MDM;

PostgreSQL

2-update-3-grant-select-source-person.sql

grant select on semarchy_stg.source_person to semarchy_customer_b2c_mdm;

In the next step, you will insert the data from the SOURCE_PERSON staging table into the SD_PERSON xDM table.

Video: Grant permission to load into xDM

Load only new customer data using delta detection

Now that you have prepared the staging environment, you will load updated customer data into xDM.

Your staging table, SOURCE_PERSON, contains all the original records as well as the updates. This is a common real-world scenario, where the source system captures all records. Therefore, it is important to learn how to detect changes so that you load only the updated records.

  1. Connect to the SEMARCHY_CUSTOMER_B2C_MDM schema.
  2. Run the insert statements to load records into the SD_PERSON table.

Oracle

2-update-4-insert-sd-person.sql

INSERT into SEMARCHY_CUSTOMER_B2C_MDM.SD_PERSON (
     B_LOADID      /* Semarchy system fields */
    ,B_CLASSNAME   /* Semarchy system fields */
    ,B_PUBID       /* Semarchy system fields */
    ,B_SOURCEID
    ,FIRST_NAME
    ,LAST_NAME
    ,DATE_OF_BIRTH
    ,ADDSTREET
    ,ADDCITY
    ,ADDSTATE
    ,ADDPOSTAL_CODE
    ,ADDCOUNTRY
    ,SOURCE_EMAIL
    ,SOURCE_PHONE
    ,PERSON_TYPE
    ,VALUE_STATUS
)  
SELECT    SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_CONTINUOUS_LOADID('CONTINUOUS_LOAD_INTEGRATE_ALL') as B_LOADID /* Call to get Load ID */
    ,stg.B_CLASSNAME /* Entity name. Found in MDM Workbench UI. */
    ,stg.B_PUBID
    ,stg.B_SOURCEID
    ,stg.FIRST_NAME
    ,stg.LAST_NAME
    ,stg.DATE_OF_BIRTH
    ,stg.ADDSTREET
    ,stg.ADDCITY
    ,stg.ADDSTATE
    ,stg.ADDPOSTAL_CODE
    ,stg.ADDCOUNTRY
    ,stg.SOURCE_EMAIL
    ,stg.SOURCE_PHONE
    ,stg.PERSON_TYPE
    ,stg.VALUE_STATUS
FROM SEMARCHY_STG.SOURCE_PERSON stg
WHERE NOT EXISTS (
  select 1
  from SEMARCHY_CUSTOMER_B2C_MDM.MI_PERSON mi
  where mi.B_SOURCEID = stg.B_SOURCEID
    and mi.B_PUBID = stg.B_PUBID
    /* duplicate data detection */
    and ( mi.FIRST_NAME = stg.FIRST_NAME or (mi.FIRST_NAME is null and stg.FIRST_NAME is null) )
    and ( mi.LAST_NAME = stg.LAST_NAME or (mi.LAST_NAME is null and stg.LAST_NAME is null) )
    and ( mi.DATE_OF_BIRTH = stg.DATE_OF_BIRTH or (mi.DATE_OF_BIRTH is null and stg.DATE_OF_BIRTH is null) )
    and ( mi.ADDSTREET = stg.ADDSTREET or (mi.ADDSTREET is null and stg.ADDSTREET is null) )
    and ( mi.ADDCITY = stg.ADDCITY or (mi.ADDCITY is null and stg.ADDCITY is null) )
    and ( mi.ADDSTATE = stg.ADDSTATE or (mi.ADDSTATE is null and stg.ADDSTATE is null) )
    and ( mi.ADDCOUNTRY = stg.ADDCOUNTRY or (mi.ADDCOUNTRY is null and stg.ADDCOUNTRY is null) )
    and ( mi.SOURCE_EMAIL = stg.SOURCE_EMAIL or (mi.SOURCE_EMAIL is null and stg.SOURCE_EMAIL is null) )
    and ( mi.SOURCE_PHONE = stg.SOURCE_PHONE or (mi.SOURCE_PHONE is null and stg.SOURCE_PHONE is null) )
    and ( mi.PERSON_TYPE = stg.PERSON_TYPE or (mi.PERSON_TYPE is null and stg.PERSON_TYPE is null) )
    and ( mi.VALUE_STATUS = stg.VALUE_STATUS or (mi.VALUE_STATUS is null and stg.VALUE_STATUS is null) )
);
COMMIT;

PostgreSQL

2-update-4-insert-sd-person.sql

insert into semarchy_customer_b2c_mdm.sd_person (
     b_loadid      /* Semarchy system fields */
    ,b_classname   /* Semarchy system fields */
    ,b_pubid       /* Semarchy system fields */
    ,b_sourceid
    ,first_name
    ,last_name
    ,date_of_birth
    ,addstreet
    ,addcity
    ,addstate
    ,addpostal_code
    ,addcountry
    ,source_email
    ,source_phone
    ,person_type
    ,value_status
)
select
     semarchy_repository.get_continuous_loadid('CONTINUOUS_LOAD_INTEGRATE_ALL') as b_loadid /* Call to get Load ID */
    ,stg.b_classname /* Entity name. Found in MDM Workbench UI. */
    ,stg.b_pubid
    ,stg.b_sourceid
    ,stg.first_name
    ,stg.last_name
    ,stg.date_of_birth
    ,stg.addstreet
    ,stg.addcity
    ,stg.addstate
    ,stg.addpostal_code
    ,stg.addcountry
    ,stg.source_email
    ,stg.source_phone
    ,stg.person_type
    ,stg.value_status
from semarchy_stg.source_person stg
where not exists (
    select 1
    from semarchy_customer_b2c_mdm.mi_person mi
    where mi.b_sourceid = stg.b_sourceid
        and mi.b_pubid = stg.b_pubid
        /* duplicate data detection */
        and ( mi.first_name = stg.first_name or (mi.first_name is null and stg.first_name is null) )
        and ( mi.last_name = stg.last_name or (mi.last_name is null and stg.last_name is null) )
        and ( mi.date_of_birth = stg.date_of_birth or (mi.date_of_birth is null and stg.date_of_birth is null) )
        and ( mi.addstreet = stg.addstreet or (mi.addstreet is null and stg.addstreet is null) )
        and ( mi.addcity = stg.addcity or (mi.addcity is null and stg.addcity is null) )
        and ( mi.addstate = stg.addstate or (mi.addstate is null and stg.addstate is null) )
        and ( mi.addcountry = stg.addcountry or (mi.addcountry is null and stg.addcountry is null) )
        and ( mi.source_email = stg.source_email or (mi.source_email is null and stg.source_email is null) )
        and ( mi.source_phone = stg.source_phone or (mi.source_phone is null and stg.source_phone is null) )
        and ( mi.person_type = stg.person_type or (mi.person_type is null and stg.person_type is null) )
        and ( mi.value_status = stg.value_status or (mi.value_status is null and stg.value_status is null) )
 );
commit;
  3. Be sure to commit the insert statement.
  4. Check that you inserted 24 records into the SD_PERSON table in the SEMARCHY_CUSTOMER_B2C_MDM data location.
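On PostgreSQL, the null-safe comparison pattern used in the delta-detection subquery above (`a = b or (a is null and b is null)`) can be written more compactly with `IS NOT DISTINCT FROM`. The helper files keep the portable pattern because Oracle does not support this operator; the sketch below is PostgreSQL-only and abbreviated to three attributes for illustration:

```sql
-- PostgreSQL-only sketch of the same delta check using IS NOT DISTINCT FROM.
-- Abbreviated to three attributes; extend the pattern to the remaining columns.
select count(*)
from semarchy_stg.source_person stg
where not exists (
    select 1
    from semarchy_customer_b2c_mdm.mi_person mi
    where mi.b_sourceid = stg.b_sourceid
      and mi.b_pubid = stg.b_pubid
      and mi.first_name  is not distinct from stg.first_name
      and mi.last_name   is not distinct from stg.last_name
      and mi.person_type is not distinct from stg.person_type
);
```

`IS NOT DISTINCT FROM` treats two NULLs as equal, which is exactly what the longer `or (… is null and … is null)` clause achieves in the portable version.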

After loading the customer updates, switch to the Customer B2C application to check the updates in xDM.

Video: Load only new customer data using delta detection

Browse updated customer data

After loading the updated customer data, let's review the changes to customer data in the Customer B2C application:

  1. Navigate to the Customers view under Browse Data.
  2. Click the menu button.
  3. Select Refresh.
  4. Look for the person record with the first name "Abe". Note this customer's last name.

    You should see "Abe Corbett-Lopéz". The customer's last name was updated.
  5. Look for the person record with the name "Carroll Spark". Note this record's Person Type.

    You should see Carroll's Person Type is now "Customer". The customer was converted from a prospect to a customer and the record is updated accordingly.

Video: Browse updated customer data

Congratulations

You have successfully loaded data updates using SQL. Let's summarize:

In the next section, you will learn the background concepts to understand how Integration works in xDM. This information should prepare you for developing the integration jobs necessary for loading data into xDM using SQL.

An integration job takes records from source systems and runs them through a sequence of tasks resulting in golden records.

This job is created from the business rules defined in the Application Builder, including validation rules and enrichers.

This section covers the background concepts to understand how the sequences of tasks work in integration jobs.

What you'll learn

Fuzzy integration process

The fuzzy integration process involves matching master records to create golden records. This is what distinguishes fuzzy-matching entities from basic entities.

The fuzzy-matching integration process runs through the following steps:

1 -- After loading the source data (or authoring source data directly in xDM), enrichers run. These are called pre-consolidation enrichers because the enrichers run on records that have not yet gone through the matching and consolidation process.

2 -- Validations run after enrichers. These pre-consolidation validation rules check the data for errors. If records have violated validation rules, they are placed in the Source Error (SE) table. These records do not proceed downstream and require human intervention.

3&4 -- Records that proceed end up in the Master Integration (MI) table, where matching and consolidation take place. Match rules and thresholds determine whether master records should match and merge. Survivorship rules determine the winning values in the golden records.

5 -- Golden records move to the Golden Integration (GI) table where post-consolidation enrichers run on the golden records.

6 -- xDM publishes the master records to the Master Data (MD) table and the golden records to Golden Data (GD) table.

7 -- The post-consolidation validation rules check for errors in the golden data. Records with errors are placed in the Golden Error (GE) table and do not proceed downstream.

8 -- If historization is turned on, xDM captures record history in the Master History (MH) and Golden History (GH) tables.
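The numbered tables above live in the data location schema and can be inspected directly. For example, to review records rejected by pre-consolidation validations, you might query the Source Error table for the Person entity; the table name below follows the Semarchy naming convention described in this section, so adjust it to your own model if it differs:

```sql
-- Illustrative sketch: inspect records rejected by pre-consolidation
-- validations for a given load. SE_PERSON follows the SE_<entity>
-- naming convention; the Load ID value is a placeholder.
SELECT *
FROM SEMARCHY_CUSTOMER_B2C_MDM.SE_PERSON
WHERE B_LOADID = 123;  -- replace 123 with the Load ID you want to audit
```

Because rejected records require human intervention, this kind of query is a common starting point when a load produces fewer golden records than expected.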

In the next part, you will learn about publishers and primary keys on fuzzy-matching entities. This is important for understanding how to load fuzzy-matching data.

Fuzzy entity publishers and primary key

You use fuzzy-matching entities when you want to de-duplicate records. This means Semarchy xDM assumes these records have no common identifiers in the source systems.

Let's look at an example of a fuzzy-matching entity:

  1. Navigate to the Customers view under Browse Data. Customer data is modeled using a fuzzy-matching entity.
  2. Look for the person record with the name "Callie Aldaba".
  3. Click on this customer to see more information. Navigate to the MASTER RECORDS tab.
  4. Note the two master records that contributed to this customer's golden record: "CRM.1386237" and "MKT.1438732". We will refer to these records to explain the integration concepts.

Primary key in source data tables

This same customer, "Callie Aldaba", appears in both the customer relationship management (CRM) and marketing automation (MKT) systems with different IDs ("1386237" in CRM and "1438732" in MKT).

You need to match these two records from CRM and MKT, then merge them under one golden record.

Source Data

When you load source data into the SD table in Semarchy xDM, you store these IDs in the column named B_SOURCEID.

When publishing data into the hub, the middleware loads this B_SOURCEID column with a primary key value from the publishing system. If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into B_SOURCEID.
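As an illustration of the composite-key case, suppose a source system identifies a person by the hypothetical column pair (ORDER_SYSTEM_CODE, CUSTOMER_NO). The two parts could be concatenated into B_SOURCEID, with a separator to avoid ambiguous collisions; the table and column names here are examples only:

```sql
-- Hypothetical example: build B_SOURCEID from a composite source key.
-- A separator such as '~' prevents collisions like ('AB','C') vs ('A','BC').
-- The || operator works on both Oracle and PostgreSQL.
SELECT src.ORDER_SYSTEM_CODE || '~' || src.CUSTOMER_NO AS B_SOURCEID
FROM SOURCE_CUSTOMER src;
```

Whatever concatenation rule you choose, apply it consistently across every load from that publisher so the same source record always produces the same B_SOURCEID.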

Golden Data

Semarchy xDM generates the primary key for the golden record. xDM defines this golden primary key, but you can choose between a sequence and a Universally Unique Identifier (UUID) when building your model.

Publisher code in source data tables

Equally important is the publisher code. This code indicates which source system the record came from. The publisher code is defined in the Publishers section of the Application Builder.

In the example above, there are two publisher codes, CRM and MKT.

Semarchy xDM stores the publisher code in the column named B_PUBID. However, xDM does not derive publisher codes from the source systems. As the integration developer, it is your responsibility to ensure that the middleware inserting data loads the correct publisher code into B_PUBID.

Basic integration process

The basic entity integration process contains no matching or survivorship processes. This is the major difference compared with fuzzy-matching entities. Therefore, basic entities are designed for simple data structures, including reference data and join tables.

The basic entity integration process runs through the following steps:

1 -- When a business user is entering data, xDM enriches and validates the source data live. If loading data, the integration job enriches the source data in the Source Authoring (SA) tables.

2 -- For data loaded into xDM, the integration job validates the data and removes any records that violate validation rules. These errors are placed in the Authoring Errors (AE) tables.

3 -- Data then proceeds to become golden records which are stored in the Golden Data (GD).

4 -- If historization is turned on, xDM captures record history in the Golden History (GH) tables.

Basic entity primary key

You use basic entities when your data contains no duplicates. This means these records have a single identifier across all systems, such as reference data. Basic entities are also useful when IDs do not matter, such as in a join table modeling a many-to-many relationship. In the latter case, xDM can generate a primary key for you.

Let's look at an example of a basic entity:

  1. Navigate to the Products view under Browse Data. Product data is modeled using a basic entity.
  2. Look for the product record with the name "Carbonite Helmet".
  3. Click on this product to see more information on the PRODUCT tab.
  4. Note the product record's ID "CARBON4861OK".

The demo data assumes that product IDs are consistent across all systems, so this data is suitable for a basic entity. This same ID flows from the source system to the Source Authoring (SA) table and eventually to the Golden Data (GD) table.

Congratulations

You have successfully learned the background concepts behind how integration jobs work in Semarchy xDM. Let's summarize:

In the next section, you will learn how loading data into xDM works. This information is critical for understanding how to kick off integration jobs and how Continuous Loads work.

Semarchy xDM processes data -- running enrichers, validation and match rules, and so on -- in an integration job.

To get an integration job running, you must load data correctly to notify xDM to kick off the integration process.

This section covers how to correctly load data. This is commonly called a load job, an external load, or publishing data using SQL in the Semarchy documentation.

External loads require function calls to initialize and then submit the external load to Semarchy xDM. It is only after submitting the external load that xDM knows to kick off an integration job.

What you'll learn

Let's dive deeper to understand how the process works.

Step 1: Get load ID

The first step in every external load is to get a load ID by calling a function in the repository schema, SEMARCHY_REPOSITORY.

When you call the function GET_NEW_LOADID, your call initializes an external load and returns a Load ID which you can use to track your load in the Data Location logs. See the code snippet below to learn which parameters GET_NEW_LOADID takes.

Initialize load example

vLoad_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_NEW_LOADID(
        '<data_location_name>'  /* Data Location Name (as defined in the UI) */
        ,'<program_name>'       /* Informational. Identifies the middleware. */
        ,'<load_description>'   /* Informational. Describes the load. */
        ,'<user_name>'          /* User initializing the load. */
          );
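As a concrete sketch, an Oracle PL/SQL anonymous block that initializes a load and prints the returned Load ID might look like the following. The argument values are illustrative only; substitute your own data location name and user:

```sql
-- Illustrative Oracle PL/SQL sketch; all literal values are examples.
SET SERVEROUTPUT ON;

DECLARE
  vLoad_id NUMBER;
BEGIN
  vLoad_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_NEW_LOADID(
      'CustomerB2CDemo'       /* Data Location Name (as defined in the UI) */
      ,'manual_sql'           /* Informational. Identifies the middleware. */
      ,'Load person updates'  /* Informational. Describes the load. */
      ,'semadmin'             /* User initializing the load. */
  );
  DBMS_OUTPUT.PUT_LINE('Load ID: ' || vLoad_id);
END;
/
```

Keep the returned Load ID: you will tag every inserted record with it in Step 2 and pass it to SUBMIT_LOAD (or CANCEL_LOAD) in Step 3.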

Step 2: Insert records

Inserting data into the SD tables (and SA tables for basic entities) is a straightforward insert statement in SQL.

Below are a few tips:

Step 3: Submit or cancel job

Once you have completed the inserts and you want xDM to process the data, you must call the function SUBMIT_LOAD in the repository schema, SEMARCHY_REPOSITORY, to launch the integration job.

When you call the function SUBMIT_LOAD, your call submits the external load. The function returns a Batch ID, which allows you to track the integration job in the Data Location logs. See the code snippet below to learn which parameters SUBMIT_LOAD takes.

Submit load example

vBatch_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.SUBMIT_LOAD(
        <load_id>                  /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        ,'<integration_job>'       /* Name of the Integration Job to trigger. */
        ,'<user_name>'             /* User who initialized the load. */
          );

In the case where you make a mistake or want to cancel the load job for any reason, you can call the function CANCEL_LOAD in the repository schema, SEMARCHY_REPOSITORY.

Cancel load example

SEMARCHY_REPOSITORY.INTEGRATION_LOAD.CANCEL_LOAD(
        <load_id>                    /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        ,'<user_name>'               /* User who initialized the load. */
          );

The CANCEL_LOAD function does not delete the data you inserted into the SD or SA tables in Step 2. It simply marks the Load ID as canceled so that it can no longer accept new records. This means that if you cancel a load, you must delete the corresponding records from the SD or SA tables separately.

To summarize, you learned that an external load has 3 components:

  1. Get load ID
  2. Insert records
  3. Submit or cancel the load
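The three steps can be combined in a single Oracle PL/SQL sketch like the one below. All literal values (data location, job name, user, columns) are illustrative assumptions; adapt them to your own data location and model:

```sql
-- Illustrative end-to-end external load (Oracle PL/SQL sketch).
-- Data location 'CustomerB2CDemo', job 'INTEGRATE_ALL', user 'semadmin',
-- and the abbreviated column list are examples, not prescribed values.
DECLARE
  vLoad_id  NUMBER;
  vBatch_id NUMBER;
BEGIN
  /* Step 1: initialize the external load and get a Load ID. */
  vLoad_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_NEW_LOADID(
      'CustomerB2CDemo', 'manual_sql', 'Load person updates', 'semadmin');

  /* Step 2: insert records tagged with the Load ID. */
  INSERT INTO SEMARCHY_CUSTOMER_B2C_MDM.SD_PERSON
      (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, FIRST_NAME, LAST_NAME)
  SELECT vLoad_id, stg.B_CLASSNAME, stg.B_PUBID, stg.B_SOURCEID,
         stg.FIRST_NAME, stg.LAST_NAME
  FROM SEMARCHY_STG.SOURCE_PERSON stg;
  COMMIT;

  /* Step 3: submit the load to trigger the integration job. */
  vBatch_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.SUBMIT_LOAD(
      vLoad_id, 'INTEGRATE_ALL', 'semadmin');
END;
/
```

In practice, a middleware tool performs these same three calls around its mapping logic; the sketch only makes the sequence explicit.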

In the next part, you will learn about the benefits of continuous loads, which give you a shortcut: you focus only on inserting records with a pre-determined Load ID, and xDM takes care of the submission for you.

Continuous loads

Continuous loads allow you to load data into Semarchy xDM in an uninterrupted way: you do not have to get a load ID or submit the load. Instead, you configure the continuous load in the Data Location and get a reserved Load ID to use. You must include this continuous load ID in your inserts, but you do not have to call the GET_NEW_LOADID function, which saves you some overhead.

Semarchy xDM automatically submits an external load at the interval that you define on the continuous load job and processes the data.

The Continuous Loads feature was designed for xDM users who relied heavily on the REST API and needed near-real-time processing of integration jobs. Removing the first step of getting a load ID and the last step of submitting a load streamlined the process of loading data.

Now, we recommend continuous loads as the first choice for all users, because they are efficient and reduce the overhead required to process records.

In the next step, you will see an example of an Extract, Transform, Load (ETL) tool in action. This gives you an idea of how middleware can follow the external load process to insert and kick off an integration job in Semarchy xDM.

Example ETL tool

While it is possible for your organization to write custom PL/SQL scripts to load data, it is more common to have a middleware tool -- be it ETL or an Enterprise Service Bus (ESB) tool -- load the data. These tools come with many benefits, including scheduling load jobs and robust error handling.

When using a middleware tool, the process to load data follows the external load steps. That means that you map the insert from the source to the target tables. You must also call the GET_NEW_LOADID function to get the load ID, map this load ID in the insert statement, and call the SUBMIT_LOAD or CANCEL_LOAD function to complete the load job. You can also set up a continuous loads job that allows you to only insert the records with the continuous load ID and bypass getting a load ID and submitting a load.

Semarchy xDM is agnostic to middleware tools, which means you can use any ETL tools at your disposal. Below is a view of Semarchy Integrator Designer, an ETL tool, to give you an example of how mapping the source data into the SD_Person table looks if you use an ETL tool to load your data.

This mapping shows you an example of how a typical ETL tool maps from the source to the target table and includes the SEM_LOAD_ID as a variable to insert the mandatory Load ID.

This process shows you an example of how a typical ETL tool runs through the load job to submit or cancel a load and ultimately launch an integration job in Semarchy xDM.

Congratulations

You have successfully learned the background concepts behind how integration jobs work in Semarchy xDM. Let's summarize:

In the next section, you will kick off an external load to insert data into the Customers' Product entity. This will give you practical experience running SQL scripts to load data into xDM.

To give you practical experience loading data and launching an integration job, this section guides you through executing SQL scripts to load data into the SA_PERSON_PRODUCT table (the source data table for the Customers' Product entity).

What you'll learn

Let's start running an external load.

Review new customers missing data

When you loaded the updated customer data, you updated several prospects who became customers. Let's review these new customers in the Customer B2C application:

  1. Navigate to the Customers view under Browse Data.
  2. Click the filter button.
  3. Enter this SemQL expression as a SemQL filter and click APPLY.

5-load-1-semql-customers-without-products

PersonType = 'CUSTOMER' AND NOT ANY PersonProducts HAVE ( ID is not null )
  4. See the customers who currently do not have any products in xDM.

Next, you will load the products that these customers purchased. To proceed, ensure you have access to the database and a SQL client.

Video: Review new customers missing data

Set up staging data to load

Let's set up the staging table with the customers' product data. Similar to the customer data updates you prepared earlier in this tutorial, you are setting up this staging table as a simulation of a live data feed.

Let's set up the staging table in the SEMARCHY_STG schema and load the dataset.

  1. Connect to the SEMARCHY_STG schema.
  2. Run the create table statement to set up the landing table, SOURCE_PERSON_PRODUCT.

Oracle

5-load-2-create-source-person-product-staging-table.sql

CREATE TABLE SEMARCHY_STG.SOURCE_PERSON_PRODUCT (
     B_CLASSNAME varchar(128)
    ,ID varchar(128)
    ,PURCHASE_DATE date
    ,REGISTRATION_DATE date
    ,F_PRODUCT varchar(128)
    ,FP_PERSON varchar(128)
    ,FS_PERSON varchar(128)
)
;

PostgreSQL

5-load-2-create-source-person-product-staging-table.sql

create table semarchy_stg.source_person_product (
     b_classname varchar(128)
    ,id varchar(128)
    ,purchase_date date
    ,registration_date date
    ,f_product varchar(128)
    ,fp_person varchar(128)
    ,fs_person varchar(128)
)
;
  3. Run the insert statements to load records into the table SOURCE_PERSON_PRODUCT created in the previous step.

Oracle

5-load-3-insert-source-person-product-staging-table.sql

INSERT INTO SOURCE_PERSON_PRODUCT (B_CLASSNAME, ID, PURCHASE_DATE, REGISTRATION_DATE, F_PRODUCT, FP_PERSON, FS_PERSON) VALUES ('PersonProduct', 'CARBON6061WD_CRM1409489', to_date('12/23/2017', 'MM-DD-YYYY'), to_date('01/13/2018', 'MM-DD-YYYY'), 'CARBON6061WD', 'CRM', '1409489');

INSERT INTO SOURCE_PERSON_PRODUCT (B_CLASSNAME, ID, PURCHASE_DATE, REGISTRATION_DATE, F_PRODUCT, FP_PERSON, FS_PERSON) VALUES ('PersonProduct', 'STREAM2061RS_CRM1409489', to_date('12/23/2017', 'MM-DD-YYYY'), to_date('01/13/2018', 'MM-DD-YYYY'), 'STREAM2061RS', 'CRM', '1409489');

/* Abbreviated for brevity. Get the full insert statement at https://github.com/semarchy/xdm-tutorials/blob/master/Track3-Resources/HelperFiles/Track3-Unit3/Oracle/5-load-3-insert-source-person-product-staging-table.sql */


COMMIT;

PostgreSQL

5-load-3-insert-source-person-product-staging-table.sql

insert into semarchy_stg.source_person_product (b_classname, id, purchase_date, registration_date, f_product, fp_person, fs_person) values ('PersonProduct', 'CARBON6061WD_CRM1409489', to_date('12/23/2017', 'MM-DD-YYYY'), to_date('01/13/2018', 'MM-DD-YYYY'), 'CARBON6061WD', 'CRM', '1409489');

insert into semarchy_stg.source_person_product (b_classname, id, purchase_date, registration_date, f_product, fp_person, fs_person) values ('PersonProduct', 'STREAM2061RS_CRM1409489', to_date('12/23/2017', 'MM-DD-YYYY'), to_date('01/13/2018', 'MM-DD-YYYY'), 'STREAM2061RS', 'CRM', '1409489');


/* Abbreviated for brevity. Get the full insert statement at https://github.com/semarchy/xdm-tutorials/blob/master/Track3-Resources/HelperFiles/Track3-Unit3/PostgreSQL/5-load-3-insert-source-person-product-staging-table.sql */


commit;
  4. Be sure to commit the insert statement.
  5. Check that you inserted 163 records into the SOURCE_PERSON_PRODUCT table.

In the next step, you will grant the Customer B2C data location schema permission to select the records from the staging SOURCE_PERSON_PRODUCT table.

Video: Set up staging data to load

Grant permission to load into xDM

To let xDM read your staging data from the SEMARCHY_STG schema, you must again grant SELECT privileges on it to SEMARCHY_CUSTOMER_B2C_MDM.

To perform this grant, you will need to connect as the admin user again. If you followed the Quick Install tutorials, you will use the same user that created the initial schemas during setup.

To review the Quick Install tutorials, visit the Tutorials main menu.

GO TO TUTORIALS

  1. If you followed the Quick Install on-premises tutorial for Oracle database, then you should connect as the SYSTEM user.
  2. Run the following grant to allow the SEMARCHY_CUSTOMER_B2C_MDM schema to select from the SOURCE_PERSON_PRODUCT table in the SEMARCHY_STG schema.

Oracle

5-load-4-grant-select-source-person-product.sql

grant SELECT on SEMARCHY_STG.SOURCE_PERSON_PRODUCT to SEMARCHY_CUSTOMER_B2C_MDM;

PostgreSQL

5-load-4-grant-select-source-person-product.sql

grant select on semarchy_stg.source_person_product to semarchy_customer_b2c_mdm;
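To confirm the grant took effect, you can reconnect as SEMARCHY_CUSTOMER_B2C_MDM and run a quick sanity check (a sketch; the count should match the 163 staged rows):

```sql
-- Run this while connected as SEMARCHY_CUSTOMER_B2C_MDM.
-- If the grant is missing, this query fails with a permissions error.
select count(*) from semarchy_stg.source_person_product;
```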

In the next step, you will begin the external load.

Video: Grant permission to load into xDM

Insert customers' product data

In this part, you will start an external load by manually calling a function to get a load ID from the SEMARCHY_REPOSITORY schema, then insert data tagged with that load ID, and finally submit the load with the SUBMIT_LOAD function.
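Here is a sketch of the overall flow in PostgreSQL syntax, assuming the PostgreSQL repository functions mirror the Oracle INTEGRATION_LOAD package (the exact calls and arguments are covered step by step below; INTEGRATE_ALL is the integration job name used in this tutorial):

```sql
-- 1. Get a load ID from the repository schema.
select semarchy_repository.get_new_loadid(
  'CustomerB2CDemo', 'MANUAL_ETL_SCRIPT', 'LOAD_DATA_UPDATE', 'MANUAL_ETL_USER');

-- 2. Insert your rows into the SA_/SD_ tables,
--    setting B_LOADID to the returned load ID.

-- 3. Submit the load, which starts the integration job
--    and returns a batch ID (162 is a hypothetical load ID).
select semarchy_repository.submit_load(
  162, 'INTEGRATE_ALL', 'MANUAL_ETL_USER');
```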

Step 1: Get load ID

To initialize a load, call the GET_NEW_LOADID function and pass in the correct arguments. Before executing the function, remember to turn on the Dbms Output panel so you can see the returned Load ID. In SQL Developer on Oracle, you can open Dbms Output from the View menu.

Connect to the SEMARCHY_REPOSITORY schema to run the function and view the Load ID.

Note that the SQL function below to initialize a load ID calls the DBMS_OUTPUT package to output the Load ID. This is useful for an instructional tutorial, but you will likely not include it in your development or production ETL systems, as you won't follow this manual process for loading data.

Let's get started.

  1. Connect as SEMARCHY_REPOSITORY.
  2. Call the function GET_NEW_LOADID to start an external load.

Oracle

5-load-5-get-load-id.sql

DECLARE
  v_load_id int;
  BEGIN
   v_load_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_NEW_LOADID (
        'CustomerB2CDemo'       /* Data Location Name. Case sensitive. Found in MDM Workbench UI. */
       ,'MANUAL_ETL_SCRIPT'     /* Informational program name. Identifies the middleware.         */
       ,'LOAD_DATA_UPDATE'      /* Informational. Describes the load.                             */
       ,'MANUAL_ETL_USER'       /* User initializing the load. (Not found in MDM Workbench UI.)   */
   );
   dbms_output.put_line(v_load_id);
END;
/

PostgreSQL

5-load-5-get-load-id.sql

do
$$

declare
  v_load_id int;
  begin
    v_load_id := semarchy_repository.get_new_loadid (
         'CustomerB2CDemo'       /* Data Location name. case sensitive. Found in MDM workbench UI. */
        ,'manual_etl_script'     /* Informational program name. Identifies the middleware.         */
        ,'load_data_update'      /* Informational. Describes the load.                             */
        ,'manual_etl_user'       /* User initializing the load. (Not found in MDM workbench UI.)   */
    );
    raise notice '%', v_load_id;
end;
$$;
  1. Copy the value returned in the Dbms Output window (or, in pgAdmin, the Messages pane). In the next step, you will add this Load ID to the insert statement and run it to load data.

Video: Step 1: Get load ID

Step 2: Insert data

In this step, you will load the source data into the SA_PERSON_PRODUCT table (in contrast to the SD_PERSON table) because PersonProduct is a basic entity.

  1. Connect as SEMARCHY_CUSTOMER_B2C_MDM.
  2. Paste the Load ID from Step 1: Get Load ID into the insert statement below. Find the line /* load_id */ as B_LOADID and replace /* load_id */ with the actual Load ID (in this tutorial example, it is 162).
  3. Insert the data from the SOURCE_PERSON_PRODUCT staging table into the SA_PERSON_PRODUCT xDM table with delta detection.

Oracle

5-load-6-insert-sa-person-product.sql

INSERT into SEMARCHY_CUSTOMER_B2C_MDM.SA_PERSON_PRODUCT (
     B_LOADID      /* Semarchy system fields */
    ,B_CLASSNAME   /* Semarchy system fields */
    ,ID
    ,PURCHASE_DATE
    ,REGISTRATION_DATE
    ,F_PRODUCT
    ,FP_PERSON
    ,FS_PERSON
)
SELECT
     /* load_id */ as B_LOADID    /* Insert the load ID here that was returned from Step 1 */
    ,stg.B_CLASSNAME /* Entity name. Found in MDM Workbench UI. */
    ,stg.ID
    ,stg.PURCHASE_DATE
    ,stg.REGISTRATION_DATE
    ,stg.F_PRODUCT
    ,stg.FP_PERSON
    ,stg.FS_PERSON
 FROM SEMARCHY_STG.SOURCE_PERSON_PRODUCT stg
 WHERE NOT EXISTS (
   SELECT 1
   FROM SEMARCHY_CUSTOMER_B2C_MDM.GD_PERSON_PRODUCT gd
   WHERE 1=1
     /* match on IDs */
     AND gd.ID = stg.ID
     /* duplicate data detection */
     and ( gd.PURCHASE_DATE = stg.PURCHASE_DATE or (gd.PURCHASE_DATE is null and stg.PURCHASE_DATE is null) )
     and ( gd.REGISTRATION_DATE = stg.REGISTRATION_DATE or (gd.REGISTRATION_DATE is null and stg.REGISTRATION_DATE is null) )
     and ( gd.F_PRODUCT = stg.F_PRODUCT or (gd.F_PRODUCT is null and stg.F_PRODUCT is null) )
     and ( gd.FP_PERSON = stg.FP_PERSON or (gd.FP_PERSON is null and stg.FP_PERSON is null) )
     and ( gd.FS_PERSON = stg.FS_PERSON or (gd.FS_PERSON is null and stg.FS_PERSON is null) )
 );
 COMMIT;

PostgreSQL

5-load-6-insert-sa-person-product.sql

insert into semarchy_customer_b2c_mdm.sa_person_product (
     b_loadid      /* Semarchy system fields */
    ,b_classname   /* Semarchy system fields */
    ,id
    ,purchase_date
    ,registration_date
    ,f_product
    ,fp_person
    ,fs_person
)
select
     /* load_id */ as b_loadid    /* Insert the load ID here that was returned from Step 1 */
    ,stg.b_classname /* Entity name. Found in MDM Workbench UI. */
    ,stg.id
    ,stg.purchase_date
    ,stg.registration_date
    ,stg.f_product
    ,stg.fp_person
    ,stg.fs_person
 from semarchy_stg.source_person_product stg
 where not exists (
   select 1
   from semarchy_customer_b2c_mdm.gd_person_product gd
   where /* match on IDs */
         gd.id = stg.id
     /* duplicate data detection */
     and ( gd.purchase_date = stg.purchase_date or (gd.purchase_date is null and stg.purchase_date is null) )
     and ( gd.registration_date = stg.registration_date or (gd.registration_date is null and stg.registration_date is null) )
     and ( gd.f_product = stg.f_product or (gd.f_product is null and stg.f_product is null) )
     and ( gd.fp_person = stg.fp_person or (gd.fp_person is null and stg.fp_person is null) )
     and ( gd.fs_person = stg.fs_person or (gd.fs_person is null and stg.fs_person is null) )
 );
commit;
  1. Remember to commit the inserts.
  2. You should see 18 records were inserted into the SA_PERSON_PRODUCT table.

Video: Step 2: Insert records

Step 3: Submit data

  1. Connect as SEMARCHY_REPOSITORY again.
  2. Paste the Load ID from Step 1: Get Load ID into the SUBMIT_LOAD function below. Find the line /* load_id */ /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID from Step 1 */ and replace /* load_id */ with the actual Load ID (in this tutorial example, it is 162).
  3. Run the following function SUBMIT_LOAD to submit the load and kick off an integration job. Note the Batch ID that the SUBMIT_LOAD function returns. You will use this Batch ID to track the integration job as it proceeds through the data processing steps in xDM.

Oracle

5-load-7-submit-load.sql

DECLARE
  v_batch_id int;
BEGIN
  v_batch_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.SUBMIT_LOAD (
             /* load_id */      /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID from Step 1 */
            ,'INTEGRATE_ALL'    /* Must exactly match a defined integration job. Found in MDM Workbench UI. */
            ,'MANUAL_ETL_USER'  /* Must exactly match the value used in GET_NEW_LOADID in Step 1. */
  );
  dbms_output.put_line(v_batch_id);
END;
/

PostgreSQL

5-load-7-submit-load.sql

do
$$

declare
  v_batch_id int;
  begin
    v_batch_id := semarchy_repository.submit_load (
         /* load_id */           /* Load ID returned by get_new_loadid from Step 1 */
        ,'INTEGRATE_ALL'         /* Must exactly match a defined integration job. Found in MDM Workbench UI. */
        ,'manual_etl_user'       /* Must exactly match the value used in get_new_loadid in Step 1. */
    );
    raise notice '%', v_batch_id;
end;
$$;
  1. Record the Batch ID that the function returns. In this tutorial example, the batch ID is 78. It's common for batch IDs to differ from load IDs, especially when you initialize several loads and do not submit them all.

Well done! You learned how to complete a load and launch an integration job in xDM. In the next part, you will learn to check the data location logs to see the progress of loads and integration jobs.

Video: Step 3: Submit data

Check data location logs

Once you have a load ID, you can check the status of the load in the data location logs. This helps you see which loads are open, as well as which loads have been submitted or canceled.

To view the data location logs:

  1. Go to your application. On the welcome page, click the Application Builder. Go to the Management tab. Open the Data Locations > CustomerB2CDemo > Latest Loads node.

  1. Notice that the latest load 162 has a lightning bolt icon, which means it is running and waiting for your inserts. Load 161 was canceled, as the red warning icon indicates, and load 126 was submitted, as the green checkmark indicates.
  2. Once you submit the load, you can use the Batch ID to trace the integration job. Back in the Management tab, open the Data Locations > CustomerB2CDemo > Latest Integration Batches node.

  1. You can drill down into the batch to see more details about each step.

Video: Check data location logs

Browse new customers' products data

Once the integration job batch is complete, let's review the updated customers' products data in the Customer B2C application:

  1. Navigate back to the Customers view in the Customer B2C application.
  2. Click the menu button again.
  3. Select Refresh.

  1. You will see that no customer records meet this filter anymore.
PersonType = 'CUSTOMER' AND NOT ANY PersonProducts HAVE ( ID is not null )

This is because, after loading the new product data, no customers are missing their associated product records.

  1. Turn off the filter by clicking RESET on the filter.
  2. Look for the person record with the name Carroll Spark. Click on the PRODUCTS tab.

  1. If you see products under Carroll Spark and the other customers we identified as missing products, then you successfully loaded data.

Video: Browse new customers' products data

How to cancel a running load job

There are times when you grab a load ID and later decide not to load data. Or you load data, hit an error, and decide to cancel the load. Or your ETL tool initialized many loads, but the inserts failed, and now you have too many open external loads. These are the most common reasons to cancel a load.

This part will guide you through how to cancel an external load.

  1. Connect to the repository user, SEMARCHY_REPOSITORY.
  2. In the CANCEL_LOAD call below, replace /* load_id */ with the load ID (in this tutorial example, it is 162).

Oracle

5-load-8-optional-cancel-load.sql

BEGIN
  SEMARCHY_REPOSITORY.INTEGRATION_LOAD.CANCEL_LOAD (
       /* load_id */          /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
      ,'MANUAL_ETL_USER'      /* Must exactly match the value used in GET_NEW_LOADID */
  );
END;
/

PostgreSQL

5-load-8-optional-cancel-load.sql

do
$$
begin
  perform semarchy_repository.cancel_load (
       /* load_id */          /* Load ID returned by get_new_loadid */
      ,'manual_etl_user'      /* Must exactly match the value used in get_new_loadid */
  );
end;
$$;
  1. To cancel a running external load, execute the function CANCEL_LOAD.
  2. Check that you successfully canceled the load in the Data Location logs.

Video: How to cancel a running load job

Congratulations

You learned how to load data in xDM using SQL.

Great job going through this tutorial!

Well done! You have completed your first data management project with Semarchy xDM.

In this tutorial, you learned:

Keep this checklist handy while you are undergoing Integration development to help you remember the required steps:

You will find more Integration information, including how to query data using SQL and how to use the REST API in the Integration tutorial series.

There are also tutorials about how to design applications in the Data Authoring and Data Consolidation tutorial tracks.

Go back to the main menu to see more tutorials.

GO TO TUTORIALS

Go further with xDM

This tutorial taught you how to load data into xDM using SQL. xDM is capable of addressing all types of data, including customer data, location data, reference data, organization data, supplier data, and other domains.

Visit our website to learn about the different solutions we provide.