This tutorial teaches you how to load data into Semarchy xDM using the SQL API. You will learn basic and advanced queries as well as important integration concepts. The tutorial is based on the Customer B2C demo application, so you will gain hands-on experience loading fuzzy-matched entities and learn how integration jobs work.

What you'll learn

Before you start

This tutorial is the last SQL-based unit within the Integration track.

Before following this unit, you must:

If you have not yet, follow these tutorials from the main tutorials menu.

GO TO TUTORIALS

If you have not already done so, click the button below to download resources, such as SQL helper files, used within this tutorial.

DOWNLOAD TUTORIAL RESOURCES

The estimated duration of this unit is about 1 hour and 30 minutes.

Enjoy this tutorial!

This step covers the background concepts needed to understand how the sequence of tasks in an integration job works.

What you'll learn

Definitions: Integration job and Certification Process

Let's start by explaining the base concepts that you will need to load data in Semarchy xDM.

Integration job

An integration job takes records from source systems and runs them through the Certification Process.

This job is generated from the business rules defined in the Application Builder, including validation rules and enrichers. It uses the data structures automatically created in the MDM hub when deploying the model edition.

Certification process

The Certification Process is a sequence of automated tasks to create and certify golden data out of various sources:

The certification process takes these various sources and applies the rules and constraints defined in the model to create, update, or delete golden data.

Fuzzy integration process

The fuzzy integration process involves matching master records to create golden records. This is what distinguishes ID and fuzzy-matched entities from basic entities.

The fuzzy matching integration process consists of the following steps:

1 - After loading the source data (or authoring source data directly in xDM), enrichers run. These are called pre-consolidation enrichers because the enrichers run on records that have not yet gone through the matching and consolidation process.

2 - Validations run after enrichers. These pre-consolidation validation rules check the data for errors. If records have violated validation rules, they are placed in the Source Error (SE) table. These records do not proceed downstream and require human intervention.

3&4 - Records that proceed end up in the Master Integration (MI) table, where matching and consolidation take place. Match rules and thresholds determine whether master records should match and merge. Survivorship rules determine the winning values in the golden records.

5 - Golden records move to the Golden Integration (GI) table where post-consolidation enrichers run on the golden records.

6 - xDM publishes the master records to the Master Data (MD) table and the golden records to Golden Data (GD) table.

7 - The post-consolidation validation rules check for errors in the golden data. Records with errors are placed in the Golden Error (GE) table and do not proceed downstream.

8 - If historization is turned on, xDM captures record history in the Master History (MH) and Golden History (GH) tables.
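Each stage above materializes as tables in the data location schema, which you can query directly with SQL. The statements below are a hedged sketch: they assume a fuzzy-matched Person entity, and the schema name and load ID are placeholders, not values from this tutorial.

```sql
/* Table prefixes follow the stages described above (SE, GE, GD).
   <data_location_schema> and <load_id> are placeholders. */

-- Source records rejected by pre-consolidation validations for one load:
select * from <data_location_schema>.se_person where b_loadid = <load_id>;

-- Golden records that failed post-consolidation validations:
select * from <data_location_schema>.ge_person;

-- Certified golden records:
select * from <data_location_schema>.gd_person;
```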

Basic integration process

The basic entity integration process contains no matching or survivorship processes. This is the major difference compared with fuzzy-matching entities. Therefore, basic entities are designed for simple data structures, including reference data and join tables.

The basic entity integration process runs through the following steps:

1 - When a business user is entering data, xDM enriches and validates the source data live. If loading data, the integration job enriches the source data in the Source Authoring (SA) tables.

2 - For data loaded into xDM, the integration job validates the data and removes any records that violate validation rules. These errors are placed in the Authoring Errors (AE) tables.

3 - Data then proceeds to become golden records which are stored in the Golden Data (GD).

4 - If historization is turned on, xDM captures record history in the Golden History (GH) tables.

Basic entity primary key

You use basic entities when your data does not have duplication. This means these records have single identifiers across all systems, such as reference data. Basic entities are also useful when IDs do not matter, such as in a join table to model a many-to-many relationship. In the latter case, xDM can generate a primary key for you.

Congratulations

You have successfully learned the background concepts behind how integration jobs work in Semarchy xDM. Let's summarize:

In the next step, you will learn how loading data into xDM works. This information is critical for understanding how to kick off integration jobs and how Continuous Loads work.

Semarchy xDM processes data - running enrichers, validation and match rules, and so on - in an integration job.

To get an integration job running, you must load data correctly to notify xDM to kick off the integration process.

This step covers how to correctly load data. This is commonly called a load job or an external load in the Semarchy documentation.

External loads require function calls to initialize and then submit the external load to Semarchy xDM. It is only after submitting the external load that xDM knows to kick off an integration job.

What you'll learn

External loads

External loads follow a three-step process:

Step 1: Get a load ID

The first step in every external load is to get a load ID by calling a function in the repository schema, SEMARCHY_REPOSITORY.

When you call the function GET_NEW_LOADID, your call initializes an external load and returns a Load ID which you can use to track your load in the Data Location logs. See the code snippet below to learn which parameters GET_NEW_LOADID takes.

vLoad_id := <repository_schema>.get_new_loadid(
        '<data_location_name>'  /* Data location name (as in the UI). */
       ,'<program_name>'        /* Informational. Identifies the middleware. */
       ,'<load_description>'    /* Informational. Describes the load. */
       ,'<user_name>'           /* User initializing the load. */
);

Note: this sample code above applies to PostgreSQL. Sample code for each database is available in the Semarchy documentation.

Step 2: Insert records

Inserting data into the SD tables (and SA tables for basic entities) is a straightforward insert statement in SQL.

Below are a few tips:

  1. When the developer designs the xDM model, ensure the integration job is configured to contain the entities you want to process. If an entity is missing, the data loads successfully into the source tables but xDM does not process it.
  2. Use a database transaction when writing to the source data tables.
  3. There are mandatory system columns you must insert:
  4. Remember to commit after you finish inserting all your data.
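To make the mandatory system columns concrete, here is a hedged sketch of an insert into a source data table. Only B_LOADID and B_CLASSNAME are taken from this tutorial; the SD table, its attribute columns, and the staging table are hypothetical placeholders for your own model.

```sql
insert into <data_location_schema>.sd_person (
     b_loadid      /* Mandatory: Load ID returned by get_new_loadid. */
    ,b_classname   /* Mandatory: entity name, e.g. 'Person'. */
    ,b_pubid       /* Publisher code (ID and fuzzy-matched entities). */
    ,b_sourceid    /* Source record ID (ID and fuzzy-matched entities). */
    ,first_name    /* Model attributes: placeholders for illustration. */
    ,last_name
)
select
     <load_id>, 'Person', 'CRM', stg.id, stg.first_name, stg.last_name
from <staging_schema>.stg_person stg;
commit;
```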

Step 3: Submit or cancel the job

Once you have completed the inserts and you want xDM to process the data, you must call the function SUBMIT_LOAD in the repository schema, SEMARCHY_REPOSITORY, to launch the integration job.

When you call the function SUBMIT_LOAD, your call submits the external load. The function returns a Batch ID, which allows you to track the integration job in the Data Location logs. See the code snippet below to learn which parameters SUBMIT_LOAD takes.

Submit load example

vBatch_id := <repository_schema>.submit_load(
        <load_id>               /* Load ID returned by get_new_loadid. */
       ,'<integration_job>'     /* Name of the integration job to trigger. */
       ,'<user_name>'           /* User who initialized the load. */
);

In the case where you make a mistake or want to cancel the load job for any reason, you can call the function CANCEL_LOAD in the repository schema, SEMARCHY_REPOSITORY.

Cancel load example

<repository_schema>.cancel_load(
        <load_id>               /* Load ID returned by get_new_loadid. */
       ,'<user_name>'           /* User who initialized the load. */
);

To summarize, you learned that an external load is composed of 3 major steps:

  1. Get load ID
  2. Insert records
  3. Submit or cancel the load
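Putting the three steps together, the whole lifecycle can be sketched in a single PostgreSQL block, following the same patterns as the snippets above. This is a hedged sketch: the data location, job, and user names, and the commented-out insert, are illustrative placeholders, not a definitive implementation.

```sql
do
$$
declare
  v_load_id  int;
  v_batch_id int;
begin
  /* Step 1: initialize the external load and get a load ID. */
  v_load_id := semarchy_repository.get_new_loadid(
       'CustomerB2CDemo'     /* Data location name (case sensitive). */
      ,'manual_etl_script'   /* Informational. Identifies the middleware. */
      ,'example_load'        /* Informational. Describes the load. */
      ,'manual_etl_user'     /* User initializing the load. */
  );

  /* Step 2: insert records tagged with the load ID.
     Placeholder insert; adapt the table and columns to your model:
     insert into <data_location_schema>.sd_person (b_loadid, b_classname, ...)
     select v_load_id, 'Person', ... from <staging_table>; */

  /* Step 3: submit the load to trigger the integration job
     (or call cancel_load instead to discard it). */
  v_batch_id := semarchy_repository.submit_load(
       v_load_id
      ,'INTEGRATE_ALL'       /* Integration job name. */
      ,'manual_etl_user'     /* Same user as in get_new_loadid. */
  );
  raise notice 'batch id: %', v_batch_id;
end;
$$;
```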

In the next part, you will learn the benefits of Continuous Loads which give you a shortcut so you can focus only on inserting records.

Continuous loads

Continuous loads are a convenient feature that allows you to load data into Semarchy xDM in an uninterrupted way, without having to get a load ID or submit the load.

With continuous loads:

Semarchy xDM automatically submits an external load at the interval that you define on the continuous load job and processes the data.

The Continuous Loads feature was designed for xDM users that relied heavily on the REST API and needed near-real-time processing of integration jobs. Removing the first step of getting a load ID and the last step of submitting a load streamlined the process of loading data.

Now, we recommend that all users take advantage of continuous loads as the first choice because it is efficient and reduces the overhead required to process records.
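With a continuous load, the only work left is the insert itself. The sketch below assumes the repository function GET_CONTINUOUS_LOADID, which returns the load ID attached to a named continuous load; the continuous load name, table, and columns are placeholders. Check the Semarchy documentation for the exact signature on your database.

```sql
/* PostgreSQL sketch. 'CONTINUOUS_LOAD_ALL' and the target table are
   hypothetical names; use the continuous load defined in your data location. */
insert into <data_location_schema>.sd_person (
     b_loadid
    ,b_classname
    ,b_pubid
    ,b_sourceid
    ,first_name
)
select
     semarchy_repository.get_continuous_loadid('CONTINUOUS_LOAD_ALL')
    ,'Person'
    ,'CRM'
    ,stg.id
    ,stg.first_name
from <staging_schema>.stg_person stg;
commit;
```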

In the next step, you will see an example of an Extract, Transform, Load (ETL) tool in action. This gives you an idea of how middleware can follow the external load process to insert and kick off an integration job in Semarchy xDM.

Congratulations

You have successfully learned how external loads and continuous loads work in Semarchy xDM. Let's summarize:

In the next step, you will configure an integration job.

In this step, you will learn how to create an integration job. The role of the integration job is to process data submitted in an external load and to run it through xDM's certification process. This is the minimum setup required to load data into xDM.

The integration job is set up on the data model used by the Customer B2C tutorial. If you have not completed that tutorial yet, you must follow it entirely before running the current unit.

Create the integration job

You will now create an integration job and add all entities in the correct order.

You will also set a job parameter to ensure that statistics do not run and therefore speed up processing time.

  1. In the navigation menu, right-click CustomerB2CDemo > Jobs and select Add Job.
  2. Enter INTEGRATE_ALL as the Name and click Next.

  3. Add the entities to the integration job following the same order as displayed below and click Finish.

Entities are processed in the order that you specify. When entities are related by references, you must order parent entities before the child entities. For example, since Person references Nickname, Person must be ordered after Nickname.

  4. Click Job Parameters.

  5. Click the Add Parameter button as shown below to add a parameter to this job.

Enter the following parameter values and click Finish:

  6. Save your work.

Deploy the model changes

In this section, you will deploy your model changes to the CustomerB2CDemo data location to make your integration job live.

  1. Switch to the Management perspective.

  2. Right-click the CustomerB2CDemo data location and select Deploy Model Edition.

  3. Do not modify the parameters on the first page and click Next.

  4. Click Finish.

Congratulations

In this part, you learned how to set up an integration job.

In the next step, you will use this integration job to load data using an external load.

To give you practical experience loading data and launching an integration job, this step guides you through executing SQL scripts to load data into the SA_PERSON_PRODUCT table (the source data table for the Customers' Product entity).

What you'll learn

Let's start running an external load.

Review new customers missing data

When you loaded the updated customer data, you updated several prospects who became customers. Let's review these new customers in the Customer B2C application:

  1. Navigate to the Customers view under Browse Data.
  2. Click the filter button.

  3. Select the SemQL Search type and enter the following SemQL expression:

PersonType = 'CUSTOMER' AND NOT ANY PersonProducts HAVE ( ID is not null )
  4. Click Apply.
  5. See the customers who currently do not have any products in xDM.

Next, you will load the products that these customers purchased. To proceed, ensure you have access to the database and a SQL client.

Set up the staging data to load

Let's set up the staging table with the customers' product data and load a dataset. You can think of this staging table as a simulation of a live data feed where you typically get new data from source systems to load into xDM for updating customer information.

You will create the staging table in the SEMARCHY_STG schema. This schema is created by default as part of the Install on-premises, Install on AWS and Install on Azure quick-install tutorials.

  1. Connect to the SEMARCHY_STG schema using your SQL client and open a SQL editor. You can follow the same procedure as the one described in the Set up a SQL client for Semarchy xDM unit for other schemas.
  2. Copy the create table statement from the appropriate tutorial resources file to set up the landing table, SOURCE_PERSON_PRODUCT.

postgresql/5-load-2-create-source-person-product-staging-table.sql

create table semarchy_stg.source_person_product (
     b_classname varchar(128)
    ,id varchar(128)
    ,purchase_date date
    ,registration_date date
    ,f_product varchar(128)
    ,fp_person varchar(128)
    ,fs_person varchar(128)
)
;

oracle/5-load-2-create-source-person-product-staging-table.sql

create table semarchy_stg.source_person_product (
     b_classname varchar(128)
    ,id varchar(128)
    ,purchase_date date
    ,registration_date date
    ,f_product varchar(128)
    ,fp_person varchar(128)
    ,fs_person varchar(128)
)
;

sqlserver/5-load-2-create-source-person-product-staging-table.sql

create table semarchy_stg.dbo.source_person_product (
     b_classname varchar(128)
    ,id varchar(128)
    ,purchase_date date
    ,registration_date date
    ,f_product varchar(128)
    ,fp_person varchar(128)
    ,fs_person varchar(128)
)
;
  3. Paste and run the statement:

  4. Check that the SOURCE_PERSON_PRODUCT table was created under the SEMARCHY_STG schema:

  5. Copy the insert statements from the appropriate tutorial resources file to load records into the SOURCE_PERSON_PRODUCT table:

postgresql/5-load-3-insert-source-person-product-staging-table.sql

oracle/5-load-3-insert-source-person-product-staging-table.sql

sqlserver/5-load-3-insert-source-person-product-staging-table.sql

  6. Paste and run the insert statements in your SQL client:

  7. Be sure to commit the insert statements.
  8. Execute the following SQL query to check that you inserted 163 records into the SOURCE_PERSON_PRODUCT table (the first statement is for PostgreSQL and Oracle, the second for SQL Server):
SELECT COUNT(*) FROM semarchy_stg.source_person_product
SELECT COUNT(*) FROM semarchy_stg.dbo.source_person_product

In the next step, you will grant the Customer B2C data location schema permission to select the records from the staging SOURCE_PERSON_PRODUCT table.

Grant permission to load into xDM

Because your staging data is in the SEMARCHY_STG schema and you intend to load data into the SEMARCHY_CUSTOMER_B2C_MDM xDM data location schema, you must grant SELECT privileges to SEMARCHY_CUSTOMER_B2C_MDM.

To perform this grant, you will need to connect as the admin user, typically the DBA or another user with the rights to perform grants. If you followed the Quick Install tutorials, you will use the same user that created the initial schemas during setup.

  1. Using your SQL client, connect to your database as the admin user.
  2. Copy the grant statement that corresponds to your database to allow the SEMARCHY_CUSTOMER_B2C_MDM schema to select from the SOURCE_PERSON_PRODUCT table in the SEMARCHY_STG schema.

postgresql/5-load-4-grant-select-source-person-product.sql

grant select on semarchy_stg.source_person_product to semarchy_customer_b2c_mdm;

oracle/5-load-4-grant-select-source-person-product.sql

grant select on semarchy_stg.source_person_product to semarchy_customer_b2c_mdm;

sqlserver/5-load-4-grant-select-source-person-product.sql

grant select on semarchy_stg.dbo.source_person_product to SEMARCHY_CUSTOMER_B2C_MDM;
  3. Paste and execute the grant statement in your SQL editor:

In the next step, you will begin the external load.

Insert customers' product data

In this part, you will start an external load by manually calling a function to get a load ID from the SEMARCHY_REPOSITORY schema, then insert data with the load ID, and finally, submit it.

Step 1: Get a load ID

To initialize a load, you will call the GET_NEW_LOADID function and pass in the correct arguments.

  1. Connect as SEMARCHY_REPOSITORY.
  2. Call the function GET_NEW_LOADID to start an external load.

postgresql/5-load-5-get-load-id.sql

do
$$

declare
  v_load_id int;
  begin
    v_load_id := semarchy_repository.get_new_loadid (
         'CustomerB2CDemo'       /* Data Location name. case sensitive. Found in Application Builder. */
        ,'manual_etl_script'     /* Informational program name. Identifies the middleware.         */
        ,'load_data_update'      /* Informational. Describes the load.                             */
        ,'manual_etl_user'       /* User initializing the load. (Not found in Application Builder.)   */
    );
    raise notice '%', v_load_id;
end;
$$;

Copy the value returned in the Messages tab. In the example below, the returned Load ID is 36:

In the next step, you will add this Load ID to the insert statement and run it to load data.

oracle/5-load-5-get-load-id.sql

DECLARE
  v_load_id int;
  BEGIN
   v_load_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_NEW_LOADID (
        'CustomerB2CDemo'       /* Data Location Name. Case sensitive. Found in Application Builder. */
       ,'MANUAL_ETL_SCRIPT'     /* Informational program name. Identifies the middleware.         */
       ,'LOAD_DATA_UPDATE'      /* Informational. Describes the load.                             */
       ,'MANUAL_ETL_USER'       /* User initializing the load. (Not found in Application Builder.)   */
   );
   dbms_output.put_line(v_load_id);
END;
/

Note that the SQL block above calls the DBMS_OUTPUT package to output the Load ID. This is useful for an instructional tutorial, but you will likely not include it in your development or production ETL systems, as you won't follow this manual process for loading data.

If you are using SQL Developer: before executing the function, activate the Dbms Output to see the returned Load ID.

Execute the script and copy the value returned in the Dbms Output window.

sqlserver/5-load-5-get-load-id.sql

declare
  @v_load_id int
  begin
   exec semarchy_repository.dbo.get_new_loadid
    @return_value_argument=@v_load_id OUTPUT,
    @p_data_location_name = 'CustomerB2CDemo', /* Data Location Name (as in the UI) */
    @p_program_name = 'manual_etl_script',                /*  Informational. Identifies the Middleware */
    @p_load_description = N'load_data_update',/* Informational. Describes the load. */
                                                    /* Note the N to allow accented characters */
    @p_username = 'semadmin'                /* User initializing the load. */
    print @v_load_id
end

In the next step, you will add this Load ID to the insert statement and run it to load data.

Step 2: Insert data

In this step, you will load the source data into the SA_PERSON_PRODUCT table.

  1. Connect as SEMARCHY_CUSTOMER_B2C_MDM.
  2. Retrieve the insert statement that corresponds to your database.

postgresql/5-load-6-insert-sa-person-product.sql

insert into semarchy_customer_b2c_mdm.sa_person_product (
     b_loadid      /* Semarchy system fields */
    ,b_classname   /* Semarchy system fields */
    ,id
    ,purchase_date
    ,registration_date
    ,f_product
    ,fp_person
    ,fs_person
)
select
     /* load_id */ as b_loadid    /* Insert the load ID here that was returned from Step 1 */
    ,stg.b_classname /* Entity name. Found in MDM Workbench UI. */
    ,stg.id
    ,stg.purchase_date
    ,stg.registration_date
    ,stg.f_product
    ,stg.fp_person
    ,stg.fs_person
 from semarchy_stg.source_person_product stg
 where not exists (
   select 1
   from semarchy_customer_b2c_mdm.gd_person_product gd
   where /* match on IDs */
         gd.id = stg.id
     /* duplicate data detection */
     and ( gd.purchase_date = stg.purchase_date or (gd.purchase_date is null and stg.purchase_date is null) )
     and ( gd.registration_date = stg.registration_date or (gd.registration_date is null and stg.registration_date is null) )
     and ( gd.f_product = stg.f_product or (gd.f_product is null and stg.f_product is null) )
     and ( gd.fp_person = stg.fp_person or (gd.fp_person is null and stg.fp_person is null) )
     and ( gd.fs_person = stg.fs_person or (gd.fs_person is null and stg.fs_person is null) )
 );
commit;

oracle/5-load-6-insert-sa-person-product.sql

INSERT into SEMARCHY_CUSTOMER_B2C_MDM.SA_PERSON_PRODUCT (
     B_LOADID      /* Semarchy system fields */
    ,B_CLASSNAME   /* Semarchy system fields */
    ,ID
    ,PURCHASE_DATE
    ,REGISTRATION_DATE
    ,F_PRODUCT
    ,FP_PERSON
    ,FS_PERSON
)
SELECT
     /* load_id */ as B_LOADID    /* Insert the load ID here that was returned from Step 1 */
    ,stg.B_CLASSNAME /* Entity name. Found in MDM Workbench UI. */
    ,stg.ID
    ,stg.PURCHASE_DATE
    ,stg.REGISTRATION_DATE
    ,stg.F_PRODUCT
    ,stg.FP_PERSON
    ,stg.FS_PERSON
 FROM SEMARCHY_STG.SOURCE_PERSON_PRODUCT stg
 WHERE NOT EXISTS (
   SELECT 1
   FROM SEMARCHY_CUSTOMER_B2C_MDM.GD_PERSON_PRODUCT gd
   WHERE 1=1
     /* match on IDs */
     AND gd.ID = stg.ID
     /* duplicate data detection */
     and ( gd.PURCHASE_DATE = stg.PURCHASE_DATE or (gd.PURCHASE_DATE is null and stg.PURCHASE_DATE is null) )
     and ( gd.REGISTRATION_DATE = stg.REGISTRATION_DATE or (gd.REGISTRATION_DATE is null and stg.REGISTRATION_DATE is null) )
     and ( gd.F_PRODUCT = stg.F_PRODUCT or (gd.F_PRODUCT is null and stg.F_PRODUCT is null) )
     and ( gd.FP_PERSON = stg.FP_PERSON or (gd.FP_PERSON is null and stg.FP_PERSON is null) )
     and ( gd.FS_PERSON = stg.FS_PERSON or (gd.FS_PERSON is null and stg.FS_PERSON is null) )
 );
 COMMIT;

sqlserver/5-load-6-insert-sa-person-product.sql

insert into semarchy_customer_b2c_mdm.dbo.sa_person_product (
     b_loadid      /* Semarchy system fields */
    ,b_classname   /* Semarchy system fields */
    ,id
    ,purchase_date
    ,registration_date
    ,f_product
    ,fp_person
    ,fs_person
)
select
     /* load_id */ as b_loadid    /* Insert the load ID here that was returned from Step 1 */
    ,stg.b_classname /* Entity name. Found in Application Builder. */
    ,stg.id
    ,stg.purchase_date
    ,stg.registration_date
    ,stg.f_product
    ,stg.fp_person
    ,stg.fs_person
 from semarchy_stg.dbo.source_person_product stg
 where not exists (
   select 1
   from semarchy_customer_b2c_mdm.dbo.gd_person_product gd
   where /* match on IDs */
         gd.id = stg.id
     /* duplicate data detection */
     and ( gd.purchase_date = stg.purchase_date or (gd.purchase_date is null and stg.purchase_date is null) )
     and ( gd.registration_date = stg.registration_date or (gd.registration_date is null and stg.registration_date is null) )
     and ( gd.f_product = stg.f_product or (gd.f_product is null and stg.f_product is null) )
     and ( gd.fp_person = stg.fp_person or (gd.fp_person is null and stg.fp_person is null) )
     and ( gd.fs_person = stg.fs_person or (gd.fs_person is null and stg.fs_person is null) )
 );
  1. Observe the content of this statement and note that it includes delta detection: it inserts into the SA_PERSON_PRODUCT table only those records from the SOURCE_PERSON_PRODUCT staging table that are new or carry different data from the existing golden records.
  2. Find the line /* load_id */ as B_LOADID in the insert statement and replace /* load_id */ with the actual Load ID retrieved during Step 1: Get a load ID (in this tutorial example, it is 36).
  3. Start by executing only the SELECT statement in your SQL editor, without the surrounding INSERT statement. Note that 21 records are returned:

  4. Paste and execute the entire INSERT statement in your SQL editor. Remember to commit the inserts if auto-commit is not activated.

  5. Execute the following query and check that 21 records were inserted into the SA_PERSON_PRODUCT table (the first statement is for PostgreSQL and Oracle, the second for SQL Server):
select count(*) from semarchy_customer_b2c_mdm.sa_person_product where b_loadid = 36
select count(*) from semarchy_customer_b2c_mdm.dbo.sa_person_product where b_loadid = 36

Step 3: Submit data

  1. Connect as SEMARCHY_REPOSITORY.
  2. Retrieve the submit script that corresponds to your database.

postgresql/5-load-7-submit-load.sql

do
$$

declare
  v_batch_id int;
begin
  v_batch_id := semarchy_repository.submit_load (
             /* load_id */      /* Load ID returned by get_new_loadid from Step 1 */
            ,'INTEGRATE_ALL'    /* Must exactly match a defined integration job. Found in Application Builder. */
            ,'manual_etl_user'  /* Must exactly match the value used in get_new_loadid in Step 1. */
  );
  raise notice '%', v_batch_id;
end;
$$;

oracle/5-load-7-submit-load.sql

DECLARE
  v_batch_id int;
BEGIN
  v_batch_id := SEMARCHY_REPOSITORY.INTEGRATION_LOAD.SUBMIT_LOAD (
             /* load_id */       /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID from Step 1 */
            ,'INTEGRATE_ALL'     /* Must exactly match a defined integration job. Found in Application Builder. */
            ,'MANUAL_ETL_USER'   /* Must exactly match the value used in GET_NEW_LOADID in Step 1. */
  );
  dbms_output.put_line(v_batch_id);
END;
/

sqlserver/5-load-7-submit-load.sql

declare
  @v_batch_id int
  begin
   exec semarchy_repository.dbo.submit_load
    @return_value_argument=@v_batch_id OUTPUT,
    @p_load_id = /* load_id */, /* Load ID returned by get_new_loadid from Step 1 */
        @p_integration_job_name = 'INTEGRATE_ALL',/* Name of the Integration Job to trigger. */
        @p_username = 'semadmin' /* User who has initialized the load. */
        print @v_batch_id 
end
  1. Find the line containing /* load_id */ in the statement and replace /* load_id */ with the actual Load ID (in this tutorial example, it is 36).
  2. Run the amended statement to submit the load and kick off the integration job.
  3. Record the Batch ID that the function returns. In this tutorial example, the batch ID is 33.

It's common for batch IDs to differ from load IDs, especially when you initialize several loads and do not submit them.

Well done! You learned how to complete a load and launch an integration job in xDM. In the next part, you will learn to check the data location logs to see the progress of load jobs and integration jobs.

Check the data location logs

Data location logs allow you to know the status of data loads.

To view the data location logs:

  1. Go to your application. On the welcome page, click Application Builder. Go to the Management tab. Open the Data Locations > CustomerB2CDemo > Latest Loads node.
  2. Notice that the latest load 36 is marked as Finished:

  3. Once you submit the load, you can use the Batch ID to trace the integration job. Back in the Management tab, open the Data Locations > CustomerB2CDemo > Latest Integration Batches node.

  4. Drill down into the batch to see more details about each step.

Browse the new customers' products data

Once the integration job batch is complete, let's review the updated customers' products data in the Customer B2C application:

  1. Navigate back to the Customers view in the Customer B2C application.
  2. Click the menu button.
  3. Select Refresh.

  4. You will see that there are no more customer records meeting this filter:
PersonType = 'CUSTOMER' AND NOT ANY PersonProducts HAVE ( ID is not null )

The reason is that, following your external load, there are no more customers missing their associated product data.

  5. Turn off the filter by clicking RESET on the filter.

  6. Look for the person record with the name Carroll Spark. Click the PRODUCTS tab.

  7. If you see products under Carroll Spark and the other customers previously identified as missing products, then you successfully loaded the data.

Cancel a running load job

You may need to cancel a load in different situations:

This part will guide you through how to cancel an external load.

  1. Connect as the repository user, SEMARCHY_REPOSITORY.
  2. Initialize a new load (follow the procedure described in Step 1: Get a load ID in the previous part) and retrieve the Load ID (in this tutorial example, it is 37).
  3. Take the CANCEL_LOAD statement that corresponds to your database and replace /* load_id */ with the Load ID (in this tutorial example, it is 37).

postgresql/5-load-8-cancel-load.sql

do
$$
begin
  perform semarchy_repository.cancel_load (
       /* load_id */          /* Load ID returned by get_new_loadid */
      ,'manual_etl_user'      /* Must exactly match the value used in get_new_loadid */
  );
end;
$$;

oracle/5-load-8-cancel-load.sql

BEGIN
  SEMARCHY_REPOSITORY.INTEGRATION_LOAD.CANCEL_LOAD (
       /* load_id */          /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
      ,'MANUAL_ETL_USER'      /* Must exactly match the value used in GET_NEW_LOADID */
  );
END;
/

sqlserver/5-load-8-cancel-load.sql

begin
exec semarchy_repository.dbo.cancel_load
        @P_LOAD_ID = /* load_id */   /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
        ,@P_USERNAME = 'semadmin'     /* User who has initialized the load. */
end
  4. Execute the function CANCEL_LOAD (with the correct load ID) in your SQL editor.
  5. Check in the Data Location logs that you successfully canceled the load.

Congratulations

You have successfully loaded data using an external load, and have also learned how to cancel a load.

In the next step, you will configure a Continuous Load that will automatically load data into Semarchy xDM.

You are now going to create a continuous load that will automatically detect and process source data loaded into xDM via the SQL API.

  1. In the Management tree view, right-click the CustomerB2CDemo data location and select New Continuous Load.

  2. Enter the following values:

  3. Click Finish.

In the next step, you will use this Continuous Load to update customer data.

You are now going to update data with the Continuous Load you have set up in the previous step and see the benefits of Continuous Loads.

What you'll learn

Review the original customer data

Before loading updated customer data, let's review the existing customer data in the Customer B2C application:

  1. Navigate to the Customers view under Browse Data.
  2. Look for the person record with the name Abe Corbett. You may need to sort the First Name column from A-Z. Note this customer's first and last name.

    You should see "Abe Corbett".
  3. Look for the person record with the name Carroll Spark. Note this record's Person Type.

    You should see Carroll's Person Type is "Prospect".

Next, you will set up your staging environment to load customer updates. Ensure you have access to the database and a SQL client.

Set up the staging data to load

Let's set up the staging table with the original customer data and customer updates, as a simulation of a live data feed.

Like for the external load, you will create the staging table in the SEMARCHY_STG schema. This schema is created by default as part of the Install on-premises, Install on AWS and Install on Azure quick-install tutorials.

  1. Using your SQL client, connect to the SEMARCHY_STG schema with the SEMARCHY_STG user (password: same as username by default).
  2. Run the create table statement to set up a landing table, SOURCE_PERSON in the staging schema SEMARCHY_STG.

postgresql/7-update-1-create-source-person-staging-table.sql

create table semarchy_stg.source_person (
     b_classname varchar(128)
    ,b_pubid varchar(128)
    ,b_sourceid varchar(128)
    ,first_name varchar(128)
    ,last_name varchar(128)
    ,date_of_birth date
    ,addstreet varchar(128)
    ,addcity varchar(128)
    ,addstate varchar(128)
    ,addpostal_code varchar(128)
    ,addcountry varchar(128)
    ,source_email varchar(128)
    ,source_phone varchar(128)
    ,person_type varchar(128)
    ,value_status varchar(128)
)
;

oracle/7-update-1-create-source-person-staging-table.sql

CREATE TABLE SEMARCHY_STG.SOURCE_PERSON (
     B_CLASSNAME varchar(128)
    ,B_PUBID varchar(128)
    ,B_SOURCEID varchar(128)
    ,FIRST_NAME varchar(128)
    ,LAST_NAME varchar(128)
    ,DATE_OF_BIRTH date
    ,ADDSTREET varchar(128)
    ,ADDCITY varchar(128)
    ,ADDSTATE varchar(128)
    ,ADDPOSTAL_CODE varchar(128)
    ,ADDCOUNTRY varchar(128)
    ,SOURCE_EMAIL varchar(128)
    ,SOURCE_PHONE varchar(128)
    ,PERSON_TYPE varchar(128)
    ,VALUE_STATUS varchar(128)
)
;

sqlserver/7-update-1-create-source-person-staging-table.sql

create table semarchy_stg.dbo.source_person (
     b_classname varchar(128)
    ,b_pubid varchar(128)
    ,b_sourceid varchar(128)
    ,first_name varchar(128)
    ,last_name varchar(128)
    ,date_of_birth date
    ,addstreet varchar(128)
    ,addcity varchar(128)
    ,addstate varchar(128)
    ,addpostal_code varchar(128)
    ,addcountry varchar(128)
    ,source_email varchar(128)
    ,source_phone varchar(128)
    ,person_type varchar(128)
    ,value_status varchar(128)
)
;
  3. Retrieve the insert statements to load records into the landing table SOURCE_PERSON you just created: click the link below or retrieve them from the previously downloaded Semarchy xDM tutorial resources.

postgresql/7-update-2-insert-source-person-staging-table.sql

oracle/7-update-2-insert-source-person-staging-table.sql

sqlserver/7-update-2-insert-source-person-staging-table.sql

  4. Paste and run the insert statements in your SQL editor. Be sure to commit.
  5. Run the following query to check that you inserted 255 records into the SOURCE_PERSON table:
On PostgreSQL and Oracle:

select count(*)
from source_person;

On SQL Server:

select count(*)
from dbo.source_person;

In the next step, you will grant the Customer B2C data location schema permission to select the records from the staging SOURCE_PERSON table.

Grant permission to load into xDM

Because your staging data is in the SEMARCHY_STG schema and you intend to load data into the SEMARCHY_CUSTOMER_B2C_MDM xDM data location schema, you must grant SELECT privileges to SEMARCHY_CUSTOMER_B2C_MDM.

  1. Using your SQL client, connect to your database as the admin user.
  2. Run the following grant to allow the SEMARCHY_CUSTOMER_B2C_MDM schema to select from the SOURCE_PERSON table in the SEMARCHY_STG schema.

postgresql/7-update-3-grant-select-source-person.sql

grant select on semarchy_stg.source_person to semarchy_customer_b2c_mdm;

oracle/7-update-3-grant-select-source-person.sql

grant select on semarchy_stg.source_person to semarchy_customer_b2c_mdm;

sqlserver/7-update-3-grant-select-source-person.sql

use semarchy_stg;
grant select on dbo.source_person to semarchy_customer_b2c_mdm;
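To confirm the grant took effect, you can reconnect as the SEMARCHY_CUSTOMER_B2C_MDM user and run a quick count against the staging table (PostgreSQL/Oracle form shown; on SQL Server, query semarchy_stg.dbo.source_person instead):

```sql
/* Run as SEMARCHY_CUSTOMER_B2C_MDM; should return 255 after the inserts above */
select count(*) from semarchy_stg.source_person;
```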

In the next step, you will insert the data from the SOURCE_PERSON staging table into the SD_PERSON xDM table.

Load only new customer data using delta detection

Now that you have prepared the staging environment, you will load updated customer data into xDM.

Your staging table, SOURCE_PERSON, contains all the original records as well as the updates. This is a common scenario in the real world where the source system captures all records. Therefore, it's important to learn how to detect changes to only load the updated records.

  1. Connect to the SEMARCHY_CUSTOMER_B2C_MDM schema.
  2. Copy and paste the insert statements to load records into the SD_PERSON table.

postgresql/7-update-4-insert-sd-person.sql

insert into semarchy_customer_b2c_mdm.sd_person (
     b_loadid      /* Semarchy system fields */
    ,b_classname   /* Semarchy system fields */
    ,b_pubid       /* Semarchy system fields */
    ,b_sourceid
    ,first_name
    ,last_name
    ,date_of_birth
    ,addstreet
    ,addcity
    ,addstate
    ,addpostal_code
    ,addcountry
    ,source_email
    ,source_phone
    ,person_type
    ,value_status
)
select
     semarchy_repository.get_continuous_loadid('CONTINUOUS_LOAD_INTEGRATE_ALL') as b_loadid /* Call to get Load ID */
    ,stg.b_classname /* Entity name. Found in Application Builder. */
    ,stg.b_pubid
    ,stg.b_sourceid
    ,stg.first_name
    ,stg.last_name
    ,stg.date_of_birth
    ,stg.addstreet
    ,stg.addcity
    ,stg.addstate
    ,stg.addpostal_code
    ,stg.addcountry
    ,stg.source_email
    ,stg.source_phone
    ,stg.person_type
    ,stg.value_status
from semarchy_stg.source_person stg
where not exists (
    select 1
    from semarchy_customer_b2c_mdm.mi_person mi
    where mi.b_sourceid = stg.b_sourceid
        and mi.b_pubid = stg.b_pubid
        /* duplicate data detection */
        and ( mi.first_name = stg.first_name or (mi.first_name is null and stg.first_name is null) )
        and ( mi.last_name = stg.last_name or (mi.last_name is null and stg.last_name is null) )
        and ( mi.date_of_birth = stg.date_of_birth or (mi.date_of_birth is null and stg.date_of_birth is null) )
        and ( mi.addstreet = stg.addstreet or (mi.addstreet is null and stg.addstreet is null) )
        and ( mi.addcity = stg.addcity or (mi.addcity is null and stg.addcity is null) )
        and ( mi.addstate = stg.addstate or (mi.addstate is null and stg.addstate is null) )
        and ( mi.addcountry = stg.addcountry or (mi.addcountry is null and stg.addcountry is null) )
        and ( mi.source_email = stg.source_email or (mi.source_email is null and stg.source_email is null) )
        and ( mi.source_phone = stg.source_phone or (mi.source_phone is null and stg.source_phone is null) )
        and ( mi.person_type = stg.person_type or (mi.person_type is null and stg.person_type is null) )
        and ( mi.value_status = stg.value_status or (mi.value_status is null and stg.value_status is null) )
 );
commit;
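As a side note, PostgreSQL also supports IS NOT DISTINCT FROM, a null-safe comparison that treats two nulls as equal and condenses each `=` / `is null` pair in the delta check above. The tutorial script keeps the portable form, but on PostgreSQL the delta count could be sketched as:

```sql
/* Null-safe delta check (PostgreSQL only): IS NOT DISTINCT FROM replaces
   each "col = col or (both null)" pair in the portable version. */
select count(*)
from semarchy_stg.source_person stg
where not exists (
    select 1
    from semarchy_customer_b2c_mdm.mi_person mi
    where mi.b_sourceid = stg.b_sourceid
      and mi.b_pubid      = stg.b_pubid
      and mi.first_name   is not distinct from stg.first_name
      and mi.last_name    is not distinct from stg.last_name
      /* ...repeat for the remaining compared columns... */
);
```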

oracle/7-update-4-insert-sd-person.sql

INSERT into SEMARCHY_CUSTOMER_B2C_MDM.SD_PERSON (
     B_LOADID      /* Semarchy system fields */
    ,B_CLASSNAME   /* Semarchy system fields */
    ,B_PUBID       /* Semarchy system fields */
    ,B_SOURCEID
    ,FIRST_NAME
    ,LAST_NAME
    ,DATE_OF_BIRTH
    ,ADDSTREET
    ,ADDCITY
    ,ADDSTATE
    ,ADDPOSTAL_CODE
    ,ADDCOUNTRY
    ,SOURCE_EMAIL
    ,SOURCE_PHONE
    ,PERSON_TYPE
    ,VALUE_STATUS
)  
SELECT    SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_CONTINUOUS_LOADID('CONTINUOUS_LOAD_INTEGRATE_ALL') as B_LOADID /* Call to get Load ID */
    ,stg.B_CLASSNAME /* Entity name. Found in Application Builder. */
    ,stg.B_PUBID
    ,stg.B_SOURCEID
    ,stg.FIRST_NAME
    ,stg.LAST_NAME
    ,stg.DATE_OF_BIRTH
    ,stg.ADDSTREET
    ,stg.ADDCITY
    ,stg.ADDSTATE
    ,stg.ADDPOSTAL_CODE
    ,stg.ADDCOUNTRY
    ,stg.SOURCE_EMAIL
    ,stg.SOURCE_PHONE
    ,stg.PERSON_TYPE
    ,stg.VALUE_STATUS
FROM SEMARCHY_STG.SOURCE_PERSON stg
WHERE NOT EXISTS (
  select 1
  from SEMARCHY_CUSTOMER_B2C_MDM.MI_PERSON mi
  where mi.B_SOURCEID = stg.B_SOURCEID
    and mi.B_PUBID = stg.B_PUBID
    /* duplicate data detection */
    and ( mi.FIRST_NAME = stg.FIRST_NAME or (mi.FIRST_NAME is null and stg.FIRST_NAME is null) )
    and ( mi.LAST_NAME = stg.LAST_NAME or (mi.LAST_NAME is null and stg.LAST_NAME is null) )
    and ( mi.DATE_OF_BIRTH = stg.DATE_OF_BIRTH or (mi.DATE_OF_BIRTH is null and stg.DATE_OF_BIRTH is null) )
    and ( mi.ADDSTREET = stg.ADDSTREET or (mi.ADDSTREET is null and stg.ADDSTREET is null) )
    and ( mi.ADDCITY = stg.ADDCITY or (mi.ADDCITY is null and stg.ADDCITY is null) )
    and ( mi.ADDSTATE = stg.ADDSTATE or (mi.ADDSTATE is null and stg.ADDSTATE is null) )
    and ( mi.ADDCOUNTRY = stg.ADDCOUNTRY or (mi.ADDCOUNTRY is null and stg.ADDCOUNTRY is null) )
    and ( mi.SOURCE_EMAIL = stg.SOURCE_EMAIL or (mi.SOURCE_EMAIL is null and stg.SOURCE_EMAIL is null) )
    and ( mi.SOURCE_PHONE = stg.SOURCE_PHONE or (mi.SOURCE_PHONE is null and stg.SOURCE_PHONE is null) )
    and ( mi.PERSON_TYPE = stg.PERSON_TYPE or (mi.PERSON_TYPE is null and stg.PERSON_TYPE is null) )
    and ( mi.VALUE_STATUS = stg.VALUE_STATUS or (mi.VALUE_STATUS is null and stg.VALUE_STATUS is null) )
);
COMMIT;

sqlserver/7-update-4-insert-sd-person.sql

declare
    @v_load_id int
begin
    /* The continuous load ID has to be retrieved outside of the select statement.
     * Its value is stored in @v_load_id.
     */
    exec semarchy_repository.dbo.get_continuous_loadid
                @RETURN_VALUE_ARGUMENT=@v_load_id OUTPUT, 
                @P_CONTINUOUS_LOAD_NAME='CONTINUOUS_LOAD_INTEGRATE_ALL'

    use semarchy_customer_b2c_mdm
    insert into semarchy_customer_b2c_mdm.dbo.sd_person (
        b_loadid      /* Semarchy system fields */
        ,b_classname   /* Semarchy system fields */
        ,b_pubid       /* Semarchy system fields */
        ,b_sourceid
        ,first_name
        ,last_name
        ,date_of_birth
        ,addstreet
        ,addcity
        ,addstate
        ,addpostal_code
        ,addcountry
        ,source_email
        ,source_phone
        ,person_type
        ,value_status
    )
    select
        @v_load_id
        ,stg.b_classname /* Entity name. Found in Application Builder. */
        ,stg.b_pubid
        ,stg.b_sourceid
        ,stg.first_name
        ,stg.last_name
        ,stg.date_of_birth
        ,stg.addstreet
        ,stg.addcity
        ,stg.addstate
        ,stg.addpostal_code
        ,stg.addcountry
        ,stg.source_email
        ,stg.source_phone
        ,stg.person_type
        ,stg.value_status
    from semarchy_stg.dbo.source_person stg
    where not exists (
        select 1
        from semarchy_customer_b2c_mdm.dbo.mi_person mi
        where mi.b_sourceid = stg.b_sourceid
            and mi.b_pubid = stg.b_pubid
            /* duplicate data detection */
            and ( mi.first_name = stg.first_name or (mi.first_name is null and stg.first_name is null) )
            and ( mi.last_name = stg.last_name or (mi.last_name is null and stg.last_name is null) )
            and ( mi.date_of_birth = stg.date_of_birth or (mi.date_of_birth is null and stg.date_of_birth is null) )
            and ( mi.addstreet = stg.addstreet or (mi.addstreet is null and stg.addstreet is null) )
            and ( mi.addcity = stg.addcity or (mi.addcity is null and stg.addcity is null) )
            and ( mi.addstate = stg.addstate or (mi.addstate is null and stg.addstate is null) )
            and ( mi.addcountry = stg.addcountry or (mi.addcountry is null and stg.addcountry is null) )
            and ( mi.source_email = stg.source_email or (mi.source_email is null and stg.source_email is null) )
            and ( mi.source_phone = stg.source_phone or (mi.source_phone is null and stg.source_phone is null) )
            and ( mi.person_type = stg.person_type or (mi.person_type is null and stg.person_type is null) )
            and ( mi.value_status = stg.value_status or (mi.value_status is null and stg.value_status is null) )
    )
end
  3. Observe how the insert statement retrieves the continuous load ID on each database:

On PostgreSQL:

semarchy_repository.get_continuous_loadid('CONTINUOUS_LOAD_INTEGRATE_ALL') 

On Oracle:

SEMARCHY_REPOSITORY.INTEGRATION_LOAD.GET_CONTINUOUS_LOADID('CONTINUOUS_LOAD_INTEGRATE_ALL')

On SQL Server:

exec semarchy_repository.dbo.get_continuous_loadid
        @RETURN_VALUE_ARGUMENT=@v_load_id OUTPUT, 
        @P_CONTINUOUS_LOAD_NAME='CONTINUOUS_LOAD_INTEGRATE_ALL'
  4. Select and run the SELECT portion of the statement and check that 72 rows are returned. These correspond to the delta between the staging table and the target table SD_PERSON:

  5. Run the full insert statement. Be sure to commit.

  6. Go back to the Management perspective of the Application Builder and retrieve the latest load under Data Locations > CustomerB2CDemo > Latest Loads. This is the load that was automatically triggered by the continuous load. Note the value of the generated Load ID (in the example below: 15).

  7. In the query below, substitute /* load_id */ with the actual Load ID retrieved in the previous step (15 in our example):
On PostgreSQL and Oracle:

select count(*)
from sd_person
where b_loadid = /* load_id */;

On SQL Server:

select count(*)
from dbo.sd_person
where b_loadid = /* load_id */;
  8. Execute this query and check that it returns 72 records. This is the number of records inserted or updated during the latest load.

After loading the customer updates, switch to the Customer B2C application to check the updates in xDM.

Browse the updated customer data

After loading the updated customer data, let's review the changes to customer data in the Customer B2C application:

  1. Navigate to the Customers view under Browse Data.
  2. Click the menu button.
  3. Select Refresh.
  4. Look for the person record with the first name "Abe". Note this customer's last name.

    You should see "Abe Corbett-López". The customer's last name was updated.
  5. Look for the person record with the name "Carroll Spark". Note this record's Person Type.

    You should see Carroll's Person Type is now "Customer". The customer was converted from a prospect to a customer and the record is updated accordingly.

Congratulations

You have successfully loaded data updates using SQL with a continuous load. Let's summarize:

In the next step, you will learn how to truncate data from a data location.

Truncating all data from tables can be very useful during development. This step explains how to generate and execute the statements to do so.

Truncate data on PostgreSQL

  1. Connect to the semarchy_customer_b2c_mdm schema using your SQL client.
  2. Open a SQL editor in your SQL client.
  3. Run the following SQL statement. This statement will generate all the SQL truncate statements you need to run.

postgresql/8-truncate-1-truncate-data.sql

select 'truncate table semarchy_customer_b2c_mdm.' || tablename || ';'
from pg_catalog.pg_tables
where schemaname = 'semarchy_customer_b2c_mdm' /* set this to your data location schema */
  and tablename not like 'dl_%'                /* do not truncate these system tables   */
  and tablename not like 'ext_%'               /* do not truncate these system tables   */
  and tablename like '%'                       /* add filters as needed for entities    */
order by substr(tablename, 3), tablename;
  4. Copy the generated statements to the clipboard.

  5. Paste the generated statements in your SQL editor window and execute them.

  6. Refresh the data view and notice that the data has disappeared.
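If you run the generation query in psql, you can skip the copy-paste step: drop the trailing `|| ';'` concatenation and terminate the query with `\gexec`, which executes each row of the result as a statement. A sketch:

```sql
select 'truncate table semarchy_customer_b2c_mdm.' || tablename
from pg_catalog.pg_tables
where schemaname = 'semarchy_customer_b2c_mdm'
  and tablename not like 'dl_%'
  and tablename not like 'ext_%'
\gexec
```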

Truncate data on Oracle

  1. Connect to semarchy_customer_b2c_mdm.
  2. Run the truncate generation statement. This statement will generate all the SQL truncate statements you need to run.

oracle/8-truncate-1-truncate-data.sql

select 'truncate table SEMARCHY_CUSTOMER_B2C_MDM.' || table_name || ';'
from all_tables
where owner = 'SEMARCHY_CUSTOMER_B2C_MDM'  /* set this to your data location schema */
  and table_name not like 'DL_%'           /* do not truncate these system tables   */
  and table_name not like 'EXT_%'          /* do not truncate these system tables   */
  and table_name like '%'                  /* add filters as needed for entities    */
order by substr(table_name, 3), table_name;
  3. Copy these generated statements to the clipboard.

  4. Paste and run the truncate statements.

  5. Refresh the data view and notice that the data has disappeared.
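Alternatively, on Oracle an anonymous PL/SQL block can generate and run the truncate statements in one pass with EXECUTE IMMEDIATE (a sketch; review the filters against your entities before running):

```sql
begin
  for t in (
    select table_name
    from all_tables
    where owner = 'SEMARCHY_CUSTOMER_B2C_MDM'
      and table_name not like 'DL_%'
      and table_name not like 'EXT_%'
  ) loop
    /* TRUNCATE is DDL, so it must go through dynamic SQL in PL/SQL */
    execute immediate 'truncate table SEMARCHY_CUSTOMER_B2C_MDM.' || t.table_name;
  end loop;
end;
/
```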

You have successfully truncated data.

Truncate data on SQL Server

  1. Connect to semarchy_customer_b2c_mdm.
  2. Run the truncate generation statement. This statement will generate all the SQL truncate statements you need to run.

sqlserver/8-truncate-1-truncate-data.sql

SELECT 'truncate table ' + TABLE_NAME + ';' + char(10) + 'go'
FROM SEMARCHY_CUSTOMER_B2C_MDM.INFORMATION_SCHEMA.TABLES   /* set this to your data location schema */
WHERE TABLE_NAME NOT LIKE 'DL_%'        /* do not truncate these system tables   */
AND   TABLE_NAME NOT LIKE 'EXT_%'       /* do not truncate these system tables   */
AND   TABLE_NAME LIKE '%'               /* add filters as needed for entities    */
ORDER BY SUBSTRING(TABLE_NAME, 3, LEN(TABLE_NAME)), TABLE_NAME;
  3. Copy these generated statements to the clipboard.

  4. Paste and run the truncate statements.

  5. Refresh the data view and notice that the data has disappeared.
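On SQL Server, you can likewise collapse the generate-then-run steps into a single batch by concatenating the statements into a variable and executing it with sp_executesql (a sketch; review the filters before running):

```sql
use SEMARCHY_CUSTOMER_B2C_MDM;

declare @sql nvarchar(max) = N'';

/* Build one batch containing all the truncate statements */
select @sql = @sql + N'truncate table ' + quotename(TABLE_SCHEMA)
            + N'.' + quotename(TABLE_NAME) + N';'
from INFORMATION_SCHEMA.TABLES
where TABLE_TYPE = 'BASE TABLE'
  and TABLE_NAME not like 'DL_%'
  and TABLE_NAME not like 'EXT_%';

exec sp_executesql @sql;
```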

Well done! You have completed the SQL-based Integration tutorials for Semarchy xDM.

In this tutorial, you learned:

What's next?

You will find more Integration information, including how to use the REST API in the Integration tutorial series.

There are also tutorials about how to design applications in the Data Authoring and Data Consolidation tutorial tracks.

Go back to the main menu to see more tutorials.

GO TO TUTORIALS