In this tutorial, you will learn how to use some advanced mapping features available in Semarchy xDI.

What you will learn

- How to reverse engineer metadata for databases, delimited files, and positional files
- How to use filters, inner and outer joins, aggregation, and stages in mappings
- How to control the execution location and execution context of business rules
- How to define functional keys and deduplicate source data
- How to add constraints and enable reject detection

Before you start

In this tutorial, we assume that you have already installed Semarchy xDI Designer and followed the Create Your First Data Flow With Semarchy xDI tutorial.

If you have not installed the Designer yet, follow the Install Semarchy xDI Designer tutorial from the main tutorials page. Additional setup information can also be found in the installation guide.

Enjoy this tutorial!

You must first create the metadata that you will use in this unit.

Reverse engineer the Hotel source database

In this tutorial, multiple mappings use a Hypersonic SQL database as a source.

Follow the same steps as in the Create Your First Data Flow With Semarchy xDI tutorial to create that metadata and run the reverse engineering, using the following parameters:

Property | Value
Location | Tutorial – Fundamentals
Technology | Hypersonic SQL
Metadata Model name | Hotel
URL | jdbc:hsqldb:hsql://localhost:62210
User | sa
Password |
Schema | HOTEL_MANAGEMENT

Use default values for all the other properties.

Once you have created the Hotel metadata, review the content of the reverse-engineered tables: in the Project Explorer, expand Hotel > Hotel > HOTEL_MANAGEMENT, right-click each datastore, and select Actions > Consult data.

Reverse engineer the other delimited files

In the Create Your First Data Flow With Semarchy xDI tutorial, you created the Reference Files metadata corresponding to a folder that contains files used in the tutorials. You also reverse engineered the DiscountRanges.txt file. You will now add other files from the same folder to this metadata.

Add and reverse engineer the Time.csv file

Declare and reverse engineer the Time.csv file from the metadata editor:

  1. In the Project Explorer, double-click the Reference Files metadata.
  2. In the metadata editor, expand the Server node, right-click the Reference Files folder, and then select Actions > Launch File wizard.

  3. On the Properties page of the File Wizard, enter the following properties:

Property | Value
Name | Time
Physical Name | Time.csv
Header Line Position | 0

  4. Click Refresh to check the content of the file, and then click Next.
  5. On the Reverse page of the wizard, click Reverse to launch the reverse engineering process. The following content appears in the results table:

Name | Position | Size | Type
F1 | 1 | 66 | String

  6. Click the F1 column and rename it to DAY_DATE.
  7. Click Finish to close the wizard.
  8. Click the save icon in the Designer toolbar or press CTRL+S to save the metadata.
  9. Right-click the Time datastore in the metadata editor and select Actions > Consult Data to view the file's content.

Add and reverse engineer the REF_US_STATES.csv file

Follow the same steps to define a delimited file with the following properties:

Property | Value
Name | US_States
Physical Name | REF_US_STATES.csv
Field Separator | ,
Header Line Position | 1

Reverse engineer this file and review the resulting columns:

Name | Position | Type
STATE_UPPER_CASE | 1 | String
STATE | 2 | String
STATE_CODE | 3 | String

Reverse engineer a positional file

A positional (or fixed-width) file is a file in which columns are defined by their position and size instead of a separator character.

The ref_us_cities.txt file found in the same folder is a positional file. You will now add it to the Reference Files metadata.

  1. Launch the File Wizard as previously.
  2. Click Browse and select the ref_us_cities.txt file.
  3. In the Name field, enter US_Cities.
  4. Modify the Type and choose POSITIONAL.
  5. Check the content of the file by clicking Refresh.
  6. Click Next to go to the Reverse page.

Reverse engineering cannot automatically detect contiguous columns in a positional file. You must define them manually.

To define the three columns for the ref_us_cities.txt file, perform the following steps:

  1. Click the add-column button to declare a new column.
  2. Set the Name, Position, Size, and Type properties with the values provided in the following table.
  3. Repeat these steps until all three columns have been added.

Name | Position | Size | Type
ZIP_CODE | 1 | 5 | String
CITY | 6 | 72 | String
STATE_CODE | 78 | 10 | String

Once done, click Refresh to preview the file and check that the layout is correctly defined.

Congratulations

You have successfully created all the metadata you will need for the rest of this tutorial.

In the next section, you will learn how to use a filter in a mapping.

Filters in Semarchy xDI allow you to refine data from a source datastore before passing it to a target. In this step, you will create a new mapping with a filter.

Create the Load DIM_PAYMENT_TYPE mapping

Create a new mapping under the mappings folder:

  1. In the Project Explorer view, go to the Tutorial – Fundamentals project and right-click the Mappings folder.
  2. Choose New > Mapping...
  3. Set the File name to Load DIM_PAYMENT_TYPE then click Finish.

Add elements to the mapping:

  1. Add the source datastore by dragging and dropping the T_PAYMENT_TYPE datastore from the Hotel metadata into the mapping diagram.
  2. Add the target datastore by dragging and dropping the DIM_PAYMENT_TYPE datastore from the Datamart metadata into the mapping diagram.
  3. On the mapping diagram, drag and drop the PMT_CODE column from the T_PAYMENT_TYPE datastore onto DIM_PAYMENT_TYPE and select Map - all matching fields.

Map - all matching fields automatically creates mapping expressions for all source and target columns with identical names.

Add a filter

In this section, you will filter out obsolete data coming from the T_PAYMENT_TYPE source table by only keeping records which have their PMT_ACTIVE column set to 1.

To create a Filter on the source datastore:

  1. Drag and drop the PMT_ACTIVE column to the empty space on the mapping diagram.
  2. Select Create a Filter in the dialog.
  3. Click the created filter in the Mapping Diagram and, in the Expression Editor, review the generated filter expression: T_PAYMENT_TYPE.PMT_ACTIVE=1
  4. Click the save icon in the Designer toolbar or press CTRL+S to save the mapping.

Run the mapping

Your mapping is now complete. To run it:

  1. Make sure you have saved the mapping.
  2. Right-click an empty area in the mapping editor and click Run.
  3. Go to the Statistics view and review the results:

Name | Value
SUM(SQL_NB_ROWS) | 12
SUM(SQL_STAT_INSERT) | 4
SUM(SQL_STAT_UPDATE) | 0

  4. In the mapping, right-click the target table and select Action > Consult Data. The following four lines are returned by the query:

PMT_CODE | PMT_NAME | PMT_DESCRIPTION
BT | Bank Transfer |
CC | Credit Card | visa,mastercard
CH | Cash |
PP | PayPal | only web payment

Explore the generated process

Semarchy xDI Designer allows you to browse the generated process and its code.

  1. Go back to the process generated by xDI and open its editor.

This process is composed of three steps:

  2. Double-click the L1_DIM_PAYMENT_TYPE-Integration step to open it further. It contains two sub-steps: L1_DIM_PAYMENT_TYPE-Prepare and L1_DIM_PAYMENT_TYPE-Integration.
  3. Select the Load DIM_PAYMENT_TYPE tab above the editor to navigate back to the parent process.
  4. Double-click the L1_DIM_PAYMENT_TYPE-Load step. It is composed of process Actions and Sub-Processes.
  5. Double-click the For Simple Table sub-process to open it, then select the Action named Select on source table (SRC).
  6. Review the code that was executed on the source database in the Generated tab of the Step Detail view. Notice that the where clause contains the filter that you added to the mapping.
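
The generated statement has a shape similar to the following sketch (illustrative only; the exact code, aliases, and work tables depend on the templates and your configuration):

select T_PAYMENT_TYPE.PMT_CODE, T_PAYMENT_TYPE.PMT_NAME, T_PAYMENT_TYPE.PMT_DESCRIPTION
from HOTEL_MANAGEMENT.T_PAYMENT_TYPE
where T_PAYMENT_TYPE.PMT_ACTIVE=1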

Congratulations! You have successfully used a filter in a mapping. In the next step, you will use complex expressions and learn about the importance of business rules' execution location.

In this step, you will create and optimize a mapping with complex expressions.

Create the Load DIM_BEDROOM mapping

  1. Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_BEDROOM
Target Datastore | DIM_BEDROOM
Source Datastore | T_BEDROOM

  2. Define the business rules: for each row in the following table, select the Target Column in the target datastore and enter its corresponding Transformation Expression in the Expression Editor.

Target column | Transformation expression
BDR_ID | T_BEDROOM.BDR_ID
BDR_NUMBER | T_BEDROOM.BDR_NUMBER
BDR_FLOOR | case when lower(T_BEDROOM.BDR_FLOOR) = 'gf' then 0 when lower(T_BEDROOM.BDR_FLOOR) = '1st' then 1 when lower(T_BEDROOM.BDR_FLOOR) = '2nd' then 2 end
BDR_BATH | case when T_BEDROOM.BDR_BATH = 'true' then 1 else 0 end
BDR_SHOWER | case when T_BEDROOM.BDR_SHOWER = 'true' then 1 else 0 end
BDR_BAR | case when T_BEDROOM.BDR_BAR = 'true' then 1 else 0 end
BDR_BED_COUNT | convert(T_BEDROOM.BDR_BED_COUNT,NUMERIC)
BDR_PHONE_NUMBER | T_BEDROOM.BDR_PHONE_NUMBER
BDR_TYPE | T_BEDROOM.BDR_TYPE
UPDATE_DATE | current_timestamp

  3. Run the mapping.
  4. Go to the Statistics view and review the results:

Name | Value
SUM(SQL_NB_ROWS) | 60
SUM(SQL_STAT_INSERT) | 20
SUM(SQL_STAT_UPDATE) | 0

Modify the execution location of an expression

Run this mapping a second time. It will return the following statistics:

Name | Value
SUM(SQL_NB_ROWS) | 80
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 20

20 rows are marked as updated (SUM(SQL_STAT_UPDATE)) because the UPDATE_DATE column is set to the source database's current_timestamp value at every execution. By default, this expression runs on the source datastore, and the mapping always considers it a change when it compares source and target records before integrating the data.

To prevent UPDATE_DATE from being included in the source/target data comparison, you must execute the business rule for this column on the target datastore instead of the source datastore. When you do this, the current_timestamp expression is computed after the data is integrated into the target table, and the field is excluded from the comparison.
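
A minimal sketch of the change-detection logic, assuming a simplified staging/target comparison (the real templates generate more elaborate code):

-- Rule executed on the source: UPDATE_DATE is computed into the staged
-- data and takes part in the comparison, so every row looks changed.
update HOTEL_DATAMART.DIM_BEDROOM T
set    ...
where  exists (select 1 from <staging table> S
               where S.BDR_ID = T.BDR_ID
               and   (S.BDR_FLOOR <> T.BDR_FLOOR or ... or S.UPDATE_DATE <> T.UPDATE_DATE))

-- Rule executed on the target: current_timestamp is applied in the final
-- insert/update statements only, and UPDATE_DATE is left out of the comparison.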

To change where the rule is executed:

  1. Select the UPDATE_DATE target column on the mapping.
  2. Click the Execution location icon to the left of the datastore.
  3. Click the Target icon.
  4. Run the mapping.
  5. Go to the Statistics view to check the results. Changes to the UPDATE_DATE field are not counted as updates anymore.

Name | Value
SUM(SQL_NB_ROWS) | 60
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

In the next step, you will learn how to remove duplicate rows in a mapping.

In this step, you will create and configure a mapping to deduplicate rows found in the source datastore in order to only work with unique data.

Create the Load DIM_TIME mapping

Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_TIME
Target Datastore | DIM_TIME
Source Datastore | Time

The transformation business rules are:

Target column | Business rule | Execution location | Execution context | Key
TIME_KEY_DAY | substr(Time.DAY_DATE, 7,4) + '/' + substr(Time.DAY_DATE, 4,2) + '/' + substr(Time.DAY_DATE, 1,2) | Staging Area | I/U | Functional key
TIME_DATE | convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP) | Staging Area | I/U |
TIME_MONTH_DAY | convert(substr(Time.DAY_DATE, 1,2),NUMERIC) | Staging Area | I/U |
TIME_WEEK_DAY | dayofweek(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_DAY_NAME | dayname(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_MONTH | convert(substr(Time.DAY_DATE,4,2),NUMERIC) | Staging Area | I/U |
TIME_MONTH_NAME | monthname(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_QUARTER | quarter(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_YEAR | convert(substr(Time.DAY_DATE, 7,4),NUMERIC) | Staging Area | I/U |
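
The DAY_DATE values are formatted DD/MM/YYYY hh:mm, so the substr calls pick the day (position 1, length 2), the month (position 4, length 2), and the year (position 7, length 4). A minimal sketch with an illustrative value:

-- '15/03/2010 06:00' is an illustrative DAY_DATE value.
select substr('15/03/2010 06:00', 7, 4) + '/' + substr('15/03/2010 06:00', 4, 2) + '/' + substr('15/03/2010 06:00', 1, 2)
-- returns '2010/03/15', which is the TIME_KEY_DAY format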

Enable deduplication

If you view the data in the Time file, you will notice that it contains 4 records per day (one record for every 6-hour time slot):

01/01/2010 00:00
01/01/2010 06:00
01/01/2010 12:00
01/01/2010 18:00
02/01/2010 00:00
02/01/2010 06:00
02/01/2010 12:00
02/01/2010 18:00
03/01/2010 00:00
03/01/2010 06:00

However, the DIM_TIME target table expects only one line per day. To filter out the duplicates from the dataflow, proceed as follows:

  1. Select the Integration Template on the DIM_TIME datastore.
  2. Open the Properties view.
  3. Click the Use distinct link to enable the checkbox.
  4. Check the Use distinct box.
  5. Run the mapping and review the results in the Statistics view. Observe that the number of inserted rows is lower than the total number of processed rows thanks to the Use Distinct option.

Name | Value
SUM(SQL_NB_ROWS) | 6 576
SUM(SQL_STAT_INSERT) | 1 096
SUM(SQL_STAT_UPDATE) | 0
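
Conceptually, enabling Use distinct makes the Integration Template deduplicate the staged rows before integration, along the lines of this sketch (illustrative; the staging table name and exact statement are template-generated):

select distinct TIME_KEY_DAY, TIME_DATE, TIME_MONTH_DAY, TIME_WEEK_DAY,
                TIME_DAY_NAME, TIME_MONTH, TIME_MONTH_NAME, TIME_QUARTER, TIME_YEAR
from   <staging table>

Since all four records of a given day yield the same transformed values, only one row per day is integrated into DIM_TIME.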

In the next step, you will create another mapping and define the functional key for a datastore.

The functional key in a mapping identifies a record and is used to define whether an incoming record is an existing or new record for the target datastore. By default, xDI Designer uses the target datastore's primary key as the functional key.

In this step, you will change a target datastore's functional key, as the default functional key cannot be used.

Create the Load DIM_GEOGRAPHY mapping

First, create a new mapping:

  1. Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_GEOGRAPHY
Target Datastore | DIM_GEOGRAPHY
Source Datastores | US_States, US_Cities

  2. Define the transformation business rules and their respective execution location as described in the following table:

Target column | Transformation expression | Execution location
GEO_KEY_ID | next value for HOTEL_DATAMART.SEQ_GEO_KEY_ID | Target
GEO_ZIP_CODE | US_Cities.ZIP_CODE | Source
GEO_CITY | US_Cities.CITY | Source
GEO_STATE_CODE | US_Cities.STATE_CODE | Source
GEO_STATE | US_States.STATE_UPPER_CASE | Source

Define the functional key of the mapping

In the DIM_GEOGRAPHY table, the primary key GEO_KEY_ID is not mapped from the source data: it is populated by a database sequence. It therefore cannot be used as the functional key to identify new or existing records.

To define another functional key for this datastore:

  1. In the mapping, select the GEO_KEY_ID column on the DIM_GEOGRAPHY datastore, and then click the functional key icon to disable this column as the functional key.
  2. Select the GEO_ZIP_CODE column and click the functional key icon to define it as the functional key for this mapping.

The functional key is now GEO_ZIP_CODE, which means that:

- Incoming records whose GEO_ZIP_CODE already exists in the target are treated as existing records and are updated.
- Incoming records with a new GEO_ZIP_CODE are treated as new records and are inserted.

In the next step, you will modify the Load DIM_GEOGRAPHY mapping to apply a specific business rule only to updates.

In the Load DIM_GEOGRAPHY mapping, the GEO_KEY_ID target column is populated by a database sequence that generates a new value every time it is invoked.

Now, imagine that a second US_Cities source file is loaded to update an existing record. If the business rule for GEO_KEY_ID is applied, a new identifier will be generated and a new record will be created instead of updating the existing one.

To avoid this issue, you must configure this business rule so that it is applied only for insertions and not for updates:

  1. Select the GEO_KEY_ID column.
  2. Click the icon next to the datastore's left border to disable the execution of the business rule for updates. The rule's execution context changes from I/U (insert and update) to I (insert only).
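
The effect, sketched as simplified integration statements (illustrative only; the actual code is template-generated):

-- Insert: a new record gets a fresh identifier from the sequence.
insert into HOTEL_DATAMART.DIM_GEOGRAPHY (GEO_KEY_ID, GEO_ZIP_CODE, GEO_CITY, GEO_STATE_CODE, GEO_STATE)
select next value for HOTEL_DATAMART.SEQ_GEO_KEY_ID, ZIP_CODE, CITY, STATE_CODE, STATE_UPPER_CASE from ...

-- Update: GEO_KEY_ID is left out of the SET clause, so an existing
-- record keeps its identifier when it is updated.
update HOTEL_DATAMART.DIM_GEOGRAPHY set GEO_CITY = ..., GEO_STATE_CODE = ..., GEO_STATE = ... where GEO_ZIP_CODE = ...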

In the next step, you will modify the Load DIM_GEOGRAPHY mapping to perform an inner join between the two source datastores.

Semarchy xDI supports joins to combine data between several source datastores. In this step, you will use an inner join in a mapping.

Add an inner join to Load DIM_GEOGRAPHY

In this mapping, the two data sets US_States and US_Cities are currently not joined, which means that xDI combines each line of the first set with each line of the second set. To avoid this cartesian product, you must create a join between the two datastores:

  1. Drag and drop the STATE_CODE column from US_States to the STATE_CODE column in US_Cities.
  2. Select Join in the dialog.
  3. Review the generated join expression in the Expression Editor.

  4. Run the mapping and go to the Statistics view to check the results. Note that the number of inserted lines corresponds to the number of rows in US_Cities having a STATE_CODE that also exists in US_States (the REF_US_STATES.csv file).

Name | Value
SUM(SQL_NB_ROWS) | 125 628
SUM(SQL_STAT_INSERT) | 41 693
SUM(SQL_STAT_UPDATE) | 0

  5. Run the mapping again.

Name | Value
SUM(SQL_NB_ROWS) | 125 628
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

  6. Check the data in the target table.

Great job! You have successfully added an inner join to a mapping. In the next step, you will create outer joins.

By default, the joins added by the Designer are inner joins. In this step, you will define outer joins between datastores to keep the rows that do not have matching values in both tables involved in the join.

Create the Load DIM_CUSTOMER mapping

  1. Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_CUSTOMER
Target Datastore | DIM_CUSTOMER
Source Datastores | T_CUSTOMER, T_TITLE, T_ADDRESS, DIM_GEOGRAPHY

  2. Set the following business rules:

Target column | Business rule | Execution location | Execution context
CUS_ID | T_CUSTOMER.CUS_ID | Source | I/U
CUS_TITLE | T_TITLE.TIT_NAME | Source | I/U
CUS_NAME | rtrim(T_CUSTOMER.CUS_FIRST_NAME) + ' ' + upper(rtrim(T_CUSTOMER.CUS_LAST_NAME)) | Source | I/U
CUS_COMPANY | T_CUSTOMER.CUS_COMPANY | Source | I/U
GEO_KEY_ID | DIM_GEOGRAPHY.GEO_KEY_ID | Source | I/U
UPDATE_DATE | current_timestamp | Target | I/U
CUS_VIP | (This column will be loaded later in the tutorial) | |

  3. Follow the same procedure as in the Create an inner join step to add the following joins. Make sure you select the correct execution location for each join.

First Datastore | Second Datastore | Business rule | Execution location
T_CUSTOMER | T_TITLE | T_CUSTOMER.TIT_CODE=T_TITLE.TIT_CODE | Source
T_CUSTOMER | T_ADDRESS | T_CUSTOMER.CUS_ID=T_ADDRESS.CUS_ID | Source
T_ADDRESS | DIM_GEOGRAPHY | DIM_GEOGRAPHY.GEO_ZIP_CODE=T_ADDRESS.ADR_ZIP_CODE | Staging Area

Define outer joins

In this mapping, we want to retrieve all the records that are in T_CUSTOMER, including the records that have no match in T_ADDRESS or T_TITLE according to the join expressions. Similarly, we want to include the records from T_ADDRESS that have no match in DIM_GEOGRAPHY. To do this, you must use outer joins instead of inner joins:

  1. Select the join between T_CUSTOMER and T_ADDRESS.
  2. In the Properties view, select the radio button next to T_CUSTOMER to define it as the master table (main table) for the outer join.

  3. Follow the same procedure to define the following outer joins:

First Datastore | Second Datastore | Master table
T_CUSTOMER | T_TITLE | T_CUSTOMER
T_ADDRESS | DIM_GEOGRAPHY | T_ADDRESS

  4. Run the mapping and check the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 300
SUM(SQL_STAT_INSERT) | 100
SUM(SQL_STAT_UPDATE) | 0

  5. Right-click the target datastore, select Action > Consult data, and then modify the query as follows:

select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID is null

  6. Run the query and check in the result grid that it returns 11. The source data contains 11 customers that have no recorded address or have an incorrect address. The outer joins defined in this mapping include these customers, but with a NULL address.
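
Taken together, the joins behave like the following single-statement sketch (illustrative only: in the actual mapping the DIM_GEOGRAPHY join runs in the staging area, not in the source query):

select T_CUSTOMER.CUS_ID,
       T_TITLE.TIT_NAME,
       DIM_GEOGRAPHY.GEO_KEY_ID
from   T_CUSTOMER
       left outer join T_TITLE on T_CUSTOMER.TIT_CODE = T_TITLE.TIT_CODE
       left outer join T_ADDRESS on T_CUSTOMER.CUS_ID = T_ADDRESS.CUS_ID
       left outer join DIM_GEOGRAPHY on DIM_GEOGRAPHY.GEO_ZIP_CODE = T_ADDRESS.ADR_ZIP_CODE

Customers without a matching address or geography keep their row, with NULL in the columns coming from the right-hand tables.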

In the next step, you will use aggregation in mappings.

In this step, you will create a mapping to load the FACT_BILLING table. This table will be loaded with data from the joined T_BILLING and T_BILLING_LINES source datastores from the Hotel metadata.

Two columns of FACT_BILLING, BIL_AMOUNT and BIL_QTY, will be populated by aggregating source data with the SUM function.

Create the Load FACT_BILLING mapping

  1. Create the following mapping:

Property | Value
Parent folder | Mappings
Mapping name | Load FACT_BILLING
Target Datastore | FACT_BILLING
Source Datastores | T_BILLING, T_BILLING_LINES, DIM_TIME, DIM_DISCOUNT

  2. Define the following transformation business rules. Notice that the rules for BIL_AMOUNT and BIL_QTY contain the SUM aggregate function.

Target column | Business rule | Execution location | Execution context | Key
BIL_KEY_ID | | | |
BIL_ID | T_BILLING.BIL_ID | Source | I/U | Functional key
CUS_ID | T_BILLING.CUS_ID | Source | I/U |
TIME_KEY_DAY | DIM_TIME.TIME_KEY_DAY | Source | I/U |
DIS_RANGE | DIM_DISCOUNT.DIS_RANGE | Source | I/U | Functional key
PMT_CODE | T_BILLING.PMT_CODE | Source | I/U |
BIL_AMOUNT | sum(T_BILLING_LINES.BLL_AMOUNT) | Staging Area | I/U |
BIL_QTY | sum(T_BILLING_LINES.BLL_QTY) | Staging Area | I/U |
UPDATE_DATE | current_timestamp | Target | I/U |

  3. Add the following inner joins:

First Datastore | Second Datastore | Business rule | Execution location
T_BILLING | DIM_TIME | T_BILLING.BIL_DATE=DIM_TIME.TIME_DATE | Staging Area
T_BILLING | T_BILLING_LINES | T_BILLING_LINES.BIL_ID=T_BILLING.BIL_ID | Source
T_BILLING_LINES | DIM_DISCOUNT | round(case when T_BILLING_LINES.BLL_DISCOUNT_AMOUNT != 0 then (T_BILLING_LINES.BLL_DISCOUNT_AMOUNT / (T_BILLING_LINES.BLL_AMOUNT + T_BILLING_LINES.BLL_DISCOUNT_AMOUNT))*100 else T_BILLING_LINES.BLL_DISCOUNT_RATE end) between DIM_DISCOUNT.DIS_MIN and DIM_DISCOUNT.DIS_MAX | Staging Area
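
The DIM_DISCOUNT join computes each billing line's discount percentage and matches it to a discount range. A worked example with illustrative values (assuming decimal amounts):

-- A line with BLL_AMOUNT = 80 and BLL_DISCOUNT_AMOUNT = 20:
select round((20.0 / (80 + 20)) * 100)
-- returns 20, so the line joins the DIM_DISCOUNT row whose
-- DIS_MIN..DIS_MAX range contains 20.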

Activate aggregation

Before running the mapping, you must indicate which columns use aggregate functions:

  1. Select the BIL_AMOUNT target column.
  2. Click the icon on the left border of the datastore to activate aggregation for this column.
  3. Apply the same procedure to the BIL_QTY column.
  4. Run the mapping and check the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 53 314
SUM(SQL_STAT_INSERT) | 12 107
SUM(SQL_STAT_UPDATE) | 0

  5. View the data in the FACT_BILLING table and check that the BIL_AMOUNT and BIL_QTY columns are loaded as expected with the aggregated amounts and quantities.
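
Activating aggregation makes the staging query group by the non-aggregated columns, along the lines of this sketch (illustrative; the generated statement also involves the DIM_TIME and DIM_DISCOUNT joins):

select T_BILLING.BIL_ID, T_BILLING.CUS_ID, T_BILLING.PMT_CODE,
       sum(T_BILLING_LINES.BLL_AMOUNT), sum(T_BILLING_LINES.BLL_QTY)
from   T_BILLING
       inner join T_BILLING_LINES on T_BILLING_LINES.BIL_ID = T_BILLING.BIL_ID
group by T_BILLING.BIL_ID, T_BILLING.CUS_ID, T_BILLING.PMT_CODE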

In the next step, you will learn how to add constraints and leverage reject detection.

Data quality is a major topic in data integration. Semarchy xDI allows defining constraints on datastores, such as primary keys and user-defined conditions.

Designers configure reject detection in Semarchy xDI to detect and possibly quarantine those records that violate the constraints.

In this step, you will learn how to add constraints and enable reject detection.

Enable data quality for DIM_CUSTOMER

Create user-defined constraints

The DIM_GEOGRAPHY table in the target demo database contains two particular records that correspond to invalid geographical data ("No address" and "Unknown zip code").

GEO_KEY_ID | GEO_ZIP_CODE | GEO_CITY | GEO_STATE_CODE | GEO_STATE
0 | No Address | | |
1 | ? | Unknown Zip Code | ? | ?

You will now create two constraints on DIM_CUSTOMER to check that the value of the GEO_KEY_ID column is different from these values, and reject customer records with invalid geographical data.

To add the constraints:

  1. In the Project Explorer, double-click the Datamart metadata to open the metadata editor.
  2. Find and expand the DIM_CUSTOMER node in the editor.
  3. Right-click DIM_CUSTOMER and select New > Condition.

  4. In the Properties view, click each field label to activate it, and enter the following values for the first constraint:

Property | Value
Name | CK_CUS_GEO_KEY_ID_001
User Message | No address
Reject Code | CUS_001
Description | Keep track of each record loaded into DIM_CUSTOMER without any address.
Condition Expression | DIM_CUSTOMER.GEO_KEY_ID <> 0

  5. In the Project Explorer, right-click DIM_CUSTOMER and select New > Condition again to create a second constraint.
  6. In the Properties view, click each field label to activate it, and enter the following values for the second constraint:

Property | Value
Name | CK_CUS_GEO_KEY_ID_002
User Message | Unknown Zip Code
Reject Code | CUS_002
Description | Keep track of each record loaded into DIM_CUSTOMER with an address containing an unknown zip code.
Condition Expression | DIM_CUSTOMER.GEO_KEY_ID <> 1
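
At run time, each condition is checked with a statement of roughly this shape (illustrative; the actual check and reject tables are generated by the templates):

-- Rows violating CK_CUS_GEO_KEY_ID_001 are copied to the reject table
-- with their reject code, then optionally removed from the flow.
insert into <reject table> (REJECT_CODE, CUS_ID, GEO_KEY_ID, ...)
select 'CUS_001', CUS_ID, GEO_KEY_ID, ...
from   <integration table>
where  not (GEO_KEY_ID <> 0)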

Use default values in Load DIM_CUSTOMER

To use the conditions created in the previous section, you must modify the business rule for the GEO_KEY_ID column in DIM_CUSTOMER: the current rule populates the GEO_KEY_ID target field with the value found in DIM_GEOGRAPHY or sets it to NULL if no value is found.

You will now modify the business rule to set GEO_KEY_ID to 0 if the customer has no address in the source, or to 1 if the customer has an address with an invalid zip code:

  1. Open the mapping Load DIM_CUSTOMER.
  2. Select the target column GEO_KEY_ID.
  3. In the Expression Editor, replace the existing business rule with the following expression:
case
when T_ADDRESS.ADR_ID is null then 0
when T_ADDRESS.ADR_ID is not null and DIM_GEOGRAPHY.GEO_KEY_ID is null then 1
else DIM_GEOGRAPHY.GEO_KEY_ID
end
  4. An error icon appears next to the GEO_KEY_ID column, indicating that the execution location for this business rule has become invalid: the rule can no longer run in the source because it involves two different source datastores (hover over the red icon for a few seconds to see the error details in a tooltip).
  5. Modify the execution location so that the rule runs in the Staging Area.

Enable reject detection on Load DIM_CUSTOMER

You are now ready to enable reject detection on the Load DIM_CUSTOMER mapping:

  1. Select the DIM_CUSTOMER datastore on the mapping diagram.
  2. Click the icon on the left border of the datastore to enable reject management. A Reject step is added to the mapping, and the DIM_CUSTOMER datastore now shows the available constraints.

In this mapping, we want to keep track of the rejects without removing them from the dataflow. To do this:

  1. Select the Reject step on the DIM_CUSTOMER datastore.
  2. In the Properties view, click the Delete Rejects On Checked Table parameter name, and then deselect that option.

Your mapping is now configured to detect and track the rejects while keeping them in the dataflow.

Run the mapping and analyze the results

  1. Run the mapping and review the results in the Statistics view. A new SUM(SQL_STAT_ERROR) statistic provides the number of rejects. Check that 11 rejects were detected:

Name | Value
SUM(SQL_NB_ROWS) | 322
SUM(SQL_STAT_ERROR) | 11
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 11

  2. Right-click the target datastore DIM_CUSTOMER in the Project Explorer and select Action > Consult Reject Table.
  3. Run the query to review the 11 lines with errors.
  4. Right-click the target datastore again and select Action > Consult data.
  5. Modify the query as follows to return the number of customers without an address:

select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID = 0

  6. Run the query and check that it returns 10 in the result grid.
  7. Modify the query as follows to return the number of customers having an invalid zip code in their address:

select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID = 1

  8. Run the query again and check that it returns 1 in the result grid.

Enable data quality for FACT_BILLING

In this section, you will add a constraint to FACT_BILLING to check that the value of the BIL_AMOUNT column is greater than 10, and enable reject detection for this table. The rejects will be tracked in the rejects table but must still be inserted into FACT_BILLING.

  1. Follow the same procedure as for DIM_CUSTOMER to add the following user-defined constraint to the BIL_AMOUNT column of the FACT_BILLING table: FACT_BILLING.BIL_AMOUNT > 10
  2. Enable reject detection for FACT_BILLING.
  3. The BIL_KEY_ID column is not loaded by this mapping but generated by the target database, so it must be ignored by the reject detection step. To disable the control of this column, select PK_FACT_BILLING (pk): true on the datastore, then click the icon on its left border. The constraint label becomes PK_FACT_BILLING (pk): false, indicating that the primary key constraint is excluded from reject detection.
  4. Run the mapping and check the results in the Statistics view. Observe that 2 rows were caught by reject detection.

Name | Value
SUM(SQL_NB_ROWS) | 53 316
SUM(SQL_STAT_ERROR) | 2
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

  5. View the 2 reject lines in the Reject table (BIL_ID 224 and 5994).

Congratulations

You have successfully added constraints and configured reject detection for two mappings. In the next step, you will create a stage to reuse some of the transformations defined in the Load DIM_TIME mapping.

A Stage in Semarchy xDI is a temporary area used in a mapping to combine multiple sources or host reusable transformations.

In this step, you will use a stage to host some of the transformation rules defined in the Load DIM_TIME mapping.

Create a stage

Create a stage in the Load DIM_TIME mapping:

  1. Drag and drop the DAY_DATE column onto the mapping.
  2. Select Create a Stage > Browse all staging areas.
  3. A dialog box opens to select the stage's schema. Select the HOTEL_DATAMART schema.
  4. Give your stage a more user-friendly name: select the stage on the diagram and set the Alias to mutualization in the Properties view. This name is used in all transformation business rules that involve the stage.

Define the stage's columns

Define all the needed fields for the stage:

  1. The DAY_DATE column was automatically added to the stage at creation. Select this column, go to the Expression Editor, and then replace the generated expression with the following business rule:
convert(substr(Time.DAY_DATE, 7,4)
  + '-' + substr(Time.DAY_DATE, 4,2)
  + '-' + substr(Time.DAY_DATE, 1,2)
  + ' 00:00:00',TIMESTAMP)
  2. Select the stage and click the icon next to its left border to add a new field. The new column is added below DAY_DATE with the default name field.
  3. Select the field column on the stage. In the Properties view, set its alias to TIME_MONTH_DAY.
  4. Select the Expression Editor and enter the following expression: substr(Time.DAY_DATE, 1,2)
  5. Repeat steps 2 to 4 to add the following fields to the stage:

Stage field | Business rule
TIME_MONTH | substr(Time.DAY_DATE,4,2)
TIME_YEAR | substr(Time.DAY_DATE, 7,4)
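
Conceptually, the stage is materialized as a work table in the HOTEL_DATAMART schema and loaded once from the file data, roughly like this sketch (illustrative; the actual table name and statements are template-generated):

insert into HOTEL_DATAMART.<stage table> (DAY_DATE, TIME_MONTH_DAY, TIME_MONTH, TIME_YEAR)
select convert(substr(DAY_DATE, 7,4) + '-' + substr(DAY_DATE, 4,2) + '-' + substr(DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP),
       substr(DAY_DATE, 1,2),
       substr(DAY_DATE, 4,2),
       substr(DAY_DATE, 7,4)
from   <Time file data loaded into the staging area>

The downstream business rules can then reference these columns through the mutualization alias instead of repeating the substr logic.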

Use the stage in your mapping

Modify the business rules that load DIM_TIME to use the stage's columns:

Target column | Business rule | Execution location | Execution context | Func. key
TIME_KEY_DAY | mutualization.TIME_YEAR + '/' + mutualization.TIME_MONTH + '/' + mutualization.TIME_MONTH_DAY | Staging Area | I/U | x
TIME_DATE | mutualization.DAY_DATE | Staging Area | I/U |
TIME_MONTH_DAY | convert(mutualization.TIME_MONTH_DAY,NUMERIC) | Staging Area | I/U |
TIME_WEEK_DAY | dayofweek(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_DAY_NAME | dayname(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_MONTH | convert(mutualization.TIME_MONTH,NUMERIC) | Staging Area | I/U |
TIME_MONTH_NAME | monthname(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_QUARTER | quarter(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_YEAR | convert(mutualization.TIME_YEAR,NUMERIC) | Staging Area | I/U |

Run the mapping and analyze the results

Run the mapping and review the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 6 576
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

Mutualizing the transformations in the stage did not change the target table contents (no insert and no update were performed on DIM_TIME), but it reduced code redundancy in your project.

In the next step, you will create another stage to combine several sources.

In this step, you will combine two different sources to load the DIM_TIME table: the Time.csv file and the T_PLANNING table.

Data from several sources can be combined in a stage before being loaded to a target. Using that feature, designers do not have to create a mapping for each source.

Modify the DIM_TIME load mapping to load both sources in the mutualization stage:

  1. Select the mutualization stage on the mapping diagram.
  2. Click the icon to the left of the stage to add a set. The stage now shows 2 sets, represented as tabs: A and B.
  3. Rename the first set: select the A dataset and set the Alias to FILE in the Properties view.
  4. Follow the same procedure to rename the B dataset to RDBMS.
  5. Specify the type of union between these datasets: select the mutualization stage and, in the Expression Editor, enter the following business rule: [FILE] union [RDBMS]
  6. In the Project Explorer, select the T_PLANNING table from the Hotel metadata and drag it into the mapping diagram.
  7. Select the RDBMS set of the stage and enter the following business rules:

Target column | Business rule
DAY_DATE | T_PLANNING.PLN_DAY
TIME_MONTH_DAY | lpad(day(T_PLANNING.PLN_DAY), 2, '0')
TIME_MONTH | lpad(month(T_PLANNING.PLN_DAY), 2, '0')
TIME_YEAR | convert(year(T_PLANNING.PLN_DAY), VARCHAR)

  8. Save the mapping.
  9. Run the mapping and review the results in the Statistics view. Observe that 730 rows were inserted thanks to the union with the T_PLANNING datastore.

Name | Value
SUM(SQL_NB_ROWS) | 9 132
SUM(SQL_STAT_INSERT) | 730
SUM(SQL_STAT_UPDATE) | 0
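
The [FILE] union [RDBMS] rule makes the stage load both sets with a SQL UNION, along the lines of this sketch (illustrative; the actual statements are template-generated):

select DAY_DATE, TIME_MONTH_DAY, TIME_MONTH, TIME_YEAR from <FILE set>
union
select PLN_DAY, lpad(day(PLN_DAY), 2, '0'), lpad(month(PLN_DAY), 2, '0'), convert(year(PLN_DAY), VARCHAR)
from   HOTEL_MANAGEMENT.T_PLANNING

Note that union (as opposed to union all) also removes rows that are identical across the two sets.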

Congratulations! You performed the union of two sources within a stage.

In this last step, you will create a mapping that loads the FACT_BOOKING table using an inner join. This table will be used in the next unit, Work With Processes.

  1. Create the mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load FACT_BOOKING
Target Datastore | FACT_BOOKING
Source Datastores | DIM_TIME and T_BDR_PLN_CUS

  2. Define the transformation business rules:

Target column | Business rule | Execution location | Execution context | Key
BOK_KEY_ID | | | |
CUS_ID | T_BDR_PLN_CUS.CUS_ID | Source | I/U | Functional key
TIME_KEY_DAY | DIM_TIME.TIME_KEY_DAY | Source | I/U | Functional key
BDR_ID | T_BDR_PLN_CUS.BDR_ID | Source | I/U | Functional key
BOK_PEOPLE | T_BDR_PLN_CUS.PLN_CUS_PERS_COUNT | Source | I/U |
BOK_BOOKED | case when T_BDR_PLN_CUS.PLN_CUS_BOOKED = 'true' then 1 else 0 end | Source | I/U |
UPDATE_DATE | current_timestamp | Target | I/U |

  3. Define the following inner join:

First Datastore | Second Datastore | Business rule | Execution location
DIM_TIME | T_BDR_PLN_CUS | T_BDR_PLN_CUS.PLN_DAY=DIM_TIME.TIME_DATE | Staging Area

  4. Run the mapping and check the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 43 650
SUM(SQL_STAT_INSERT) | 14 550
SUM(SQL_STAT_UPDATE) | 0

Great job going through this tutorial! With this unit, you learned to use essential design features available in Semarchy xDI.

What we've covered

- Reverse engineering metadata for databases, delimited files, and positional files
- Filters, inner and outer joins, aggregation, and stages in mappings
- Business rule execution locations and contexts
- Functional keys, deduplication, constraints, and reject detection

What's next?

In the next unit, Work with processes, you will learn how to use processes to design end-to-end integration flows.
