In this tutorial, you will learn how to use some advanced mapping features available in Semarchy xDI.

What you will learn

- How to reverse engineer metadata for databases, delimited files, and positional files
- How to use filters, inner and outer joins, aggregation, and stages in mappings
- How to control the execution location and execution context of business rules
- How to define functional keys and deduplicate source data
- How to add constraints and enable reject detection

Before you start

In this tutorial, we assume that you have already installed Semarchy xDI Designer and followed the Create Your First Data Flow With Semarchy xDI tutorial.

If you have not installed the Designer yet, follow the Install Semarchy xDI Designer tutorial from the main tutorials page. Additional setup information can also be found in the installation guide.

Enjoy this tutorial!

You must first create the metadata that you will use in this unit.

Reverse engineer the Hotel source database

In this tutorial, multiple mappings use a Hypersonic SQL database as a source.

Follow the same steps as in the Create Your First Data Flow With Semarchy xDI tutorial to create that metadata and run the reverse engineering, using the following parameters:

Property | Value
Location | Tutorial – Fundamentals
Technology | Hypersonic SQL
Metadata Model name | Hotel
URL | jdbc:hsqldb:hsql://localhost:62210
User | sa
Password |
Schema | HOTEL_MANAGEMENT

Use default values for all the other properties.

Once you have created the Hotel metadata, review the content of the reverse-engineered tables: in the Project Explorer, expand Hotel > Hotel > HOTEL_MANAGEMENT, right-click each datastore, and select Actions > Consult data.

Reverse engineer the other delimited files

In the Create Your First Data Flow With Semarchy xDI tutorial, you created the Reference Files metadata corresponding to a folder that contains files used in the tutorials. You also reverse engineered the DiscountRanges.txt file. You will now add other files from the same folder to this metadata.

Add and reverse engineer the Time.csv file

Declare and reverse engineer the Time.csv file from the metadata editor:

  1. In the Project Explorer, double-click the Reference Files metadata.
  2. In the metadata editor, expand the Server node, right-click the Reference Files folder, and then select Actions > Launch File wizard.

  3. On the Properties page of the File Wizard, enter the following properties:

Property | Value
Name | Time
Physical Name | Time.csv
Header Line Position | 0

  4. Click Refresh to check the content of the file, and then click Next.
  5. On the Reverse page of the wizard, click Reverse to launch the reverse engineering process. The following content appears in the results table:

Name | Position | Size | Type
F1 | 1 | 66 | String

  6. Click the F1 column and rename it to DAY_DATE.
  7. Click Finish to close the wizard.
  8. Click the save icon in the Designer toolbar or press CTRL+S to save the metadata.
  9. Right-click the Time datastore in the metadata editor and select Actions > Consult Data to view the file's content.

Add and reverse engineer the REF_US_STATES.csv file

Follow the same steps to define a delimited file with the following properties:

Property | Value
Name | US_States
Physical Name | REF_US_STATES.csv
Field Separator | ,
Header Line Position | 1

Reverse engineer this file and review the resulting columns:

Name | Position | Type
STATE_UPPER_CASE | 1 | String
STATE | 2 | String
STATE_CODE | 3 | String

Reverse engineer a positional file

A positional (or fixed-width) file is a file in which columns are defined by their position and size instead of a separator character.

The ref_us_cities.txt file found in the same folder is a positional file. You will now add it to the Reference Files metadata.

  1. Launch the File Wizard as previously.
  2. Click Browse and select the ref_us_cities.txt file.
  3. In the Name field, enter US_Cities.
  4. Modify the Type and choose POSITIONAL.
  5. Check the content of the file by clicking Refresh.
  6. Click Next to go to the Reverse page.

Reverse engineering cannot automatically detect contiguous columns in a positional file. You must define them manually.

To define the three columns for the ref_us_cities.txt file, perform the following steps:

  1. Click the add-column button to declare a new column.
  2. Set the Name, Position, Size, and Type properties with the values provided in the following table.
  3. Repeat these steps until all three columns have been added.

Name | Position | Size | Type
ZIP_CODE | 1 | 5 | String
CITY | 6 | 72 | String
STATE_CODE | 78 | 10 | String

Once done, click Refresh to preview the file and check that the layout is correctly defined.

Congratulations

You have successfully created all the metadata you will need for the rest of this tutorial.

In the next section, you will learn how to use a filter in a mapping.

Filters in Semarchy xDI allow you to refine data from a source datastore before passing it to a target. In this step, you will create a new mapping with a filter.

Create the Load DIM_PAYMENT_TYPE mapping

Create a new mapping under the mappings folder:

  1. In the Project Explorer view, go to the Tutorial – Fundamentals project and right-click the Mappings folder.
  2. Choose New > Mapping...
  3. Set the File name to Load DIM_PAYMENT_TYPE then click Finish.

Add elements to the mapping:

  1. Add the source datastore by dragging and dropping the T_PAYMENT_TYPE datastore from the Hotel metadata into the mapping diagram.
  2. Add the target datastore by dragging and dropping the DIM_PAYMENT_TYPE datastore from the Datamart metadata into the mapping diagram.
  3. On the mapping diagram, drag and drop the PMT_CODE column from the T_PAYMENT_TYPE datastore onto DIM_PAYMENT_TYPE and select Map - all matching fields.

Map - all matching fields automatically creates mapping expressions for all source and target columns with identical names.

Add a filter

In this section, you will filter out obsolete data coming from the T_PAYMENT_TYPE source table by only keeping records which have their PMT_ACTIVE column set to 1.

To create a Filter on the source datastore:

  1. Drag and drop the PMT_ACTIVE column to the empty space on the mapping diagram.
  2. Select Create a Filter in the dialog.
  3. Click the created filter in the Mapping Diagram and, in the Expression Editor, review the generated filter expression: T_PAYMENT_TYPE.PMT_ACTIVE=1
  4. Click the save icon in the Designer toolbar or press CTRL+S to save the mapping.

Run the mapping

Your mapping is now complete. To run it:

  1. Make sure you have saved the mapping.
  2. Right-click an empty area in the mapping editor and click Run.
  3. Go to the Statistics view and review the results:

Name | Value
SUM(SQL_NB_ROWS) | 12
SUM(SQL_STAT_INSERT) | 4
SUM(SQL_STAT_UPDATE) | 0

  4. In the mapping, right-click the target table and select Action > Consult Data. The following four lines are returned by the query:

PMT_CODE | PMT_NAME | PMT_DESCRIPTION
BT | Bank Transfer |
CC | Credit Card | visa,mastercard
CH | Cash |
PP | PayPal | only web payment

Explore the generated process

Semarchy xDI Designer allows you to browse the generated process and its code.

  1. Go back to the process generated by xDI and open its editor.

This process is composed of three steps:

  2. Double-click the L1_DIM_PAYMENT_TYPE-Integration step to open it further. It contains two sub-steps: L1_DIM_PAYMENT_TYPE-Prepare and L1_DIM_PAYMENT_TYPE-Integration.
  3. Select the Load DIM_PAYMENT_TYPE tab above the editor to navigate back to the parent process.
  4. Double-click the L1_DIM_PAYMENT_TYPE-Load step. It is composed of process Actions and Sub-Processes.
  5. Double-click the For Simple Table sub-process to open it, then select the Action named Select on source table (SRC).
  6. Review the code that was executed on the source database in the Generated tab of the Step Detail view. Notice that the where clause contains the filter that you added to the mapping.
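
The generated statement has a shape similar to the following sketch (illustrative only; the exact code, aliases, and work tables depend on the templates and your configuration):

select T_PAYMENT_TYPE.PMT_CODE, T_PAYMENT_TYPE.PMT_NAME, T_PAYMENT_TYPE.PMT_DESCRIPTION
from HOTEL_MANAGEMENT.T_PAYMENT_TYPE
where T_PAYMENT_TYPE.PMT_ACTIVE=1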

Congratulations! You have successfully used a filter in a mapping. In the next step, you will use complex expressions and learn about the importance of business rules' execution location.

In this step, you will create and optimize a mapping with complex expressions.

Create the Load DIM_BEDROOM mapping

  1. Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_BEDROOM
Target Datastore | DIM_BEDROOM
Source Datastore | T_BEDROOM

  2. Define the business rules: for each row in the following table, select the Target Column in the target datastore and enter its corresponding Transformation Expression in the Expression Editor.

Target column | Transformation expression
BDR_ID | T_BEDROOM.BDR_ID
BDR_NUMBER | T_BEDROOM.BDR_NUMBER
BDR_FLOOR | case when lower(T_BEDROOM.BDR_FLOOR) = 'gf' then 0 when lower(T_BEDROOM.BDR_FLOOR) = '1st' then 1 when lower(T_BEDROOM.BDR_FLOOR) = '2nd' then 2 end
BDR_BATH | case when T_BEDROOM.BDR_BATH = 'true' then 1 else 0 end
BDR_SHOWER | case when T_BEDROOM.BDR_SHOWER = 'true' then 1 else 0 end
BDR_BAR | case when T_BEDROOM.BDR_BAR = 'true' then 1 else 0 end
BDR_BED_COUNT | convert(T_BEDROOM.BDR_BED_COUNT,NUMERIC)
BDR_PHONE_NUMBER | T_BEDROOM.BDR_PHONE_NUMBER
BDR_TYPE | T_BEDROOM.BDR_TYPE
UPDATE_DATE | current_timestamp

  3. Run the mapping.
  4. Go to the Statistics view and review the results:

Name | Value
SUM(SQL_NB_ROWS) | 60
SUM(SQL_STAT_INSERT) | 20
SUM(SQL_STAT_UPDATE) | 0

Modify the execution location of an expression

Run this mapping a second time. It will return the following statistics:

Name | Value
SUM(SQL_NB_ROWS) | 80
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 20

20 rows are marked as updated (SUM(SQL_STAT_UPDATE)) because the UPDATE_DATE column is set to the source database's current_timestamp value at every execution. By default, this expression runs on the source datastore, and the mapping always considers it a change when it compares source and target records before integrating the data.

To prevent UPDATE_DATE from being included in the source/target data comparison, you must execute the business rule for this column on the target datastore instead of the source datastore. When you do this, the current_timestamp expression is computed after the data is integrated into the target table, and the field is excluded from the comparison.
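
A minimal sketch of the change-detection logic, assuming a simplified staging/target comparison (the real templates generate more elaborate code):

-- Rule executed on the source: UPDATE_DATE is computed into the staged
-- data and takes part in the comparison, so every row looks changed.
update HOTEL_DATAMART.DIM_BEDROOM T
set    ...
where  exists (select 1 from <staging table> S
               where S.BDR_ID = T.BDR_ID
               and   (S.BDR_FLOOR <> T.BDR_FLOOR or ... or S.UPDATE_DATE <> T.UPDATE_DATE))

-- Rule executed on the target: current_timestamp is applied in the final
-- insert/update statements only, and UPDATE_DATE is left out of the comparison.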

To change where the rule is executed:

  1. Select the UPDATE_DATE target column on the mapping.
  2. Click the Execution location icon to the left of the datastore.
  3. Click the Target icon.
  4. Run the mapping.
  5. Go to the Statistics view to check the results. Changes to the UPDATE_DATE field are not counted as updates anymore.

Name | Value
SUM(SQL_NB_ROWS) | 60
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

In the next step, you will learn how to remove duplicate rows in a mapping.

In this step, you will create and configure a mapping to deduplicate rows found in the source datastore in order to only work with unique data.

Create the Load DIM_TIME mapping

Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_TIME
Target Datastore | DIM_TIME
Source Datastore | Time

The transformation business rules are:

Target column | Business rule | Execution location | Execution context | Key
TIME_KEY_DAY | substr(Time.DAY_DATE, 7,4) + '/' + substr(Time.DAY_DATE, 4,2) + '/' + substr(Time.DAY_DATE, 1,2) | Staging Area | I/U | Functional key
TIME_DATE | convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP) | Staging Area | I/U |
TIME_MONTH_DAY | convert(substr(Time.DAY_DATE, 1,2),NUMERIC) | Staging Area | I/U |
TIME_WEEK_DAY | dayofweek(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_DAY_NAME | dayname(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_MONTH | convert(substr(Time.DAY_DATE,4,2),NUMERIC) | Staging Area | I/U |
TIME_MONTH_NAME | monthname(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_QUARTER | quarter(convert(substr(Time.DAY_DATE, 7,4) + '-' + substr(Time.DAY_DATE, 4,2) + '-' + substr(Time.DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP)) | Staging Area | I/U |
TIME_YEAR | convert(substr(Time.DAY_DATE, 7,4),NUMERIC) | Staging Area | I/U |
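
The DAY_DATE values are formatted DD/MM/YYYY hh:mm, so the substr calls pick the day (position 1, length 2), the month (position 4, length 2), and the year (position 7, length 4). A minimal sketch with an illustrative value:

-- '15/03/2010 06:00' is an illustrative DAY_DATE value.
select substr('15/03/2010 06:00', 7, 4) + '/' + substr('15/03/2010 06:00', 4, 2) + '/' + substr('15/03/2010 06:00', 1, 2)
-- returns '2010/03/15', which is the TIME_KEY_DAY format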

Enable deduplication

If you view the data in the Time file, you will notice that it contains 4 records per day (one record for every 6-hour time slot):

01/01/2010 00:00
01/01/2010 06:00
01/01/2010 12:00
01/01/2010 18:00
02/01/2010 00:00
02/01/2010 06:00
02/01/2010 12:00
02/01/2010 18:00
03/01/2010 00:00
03/01/2010 06:00

However, the DIM_TIME target table expects only one line per day. To filter out the duplicates from the dataflow, proceed as follows:

  1. Select the Integration Template on the DIM_TIME datastore.
  2. Open the Properties view.
  3. Click the Use distinct link to enable the checkbox.
  4. Check the Use distinct box.
  5. Run the mapping and review the results in the Statistics view. Observe that the number of inserted rows is lower than the total number of processed rows thanks to the Use Distinct option.

Name | Value
SUM(SQL_NB_ROWS) | 6 576
SUM(SQL_STAT_INSERT) | 1 096
SUM(SQL_STAT_UPDATE) | 0
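
Conceptually, enabling Use distinct makes the Integration Template deduplicate the staged rows before integration, along the lines of this sketch (illustrative; the staging table name and exact statement are template-generated):

select distinct TIME_KEY_DAY, TIME_DATE, TIME_MONTH_DAY, TIME_WEEK_DAY,
                TIME_DAY_NAME, TIME_MONTH, TIME_MONTH_NAME, TIME_QUARTER, TIME_YEAR
from   <staging table>

Since all four records of a given day yield the same transformed values, only one row per day is integrated into DIM_TIME.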

In the next step, you will create another mapping and define the functional key for a datastore.

The functional key in a mapping identifies a record and is used to define whether an incoming record is an existing or new record for the target datastore. By default, xDI Designer uses the target datastore's primary key as the functional key.

In this step, you will change a target datastore's functional key, as the default functional key cannot be used.

Create the Load DIM_GEOGRAPHY mapping

First, create a new mapping:

  1. Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_GEOGRAPHY
Target Datastore | DIM_GEOGRAPHY
Source Datastores | US_States, US_Cities

  2. Define the transformation business rules and their respective execution location as described in the following table:

Target column | Transformation expression | Execution location
GEO_KEY_ID | next value for HOTEL_DATAMART.SEQ_GEO_KEY_ID | Target
GEO_ZIP_CODE | US_Cities.ZIP_CODE | Source
GEO_CITY | US_Cities.CITY | Source
GEO_STATE_CODE | US_Cities.STATE_CODE | Source
GEO_STATE | US_States.STATE_UPPER_CASE | Source

Define the functional key of the mapping

In the DIM_GEOGRAPHY table, the primary key GEO_KEY_ID is not mapped from the source data: it is populated by a database sequence. It therefore cannot be used as the functional key to identify new or existing records.

To define another functional key for this datastore:

  1. In the mapping, select the GEO_KEY_ID column on the DIM_GEOGRAPHY datastore, and then click the functional key icon to disable this column as the functional key.
  2. Select the GEO_ZIP_CODE column and click the functional key icon to define it as the functional key for this mapping.

The functional key is now GEO_ZIP_CODE, which means that:

- Incoming records whose GEO_ZIP_CODE already exists in the target are treated as existing records and are updated.
- Incoming records with a new GEO_ZIP_CODE are treated as new records and are inserted.

In the next step, you will modify the Load DIM_GEOGRAPHY mapping to apply a specific business rule only to updates.

In the Load DIM_GEOGRAPHY mapping, the GEO_KEY_ID target column is populated by a database sequence that generates a new value every time it is invoked.

Now, imagine that a second US_Cities source file is loaded to update an existing record. If the business rule for GEO_KEY_ID is applied, a new identifier will be generated and a new record will be created instead of updating the existing one.

To avoid this issue, you must configure this business rule so that it is applied only for insertions and not for updates:

  1. Select the GEO_KEY_ID column.
  2. Click the icon next to the datastore's left border to disable the execution of the business rule for updates. The rule's execution context changes from I/U (insert and update) to I (insert only).
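
The effect, sketched as simplified integration statements (illustrative only; the actual code is template-generated):

-- Insert: a new record gets a fresh identifier from the sequence.
insert into HOTEL_DATAMART.DIM_GEOGRAPHY (GEO_KEY_ID, GEO_ZIP_CODE, GEO_CITY, GEO_STATE_CODE, GEO_STATE)
select next value for HOTEL_DATAMART.SEQ_GEO_KEY_ID, ZIP_CODE, CITY, STATE_CODE, STATE_UPPER_CASE from ...

-- Update: GEO_KEY_ID is left out of the SET clause, so an existing
-- record keeps its identifier when it is updated.
update HOTEL_DATAMART.DIM_GEOGRAPHY set GEO_CITY = ..., GEO_STATE_CODE = ..., GEO_STATE = ... where GEO_ZIP_CODE = ...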

In the next step, you will modify the Load DIM_GEOGRAPHY mapping to perform an inner join between the two source datastores.

Semarchy xDI supports joins to combine data between several source datastores. In this step, you will use an inner join in a mapping.

Add an inner join to Load DIM_GEOGRAPHY

In this mapping, the two data sets US_States and US_Cities are currently not joined, which means that xDI combines each line of the first set with each line of the second set. To avoid this cartesian product, you must create a join between the two datastores:

  1. Drag and drop the STATE_CODE column from US_States to the STATE_CODE column in US_Cities.
  2. Select Join in the dialog.
  3. Review the generated join expression in the Expression Editor.

  4. Run the mapping and go to the Statistics view to check the results. Note that the number of inserted lines corresponds to the number of rows in US_Cities having a STATE_CODE that also exists in US_States (the REF_US_STATES.csv file).

Name | Value
SUM(SQL_NB_ROWS) | 125 628
SUM(SQL_STAT_INSERT) | 41 693
SUM(SQL_STAT_UPDATE) | 0

  5. Run the mapping again.

Name | Value
SUM(SQL_NB_ROWS) | 125 628
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

  6. Check the data in the target table.

Great job! You have successfully added an inner join to a mapping. In the next step, you will create outer joins.

By default, the joins added by the Designer are inner joins. In this step, you will define outer joins between datastores to keep the rows that do not have matching values in both tables involved in the join.

Create the Load DIM_CUSTOMER mapping

  1. Create a new mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load DIM_CUSTOMER
Target Datastore | DIM_CUSTOMER
Source Datastores | T_CUSTOMER, T_TITLE, T_ADDRESS, DIM_GEOGRAPHY

  2. Set the following business rules:

Target column | Business rule | Execution location | Execution context
CUS_ID | T_CUSTOMER.CUS_ID | Source | I/U
CUS_TITLE | T_TITLE.TIT_NAME | Source | I/U
CUS_NAME | rtrim(T_CUSTOMER.CUS_FIRST_NAME) + ' ' + upper(rtrim(T_CUSTOMER.CUS_LAST_NAME)) | Source | I/U
CUS_COMPANY | T_CUSTOMER.CUS_COMPANY | Source | I/U
GEO_KEY_ID | DIM_GEOGRAPHY.GEO_KEY_ID | Source | I/U
UPDATE_DATE | current_timestamp | Target | I/U
CUS_VIP | (This column will be loaded later in the tutorial) | |

  3. Follow the same procedure as in the Create an inner join step to add the following joins. Make sure you select the correct execution location for each join.

First Datastore | Second Datastore | Business rule | Execution location
T_CUSTOMER | T_TITLE | T_CUSTOMER.TIT_CODE=T_TITLE.TIT_CODE | Source
T_CUSTOMER | T_ADDRESS | T_CUSTOMER.CUS_ID=T_ADDRESS.CUS_ID | Source
T_ADDRESS | DIM_GEOGRAPHY | DIM_GEOGRAPHY.GEO_ZIP_CODE=T_ADDRESS.ADR_ZIP_CODE | Staging Area

Define outer joins

In this mapping, we want to retrieve all the records that are in T_CUSTOMER, including the records that have no match in T_ADDRESS or T_TITLE according to the join expressions. Similarly, we want to include the records from T_ADDRESS that have no match in DIM_GEOGRAPHY. To do this, you must use outer joins instead of inner joins:

  1. Select the join between T_CUSTOMER and T_ADDRESS.
  2. In the Properties view, select the radio button next to T_CUSTOMER to define it as the master table (main table) for the outer join.

  3. Follow the same procedure to define the following outer joins:

First Datastore | Second Datastore | Master table
T_CUSTOMER | T_TITLE | T_CUSTOMER
T_ADDRESS | DIM_GEOGRAPHY | T_ADDRESS

  4. Run the mapping and check the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 300
SUM(SQL_STAT_INSERT) | 100
SUM(SQL_STAT_UPDATE) | 0

  5. Right-click the target datastore, select Action > Consult data, and then modify the query as follows:

select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID is null

  6. Run the query and check in the result grid that it returns 11. The source data contains 11 customers that have no recorded address or have an incorrect address. The outer joins defined in this mapping include these customers, but with a NULL address.
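
Taken together, the joins behave like the following single-statement sketch (illustrative only: in the actual mapping the DIM_GEOGRAPHY join runs in the staging area, not in the source query):

select T_CUSTOMER.CUS_ID,
       T_TITLE.TIT_NAME,
       DIM_GEOGRAPHY.GEO_KEY_ID
from   T_CUSTOMER
       left outer join T_TITLE on T_CUSTOMER.TIT_CODE = T_TITLE.TIT_CODE
       left outer join T_ADDRESS on T_CUSTOMER.CUS_ID = T_ADDRESS.CUS_ID
       left outer join DIM_GEOGRAPHY on DIM_GEOGRAPHY.GEO_ZIP_CODE = T_ADDRESS.ADR_ZIP_CODE

Customers without a matching address or geography keep their row, with NULL in the columns coming from the right-hand tables.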

In the next step, you will use aggregation in mappings.

In this step, you will create a mapping to load the FACT_BILLING table. This table will be loaded with data from the joined T_BILLING and T_BILLING_LINES source datastores from the Hotel metadata.

Two columns of FACT_BILLING, BIL_AMOUNT and BIL_QTY, will be populated by aggregating source data with the SUM function.

Create the Load FACT_BILLING mapping

  1. Create the following mapping:

Property | Value
Parent folder | Mappings
Mapping name | Load FACT_BILLING
Target Datastore | FACT_BILLING
Source Datastores | T_BILLING, T_BILLING_LINES, DIM_TIME, DIM_DISCOUNT

  2. Define the following transformation business rules. Notice that the rules for BIL_AMOUNT and BIL_QTY contain the SUM aggregate function.

Target column | Business rule | Execution location | Execution context | Key
BIL_KEY_ID | | | |
BIL_ID | T_BILLING.BIL_ID | Source | I/U | Functional key
CUS_ID | T_BILLING.CUS_ID | Source | I/U |
TIME_KEY_DAY | DIM_TIME.TIME_KEY_DAY | Source | I/U |
DIS_RANGE | DIM_DISCOUNT.DIS_RANGE | Source | I/U | Functional key
PMT_CODE | T_BILLING.PMT_CODE | Source | I/U |
BIL_AMOUNT | sum(T_BILLING_LINES.BLL_AMOUNT) | Staging Area | I/U |
BIL_QTY | sum(T_BILLING_LINES.BLL_QTY) | Staging Area | I/U |
UPDATE_DATE | current_timestamp | Target | I/U |

  3. Add the following inner joins:

First Datastore | Second Datastore | Business rule | Execution location
T_BILLING | DIM_TIME | T_BILLING.BIL_DATE=DIM_TIME.TIME_DATE | Staging Area
T_BILLING | T_BILLING_LINES | T_BILLING_LINES.BIL_ID=T_BILLING.BIL_ID | Source
T_BILLING_LINES | DIM_DISCOUNT | round(case when T_BILLING_LINES.BLL_DISCOUNT_AMOUNT != 0 then (T_BILLING_LINES.BLL_DISCOUNT_AMOUNT / (T_BILLING_LINES.BLL_AMOUNT + T_BILLING_LINES.BLL_DISCOUNT_AMOUNT))*100 else T_BILLING_LINES.BLL_DISCOUNT_RATE end) between DIM_DISCOUNT.DIS_MIN and DIM_DISCOUNT.DIS_MAX | Staging Area
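
The DIM_DISCOUNT join computes each billing line's discount percentage and matches it to a discount range. A worked example with illustrative values (assuming decimal amounts):

-- A line with BLL_AMOUNT = 80 and BLL_DISCOUNT_AMOUNT = 20:
select round((20.0 / (80 + 20)) * 100)
-- returns 20, so the line joins the DIM_DISCOUNT row whose
-- DIS_MIN..DIS_MAX range contains 20.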

Activate aggregation

Before running the mapping, you must indicate which columns use aggregate functions:

  1. Select the BIL_AMOUNT target column.
  2. Click the icon on the left border of the datastore to activate aggregation for this column.
  3. Apply the same procedure to the BIL_QTY column.
  4. Run the mapping and check the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 53 314
SUM(SQL_STAT_INSERT) | 12 107
SUM(SQL_STAT_UPDATE) | 0

  5. View the data in the FACT_BILLING table and check that the BIL_AMOUNT and BIL_QTY columns are loaded as expected with the aggregated amounts and quantities.
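
Activating aggregation makes the staging query group by the non-aggregated columns, along the lines of this sketch (illustrative; the generated statement also involves the DIM_TIME and DIM_DISCOUNT joins):

select T_BILLING.BIL_ID, T_BILLING.CUS_ID, T_BILLING.PMT_CODE,
       sum(T_BILLING_LINES.BLL_AMOUNT), sum(T_BILLING_LINES.BLL_QTY)
from   T_BILLING
       inner join T_BILLING_LINES on T_BILLING_LINES.BIL_ID = T_BILLING.BIL_ID
group by T_BILLING.BIL_ID, T_BILLING.CUS_ID, T_BILLING.PMT_CODE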

In the next step, you will learn how to add constraints and leverage reject detection.

Data quality is a major topic in data integration. Semarchy xDI allows defining constraints on datastores, such as primary keys and user-defined conditions.

Designers configure reject detection in Semarchy xDI to detect and possibly quarantine those records that violate the constraints.

In this step, you will learn how to add constraints and enable reject detection.

Enable data quality for DIM_CUSTOMER

Create user-defined constraints

The DIM_GEOGRAPHY table in the target demo database contains two particular records that correspond to invalid geographical data ("No address" and "Unknown zip code").

GEO_KEY_ID | GEO_ZIP_CODE | GEO_CITY | GEO_STATE_CODE | GEO_STATE
0 | No Address | | |
1 | ? | Unknown Zip Code | ? | ?

You will now create two constraints on DIM_CUSTOMER to check that the value of the GEO_KEY_ID column is different from these values, and reject customer records with invalid geographical data.

To add the constraints:

  1. In the Project Explorer, double-click the Datamart metadata to open the metadata editor.
  2. Find and expand the DIM_CUSTOMER node in the editor.
  3. Right-click DIM_CUSTOMER and select New > Condition.

  4. In the Properties view, click each field label to activate it, and enter the following values for the first constraint:

Property | Value
Name | CK_CUS_GEO_KEY_ID_001
User Message | No address
Reject Code | CUS_001
Description | Keep track of each record loaded into DIM_CUSTOMER without any address.
Condition Expression | DIM_CUSTOMER.GEO_KEY_ID <> 0

  5. In the Project Explorer, right-click DIM_CUSTOMER and select New > Condition again to create a second constraint.
  6. In the Properties view, click each field label to activate it, and enter the following values for the second constraint:

Property | Value
Name | CK_CUS_GEO_KEY_ID_002
User Message | Unknown Zip Code
Reject Code | CUS_002
Description | Keep track of each record loaded into DIM_CUSTOMER with an address containing an unknown zip code.
Condition Expression | DIM_CUSTOMER.GEO_KEY_ID <> 1
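
At run time, each condition is checked with a statement of roughly this shape (illustrative; the actual check and reject tables are generated by the templates):

-- Rows violating CK_CUS_GEO_KEY_ID_001 are copied to the reject table
-- with their reject code, then optionally removed from the flow.
insert into <reject table> (REJECT_CODE, CUS_ID, GEO_KEY_ID, ...)
select 'CUS_001', CUS_ID, GEO_KEY_ID, ...
from   <integration table>
where  not (GEO_KEY_ID <> 0)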

Use default values in Load DIM_CUSTOMER

To use the conditions created in the previous section, you must modify the business rule for the GEO_KEY_ID column in DIM_CUSTOMER: the current rule populates the GEO_KEY_ID target field with the value found in DIM_GEOGRAPHY or sets it to NULL if no value is found.

You will now modify the business rule to set GEO_KEY_ID to 0 if the customer has no address in the source, or to 1 if the customer has an address with an invalid zip code:

  1. Open the mapping Load DIM_CUSTOMER.
  2. Select the target column GEO_KEY_ID.
  3. In the Expression Editor, replace the existing business rule with the following expression:
case
when T_ADDRESS.ADR_ID is null then 0
when T_ADDRESS.ADR_ID is not null and DIM_GEOGRAPHY.GEO_KEY_ID is null then 1
else DIM_GEOGRAPHY.GEO_KEY_ID
end
  4. An error icon appears next to the GEO_KEY_ID column, indicating that the execution location for this business rule has become invalid: the rule can no longer run in the source because it involves two different source datastores (hover over the red icon for a few seconds to see the error details in a tooltip).
  5. Modify the execution location so that the rule runs in the Staging Area.

Enable reject detection on Load DIM_CUSTOMER

You are now ready to enable reject detection on the Load DIM_CUSTOMER mapping:

  1. Select the DIM_CUSTOMER datastore on the mapping diagram.
  2. Click the icon on the left border of the datastore to enable reject management. A Reject step is added to the mapping, and the DIM_CUSTOMER datastore now shows the available constraints.

In this mapping, we want to keep track of the rejects without removing them from the dataflow. To do this:

  1. Select the Reject step on the DIM_CUSTOMER datastore.
  2. In the Properties view, click the Delete Rejects On Checked Table parameter name, and then deselect that option.

Your mapping is now configured to detect and track the rejects while keeping them in the dataflow.

Run the mapping and analyze the results

  1. Run the mapping and review the results in the Statistics view. A new SUM(SQL_STAT_ERROR) statistic provides the number of rejects. Check that 11 rejects were detected:

Name | Value
SUM(SQL_NB_ROWS) | 322
SUM(SQL_STAT_ERROR) | 11
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 11

  2. Right-click the target datastore DIM_CUSTOMER in the Project Explorer and select Action > Consult Reject Table.
  3. Run the query to review the 11 lines with errors.
  4. Right-click the target datastore again and select Action > Consult data.
  5. Modify the query as follows to return the number of customers without an address:

select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID = 0

  6. Run the query and check that it returns 10 in the result grid.
  7. Modify the query as follows to return the number of customers having an invalid zip code in their address:

select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID = 1

  8. Run the query again and check that it returns 1 in the result grid.

Enable data quality for FACT_BILLING

In this section, you will add a constraint to FACT_BILLING to check that the value of the BIL_AMOUNT column is greater than 10, and enable reject detection for this table. The rejects will be tracked in the rejects table but must still be inserted into FACT_BILLING.

  1. Follow the same procedure as for DIM_CUSTOMER to add the following user-defined constraint to the BIL_AMOUNT column of the FACT_BILLING table: FACT_BILLING.BIL_AMOUNT > 10
  2. Enable reject detection for FACT_BILLING.
  3. The BIL_KEY_ID column is not loaded by this mapping but generated by the target database, so it must be ignored by the reject detection step. To disable the control of this column, select PK_FACT_BILLING (pk): true on the datastore, then click the icon on its left border. The constraint label becomes PK_FACT_BILLING (pk): false, indicating that the primary key constraint is excluded from reject detection.
  4. Run the mapping and check the results in the Statistics view. Observe that 2 rows were caught by reject detection.

Name | Value
SUM(SQL_NB_ROWS) | 53 316
SUM(SQL_STAT_ERROR) | 2
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

  5. View the 2 reject lines in the Reject table (BIL_ID 224 and 5994).

Congratulations

You have successfully added constraints and configured reject detection for two mappings. In the next step, you will create a stage to reuse some of the transformations defined in the Load DIM_TIME mapping.

A Stage in Semarchy xDI is a temporary area used in a mapping to combine multiple sources or host reusable transformations.

In this step, you will use a stage to host some of the transformation rules defined in the Load DIM_TIME mapping.

Create a stage

Create a stage in the Load DIM_TIME mapping:

  1. Drag and drop the DAY_DATE column onto the mapping.
  2. Select Create a Stage > Browse all staging areas.
  3. A dialog box opens to select the stage's schema. Select the HOTEL_DATAMART schema.
  4. Give your stage a more user-friendly name: select the stage on the diagram and set the Alias to mutualization in the Properties view. This name is used in all transformation business rules that involve the stage.

Define the stage's columns

Define all the needed fields for the stage:

  1. The DAY_DATE column was automatically added to the stage at creation. Select this column, go to the Expression Editor, and then replace the generated expression with the following business rule:
convert(substr(Time.DAY_DATE, 7,4)
  + '-' + substr(Time.DAY_DATE, 4,2)
  + '-' + substr(Time.DAY_DATE, 1,2)
  + ' 00:00:00',TIMESTAMP)
  2. Select the stage and click the icon next to its left border to add a new field. The new column is added below DAY_DATE with the default name field.
  3. Select the field column on the stage. In the Properties view, set its alias to TIME_MONTH_DAY.
  4. Select the Expression Editor and enter the following expression: substr(Time.DAY_DATE, 1,2)
  5. Repeat steps 2 to 4 to add the following fields to the stage:

Stage field | Business rule
TIME_MONTH | substr(Time.DAY_DATE,4,2)
TIME_YEAR | substr(Time.DAY_DATE, 7,4)
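
Conceptually, the stage is materialized as a work table in the HOTEL_DATAMART schema and loaded once from the file data, roughly like this sketch (illustrative; the actual table name and statements are template-generated):

insert into HOTEL_DATAMART.<stage table> (DAY_DATE, TIME_MONTH_DAY, TIME_MONTH, TIME_YEAR)
select convert(substr(DAY_DATE, 7,4) + '-' + substr(DAY_DATE, 4,2) + '-' + substr(DAY_DATE, 1,2) + ' 00:00:00',TIMESTAMP),
       substr(DAY_DATE, 1,2),
       substr(DAY_DATE, 4,2),
       substr(DAY_DATE, 7,4)
from   <Time file data loaded into the staging area>

The downstream business rules can then reference these columns through the mutualization alias instead of repeating the substr logic.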

Use the stage in your mapping

Modify the business rules that load DIM_TIME to use the stage's columns:

Target column | Business rule | Execution location | Execution context | Func. key
TIME_KEY_DAY | mutualization.TIME_YEAR + '/' + mutualization.TIME_MONTH + '/' + mutualization.TIME_MONTH_DAY | Staging Area | I/U | x
TIME_DATE | mutualization.DAY_DATE | Staging Area | I/U |
TIME_MONTH_DAY | convert(mutualization.TIME_MONTH_DAY,NUMERIC) | Staging Area | I/U |
TIME_WEEK_DAY | dayofweek(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_DAY_NAME | dayname(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_MONTH | convert(mutualization.TIME_MONTH,NUMERIC) | Staging Area | I/U |
TIME_MONTH_NAME | monthname(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_QUARTER | quarter(mutualization.DAY_DATE) | Staging Area | I/U |
TIME_YEAR | convert(mutualization.TIME_YEAR,NUMERIC) | Staging Area | I/U |

Run the mapping and analyze the results

Run the mapping and review the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 6 576
SUM(SQL_STAT_INSERT) | 0
SUM(SQL_STAT_UPDATE) | 0

Mutualizing the transformations in the stage did not change the target table contents (no insert and no update were performed on DIM_TIME), but it reduced code redundancy in your project.

In the next step, you will create another stage to combine several sources.

In this step, you will combine two different sources to load the DIM_TIME table: the Time.csv file and the T_PLANNING table.

Data from several sources can be combined in a stage before being loaded to a target. Using that feature, designers do not have to create a mapping for each source.

Modify the DIM_TIME load mapping to load both sources in the mutualization stage:

  1. Select the mutualization stage on the mapping diagram.
  2. Click the icon to the left of the stage to add a set. The stage now shows 2 sets, represented as tabs: A and B.
  3. Rename the first set: select the A dataset and set the Alias to FILE in the Properties view.
  4. Follow the same procedure to rename the B dataset to RDBMS.
  5. Specify the type of union between these datasets: select the mutualization stage and, in the Expression Editor, enter the following business rule: [FILE] union [RDBMS]
  6. In the Project Explorer, select the T_PLANNING table from the Hotel metadata and drag it into the mapping diagram.
  7. Select the RDBMS set of the stage and enter the following business rules:

Target column | Business rule
DAY_DATE | T_PLANNING.PLN_DAY
TIME_MONTH_DAY | lpad(day(T_PLANNING.PLN_DAY), 2, '0')
TIME_MONTH | lpad(month(T_PLANNING.PLN_DAY), 2, '0')
TIME_YEAR | convert(year(T_PLANNING.PLN_DAY), VARCHAR)

  8. Save the mapping.
  9. Run the mapping and review the results in the Statistics view. Observe that 730 rows were inserted thanks to the union with the T_PLANNING datastore.

Name | Value
SUM(SQL_NB_ROWS) | 9 132
SUM(SQL_STAT_INSERT) | 730
SUM(SQL_STAT_UPDATE) | 0
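
The [FILE] union [RDBMS] rule makes the stage load both sets with a SQL UNION, along the lines of this sketch (illustrative; the actual statements are template-generated):

select DAY_DATE, TIME_MONTH_DAY, TIME_MONTH, TIME_YEAR from <FILE set>
union
select PLN_DAY, lpad(day(PLN_DAY), 2, '0'), lpad(month(PLN_DAY), 2, '0'), convert(year(PLN_DAY), VARCHAR)
from   HOTEL_MANAGEMENT.T_PLANNING

Note that union (as opposed to union all) also removes rows that are identical across the two sets.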

Congratulations! You performed the union of two sources within a stage.

In this last step, you will create a mapping that loads the FACT_BOOKING table using an inner join. This table will be used in the next unit, Work With Processes.

  1. Create the mapping with the following properties:

Property | Value
Parent folder | Mappings
Mapping name | Load FACT_BOOKING
Target Datastore | FACT_BOOKING
Source Datastores | DIM_TIME and T_BDR_PLN_CUS

  2. Define the transformation business rules:

Target column | Business rule | Execution location | Execution context | Key
BOK_KEY_ID | | | |
CUS_ID | T_BDR_PLN_CUS.CUS_ID | Source | I/U | Functional key
TIME_KEY_DAY | DIM_TIME.TIME_KEY_DAY | Source | I/U | Functional key
BDR_ID | T_BDR_PLN_CUS.BDR_ID | Source | I/U | Functional key
BOK_PEOPLE | T_BDR_PLN_CUS.PLN_CUS_PERS_COUNT | Source | I/U |
BOK_BOOKED | case when T_BDR_PLN_CUS.PLN_CUS_BOOKED = 'true' then 1 else 0 end | Source | I/U |
UPDATE_DATE | current_timestamp | Target | I/U |

  3. Define the following inner join:

First Datastore | Second Datastore | Business rule | Execution location
DIM_TIME | T_BDR_PLN_CUS | T_BDR_PLN_CUS.PLN_DAY=DIM_TIME.TIME_DATE | Staging Area

  4. Run the mapping and check the results in the Statistics view:

Name | Value
SUM(SQL_NB_ROWS) | 43 650
SUM(SQL_STAT_INSERT) | 14 550
SUM(SQL_STAT_UPDATE) | 0

Great job going through this tutorial! With this unit, you learned to use essential design features available in Semarchy xDI.

What we've covered

- Reverse engineering metadata for databases, delimited files, and positional files
- Filters, inner and outer joins, aggregation, and stages in mappings
- Business rule execution locations and contexts
- Functional keys, deduplication, constraints, and reject detection

What's next?

In the next unit, Work with processes, you will learn how to use processes to design end-to-end integration flows.
