Loading Data Into the Hub |
Previous
|
|
Next
|
Reverse-Engineering the Metadata |
|
Assembling the Publishing Process |
Loading Data Into the Hub
Loading data using Semarchy Convergence for Data Integration uses
Mappings. A mapping is a flow which extracts data from source datastores, transforms and loads this data into target datastores. A mapping uses declarative rules defining the relations (joins) between the source datastores, the filters to apply to the source data and the column mappings (transformations) between source and target columns, in the context of the data flow.
Loading CRM Data
The CRM customer and contact information is provided as an XML export. This information must be published in the MDM hub in two separate tables:
SD_CUSTOMER
and
SD_CONTACT
. To perform the publishing operation, we will create two mappings loading the CRM Customer and the CRM Contact data into these tables.
Loading CRM Customer Data
This mapping publishes data from the CRM XML file that you have reverse-engineered into the
SD_CUSTOMER
table stored in the hub.
Creating the New Mapping
When creating a mapping, the first step is adding the source(s) and target(s).
- Select the
GettingStarted > Mapping folder in your project.
- Click the
Create New Mapping button in the
Project Explorer toolbar. The
New Map Diagram wizard opens.
- Make sure that the
GettingStarted/Mapping folder is selected as the parent folder.
- In
File Name enter
PublishXMLCRMCustomertoHub and then click
Finish. The
PublishXMLCRMCustomertoHub mapping editor opens.
- In the
Project Explorer, select the
GettingStarted > Metadata >DemoXMLCRM node, right-click and then select
Refresh.
- In the
Project Explorer, expand the
GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node. Locate the
SD_CUSTOMER table under this node.
- Drag and drop the
SD_CUSTOMER table from the project explorer to the mapping editor. As it is dropped first in the diagram, this table is added as the target of the mapping, and appears with an orange title.
- In the project explorer, expand the
GettingStarted > Metadata >DemoXMLCRM > crm_customer.xml node.
- Drag and drop the
Customers root entity from the project explorer into the mapping editor. It appears with a blue title as it is one of the sources of our mapping.
Defining the Column Mappings
After adding the source and target datastores, it is possible to define the column mappings. Column Mappings are expressions used to load the target datastore’s columns. We use three types of column mappings:
- Mappings using the source datastore’s columns,
- Mappings using literals (constant values),
- Mappings using variables that are set at execution time.
- Expand the
Customers source in the mapping editor in order to see all the elements and attributes of the XML structure.
- In the mapping diagram, select the
Customers > Customer > id attribute from the source XML structure (
Customers). Drag
id to
B_SOURCEID in the
SD_CUSTOMER target table.
- Select
B_SOURCEID column of the
SD_CUSTOMER target table, and then select the
Expression Editor view in the views located below the editor panel. This view displays the code of the expression that will load the B_SOURCEID column. The code contains for the moment:
Customers.Customers/Customer/@id
, which is the path of the attribute in the source XML structure.
- Select the
Properties view. You can see the properties of this target column mapping.The
Execution Location indicates the location where this expression will be executed. In this case, the expression runs in the
Source. An source icon
in the diagram in front of the target table’s column name provides the same indication.
- Drag and drop to create the following column mappings:
- CUSTOMER_NAME = Customers > Customers > name
- TOTAL_REVENUE = Customers > Customers > totalRevenue
- INPADDRESS = Customers > Customer > address > address
- INPPOSTALCODE = Customer > Customer > address > zipCode
- INPCITY = Customer > Customer > address > city
- INPCOUTRY = Customer > Customer > address > country
- F_ACCOUNT_MANAGER = Customer > Customer > accountManagerId
- After this mapping phase, the diagram looks as shown below:
- Select the
B_BRANCHID column in the SD_CUSTOMER target table.
- Go to the
Expression Editor view and enter
0
(zero). This mapping is a constant numeric value and must be executed in the target execution location.
- In the
Properties view, select
Execution Location > Target. The execution location icon on the
B_BRANCHID is changed to
Target.
- Repeat the three previous steps to map the following columns to constant values and set their execution location to
Target. Make sure to use single quotes around these constant values.
- B_CLASSNAME =
'Customer'
.
- B_PUBID =
'CRM'
.
- In the
Project Explorer, expand the
global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
- Select the
SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
- Select the
SEM_LOAD_ID element in the diagram, drag and drop it on the
B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
. This expression returns
the value of the SEM_LOAD_ID variable. This variable will be set in the process that will execute this mapping.
The mapping appears shown below.
Note: In this mapping:
-
F_ACCOUNT_MANAGER is a reference to another entity (The ID of the employee managing this customer account).
-
B_PUBID and
B_SOURCEID identify the source publishing application (CRM) and the record in this source application (via its ID).
-
B_BRANCHID identifies the MDM data branch into which this data is published.
-
B_CLASSNAME identifies explicitly which entity type is loaded in the SD_CUSTOMER table.
-
B_LOADID identifies the data load transaction used for publishing data into the hub. Creating and submitting this transaction is explained later in this tutorial.
- The columns prefixed with
GEO are enriched by the Semarchy Convergence for MDM master data certification process and do not need to be loaded.
Configuring the Mapping
In Semarchy Convergence for Data Integration, the integration behavior is configured by parameterizing the process template that was automatically added for the technologies (XML and Oracle) involved in the mapping.
- In the mapping editor, select the
Integration:SD_CUSTOMER box. This graphical artifact represents the process template used for integrating data into SD_CUSTOMER.
- In the
Properties view, in the
Standard tab, you can configure how data integration to the SD_CUSTOMER table takes place.
- Select the
Action Process INTEGRATION Semarchy Convergence for MDM template.
- Click the
Use Distinct link to enable the field and select this option.
The Properties view is as shown below:
- This templates mandates a specific update key for the mapping. This update key will allow comparing the incoming data flow with the hub content, and only integrate the new or updated records into the hub. To set the update key:
- In the diagram, select the
B_LOADID column from the target
SD_CUSTOMER table, right-click and un-select the
Use as Key option in the context menu.
- Hold the
CTRL key pressed and select the
B_PUBID,
B_SOURCEID,
B_BRANCHID columns. Right-click and select the
Use as Key option in the context menu. The
SD_CUSTOMER target table appears with only these three columns marked with a key indicator, as shown below:
- Press
CTRL+S to save this mapping.
- Close this editor.
This mapping publishes data from the CRM XML file that you reverse-engineered into the
SD_CONTACT
table stored in the hub.
- Select the
GettingStarted > Mapping folder in your project.
- Click the
Create New Mapping button in the
Project Explorer toolbar. The
New Map Diagram wizard opens.
- Make sure that the
GettingStarted/Mapping folder is selected as the parent folder.
- In
File Name enter
PublishXMLCRMContacttoHub and then click
Finish. The
PublishXMLCRMContacttoHub mapping editor opens.
- In the
Project Explorer, expand the
GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
- Drag and drop the
SD_CONTACT table from the project explorer to the mapping editor. As it is dropped first in the diagram, this table is added as the target of the mapping, and appears with an orange title.
- In the project explorer, expand the
GettingStarted > Metadata >DemoXMLCRM > crm_customer.xml node.
- Drag and drop the
Customers root entity from the project explorer into the mapping editor. It appears with a blue title as it is one of the sources of our mapping.
- Expand the
Customers source in the mapping editor in order to see all the elements and attributes of the XML structure. Note that when you added this source, certain source fields have been automatically mapped to target columns as their names match exactly.
- In the mapping diagram, select the
Customers > Customer > contacts > id attribute from the source XML structure (
Customers). Drag
id to the
B_SOURCEID column in the
SD_CONTACT target table.
- Repeat the same operation to perform the following mappings:
- JOB_TITLE = Customers > Customer > contacts > jobTitle
- IS_INFLUENCER = Customers > Customer > contacts > isIfluencer
- EMAIL_ADDRESS = Customers > Customer > contacts > email
- FS_CUSTOMER = Customers > Customer > id
- Select the
B_BRANCHID column in the SD_CONTACT target table.
- Go to the
Expression Editor view, and enter
0
(zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location.
- In the
Properties view, select
Execution Location > Target. The execution location icon on the
B_BRANCHID is changed to
Target.
- Repeat the three previous steps to map the following columns to the constant values provided below and set their execution location to
Target:
- B_CLASSNAME =
'Contact'
.
- B_PUBID =
'CRM'
.
- FP_CUSTOMER =
'CRM'
.
- In the
Project Explorer, expand the
global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
- Select the
SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
- Select the
SEM_LOAD_ID element in the diagram, drag and drop it in the
B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
.
- In the mapping editor, select the
Integration:SD_CONTACT box. This graphical artifact represents the process template used for integrating data into SD_CONTACT.
- In the
Properties view, in the
Standard tab:
- Select the
Action Process INTEGRATION Semarchy Convergence for MDM template.
- Click the
Use Distinct link to enable the field and select this option.
- In the diagram, select the
B_LOADID column from the target
SD_CONTACT table, right-click and un-select the
Use as Key option in the context menu.
- Select the
B_BRANCHID column. Right-click and select the
Use as Key option in the context menu. The
SD_CONTACT target table appears with the
B_PUBID,
B_SOURCEID and
B_BRANCHID columns marked with a key indicator.
- The Customers XML file contains customers with no related contacts. To avoid generating empty contacts for these, add a filter to the
Customers source:
- In the mapping editor, select the
Customers > Customer > contacts > id attribute in the
Customers source table. Keep the mouse button pressed, and drop this column on the mapping editor background.
- In the
Expression Editor view, the code shows:
Customers.Customers/Customer/contacts/@id=1
.
- Replace this code with
Customers.Customers/Customer/contacts/@id IS NOT NULL
- Review the mapping. It looks as shown below.
- Press
CTRL+S to save this mapping.
- Close this editor.
Note:
FS_CUSTOMER and
FP_CUSTOMER identify a reference to a customer record. This reference consists of the source publisher code (mapped to the Foreign Publisher
FP_CUSTOMER
column) and the ID of the record in this source system (mapped to the Foreign Source ID
FS_CUSTOMER
column). The referenced customer for each contact is a customer from the
CRM
application with its ID appearing in the source
Customer > id
node.
Loading HR Data
Employee is read directly from the HR application. This information is stored in the
SRC_EMPLOYEE
table stored in the
SEMARCHY_DEMO_SOURCE
schema, and must be published into the
SD_EMPLOYEE
table stored in the hub.
- Select the
GettingStarted > Mapping folder in your project.
- Click the
Create New Mapping button in the
Project Explorer toolbar. The
New Map Diagram wizard opens.
- Make sure that the
GettingStarted/Mapping folder is selected as the parent folder.
- In
File Name enter
PublishHREmployeetoHub and then click
Finish. The
PublishHREmployeetoHub mapping editor opens.
- In the
Project Explorer, expand the
GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
- Drag and drop the
SD_EMPLOYEE table from the project explorer to the mapping editor. This table appears in orange as the target of the mapping.
- In the project explorer, expand the
GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_SOURCE node.
- Drag and drop the
SRC_EMPLOYEE table from the project explorer to the mapping editor. Note that when you added this source, certain source fields have been automatically mapped to target columns as their names match exactly.
- Select the
B_BRANCHID column in the SD_EMPLOYEE target table.
- Go to the
Expression Editor view, and enter
0
(zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location.
- Select the
B_BRANCHID column in the SD_EMPLOYEE target table, right click and select
Execution Location > Target. The execution location icon on the
B_BRANCHID is changed to
Target.
- Repeat the three previous to map the following columns to the constant values provided below and set their execution location to
Target:
- B_CLASSNAME =
'Employee'
.
- B_PUBID =
'HR'
.
- In the
Project Explorer, expand the
global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
- Select the
SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
- Select the
SEM_LOAD_ID element in the diagram, drag and drop it in the
B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
.
- In the mapping editor, select the
Integration:SD_EMPLOYEE box. This graphical artifact represents the process template used for integrating data into SD_EMPLOYEE.
- In the
Properties view, in the
Standard tab:
- Select the
Action Process INTEGRATION Semarchy Convergence for MDM template.
- Click the
Use Distinct link to enable the field and select this option.
- In the diagram, select the
B_LOADID column from the target
SD_EMPLOYEE table, right-click and un-select the
Use as Key option in the context menu.
- Select the
B_BRANCHID target column. Right-click and select the
Use as Key option in the context menu. The
SD_EMPLOYEE target table appears with the
B_PUBID,
B_BRANCHID and
EMPLOYEE_NUMBER columns marked with a key indicator.
- Select the
PICTURE target column. In the
Properties view, in the
Tags field, enter
IGNORE
and then click the “+” button. The
IGNORE
tag is added to this column. Tagging columns is used to provide directives to the templates. In this case as this column is a picture (a binary object), we want to ignore it for certain aspects of the processing.
- Review the mapping. It looks as shown below.
- Press
CTRL+S to save this mapping.
- Close this editor.
Loading Finance Data
Cost Center information is provided from the finance application via the
cost_centers.txt
file that was previously reverse-engineered.
This mapping publishes data from the
cost_centers.txt
file into the
SD_COST_CENTER
table stored in the hub.
- Select the
GettingStarted > Mapping folder in your project.
- Click the
Create New Mapping button in the
Project Explorer toolbar. The
New Map Diagram wizard opens.
- Make sure that the
GettingStarted/Mapping folder is selected as the parent folder.
- In
File Name enter
PublishFINCostCentertoHub and then click
Finish. The
PublishFINCostCentertoHub mapping editor opens.
- In the
Project Explorer, expand the
GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
- Drag and drop the
SD_COST_CENTER table from the project explorer to the mapping editor.
- In the project explorer, expand the
GettingStarted > Metadata > DemoFile > Server > Demo Files Folder node. If this node does not appear in the view, press the
F5 key to refresh the project explorer view.
- Drag and drop the
cost_centers file from the project explorer to the mapping editor. Note that when you added this source, certain source fields have been automatically mapped to target columns as their names match exactly.
- In the mapping diagram, select the
PARENT_COST_CENTER field from the source file (
cost_centers). Drag
PARENT_COST_CENTER to the
F_PCOST_CENTER column in the
SD_COST_CENTER target table.
- Select the
B_BRANCHID column in the SD_COST_CENTER target table.
- Go to the
Expression Editor view, and enter
0
(zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location.
- Select the
B_BRANCHID column in the SD_COST_CENTER target table, right click and select
Execution Location > Target. The execution location icon on the
B_BRANCHID is changed to
Target.
- Repeat the three previous steps to map the following columns to the constant values provided below and set their execution location to
Target:
- B_CLASSNAME =
'CostCenter'
.
- B_PUBID =
'FIN'
.
- In the
Project Explorer, expand the
global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
- Select the
SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
- Select the
SEM_LOAD_ID element in the diagram, drag and drop it in the
B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
.
- In the mapping editor, select the
Integration:SD_COST_CENTER box. This graphical artifact represents the process template used for integrating data into SD_COST_CENTER.
- In the
Properties view, in the
Standard tab, you can configure how data integration to the SD_COST_CENTER table takes place.
- Select the
Action Process INTEGRATION Semarchy Convergence for MDM template.
- Click the
Use Distinct link to enable the field and select this option.
- In the diagram, select the
B_LOADID column from the target
SD_COST_CENTER table, right-click and un-select the
Use as Key option in the context menu.
- Select the
B_BRANCHID target column. Right-click and select the
Use as Key option in the context menu. The
SD_COST_CENTER target table appears with the
B_PUBID,
B_BRANCHID and
COST_CENTER_ID columns marked with a key indicator.
- Review the mapping. It looks as shown below.
- Press
CTRL+S to save this mapping.
- Close this editor.
Previous
|
Top
|
Next
|
Reverse-Engineering the Metadata |
|
Assembling the Publishing Process |