Loading Data Into the Hub

Loading data using Semarchy Convergence for Data Integration uses Mappings. A mapping is a flow which extracts data from source datastores, transforms and loads this data into target datastores. A mapping uses declarative rules defining the relations (joins) between the source datastores, the filters to apply to the source data and the column mappings (transformations) between source and target columns, in the context of the data flow.

Loading CRM Data

The CRM customer and contact information is provided as an XML export. This information must be published in the MDM hub in two separate tables: SD_CUSTOMER and SD_CONTACT. To perform the publishing operation, we will create two mappings loading the CRM Customer and the CRM Contact data into these tables.

Loading CRM Customer Data

This mapping publishes data from the CRM XML file that you have reverse-engineered into the SD_CUSTOMER table stored in the hub.

Creating the New Mapping

When creating a mapping, the first step is adding the source(s) and target(s).

  1. Select the GettingStarted > Mapping folder in your project.
  2. Click the Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
  3. Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
  4. In File Name enter PublishXMLCRMCustomertoHub and then click Finish. The PublishXMLCRMCustomertoHub mapping editor opens.
  5. In the Project Explorer, select the GettingStarted > Metadata >DemoXMLCRM node, right-click and then select Refresh.
  6. In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node. Locate the SD_CUSTOMER table under this node.
  7. Drag and drop the SD_CUSTOMER table from the project explorer to the mapping editor. As it is dropped first in the diagram, this table is added as the target of the mapping, and appears with an orange title.
  8. In the project explorer, expand the GettingStarted > Metadata >DemoXMLCRM > crm_customer.xml node.
  9. Drag and drop the Customers root entity from the project explorer into the mapping editor. It appears with a blue title as it is one of the sources of our mapping.

Defining the Column Mappings

After adding the source and target datastores, it is possible to define the column mappings. Column Mappings are expressions used to load the target datastore’s columns. We use three types of column mappings:

  1. Expand the Customers source in the mapping editor in order to see all the elements and attributes of the XML structure.
  2. In the mapping diagram, select the Customers > Customer > id attribute from the source XML structure ( Customers). Drag id to B_SOURCEID in the SD_CUSTOMER target table.
  3. Select B_SOURCEID column of the SD_CUSTOMER target table, and then select the Expression Editor view in the views located below the editor panel. This view displays the code of the expression that will load the B_SOURCEID column. The code contains for the moment: Customers.Customers/Customer/@id, which is the path of the attribute in the source XML structure.
  4. Select the Properties view. You can see the properties of this target column mapping.The Execution Location indicates the location where this expression will be executed. In this case, the expression runs in the Source. An source icon in the diagram in front of the target table’s column name provides the same indication.
  5. Drag and drop to create the following column mappings:
  6. After this mapping phase, the diagram looks as shown below:
  7. Select the B_BRANCHID column in the SD_CUSTOMER target table.
  8. Go to the Expression Editor view and enter 0 (zero). This mapping is a constant numeric value and must be executed in the target execution location.
  9. In the Properties view, select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to Target.
  10. Repeat the three previous steps to map the following columns to constant values and set their execution location to Target. Make sure to use single quotes around these constant values.
  11. In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
  12. Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
  13. Select the SEM_LOAD_ID element in the diagram, drag and drop it on the B_LOADID column in the target table. This column’s mapping is set to: %{SEM_LOAD_ID}%. This expression returns the value of the SEM_LOAD_ID variable. This variable will be set in the process that will execute this mapping.

The mapping appears shown below.

Note: In this mapping:
- F_ACCOUNT_MANAGER is a reference to another entity (The ID of the employee managing this customer account).
- B_PUBID and B_SOURCEID identify the source publishing application (CRM) and the record in this source application (via its ID).
- B_BRANCHID identifies the MDM data branch into which this data is published.
- B_CLASSNAME identifies explicitly which entity type is loaded in the SD_CUSTOMER table.
- B_LOADID identifies the data load transaction used for publishing data into the hub. Creating and submitting this transaction is explained later in this tutorial.
- The columns prefixed with GEO are enriched by the Semarchy Convergence for MDM master data certification process and do not need to be loaded.

Configuring the Mapping

In Semarchy Convergence for Data Integration, the integration behavior is configured by parameterizing the process template that was automatically added for the technologies (XML and Oracle) involved in the mapping.

  1. In the mapping editor, select the Integration:SD_CUSTOMER box. This graphical artifact represents the process template used for integrating data into SD_CUSTOMER.
  2. In the Properties view, in the Standard tab, you can configure how data integration to the SD_CUSTOMER table takes place.
  3. This templates mandates a specific update key for the mapping. This update key will allow comparing the incoming data flow with the hub content, and only integrate the new or updated records into the hub. To set the update key:
  4. Press CTRL+S to save this mapping.
  5. Close this editor.

Loading CRM Contact Data

This mapping publishes data from the CRM XML file that you reverse-engineered into the SD_CONTACT table stored in the hub.

  1. Select the GettingStarted > Mapping folder in your project.
  2. Click the Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
  3. Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
  4. In File Name enter PublishXMLCRMContacttoHub and then click Finish. The PublishXMLCRMContacttoHub mapping editor opens.
  5. In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
  6. Drag and drop the SD_CONTACT table from the project explorer to the mapping editor. As it is dropped first in the diagram, this table is added as the target of the mapping, and appears with an orange title.
  7. In the project explorer, expand the GettingStarted > Metadata >DemoXMLCRM > crm_customer.xml node.
  8. Drag and drop the Customers root entity from the project explorer into the mapping editor. It appears with a blue title as it is one of the sources of our mapping.
  9. Expand the Customers source in the mapping editor in order to see all the elements and attributes of the XML structure. Note that when you added this source, certain source fields have been automatically mapped to target columns as their names match exactly.
  10. In the mapping diagram, select the Customers > Customer > contacts > id attribute from the source XML structure ( Customers). Drag id to the B_SOURCEID column in the SD_CONTACT target table.
  11. Repeat the same operation to perform the following mappings:
  12. Select the B_BRANCHID column in the SD_CONTACT target table.
  13. Go to the Expression Editor view, and enter 0 (zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location.
  14. In the Properties view, select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to Target.
  15. Repeat the three previous steps to map the following columns to the constant values provided below and set their execution location to Target:
  16. In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
  17. Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
  18. Select the SEM_LOAD_ID element in the diagram, drag and drop it in the B_LOADID column in the target table. This column’s mapping is set to: %{SEM_LOAD_ID}%.
  19. In the mapping editor, select the Integration:SD_CONTACT box. This graphical artifact represents the process template used for integrating data into SD_CONTACT.
  20. In the Properties view, in the Standard tab:
  21. In the diagram, select the B_LOADID column from the target SD_CONTACT table, right-click and un-select the Use as Key option in the context menu.
  22. Select the B_BRANCHID column. Right-click and select the Use as Key option in the context menu. The SD_CONTACT target table appears with the B_PUBID, B_SOURCEID and B_BRANCHID columns marked with a key indicator.
  23. The Customers XML file contains customers with no related contacts. To avoid generating empty contacts for these, add a filter to the Customers source:
    1. In the mapping editor, select the Customers > Customer > contacts > id attribute in the Customers source table. Keep the mouse button pressed, and drop this column on the mapping editor background.
    2. In the Expression Editor view, the code shows: Customers.Customers/Customer/contacts/@id=1.
    3. Replace this code with Customers.Customers/Customer/contacts/@id IS NOT NULL
  24. Review the mapping. It looks as shown below.
  25. Press CTRL+S to save this mapping.
  26. Close this editor.

Note: FS_CUSTOMER and FP_CUSTOMER identify a reference to a customer record. This reference consists of the source publisher code (mapped to the Foreign Publisher FP_CUSTOMER column) and the ID of the record in this source system (mapped to the Foreign Source ID FS_CUSTOMER column). The referenced customer for each contact is a customer from the CRM application with its ID appearing in the source Customer > id node.

Loading HR Data

Employee is read directly from the HR application. This information is stored in the SRC_EMPLOYEE table stored in the SEMARCHY_DEMO_SOURCE schema, and must be published into the SD_EMPLOYEE table stored in the hub.

  1. Select the GettingStarted > Mapping folder in your project.
  2. Click the Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
  3. Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
  4. In File Name enter PublishHREmployeetoHub and then click Finish. The PublishHREmployeetoHub mapping editor opens.
  5. In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
  6. Drag and drop the SD_EMPLOYEE table from the project explorer to the mapping editor. This table appears in orange as the target of the mapping.
  7. In the project explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_SOURCE node.
  8. Drag and drop the SRC_EMPLOYEE table from the project explorer to the mapping editor. Note that when you added this source, certain source fields have been automatically mapped to target columns as their names match exactly.
  9. Select the B_BRANCHID column in the SD_EMPLOYEE target table.
  10. Go to the Expression Editor view, and enter 0 (zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location.
  11. Select the B_BRANCHID column in the SD_EMPLOYEE target table, right click and select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to Target.
  12. Repeat the three previous to map the following columns to the constant values provided below and set their execution location to Target:
  13. In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
  14. Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
  15. Select the SEM_LOAD_ID element in the diagram, drag and drop it in the B_LOADID column in the target table. This column’s mapping is set to: %{SEM_LOAD_ID}%.
  16. In the mapping editor, select the Integration:SD_EMPLOYEE box. This graphical artifact represents the process template used for integrating data into SD_EMPLOYEE.
  17. In the Properties view, in the Standard tab:
  18. In the diagram, select the B_LOADID column from the target SD_EMPLOYEE table, right-click and un-select the Use as Key option in the context menu.
  19. Select the B_BRANCHID target column. Right-click and select the Use as Key option in the context menu. The SD_EMPLOYEE target table appears with the B_PUBID, B_BRANCHID and EMPLOYEE_NUMBER columns marked with a key indicator.
  20. Select the PICTURE target column. In the Properties view, in the Tags field, enter IGNORE and then click the “+” button. The IGNORE tag is added to this column. Tagging columns is used to provide directives to the templates. In this case as this column is a picture (a binary object), we want to ignore it for certain aspects of the processing.
  21. Review the mapping. It looks as shown below.
  22. Press CTRL+S to save this mapping.
  23. Close this editor.

Loading Finance Data

Cost Center information is provided from the finance application via the cost_centers.txt file that was previously reverse-engineered.
This mapping publishes data from the cost_centers.txt file into the SD_COST_CENTER table stored in the hub.

  1. Select the GettingStarted > Mapping folder in your project.
  2. Click the Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
  3. Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
  4. In File Name enter PublishFINCostCentertoHub and then click Finish. The PublishFINCostCentertoHub mapping editor opens.
  5. In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
  6. Drag and drop the SD_COST_CENTER table from the project explorer to the mapping editor.
  7. In the project explorer, expand the GettingStarted > Metadata > DemoFile > Server > Demo Files Folder node. If this node does not appear in the view, press the F5 key to refresh the project explorer view.
  8. Drag and drop the cost_centers file from the project explorer to the mapping editor. Note that when you added this source, certain source fields have been automatically mapped to target columns as their names match exactly.
  9. In the mapping diagram, select the PARENT_COST_CENTER field from the source file ( cost_centers). Drag PARENT_COST_CENTER to the F_PCOST_CENTER column in the SD_COST_CENTER target table.
  10. Select the B_BRANCHID column in the SD_COST_CENTER target table.
  11. Go to the Expression Editor view, and enter 0 (zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location.
  12. Select the B_BRANCHID column in the SD_COST_CENTER target table, right click and select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to Target.
  13. Repeat the three previous steps to map the following columns to the constant values provided below and set their execution location to Target:
  14. In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
  15. Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
  16. Select the SEM_LOAD_ID element in the diagram, drag and drop it in the B_LOADID column in the target table. This column’s mapping is set to: %{SEM_LOAD_ID}%.
  17. In the mapping editor, select the Integration:SD_COST_CENTER box. This graphical artifact represents the process template used for integrating data into SD_COST_CENTER.
  18. In the Properties view, in the Standard tab, you can configure how data integration to the SD_COST_CENTER table takes place.
  19. In the diagram, select the B_LOADID column from the target SD_COST_CENTER table, right-click and un-select the Use as Key option in the context menu.
  20. Select the B_BRANCHID target column. Right-click and select the Use as Key option in the context menu. The SD_COST_CENTER target table appears with the B_PUBID, B_BRANCHID and COST_CENTER_ID columns marked with a key indicator.
  21. Review the mapping. It looks as shown below.
  22. Press CTRL+S to save this mapping.
  23. Close this editor.