Welcome to Semarchy Convergence for Data Integration.
This Getting Started tutorial provides a step-by-step introduction to Semarchy Convergence for Data Integration.
Preface
Audience
If you want to learn about MDM or discover Semarchy Convergence for MDM, you can watch our tutorials. |
Document Conventions
This document uses the following formatting conventions:
Convention | Meaning |
---|---|
boldface |
Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept. |
italic |
Italic type indicates special emphasis or placeholder variable that you need to provide. |
|
Monospace type indicates code example, text or commands that you enter. |
Other Semarchy Resources
In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.
Obtaining Help
There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.
Feedback
We welcome your comments and suggestions on the quality and usefulness
of this documentation.
If you find any error or have any suggestion for improvement, please
mail support@semarchy.com and indicate the title of the documentation
along with the chapter, section, and page number, if available. Please
let us know if you want a reply.
Introduction to Convergence for Data Integration
Semarchy Convergence for Data Integration is the next generation high performance data integration platform that enables your IT team to deliver the right data, at the right place, at the right time. Used in conjunction with Semarchy Convergence for MDM, it manages the integration flows between the operational/analytic applications and the master data hub.
Semarchy Convergence for Data Integration is designed with the following key features for better design-time productivity and higher run-time performances.
High-Performance E-LT Architecture
Traditional data integration products use an Extract-Transform-Load (ETL) architecture to run data transformations. In this architecture, a dedicated proprietary engine is in charge of the data processing.
The E-LT (Extract-Load and Transform) architecture used in Semarchy Convergence for Data Integration removes the need for this middle-tier engine. It leverages the data processing engines in place to run processes generated and optimized for these engines.
By removing the cost of the hardware, software, maintenance and tuning skills required for the ETL server, the E-LT architecture guarantees the best TCO. This architecture scales naturally with the source and target datastores. It guarantees the best performance and scalability for data integration at any time.
Powerful User Experience
Semarchy Convergence for Data Integration uses an intuitive, familiar and robust Integrated Development Environment (IDE), the Convergence for DI Designer that contributes to the unified user experience of the Semarchy platform. This IDE is designed with features that ensure better productivity for development and higher efficiency for maintenance.
-
Metadata Driven Approach: The entire integration logic design is driven by metadata, supporting powerful cross-referencing and impact analysis capabilities that ease the development and maintenance of the data integration flows.
-
Declarative Design: Data mappings are designed graphically in a declarative way, focusing on the purpose of these mappings. The technical processes required for achieving the data movement and transformation are automatically generated using built-in process templates.
-
Integration Process Workflows: Complex process workflows are designed in diagrams, using a rich toolbox. Workflows support parallelism, conditional branching and event-based execution.
-
Components Reusability and Templating: The mappings as well as the workflows can be reused within the same project or in other projects. Processes can be converted into user-defined templates for reusability.
-
Seamless Team Collaboration: Standard source control systems such as Concurrent Versioning System (CVS) or Apache Subversion (SVN) are available from within the Convergence for DI Designer to safely share projects within the team or across teams in robust infrastructures.
Rationalized Production
Setting up and maintaining a production environment with Semarchy Convergence for Data Integration is made fast and simple:
-
Lightweight Runtime Deployment: Convergence for Data Integration uses a single lightweight Java component – the Runtime Engine – to handle runtime execution and logging.
-
Rationalized Deployment Model: The deployment model for data integration processes is designed for production users. Package files generated by the development team are easily configured and deployed by the production teams, using graphical or command-line interfaces.
-
Comprehensive Monitoring: The entire execution flow is tracked in logs which are accessed using a Web-based administration dashboard. Live executions can be monitored and past executions can be replayed in diagrams that reflect the processes created at design-time.
Enterprise Data Integration
Semarchy Convergence for Data Integration provides universal data access and enterprise-class capabilities for integration.
-
Extensible Connectivity Framework: In Convergence for Data Integration, connectivity is a fully extensible framework. Technology Descriptors can be added or customized to support any type of technology without restricting the capabilities to a subset. Built-in Process Templates can also be customized to generate processes optimized for specific use cases.
-
Built-in Technology Adapters: Out of the box adapters provide read/write access to a variety of systems and data formats including files, databases, XML, web services, applications, etc. They include both the technology descriptors and process templates for these technologies.
-
Real-Time and Batch Integration Patterns: Both the real-time (Using Changed Data Capture) and batch integration patterns are supported. These patterns cover the most common use cases for Enterprise Data Integration and Master Data Management.
-
Data Integration Services: Convergence for Data Integration provides out of the box access to web services for integration purposes. In addition, data integration flows designed in Convergence for Data Integration can be automatically exposed as web services and used as part of a Service Oriented Architecture (SOA).
-
Unified Product Suite: Convergence for Data Integration is fully integrated with the Semarchy Product Suite. Built-in process templates and patterns are provided to manage publishing and consuming master data in the Semarchy Convergence for MDM golden data hub.
Through all these features, Semarchy Convergence for Data Integration enables you to deliver master data where and when the business needs it, in a simple, fast and safe way.
Tutorial Overview
This tutorial illustrates the features of Convergence for Data Integration in the context of a Master Data Management project. The goal of this tutorial is to use Convergence for Data Integration to Publish source data into a Semarchy Convergence for MDM Hub, and Consume the resulting golden data to distribute it to other applications.
Installation and Configuration
To start the tutorial, you will install and configure the various components involved.
You will run the following tasks:
-
Install and configure Semarchy Convergence for MDM, the pre-built MDM Hub, and Convergence for Data Integration. These tasks are covered in the Installation chapter.
-
Create and configure your Convergence for Data Integration workspace, as described in the Setting Up the Project chapter.
Flows Designed in the Tutorial
The following schema gives an overview the integration flows involved in this tutorial.
Publish Data Into the Hub
In this tutorial you will use Convergence for Data Integration to publish data from operational sources into the MDM Hub managed by Semarchy Convergence for MDM. Semarchy Convergence for MDM will certify golden records from this data.
Publishing source data for certification into golden data is performed as a transaction. Such a publisher transaction is called an External Load.
The external load lifecycle is described below:
-
Initialize Load: Convergence for Data Integration uses a PL/SQL Interface to initialize an external load and request from the Convergence for MDM platform a Load ID identifying this external load.
-
Load Data: Convergence for Data Integration inserts data into the landing tables of the MDM Hub, using the Load ID.
-
Submit Load: Convergence for Data Integration uses a PL/SQL Interface to submit the external load to the Convergence for MDM platform using the Load ID. In the event of an error, Convergence for Data Integration can Cancel Load instead of submitting it. For example, when problem occurs in the Load Data phase.
In this tutorial, you will load the following data into the landing tables of the hub:
-
Cost Centers provided from the Finance department in flat file format.
-
Customers and Contacts information from the CRM system, provided in XML format.
-
Employee information from the HR database.
-
Customer and Contacts information from the Marketing database. Note that this data load is pre-built in the tutorial.
To perform the publishing task, you will:
-
Reverse-engineer some of the Metadata (data structures) involved in the data flows (tables, flat files and XML files). This task is described in the Reverse-Engineering the Metadata chapter.
-
Design the Mappings to load the data from these structure to the landing tables. This task is described in the Loading Data Into the Hub chapter.
-
Design a Process to manage the external load. This task is described in the Assembling the Publishing Process chapter.
-
Run the Process and review its results, as described in the Running the Publishing Process
Consume Data From the Hub
In order to deliver golden records to operational applications, you will design mappings to consume golden data from the MDM Hub tables and deliver it in an predefined XML format. The consumption process will extract golden data for the Customers and Contacts entities.
This task is described in the Consuming Data from the Hub chapter.
Installation
Requirements
Before starting this tutorial, make sure that the following requirements are met:
-
Oracle Database version 10.2 or above is installed and configured.
-
A Java Runtime Environment (JRE) or Development Kit (JDK) 7 ou 8 is installed and the JAVA_HOME or JRE_HOME environment variable is configured to point to this installation of Java. Set JAVA_HOME to your JDK installation directory (e.g.,
c:\Progra~1\java\jdk1.7.0
) or set JRE_HOME to the JRE base directory (e.g.,c:\Progra~1\java\jre1.7.0
).
Make sure that you also have the license keys for Semarchy Convergence for MDM and Convergence for Data Integration. |
Installing the Semarchy Convergence for MDM Demo
If you have already installed and configured Semarchy Convergence for MDM for the Getting Started with MDM Projects tutorial, the following configuration is not necessary. Simply start the Tomcat server and connect to the Convergence for MDM Workbench. In the menu, select Help > Demo Environment > Open Sample Source Data. In the demo application, use the Reset Sample Data button to restore the sample data to its original state. After performing these steps, you can directly go to the Installing Convergence for Data Integration section. |
Configure the Database Schemas
Convergence for MDM uses four schemas for the demonstration environment:
-
SEMARCHY_DEMO_REPOSITORY
contains the Convergence for MDM Repository. -
SEMARCHY_DEMO_MDM
is the schema into which you will deploy your MDM Hub. -
SEMARCHY_DEMO_SOURCE
contains sample source data used to load the MDM Hub. -
SEMARCHY_DEMO_PULSE
contains the Pulse Metrics data used in the MDM Hub dashboards.
To configure the database schemas:
-
Connect with a system account to the Oracle Database.
-
Run the following script to create the Convergence for MDM demo schemas:
CREATE USER SEMARCHY_DEMO_REPOSITORY IDENTIFIED BY SEMARCHY_DEMO_REPOSITORY DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
CREATE USER SEMARCHY_DEMO_SOURCE IDENTIFIED BY SEMARCHY_DEMO_SOURCE DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
CREATE USER SEMARCHY_DEMO_MDM IDENTIFIED BY SEMARCHY_DEMO_MDM DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
CREATE USER SEMARCHY_DEMO_PULSE IDENTIFIED BY SEMARCHY_DEMO_PULSE DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
GRANT CONNECT, RESOURCE TO SEMARCHY_DEMO_REPOSITORY, SEMARCHY_DEMO_SOURCE, SEMARCHY_DEMO_MDM;
GRANT CONNECT, RESOURCE, SELECT ANY TABLE, CREATE ANY VIEW TO SEMARCHY_DEMO_PULSE;
-- The following command should be used for Oracle 12c and above
GRANT UNLIMITED TABLESPACE TO SEMARCHY_DEMO_REPOSITORY, SEMARCHY_DEMO_SOURCE, SEMARCHY_DEMO_MDM, SEMARCHY_DEMO_PULSE;
If you have already created these schemas and wish to delete them beforehand, use the following script:
DROP USER SEMARCHY_DEMO_REPOSITORY CASCADE;
DROP USER SEMARCHY_DEMO_MDM CASCADE;
DROP USER SEMARCHY_DEMO_SOURCE CASCADE;
DROP USER SEMARCHY_DEMO_PULSE CASCADE;
Install and Start the Convergence for MDM Server
Convergence for MDM runs as a web application in a Java Application Server. A simple Apache Tomcat Server, pre-configured with Convergence for MDM is used for this tutorial.
In the following section, the semarchy-mdm-preconfigured-xxx.zip
file refers to
the Semarchy Convergence for MDM - Pre-Configured with Apache Tomcat file
that you can download to install Semarchy Convergence for MDM. The name
of this file varies as it includes the Convergence for MDM version and
build number.
-
Download the Semarchy Convergence for MDM archive. Make sure to download the version that includes the pre-configured Tomcat Server. It is named
semarchy-mdm-preconfigured-xxx.zip
. -
Install and start the Apache Tomcat Server:
-
On a Windows Platform:
-
Decompress the
mdm-server/
folder from the archive in a folder called/semarchy
in your machine. -
Open Windows Explorer, and go to the
/semarchy/bin
folder. -
Run
startup.bat
.
-
-
On a UNIX/Linux Platform:
-
Decompress the
mdm_server/
folder from the archive in a$HOME/semarchy
folder in your machine. -
In a Shell window, run
$HOME/semarchy/bin/startup.sh
.
-
-
The Apache Tomcat server starts by default on the port
8088. If this port is already used by another application and you wish
to start the server on a different port, edit the /conf/server.xml
file with a text editor, and change the port value in the following
line: <Connector port="8088" protocol="HTTP/1.1"
|
The Convergence for MDM server is preconfigured with JDBC datasources pointing to the schemas you created (called
SEMARCHY_DEMO_REPOSITORY , SEMARCHY_DEMO_SOURCE , SEMARCHY_DEMO_MDM and SEMARCHY_DEMO_PULSE ), stored in an Oracle instance started on the
local machine and listening on port 1521 (that is: localhost:xe:1521 ).
If the Oracle instance that you are using is configured differently:
First, stop the Convergence for MDM server, edit the
/conf/catalina/localhost/semarchy.xml to change the configuration of
the datasources and then restart the Convergence for MDM server.
|
To shut down the Convergence for MDM server, run the shutdown.bat
(Windows) or shutdown.sh
(UNIX/Linux) script in the /bin
folder.
Install the Repository
Convergence for MDM holds all its information in a repository stored in a database schema. The first task when connecting Convergence for MDM is to create this repository structure in the database schema previously created.
-
Open your web browser and connect to the following URL:
http://localhost:8088/semarchy/workbench
(update the port number if you changed it during the installation process) -
In the login prompt, enter the following:
-
User: semadmin
-
Password: semadmin
-
-
The Convergence for MDM Workbench opens with the license agreement. Review the End-User License Agreement.
-
Check the I have read and accept Semarchy’s End-User License Agreement box and then click Next.
-
In the License Key File page, select a valid license key by clicking the Upload License Key file… button and then click Next.
-
In the Repository Creation wizard, select Design for the type of repository and then click Finish.
-
Click OK when the Repository Successfully Created message appears.
The repository has been created and Convergence for MDM is now up and running.
Create the Demo Model
The Tutorial Environment contains a Customer and Financial Hub model. This model simulates an MDM project in progress.
To seed the Customer and Financial Hub model:
-
In the Convergence for MDM menu, select Help > Demo Environment > Create Demo Model…
-
In the Getting Started Setup wizard, select the Full Setup option and then click Next.
-
The second wizard screen allows you to rename the model:
-
If you are running this tutorial for the first time, leave the fields unchanged.
-
If you have already seeded a demonstration model in this repository and want to seed the model a second time with a different name, edit the Demo Model Name (for example, enter CustomerAndFinancialMDM2). Note that you will have to use this new name for the rest of the tutorial every time you are requested to enter CustomerAndFinancialMDM as a value.
-
-
Click Finish.
-
Click OK in the confirmation window.
The demonstration model is seeded, and the Model Edition view opens.
Seed the Source Data
For this tutorial, we use a demonstration application that accesses sample data stored in the SEMARCHY_DEMO_SOURCE schema. |
When you access the Demo Application for the first time, you are prompted to install the sample data for the demo application. Click OK to install the sample data. This step is executed only once. |
To load sample data into the MDM Hub:
-
In the menu, select Help > Demo Environment > Open Sample Source Data. If you run the demo application for the first time, click OK to install the sample data when prompted. The Demo Applications editor opens:
-
The demonstration application contains several editors, each corresponding to an application publishing to the MDM hub. These application appear as separates tabs at the bottom of the editor.
-
You can edit or delete sample records from these tabs.
-
The Reset Sample Data button allows you to restore the sample data to its original state.
-
Do not use the Publish Data to MDM… button. The purpose of this tutorial is to create the data integration flows to perform all the data publishing operations. |
Installing Convergence for Data Integration
Install and Start Convergence for Data Integration
Convergence for Data Integration includes two key components:
-
The Designer is the Graphical User Interface (GUI) into which data developers create their integration jobs.
-
The Runtime Engine is the component that runs these integration jobs.
In the following section:
-
semarchy-di-designer.zip
file refers to the Convergence for Data Integration- Full Setup file that you can download to install Semarchy Convergence for Data Integration. The name of this file varies as it includes the platform information, product version and build number. -
<semarchy_di>
refers to the installation folder of Convergence for Data Integration.
To install Convergence for Data Integration:
-
Download the Convergence for Data Integration distribution (
semarchy-di-designer.zip
) corresponding to your platform and to your default Java Machine (32 vs. 64 Bits). -
Uncompress the
semarchy-di-designer.zip
file in your machine. This will create asemarchy_di
sub-folder. This sub-folder will be referred to as<semarchy_di>
(the Convergence for Data Integration installation directory). -
Start the Designer:
-
On Windows platforms:
-
Open Windows Explorer, and go to the
<semarchy_di>
folder. -
Run
semarchy.exe
. The Designer starts.
-
-
On UNIX/Linux platforms:
-
Open a shell window, and go to the
<semarchy_di>
folder. -
Run
./semarchy
. The Designer starts. -
In a shell window, go to the
<semarchy_di>/runtime
folder and runchmod 755 *.sh
to make the runtime scripts executable.
-
-
-
When the Designer starts, it prompts you for the license key.
-
In the Please validate your product dialog, enter in the Key the key string that was provided to you by Semarchy.
-
Click the Apply button.
-
After registering the license key, you must create the folder into which the Designer will store its data. This folder in your local machine is the Workspace. Convergence for DI Designer prompts creates by default a
workspace
folder in its installation directory.
To install it in a different location:-
In the Workspace Launcher window, click the Browse button.
-
In the Select Workspace Directory dialog, select the folder into which the workspace will be created.
-
Click OK to create the workspace and open it. The Convergence for Data Integration Designer window opens on the Introduction page. This page provides access Overview, Tutorials and Web Resource pages.
-
-
Click the Workbench link
to open the newly created workbench.
Directories Contents
The <semarchy_di>
directory contains the following sub-folders:
-
/samples
contains the files for running this getting started tutorial and other samples. -
/workspace
contains the workspace that you have created. Note that you can have several workspaces for a single installation. You can locate these workspaces anywhere in your file system. -
/templates
contains the templates provided out of the box with Convergence for Data Integration. -
/runtime
contains the Convergence for Data Integration runtime engine binary and startup scripts. -
/plugins
andconfiguration
contain the binaries and configuration files for the Designer.
Creating the Staging Area Schema
The E-LT architecture leverages the capabilities of a database for data integration. For the purpose of data processing, it may create temporary tables in the database it uses.
To host these temporary tables, it is recommended to create a schema called the Staging Area. We will create a schema called SEMARCHY_STAGING
as our staging area.
-
Connect with a system account to the Oracle Database.
-
Run the following script to create the staging area schema:
CREATE USER SEMARCHY_STAGING IDENTIFIED BY SEMARCHY_STAGING DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
GRANT CONNECT,RESOURCE,DBA TO SEMARCHY_STAGING;
-- The following command should be used for Oracle 12c and above
GRANT UNLIMITED TABLESPACE TO SEMARCHY_STAGING;
In a production environment, if a schema already exists for storing temporary objects in a database, this schema can be specified and used as the staging area. The user attached to this schema should have read access to the tables used as sources of the data integration flows, and write access to the tables used as targets of the integration flows. The user should have full access to the staging area schema. |
The DBA privilege is required for the user connecting with Convergence for Data Integration for working with the SQLExplorer component in the Designer to list the objects of the database instance. In production, when only processes need to run, it is recommended to implement a reduced set of privileges for this user.
|
Introduction to the Convergence for DI Designer
The Convergence for Data Integration Designer appears as follows.
In the Designer, the following sections are available:
-
The Project Explorer view provides a hierarchical view of the resources. From here, you can open files for editing or select resources for operations such as exporting.
-
The Editors section contains the various objects being edited: mappings, metadata, processes, etc.
-
Various other Views are organized around the edition view and allow navigating, viewing and object properties.
Perspectives store the layout of the various views in the Designer. A default Semarchy perspective is provided and it is possible to customize your own perspectives.
Setting Up the Project
In Semarchy Convergence for Data Integration, design-time resources are organized into projects and folders. A project can be used in a team environment and can be version-controlled using source control systems such as Concurrent Versioning System (CVS) or Apache Subversion (SVN).
Importing the Templates
Semarchy Convergence for Data Integration uses process templates to generate the processes from the mappings. For example, a mapping loading data from an XML file to an Oracle database server uses a process template that contains optimized technical steps for this particular situation.
Processes for handling integration load transactions with the Semarchy Convergence for MDM platform are also provided.
After creating your project, you must import the templates into the workspace.
A project named Global is automatically created in the workspace for hosting the built-in process templates. This project is a placeholder containing the configuration elements and templates shared by all projects in this workspace. |
-
In the Project Explorer, right-click the global project and then select Import….
-
In the Import Wizard, select General > Archive File for the import source.
-
Click Next.
-
Use the Browse button to select the
<semarchy_di>/templates/templates.generic.yyyy-mm-dd.zip
archive file in the From Archive File: field. Note that the name of this file contains the release dates for the templates. -
Expand the tree view and make sure that all the templates and folders are selected.
-
Click Finish. The import process imports all the templates from the archive file into the global project.
Import these 3 zip files from <semarchy_di>/templates
into the Global project:
-
templates.generic.yyyy-mm-dd.zip
-
templates.specific.yyyy-mm-dd.zip
-
templates.semarchy.zip
The global project appears as shown below:
Importing the Pre-Built Project
A pre-built project is provided in the <semarchy_di>/samples/projects/GettingStarted.zip
file, and must be imported into the workspace.
-
Select File > Import in the menu.
-
In the Import Wizard, select General > Existing Projects into Workspace for the import source.
-
Click Next.
-
Check the Select Archive File: option and then then use the Browse button to select the
<semarchy_di>/samples/projects/GettingStarted.zip
file.
-
Click Finish. The import process imports the pre-built project content into the workspace.
-
In the Project Explorer, right-click the GettingStarted project and then select Refresh.
-
Expand the nodes of the Getting Started project. It appears with the following content.
The Getting Started project contains two folders:
-
The Metadata folder includes one metadata definition for the Oracle database containing the SEMARCHY_DEMO_MDM, SEMARCHY_DEMO_SOURCE and SEMARCHY_STAGING schemas.
-
The Mapping folder contains pre-built mappings using this database. These mappings load the Customer and Contact data from the Marketing (MKT) system into the Hub.
Creating a Folder
To keep the project organized, we will create an additional Process folder to host the processes designed in this tutorial.
-
Right-click the Getting Started project in the Project Explorer and then select New > Folder in the context menu. The New Folder wizard opens.
-
In Folder Name, enter Process.
-
Click Finish. The Process folder appears under the Getting Started project.
Reverse-Engineering the Metadata
Semarchy Convergence for Data Integration moves and transforms data between datastores. These datastores are database tables, flat files, XML structures, web services, etc. The connection information to these datastores must be declared in Semarchy Convergence for Data Integration, and their structure (the metadata) can be automatically reverse-engineered. The reverse-engineering process creates a definition of the datastores' structures in Convergence for Data Integration. Data flows are created at design-time using this metadata. Run-time execution will use the connection information to perform the extract-load and transform operations between the source and target systems.
Reverse-Engineering the Databases
The MDM Hub is stored in an Oracle schema named SEMARCHY_DEMO_MDM. In the same database, the Source HR and Marketing data is stored in a schema named SEMARCHY_DEMO_SOURCE.
Defining the Oracle Data Server and MDM Hub Schema
The Oracle Data Server is already configured in the GettingStarted > Metadata > OracleServer metadata file. The three schemas used in this tutorial are also already defined:
-
SEMARCHY_STAGING: The staging area
-
SEMARCHY_DEMO_MDM: The MDM Hub
-
SEMARCHY_DEMO_SOURCE: The source HR and Marketing data.
In this last schema definition, one of the tables required for this tutorial, the SRC_EMPLOYEE
table, which contains HR data, was not reverse-engineered. We will check the configuration of the Oracle Instance and reverse-engineer the missing table.
-
In the Project Explorer, expand the GettingStarted > Metadata node.
-
Double-click OracleServer. The Metadata editor opens.
-
In this editor, select the OracleServer node, right-click and select Actions > Edit Connection.
-
In the Duplicate Connection Profile dialog, check that the connection information to the Oracle instance is correct. If not modify this connection information:
-
Driver: Select the Oracle Thin Driver.
-
URL:
jdbc:oracle:thin:@localhost:1521:XE
. Change this URL if your Oracle instance runs on a different host/port or has a different instance name. -
User: SEMARCHY_STAGING.
-
Password: SEMARCHY_STAGING.
-
-
Click the Connect button to validate this connection.
-
Click OK to close this dialog.
-
In the editor, select the OracleServer > SEMARCHY_DEMO_SOURCE node.
-
Expand the node SEMARCHY_DEMO_SOURCE. SRC_CONTACT and SRC_CUSTOMER have been defined for you. You will define SRC_EMPLOYEE.
-
Right-click and select Actions > Launch DataSchema Wizard. The Schema Wizard opens and shows the properties of this schema.
-
Click Next. The Reverse Datastore page opens.
-
Click the Refresh button. A filtered list of datastore appears.
-
Select the SRC_EMPLOYEE table.
-
Click Finish. The reverse-engineering process starts and retrieves the structure of the SRC_EMPLOYEE table.
-
When the reverse-engineering process is finished, press CTRL+S to save the editor.
-
In the OracleServer.md editor, expand the OracleServer > SEMARCHY_DEMO_SOURCE node. The list of reverse-engineered tables appears as shown below.
-
Close this editor.
Reverse-Engineering the Cost Centers Files
The file containing the Cost Centers data provided by the Finance team is a delimited file stored in the <semarchy_di>/samples/files/
folder. We will declare and reverse-engineer this file.
-
Click the
Create New Metadata button in the Project Explorer toolbar. The New Metadata Model wizard opens.
-
In the Choose the type of Metadata tree, select file > File Server.
-
Click Next.
-
Make sure that the GettingStarted/Metadata folder is selected as the parent folder.
-
In MetaData Model Name enter DemoFile and then click Finish. The DemoFile.md editor opens and the File Wizard open automatically.
-
In the Directory page, enter Demo Files Folder in the Name field.
-
Click the Browse button and select the
<semarchy_di>/samples/files/
.
-
Click Next. The Set File Properties page opens.
-
In this page:
-
Use the Browse button to select the
cost_centers.txt
file. -
Click the Refresh button to see the content of the file.
-
Make sure that the Type is set to DELIMITED
-
Make sure that the Line Separator is set to
\r\n
-
Replace the Field Separator value with a semi-colon
;
-
In the String Delimiter, enter a double-quote character
"
. -
In the Header Line Position, enter
1
.
-
-
Click Next.
-
In the Reverse page:
-
In the Size Management section, un-select the Determine Datatype option, set the String Min Size to 1, and the Safety Margin to 0.
-
Click the Reverse button. The list of fields is reverse-engineered from the file.
-
-
Click Finish. The DemoFile.md editor is populated with the definition of the cost_centers file’s fields.
-
Press CTRL+S to save this editor.
From the Designer, you can view the content of the reverse-engineered datastores.
-
In the DemoFile.md editor, right-click the cost_centers node and select Actions > Consult Data.
-
A SQL editor named cost_centers.sql appears with a predefined SQL query.
-
Click the Execute Current SQL
button.
-
In the lower panel of the editor, the content of the file is displayed.
-
Close the cost_centers.sql editor.
-
Close the DemoFile.md editor.
Reverse-Engineering the XML Files
In this project, we will use two XML files. * One XML file contains the data exported from the CRM application that needs to be loaded into the MDM Hub. The XSD definition for this file is available. * One XML file will contain golden records that we will extract from the hub for consumption by the other applications of the information system. Note that this file does not exist yet. It will be created during the extraction based on a XSD definition that is provided.
Defining the CRM XML File
The file containing the CRM exported data is stored in the <semarchy_di>/samples/xml/
folder.
-
Click the
Create New Metadata button in the Project Explorer toolbar. The New Metadata Model wizard opens.
-
In the Choose the type of Metadata tree, select XML > XML Schema.
-
Click Next.
-
Make sure that the GettingStarted/Metadata folder is selected as the parent folder.
-
In MetaData Model Name, enter DemoXMLCRM and then click Finish. The DemoXMLCRM editor is created and the XML Wizard opens.
-
In the Default XML Path field, click the Browse button and browse for the
<semarchy_di>/samples/xml/crm_customer.xml
file. -
In the XSD Path field, click the Browse button and browse for the
<semarchy_di>/samples/xml/crm_customer.xsd
file. -
Click the Refresh button to refresh the Root Element selection box content.
-
In the Root Element field, select Customers.
-
Click Reverse. The reverse-engineering process retrieves the XML structure from the XSD.
-
Click Finish to close the wizard and return to the editor.
-
Press CTRL+S to save the editor.
-
Expand the nodes and review the XML structure that was reverse-engineered.
-
Close this editor.
Defining the XML Target
The file into which we will load golden records for consumption by third party applications does not exist yet, but its XSD is stored in the <semarchy_di>/samples/xml/
folder.
-
Click the
Create New Metadata button in the Project Explorer toolbar. The New Metadata Model wizard opens.
-
In the Choose the type of Metadata tree, select XML > XML Schema.
-
Click Next.
-
Make sure that the GettingStarted/Metadata folder is selected as the parent folder.
-
In MetaData Model Name, enter DemoXMLGolden and then click Finish. The DemoXMLGolden editor is created and the XML Wizard opens.
-
In the XSD Path field, click the Browse button and browse for the
<semarchy_di>/samples/xml/golden_customer.xsd
file. -
In the XML Path field, enter
<semarchy_di>/samples/xml/golden_customer.xml
. Note that the file does not exist and will be created by Convergence for Data Integration. It cannot be selected using the Browse button. -
Click the Refresh button.
-
In the Root Element field, select Customers root element for this schema.
-
Click Reverse. The reverse-engineering process retrieves the XML structure from the XSD.
-
Click Finish to close the wizard and return to the editor.
-
Press CTRL+S to save the editor.
-
Expand the nodes and review the XML structure that was reverse-engineered.
-
Close this editor.
Loading Data Into the Hub
Loading data using Semarchy Convergence for Data Integration uses Mappings. A mapping is a flow which extracts data from source datastores, transforms and loads this data into target datastores. A mapping uses declarative rules defining the relations (joins) between the source datastores, the filters to apply to the source data and the column mappings (transformations) between source and target columns, in the context of the data flow.
Loading CRM Data
The CRM customer and contact information is provided as an XML export. This information must be published in the MDM hub in two separate tables: SD_CUSTOMER
and SD_CONTACT
. To perform the publishing operation, we will create two mappings loading the CRM Customer and the CRM Contact data into these tables.
Loading CRM Customer Data
This mapping publishes data from the CRM XML file that you have reverse-engineered into the SD_CUSTOMER
table stored in the hub.
Creating the New Mapping
When creating a mapping, the first step is adding the source(s) and target(s).
-
Select the GettingStarted > Mapping folder in your project.
-
Click the
Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
-
Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
-
In File Name enter Publish CRM Customers to MDM and then click Finish. The Publish CRM Customers to MDM mapping editor opens.
-
In the Project Explorer, select the GettingStarted > Metadata >DemoXMLCRM node, right-click and then select Refresh.
-
In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node. Locate the SD_CUSTOMER table under this node.
-
Drag and drop the SD_CUSTOMER table from the project explorer to the mapping editor. This table will be the target of the mapping.
-
In the project explorer, expand the GettingStarted > Metadata >DemoXMLCRM > crm_customer.xml node.
-
Drag and drop the
Customers root entity from the project explorer into the mapping editor. This XML File will be the source of the mapping.
Defining the Column Mappings
After adding the source and target datastores, it is possible to define the column mappings. Column Mappings are expressions used to load the target datastore’s columns. We use three main types of column mappings:
-
Mappings using the source datastore’s columns,
-
Mappings using literals (constant values),
-
Mappings using variables that are set at execution time.
To define the columns mappings:
-
Expand the Customers source in the mapping editor in order to see all the elements and attributes of the XML structure.
-
In the mapping diagram, select the Customers > Customer > id attribute from the source XML structure (Customers). Drag id to B_SOURCEID in the SD_CUSTOMER target table.
-
A popup menu appears, asking whether you want to join the two tables or map data from the Customers XML file to the SD_CUSTOMER table. Select Map.
An orange link appears from Customers to SD_CUSTOMER. -
Select B_SOURCEID column of the SD_CUSTOMER target table, and then select the Expression Editor view in the views located below the editor panel. This view displays the code of the expression that will load the B_SOURCEID column. The code contains for the moment:
Customers.Customers/Customer/@id
, which is the path of the attribute in the source XML structure. -
Select the Properties view. You can see the properties of this target column mapping.The Execution Location indicates the location where this expression will be executed. In this case, the expression runs in the Source. An source icon
in the diagram in front of the target table’s column name provides the same indication.
-
Most elements in the XML structure were automatically mapped to target columns with the same name. You can review them:
-
CUSTOMER_NAME =
Customers.Customers/Customer/@customer_name
-
TOTAL_REVENUE =
Customers.Customers/Customer/@total_revenue
-
INPADDRESS =
Customers.Customers/Customer/address/@inpaddress
-
INPPOSTALCODE =
Customers.Customers/Customer/address/@inppostal_code
-
INPCITY =
Customers.Customers/Customer/address/@inp_city
-
INPCOUTRY =
Customers.Customers/Customer/address/@inpcountry
-
-
Drag the
account_manager_id
element from the XML file to theF_ACCOUNT_MANAGER
target column to create the following column mapping:-
F_ACCOUNT_MANAGER =
Customers.Customers/Customer/@account_manager_id
-
-
After this mapping phase, the diagram looks as shown below:
-
Select the B_BRANCHID column in the SD_CUSTOMER target table.
-
Go to the Expression Editor view and enter
0
(zero). This mapping is a constant numeric value and must be executed in the target execution location. -
In the Properties view, select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to
Target.
-
Repeat the three previous steps to map the following columns to constant values and set their execution location to Target. Make sure to use single quotes around these constant values.
-
B_CLASSNAME =
'Customer'
. -
B_PUBID =
'CRM'
.
-
-
In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
-
Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
-
Select the SEM_LOAD_ID element in the diagram, drag and drop it on the B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
. This expression returns the value of the SEM_LOAD_ID variable. This variable will be set in the process that will execute this mapping. The mapping appears shown below.
In this mapping:
-
F_ACCOUNT_MANAGER is a reference to another entity (The ID of the employee managing this customer account).
-
B_PUBID and B_SOURCEID identify the source publishing application (CRM) and the record in this source application (via its ID).
-
B_BRANCHID identifies the MDM data branch into which this data is published.
-
B_CLASSNAME identifies explicitly which entity type is loaded in the SD_CUSTOMER table.
-
B_LOADID identifies the data load transaction used for publishing data into the hub. Creating and submitting this transaction is explained later in this tutorial.
-
The columns prefixed with GEO are enriched by the Semarchy Convergence for MDM master data certification process and do not need to be loaded.
Configuring the Mapping
In Semarchy Convergence for Data Integration, the integration behavior is configured by parameterizing the process template that was automatically added for the technologies (XML and Oracle) involved in the mapping.
-
In the mapping editor, click the INT icon
in the upper-right corner SD_CUSTOMER box.
This graphical artifact represents the process template used for writing (integrating) data into the SD_CUSTOMER table. Similarly, the LOAD icon represents the load template used for extracting (loading) data from the XML file. -
In the Properties view, in the Standard tab, you can configure how data integration to the SD_CUSTOMER table takes place.
Select the Action Process INTEGRATION Semarchy Convergence for MDM template and leave the template options to their default values.
-
This templates mandates a specific update key for the mapping. This update key will allow comparing the incoming data flow with the hub content and skip unchanged records. To set the update key:
-
In the diagram, select the B_LOADID column from the target SD_CUSTOMER table.
-
Right-click and un-select the Use as Key option in the context menu.
-
Hold the CTRL key pressed and select the B_PUBID, B_SOURCEID, B_BRANCHID columns.
-
Right-click and select the Use as Key option in the context menu. The SD_CUSTOMER target table appears with only these three columns marked with a key indicator, as shown below:
-
-
Press CTRL+S to save this mapping.
-
Close this editor.
Loading CRM Contact Data
This mapping publishes data from the CRM XML file that you reverse-engineered into the SD_CONTACT
table stored in the hub.
-
Select the GettingStarted > Mapping folder in your project.
-
Click the
Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
-
Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
-
In File Name enter Publish CRM Contacts to MDM and then click Finish. The Publish CRM Contacts to MDM mapping editor opens.
-
In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
-
Drag and drop the SD_CONTACT table from the project explorer to the mapping editor.
-
In the project explorer, expand the GettingStarted > Metadata >DemoXMLCRM > crm_customer.xml node.
-
Drag and drop the
Customers root entity from the project explorer into the mapping editor.
-
Expand the Customers source in the mapping editor in order to see all the elements and attributes of the XML structure.
-
Select the Customers > Customer > contacts > id attribute from the source XML structure (Customers). Drag id to the B_SOURCEID column in the SD_CONTACT target table.
-
Select Map in the popup menu.
-
Review the mappings made automatically for the following target columns: FIRST_NAME, LAST_NAME, GENDER, JOB_TITLE, IS_INFLUENCER, EMAIL_ADDRESS, PHONE1, PHONE2.
-
Use drag and drop to make the following mapping:
-
FS_CUSTOMER =
Customers.Customers/Customer/@id
-
-
Select the B_BRANCHID column in the SD_CONTACT target table.
-
Go to the Expression Editor view, and enter
0
(zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location. -
In the Properties view, select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to
Target.
-
Repeat the three previous steps to map the following columns to the constant values provided below and set their execution location to Target:
-
B_CLASSNAME =
'Contact'
. -
B_PUBID =
'CRM'
. -
FP_CUSTOMER =
'CRM'
.
-
-
In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
-
Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
-
Select the SEM_LOAD_ID element in the diagram, drag and drop it in the B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
. -
In the mapping editor, click the INT icon
in the upper-right corner SD_CONTACT box.
-
In the Properties view, in the Standard tab, select the Action Process INTEGRATION Semarchy Convergence for MDM template.
-
In the diagram, select the B_LOADID column from the target SD_CONTACT table, right-click and un-select the Use as Key option in the context menu.
-
Select the B_BRANCHID column. Right-click and select the Use as Key option in the context menu. The SD_CONTACT target table appears with the B_PUBID, B_SOURCEID and B_BRANCHID columns marked with a key indicator.
-
The Customers XML file contains customers with no related contacts. To avoid generating empty contacts for these, add a filter to the Customers source:
-
In the mapping editor, select the Customers > Customer > contacts > id attribute in the Customers source table. Keep the mouse button pressed, and drop this column on the mapping editor background.
-
In the Expression Editor view, the code shows:
Customers.Customers/Customer/contacts/@id=1
. -
Replace this code with
Customers.Customers/Customer/contacts/@id IS NOT NULL
-
-
Review the mapping. It looks as shown below.
-
Press CTRL+S to save this mapping.
-
Close this editor.
FS_CUSTOMER and FP_CUSTOMER identify a reference to a customer record. This reference consists of the source publisher code (mapped to the Foreign Publisher FP_CUSTOMER column) and the ID of the record in this source system (mapped to the Foreign Source ID FS_CUSTOMER column). The referenced customer for each contact is a customer from the CRM application with its ID appearing in the source Customer > id node.
|
Loading HR Data
Employee is read directly from the HR application. This information is stored in the SRC_EMPLOYEE
table stored in the SEMARCHY_DEMO_SOURCE
schema, and must be published into the SD_EMPLOYEE
table stored in the hub.
-
Select the GettingStarted > Mapping folder in your project.
-
Click the
Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
-
Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
-
In File Name enter Publish HR Employees to MDM and then click Finish. The Publish HR Employees to MDM mapping editor opens.
-
In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
-
Drag and drop the SD_EMPLOYEE table from the project explorer to the mapping editor.
-
In the project explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_SOURCE node.
-
Drag and drop the SRC_EMPLOYEE table from the project explorer to the mapping editor.
-
Drag and drop the EMPLOYEE_NUMBER column from the SRC_EMPLOYEE table to the EMPLOYEE_NUMBER column in the SD_EMPLOYEE table and then select Map in the popup menu.
This operation creates a mapping between SRC_EMPLOYEE.EMPLOYEE_NUMBER and SD_EMPLOYEE.EMPLOYEE_NUMBER. Most fields are automatically mapped as their names match exactly. -
Select the B_BRANCHID column in the SD_EMPLOYEE target table.
-
Go to the Expression Editor view, and enter
0
(zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location. -
Select the B_BRANCHID column in the SD_EMPLOYEE target table, right click and select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to
Target.
-
Repeat the three previous to map the following columns to the constant values provided below and set their execution location to Target:
-
B_CLASSNAME =
'Employee'
. -
B_PUBID =
'HR'
.
-
-
In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
-
Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
-
Select the SEM_LOAD_ID element in the diagram, drag and drop it in the B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
. -
In the mapping editor, click the INT icon
in the upper-right corner SD_EMPLOYEE box.
-
In the Properties view, in the Standard tab, select the Action Process INTEGRATION Semarchy Convergence for MDM template.
-
In the diagram, select the B_LOADID column from the target SD_EMPLOYEE table, right-click and un-select the Use as Key option in the context menu.
-
Select the B_BRANCHID target column. Right-click and select the Use as Key option in the context menu. The SD_EMPLOYEE target table appears with the B_PUBID, B_BRANCHID and EMPLOYEE_NUMBER columns marked with a key indicator.
-
Select the PICTURE target column. In the Properties view, in the Tags field, enter
IGNORE
and then click the "+" button. TheIGNORE
tag is added to this column. Tagging columns is used to provide directives to the templates. In this case as this column is a picture (a binary object), we want to ignore it for certain aspects of the processing.
-
Review the mapping. It looks as shown below.
-
Press CTRL+S to save this mapping.
-
Close this editor.
Loading Finance Data
Cost Center information is provided from the finance application via the cost_centers.txt
file that we reverse-engineered.
This mapping publishes data from the cost_centers.txt
file into the SD_COST_CENTER
table stored in the hub.
-
Select the GettingStarted > Mapping folder in your project.
-
Click the
Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
-
Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
-
In File Name enter Publish Finance Cost Centers to MDM and then click Finish. The Publish Finance Cost Centers to MDM mapping editor opens.
-
In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
-
Drag and drop the SD_COST_CENTER table from the project explorer to the mapping editor.
-
In the project explorer, expand the GettingStarted > Metadata > DemoFile > Server > Demo Files Folder node. If this node does not appear in the view, press the F5 key to refresh the project explorer view.
-
Drag and drop the cost_centers file from the project explorer to the mapping editor.
-
In the mapping diagram, select the PARENT_COST_CENTER field from the source file (cost_centers).
-
Drag PARENT_COST_CENTER to the F_PCOST_CENTER column in the SD_COST_CENTER target table and then select Map in the popup dialog.
The following columns are automatically mapped: COST_CENTER_ID, COST_CENTER_NAME, DIVISION_CODE. -
Select the B_BRANCHID column in the SD_COST_CENTER target table.
-
Go to the Expression Editor view, and enter
0
(zero). This mapping is a constant numeric value. In our mapping, it must be executed in the target execution location. -
Select the B_BRANCHID column in the SD_COST_CENTER target table, right click and select Execution Location > Target. The execution location icon on the B_BRANCHID is changed to
Target.
-
Repeat the three previous steps to map the following columns to the constant values provided below and set their execution location to Target:
-
B_CLASSNAME =
'CostCenter'
. -
B_PUBID =
'FIN'
.
-
-
In the Project Explorer, expand the global > templates.semarchy > Semarchy Convergence for MDM Integration > Global Variables.variable > Semarchy Variables node.
-
Select the SEM_LOAD_ID variable from this node. Drag and drop this variable in the mapping diagram. The variable is added as a variable element in the diagram.
-
Select the SEM_LOAD_ID element in the diagram, drag and drop it in the B_LOADID column in the target table. This column’s mapping is set to:
%{SEM_LOAD_ID}%
. -
In the mapping editor, click the INT icon
in the upper-right corner SD_COST_CENTER box.
-
In the Properties view, in the Standard tab, select the Action Process INTEGRATION Semarchy Convergence for MDM template.
-
In the diagram, select the B_LOADID column from the target SD_COST_CENTER table, right-click and un-select the Use as Key option in the context menu.
-
Select the B_BRANCHID target column. Right-click and select the Use as Key option in the context menu. The SD_COST_CENTER target table appears with the B_PUBID, B_BRANCHID and COST_CENTER_ID columns marked with a key indicator.
-
Review the mapping. It looks as shown below.
-
Press CTRL+S to save this mapping.
-
Close this editor.
Assembling the Publishing Process
Data is published in the hub using a transaction named an external load. We will implement the external load in Semarchy Convergence for Data Integration as a Process. This process will create the external load, run the mappings designed in the previous chapter and then submit this external load to the MDM hub. The process will also manage issues and automatically cancel the external load in the event of an error.
Creating the Publishing Process
-
Select the GettingStarted > Process folder in your project.
-
Click the
Create New Process button in the Project Explorer toolbar. The New Process Diagram wizard opens.
-
Make sure that the GettingStarted/Process folder is selected as the parent folder.
-
In File Name enter Publish Data to MDM and then click Finish. The Publish Data to MDM process editor opens.
Initializing the External Load
The first step of the external load process is the Initialization. This step requests a Load ID from the Convergence for MDM Repository using a PL/SQL function call and stores it in the SEM_LOAD_ID
variable used in the mappings.
-
In the Project Explorer, expand the global > template.semarchy > Semarchy Convergence for MDM Integration node.
-
Drag and drop the PROCESS Get LoadID process from the project explorer to the process editor.
This process must be configured with the connection information to the Repository and with information about the data location. -
Expand the GettingStarted > Metadata > OracleServer node.
-
From the Project Explorer, drag and drop the OracleServer data server in the PROCESS Get LoadID process reference in the process editor. This metadata link to the data server appears in the process reference as shown below.
-
Select the metadata link named OracleServer, then click on it again to make it editable.
-
Rename it to
SOURCE
and then press ENTER.
The PROCESS Get LoadID process assumes that the repository data server is provided as a metadata link namedSOURCE
. By adding this metadata link to the data server containing the repository and renaming it accordingly, you allow this process to use the data server configuration previously defined. -
Select the PROCESS Get LoadID process reference in the process diagram.
-
In the Properties view, select the Standard tab.
-
In the Name, enter Initialize Load. The PROCESS Get LoadID step in the diagram is renamed to Initialize Load.
-
Click the Data Location Name link to unlock the field and then enter
CustomerAndFinancialMDM
. This parameter of the process is the name of the data location hosting the data edition into which the publishing process takes place. -
Click the User Name link to unlock the field and then enter
Convergence for DI Publisher
. This name is used to initialize the load and manage it afterwards. -
Click the Sem Repo Schema link to unlock the field and then enter
SEMARCHY_DEMO_REPOSITORY
. This value identifies the repository schema in the data server referred to as SOURCE.
-
-
Click the edit parameters icon to expand the content of the Initialize Load step and view all its parameters.
Loading Data
The second step of the external load process is the data load. For this step, the mappings created previously are directly re-used.
-
In the Publish Data to MDM process editor, in the Palette that appears on the right-hand side, select the Process tool in the Component accordion.
-
Click in the Publish Data to MDM process editor background. A new sub-process appears, and you can edit its name.
-
Enter Load All Data and then press ENTER.
-
Double-click the Load All Data sub-process. A new blank editor named Load All Data opens to edit this sub-process.
-
In the Project Explorer Expand the GettingStarted > Mapping node.
-
Press the SHIFT key and select all six mappings in this folder. Drag and drop the selection into the Load All Data sub-process editor.
The mappings are added to the process. -
Press CTRL+S to save the process.
-
Use the editor breadcrumb to return to the Publish Data to MDM parent process.
Submitting the Load
After loading the publisher data into the landing tables using the mappings, the external load must be submitted to Semarchy Convergence for MDM. This step submits the Load using a PL/SQL function call to the Repository.
-
From the Project Explorer, drag and drop the global > template.semarchy > Semarchy Convergence for MDM Integration > PROCESS Submit Load process into the process editor.
-
From the Project Explorer, drag and drop the OracleServer data server in the Semarchy Submit Load process reference in the process editor. This metadata link to the data server appears in the process reference.
-
Select the metadata link named OracleServer, then click on it again to make it editable.
-
Rename it to
SOURCE
and then press ENTER. -
Select the PROCESS Submit Load process reference in the process diagram.
-
In the Properties view, select the Standard tab.
-
In the Name, enter Submit Load. The step in the diagram is renamed to Submit Load.
-
Click the Integration Job Name link to unlock the field and then enter
INTEGRATE_DATA
. This parameter of the process is the name of the Semarchy Convergence for MDM integration job that must be started when the external load is submitted. -
Click the User Name link to unlock the field and then enter
Convergence for DI Publisher
. -
Click the Sem Repo Schema link to unlock the field and then enter
SEMARCHY_DEMO_REPOSITORY
.
-
-
Click the edit parameters icon to expand the content of the Submit Load step and view all its parameters.
-
Press CTRL+S to save the process.
Canceling the Load
In the event of an error while loading the data, the external load should be cancelled. Canceling the load is also performed using a PL/SQL function call to the Repository.
-
From the Project Explorer, drag and drop the global > template.semarchy > Semarchy Convergence for MDM Integration > PROCESS Cancel Load process into the process editor.
-
From the Project Explorer, drag and drop the OracleServer data server in the Semarchy Cancel Load process reference in the process editor. This metadata link to the data server appears in the process reference.
-
Select the metadata link named OracleServer, then click on it again to make it editable.
-
Rename it to
SOURCE
and then press ENTER. -
Select the PROCESS Cancel Load process reference in the process diagram.
-
In the Properties view, select the Standard tab.
-
In the Name, enter Cancel Load. The step in the diagram is renamed to Cancel Load.
-
Click the Cleanup Data link to unlock the field and then select this option. This parameter forces this process to cleanup the data loaded in the Load All Data step.
-
Click the User Name link to unlock the field and then enter
Convergence for DI Publisher
. -
Click the Sem Repo Schema link to unlock the field and then enter
SEMARCHY_DEMO_REPOSITORY
. -
Click the MDM Schema link to unlock the field and then enter
SEMARCHY_DEMO_MDM
. This value identifies the schema in the data server containing the MDM Hub.
-
-
Click the edit parameters icon to expand the content of the Cancel Load step and view all its parameters.
-
Press CTRL+S to save the process.
Designing the Publishing Process Flow
All the steps of the process are now created. It is time to link them together.
Creating the Publishing Process
-
In the Publish Data to MDM process editor, in the Palette that appears on the right hand, select the Successful Link tool in the Link accordion.
-
Click and drag from Initialize Load to Load All Data. A green arrow appears between these two steps.
-
Repeat the two previous steps to create a link from Load All Data to Submit Load.
-
In the Palette, select the Unsuccessful Link tool.
-
Create a link from Load All Data to Cancel Load. This link will be followed if the Load All Data step fails, and appears in red.
-
Press CTRL+S to save the Publish Data to MDM process.
Editing the Load All Data Sub-Process
-
Double-click the Load All Data sub-process to open it.
-
Use the Successful Link tool in the palette to link the steps of this sub-process. Note that these mappings target tables with no referential constraints implemented in the database. They can be executed in any order in the sub-process. The sub-process appears as shown below:
-
Press CTRL+S to save the Publish Data to MDM process.
Running the Publishing Process
Starting the Runtime Engine
Processes are executed by the Runtime Engine component. This component must be started before running the process.
-
Select the Runtime view.
-
In the Runtime view, click the Environment button. The Environment dialog opens.
-
Click the Start Local Runtime button. On Windows, a Runtime Engine is started on your local machine in a command line window. On Linux, it starts as a background process.
-
Click OK to close the Environment dialog.
-
Select the Connect option. This option opens a communication channel between the Convergence for DI Designer and the Runtime Engine.
-
Select the Refresh Diagram option. This option refreshes the process diagram automatically while the process runs in the Runtime Engine and reflects the execution in real time.
The runtime dialog indicates that the runtime is Connected.
The runtime engine starts on the port 42000. Make sure that no other process uses this same port. |
For this tutorial, a local runtime is used and processes are started manually. In production environments, the runtime engine is deployed on a server and is able to serve execution requests remotely or run processes on a schedule. |
Testing the Process
Now that the Runtime Engine is running, we can test the process.
-
Select the background of the Publish Data to MDM process diagram.
-
Right-click and select Execute.
-
The process is first built and then starts. You can see the steps running. When a steps runs, it appear in green. A completed step appears in blue.
When the execution completes, the diagram looks as shown below.
Drilling Down Into the Session
The session was executed successfully. It is possible to drill down into the details of this session.
-
In the Sessions view, select the latest session.
-
Select the Session Details view. This view displays the session’s information errors, warnings and the variables used in this session. The External Load ID (
SEM_LOAD_ID
variable) is displayed in this view.
-
In the Publish Data to MDM process diagram, click the Load All Data sub-process step.
-
Select the Statistics view. This view shows the statistics of the selected process step. It shows the number of rows inserted, the number of XML attributes and elements processed.
-
Double-click the Load All Data sub-process, the sub-process opens and shows the executed mappings.
-
Click the Publish Finance Cost Centers to MDM mapping. The statistics show the number of lines inserted by the mapping:
SUM(SQL_STAT_INSERT)=27
. -
Close the Load All Data sub-process editor.
Testing the Error Behavior
To test the error behavior, you can rename one of the source files used in the mappings.
-
Using the File Explorer, open the
<semarchy_di>/samples/files/
folder and rename thecost_centers.txt
file tocost_centers.txt.bad
file. -
Return to the Convergence for DI Designer.
-
Select the background of the Publish Data to MDM process diagram.
-
Right-click and select Execute.
-
The process is first built and then starts. You can see the steps running. As one of the file is missing, the Load All Data sub-process fails and the Cancel Load step is executed.
When the execution completes, the diagram looks as shown below.
-
Select the latest session in the Sessions view.
-
Select the Session Details view, and in this view select the Errors tab. Double-click on the first error in the list. The error opens in a dialog.
-
In the Windows Explorer, rename the
cost_centers.txt.bad
file back tocost_centers.txt
.
Reviewing the External Load
The publishing process has submitted to the Semarchy Convergence for MDM hub this external load. It is possible to review this external load in Semarchy Convergence for MDM.
-
Open your web browser and connect to the following URL:
http://localhost:8088/semarchy/
(update the port number if you changed it during the installation process) -
In the login prompt, enter the following:
-
User: semadmin
-
Password: semadmin
-
-
The Convergence for MDM Workbench opens.
-
In the Overview page, Select the Manage Data Locations link in the Administration section.
-
In the Data Editions view, expand the CustomerAndFinancialMDM > Data Branches > RootBranch > Latest Loads node. The external loads that you have submitted appear under this node. The first one appears with a
Finished status, the second one (the error test) in a
Cancelled status.
You can review the golden data consolidated from the data published by your process by Navigating the MDM Hub. See the "Navigating the MDM Hub" section in the "Semarchy Convergence for MDM Getting Started Guide" for more information about navigating the hub. |
Consuming Data from the Hub
Consuming data from the hub is performed through mappings using source tables from the hub. These tables contain the golden and master data. In this tutorial, the consumption process targets an XML file which has the following structure:
This structure contains information about Customers and Employee (The customer’s account manager), and must be loaded using the corresponding entities in the MDM Hub. The golden data for these entities is stored in the GD_CUSTOMER
and GD_EMPLOYEE
tables from the SEMARCHY_DEMO_MDM
schema that was previously reverse-engineered.
Creating the New Mapping
In this mapping, we are extracting data from two tables in the MDM Hub (GD_EMPLOYEE
and GD_CUSTOMER
) and target the DemoXMLGolden XML file.
-
Select the GettingStarted > Mapping folder in your project.
-
Click the
Create New Mapping button in the Project Explorer toolbar. The New Map Diagram wizard opens.
-
Make sure that the GettingStarted/Mapping folder is selected as the parent folder.
-
In File Name enter Consume Golden Customers and then click Finish. The Consume Golden Customers mapping editor opens.
-
In the project explorer, expand the GettingStarted > Metadata >DemoXMLGolden > golden_customer.xsd node.
-
From the project explorer, drag and drop the
Customers root entity into the mapping editor.
-
In the Project Explorer, expand the GettingStarted > Metadata > OracleServer > OracleServer > SEMARCHY_DEMO_MDM node.
-
From the Project Explorer, drag and drop the GD_CUSTOMER table into the mapping editor.
-
Repeat the previous step to add the GD_EMPLOYEE table to the mapping editor.
Filtering Golden Data
The GD tables store golden data for all data branches and editions. Therefore, a GD table needs to be filtered in order to retrieve data:
-
for given entity classes (this must be done if the MDM model uses entity inheritance),
-
for a given data branch,
-
for a given data edition.
To filter the golden data:
-
In the ConsumeGoldenCustomer editor, select the B_BRANCHID column in the GD_CUSTOMER source table.
-
Keep the mouse pressed and drag this column on the editor’s background.
-
Release the mouse button. A filter appears, attached to the GD_CUSTOMER table.
-
Select the Expression Editor view. The code shows:
GD_CUSTOMER.B_BRANCHID=1
. -
Replace this code with the following expression:
GD_CUSTOMER.B_BRANCHID = 0 and GD_CUSTOMER.B_CLASSNAME = 'Customer' and GD_CUSTOMER.B_FROMEDITION <= 0 and (GD_CUSTOMER.B_TOEDITION is null or GD_CUSTOMER.B_TOEDITION > 0)
This filter mandates that records retrieved:
-
exist in the branch with ID 0:
GD_CUSTOMER.B_BRANCHID = 0
-
represent instances of the Customer entity (or class): +
GD_CUSTOMER.B_CLASSNAME = 'Customer'
-
exist in data edition 0:
GD_CUSTOMER.B_FROMEDITION <= 0 and (GD_CUSTOMER.B_TOEDITION is null or GD_CUSTOMER.B_TOEDITION > 0)
A similar filter must be applied to the GD_EMPLOYEE table.
-
In the ConsumeGoldenEmployee editor, select the B_BRANCHID column in the GD_EMPLOYEE source table.
-
Keep the mouse pressed and drag this column on the editor’s background.
-
Release the mouse button. A filter appears, attached to the GD_EMPLOYEE table.
-
Select the Expression Editor view. The code shows:
GD_EMPLOYEE.B_BRANCHID=1
. -
Replace this code with the following expression:
GD_EMPLOYEE.B_BRANCHID = 0 and GD_EMPLOYEE.B_CLASSNAME = 'Employee' and GD_EMPLOYEE.B_FROMEDITION <= 0 and (GD_EMPLOYEE.B_TOEDITION is null or GD_EMPLOYEE.B_TOEDITION > 0)
Joining the Sources
It is now necessary to join the two sources to establish the relation between the GD_CUSTOMER and GD_EMPLOYEE tables.
-
In the ConsumeGoldenCustomer editor, select the F_ACCOUNT_MANAGER column in the GD_CUSTOMER source table.
-
Keep the mouse pressed and drag this column onto the EMPLOYEE_NUMBER column in the GD_EMPLOYEE table
-
Release the mouse button. A popup menu appears. Select Join in this menu.
A join appears between the two tables. -
Select the Expression Editor view. The code shows:
GD_CUSTOMER.F_ACCOUNT_MANAGER=GD_EMPLOYEE.EMPLOYEE_NUMBER
This hub mandates that a customer has at least one account manager. This join uses a default join type, that is an INNER JOIN. |
Defining the Column Mappings
-
Expand the Customers target in the mapping editor in order to see all the elements and attributes of the XML structure.
-
In the mapping diagram, select the CUSTOMER_ID column from the GD_CUSTOMER table. Drag CUSTOMER_ID to the Customer> Customer target element in the XML structure.
-
Select Map in the popup menu. Notice that the
GD_CUSTOMER.CUSTOMER_NAME
is automatically mapped. -
Repeat the same operation to perform the following mappings:
-
Customer > Customer > id =
GD_CUSTOMER.CUSTOMER_ID
-
Customer > Customer > account_manager > first_name =
GD_EMPLOYEE.FIRST_NAME
-
-
Notice that the
GD_EMPLOYEE.LAST_NAME
,GD_EMPLOYEE.PHONE_NUMBER
andGD_EMPLOYEE.EMAIL_ADDRESS
are automatically mapped. -
Select the Customer> Customer > address target element, then select the Expression Editor view. Enter the following code:
GD_CUSTOMER.INPADDRESS || ', ' || GD_CUSTOMER.INPPOSTAL_CODE || ' - ' || GD_CUSTOMER.INPCITY || ', ' || GD_CUSTOMER.INPCOUNTRY
-
Press CTRL+S to save this mapping.
The mapping appears as shown below.
To iterate and create multiple elements in the target XML file, it is necessary to provide a key indicating how to break the records into sub-elements. In this mapping, the key is provided in the Customer> Customer target element, which appears with a key indicator. One <Customer> sub-element is created for each CUSTOMER_ID value returned from the GD_CUSTOMER table, that is for each customer golden record.
|
Running the Mapping
-
Select the background of the Consume Golden Customers mapping editor, right-click and select Execute
-
The mapping execution starts.
-
Select the Sessions view and check that the session is executed successfully.
-
Open with a text editor the
<semarchy_di>/samples/xml/golden_customer.xml
file created by the mapping. It displays the golden customers records and their attached golden account manager records.
The file content is as shown below.
<?xml version="1.0" encoding="windows-1252"?>
<CORP:Customers xmlns:CORP="http://www.semarchy.com/samples"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Customer id="1">
<customer_name>GADGETROM</customer_name>
<address>711 KAPIOLANI BOULEVARD, - HONOLULU, USA</address>
<account_manager>
<first_name>Matthew</first_name>
<last_name>WEISS</last_name>
<phone_number>+1.650.123.1234</phone_number>
<email_address>mweiss@fakecompany.com</email_address>
</account_manager>
</Customer>
<Customer id="2">
<customer_name>BURLEIGH AND STRONGINTHEARM</customer_name>
Conclusion
Summary
Congratulation! You have now completed your first project with Semarchy Convergence for Data Integration!
In this tutorial, you learned how to:
-
Install Convergence for MDM and Convergence for Data Integration,
-
Set up a data integration project,
-
Reverse-engineer database, files and XML data models,
-
Publish data in an MDM hub using mappings and a publishing process,
-
Consume data from an MDM hub.
Going Further with Convergence for MDM
You have learned how to use Convergence for Data Integration in conjunction with Convergence for MDM for a typical MDM project. Convergence for Data Integration also allows you to leverage the master data certified with Convergence for MDM in a variety of data integration projects, including data warehousing, data migration, application integration and B2B data exchange.
Go further with Semarchy Convergence for Data Integration, you can read with the Semarchy Convergence for Data Integration Developer’s Guide. To have a better understanding of the integration (publishing and consuming) mechanisms involved in Convergence for MDM, read the Semarchy Convergence for MDM Integration Guide.
Learn More
The Semarchy Convergence for MDM Documentation Library, which includes development, administration and installation guides is available online at the following URL: http://www.semarchy.com/documentation-mdm |
In addition to the product manuals, Semarchy provides other resources
including whitepapers, datasheets, and a complete set of videos
demonstrating the product features.
The resources are available on the Semarchy
Website.
© Copyright Semarchy 2012-2015. All Rights Reserved.