Reverse-Engineering the Metadata |
Previous
|
|
Next
|
Setting Up the Project |
|
Loading Data Into the Hub |
Semarchy Convergence for Data Integration moves and transforms data between datastores. These datastores are database tables, flat files, XML structures, web services, etc. The connection information to these datastores must be declared in Semarchy Convergence for Data Integration, and their structure (the metadata) can be automatically reverse-engineered. The reverse-engineering process creates a definition of the datastores' structures in Convergence for DI.
Data flows are created at design-time using this metadata. Run-time execution will use the connection information to perform the extract-load and transform operations between the source and target systems.
Reverse-Engineering the Databases
The MDM Hub is stored in an Oracle schema named SEMARCHY_DEMO_MDM.
In the same database, the Source HR and Marketing data is stored in a schema named SEMARCHY_DEMO_SOURCE.
Defining the Oracle Data Server and MDM Hub Schema
The Oracle Data Server is already configured in the
GettingStarted > Metadata > OracleServer metadata file.
The three schemas used in this tutorial are also already defined:
- SEMARCHY_STAGING: The staging area
- SEMARCHY_DEMO_MDM: The MDM Hub
- SEMARCHY_DEMO_SOURCE: The source HR and Marketing data.
In this last schema definition, one of the tables required for this tutorial, the
SRC_EMPLOYEE
table, which contains HR data, was not reverse-engineered. We will check the configuration of the Oracle Instance and reverse-engineer the missing table.
- In the
Project Explorer, expand the
GettingStarted > Metadata node.
- Double-click
OracleServer. The Metadata editor opens.
- In this editor, select the
OracleServer node, right-click and select
Actions > Edit Connection.
- In the
Duplicate Connection Profile dialog, check that the connection information to the Oracle instance is correct. If not modify this connection information:
-
Driver: Select the
Oracle Thin Driver.
-
URL:
jdbc:oracle:thin:@localhost:1521:XE
. Change this URL if your Oracle instance runs on a different host/port or has a different instance name.
-
User:
SEMARCHY_STAGING.
-
Password:
SEMARCHY_STAGING.
- Click the
Connect button to validate this connection.
- Click
OK to close this dialog.
- In the editor, select the
OracleServer > SEMARCHY_DEMO_SOURCE node.
- Expand the node
SEMARCHY_DEMO_SOURCE. SRC_CONTACT and SRC_CUSTOMER have been defined for you. You will define SRC_EMPLOYEE.
- Right-click and select
Actions > Launch DataSchema Wizard. The
Schema Wizard opens and shows the properties of this schema.
- Click
Next. The
Reverse Datastore page opens.
- Click the
Refresh button. A filtered list of datastore appears.
- Select the SRC_EMPLOYEE table.
- Click
Finish. The reverse-engineering process starts and retrieves the structure of the SRC_EMPLOYEE table.
- When the reverse-engineering process is finished, press
CTRL+S to save the editor.
- In the
OracleServer.md editor, expand the
OracleServer > SEMARCHY_DEMO_SOURCE node. The list of reverse-engineered tables appears as shown below.
- Close this editor.
Reverse-Engineering the Cost Centers Files
The file containing the Cost Centers data provided by the Finance team is a delimited file stored in the
<semarchy_di>/samples/files/
folder. We will declare and reverse-engineer this file.
- Click the
Create New Metadata button in the
Project Explorer toolbar. The
New Metadata Model wizard opens.
- In the
Choose the type of Metadata tree, select
file > File Server.
- Click
Next.
- Make sure that the
GettingStarted/Metadata folder is selected as the parent folder.
- In
MetaData Model Name enter
DemoFile and then click
Finish. The
DemoFile.md editor opens and the
File Wizard open automatically.
- In the
Directory page, enter
Demo Files Folder in the
Name field.
- Click the
Browse button and select the
<semarchy_di>/samples/files/
.
- Click
Next. The
Set File Properties page opens.
- In this page:
- Use the
Browse button to select the
cost_centers.txt
file.
- Click the
Refresh button to see the content of the file.
- Make sure that the
Type is set to
DELIMITED
- Make sure that the
Line Separator is set to
\r\n
- Replace the
Field Separator value with a semi-colon
;
- In the
String Delimiter, enter a double-quote character
"
.
- In the
Header Line Position, enter
1
.
- Click
Next.
- In the
Reverse page:
- In the
Size Management section, un-select the
Determine Datatype option, set the
String Min Size to 1, and the
Safety Margin to 0.
- Click the
Reverse button. The list of fields is reverse-engineered from the file.
- Click
Finish. The
DemoFile.md editor is populated with the definition of the cost_centers file’s fields.
- Press
CTRL+S to save this editor.
From the DI Designer, you can view the content of the reverse-engineered datastores.
- In the
DemoFile.md editor, right-click the
cost_centers node and select
Actions > Consult Data.
- A SQL editor named
cost_centers.sql appears with a predefined SQL query.
- Click the
Execute Current SQL
button.
- In the lower panel of the editor, the content of the file is displayed.
- Close the
cost_centers.sql editor.
- Close the
DemoFile.md editor.
Reverse-Engineering the XML Files
In this project, we will use two XML files.
- One XML file contains the data exported from the CRM application that needs to be loaded into the MDM Hub. The XSD definition for this file is available.
- One XML file will contain golden records that we will extract from the hub for consumption by the other applications of the information system. Note that this file does not exist yet. It will be created during the extraction based on a XSD definition that is provided.
Defining the CRM XML File
The file containing the CRM exported data is stored in the
<semarchy_di>/samples/xml/
folder.
- Click the
Create New Metadata button in the
Project Explorer toolbar. The
New Metadata Model wizard opens.
- In the
Choose the type of Metadata tree, select
XML > XML Schema.
- Click
Next.
- Make sure that the
GettingStarted/Metadata folder is selected as the parent folder.
- In
MetaData Model Name, enter
DemoXMLCRM and then click
Finish. The
DemoXMLCRM editor is created and the
XML Wizard opens.
- In the
Default XML Path field, click the
Browse button and browse for the
<semarchy_di>/samples/xml/crm_customer.xml
file.
- In the
XSD Path field, click the
Browse button and browse for the
<semarchy_di>/samples/xml/crm_customer.xsd
file.
- Click the
Refresh button to refresh the
Root Element selection box content.
- In the
Root Element field, select
Customers.
- Click
Reverse. The reverse-engineering process retrieves the XML structure from the XSD.
- Click
Finish to close the wizard and return to the editor.
- Press
CTRL+S to save the editor.
- Expand the nodes and review the XML structure that was reverse-engineered.
- Close this editor.
Defining the XML Target
The file into which we will load golden records for consumption by third party applications does not exist yet, but its XSD is stored in the
<semarchy_di>/samples/xml/
folder.
- Click the
Create New Metadata button in the
Project Explorer toolbar. The
New Metadata Model wizard opens.
- In the
Choose the type of Metadata tree, select
XML > XML Schema.
- Click
Next.
- Make sure that the
GettingStarted/Metadata folder is selected as the parent folder.
- In
MetaData Model Name, enter
DemoXMLGolden and then click
Finish. The
DemoXMLGolden editor is created and the
XML Wizard opens.
- In the
XSD Path field, click the
Browse button and browse for the
<semarchy_di>/samples/xml/golden_customer.xsd
file.
- In the
XML Path field, enter
<semarchy_di>/samples/xml/golden_customer.xml
. Note that the file does not exist and will be created by Semarchy Convergence for DI. It cannot be selected using the
Browse button.
- Click the
Refresh button.
- In the
Root Element field, select
Customers root element for this schema.
- Click
Reverse. The reverse-engineering process retrieves the XML structure from the XSD.
- Click
Finish to close the wizard and return to the editor.
- Press
CTRL+S to save the editor.
- Expand the nodes and review the XML structure that was reverse-engineered.
- Close this editor.