Architecture

This section details the various components of the Semarchy Convergence for MDM architecture and their interactions.

Semarchy Convergence for MDM Application

The Semarchy Convergence for MDM application is a Java EE application deployed and running in a supported application server.

This application provides several access methods:

The Convergence for MDM application stores its information in a repository. An application is always attached to a single repository and connects to it using a JDBC datasource named SEMARCHY_REPOSITORY. This datasource is configured in the application server.

The Convergence for MDM application is used at design time to design data models and manage their versions. At run time, it is used to deploy the models as MDM hubs and to manage the integration process that certifies golden data from the source applications' data in these hubs. It also exposes the Web Services used to access the golden data in the hubs.

Integration Process

Semarchy Convergence for MDM certifies golden data, through an integration process, from the data pushed by source applications (the Publishers).

The integration process is triggered as explained below:

  1. Publishers submit source data to the MDM hub as an External Load. Publishing source data is a three-step operation:
    1. The publisher initializes an External Load (via a SQL command or a web service call) and receives a unique Load ID from the platform.
    2. The publisher inserts source data into the various landing tables of the MDM hub in the context of this External Load.
    3. The publisher submits the External Load (via a SQL command or a web service call). The submitted load is converted into an Integration Batch, identified by a unique Batch ID.
  2. The Integration Batch Poller polls for new integration batches at regular intervals. When it detects a new batch, it requests the Execution Engine to start the Integration Job associated with this batch. This Integration Job is created from a Job Definition. The data in the batch goes through the different steps of the integration process, and golden data is certified from the source data.
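The publishing and polling sequence above can be sketched as a small in-memory simulation. This is a hypothetical model for illustration only: the `Hub`, `ExternalLoad` and `poll_once` names are assumptions, not the Semarchy SQL or web service API, and the sketch assumes that Load IDs and Batch IDs are simple increasing integers.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ExternalLoad:
    """Hypothetical external load: source rows grouped by landing table."""
    load_id: int
    rows: Dict[str, List[dict]] = field(default_factory=dict)
    batch_id: Optional[int] = None  # set once the load is submitted


class Hub:
    """Hypothetical in-memory stand-in for an MDM hub (not the Semarchy API)."""

    def __init__(self) -> None:
        self._load_ids = itertools.count(1)
        self._batch_ids = itertools.count(1)
        self.loads: Dict[int, ExternalLoad] = {}
        self.pending_batches: List[int] = []  # submitted, not yet picked up

    def init_load(self) -> int:
        # Step 1: initialize an external load and hand back its unique Load ID.
        load = ExternalLoad(next(self._load_ids))
        self.loads[load.load_id] = load
        return load.load_id

    def _open_load(self, load_id: int) -> ExternalLoad:
        # A load accepts data only until it has been submitted.
        load = self.loads[load_id]
        if load.batch_id is not None:
            raise ValueError(f"load {load_id} was already submitted")
        return load

    def insert(self, load_id: int, table: str, row: dict) -> None:
        # Step 2: insert source data into a landing table within the load.
        self._open_load(load_id).rows.setdefault(table, []).append(row)

    def submit_load(self, load_id: int) -> int:
        # Step 3: submit the load, which becomes a batch with a unique Batch ID.
        load = self._open_load(load_id)
        load.batch_id = next(self._batch_ids)
        self.pending_batches.append(load.batch_id)
        return load.batch_id


def poll_once(hub: Hub, start_job: Callable[[int], None]) -> List[int]:
    """One polling cycle: ask the engine to start a job for each new batch."""
    started = []
    while hub.pending_batches:
        batch_id = hub.pending_batches.pop(0)
        start_job(batch_id)
        started.append(batch_id)
    return started
```

A real poller would repeat `poll_once` at a fixed interval (for example in a loop with `time.sleep`), and the `start_job` callback stands in for the request sent to the Execution Engine.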

Note: The various steps of the integration process are detailed in the "Integration Process Design" chapter of the "Semarchy Convergence for MDM Developer’s Guide".

The integration process involves the following components:

Repository

The repository contains the design-time and run-time information for a given Semarchy Convergence for MDM Application instance.

Repository Contents

The repository stores the following information:

A repository is stored in an Oracle database schema accessed from the application using a JDBC datasource named SEMARCHY_REPOSITORY.

Note: The repository should never be accessed directly via SQL queries. Access to the Semarchy Convergence for MDM information must be performed through the Semarchy Workbench user interface provided by the application.

Repository Types

There are two types of repositories:

The deployment repositories are suitable for production sites. Model transfer from design to deployment repositories is handled via incremental export/import of closed model editions. Refer to the "Planning the Installation" chapter in the "Semarchy Convergence for MDM Installation Guide" for examples of repository deployment patterns.

Note: The repository type is selected at creation time and cannot be modified afterwards.

Data Locations

When an MDM hub must be available at run time (for testing or production purposes), it is generated from a data model defined in the repository and deployed in a data location. Data Locations contain the deployed hubs. Each hub contains the golden data, the source data pushed to the hub by publishers, and the various stages of this data in the certification process.

The data location content is hosted in an Oracle database schema and accessed via a JDBC datasource defined in the application server. A data location refers to the datasource via its JNDI URL.

Data Locations, Repositories and Models

A data location is attached to a repository. You can declare as many data locations as needed in a repository, but a data location is always attached to a single repository: it cannot be attached to two repositories at the same time.

A data location can contain several editions of a single model from this repository. It is not possible to store two different models in the same data location.
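The attachment rules above can be expressed as a few invariants. The following is a minimal sketch under assumed names (`Repository`, `DataLocation`, and the model and edition labels used with them are hypothetical): a data location records exactly one repository and one model, accepts further editions of that model, and rejects any other model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataLocation:
    name: str
    repository: str  # exactly one repository, set when the location is declared
    model: str       # exactly one model from that repository
    deployed_editions: List[str] = field(default_factory=list)

    def deploy(self, model: str, edition: str) -> None:
        # Several editions of the same model may coexist; other models are refused.
        if model != self.model:
            raise ValueError(f"{self.name} already hosts model {self.model!r}")
        if edition not in self.deployed_editions:
            self.deployed_editions.append(edition)


class Repository:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data_locations: Dict[str, DataLocation] = {}

    def declare_data_location(self, name: str, model: str) -> DataLocation:
        # A repository may declare any number of data locations,
        # and each one is bound to this repository only.
        if name in self.data_locations:
            raise ValueError(f"data location {name!r} already exists")
        location = DataLocation(name, self.name, model)
        self.data_locations[name] = location
        return location
```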

Data Location Contents

A Data Location stores several Deployed Model Editions of the same model and several Data Editions:

Data Location Types

There are two types of data locations:

Note: The type is selected when the data location is created and cannot be changed afterwards.

Refer to the "Planning the Installation" chapter in the "Semarchy Convergence for MDM Installation Guide" for examples of data location deployment patterns.

Data Structures and Integration Processes

A deployed model edition is made of a Data Structure and an Integration Process.

Data Structure Details

The data structure is implemented to support all the editions of the model deployed in the hub. It is created when the first model edition is deployed, and it is modified when new model editions are deployed. Changes to the structure are incremental, and the same set of tables holds the data for all the data editions and deployed model editions.

For example, the GD_CUSTOMER table holds the data for all the data editions and all the editions of the Customer entity. If a new FaxNumber attribute is added to the entity and deployed in model edition 1.1, a new FAX_NUMBER column is created and taken into account in the data editions using model edition 1.1 and above. If the TelexNumber attribute is removed in model edition 1.2, the TELEX_NUMBER column for this attribute remains in the data structure. Data editions using model editions prior to 1.2 still use this column, but data editions using model edition 1.2 and above no longer use it.
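To illustrate how one shared table serves several model editions, the sketch below records, for each column of GD_CUSTOMER, the model edition that introduced it and, if any, the edition that removed the corresponding attribute. The NAME column, the 1.0 baseline edition, and the `columns_used` helper are hypothetical; the sketch only assumes that model editions compare numerically as major.minor pairs.

```python
from typing import Dict, List, Optional, Tuple


def parse(edition: str) -> Tuple[int, int]:
    # "1.10" must sort after "1.2", so compare numbers rather than strings.
    major, minor = edition.split(".")
    return int(major), int(minor)


# Hypothetical column lifecycle: (edition that added it, edition that removed it).
# Every column stays physically in the table; None means still in use.
COLUMNS: Dict[str, Tuple[str, Optional[str]]] = {
    "NAME": ("1.0", None),
    "TELEX_NUMBER": ("1.0", "1.2"),
    "FAX_NUMBER": ("1.1", None),
}


def columns_used(model_edition: str) -> List[str]:
    """Columns of the shared table used by data editions on this model edition."""
    current = parse(model_edition)
    used = []
    for column, (added, removed) in COLUMNS.items():
        if parse(added) <= current and (removed is None or current < parse(removed)):
            used.append(column)
    return used
```

With these assumptions, `columns_used("1.1")` returns `["NAME", "TELEX_NUMBER", "FAX_NUMBER"]`, while `columns_used("1.2")` returns `["NAME", "FAX_NUMBER"]` even though TELEX_NUMBER is still present in the table.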

The data structure is also designed to minimize the storage used by the various data editions: data duplication across data editions is avoided as much as possible.

For example, if a golden record remains unchanged across five successive data editions, it is stored only once in the data structure and flagged as existing in these five editions. If it changes in the next edition, new data is added to store this change, and the content of the previous data editions is preserved.
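One way to picture this deduplication is to give each stored record version a range of data editions instead of copying it into every edition. This is a minimal sketch of that range-based technique, not necessarily how the product flags records internally; `EditionStore`, `RecordVersion` and the integer edition numbers are assumptions, editions are assumed to be written in increasing order, and record deletion is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RecordVersion:
    record_id: str
    values: Dict[str, str]
    from_edition: int                 # first data edition containing this version
    to_edition: Optional[int] = None  # last data edition containing it; None = current


class EditionStore:
    """Hypothetical edition-aware storage: unchanged records are never copied."""

    def __init__(self) -> None:
        self.rows: List[RecordVersion] = []

    def _current(self, record_id: str) -> Optional[RecordVersion]:
        for row in self.rows:
            if row.record_id == record_id and row.to_edition is None:
                return row
        return None

    def write(self, edition: int, record_id: str, values: Dict[str, str]) -> None:
        current = self._current(record_id)
        if current is not None and current.values == values:
            return  # unchanged: the existing row already covers this edition
        if current is not None:
            current.to_edition = edition - 1  # close the previous version
        self.rows.append(RecordVersion(record_id, dict(values), edition))

    def read(self, edition: int, record_id: str) -> Optional[Dict[str, str]]:
        for row in self.rows:
            if (row.record_id == record_id and row.from_edition <= edition
                    and (row.to_edition is None or edition <= row.to_edition)):
                return row.values
        return None
```

Writing the same record in editions 1 to 5 leaves a single stored row; changing it in edition 6 adds a second row, while reads of editions 1 to 5 still return the original values.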

Platform Components

The Semarchy platform contains several components described in the sections below.

Integration Batch Poller

On a defined schedule, the Integration Batch Poller polls for integration batches submitted to the platform and starts the corresponding Integration Jobs on the Execution Engine.

Execution Engine

The Execution Engine processes the Integration Jobs submitted by the Integration Batch Poller. It orchestrates the certification process to generate the golden data. This engine sequences the jobs in Clusters and Queues.
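This section does not specify the exact scheduling rules for Clusters and Queues. As a hedged illustration only, the sketch below assumes that each queue runs its jobs one at a time in submission order while separate queues progress independently; the `ExecutionEngineSketch` class, its methods, and the cluster, queue, and job names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class ExecutionEngineSketch:
    """Hypothetical model: clusters group named queues; each queue is FIFO and
    runs one job at a time, while different queues are independent."""

    def __init__(self) -> None:
        self.queues: Dict[Tuple[str, str], Deque[str]] = {}

    def submit(self, cluster: str, queue: str, job: str) -> None:
        # Jobs land at the tail of their (cluster, queue) pair.
        self.queues.setdefault((cluster, queue), deque()).append(job)

    def run_step(self, execute: Callable[[str], None]) -> List[str]:
        """Start at most one job per queue, taking the oldest job in each queue."""
        started = []
        for pending in self.queues.values():
            if pending:
                job = pending.popleft()
                execute(job)
                started.append(job)
        return started
```

Under this assumption, two jobs submitted to the same queue never run in the same step, whereas jobs in different queues can start side by side.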

The engine can use user-created Plug-ins developed with the Semarchy Open Plug-in API. For more information about plug-in development, see the "Semarchy Convergence for MDM Plug-in Development Guide".

The execution engine logs the activity of the platform and manages notification policies.

Web Services

There are several types of web services available in the platform:

Security

The application uses role-based security for accessing Convergence for MDM features. The users and roles used to connect to the application must be defined in the security realm as part of the application server configuration and then declared in Convergence for MDM.

Role-based security is used in Convergence for MDM to define the access privileges to the features of the platform (Platform-Level Security), as well as the privileges to access and modify data in the data editions (Model-Level Security).
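The two security levels can be sketched as two lookup tables checked against the roles of the connected user. The role names, privilege names, and entity name below are hypothetical placeholders; in the product, users and roles come from the application server's security realm and privileges are declared in Convergence for MDM.

```python
from typing import Dict, Set

# Hypothetical platform-level privileges: role -> platform features it may use.
PLATFORM_PRIVILEGES: Dict[str, Set[str]] = {
    "admin": {"manage_repository", "deploy_model"},
    "designer": {"design_model"},
}

# Hypothetical model-level privileges: role -> entity -> allowed data operations.
MODEL_PRIVILEGES: Dict[str, Dict[str, Set[str]]] = {
    "data_steward": {"Customer": {"read", "write"}},
    "analyst": {"Customer": {"read"}},
}


def can_use_feature(roles: Set[str], feature: str) -> bool:
    # Platform-level check: any one of the user's roles is enough.
    return any(feature in PLATFORM_PRIVILEGES.get(role, set()) for role in roles)


def can_access_data(roles: Set[str], entity: str, operation: str) -> bool:
    # Model-level check on data in the data editions.
    return any(
        operation in MODEL_PRIVILEGES.get(role, {}).get(entity, set())
        for role in roles
    )
```

Keeping the two tables separate mirrors the split between Platform-Level and Model-Level Security: a user may deploy models without being allowed to modify data, and the reverse.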