Planning the Installation

Architecture Details

This section details the various components of the Semarchy Convergence for MDM architecture and their interactions.

Application

The Semarchy Convergence for MDM JEE application is deployed and runs in a supported application server.

This application provides several access methods:

The Convergence for MDM application stores its information in a repository. One application is always attached to a single repository, and connects to this repository using a JDBC datasource named SEMARCHY_REPOSITORY configured in the application server.

The Convergence for MDM application is used at design-time to design data models and deploy them. At run-time, it manages the scheduling and execution of the convergence processes in the hub.

The processes are managed by:

The application uses role-based security for accessing Convergence for MDM features. The users and roles used to connect to the application must be defined in the security realm of the application server. Configuring the roles and users is part of the application configuration.

Repository

The repository contains the information for a given Convergence for MDM application. It stores:

A repository is stored in an Oracle database schema accessed from the application using a JDBC datasource named SEMARCHY_REPOSITORY.

Note: The repository should never be accessed directly via SQL queries. Access to the Semarchy Convergence for MDM information must be performed through the Semarchy Workbench user interface that is served by the application.

Repository Types

There are two types of repositories. The repository type is selected at creation time and cannot be modified afterwards.

The repository types are:

The deployment repositories are suitable for production sites. Model transfer from design to deployment repositories is handled via incremental export/import of closed model editions.

Data Locations

When an MDM hub must be available at run-time (for testing or production purposes), it is generated from a data model defined in the repository, and deployed in a data location.
Data Locations contain the deployed data convergence hubs. Each hub contains the golden data, the source data pushed by publishers in the hub, and the various stages of this data in the certification process.

The data location content is hosted in an Oracle database schema and accessed via a JDBC datasource defined in the application server. A data location refers to the datasource via its JNDI URL.

Data Location Content

A Data Location stores several Model Editions of the same model and several Data Editions.

A Deployed Model Edition is a model version deployed at a given time in a data location. As an MDM Hub model evolves over time, for example to include new entities or functional areas, new model editions are created then deployed. Deployed Model Editions reflect this evolution in the structure of the MDM Hub. Similarly, a Data Edition reflects the evolution of the data stored in the hub over time. You can perform snapshots (editions) of the master data at given points in time. Data Editions reflect the evolution in the content of the MDM Hub.

Data Locations, Repositories and Models

A data location is attached to a repository: You can declare as many data locations as you want in a repository, but a data location is always attached to a single repository. It is not possible to have a data location attached to two repositories at the same time.
A data location can contain several editions of a single model from this repository. It is not possible to store two different models in the same data location.
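
The two constraints above can be illustrated with a minimal Python sketch. It is not part of the Semarchy product or its API: the `Topology` class, its method names, and the sample repository, model and edition names are all hypothetical, chosen only to show how the rules interact.

```python
class Topology:
    """Hypothetical in-memory model of repositories and data locations."""

    def __init__(self):
        self._owner = {}     # data location name -> repository name
        self._model = {}     # data location name -> name of its single model
        self._editions = {}  # data location name -> deployed model editions

    def declare(self, repository, data_location):
        # A data location is always attached to a single repository.
        owner = self._owner.get(data_location)
        if owner is not None and owner != repository:
            raise ValueError(
                f"{data_location} is already attached to repository {owner}"
            )
        self._owner[data_location] = repository
        self._editions.setdefault(data_location, [])

    def deploy(self, data_location, model, edition):
        if data_location not in self._owner:
            raise KeyError(f"unknown data location {data_location}")
        # A data location holds editions of one model only.
        current = self._model.setdefault(data_location, model)
        if current != model:
            raise ValueError(
                f"{data_location} already contains model {current}"
            )
        self._editions[data_location].append(edition)

    def editions(self, data_location):
        return list(self._editions[data_location])


topology = Topology()
topology.declare("REPO", "DEV")          # assumed names, for illustration
topology.deploy("DEV", "CustomerModel", "0.0")
topology.deploy("DEV", "CustomerModel", "0.1")
```

In this sketch, deploying a second model to DEV or declaring DEV in another repository raises a ValueError, which mirrors the two restrictions stated above.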

Data Location Types

There are two types of data locations. The type is selected when the data location is created and cannot be changed afterwards.

The data location types are:

Convergence Pulse Metrics

Convergence Pulse is an optional component that captures metrics from the Data Locations and the Repository and stores them in a Pulse Metrics Warehouse.
The metrics from this warehouse can be displayed in dashboards available from Convergence for MDM.

The Pulse Metrics Warehouse is stored in an Oracle database schema accessed from the Convergence for MDM application using a JDBC datasource named SEMARCHY_PULSE_METRICS.

If you plan to use Semarchy Convergence Pulse Metrics dashboards in Convergence for MDM, you must plan and configure Convergence Pulse. Review the Semarchy Convergence Pulse Installation and Configuration Guide for more information on the architecture and configuration of Convergence Pulse. Make sure to gather the connection information to your Pulse Metrics Warehouse database schema before proceeding.

If you do not plan to use Semarchy Convergence Pulse, skip the creation of the Pulse Metrics Warehouse datasource in the installation process.

Installation Patterns

This section provides patterns for deploying Semarchy Convergence for MDM in real-life environments.

Pattern #1: Single Repository and Project

This pattern assumes that a single project is designed through a development/QA/Production lifecycle.
For this pattern:

In this pattern, a single repository contains the development, QA and production editions of the models. Model versioning allows freezing and delivering to the next stage (and next deployment location) a model as it moves along its lifecycle.

Pattern #2: Single Repository, Multiple Projects

This pattern is similar to the previous one, but assumes that several projects/models are managed in the same repository.
For this pattern:

The organization is the same as in Pattern #1, but a set of data locations exists for each project managed in the single repository.

Pattern #3: Development and Production in Different Sites

This pattern is similar to Pattern #1, but assumes that the development/QA and production sites are located on different networks or sites.

For this pattern, two repositories are created instead of one:

With this configuration, when the QA phase is finished, the closed model editions are exported to files from the REPO_DEV Design repository and imported into the REPO_PROD Deployment repository. From this repository, these closed model editions are deployed to the PROD Production data location.
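
The transfer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ModelEdition`, `Repo`, `export_closed` and `import_editions` are hypothetical names, not Semarchy functions, and the "files" are represented by a plain list. The sketch shows the two rules mentioned in this guide: only closed model editions are transferred, and the transfer is incremental.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelEdition:
    model: str
    version: str
    closed: bool


@dataclass
class Repo:
    name: str
    editions: list = field(default_factory=list)


def export_closed(source, already_exported):
    """Return the closed editions of `source` that were not exported before.

    `already_exported` is a set of (model, version) pairs kept by the caller.
    """
    return [
        e for e in source.editions
        if e.closed and (e.model, e.version) not in already_exported
    ]


def import_editions(target, editions):
    """Add exported editions to `target`, refusing open editions."""
    for e in editions:
        if not e.closed:
            raise ValueError(f"{e.model} {e.version} is not closed")
        if e not in target.editions:
            target.editions.append(e)


# Assumed sample content for REPO_DEV: one closed and one open edition.
repo_dev = Repo("REPO_DEV", [
    ModelEdition("CustomerModel", "0.0", closed=True),
    ModelEdition("CustomerModel", "0.1", closed=False),
])
repo_prod = Repo("REPO_PROD")

batch = export_closed(repo_dev, already_exported=set())
import_editions(repo_prod, batch)
```

After this run, REPO_PROD contains only the closed edition. A later export that is given the set of already exported editions returns only the editions closed since then.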

Note: With this configuration, you need to deploy two instances of the Semarchy Convergence for MDM Application, one per repository. These two instances are located on two different networks with possibly different security, scalability and high availability requirements.

Although Patterns #1 and #2 work in most cases, Pattern #3 may be preferable in environments where production is clearly separated from development.

High-Availability Configuration

Semarchy Convergence for MDM can be configured to support enterprise-scale deployment and high-availability.

Convergence for MDM supports the clustered deployment of the Convergence for MDM web application for high-availability and failover. A clustered deployment can be set up, for example, to support a large number of concurrent users performing data access, data entry or duplicate management operations.

Reference Architecture for High-Availability

In a clustered deployment, only one instance of the Semarchy application manages and runs the certification processes. This instance is the Active Instance. A set of clustered Semarchy applications serves users accessing the Workbench (for modeling, administration or data stewardship) as well as applications accessing data locations via web services. These are Passive Instances. The back-end databases hosting the repository and data locations are deployed in a database cluster, and an HTTP load balancer is used as a front-end for users and applications.

The reference architecture for such a configuration is described in the following figure:

This architecture is composed of the following components:

In this architecture:

Important: In this architecture, only one Active Instance must be configured. Multiple active instances are not supported.
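
As a planning aid, the single Active Instance rule can be checked against a list of intended deployments before installing anything. The following Python sketch is purely illustrative: the `validate_cluster` function, the role labels and the instance names are assumptions made for this example and do not correspond to any Semarchy configuration file or API.

```python
def validate_cluster(instances):
    """Check a planned topology and return the name of the Active Instance.

    `instances` maps an instance name to its role, "active" or "passive".
    """
    unknown = set(instances.values()) - {"active", "passive"}
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    active = sorted(name for name, role in instances.items() if role == "active")
    if len(active) != 1:
        # Zero or several active instances is not a supported configuration.
        raise ValueError(f"expected exactly one active instance, found {len(active)}")
    return active[0]


# Hypothetical plan: one active node and two clustered passive nodes.
plan = {"node-jobs": "active", "node-web-1": "passive", "node-web-2": "passive"}
active_name = validate_cluster(plan)
```

A plan that declares two active nodes, or none, is rejected with a ValueError.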

Load Balancing

In this architecture, load balancing ensures optimal usage of resources when a large number of users and applications access Semarchy Convergence for MDM simultaneously.

Load balancing is performed at two levels:

Failure and Recovery

In the reference architecture, failover is managed for both user sessions (connections through the Convergence Workbench) and application sessions (via the web services). This section describes the behavior in case of a failure at the various points of the architecture.

Database Failure

In the event of a RAC node failure, other nodes are able to recover and process the incoming database requests.

Passive Instance Failure

If one of the nodes of the JEE application server cluster fails:

Active Instance Failure

The single point of failure in this architecture is the Active Instance, whose sole purpose is to process batches and jobs.

If this server fails:

The active instance must be restarted automatically or manually to recover from a failure.
When it is restarted, the platform resumes its normal course of activity with no user action required.

Tip: A failure of the Active Instance does not impact the overall activity of users or applications, as these rely on the (clustered) Passive Instances. The only impact of such a failure may be a delay in the processing of data changes.
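
The behavior described in this tip can be pictured with a small simulation. This is a simplified sketch, not a description of how the product is implemented: the `Cluster` class and its methods are hypothetical, and the queue stands in for whatever mechanism holds the submitted data changes until the Active Instance processes them.

```python
from collections import deque


class Cluster:
    """Toy model: passive instances accept changes, the active one processes them."""

    def __init__(self, passive_count):
        self.passive_up = [True] * passive_count
        self.active_up = True
        self.pending = deque()   # changes accepted but not yet processed
        self.processed = []      # changes processed by the active instance

    def submit_change(self, change):
        # Users and applications only need one running passive instance.
        if not any(self.passive_up):
            raise RuntimeError("no passive instance available")
        self.pending.append(change)
        self.run_jobs()

    def run_jobs(self):
        # Processing only happens while the active instance is running.
        while self.active_up and self.pending:
            self.processed.append(self.pending.popleft())

    def restart_active(self):
        self.active_up = True
        self.run_jobs()


cluster = Cluster(passive_count=2)
cluster.submit_change("change-1")   # processed immediately
cluster.active_up = False           # simulated Active Instance failure
cluster.submit_change("change-2")   # accepted, but waits in the queue
cluster.restart_active()            # processing resumes without user action
```

In this model, the change submitted during the outage is not lost. It is simply processed later, once the Active Instance is running again.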

Configuring Convergence for MDM for High-Availability

The Semarchy Convergence for MDM application comes in two flavors corresponding to two WAR (Web Application Archive) files. Both files are included in the semarchy-mdm-platform-war-<version tag>.zip installation file:

The overall installation process for a high-availability configuration is similar to the general installation process:

  1. Create the repository and data location schemas in the RAC cluster.
  2. Configure the application server security for both the cluster and the active node.
  3. Configure the Oracle RAC JDBC Datasources for the nodes/cluster. Refer to your Oracle Database and Application Server documentation for more information about configuring RAC JDBC Datasources.
  4. Deploy the applications: