Semarchy xDM platform overview

This document provides an overview of the Semarchy xDM platform for administrators.

Architecture overview

The Semarchy xDM architecture includes:

  • The Semarchy xDM Server, a Java EE application deployed and running in an application server. This application serves:

    • The Application Builder, Dashboard Builder, xDM Discovery, Setup, and Configuration user interfaces: These web applications are used by designers and administrators to create, manage, and administer the models and applications designed in Semarchy xDM.

    • Data Management Applications and Dashboard Applications: These web applications are used by business users to browse and manage data and visualize metrics dashboards.

    • A REST API to perform data integration, data management, and administrative operations programmatically.

  • The Repository, which stores all the metadata used by Semarchy xDM. This includes the data models, the definitions of the data management and dashboard applications, and the xDM Discovery datasources and profiles. The repository is hosted in a database schema. A given Semarchy xDM server is always attached to a single repository.

  • The Data Locations, which contain the data for the data models. This data includes the golden, master, and source data, with all the lineage and history. A data location is hosted in a database schema. Multiple data locations can be attached to a single repository.

The following sections detail the components of this architecture.

Semarchy xDM server

The Semarchy xDM server is a Java EE application deployed and running in a supported application server. It stores all its information in the repository. A given Semarchy xDM server is always attached to a single repository.

The Semarchy xDM application is used at design-time to:

  • design data models and applications, using the Application Builder,

  • design dashboard applications, using the Dashboard Builder,

  • define and profile datasources, using xDM Discovery.

At run-time, the application:

  • serves the Data Management Applications,

  • serves the Dashboard Applications,

  • serves the REST API to manage data and platform operations,

  • runs the processes that profile data,

  • runs the jobs that certify golden data.

Certification process

Semarchy xDM creates or modifies golden data from the following inputs:

  • Data feeds from source applications (the Publishers). Such data is published programmatically using the SQL or REST APIs.

  • Data creation or changes performed by users in the Data Management applications.

The certification process is triggered as explained below:

  1. The Publishers or the Data Management applications submit a batch of source data to the data hub. This data is submitted as a Load that contains multiple datasets to be processed together.

  2. The Integration Batch Poller polls integration batches at regular intervals. When a new batch is detected, the batch poller requests the Execution Engine to start the Integration Job associated with this batch.

  3. The Integration Job is created from a Job Definition and executed by the Execution Engine. Data is passed through the different steps of the certification process, and golden data is certified from the source data.

The various steps of the certification process are detailed in The data certification process.
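The triggering sequence described above (submit a load, poll for batches, run the integration job) can be pictured as a simple polling loop. The following Python sketch is a conceptual model only: the classes `Load`, `IntegrationBatchPoller`, and `ExecutionEngine`, the job and step names, and the in-memory queue are hypothetical and do not reflect how Semarchy xDM is implemented. In the product, publishers submit loads through the SQL or REST APIs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Load:
    """Hypothetical model of a load: datasets submitted together as one batch."""
    load_id: int
    job_name: str  # job definition to run for this batch (assumed lookup key)
    datasets: dict = field(default_factory=dict)


class ExecutionEngine:
    """Hypothetical engine running integration jobs built from job definitions."""

    def __init__(self, job_definitions):
        # Maps a job definition name to an ordered list of placeholder step names.
        self.job_definitions = job_definitions
        self.executed = []

    def start_integration_job(self, load):
        # Step 3: create the job from its definition and run its steps in order.
        # A real job certifies golden data; this sketch only records each step.
        for step in self.job_definitions[load.job_name]:
            self.executed.append((load.load_id, step))


class IntegrationBatchPoller:
    """Hypothetical poller; each poll() call stands for one polling interval."""

    def __init__(self, engine):
        self.engine = engine
        # Stands in for the batches submitted to the data hub.
        self.pending = deque()

    def submit(self, load):
        # Step 1: a publisher or data management application submits a batch.
        self.pending.append(load)

    def poll(self):
        # Step 2: detect new batches and ask the engine to start their jobs.
        started = 0
        while self.pending:
            self.engine.start_integration_job(self.pending.popleft())
            started += 1
        return started


# Usage with placeholder names (not real xDM job or step names).
engine = ExecutionEngine({"HYPOTHETICAL_JOB": ["step_a", "step_b"]})
poller = IntegrationBatchPoller(engine)
poller.submit(Load(1, "HYPOTHETICAL_JOB", {"CUSTOMER": [{"NAME": "Acme"}]}))
started = poller.poll()
print(started, engine.executed)  # 1 [(1, 'step_a'), (1, 'step_b')]
```

Separating the poller from the engine mirrors the division of roles in the text: the poller only detects batches, while the engine owns job execution.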

Repository

The repository contains the design-time and run-time information for a given Semarchy xDM Server instance.

Repository contents

The repository stores the following information:

  • For Data Management Models

    • The entities, attributes, etc.

    • The model versions: branches, editions, etc.

    • The platform configuration and security information: roles, privileges, notification servers, preferences, etc.

    • The data locations information: deployed model, job definitions, notifications, etc.

    • Run-time information: Engine queues, logs, etc.

  • For Dashboard Applications

    • The dashboard application metadata.

  • For xDM Discovery datasources

    • The datasource configuration and profiled statistics.

Although the repository is stored in a database schema, it should never be accessed directly via SQL queries. Access to the Semarchy xDM information must be performed through the Semarchy user interfaces.

Repository types

There are two types of repositories:

  • Design: All design-time and run-time operations are possible in this type of repository.

  • Deployment: With this repository type, you can only import closed model editions and cannot edit them.

The deployment repositories are suitable for production sites. Model transfer from design to deployment repositories is handled via incremental export/import of closed model editions. Refer to Installation patterns for examples of repository deployment patterns.

The repository type is selected at creation time and cannot be modified afterwards.
Both types of repositories can be used interchangeably for Semarchy xDM Dashboard applications.
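The repository-type rules above can be summarized as a small set of checks. This Python sketch is a hypothetical model, not a Semarchy xDM API: the `Repository` class, its methods, and the version labels are illustrative. It treats the type as fixed at creation, lets only design repositories edit models, and lets deployment repositories import closed model editions only. It places no import restriction on design repositories, which is an assumption of the sketch.

```python
from enum import Enum


class RepositoryType(Enum):
    DESIGN = "design"
    DEPLOYMENT = "deployment"


class Repository:
    """Hypothetical model of the repository-type rules described above."""

    def __init__(self, repo_type):
        self._repo_type = repo_type  # fixed at creation; no setter is provided
        self.editions = {}  # model edition version -> True if closed

    @property
    def repo_type(self):
        return self._repo_type

    def can_edit_models(self):
        # Only design repositories allow editing models.
        return self._repo_type is RepositoryType.DESIGN

    def import_edition(self, version, closed):
        # Deployment repositories accept closed model editions only.
        # This sketch places no import restriction on design repositories (assumption).
        if self._repo_type is RepositoryType.DEPLOYMENT and not closed:
            raise ValueError("deployment repositories only accept closed model editions")
        self.editions[version] = closed


prod = Repository(RepositoryType.DEPLOYMENT)
prod.import_edition("1.0", closed=True)
try:
    prod.import_edition("1.1", closed=False)
except ValueError as err:
    print(err)  # deployment repositories only accept closed model editions
```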

Data locations

When a data management hub must be available at run time (for testing or production purposes), it is generated from a data model defined in the repository and deployed in a data location. Data Locations contain the deployed data hubs. Each hub contains the golden data, the source data pushed by publishers into the hub, and the various stages of this data in the certification process.

The data location is hosted in a database schema and accessed via a datasource defined in Semarchy xDM.

Data locations are only used for Semarchy models designed in the Application Builder. Dashboard Applications created in Semarchy xDM Dashboard do not require a data location.

A data location is attached to a repository: You can declare as many data locations as you want in a repository, but a data location is always attached to a single repository. It is not possible to have a data location attached to two repositories at the same time.
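The attachment rule is a one-to-many relationship: one repository, many data locations, and never two repositories for the same data location. The following Python sketch expresses that constraint with a hypothetical `DataLocationRegistry`; the class and the repository and data location names are illustrative only and are not part of Semarchy xDM.

```python
class DataLocationRegistry:
    """Hypothetical registry enforcing the attachment rule described above."""

    def __init__(self):
        # Each data location maps to exactly one owning repository.
        self._owner = {}

    def attach(self, data_location, repository):
        owner = self._owner.get(data_location)
        if owner is not None and owner != repository:
            raise ValueError(f"{data_location} is already attached to {owner}")
        self._owner[data_location] = repository

    def locations_of(self, repository):
        # A repository may have any number of data locations.
        return sorted(dl for dl, repo in self._owner.items() if repo == repository)


registry = DataLocationRegistry()
registry.attach("CustomerHubDev", "DesignRepo")
registry.attach("CustomerHubQA", "DesignRepo")
print(registry.locations_of("DesignRepo"))  # ['CustomerHubDev', 'CustomerHubQA']
```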

Data location contents

A Data Location contains the hub data, stored in the schema accessed through a datasource. This schema contains database tables and other objects generated from the model edition.

The data location also refers to three types of jobs (stored in the repository):

  • Installation Jobs: The jobs that create the data structures in the schema or modify them in a non-destructive way.

  • Integration Jobs: The jobs for certifying data in these data structures, according to the model job definitions.

  • Purge Jobs: The jobs for purging the logs, data lineage and history according to the retention policies.

You may deploy several model editions successively in a data location, but only one model edition is deployed and active in the data location at any given time.

Data location types

There are two types of data locations:

  • Development Data Location: A data location of this type supports deploying open or closed model editions. This type of data location is suitable for testing models in development and quality assurance environments.

  • Production Data Location: A data location of this type supports deploying only closed model editions. This type of data location is suitable for deploying MDM hubs in production environments.

The type is selected when the data location is created and cannot be changed afterwards.

Refer to Installation patterns for examples of data location deployment patterns.
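The data location rules (a type fixed at creation, closed editions only in production, and a single active edition after successive deployments) can be modeled in a few lines. This Python sketch is a hypothetical illustration: `DataLocation`, `DataLocationType`, and the edition labels are not Semarchy xDM APIs, and real deployments are performed from the xDM user interfaces or tooling.

```python
from enum import Enum


class DataLocationType(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"


class DataLocation:
    """Hypothetical model of the deployment rules for one data location."""

    def __init__(self, name, dl_type):
        self.name = name
        self._dl_type = dl_type  # selected at creation; no setter is provided
        self.deployed = []  # editions deployed successively, oldest first
        self.active_edition = None  # only one edition is active at a time

    @property
    def dl_type(self):
        return self._dl_type

    def deploy(self, edition, closed):
        # Production data locations accept closed model editions only.
        if self._dl_type is DataLocationType.PRODUCTION and not closed:
            raise ValueError("production data locations only accept closed model editions")
        self.deployed.append(edition)
        self.active_edition = edition  # the new edition supersedes the previous one


dev = DataLocation("CustomerHubDev", DataLocationType.DEVELOPMENT)
dev.deploy("0.1", closed=False)  # open edition: allowed in development
dev.deploy("1.0", closed=True)
print(dev.active_edition, dev.deployed)  # 1.0 ['0.1', '1.0']
```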

Data structures and certification process

A deployed model edition is made of a Data Structure and a set of Integration Jobs.

  • The Data Structure of the MDM Hub is a set of tables stored in the database schema of the data location. This structure contains the landing tables for the loads pushed in the hub by the publishers, the golden records tables and the tables handled by the certification process to create golden records from the source records.

  • The Integration Jobs are sequences of tasks stored in the repository, which perform the certification for data entering the hub. They are specific to the deployed model edition. When a model edition is deployed in a data location, the integration job definitions for this model edition replace the previous job definitions.

Data Structure and Integration Jobs work together and are deployed simultaneously by the model edition deployment process.

Data structure details

The data structure is implemented to support all the successive model editions deployed in the data location. The data structure is created when the first model edition is deployed, and is altered for new model editions. Changes to the structure are only incremental and non-destructive.

For example, the GD_CUSTOMER table holds the data for all successive design states of the Customer entity. If a new FaxNumber attribute is added to the entity and deployed with model edition 1.1, the FAX_NUMBER column is created to contain the fax number data. If this attribute is removed in subsequent model editions (1.2 and so forth), the FAX_NUMBER column remains in the data structure when these model editions are deployed.
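The contrast between the two halves of a deployment (structure changes that only accumulate, and job definitions that are replaced outright) can be traced through the FaxNumber example. This Python sketch is a simplified, hypothetical model: `DeployedHub`, the `ID` and `NAME` columns, edition 1.0, and the job names are placeholders, and it models only column additions rather than every kind of non-destructive change.

```python
class DeployedHub:
    """Hypothetical model of one hub table plus the hub's job definitions."""

    def __init__(self):
        self.columns = []  # physical columns of a table such as GD_CUSTOMER
        self.job_definitions = {}

    def deploy(self, edition_columns, edition_jobs):
        # Structure and jobs are deployed together by the same call.
        # Structure changes are incremental and non-destructive: missing columns
        # are added, and columns the edition no longer uses are kept.
        for column in edition_columns:
            if column not in self.columns:
                self.columns.append(column)
        # Job definitions are not merged: the edition's definitions replace the old ones.
        self.job_definitions = dict(edition_jobs)


hub = DeployedHub()
hub.deploy(["ID", "NAME"], {"JOB_1_0": ["step_a"]})  # edition 1.0
hub.deploy(["ID", "NAME", "FAX_NUMBER"], {"JOB_1_1": ["step_a"]})  # 1.1 adds FaxNumber
hub.deploy(["ID", "NAME"], {"JOB_1_2": ["step_a"]})  # 1.2 removes FaxNumber
print(hub.columns)  # ['ID', 'NAME', 'FAX_NUMBER']
print(hub.job_definitions)  # {'JOB_1_2': ['step_a']}
```

Keeping unused columns means data written under an earlier edition is never dropped by a later deployment.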

Platform components

The Semarchy xDM platform contains several components described in the sections below.

Integration batch poller

The Integration Batch Poller polls the integration batches submitted to the platform on a defined schedule, and starts the Integration Jobs on the Execution Engine.

Execution engine

The Execution Engine processes the Integration Jobs submitted by the Integration Batch Poller. It orchestrates the certification process to generate the golden data. This engine sequences the jobs in Clusters and Queues.

The execution engine logs the activity of the platform and manages notification policies.

Security

The application uses role-based security for accessing Semarchy xDM features. The users and their assigned roles may be stored in Semarchy xDM or in an external identity provider system.

Role-based security is used in Semarchy xDM to define the privileges to access the features of the platform (Platform-Level Security), as well as the privileges to access and modify data in the data locations (Model-Level Security).
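The two security levels can be illustrated as two independent checks over the same set of roles. This Python sketch is a deliberately simplified, hypothetical model: the `Role` class, the `manage_data_locations` privilege, the `read`/`write` actions, and the role names are placeholders and do not match the actual Semarchy xDM privilege model, which is more granular.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical role carrying both levels of privileges."""
    name: str
    # Platform-level security: access to platform features.
    platform_privileges: set = field(default_factory=set)
    # Model-level security: entity name -> allowed actions on its data.
    model_privileges: dict = field(default_factory=dict)


def has_platform_privilege(roles, privilege):
    # A user holds a privilege if any of the user's roles grants it.
    return any(privilege in role.platform_privileges for role in roles)


def can_access_entity(roles, entity, action):
    return any(action in role.model_privileges.get(entity, set()) for role in roles)


# Role assignments may come from xDM or from an external identity provider;
# this sketch only receives the resulting list of roles.
steward = Role("DataSteward", model_privileges={"Customer": {"read", "write"}})
admin = Role("PlatformAdmin", platform_privileges={"manage_data_locations"})
print(can_access_entity([steward], "Customer", "write"))  # True
print(has_platform_privilege([steward], "manage_data_locations"))  # False
```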