Welcome to Semarchy xDM.
This guide contains information about administering and monitoring Semarchy xDM.

Preface

Audience

This document is intended for administrators managing and configuring Semarchy xDM in an Enterprise Master Data Management Initiative.

If you want to learn about MDM or discover Semarchy xDM, you can watch our tutorials.
The Semarchy xDM Documentation Library, including the development, administration and installation guides, is available online.

Document Conventions

This document uses the following formatting conventions:

Convention Meaning

boldface

Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept.

italic

Italic type indicates special emphasis or a placeholder variable that you need to provide.

monospace

Monospace type indicates code examples, text, or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.

Obtaining Help

There are several ways to access Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Overview

Using this guide, you will:

  • Understand the Semarchy xDM architecture and components.

  • Learn how to manage the various components of the architecture.

  • Learn how to manage run-time and troubleshoot errors.

  • Learn how to enable and manage a secure environment for Semarchy xDM.

Introduction to Semarchy xDM

What is Semarchy xDM?

Semarchy xDM is designed to support any kind of Enterprise Master Data Management initiative. It brings extreme flexibility for defining and implementing master data models and releasing them to production. The platform can be used as the target deployment point for all master data of your enterprise, or in conjunction with existing data hubs to contribute to data transparency and quality with federated governance processes. Its powerful and intuitive environment covers all use cases for setting up a successful master data governance strategy.

Semarchy xDM is based on a coherent set of features for all Master Data Management projects.

Architecture Overview

The Semarchy xDM architecture includes:

  • The Semarchy xDM Application: This JEE application is deployed and runs in an application server, and stores its information in a repository.

  • The Repository that stores all the MDM project metadata and execution logs. This repository is hosted in a schema within a database instance. A given Semarchy xDM Application is attached to a single repository, and users connecting to this application access only the content of this repository.

  • The Data Locations. Each data location contains a Semarchy hub. This hub contains the golden data and master data, the source data published to or authored in the hub, and the various stages of this data in the certification process. A data location is hosted in a schema of a database instance. Several data locations can be attached to a single repository.

  • The Semarchy xDM Workbench and Applications: These web applications are served from the Semarchy xDM JEE Application and run in a web browser.

Architecture

This section details the various components of the Semarchy xDM architecture and their interactions.

Semarchy xDM Application

The Semarchy xDM application is a Java EE application deployed and running in a supported application server.

This application provides several access methods:

  • Users access it via their web browser using the Semarchy Workbench user interface or generated Applications.

  • Applications access the platform services and MDM hub data via integration points.

The Semarchy xDM application stores its information in a repository. One application is always attached to a single repository, and connects to this repository using a JDBC datasource named SEMARCHY_REPOSITORY. This datasource is configured in the application server.

The Semarchy xDM application is used at design-time to design and version control data models. At run-time, it is used to deploy the models in the form of MDM hubs and to manage the certification process that certifies golden data from source applications’ data in these hubs. It also exposes the integration points used to access the golden data from the hubs.

Certification Process

Semarchy xDM certifies golden data from data pushed by source applications (the Publishers), through an integration process.

The certification process is triggered as explained below:

  1. Publishers submit source data to the MDM hub as an External Load. Publishing the source data is a three-step operation:

    1. The publisher initializes an External Load, and receives a unique Load ID from the platform.

    2. The publisher inserts source data in various landing tables of the MDM hub in the context of this external load.

    3. The publisher submits the external load, which is converted to an Integration Batch, identified by a unique Batch ID.

  2. The Integration Batch Poller polls integration batches at regular intervals. When a new batch is detected, the integration batch poller requests the Execution Engine to start the Integration Job associated with this batch. This Integration Job is created from a Job Definition. Data in the batch is passed through the different steps of the certification process, and golden data is certified from the source data.
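The sequence above can be illustrated with a small in-memory simulation. This is only a sketch of the flow (initialize, insert, submit, poll): the HubSimulator class, its method names and the LANDING_CUSTOMER table name are hypothetical and are not part of any Semarchy xDM API.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ExternalLoad:
    load_id: int
    rows: dict = field(default_factory=dict)  # landing table name -> list of rows
    submitted: bool = False


class HubSimulator:
    """Toy in-memory stand-in for an MDM hub (hypothetical, not a Semarchy API)."""

    def __init__(self):
        self._load_ids = itertools.count(1)
        self._batch_ids = itertools.count(1)
        self.loads = {}
        self.pending_batches = []  # batch IDs waiting for the batch poller

    def initialize_load(self):
        # Step 1: the platform hands out a unique Load ID.
        load = ExternalLoad(load_id=next(self._load_ids))
        self.loads[load.load_id] = load
        return load.load_id

    def insert(self, load_id, table, row):
        # Step 2: source rows go to landing tables in the context of the load.
        load = self.loads[load_id]
        if load.submitted:
            raise ValueError("load already submitted")
        load.rows.setdefault(table, []).append(row)

    def submit_load(self, load_id):
        # Step 3: the load is converted to an integration batch with a Batch ID.
        self.loads[load_id].submitted = True
        batch_id = next(self._batch_ids)
        self.pending_batches.append(batch_id)
        return batch_id

    def poll(self):
        # The batch poller detects new batches; one job would start per batch.
        detected, self.pending_batches = self.pending_batches, []
        return detected


hub = HubSimulator()
load_id = hub.initialize_load()
hub.insert(load_id, "LANDING_CUSTOMER", {"name": "Acme"})
batch_id = hub.submit_load(load_id)
```

In this sketch, a later call to hub.poll() returns the submitted batch ID once, which mirrors the poller picking up a new batch at its next interval.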

The various steps of the certification process are detailed in the Certification Process Design chapter of the Semarchy xDM Developer’s Guide.

The certification process involves the following components:

  • The Integration Batch Poller that polls new data batches submitted in the hub by the publishing applications.

  • The Execution Engine that orchestrates the certification process to generate the golden data.

Repository

The repository contains the design-time and run-time information for a given Semarchy xDM Application instance.

Repository Contents

The repository stores the following information:

  • The models: entities, attributes, etc.

  • The models version control information: branches, editions, etc.

  • The configuration & security information: roles, privileges, notification servers, notification policies, preferences, etc.

  • Data locations information: deployed model, job definitions, etc.

  • Run-time information: queues, logs

A repository is stored in a database schema accessed from the application using a JDBC datasource named SEMARCHY_REPOSITORY.

The repository should never be accessed directly via SQL queries. Access to the Semarchy xDM information must be performed through the Semarchy Workbench user interface provided by the application.

Repository Types

There are two types of repositories:

  • Design: All design-time and run-time operations are possible in this type of repository.

  • Deployment: With this repository type, you can only import closed model editions and cannot edit them.

The deployment repositories are suitable for production sites. Model transfer from design to deployment repositories is handled via incremental export/import of closed model editions. Refer to the Planning the Installation chapter in the Semarchy xDM Installation Guide for examples of repository deployment patterns.

The repository type is selected at creation time and cannot be modified afterwards.

Data Locations

When an MDM hub must be available for run-time (for testing or production purposes), it is generated from a data model defined in the repository, and deployed in a data location. Data Locations contain the deployed data hubs. Each hub contains the golden data, the source data pushed by publishers in the hub, and the various stages of this data in the certification process.

The data location is hosted in a database schema and accessed via a JDBC datasource defined in the application server. A data location refers to the datasource via its JNDI URL.

Data Locations, Repositories and Models

A data location is attached to a repository: You can declare as many data locations as you want in a repository, but a data location is always attached to a single repository. It is not possible to have a data location attached to two repositories at the same time.

You may deploy several model editions successively in a data location, but only one model edition is deployed and is active in the data location at a certain point in time.

Data Location Contents

A Data Location contains the hub data, stored in the schema accessed using the data location’s datasource. This schema contains database tables and other objects generated from the model edition.

The data location also refers to three types of jobs (stored in the repository):

  • Installation Jobs: The jobs for creating or modifying in a non-destructive way the data structures in the schema.

  • Integration Jobs: The jobs for certifying data in these data structures, according to the model job definitions.

  • Purge Jobs: The jobs for purging the logs, data lineage and history according to the retention policies.

Data Location Types

There are two types of data locations:

  • Development Data Locations: A data location of this type supports deploying open or closed model editions. This type of data location is suitable for testing models in development and quality assurance environments.

  • Production Data Location: A data location of this type supports deploying only closed model editions. This type of data location is suitable for deploying MDM hubs in production environments.

The type is selected when the data location is created and cannot be changed afterwards.
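The deployment rule for the two data location types can be summarized as a simple check. The following is a minimal sketch; the enum and function names are hypothetical and only restate the rule described above.

```python
from enum import Enum


class DataLocationType(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"


def can_deploy(location_type, edition_is_closed):
    """Return True if a model edition may be deployed in the data location."""
    if location_type is DataLocationType.PRODUCTION:
        # Production data locations accept closed model editions only.
        return edition_is_closed
    # Development data locations accept open or closed model editions.
    return True
```

For example, can_deploy(DataLocationType.PRODUCTION, edition_is_closed=False) returns False, while the same open edition is accepted by a development data location.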

Refer to the Planning the Installation chapter in the Semarchy xDM Installation Guide for examples of data location deployment patterns.

Data Structures and Certification Process

A deployed model edition is made of a Data Structure and a set of Integration Jobs.

  • The Data Structure of the MDM Hub is a set of tables stored in the database schema of the data location. This structure contains the landing tables for the loads pushed in the hub by the publishers, the golden records tables and the tables handled by the certification process to create golden records from the source records.

  • The Integration Jobs are sequences of tasks stored in the repository, which perform the certification for data entering the hub. They are specific to the deployed model edition. When a model edition is deployed in a data location, the integration job definitions for this model edition replace the previous job definitions.

Data Structure and Integration Jobs work together and are deployed simultaneously by the model edition deployment process.

Data Structure Details

The data structure is implemented to support all the successive model editions deployed in the data location. The data structure is created when the first model edition is deployed, and is altered for new model editions. Changes to the structure are only incremental and non-destructive.

For example, the GD_CUSTOMER table holds the data for all successive design states of the Customer entity. If a new FaxNumber attribute is added to the entity and deployed with model edition 1.1, the FAX_NUMBER column is created to contain the fax number data. If this attribute is removed in subsequent model editions (1.2 and so forth), the column for this attribute remains in the data structure when these model editions are deployed.
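The incremental, non-destructive behavior can be pictured as a union of the columns required by every edition deployed so far. The sketch below only illustrates that idea; the function name and the ID and NAME columns are hypothetical, and the actual DDL generated by the platform is not shown here.

```python
def physical_columns(deployed_editions):
    """Return the table columns after deploying the editions in order.

    Each edition is the list of columns its model requires. Columns are
    added when first needed and never dropped afterwards.
    """
    columns = []
    for edition_columns in deployed_editions:
        for column in edition_columns:
            if column not in columns:
                columns.append(column)
    return columns


# Editions 1.0, 1.1 (adds FAX_NUMBER) and 1.2 (removes the attribute).
gd_customer = physical_columns([
    ["ID", "NAME"],
    ["ID", "NAME", "FAX_NUMBER"],
    ["ID", "NAME"],
])
# gd_customer == ["ID", "NAME", "FAX_NUMBER"]
```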

Platform Components

The Semarchy xDM platform contains several components described in the sections below.

Integration Batch Poller

The Integration Batch Poller polls the integration batches submitted to the platform on a defined schedule, and starts the Integration Jobs on the Execution Engine.

Execution Engine

The Execution Engine processes the Integration Jobs submitted by the Integration Batch Poller. It orchestrates the certification process to generate the golden data. This engine sequences the jobs in Clusters and Queues.

The engine can use user-created Plug-ins developed using the Semarchy xDM Open Plug-in API. For more information about plug-ins development, see the Semarchy xDM Plug-in Development Guide.

The execution engine logs the activity of the platform and manages notification policies.

  • Job Logs are stored in the repository and trace the execution of the jobs submitted to the engine. These logs include full job description and statistics.

  • Job Notification Policies are configured per data location. These policies define the conditions upon which job notifications are issued, as well as the content of these notifications. These notifications use Notification Servers declared in the platform.

  • The Execution Console displays the detailed execution activity, and can be used for troubleshooting purposes, for example when restarting a job.

  • Logging (trace) can be configured for debugging the platform behavior.

Security

The application uses role-based security for accessing Semarchy xDM features. The users and roles used to connect to the application must be defined in the security realm as part of the application server configuration and then declared in Semarchy xDM.

Role-based security is used in Semarchy xDM to define the access privileges to the features of the platform (Platform-Level Security), as well as the privileges to access and modify data in the data locations (Model-Level Security).

Introduction to the Administration Perspectives

For an introduction to the Semarchy Workbench user interface, see the Introduction to the Semarchy Workbench chapter in the Semarchy xDM Developer’s Guide.

The Semarchy Workbench provides three perspectives for administrators:

  • Administration Console: this perspective is used to administer the platform components and monitor run-time activity.

  • Model Administration: this perspective is used to manage model editions and branches.

  • Data Locations: this perspective is used to create data locations and deploy model editions in these locations.

Administration Console

In the Administration Console perspective, you can view and administer the following components:

  • Applications Configuration: Global parameters for all applications.

  • Execution Engine: Start and stop the engine and manage the jobs, queues and clusters.

  • Executions: View the job log as well as the job definitions.

  • Image Libraries: Manage image libraries and images in these libraries.

  • Integration Batch Poller: Start, stop and configure the behavior of this component.

  • Logging Configuration: Configure the platform logging (trace) for debugging purposes.

  • Notification Servers: Add, remove and configure servers used to send job notifications and application emails.

  • Plug-ins: View, add or update user-created plug-ins.

  • Roles: Declare in Semarchy xDM the application server roles, and grant them with platform-level privileges.

  • Variable Value Providers: Configure the system queried by Semarchy xDM to retrieve values for model variables.

Model Administration

In the Model Administration perspective, you can manage the versions (editions) of the models at design-time as well as the model branches. Using this perspective, you can create and maintain several simultaneous branches of a model, and let developers work on these branches.

Data Locations

In the Data Location perspective, you can manage the data locations, including:

  • Data Locations creation and deletion.

  • Deployed Model Editions: deploy model editions in the data location, and view the jobs related to the deployed model editions.

  • Job Notifications Policies: Configure the job notification issued on job success or failure.

  • Continuous Loads: Configure the loads into which middleware systems can push data that is processed in a continuous way.

  • Purge Schedule: Configure the schedule and activation of the data location’s purge job.

Managing Repositories

The repository contains the design-time and run-time information for the Semarchy xDM application.

Understanding Repositories

The repository is created when Semarchy xDM is installed. The type of the repository (Design Repository or Deployment Repository) is also set at creation time, and the application always connects to a single repository.

The repository creation process is detailed in the Semarchy xDM Installation Guide.

The type of a repository defines the capabilities of this repository:

  • A Design Repository allows you to perform all design-time and run-time operations.

  • A Deployment Repository only allows run-time operations. You can import closed model editions in such a repository but cannot edit them.

Typical Patterns for repository deployment are detailed in the Planning the Installation chapter of the Semarchy xDM Installation Guide. Simple and advanced deployment tasks are explained in the Deployment chapter of the Semarchy xDM Developer’s Guide.

Repository Administration Tasks

Purging Logs

Both the design and deployment repositories contain the execution logs of the integration jobs. These logs should be deleted regularly to reduce the repository space in the database.

See the Purging the Logs section in this guide for a description of this task.

Viewing the Repository and System Information

Semarchy xDM exposes the repository and system details in the About dialog.

To view the repository and system details:

  1. In the Semarchy Workbench menu, select Help > About.

  2. In the About dialog:

    • The License Information link displays the current license information and allows for Updating the License Key.

    • The Repository Information link displays the repository details (including name and version).

    • The System Information link displays the platform system details and may be used for support purposes.

Updating the License Key

Semarchy xDM stores in the repository the license information and the license key provided to you for evaluation or when you purchased the product. You can update an expired license key with a new one using the following procedure.

You must be logged in as a user having the semarchyAdmin role to perform license key update tasks. Without this privilege, you are only able to view the license key.

To update the license key:

  1. In the Semarchy Workbench menu, select Help > About.

  2. In the About dialog select the License Information link.

  3. In the License Key Information dialog, click the Upload License Key File… button.

  4. Use the Browse button to select the license key file.

  5. If the selected license key is recognized as valid, click the OK button to register the license key in the repository.

A temporary license key must be updated when it expires. When such a license key expires, the repository content is preserved as is, but the application is no longer accessible and a popup window prompts you for a new license key when you log in.

Managing Model Editions

This chapter discusses administration considerations related to Model Editions management.

Understanding Model Editions

Model Changes are handled using Model Editions. This version control mechanism allows you to freeze versions of a model (called Model Editions) then deploy them for data loading and certification processing in a Data Location.

A data location always uses a given model edition. This means that this data location contains data organized according to the model structure in the given model edition, and that golden data in this data location is processed and certified according to the rules of the given model edition.

Version Numbers

Model editions are identified by a version number. The format of this version number is <branch>.<edition>. The branch and edition numbers start at zero and are automatically incremented as you create new branches or editions.
For example, the first model edition in the first branch has the version [0.0]. The fourth edition of the CustomerAndFinancialMDM model in the second branch is named CustomerAndFinancialMDM [1.3].
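As an illustration of this numbering, the following sketch parses a label such as CustomerAndFinancialMDM [1.3] into its zero-based branch and edition numbers. The ModelVersion type, the parse_version function and the accepted label layout are assumptions made for this example, not a Semarchy xDM API.

```python
import re
from typing import NamedTuple, Optional


class ModelVersion(NamedTuple):
    model: Optional[str]  # model name, if the label contains one
    branch: int           # zero-based branch number
    edition: int          # zero-based edition number


# Assumed label layout: an optional model name followed by "[branch.edition]".
_LABEL = re.compile(r"^(?:(?P<model>\w+)\s+)?\[(?P<branch>\d+)\.(?P<edition>\d+)\]$")


def parse_version(label):
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a model edition label: {label!r}")
    return ModelVersion(match["model"], int(match["branch"]), int(match["edition"]))


version = parse_version("CustomerAndFinancialMDM [1.3]")
# Numbers are zero-based: branch 1 is the second branch,
# edition 3 is the fourth edition.
```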

Actions on Model Editions

Model Editions support the following actions:

  • Creating a New Model creates the first edition of the model.

  • Closing and Creating a New Edition of the model freezes the model edition in its current state, and opens a new edition of the model for modification.

  • Branching, to maintain several parallel branches of the model. You create a branch based on an existing closed model edition when you want to fork the project from this edition, or create a maintenance branch.

  • Deployment, to install or update a model edition in a data location.

  • Export and Import model editions, to transfer them between repositories.

Refer to the following chapters for more information about model editions management tasks:

  • Models Management chapter in the Semarchy xDM Developer’s Guide.

  • Deployment chapter in the Semarchy xDM Developer’s Guide.

Model Editions Lifecycle

The model edition lifecycle is described below.

  1. The project manager creates a new model and the first model edition.

  2. Developers edit the model metadata. They perform their logical modeling and certification process design activities.

  3. When the developers reach a level of completion in the project, they deploy the model edition for testing, and afterwards deploy it again while pursuing their developments and tests. Such actions are typically performed in a development data location. Sample data can be submitted to the data location for integration in the hub.

  4. When the first project milestone is reached, the project manager:

    1. Closes the current model edition and creates a new one.

    2. Deploys the closed model edition or exports the model edition for deployment on a remote repository.

  5. The project can proceed to the next iteration (go to step 2).

  6. When needed, the project manager creates a new branch starting from a closed edition. This may be needed, for example, when a feature or fix must be backported to a closed edition without taking all the changes made in later editions.

Considerations for Model Editions Management

The following points should be taken into account when managing the model editions lifecycle:

  • No Model Edition Deletion: It is not possible to delete old model editions. The entire history of the project is always preserved.

  • Use Production Data Locations: Although deploying open model editions is a useful development feature for quickly updating a model edition, it is not recommended to perform such updates on data locations that host production data, or to use development data locations for production. The best practice is to have Production Data Locations, which only allow deploying closed model editions, for production data.

  • Import/Export for Remote Deployment: It is possible to export models from, and import models into, both design and deployment repositories. Importing a model edition into a Deployment Repository is possible only if this edition is closed.

  • Avoid Skipping Editions: When importing successive model editions, it is not recommended to skip intermediate editions, as it is not possible to import them at a later time. For example, if you import edition 0.1 of a model and then import edition 0.4, the intermediate editions - 0.2 and 0.3 - can no longer be imported in this repository.
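The last consideration can be expressed as a pre-import check. The sketch below assumes that the restriction applies per branch and that any edition at or below the highest edition already imported for that branch is rejected; the platform's exact rule may differ, and the function name is hypothetical.

```python
def can_import(imported, candidate):
    """Check whether a model edition can still be imported.

    imported:  set of (branch, edition) pairs already in the target repository
    candidate: (branch, edition) pair to import
    Assumption: only editions newer than the latest imported edition of the
    same branch are accepted.
    """
    branch, edition = candidate
    same_branch = [e for b, e in imported if b == branch]
    return not same_branch or edition > max(same_branch)


imported = {(0, 1), (0, 4)}
skipped_ok = can_import(imported, (0, 2))   # False: 0.2 was skipped
next_ok = can_import(imported, (0, 5))      # True: newer than 0.4
```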

Managing the Platform

The platform consists of several components that can be managed from the Administration Console perspective.
These components include the Engine, the Integration Batch Poller, the Notification Servers and Notification Policies, the Plug-ins, etc.

Managing the Execution Engine

Accessing the Execution Engine

To access the execution engine:

  1. In the Administration view, double-click the Execution Engine node.
    The Execution Engine editor opens.

The Execution Engine Editor

This editor displays the list of queues grouped by clusters. Queues currently pending on suspended jobs appear in red.

The list of queues and clusters displays the following information:

  • Cluster/Queue Name: the name of the cluster or queue.

  • Status: Status of the queue or cluster. A queue can be either READY, SUSPENDED or BLOCKED. A cluster may be in a BLOCKED or READY status.

  • Queued Jobs: For a queue, the number of jobs queued in this queue. For a cluster, the number of jobs queued in all the queues of this cluster.

  • Running Jobs: For a queue, the number of jobs running in this queue (1 or 0). For a cluster, the number of jobs running in all the queues of this cluster.

  • Suspend on Error: Defines the behavior of the queue on job error. See the Troubleshooting Errors section for more information.
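The cluster-level figures in this list are aggregates of the queue-level figures. The following sketch shows that relationship with hypothetical Queue and cluster_counts names; it does not query the execution engine.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    queued_jobs: int
    running: bool  # Running Jobs is 1 or 0 for a queue


def cluster_counts(queues):
    """Return (queued jobs, running jobs) for a cluster from its queues."""
    queued = sum(q.queued_jobs for q in queues)
    running = sum(1 for q in queues if q.running)
    return queued, running


# Hypothetical cluster with three queues.
counts = cluster_counts([
    Queue("Default", queued_jobs=3, running=True),
    Queue("Customers", queued_jobs=0, running=True),
    Queue("Products", queued_jobs=2, running=False),
])
# counts == (5, 2)
```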

From the Execution Engine editor, you can perform the following operations:

Stopping and Starting the Execution Engine

To stop and start the execution engine:

  1. In the Administration view, double-click the Execution Engine node. The Execution Engine editor opens.

  2. Use the Stop this component and Start this component buttons in the editor’s toolbar to stop and start the execution engine.

Stopping the execution engine does not kill running jobs. The engine stops after all running jobs are completed. In addition, the content of the queues is persisted. When the execution engine is restarted, the execution of queued jobs proceeds normally.

Managing the Integration Batch Poller

The Integration Batch Poller polls the integration batches submitted to the platform, and starts the integration jobs on the execution engine. The polling action is performed on a schedule configured in the batch poller.

Stopping and Starting the Integration Batch Poller

To stop and start the integration batch poller:

  1. In the Administration view, double-click the Integration Batch Poller node. The Integration Batch Poller editor opens.

  2. Use the Stop this component and Start this component buttons in the editor’s toolbar to stop and start the integration batch poller.

Stopping the batch poller does not kill running jobs, and does not prevent new batches from being submitted. When this component is stopped, the submitted batches are simply not taken into account and no job is queued on the execution engine until the batch poller is restarted.

Configuring the Integration Batch Poller

The integration batch poller configuration determines the frequency at which submitted batches are picked up for processing.

To configure the integration batch poller:

  1. In the Administration view, double-click the Integration Batch Poller node.

  2. In the Integration Batch Poller editor, choose in the Configuration section the polling frequency:

    • Weekly at a given day and time.

    • Daily at a given time.

    • Hourly at a given time.

    • Every n seconds.

    • Using UNIX Cron syntax.

  3. Press CTRL+S to save the configuration.

It is not necessary to restart the integration batch poller to take into account the configuration changes.

In the Advanced section, optionally set the following logging parameters:

  • Job Log Level: Select the logging level that you want for the jobs:

    • No Logging disables all logging. Jobs and tasks are no longer traced in the job log. Job restartability is not possible. This level is not recommended.

    • No Tasks only logs job information, and not the task details. This mode supports job restartability.

    • Exclude Skipped Tasks (default) logs job information and task details, except for the tasks that are skipped.

    • Include All Tasks logs job information and all task details.

  • Execution Monitor Log Level: Logging level [1…3] for the execution console for all the queues.

  • Enable Conditional Execution: A task may be executed or skipped depending on a condition set on the task. For example, a task may be skipped depending on parameters passed to the job. Disabling this option prevents conditional executions and forces the engine to process all the tasks.

Deployment repositories are created with a Job Log Level value set to No Tasks. Other repositories are created with no configured value, and use the Exclude Skipped Tasks default value.
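The effect of each Job Log Level can be summarized in a few lines. The sketch below is a reading of the descriptions above, using hypothetical names; treating the two most verbose levels as supporting restartability is an inference, since the text states this explicitly only for No Tasks.

```python
from enum import Enum


class JobLogLevel(Enum):
    NO_LOGGING = "No Logging"
    NO_TASKS = "No Tasks"
    EXCLUDE_SKIPPED_TASKS = "Exclude Skipped Tasks"
    INCLUDE_ALL_TASKS = "Include All Tasks"


def logs_task(level, task_skipped):
    """Return True if the details of a task are written to the job log."""
    if level in (JobLogLevel.NO_LOGGING, JobLogLevel.NO_TASKS):
        return False
    if level is JobLogLevel.EXCLUDE_SKIPPED_TASKS:
        return not task_skipped
    return True  # INCLUDE_ALL_TASKS


def supports_restart(level):
    """Job restartability is not possible only when logging is disabled."""
    return level is not JobLogLevel.NO_LOGGING


def effective_level(configured=None):
    """Fall back to the documented default when no value is configured."""
    return configured or JobLogLevel.EXCLUDE_SKIPPED_TASKS
```

With no configured value, effective_level() returns Exclude Skipped Tasks, so skipped tasks are the only task entries missing from the log.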

Configuring Continuous Loads

Continuous loads enable integration developers to push data into the MDM hub in a continuous way without having to take care of Load Initialization or Load Submission.

With continuous loads:

  • Integration developers do not need to initialize and submit individual external loads. They directly load data into the hub using the Load ID of the continuous load.

  • At regular intervals, Semarchy xDM automatically creates then submits an external load with the data loaded in the continuous load. This external load is submitted with a program name, a job, and a submitter name.

  • The continuous load remains, with the same Load ID. Subsequent data loads made with this Load ID are processed at the next interval.

Continuous loads are configured and managed by the administrator in a data location. Unlike external loads, they cannot be created, submitted or canceled via integration points.
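The behavior of a continuous load can be pictured with the following in-memory sketch. The ContinuousLoad class, its fields and the sample values are hypothetical, and skipping an interval when no data was loaded is an assumption of this example rather than documented behavior.

```python
import itertools


class ContinuousLoad:
    """Toy model of a continuous load (hypothetical, not a Semarchy API)."""

    def __init__(self, load_id, program_name, job_name, submit_as, active=True):
        self.load_id = load_id            # stays the same across submissions
        self.program_name = program_name
        self.job_name = job_name
        self.submit_as = submit_as
        self.active = active
        self.buffer = []
        self._external_ids = itertools.count(1)

    def push(self, row):
        # Integration developers load data directly with the continuous Load ID.
        self.buffer.append(row)

    def tick(self):
        """Run once per submit interval; return the submitted external load."""
        if not self.active or not self.buffer:
            return None
        rows, self.buffer = self.buffer, []
        return {
            "external_load": next(self._external_ids),
            "program": self.program_name,
            "job": self.job_name,
            "submitter": self.submit_as,
            "rows": rows,
        }


load = ContinuousLoad(42, "CRM_FEED", "INTEGRATE_ALL", "etl_user")
load.push({"id": 1})
submitted = load.tick()
```

After the call to tick(), the continuous load keeps its Load ID (42 in this sketch) and is ready to receive data for the next interval.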

To configure a continuous load:

  1. Switch to the Data Location perspective.

  2. In the Data Locations view, expand the data location for which you want to configure a continuous load.

  3. Right-click the Continuous Loads node and select New Continuous Load. The Create New Continuous Load wizard opens.

  4. Enter the following values:

    • Active: Check this option to make the continuous load active. Only active loads integrate data at a regular interval.

    • Program Name: This value is for information only. It describes the submitted external loads.

    • On Submit Job: Integration job submitted with the external loads. This job is selected among those available in the deployed model edition.

    • Submit Interval: Interval in seconds between submissions.

    • Submit as: name of the user submitting the external loads. This user may or may not be a user defined in the security realm of the application server.

  5. Click Finish to close the wizard. The Continuous Load editor opens.

  6. In the Description field, optionally enter a description for this load.

  7. Press CTRL+S to save the editor.

The administrator can deactivate a continuous load to prevent it from processing its data.

To activate or deactivate continuous loads:

  1. Switch to the Data Location perspective.

  2. In the Data Locations view, expand the data location for which you want to configure a continuous load.

  3. Double-click the Continuous Loads node. The Data Location editor opens on the Continuous Loads tab.

  4. Select one or more continuous loads in the list, and then click the Activate or Deactivate button in the toolbar.

  5. Press CTRL+S to save the editor.

You do not need to restart any other component after creating, activating or deactivating a continuous load. The changes are immediately taken into account.
When deploying a new model edition that deprecates a job, continuous loads using this job are automatically made inactive. They must be updated to use the updated integration job and then reactivated by the administrator.

Configuring Notifications

Notifications tell users or applications when a job completes or when operations are performed in workflows, for example when a task is assigned to a role.

There are two types of notifications:

  • Job Notifications are issued under certain conditions when an integration job completes. These notifications are used for administration, monitoring, or integration automation. They are configured with Notification Policies in the data locations.

  • Workflow Notifications are emails sent to users when operations are performed in a workflow. They are configured in workflow transitions and tasks.

Both families of Notifications are issued via Notification Servers.

Notification Server Types

Notification recipients may be users or systems. The type of notification sent and its recipient depend on the type of notification server configured.

Each notification server uses a Notification Plug-in that:

  • defines the configuration parameters for the notification server,

  • defines the configuration and form of the notification,

  • sends the notifications via the notification servers.

Semarchy xDM is provided with several built-in notification plug-ins:

  • JavaMail: The notification is sent in the form of an email via a Mail Session server configured in the application server, and referenced in the notification server. For more information about configuring Mail Session, see the Semarchy xDM Installation Guide.

  • SMTP: The notification is sent in the form of an email via an SMTP server entirely configured in the notification server.

  • File: The notification is issued as text in a file stored in a local directory or on an FTP/SFTP file server.

  • HTTP: The notification is issued as a GET or POST request sent to a remote HTTP server. Use this server type to call a web service with the notification information.

  • JMS: The notification is issued as a JMS message in a message queue.

It is possible to develop additional plug-ins to issue other types of notifications. See the Semarchy xDM Plug-in Development Guide for more information about plug-in development.

A single notification server of the JavaMail or SMTP type can be used to send Workflow Notifications. This server is flagged as the Workflow Notification Server.

Any server can be used to send Job Notifications. Each Job Notification Policy specifies the notification server it uses.

Configuring Notification Servers

This section explains how to create notification servers using the built-in notification plug-ins.

Creating a Notification Server

To create a notification server:

  1. In the Administration view, double-click the Notification Servers node. The Notification Servers editor opens.

  2. Select the Notification Servers list, right-click and select New Notification Server. The Create New Notification Server wizard opens.

  3. Enter the following parameters:

    • Name: Internal name of the notification server.

    • Label: User-friendly label for the server.

    • Plug-in ID: Select one of the available notification server plug-ins.

    • Workflow Notification Server: Select this option to use this notification server by default in the workflows. This option can be selected only if the Plug-in ID is JavaMail or SMTP.

  4. Click Next.

  5. In the second wizard page, enter the configuration information for your type of server:

    • JavaMail:

      • JNDI URL: JNDI URL of the Java Mail Session service available in the application server. This URL is typically java:comp/env/mail/Session if the Mail Session service is declared as mail/Session in the application server.

      • From User: Email address of the sender of the notifications from this server. This address is also used in the reply-to address for notification emails.

      • Password: If this server requires specific authentication, enter a password for this server.

    • SMTP:

      • SMTP Host Name and SMTP Port: Name or address, and port of the SMTP host.

      • From User: Email address of the sender of the notifications from this server. This address is also used in the reply-to address for notification emails.

      • Authentication Required: If this server requires specific authentication, select the Authentication Required option and enter a User Name and Password for this server, and indicate whether it uses TLS or SSL.

      • Additional SMTP Properties: Enter additional properties as property=value pairs.

    • File:

      • File System: Select the file system of the file server. FILE for a local server, FTP or SFTP for a remote server.

      • Host, Port, Login, Password are required to connect to an FTP or SFTP server.

      • Root Path: Provide the root path for storing the notification file. For example, c:\work\notifications for Windows or /work/notifications for UNIX/Linux.

    • HTTP:

      • Scheme: Specify whether the HTTP request should be done using HTTP or HTTPS.

      • Host, Port and optionally Login, Password are used to connect to the HTTP server.

      • Base Path: Root path appended after the host and port in the URL. For example: / or /rest/api/.

      • Use System Properties: Check this option to use the system-defined properties to configure the HTTP connection. This option allows using a proxy configuration defined in the Java parameters for the application server.

      • Proxy Host, Proxy Port, Proxy Login and Proxy Password are used to configure the connection through an HTTP proxy.

      • Headers: Enter additional HTTP headers as property=value pairs.

    • JMS:

      • Connection Factory URL: JNDI URL of the factory used to create a connection to the JMS destination. The URL is typically java:comp/env/jms/ConnectionFactory if the connection factory is declared as jms/ConnectionFactory in the application server.

      • Login and Password used when initiating the JMS connection.

  6. Press CTRL-S to save the configuration.
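
As an illustration of how the HTTP server fields above fit together, here is a minimal Python sketch that combines Scheme, Host, Port and Base Path into the URL prefix used for requests. The function name and the host `hooks.example.com` are hypothetical, and the normalization of the base path is an assumption, not the documented behavior of the HTTP plug-in.

```python
from urllib.parse import urlunsplit


def notification_base_url(scheme: str, host: str, port: int, base_path: str = "/") -> str:
    """Combine the Scheme, Host, Port and Base Path fields into a URL prefix."""
    # Assumption: the base path always starts with exactly one slash.
    path = "/" + base_path.lstrip("/")
    return urlunsplit((scheme.lower(), f"{host}:{port}", path, "", ""))


# Hypothetical server definition using HTTPS and the /rest/api/ base path.
print(notification_base_url("HTTPS", "hooks.example.com", 8443, "/rest/api/"))
# prints https://hooks.example.com:8443/rest/api/
```

The request path defined later in a notification policy would then be appended to this prefix.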

Testing a Notification Server

After configuring the notification server, it is recommended to test its configuration.

To test a notification server:

  1. In the Notification Servers editor, select the notification server you want to test, right-click and select Test Configuration.

  2. The next steps depend on the type of notification servers:

    • File, HTTP, JMS: No further operation is needed. A connection attempt is made on the notification server.

    • JavaMail and SMTP: Provide a comma-separated list of email addresses and then click OK. An email is sent via the notification server to these recipients.

Configuring a Job Notification Policy

With a notification server configured, it is possible to create notification policies using this server.

To create a notification policy:

  1. Open the Data Locations perspective.

  2. In the Data Locations view, right-click the Job Notification Policies node and select New Job Notification Policy. The Create New Job Notification Policy wizard opens.

  3. In the first wizard page, enter the following information:

    • Name: Internal name of the notification policy.

    • Label: User-friendly label for the notification policy. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

    • Notification Server: Select the notification server that will be used to send these notifications.

    • Use Complex Condition: Check this option to use a freeform Groovy Condition. Leave it unchecked to define the condition using a form.

  4. Click Next.

  5. Define the job notification condition. This condition applies to a completing job.

    • If you have checked the Use Complex Condition option, enter the Groovy Condition that must be true to issue the notification. See Groovy Condition for more information.

    • If you have not checked the Use Complex Condition option, use the form to define the condition to issue the notification.

      • Job Name Pattern: Name of the job. Use the _ and % wildcards to represent one or any number of characters.

      • Notify on Failure: Select this option to send notification when a job fails or is suspended.

      • Notify on Success: Select this option to send notification when a job completes successfully.

      • … Count Threshold: Select the maximum number of errors, inserts, etc. allowed before a notification is sent.
        If you define a Job Name Pattern, Notify on Failure and a Threshold, a notification is sent if a job matching the pattern fails or reaches the threshold.

  6. Click Next.

  7. Define the job notification Payload. This payload is text content, which you can also generate programmatically using Groovy. See Groovy Template for more information.
    This payload has a different purpose depending on the type of notification:

    • JavaMail or SMTP: The body of the email.

    • File: The content written to the target file.

    • JMS: The payload of the JMS message.

    • HTTP: The content of a POST request.

  8. Click Next.

  9. Define the Notification Properties. These properties depend on the type of notification server:

    • JavaMail or SMTP:

      • Subject: Subject of the email. The subject may be a Groovy Template.

      • To, CC: List of recipients of this email. These recipients are roles. Each of these roles points to a list of email addresses.

      • Content Type: Email content type. For example: text/html, text/plain. This content type must correspond to the generated payload.

    • File:

      • Path: Path of the file in the file system. The path may be a Groovy Template. Make sure to use only forward slashes / for this path. Note that this path is a relative path from the Notification Server’s Root Path location. For example, if you set the Path to /new and the Notification Server Root Path to /work/notifications, then the notification files are stored in the /work/notifications/new folder.

      • Append: Check this option to append the payload to the file. Otherwise, the file is overwritten.

      • Charset: Charset used for writing the file. Typically UTF-8, UTF-16 or ISO-8859-1.

      • File Name: Name of the file to write. The file name may be a Groovy Template.

      • Root Path: Provide the root path for storing the notification file.

    • HTTP:

      • Method: HTTP request method (POST or GET).

      • Request Path: Path of the request in the HTTP server. The request path may be a Groovy Template.

      • Parameters: HTTP parameters passed to the request in the form of a list of property=value pairs separated by a & character. If no parameter is passed and the method is GET, all the notification properties are passed as parameters. The parameters may be a Groovy Template.

      • Headers: HTTP headers passed to the request as header=value pairs, with one header per line.

      • Content Type: Content type of the payload. For example: text/html, text/plain. This content type must correspond to the generated payload.

      • Failure Regexp: If the server returns an HTTP Code 200, the response payload is parsed with this regular expression. If the entire payload matches this expression, then the notification is considered failed. For example, to detect the NOTIFICATION FAILED string in the payload, the Failure Regexp value should be (.*)NOTIFICATION FAILED(.*).

    • JMS:

      • JMS Destination: JNDI URL of the JMS topic or queue. The URL is typically java:comp/env/jms/queue/MyQueue if a queue factory is declared as jms/queue/MyQueue in the application server. The destination may be a Groovy Template.

      • Message Type: Type of JMS Message sent: TextMessage, MapMessage or Message. See Message Types for more information. When using a MapMessage, the payload is ignored and all properties are passed in the MapMessage.

      • Set Message Properties: Check this option to automatically set all notification properties as message properties. Passing properties in this form simplifies message filtering.

  10. Press CTRL-S to save the configuration.
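
Two of the settings in this procedure rely on matching rules that are easy to misread: the Job Name Pattern uses the _ and % wildcards, and the Failure Regexp must match the entire response payload. The Python sketch below models both rules. The function names and sample job names are hypothetical, and case-sensitive matching is an assumption. Python's `re.fullmatch` stands in for the platform's own regular expression engine, so details such as multi-line payload handling may differ.

```python
import re


def job_name_matches(pattern: str, job_name: str) -> bool:
    """Match a job name against a pattern where _ is one character and % is any run of characters."""
    regex = "".join(
        "." if ch == "_" else ".*" if ch == "%" else re.escape(ch)
        for ch in pattern
    )
    # The whole job name must match the pattern (assumed case-sensitive).
    return re.fullmatch(regex, job_name) is not None


def notification_failed(failure_regexp: str, response_payload: str) -> bool:
    """A notification counts as failed only if the entire payload matches the expression."""
    return re.fullmatch(failure_regexp, response_payload) is not None


# Hypothetical job names.
print(job_name_matches("INTEGRATE_%", "INTEGRATE_CUSTOMERS"))  # True
print(job_name_matches("LOAD_", "LOAD12"))                     # False: _ stands for exactly one character

# The expression from the text matches because (.*) absorbs the surrounding text.
print(notification_failed("(.*)NOTIFICATION FAILED(.*)", "status: NOTIFICATION FAILED (timeout)"))  # True
# Without the (.*) groups, the expression no longer covers the entire payload.
print(notification_failed("NOTIFICATION FAILED", "status: NOTIFICATION FAILED"))  # False
```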

Using Groovy for Notifications

The Groovy scripting language is used to customize the notification. See http://groovy.codehaus.org for more information about this language.

Groovy Condition

When using a complex condition for triggering the notification, the condition is expressed in the form of a Groovy expression that returns true or false. If this condition is true, then the notification is triggered.

This condition may use properties of the job that completes. Each property is available as a Groovy variable.

You can use the Edit Expression button to open the condition editor.
In the condition editor:

  • Double-click one of the Properties in the list to add it to the condition.

  • Click the Test button to test the condition against the notification properties provided in the Test Values tab.

  • In the Test Values tab, if you enter an existing Batch ID and click the > button, the properties from this batch are retrieved as test values.

Sample conditions are given below:

Trigger a notification if a job has errors.

ErrorCount > 0

Trigger a notification for batches in status DONE, triggered by a workflow whose name contains "Product".

BatchStatus == 'DONE' && WorkflowName.find("Product") != null

Trigger a notification if the batch has processed the "Customers" or "Contacts" entities. EntityNames is a list of the names of the entities processed by the job.

EntityNames.contains("Customers") || EntityNames.contains("Contacts")

Groovy Template

You can use Groovy to customize some elements of the notification, such as the Payload, the subject or the name of the JMS destination of the notification.

In these cases, a Groovy Template is used to generate a string output from the notification properties.

In the template:

  • The notification properties are available using the $<property_name> syntax.

  • You can also use Groovy code surrounded with <% %> tags.

  • You can use the <%= %> syntax to output a string generated by Groovy.

Use the Edit Expression button to open the expression editor to modify a Groovy template. In the template editor:

  • Double-click one of the Properties in the list to add it to the template. It is added with the $<property_name> syntax.

  • Click the Test button to test the template against the notification properties provided in the Test Values tab.

  • In the Test Values tab, if you enter an existing Batch ID and click the > button, the properties from this batch are retrieved as test values.

Sample templates are given below:

Generating a notification text file named after the Batch ID.

File Name: NotificationFile_${BatchId}.txt

Generating an email subject that contains the Job Name and Batch Status.

Job ($JobName) is finished as: $BatchStatus.

Creating a message with the job name, and extra content if the batch status is not DONE.

Job ($JobName) is complete.
<% if (BatchStatus != 'DONE')  { %> Please review the completed batch: $BatchStatus. <% } %>

Generating HTML content with a formatted list of entities.

<p>Job ($JobName) is complete.</p>
<p>Entities:</p>
<ul>
<% EntityNames.each() { entityName-> %>
        <li>${entityName}</li>
<% } %>
</ul>

Configuring Variable Value Providers

Semarchy xDM uses variables defined in models to enforce certain data governance policies for a user’s session.
For more information about model variables, see the Semarchy xDM Developer’s Guide.

A Variable Value Provider is a system that can be queried by Semarchy xDM to retrieve the values for these variables. Typically, this system is a server containing information about the user connected to Semarchy xDM.

Two types of variable value providers are supported out-of-the-box:

  • Datasource Variable Provider: This variable value provider is a relational database that is accessed through a JDBC datasource. Semarchy xDM can issue SQL statements against this database to retrieve variable values. For example, an employee database that can be queried to retrieve the country of the connected user.

  • LDAP Variable Provider: This variable value provider is a directory server that is accessed using the LDAP protocol. Semarchy xDM can issue queries against this directory server to retrieve variable values. For example, an LDAP directory that can be used to retrieve the organizational unit of the connected user.
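
To make the Datasource Variable Provider idea concrete, the Python sketch below looks up the country of a connected user with a SQL query. It uses an in-memory SQLite database as a stand-in for the JDBC datasource. The `employees` table, its columns, and the logins are hypothetical, and the query shown is only an assumption about what such a lookup could resemble.

```python
import sqlite3

# Stand-in for the employee database behind the JDBC datasource (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (login TEXT PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO employees (login, country) VALUES (?, ?)",
    [("jdoe", "FR"), ("asmith", "US")],
)


def lookup_country(connection: sqlite3.Connection, login: str):
    """Return the country of the connected user, or None if the user is unknown."""
    row = connection.execute(
        "SELECT country FROM employees WHERE login = ?", (login,)
    ).fetchone()
    return row[0] if row else None


print(lookup_country(conn, "jdoe"))  # FR
```

The value returned for the connected user would then be available to the model as the variable value for that user's session.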

Variable value providers are configured in the repository, and can be used by any model in this repository.

When working with a deployment repository, make sure to configure the variable value providers used in the models before importing or deploying them in this repository.

Creating a Variable Value Provider

To create a variable value provider:

  1. In the Administration view, double-click the Variable Value Providers node. The Variable Value Providers editor opens.

  2. Select the Variable Value Providers list, right-click and then select New Variable Value Provider. The Install Variable Value Provider wizard opens.

  3. Enter the following information:

    • Name: Internal name of the variable value provider.

    • Label: User-friendly label for the variable value provider. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

  4. Select the Plug-in ID corresponding to the variable value provider type: LDAP Variable Provider or Datasource Variable Provider.

  5. Click Next.

  6. Click the Edit Expression button.

  7. In the Variable Value Configuration dialog, enter the configuration information.
    This information differs depending on the selected Plug-In.

    • For a Datasource Variable Value Provider, enter the following information:

      • JNDI Data Source Name: Select a JDBC datasource in the list.
        This datasource must be defined in the application server. It is used to connect to the database acting as the variable value provider.

    • For an LDAP Variable Value Provider, enter the following information:

      • LDAP Host: Name or IP address of the LDAP server.

      • LDAP Port: Listening port of the LDAP server. The port is typically 389 for non-SSL connections, and 636 for SSL connections.

      • Use SSL: Check this option to use SSL to connect to the LDAP server.

      • User: Name of the user used to retrieve data from the LDAP Server. Note that this user should have read privileges to the LDAP structure.

      • Password: This user’s password.

  8. Click OK to close the Variable Value Configuration dialog.

  9. Click Finish.

The variable value provider is added to the list.

Testing the Variable Value Provider Configuration

After configuring a new variable value provider, it is recommended to test its configuration.

To test a variable value provider configuration:

  1. In the Variable Value Providers editor, select the variable value provider in the list.

  2. Right-click and select Test Configuration.

A message indicates whether the connection test was successful or not.

The configuration test only tests the connection information, but does not check the privileges granted to the user to retrieve the values from the provider.

Managing Plug-ins

Semarchy xDM allows extending its capabilities using Java code and external APIs. Using the Open Plug-in Architecture, existing services or information systems can contribute to the master data processing and enrichment. You can extend the Enrichment and Validation capabilities in Semarchy xDM through user-defined plug-ins.

For detailed information about plug-in development and packaging, see the Semarchy xDM Plug-in Development Guide.

A Plug-in is delivered as a jar file bundle that must be deployed in each Semarchy xDM application instance running integration jobs that use the plug-in. You do not need to restart the server to take new or updated bundles into account.

These bundles are tagged with a version number. As a consequence, updating an existing plug-in with a newer version of this plug-in will automatically make the platform work with the newer plug-in version. The deployment process installs a new plug-in or replaces an existing plug-in version with a new one.

To deploy a plug-in:

  1. Open the Administration Console perspective.

  2. Double-click the Plug-ins node in the Administration view.

  3. Click the Install or Update Plug-in button in the upper right corner of the Plug-ins editor. The Install/Update Plug-ins dialog opens.

  4. Click the Browse button and select the plug-in binary file. For example: com.acme.phoneStandardizer_1.0.0.jar.

  5. Click OK. A Status window shows the number of plug-ins installed or updated.

  6. Your session is closed to take this new plug-in into account. Click the link to restart the session on the Overview perspective.

  7. Open the Administration Console perspective.

  8. Double-click the Plug-ins node from the Administration view.

The plug-in now appears in the list, and can be used in the models and the integration jobs.

Make sure to install the plug-ins required by the jobs of a model before deploying the model. If a job starts before its required plug-ins are installed, then it will fail. The plug-in can be installed and the job resumed after the installation.

To uninstall a plug-in:

  1. Open the Administration Console perspective.

  2. Double-click the Plug-ins node in the Administration view.

  3. Select the plug-in in the list.

  4. Click the Uninstall Selected Plug-ins button in the editor’s toolbar.

Managing Applications Configuration

The global behavior of the MDM Applications is configured from the Administration Console Perspective.

Changes performed in this configuration apply to all the applications started in the instance of Semarchy xDM.
The semarchyAdmin role is required to configure the global application parameters.

To set the Applications Configuration:

  1. Open the Administration Console perspective.

  2. Double-click the Applications Configuration node in the Administration view. The Applications Configuration editor opens.

This editor displays the global application parameters:

  • Configure the Server Base URL of your Semarchy xDM installation. This URL is seeded when creating the repository and used in workflow notifications to provide links to the server.

  • Set the CSV Export Limit and Excel Export Limit to define the maximum export size allowed in each format. Note that generating export files is resource consuming on the server. It is recommended to test the scalability of the new export limits.

  • Change the Header Logo that appears for all applications and in the welcome page by uploading a picture. Note that this picture is scaled to a height of 96 pixels.
    You can also change the Favicon used for this page.

Configuring the Logging

Logging is used to troubleshoot the Semarchy xDM platform issues and monitor the platform activity.

The default logging configuration set up in Semarchy xDM works in most cases. Most of the time, you do not need to change this configuration.
You can reset the logging configuration to the default values by clicking the Restore Logging Configuration to Default button in the Logging Configuration editor toolbar.

By default, the important log events appear in the workbench Error view. To access this view, in the Workbench, choose Window > Show View… > Others and then select General > Error Log.

Semarchy xDM uses Apache Log4J to log its activity. This section provides key concepts and configuration samples to work with Log4J. Always refer to the Log4J Manual for reference information.

Log4J works with three key concepts: Appenders, Loggers and Log Levels.

  • The Appenders deliver log events to a destination. For example they write events to a file on the application server file system, to a message queue or to the Semarchy xDM Workbench Errors view.

  • The Loggers capture log events emitted by certain software components in Semarchy xDM.

  • The Log Levels define at the same time the granularity and importance of log events.

Each log event is emitted with a certain log level and is captured by a logger, which sends it to an appender for delivery.

Setting the Logging Configuration

Appenders, loggers and log levels are configured through the Logging Configuration editor.

To set the Logging Configuration:

  1. Open the Administration Console perspective.

  2. Double-click the Logging Configuration node in the Administration view. The Logging Configuration editor opens.

  3. Change the configuration in the editor.

  4. Press CTRL-S to save this editor.
    The new logging configuration is immediately taken into account.

Understanding Log Levels

The Log Level defines the granularity and importance of a log event. The log level is one of the following: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.

These values are ordered from highest to lowest granularity, and also in increasing order of importance: The DEBUG level is very granular and most messages are not relevant in the normal course of operations, whereas the ERROR and FATAL messages will appear less frequently but are more important.

When you capture events, via a logger, at a certain level, all events of higher importance are also captured. For example, when setting a logger to INFO, then the WARN, ERROR, and FATAL events are also captured.

When parameterizing a logger, it is possible to set the level to ALL to capture all log events, and to OFF to disable log capture.
INFO is typically a good level to start with, as it captures information messages as well as possible warnings and errors.
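
The capture rule described above can be sketched in a few lines of Python. This is an illustrative model of the level ordering, not Log4J code. The level names come from the text, and the `is_captured` function is a hypothetical helper.

```python
# Levels in increasing order of importance, as listed in the text.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def is_captured(event_level: str, logger_level: str) -> bool:
    """Return True if a logger set to logger_level captures an event of event_level."""
    if logger_level == "OFF":
        return False  # OFF disables log capture
    if logger_level == "ALL":
        return True   # ALL captures every event
    # An event is captured when it is at least as important as the logger level.
    return LEVELS.index(event_level) >= LEVELS.index(logger_level)


print([level for level in LEVELS if is_captured(level, "INFO")])
# prints ['INFO', 'WARN', 'ERROR', 'FATAL']
```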

Defining Appenders

Appenders deliver log events to a destination. Appenders are defined with a name and an appender class (that is, a Java class that writes log events to a file, to a JMS message queue, to an email, etc).
Appender classes may have one or more properties to configure their behavior. They also support Layout properties to define the format of the logged event.

In the logging configuration, the appender is defined with the following syntax:

log4j.appender.<appender_name> = <appender_class>

Each property for this appender is defined with the following syntax:

log4j.appender.<appender_name>.<property_name> = <property_value>

Sample Appenders Configuration

Sample appender configurations are provided in this section.

Configuring an appender called FILE to write to a daily rolling log file.
# Appender Class
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender 	(1)

# Appender Properties
log4j.appender.FILE.DatePattern='.'yyyy-MM-dd'.log'		(2)
log4j.appender.FILE.file=${user.home}/.semarchy/logfile.log 	(2)

log4j.appender.FILE.append=true 				(3)
log4j.appender.FILE.encoding=UTF-8 				(3)

# Appender Layout Configuration
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout 	(4)
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH\:mm s S}-%t-%x-%-5p-%-10c\:%m%n (4)
1 The DailyRollingFileAppender appender class creates a daily logging file. You can also use the FileAppender to write to a simple file.
2 The daily log file is located in the .semarchy/ sub-directory of the user home directory. Its name is logfile.log to which the date, formatted with the pattern, is concatenated.
3 These properties configure the appender behavior: Add log events at the end of the daily file, and use UTF-8 encoding.
4 Logged events are formatted using the PatternLayout formatting class and a conversion pattern.
This pattern renders log events in the following format:
2010-03-23-main--DEBUG-log4jExample:Here is a debug message

Configuring an appender called PDE to log to the Workbench Error view
# Appender writing to the Workbench Error view
log4j.appender.PDE=com.semarchy.commons.log4j.appender.pde.PDEAppender 	(1)
log4j.appender.PDE.Threshold=INFO 					(2)
1 This appender class logs events into the Semarchy Workbench Error view.
2 Only log events at least as important as this threshold are considered by the appender. Make sure to align this value with the loggers that send events to this appender.
Configuring an appender called SMTP to send log events by email
# Appender sending emails via an SMTP host (usually for important error messages)
log4j.appender.SMTP = org.apache.log4j.net.SMTPAppender 		(1)
log4j.appender.SMTP.SMTPHost = <smtp_host_name> 			(2)
log4j.appender.SMTP.SMTPPort = <smtp_host_port> 			(2)
log4j.appender.SMTP.From = email@my-domain.com  			(2)
log4j.appender.SMTP.To = admin1@my-domain.com,admin2@my-domain.com 	(3)
log4j.appender.SMTP.Subject = Semarchy xDM Log Event 	(3)
log4j.appender.SMTP.layout =  org.apache.log4j.TTCCLayout 		(4)
1 This appender class sends log events by email.
2 Mail Server configuration. Note that additional properties may be set for the SMTP user, password, protocol, etc.
3 Email recipients and subject.
4 Log events are formatted using the TTCC Layout, a detailed format containing the time, thread, category and nested diagnostic context.

Defining Loggers

Loggers are named and organized as a hierarchy of classes emitting log events.
For example, com.semarchy is the parent of com.semarchy.mdm. The highest logger in the hierarchy is the rootLogger.

You configure loggers with a log level, and attach them to one or more appenders. Any event emitted by the class that is more important than (or as important as) the given log level is captured by the logger and sent to the appender.

Use the following syntax to attach a logger to appenders with a given log level:

log4j.logger.<loggerName> = <level>, <appender_name> [,<appender_name>, …]

For example, to send DEBUG, INFO, WARN, ERROR and FATAL events emitted by the com.semarchy.platform.setup.IPlatformManager class to the appender named PDE:

log4j.logger.com.semarchy.platform.setup.IPlatformManager=DEBUG, PDE

Loggers inherit their appenders additively from their parents in the hierarchy. They can override the level provided by their parent.

In the following configuration, the com.semarchy.platform.setup.IPlatformManager logger inherits the PDE appender from the com.semarchy logger, and sends ERROR events to the PDE appender in addition to the SMTP appender.

log4j.logger.com.semarchy=INFO, PDE
log4j.logger.com.semarchy.platform.setup.IPlatformManager=ERROR, SMTP

To prevent such inheritance, use the following syntax:

log4j.additivity.<logger_name> = false

For example, in the following configuration, com.semarchy.platform.setup.IPlatformManager will no longer log into the PDE appender in addition to SMTP.

log4j.logger.com.semarchy=INFO, PDE
log4j.logger.com.semarchy.platform.setup.IPlatformManager=ERROR, SMTP
log4j.additivity.com.semarchy.platform.setup.IPlatformManager=false
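
The effect of additivity can be checked with a small Python model that walks up the logger hierarchy and collects appenders. This is a simplified sketch of the inheritance rule described above, not the Log4J implementation. The function names are hypothetical, and the rootLogger is deliberately left out of the model.

```python
def parse_logging_config(text: str):
    """Parse log4j.logger.* and log4j.additivity.* lines into two dictionaries."""
    loggers, additivity = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith("log4j.logger."):
            parts = [part.strip() for part in value.split(",")]
            # The first item is the level; the remaining items are appender names.
            loggers[key[len("log4j.logger."):]] = (parts[0], [p for p in parts[1:] if p])
        elif key.startswith("log4j.additivity."):
            additivity[key[len("log4j.additivity."):]] = value.lower() != "false"
    return loggers, additivity


def effective_appenders(name: str, loggers: dict, additivity: dict) -> list:
    """Collect the appenders of a logger and of its parents, stopping where additivity is false."""
    appenders = []
    current = name
    while current:
        if current in loggers:
            appenders.extend(loggers[current][1])
        if not additivity.get(current, True):
            break
        current = current.rpartition(".")[0]  # move to the parent logger
    return appenders


CONFIG = """
log4j.logger.com.semarchy=INFO, PDE
log4j.logger.com.semarchy.platform.setup.IPlatformManager=ERROR, SMTP
"""
NAME = "com.semarchy.platform.setup.IPlatformManager"

print(effective_appenders(NAME, *parse_logging_config(CONFIG)))
# prints ['SMTP', 'PDE']

no_inheritance = CONFIG + "log4j.additivity." + NAME + "=false\n"
print(effective_appenders(NAME, *parse_logging_config(no_inheritance)))
# prints ['SMTP']
```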

The following configuration is the default one for Semarchy xDM.

Sample logger configuration for Semarchy xDM.
log4j.logger.com.semarchy=INFO
log4j.logger.com.semarchy.commons.sql=INFO
log4j.logger.com.semarchy.mdm.datahub.services.query.datamgr.IDataManager=INFO
log4j.logger.com.semarchy.platform.engine=INFO
log4j.logger.com.semarchy.platform.engine.PluginExecution=INFO
log4j.logger.com.semarchy.platform.engine.core.impl.product.SL4JExecutionMonitor=INFO
log4j.logger.com.semarchy.platform.product.notification.JobNotificationHandler=ERROR
log4j.logger.com.semarchy.platform.setup.IPlatformManager=INFO
log4j.logger.org=INFO
log4j.logger.org.apache.aries.blueprint=WARN
log4j.logger.org.apache.commons.beanutils.converters=WARN
log4j.logger.org.apache.cxf=WARN
log4j.logger.org.apache.cxf.interceptor.LoggingInInterceptor=WARN
log4j.logger.org.apache.cxf.interceptor.LoggingOutInterceptor=WARN
log4j.logger.org.apache.directory.shared.asn1.ber.Asn1Decoder=ERROR
log4j.logger.org.ops4j.pax.logging=WARN
log4j.logger.org.quartz=WARN
log4j.logger.org.springframework.jdbc.core.JdbcTemplate=INFO
log4j.rootLogger=INFO, PDE

The following table lists the various loggers available in Semarchy xDM:

Logger Name Description

com.semarchy

Root logger for the Semarchy platform.

com.semarchy.commons.sql

TRACE logs SQL queries made by the Semarchy workbench and administration console.

com.semarchy.mdm.datahub.services.query.datamgr.IDataManager

Logs database operations performed by the platform for the MDM applications while browsing data. DEBUG logs all queries and execution times.

com.semarchy.platform.engine.core.impl.DefaultStandaloneEngineImpl

Logs execution engine events. DEBUG logs all submitted jobs, and ERROR can be used to troubleshoot engine issues.

com.semarchy.platform.engine.core.impl.product.SL4JExecutionMonitor

Logs execution engine job processing. Use DEBUG to see execution engine details.

com.semarchy.platform.engine.PluginExecution

Logs enricher and validation plugins execution. DEBUG logs plugins feedback, TRACE logs the processing of every row.

com.semarchy.platform.integration.polling.IntegrationLoadDequeuer

Logs the Batch Poller activity. DEBUG logs every polled interval.

com.semarchy.platform.product.notification.JobNotificationHandler

Logs job notifications. ERROR traces notification failures.

com.semarchy.platform.setup.IPlatformManager

Logs platform status changes. INFO traces normal status changes. ERROR traces abnormal statuses.

org.apache.cxf.interceptor.LoggingInInterceptor

INFO logs incoming REST requests from the MDM applications to the server.

org.apache.cxf.interceptor.LoggingOutInterceptor

INFO logs outgoing REST responses from the server to the MDM applications.

org.springframework.jdbc.core.JdbcTemplate

DEBUG traces the SQL queries made by the MDM applications.

Managing Execution

The Execution Engine processes the jobs submitted by the integration batch poller. It orchestrates the certification process for golden data.

Understanding Jobs, Queues and Logs

Jobs are processing units that run in the execution engine. There are two main types of jobs running in the execution engine:

  • Integration Jobs that process incoming batches to perform golden data certification.

  • Deployment Jobs that deploy new model editions in data locations.

Jobs are processed in Queues. Queues work in First-In-First-Out (FIFO) mode. When a job runs in the queue, the next jobs are queued and wait for their turn to run. To run two jobs in parallel, it is necessary to distribute them into different queues.

Queues are grouped into Clusters. There is one cluster per Data Location, named after the data location.
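The queueing rules above can be pictured with a minimal sketch. It is a simplified model, not the execution engine itself: the Queue class, the cluster name CustomerHub, the queue names and the job names are all hypothetical.

```python
from collections import deque


class Queue:
    """A FIFO job queue: one job runs at a time, the others wait their turn."""

    def __init__(self, name):
        self.name = name
        self.jobs = deque()

    def submit(self, job):
        self.jobs.append(job)

    def run_next(self):
        # Return the oldest queued job, or None when the queue is empty.
        return self.jobs.popleft() if self.jobs else None


# One cluster per data location, named after it (hypothetical names).
clusters = {"CustomerHub": {"Q1": Queue("Q1"), "Q2": Queue("Q2")}}
q1 = clusters["CustomerHub"]["Q1"]
q2 = clusters["CustomerHub"]["Q2"]

q1.submit("load-A")
q1.submit("load-B")  # waits behind load-A in the same queue
q2.submit("load-C")  # may run alongside load-A: it is in another queue

first_round = [q1.run_next(), q2.run_next()]
second_round = [q1.run_next(), q2.run_next()]
```

In this model, load-A and load-C are picked up in the same round because they sit in different queues, while load-B has to wait for the next round.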

System Queues and Clusters

Specific queues and a specific cluster exist for administrative jobs:

  • For each data location cluster, a specific System Queue called SEM_SYS_QUEUE is automatically created. This queue is used to run administrative operations for the data location. For example, this queue processes the deployment jobs updating the data structures in the data location.

  • A specific System Cluster called SEM_SYS_CLUSTER, which contains a single SEM_SYS_QUEUE queue, is used to run platform-level maintenance operations.

Job Priority

As a general rule, integration jobs are processed in their queues, and have the same priority.
There are two exceptions to this rule:

  • Jobs updating the data location, principally model edition Deployment Jobs.

  • Platform Maintenance Jobs that update the entire platform.

Model Edition Deployment Job

When a new model edition is deployed and requires data structure changes, DDL commands are issued as part of a job called DB Install<model edition name>. This job is launched in the SEM_SYS_QUEUE queue of the data location cluster.

This job modifies the tables used by the DML statements from the integration jobs. As a consequence, it needs to run while no integration job runs. This job takes precedence over all other queued jobs in the cluster, which means that:

  1. Jobs currently running in the cluster are completed normally.

  2. All the queues in the cluster, except the SEM_SYS_QUEUE, are moved to a BLOCKED status. Queued jobs remain in the queue and are no longer executed.

  3. The model edition deployment job is executed in the SEM_SYS_QUEUE.

  4. When this job is finished, the other queues return to the READY status and resume the processing of their queued jobs.

This execution model guarantees minimal downtime of the integration activity while avoiding conflicts between integration jobs and model edition deployment.
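The four steps above can be sketched as a small timeline for a single data location cluster. This is an illustrative model only: the function name, the event labels and the job names are hypothetical, and the real engine is not driven this way.

```python
def deploy_model_edition(queue_names, running_jobs, deployment_job):
    """Return the ordered events of a deployment in one cluster (sketch)."""
    events = []
    # 1. Jobs currently running in the cluster complete normally.
    for job in running_jobs:
        events.append(("completed", job))
    # 2. Every queue except SEM_SYS_QUEUE moves to BLOCKED.
    blocked = {
        name: "READY" if name == "SEM_SYS_QUEUE" else "BLOCKED"
        for name in queue_names
    }
    events.append(("status", blocked))
    # 3. The deployment job runs in SEM_SYS_QUEUE.
    events.append(("executed", deployment_job))
    # 4. The other queues return to READY and resume their queued jobs.
    events.append(("status", {name: "READY" for name in queue_names}))
    return events


timeline = deploy_model_edition(
    ["SEM_SYS_QUEUE", "Default"], ["load-1"], "DB Install"
)
```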

Platform Maintenance Job

If a job is queued in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE queue, it takes precedence over all other queued jobs in the execution engine.

This means that:

  1. Jobs currently running in all the clusters are completed.

  2. All the clusters and queues except the SEM_SYS_CLUSTER/SEM_SYS_QUEUE are moved to a BLOCKED status. Queued jobs are no longer executed in these queues/clusters.

  3. The job in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE is executed.

  4. When this job is finished, the other queues are moved to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimal disruption of the platform activity while avoiding conflicts between the platform activity and maintenance operations.

Queue Behavior on Error

When a job running in a queue encounters a run-time error, it behaves differently depending on the queue configuration:

  • If the queue is configured to Suspend on Error, the job hangs on the error point, and blocks the rest of the queued jobs. This job can be resumed when the cause of the error is fixed, or can be canceled by user choice.

  • If the queue is not configured to Suspend on Error, the job fails automatically and the next jobs in the queue are executed. The failed job cannot be restarted.

Integration jobs are processed in FIFO mode. A job that has failed automatically or has been canceled by user choice cannot be restarted. To resubmit the source data for certification, the external load needs to be resubmitted entirely as a new load.
The integration job performs a commit after each task. As a consequence, when a job fails or is suspended, already processed entities have their golden data certified and committed in the hub.
A user can explicitly choose to halt a running job by suspending it. When such a user operation is performed, the job is considered in error and can be restarted or canceled.
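The difference between the two queue configurations can be sketched as follows. This is a simplified model under stated assumptions: the process_queue function is hypothetical, each job is reduced to a name and a success flag, and the DONE label is invented for the illustration.

```python
def process_queue(jobs, suspend_on_error):
    """Process (name, succeeds) pairs in FIFO order.

    Returns (job results, queue status, names of jobs still queued).
    """
    results = {}
    for index, (name, succeeds) in enumerate(jobs):
        if succeeds:
            results[name] = "DONE"  # hypothetical label for a successful job
        elif suspend_on_error:
            # The job stops at the error point and blocks the jobs behind it.
            results[name] = "SUSPENDED"
            remaining = [queued for queued, _ in jobs[index + 1:]]
            return results, "SUSPENDED", remaining
        else:
            # The job fails for good and the queue moves on to the next job.
            results[name] = "ERROR"
    return results, "READY", []


jobs = [("load-1", True), ("load-2", False), ("load-3", True)]
with_suspend = process_queue(jobs, suspend_on_error=True)
without_suspend = process_queue(jobs, suspend_on_error=False)
```

With Suspend on Error, load-3 stays queued until load-2 is restarted or canceled. Without it, load-3 runs immediately and load-2 remains in error.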

Suspending a job on error is the preferred configuration under the following assumptions:

  1. All the data in a batch needs to be integrated as one single atomic operation.
    For example, due to referential integrity, it is not possible to integrate contacts without customers and vice versa. Suspending the job guarantees that it can be continued - after fixing the cause of the error - with the data location preserved in the same state.

  2. Batches and integration jobs are submitted in a precise sequence that represents the changes in the source, and need to be processed in the order they were submitted.
    For example, skipping a data value change in the suspended batch may impact the consolidation of future batches. Suspending the job guarantees that the jobs are processed in their exact submission sequence, and no batch is skipped without an explicit user choice.

There may be some cases when this behavior can be changed:

  • If the batches/jobs do not have strong integrity or sequencing requirements, then they can be skipped on error by default. These jobs can run in a queue where Suspend on Error is disabled.

  • If the integration velocity is critical for making golden data available as quickly as possible, it is possible to configure the queue running the integration job with Suspend on Error disabled.

Queue Status

A queue is in one of the following statuses:

  • READY: The queue is available for processing jobs.

  • SUSPENDED: The queue is blocked because a job has encountered an error or was suspended by the user. This job remains suspended. Queued jobs are not processed until the queue becomes READY again, either when the job is canceled or finishes successfully. For more information, see the Troubleshooting Errors section.

  • BLOCKED: When a job is running in the SEM_SYS_QUEUE queue of the cluster, the other queues are moved to this status. Jobs cannot be executed in a blocked queue and remain queued until the queue becomes READY again.

A cluster can be in one of the following statuses:

  • READY: The cluster is not blocked by the SEM_SYS_CLUSTER cluster, and queues under this cluster can process jobs.

  • BLOCKED: The cluster is blocked when a job is running in the SEM_SYS_CLUSTER cluster. When a cluster is blocked, all its attached queues are also blocked.
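The status rules above can be summarized in a short sketch. The function names and parameters are hypothetical. The sketch also assumes that BLOCKED takes precedence over SUSPENDED when both conditions apply, which this guide does not specify.

```python
SYS_CLUSTER = "SEM_SYS_CLUSTER"
SYS_QUEUE = "SEM_SYS_QUEUE"


def cluster_status(cluster, platform_job_running):
    # A cluster is blocked while a job runs in the SEM_SYS_CLUSTER cluster.
    if cluster != SYS_CLUSTER and platform_job_running:
        return "BLOCKED"
    return "READY"


def queue_status(cluster, queue, platform_job_running,
                 sys_queue_job_running, has_suspended_job):
    # When a cluster is blocked, all its attached queues are blocked too.
    if cluster_status(cluster, platform_job_running) == "BLOCKED":
        return "BLOCKED"
    # A job in the cluster's SEM_SYS_QUEUE blocks the cluster's other queues.
    if queue != SYS_QUEUE and sys_queue_job_running:
        return "BLOCKED"
    # Assumption: SUSPENDED is only reported when the queue is not blocked.
    if has_suspended_job:
        return "SUSPENDED"
    return "READY"
```

For example, queue_status("CustomerHub", "Default", False, True, False) describes an integration queue during a model edition deployment in a hypothetical CustomerHub data location.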

Managing the Execution Engine and the Queues

Accessing the Execution Engine

To access the execution engine:

  1. In the Administration view, double-click the Execution Engine node.
    The Execution Engine editor opens.

The Execution Engine Editor

This editor displays the list of queues, grouped by clusters. If a queue is currently pending on a suspended job, it appears in red.

From the Execution Engine editor, you can perform the following operations:

Changing the Queue Behavior on Error

See the Troubleshooting Errors and the Queue Behavior on Error sections for more information about queue behavior on error and error management.

To change the queue behavior on error:

  1. In the Administration view, double-click the Execution Engine node. The Execution Engine editor opens.

  2. Select or de-select the Suspend on Error option for a queue to set its behavior on error or on a cluster to set the behavior of all queues in this cluster.

  3. Press CTRL+S to save the configuration. This configuration is immediately active.

Opening an Execution Console for a Queue

The execution console provides the details of the activity of a given queue. This information is useful to monitor the activity of jobs running in the queue, and to troubleshoot errors.

The content of the execution console is not persisted. Executions prior to opening the console are not displayed in this console. Besides, if the console is closed, its content is lost.

To open the execution console:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears.

  2. Select the queue, right-click and select Open Execution Console.
    The Console view for this queue opens. Note that it is possible to open multiple execution consoles to monitor the activity of multiple queues.

In the Console view toolbar you have access to the following operations:

  • The Close Console button closes the current console. The consoles for the other queues remain open.

  • The Clear Console button clears the content of the current console.

  • The Display Selected Log button allows you to select one of the execution consoles currently open.

Suspending a Job Running in a Queue

To suspend a job running in a queue:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears.

  2. Select the queue that contains one Running Job.

  3. Right-click and then select Suspend Job.
    The job is suspended and the queue switches to the SUSPENDED status.

Suspending a job is an operation that should be performed with care, as respecting the sequence of the submitted jobs has a strong impact on the consistency of the data in the hub.

Restarting a Suspended Job in a Queue

To restart a suspended job in a queue:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears. The suspended queue appears in red.

  2. Select the suspended queue.

  3. Right-click and then select Restart Job.
    The job restarts from the failed step. If the execution console for this queue is open, the details of the execution are shown in the Console.

Canceling a Suspended Job in a Queue

To cancel a suspended job in a queue:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears. The suspended queue appears in red.

  2. Select the suspended queue.

  3. Right-click and then select Cancel Job.
    The job is canceled, the queue becomes READY and starts processing queued jobs.
    In the job logs, this job appears in Error status.

Managing Jobs Logs

The job logs display the jobs being executed or executed in the past by the execution engine. Reviewing the job logs allows you to monitor the activity of these jobs and troubleshoot execution errors.

Accessing the Job Logs

To access the logs:

  1. Open the Administration Console perspective.

  2. In the Administration View, double-click the Job Logs node.

  3. The Job Logs editor opens.

The Job Logs Editor

From this editor you can review the job execution logs and drill down into these logs.

The following actions are available from the Job Logs editor toolbar.

  • Use the Refresh button to refresh the view.

  • Use the Auto Fit Column Width button to adjust the size of the columns.

  • Use the Apply and Manage User-Defined Filters button to filter the log. See the Filtering the Logs section for more information.

  • Use the Purge Selection button to delete the entries selected in the job logs table. See the Purging the Logs section for more information.

  • Use the Purge using a Filter button to purge logs using an existing or a new filter. See the Purging the Logs section for more information.

Drilling Down into the Logs

The Job Logs editor displays the log list. This view includes:

  • The Name, Start Date, End Date and Duration of the job as well as the name of its creator (Created By).

  • The Message returned by the job execution. This message is empty if the job is successful.

  • The rows statistics for the Job:

    • Select Count, Insert Count, Update Count, Deleted Count: number of rows selected, inserted, updated, deleted, merged as part of this job.

    • Row Count: Sum of all the Select, Insert, etc. metrics.

To drill down into the logs:

  1. Double-click on a log entry in the Job Logs editor.

  2. The Job Log editor opens. It displays all the information available in the job logs list, plus:

    • The Job Definition: This link opens the job definition for this log.

    • The Job Log Parameters: The startup parameters for this job. For example, the Batch ID and Load ID.

    • The Tasks: In this list, task groups are displayed with the statistics for this integration job instance.

  3. Double-click a task group in the Tasks list to drill down into sub-task groups or tasks.

  4. Click on the Task Definition link to open the definition of a task.

By drilling down into the task groups down to the task, it is possible to monitor the activity of a job, and review in the definition the executed code or plug-in.

Filtering the Logs

To create a job log filter:

  1. In the Job Logs editor, click the Apply and Manage User-Defined Filters button and then select Search. The Define Filter dialog opens.

  2. Provide the filtering criteria:

    • Job Name: Name of the job. Use the _ and % wildcards to represent one or any number of characters.

    • Created By: Name of the job creator. Use the _ and % wildcards to represent one or any number of characters.

    • Status: Select the list of job statuses included in the filter.

    • Only Include: Check this option to limit the filter to the logs before/after a certain number of executions or a certain point in time. Note that the time considered is the job start time.

  3. Click the Save as Preferred Filter option and enter a filter name to save this filter.

Saved filters appear when you click the Apply and Manage User-Defined Filters button.
You can enable or disable a filter by marking it as active or inactive from this menu. You can also use the Apply All and Apply None options to enable or disable all saved filters.

Filters are saved in the user preferences and can be shared using preferences import/export.
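The _ and % wildcards accepted by the Job Name and Created By criteria behave like pattern placeholders: _ stands for exactly one character and % for any number of characters. The sketch below shows one way to reproduce this matching outside the product, for example to check a pattern before saving it. The helper names and job names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern using _ and % into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # any number of characters, including none
        elif char == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, value):
    # The whole value must match, not just a part of it.
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical job names
starts_with = matches("INTEGRATE%", "INTEGRATE_CUSTOMERS")
one_char = matches("LOAD_1", "LOAD-1")
too_long = matches("LOAD_1", "LOAD-12")
```

Note that an underscore in a job name is itself matched by the _ wildcard, so a pattern such as LOAD_1 also matches LOAD-1.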

To manage job log filters:

  1. Click the Apply and Manage User-Defined Filters button, then select Manage Filters. The Manage User Filters editor opens.

  2. From this editor, you can add, delete or edit a filter, and enable or disable filters for the current view.

  3. Click Finish to apply your changes.

Purging the Logs

You can purge selected job logs or all job logs returned by a filter.

To purge selected job logs:

  1. In the Job Logs editor, select the job logs you want to purge. Press the CTRL key to select multiple lines or the SHIFT key to select a range of lines.

  2. Click the Purge Selection button.

  3. Click OK in the confirmation window.
    The selected job logs are deleted.

To purge filtered job logs:

  1. In the Job Logs editor, click the Purge using a Filter button.

    • To use an existing filter:

      1. Select the Use Existing Filter option.

      2. Select a filter from the list and then press Finish.

    • To create a new filter:

      1. Select the Define New Filter option and then click Next.

      2. Provide the filter parameters, as explained in the Filtering the Logs section and then click Finish.

    • To purge all logs (no filter):

      1. Select the Purge All Logs (No Filter) option and then click Finish.

The job logs are purged.

Troubleshooting Errors

When a job fails, depending on the configuration of the queue in which this job runs, it is either in a Suspended or Error status.

The status of the job defines the possible actions on this job.

  • A job in Error cannot be continued or restarted. It can be reviewed for analysis, and possible fixes will only affect subsequent jobs.

  • A Suspended job blocks the entire queue, and can be restarted after fixing the problem, or cancelled.

Semarchy xDM provides several capabilities to help you troubleshoot issues. You can drill down into the erroneous task to identify the issue, or restart the job with the Execution Console open.

To troubleshoot an error:

  1. Open the Job Logs.

  2. Double-click the log entry marked as Suspended or Error.

  3. Drill down into the Task Log, as explained in the Drilling Down into the Logs section.

  4. In the Task Log, review the Message.

  5. Click the Task Definition link to open the task definition and review the SQL Statements involved, or the plug-in called in this task.

Scheduling Data Purges

Data Purge helps you maintain a reasonable storage volume for the MDM hub and the repository by pruning the history of data changes and job logs.

Introduction to Data Purge

The MDM hub stores the lineage and history of the certified golden data, that is the data that led to the current state of the golden data.

Preserving the lineage and history is a master data governance requirement. It is key in a regulatory compliance focus. However, keeping this information may also create a large volume of data in the hub storage.

To make sure lineage and history are preserved according to the data governance and compliance requirements, model designers will want to define a Data Retention Policy for the model.

When a model is deployed to a data location, a Purge Job is automatically created to handle data pruning according to the retention policy. The purge job prunes the lineage and history data according to the retention policy. Optionally, it prunes the job logs, batches, loads, direct authoring, duplicate manager and workflow instances when all their data is purged.

To keep a reasonable volume of information, administrators have to schedule regular executions of this job.

Configuring a Purge Schedule

To create a purge schedule:

  1. Open the Data Locations perspective.

  2. Expand the data location for which you want to configure a purge.

  3. Double-click the Purge node. The Purge Schedule editor opens.

  4. Select or un-select the Active checkbox to make the purge schedule active or inactive.

  5. Click the Edit button, and set the schedule for the purge with a purge frequency (Monthly, Weekly, Daily) or as a Cron Expression.

  6. Click OK to save the schedule.

  7. Select the Purge Repository Artifacts option to prune the job logs, batches, loads, direct authoring, duplicate manager and workflow instances when all their data is purged.

  8. Press CTRL+S to save the editor.

Regardless of the frequency of the purges scheduled by the administrator, the data history retained is as defined by the model designer in the data retention policies.
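This guide does not describe the syntax expected in the Cron Expression field. The sketch below assumes Quartz-style expressions (seconds, minutes, hours, day of month, month, day of week), an assumption suggested only by the org.quartz logger in the default logging configuration. The quartz_cron helper is hypothetical. Verify the expected syntax in your environment before using such values.

```python
def quartz_cron(minute, hour, day_of_month=None, day_of_week=None):
    """Build a Quartz-style cron string (assumed format, see lead-in)."""
    if day_of_month is not None and day_of_week is not None:
        # Quartz expects '?' in one of the two day fields.
        raise ValueError("set day_of_month or day_of_week, not both")
    if day_of_week is not None:
        return f"0 {minute} {hour} ? * {day_of_week}"
    if day_of_month is not None:
        return f"0 {minute} {hour} {day_of_month} * ?"
    return f"0 {minute} {hour} * * ?"


daily = quartz_cron(30, 2)                     # every day at 02:30
weekly = quartz_cron(0, 3, day_of_week="SUN")  # every Sunday at 03:00
monthly = quartz_cron(0, 1, day_of_month=1)    # first day of each month at 01:00
```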

Managing the Security

The application uses role-based security and privilege grants for accessing the Semarchy xDM features as well as the data contained in the MDM hub.

Understanding the Security Model

Platform-Level and Model-Level Security

There are two levels of security in Semarchy xDM:

  • Platform-Level Security defines access to the features of the platform. For example, access to the administrative features, or access to the design-time capabilities. Platform-level security sets platform users’ privileges (who can design models, monitor executions, manage security, etc.), and should be managed by the platform administrator.

  • Model-Level Security defines security privileges to access and modify data in the data locations. Defining these privileges is a data governance decision and should be defined as part of the data governance initiative. Defining Model Security is covered in the Securing Data chapter of the Semarchy xDM Developer’s Guide.

Role-Based Security

Both levels of security are role-based:

  • The Privileges (platform level/model level) are granted to Roles in Semarchy xDM.

  • These Roles are declared in Semarchy xDM. The roles declared in Semarchy xDM must map to roles that pre-exist in the application server. These application server roles are created and granted to application server users as part of the application server configuration.

  • Users logging in to Semarchy xDM use their application server credentials. Users, passwords, groups and roles are not owned or stored in Semarchy xDM.

Depending on the application server hosting the Semarchy xDM application, the roles/user association may be made through a concept of group: A user belongs to a group and the role is granted to the group.
Depending on its configuration, the application server may delegate user authentication and management in general to a security provider (SSO, LDAP, etc.).

Note that roles are not only used for security purposes. They are also used as email aliases for email notifications.

Security Context

When you log in to the Semarchy Workbench:

  1. You enter the user and password in the Semarchy xDM login window.

  2. This information is passed to the application server.

  3. The application server itself or its security provider (SSO, LDAP, etc.) authenticates the user, gets the list of roles associated to this user (possibly via groups) and returns this list of roles in the session’s Security Context.

  4. Semarchy xDM starts a session with this security context, allowing:

    • Certain platform features depending on the Platform-Level Privileges granted to the roles.

    • Certain data access/modification capabilities depending on the Model-Level Privileges granted to the roles.

Role Names are Case-Sensitive
The role names defined in Semarchy must exactly match the role names returned by the security provider.
For example, for a Tomcat installation, you define in the tomcat-users.xml file a role named FINANCEADMIN, and in Semarchy, you declare a FinanceAdmin role. These two roles will not match. Tomcat users with the FINANCEADMIN role will not be granted the privileges defined in Semarchy for the FinanceAdmin role.
Semarchy xDM enforces security at several layers in the application. Insufficient privileges for a user will be reflected in the user interface as missing elements or disabled menus. For example, a user with no privileges on Data Location will not see any of the Data Location links in his Overview perspective.
Semarchy xDM does not store the users, passwords and user/role associations. All this critical information remains in the application server or in the enterprise security provider.
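A case mismatch such as FINANCEADMIN versus FinanceAdmin is easy to miss. The following sketch compares two lists of role names with exact, case-sensitive matching and reports names that differ only by case. The match_roles function is a hypothetical helper, and the role lists must be collected by hand from Semarchy xDM and from the security provider.

```python
def match_roles(declared_roles, provider_roles):
    """Return (exact matches, case-only mismatches) between two role lists."""
    provider = set(provider_roles)
    matched = sorted(role for role in declared_roles if role in provider)
    # Index provider roles by lowercase name to detect case-only differences.
    by_lowercase = {role.lower(): role for role in provider_roles}
    mismatches = sorted(
        (role, by_lowercase[role.lower()])
        for role in declared_roles
        if role not in provider and role.lower() in by_lowercase
    )
    return matched, mismatches


matched, mismatches = match_roles(
    ["semarchyConnect", "FinanceAdmin"],   # declared in Semarchy xDM
    ["semarchyConnect", "FINANCEADMIN"],   # returned by the security provider
)
```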

Privilege Precedence

Privileges apply in order of precedence: Read/Write then Read then None. As a consequence, a user always has the best privileges associated to his roles.

For example: The user John has two user-defined roles granted to him:

  • ModelDesigner has Read privileges for Job and Job Log Administration and Read/Write for Model Design.

  • ProductionUser has Read/Write privileges for Job and Job Log Administration and None for Model Design.

The resulting privileges for John are Read/Write for both Job and Job Log Administration and Model Design.
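The precedence rule amounts to taking, for each privilege, the highest level granted by any of the user's roles. A minimal sketch, using the example of John above; the LEVELS ranking and the effective_privileges function are hypothetical names introduced for the illustration:

```python
# Ranking of privilege levels, lowest to highest.
LEVELS = {"None": 0, "Read": 1, "Read/Write": 2}


def effective_privileges(role_grants, user_roles):
    """Return the best level per privilege across all the user's roles."""
    effective = {}
    for role in user_roles:
        for privilege, level in role_grants.get(role, {}).items():
            best = effective.get(privilege, "None")
            effective[privilege] = max(best, level, key=LEVELS.__getitem__)
    return effective


GRANTS = {
    "ModelDesigner": {
        "Job and Job Log Administration": "Read",
        "Model Design": "Read/Write",
    },
    "ProductionUser": {
        "Job and Job Log Administration": "Read/Write",
        "Model Design": "None",
    },
}

john = effective_privileges(GRANTS, ["ModelDesigner", "ProductionUser"])
```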

Privileges Description

The following table describes the platform privileges you can grant to a role:

Platform Privilege Description

Data Location

Grants access to all components of the Data Locations perspective and to the Notification Servers and the Variable Value Providers in the Administration Console perspective. Write privileges are needed to create data locations and deploy new model editions. Write privileges are also required to create and modify variable value providers and notification servers.

Model Design

Grants access to all the components of the Model Administration (to manage model editions/version control) and Model Edition (to design models) perspectives. Write privileges are needed to modify models, create new model editions and manage image libraries.

Execution Engine

Grants access to the Execution Engine and Integration Batch Poller components in the Administration Console Perspective. Write privileges are needed to start/stop and configure these components.

Job and Job Log Administration

Grants access to Job Logs and Job Definitions in the Administration Console Perspective. Write privileges are needed to purge the logs. Note that you need the Execution Engine privileges to restart jobs in queues.

Logging Configuration

Grants access to the Logging Configuration component in the Administration Console Perspective. Write privileges are needed to modify this configuration.

Plug-ins Administration

Grants access to the Plug-ins component in the Administration Console Perspective. Write privileges are needed to add new plug-ins.

Built-in Roles

The following roles are built into the platform:

  • semarchyConnect: This role must be granted for a user to log in. It should be granted by default to all users connecting to Semarchy xDM.

  • semarchyAdmin: This role has full access to all the features of the platform. semarchyAdmin is the only role that gives you access to the Roles in the Administration Console perspective.

When creating a new model, a model-level privilege grant is automatically created for the semarchyAdmin role, giving this role full access to the data. By modifying this privilege grant, the model designer can reduce the privileges of the semarchyAdmin role on the data.
Be cautious when granting the semarchyAdmin role. This role defines a super user who can create roles, grant privileges and update the license information.

Managing Roles and Privileges

Creating the Roles and Users in the Application Server Security Realm

Before declaring a new role in Semarchy xDM, make sure that this role is defined in the application server and that a user is granted this role and the semarchyConnect role to log in to Semarchy xDM.

The role/user creation procedure depends on the application server hosting Semarchy xDM. Please refer to your application server documentation for more information.

An example is given below for creating a role and a user for Apache Tomcat.

To configure a new role and user for Semarchy xDM:

  1. Stop the Apache Tomcat server using <tomcat>/bin/shutdown.bat (Windows) or <tomcat>/bin/shutdown.sh (UNIX/Linux), where <tomcat> is the Apache Tomcat installation folder.

  2. Edit the <tomcat>/conf/tomcat-users.xml file.

  3. In the <tomcat-users> section, add the following lines (<password> is the password for this user):

     <role rolename="MDMDev"/>
     <user username="john" password="<password>" roles="semarchyConnect,MDMDev"/>

  4. Save the file.

  5. Restart the Apache Tomcat server using <tomcat>/bin/startup.bat (Windows) or <tomcat>/bin/startup.sh (UNIX/Linux).

A new role MDMDev is created. The user john is also created with the semarchyConnect built-in role and the MDMDev role.
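Before restarting the server, the edited file can be checked with a short script. The sketch below parses a sample with the same structure as the lines added above and lists the users holding a given set of roles. The users_with_roles function is a hypothetical helper and the password value is a placeholder. To check a real file, read its content and pass it in place of SAMPLE.

```python
import xml.etree.ElementTree as ET

# Sample content; "secret" is a placeholder password for the illustration.
SAMPLE = """<tomcat-users>
  <role rolename="MDMDev"/>
  <user username="john" password="secret" roles="semarchyConnect,MDMDev"/>
</tomcat-users>"""


def users_with_roles(xml_text, required_roles):
    """Return the usernames that hold all the required roles."""
    root = ET.fromstring(xml_text)
    required = set(required_roles)
    found = []
    for element in root.iter():
        # Compare local names so that a namespaced file is handled as well.
        if element.tag.rsplit("}", 1)[-1] != "user":
            continue
        roles = {r.strip() for r in element.get("roles", "").split(",") if r.strip()}
        if required <= roles:
            found.append(element.get("username"))
    return found


developers = users_with_roles(SAMPLE, ["semarchyConnect", "MDMDev"])
```

Because ElementTree rejects malformed XML, the script also fails early on an unclosed element.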

Declaring the Roles in Semarchy xDM

To create a new role:

  1. Open the Administration Console perspective.

  2. In the Administration View, double-click the Roles node.

  3. In the Roles editor, right-click the Roles table and select New Role. The Install Role wizard opens.

  4. Enter the following information:

    • Name: Role name. This role name must exactly match the role name in the application server security configuration. For example: MDMDev.

    • Label: User-friendly label for the role. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

    • Email(s): Enter a comma-separated list of email addresses of recipients for notifications sent to this role.

  5. Click Next.

  6. Select the privileges to grant to this role. For example: Model Design: Read/Write, Job and Job Log Administration: Read.

  7. Click Finish.
    The role is created. You can connect a user with this role to test the set of privileges.

Make sure to use a role name that matches exactly (with the same case) a role name defined in the application server configuration.

Sample Roles

You can use the following role examples in a typical Semarchy xDM configuration:

Platform Privilege               Dev          Operator     Deployer

Data Location                    Read         Read         Read/Write
Model Design                     Read/Write   None         Read
Execution Engine                 Read         Read/Write   Read
Job and Job Log Administration   Read         Read/Write   Read
Logging Configuration            None         Read/Write   None
Plug-ins Administration          None         Read         Read/Write

These roles are given as examples and should be adapted to your environment’s requirements.