
Welcome to Semarchy xDM.
This guide contains information about using Semarchy xDM to deploy and manage models and applications.

Preface

Overview

Using this guide, you will learn how to:

  • Manage the lifecycle and versions of the logical models and applications developed in Semarchy Application Builder.
  • Deploy these models and applications for run-time.
  • Manage the execution of the deployed models and applications.
  • Configure the data locations hosting these deployments.

Audience

This document is intended for users interested in using Semarchy xDM for their data management initiatives.

To discover Semarchy xDM, you can watch our tutorials.
The Semarchy xDM Documentation Library, including the development, administration, and installation guides, is available online.

Document Conventions

This document uses the following formatting conventions:

  • boldface: Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept.
  • italic: Italic type indicates special emphasis or a placeholder variable for which you must provide a value.
  • monospace: Monospace type indicates code examples, text, or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.

Obtaining Help

There are several ways to reach Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any errors or have any suggestions for improvement, please email support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Introduction to Semarchy xDM

Semarchy xDM is the Intelligent Data Hub platform for Master Data Management (MDM), Reference Data Management (RDM), Application Data Management (ADM), Data Quality, and Data Governance.
It provides all the features for data quality, data validation, data matching, de-duplication, data authoring, workflows, and more.

Semarchy xDM brings extreme agility for defining and implementing data management applications and releasing them to production. The platform can be used as the target deployment point for all the data in the enterprise or in conjunction with existing data hubs to contribute to data transparency and quality.
Its powerful and intuitive environment covers all use cases for setting up a successful data governance strategy.

Introduction to Application Management

Application management consists of:

  • Managing models, which includes their lifecycle, versions, export and import.
  • Managing model deployments into data locations.
  • Managing and monitoring job executions.
  • Managing the configuration of the data locations.

Application management tasks are performed in the Semarchy Application Builder, mainly using the Management perspective.

The Semarchy xDM REST API exposes all the capabilities available in the Management user interface to support scripted deployment and management.
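
For example, a deployment script can call the REST API over HTTP with basic authentication. The minimal Java sketch below is only an illustration: the exact endpoint paths, payloads, and authentication options must be taken from the Semarchy xDM REST API reference, and the host, port, credentials, and resource path used here are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class ManagementApiCall {
    public static void main(String[] args) throws Exception {
        // Placeholders: adjust the base URL and credentials to your installation.
        String baseUrl = "http://localhost:8080/semarchy";
        String auth = Base64.getEncoder().encodeToString("user:password".getBytes());

        HttpRequest request = HttpRequest.newBuilder()
                // Hypothetical resource path: consult the REST API reference.
                .uri(URI.create(baseUrl + "/api/..."))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + "\n" + response.body());
    }
}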

Introduction to the Semarchy Application Builder

Connecting

To access Semarchy Application Builder, you need a URL, a user name, and a password provided by your Semarchy xDM administrator.

To log in to the Semarchy Application Builder:

  1. Open your web browser and connect to the URL provided by your administrator, for example http://<host>:<port>/semarchy/, where <host> and <port> represent the name and port of the host running the Semarchy xDM application. The Login Form appears.
  2. Enter your user name and password.
  3. Click Log In. The Semarchy xDM Welcome page opens.
  4. Click the Application Builder button. The Semarchy xDM Application Builder opens on the Model Design view.

To log out of the Semarchy Application Builder:

  1. In the Semarchy Application Builder User Menu, select Log Out.

The Application Builder

The Semarchy Application Builder is the graphical interface used by Semarchy xDM model designers and managers. This user interface exposes information in panels called Views and Editors.
A given layout of views and editors is called a Perspective.


The following sections describe the components of the Application Builder user interface.

Perspectives

There are two perspectives in Semarchy Application Builder:

  • Model Design: this perspective is used to edit or view a model edition.
  • Management: this perspective is used to:
    • Create and manage data locations, and deploy model editions in these locations.
    • Manage model editions and branches.
    • Manage Job executions.

You switch from one perspective to another by clicking the Design or Management button in the Application Builder header.

This guide covers model and application management tasks, which mainly use the Management perspective. Tasks involving the Model Design perspective are detailed in the Semarchy xDM Developer’s Guide.

Tree Views

When a perspective opens, a tree view showing the objects you can work with in this perspective appears on the left-hand side of the screen.
In this tree view you can:

  • Expand and collapse nodes to access child objects.
  • Double-click a node to open the object’s editor.
  • Right-click a node to access all possible operations with this object.

Outline

Certain perspectives include an Outline view that appears on the left-hand side of the screen.
This view shows the object currently opened in the editor (and all its child objects) in a tree view.

You can use the same expand, double-click, right-click actions in the outline as in the tree view.

Editors

An object currently being viewed or edited appears in an editor in the central part of the screen.
You can have multiple editors open at the same time, each editor appearing in a different tab.

Editor Organization

Editors are organized as follows:

  • The editor has a local toolbar used for editor-specific operations. For example, refreshing or saving the content of an editor is performed from this toolbar.
  • The editor has a breadcrumb that allows navigating up in the hierarchy of objects.
  • The editor has a sidebar which shows the various sections and child objects attached to an editor. You can click in this sidebar to jump to one of these sections.
  • Properties in the editors are organized into groups, which can be expanded or collapsed.

Saving an Editor

When the object in an editor is modified, the editor tab is displayed with a star in the tab name. For example, Contact* indicates that the content of the Contact editor has been modified and needs to be saved.
To save an editor, either:

  • Click the Save button in the editor toolbar.
  • Use the CTRL+S key combination.

You can also use the Save All button in the tree view toolbar, or press SHIFT+CTRL+S to save all modified editors.

Closing an Editor

To close an editor:

  • Click the Close (Cross) icon on the editor’s tab.
  • Use the Close option in the editor’s context menu (right-click on the editor’s tab).

You can also use the Close All option in the editor’s context menu (right-click on the editor’s tab) to close all open editors.

Accelerating Editing with CamelCase

In the editors and dialogs in Semarchy Application Builder, the Auto Fill checkbox accelerates object creation and editing.
When this option is checked and the object name is entered using CamelCase, the object Label and Physical Name are automatically generated.
For example, when creating an entity, if you type ProductPart in the name, the label is automatically filled in with Product Part and the physical name is set to PRODUCT_PART.
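
The exact generation rules are internal to the product, but the following minimal Java sketch approximates the behavior described above, deriving the label and the physical name from a CamelCase object name:

public class AutoFillDemo {
    public static void main(String[] args) {
        String name = "ProductPart";
        // Insert a space before each uppercase letter that follows a lowercase letter or digit.
        String label = name.replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ");
        // The physical name uses the same word split, joined with underscores, in uppercase.
        String physicalName = label.replace(' ', '_').toUpperCase();
        System.out.println(label);        // Product Part
        System.out.println(physicalName); // PRODUCT_PART
    }
}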

Deleting Objects

The Semarchy xDM model is designed to maintain its consistency.

When deleting an object (for example using the Delete context menu action) that is referenced by other objects, an alert Unable to Delete Object appears.

This dialog indicates which references prevent the object deletion. Remove these references before deleting this object.

Working with Other Views

Other views (for example: Progress, Validation Report) appear in certain perspectives. These views are perspective-dependent.

You can open or re-open a view using the Show View… User Menu item.

User Preferences

User preferences are available to configure the Application Builder behavior.

Setting the Preferences

Use the Preferences item in the User Menu to open the Preferences dialog.

The following preferences are available in the Preferences dialog:

  • General Preferences:
    • Exit Confirmation: Prompts the user for confirmation when leaving the Application Builder.
    • Date Format: Format used to display date values in the Application Builder. This format uses Java SimpleDateFormat patterns (see the example after this list).
    • DateTime Format: Format used to display date and time values in the Application Builder. This format also uses Java SimpleDateFormat patterns.
    • Link Perspective to Active Editor: Select this option to automatically switch to the perspective related to an editor when selecting this editor.
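
For reference, the following minimal Java sketch shows how such patterns render dates; the yyyy-MM-dd and dd/MM/yyyy HH:mm:ss patterns are arbitrary examples, and any pattern supported by the SimpleDateFormat class can be entered in these preferences:

import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatDemo {
    public static void main(String[] args) {
        Date now = new Date();
        // A possible Date Format pattern, for example: 2024-01-31
        System.out.println(new SimpleDateFormat("yyyy-MM-dd").format(now));
        // A possible DateTime Format pattern, for example: 31/01/2024 14:05:59
        System.out.println(new SimpleDateFormat("dd/MM/yyyy HH:mm:ss").format(now));
    }
}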

Exporting and Importing User Preferences

Sharing preferences between users is performed using preferences import/export.

To export user preferences:

  1. Select the Export item in the User Menu. The Export wizard opens.
  2. Select Export Preferences in the Export Destination and then click Next.
  3. Click the Download Preferences File link to download the preferences to your file system.
  4. Click Finish to close the wizard.

To import user preferences:

  1. Select the Import item in the User Menu. The Import wizard opens.
  2. Select Import Preferences in the Import Source and then click Next.
  3. Click the Open button to select an export file.
  4. Click OK in the Import Preferences Dialog.
  5. Click Finish to close the wizard.

Importing preferences replaces all the current user’s preferences with those stored in the preferences file.

Models Management

Semarchy xDM supports metadata versioning out of the box.
When working with a model in the Semarchy Application Builder, the developer works on a Model Edition (version of the model).

Model management includes all the operations required to manage the versions of the models.

Introduction to Model Management

Model Editions

Model changes are developed, managed, and deployed as Model Editions. This version control mechanism allows you to freeze versions (called Editions) of a model at design-time and deploy them for run-time in a Data Location.

A data location always runs a given model edition. This means that this data location contains data organized according to the model structure in the given edition, and that golden data in this data location is processed and certified according to the rules of the given edition.

Model editions are identified by a version number in the <branch>.<edition> format. The branch and edition numbers start at zero and are automatically incremented as you create new branches or editions.

Example 1. Model version number example

The first edition of a model in the first branch has version number [0.0]. The fourth edition of the CustomerAndFinancialMDM model in the second branch has version number [1.3], and is typically referred to as CustomerAndFinancialMDM [1.3].

Open and Closed Model Editions

At a given point in time, a Model Edition is either Open or Closed for editing.
Branching allows you to maintain two or more parallel Branches (lines) of model editions.

  • An Open Model Edition can be modified, and is considered as being designed. When a milestone is reached and the design is considered complete, the model can be closed. When a model edition is closed, a new model edition is created and opened. This new model edition contains the same content as the closed edition.
  • A Closed Model Edition can no longer be modified and cannot be reopened on the same branch. You can only edit this closed edition by reopening it on a different branch.
  • When a model from a previously closed edition needs to be reopened for editing (for example for maintenance purposes), a new branch based on this old edition can be created and a first edition on this branch is opened. This edition contains the same content as the closed edition it originates from.

Actions on Model Editions

Model Editions support the following actions:

  • Creating a New Model creates the first branch and first edition for the model.
  • Closing and Creating a New Edition of the model freezes the model edition in its current state, and opens a new edition of the model for modification.
  • Branching allows you to maintain several parallel branches of the model. You create a branch based on an existing closed model edition when you want to fork the project from this edition, or create a maintenance branch.
  • Deployment, to install or update a model edition in a data location.
  • Export and Import model editions, to transfer them between repositories.

These tasks are explained in detail in the next chapters of this guide.

Typical Model Lifecycle

A typical model lifecycle is described below.

  1. The project or model manager creates a new model and the first model edition in the first branch.
  2. Model and application designers edit the model metadata. They perform their design tasks, as explained in the Semarchy xDM Developer’s Guide.
  3. When the designers reach a level of completion in their implementation, they deploy the model edition for testing, and afterwards deploy it again while pursuing their implementation and tests. Such actions are typically performed in a development data location. Sample data can be submitted to the data location for integration in the hub.
  4. When the first project milestone is reached, the project or model manager:
    1. Closes and creates a new model edition.
    2. Deploys the closed model edition or exports the model edition for deployment on a remote repository.
  5. The project can proceed to the next iteration (go to step 2).
  6. When needed, the project or model manager creates a new branch starting from a closed edition. This may be needed, for example, when a feature or fix must be backported to a closed edition without taking all the changes done in later editions.

Example 2. A Chronological Example

The following example shows the chronological evolution of a model through editions and branching:

  • January: a new CustomerHub model is created with the first branch and the first edition. This is CustomerHub [0.0].
  • March: The design of this model is complete. The CustomerHub [0.0] edition is closed and deployed. The new model edition automatically opened is CustomerHub [0.1].
  • April: Minor fixes are completed on CustomerHub [0.1]. To deploy these to production, the model edition CustomerHub [0.1] is closed, deployed to production and a new edition CustomerHub [0.2] is created and is left open untouched for maintenance.
  • May: A new project to extend the original hub starts. In order to develop the new project and maintain the hub deployed in production, a new branch with a first edition is created, based on the closed CustomerHub [0.1]. Two model editions are now open: CustomerHub [0.2], which will be used for maintenance, and CustomerHub [1.0], into which new developments are added.
  • June: Maintenance fixes need to take place on the hub deployed in production. CustomerHub [0.2] is modified, closed and sent to production. A new edition is created: CustomerHub [0.3].
  • August: The new project completes. CustomerHub [1.0] is now ready for release and is closed before shipping. A new edition CustomerHub [1.1] is created and opened.

The following diagram illustrates the timeline for the editions and branches. Note that the changes in the two branches are totally decoupled. Stars indicate changes made in the model editions:

Month    : January       March      April   May        June     August
Branch 0 : [0.0] -***-> [0.1] -*-> [0.2] ----------*-> [0.3] ----------->
Branch 1 :                    +-branching-> [1.0] -**-**-****-> [1.1] --->

At that stage, two editions on two different branches remain and are open: CustomerHub [1.1] and CustomerHub [0.3].

Considerations for Model Edition Management

The following points should be taken into account when managing the model editions lifecycle:

  • No Model Edition Deletion: It is not possible to delete old model editions. The entire history of the project is always preserved.
  • Use Production Data Locations: Although deploying open model editions is a useful feature for quickly updating a model edition in development, it is not recommended to perform such updates on data locations that host production data, nor to use development data locations for production. The best practice is to have Production Data Locations that only allow deploying closed model editions for production data.
  • Import/Export for Remote Deployment: It is possible to export and import models from both deployment and development repositories. Importing a model edition into a Deployment Repository is only possible if this edition is closed.
  • Avoid Skipping Editions: When importing successive model editions, it is not recommended to skip intermediate editions, as it is not possible to import them at a later time. For example, if you import edition 0.1 of a model and then import edition 0.4, the intermediate editions (0.2 and 0.3) can no longer be imported into this repository.

Detailed Model Lifecycle

This section describes the detailed model development and deployment lifecycle.

Initial Setup and Deployment in Development

The following steps are required when creating a new model in a development environment.

  1. Model managers or designers create a new model. This operation also creates the first edition of the model.
  2. Designers create the first iteration of the model, including the logical model, the certification process rules and the applications.
  3. Designers run a validation when the model is stable and ready for the first tests.
  4. Model managers or designers create a development data location, using the model edition, to deploy and test the model.

The model is deployed and ready for testing. Integration specialists can load data into the data location and use the generated data management applications to view and manage the data.

Making Changes in Development

After the first development round, designers will repeatedly make changes to the model edition and test them in the development environment.

To test these changes in the development data location:

  1. Designers run a model validation to make sure that the model is valid.
  2. Designers or model managers deploy the model edition again, replacing the existing model edition with an updated one (with the same version number).

The updated model is immediately ready for testing. After the update:

  • Integration specialists should consider re-running the data loads for the updated jobs to reprocess the incoming data as needed.
  • Application testers should make sure to click the Refresh option in the application’s user menu (in the upper right corner of the application) to force a full refresh of the run-time application.

Releasing the Model

When the model is complete and tested, it is ready for release.

To release a model:

  1. Designers or model managers close the model edition. This operation freezes the current model edition and opens a new one.
  2. Model managers deploy the closed model edition, either directly into a data location or by exporting it for deployment on a remote repository.

Developing Iteratively after a Release

When you close a model edition, a new model edition (for example, with version number [0.1]) is automatically created.

You can proceed with your next project iteration, starting with this new model edition:

  1. Model designers make changes to this model edition in development until the next project iteration is ready for release.
  2. When ready, model managers release this model edition.

If fixes are required on a previously released model edition, model managers can branch this old model edition, modify, then release it.

Working with Model Editions and Branches

Creating a New Model

Creating a new model creates the model with the first model branch and model edition.

To create a new model:

  1. Open the Model Design view by clicking the Design button in the Application Builder header.
  2. If you are connected to a model edition, click the Switch Model button to close the connected model edition.
  3. In the Model Design view, click the New Model… link. The Create New Model wizard opens.
  4. In the Create New Model wizard, check the Auto Fill option and then enter the following values:
    • Name: Internal name of the object.
    • Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
  5. In the Description field, optionally enter a description for the Model.
  6. Click Finish to close the wizard. The new model is created, and opened in the Model Design view.

A new model is automatically configured to run on the same target database technology as the repository. You can select a different Target Technology in the Model editor.

Closing and Creating a New Model Edition

This operation closes the latest open model edition in a branch and opens a new one. The closed model edition is frozen and can no longer be edited, but can be deployed to production environments.

It is only possible to close an open edition, which is always the latest edition in its branch.
It is only possible to close an edition that is valid.

To close and create a new edition:

  1. In the Management view, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to close.
  2. Right-click the latest model edition of the branch (indicated as opened) and select Close and Create New Edition.
  3. Click OK to confirm closing the model edition.
  4. In the Enter a comment for this new model edition dialog, enter a comment for the new model edition. This comment should explain why this new edition was created.
  5. Click OK. The model is validated, then a new model edition is created and opened in the Model Design view.

Be cautious when closing a model edition. Closing an edition cannot be undone, and a closed edition cannot be reopened.

Branching a Model Edition

Branching a model edition enables restarting and modifying a closed edition of a model. Branching creates a new branch based on a given edition, and opens a first edition of this branch.

It is only possible to branch from closed model editions.
When creating a model, a first branch named <model_name>_root is created with the first model edition.

To create a new branch:

  1. In the Management view, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to branch from.
  2. Right-click the closed edition from which you want to branch and select Create Model Branch From this Edition. The Create New Model Branch wizard opens.
  3. In the Create New Model Branch wizard, check the Auto Fill option and then enter the following values:
    • Name: Internal name of the object.
    • Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
  4. In the Description field, optionally enter a description for the Model Branch.
  5. Click Finish to close the wizard.
  6. In the Model Branch Created dialog, click Yes to open the first edition of this new branch.
  7. The newly created edition opens in the Model Design view.

Target Technology

A model is designed for a given database Target Technology (Oracle, PostgreSQL or SQL Server). Although most of the model content is technology-agnostic, the artifacts generated in the data location schema, as well as the certification process, will use capabilities specific to that technology.

When you create a model, it is automatically configured for the technology of the repository into which it is created. When you design the model, some of the capabilities, for example, the database functions available in SemQL, will depend on that technology.

You can configure the Target Technology value in the Model editor.

Make sure to check the model’s target technology when you start working with a model, or when you import a model from another repository. If you change this technology later, make sure to validate the model to list the possible issues caused by the technology change.

Model Localization

When designing a model, labels, descriptions, and other user-facing text strings are entered to provide a user-friendly experience. These strings are natively externalized in Semarchy xDM and can be translated (localized) into any language.

A user connecting to an application created with Semarchy xDM will see these strings (labels of the entities, attributes, lists of values, etc.) translated into the locale of their web browser if such a translation is available. If no translation is available in their locale for a given text, the default string (for example, the label or description specified in the model) is used. These default strings are the base translation.

Make sure you translate the entire model in a given language to avoid partially translated user interfaces.

Model localization only takes care of the strings defined in the model. Built-in strings such as 'Inbox', 'Filter' or 'Global Search' can be defined using Custom Translations at platform level. See the Managing Custom Translations section in the Semarchy xDM Administration Guide for more information.

Translation Bundles

String translation is performed using Translation Bundles attached to model editions. A translation bundle is a properties file that contains a list of key/value pairs corresponding to the strings localized in a given locale. The translation bundle file is named translations_<locale>.properties, where <locale> is the locale of the translation.

The following example is a sample of a translation bundle file for the English language (translations_en.properties). In this file, the label for the Employee entity is the string Staff Member, and its description is A person who works for our company.

...
Entity.Employee.Label=Staff Member
Entity.Employee.Description=A person who works for our company.
Attribute.Employee.FirstName.Label=First Name
Attribute.Employee.FirstName.Description=First name of the employee
Attribute.Employee.Picture.Label=<New Key TODO>
...
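
Because translation bundles are standard Java properties files, they can be inspected with ordinary tooling. As an illustration only (this is not a product feature), the following Java sketch loads a bundle following the naming convention above and lists the keys still carrying the default <New Key TODO> tag:

import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class BundleCheck {
    public static void main(String[] args) throws Exception {
        Properties bundle = new Properties();
        // The file name follows the translations_<locale>.properties convention.
        try (InputStreamReader reader = new InputStreamReader(
                new FileInputStream("translations_en.properties"), StandardCharsets.UTF_8)) {
            bundle.load(reader);
        }
        // Report keys that have not been translated yet.
        bundle.forEach((key, value) -> {
            if ("<New Key TODO>".equals(value)) {
                System.out.println("Untranslated key: " + key);
            }
        });
    }
}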

Translating a Model

To translate a model:

  1. The translation bundles are exported for the language(s) requiring translation in a single zip file.
  2. Each translation bundle is translated by a translator using their translation tooling.
  3. The translated bundles are re-imported into the model edition (either each file at a time, or as a single zip file).

To export translation bundles:

  1. In the Management view, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to localize.
  2. Right-click the model edition and then select Export Translation Bundles…. The Export Translation Bundles wizard opens.
  3. Select the languages you want to translate.
  4. Select Export Base Bundle if you also want to export the base bundle for reference. The base bundle contains the default strings, and cannot be translated.
  5. Select the Encoding for the exported bundles. Note that the encoding should be UTF-8 unless the language you want to translate or the translation tooling has other encoding requirements.
  6. In Values to Export, select the export type:
    • All exports all the keys with their current translated value. If a key is not translated yet, the value exported is the one specified by the Default Values option.
    • New keys only exports only the keys not translated yet.
    • All except removed keys exports all keys available, excluding those with no corresponding objects in the model. For example, the key for the description of an attribute that was deleted from a previous model edition will not be exported.
  7. In Default Values, select the value to set for new keys (keys with no translation in the language).
    • Use values for base bundle sets the value to the base bundle value.
    • Use the defined tag sets the value to the tag specified in the field at the right of the selection (defaults to <New Key TODO>).
    • Leave Empty sets the value to an empty string.
  8. Click OK to download the translation bundles in a zip format and then Close to close the wizard.

To import translation bundles:

  1. In the Management view, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to localize.
  2. Right-click the model edition and then select Import Translation Bundles…. The Import Translation Bundles wizard opens.
  3. Click the Open button and select the translation file to import. This file should be either a properties file named translations_<locale>.properties or a zip file containing several of these properties files.
  4. In the Language to Import table, select the language translations you want to import.
  5. Select the Encoding for the import. Note that this encoding should correspond to the encoding of the properties files you import.
  6. Select Cleanup Removed Keys During Import if you want to remove the translations for keys that are no longer used in the model.
  7. Click Finish to run the import.

The translations for the model edition in the selected languages are updated with those contained in the translation bundles. If the Cleanup Removed Keys During Import was selected, translations in these languages no longer used in the model are removed.

Translation and Model Edition Lifecycles

The lifecycle of the translations is entirely decoupled from the model edition and deployment lifecycle:

  • It is possible to modify the translations of open or closed model editions, including deployed model editions in production data locations.
  • Translation changes on deployed model editions are taken into account dynamically when a user accesses an application defined in this model edition.

Decoupling the translation lifecycle from the model edition avoids binding the critical model development and release process to the translation process, as the latter is frequently managed by a separate team. This also allows adding new translations or fixing translations without having to re-deploy a new model edition.

When creating a new model edition, the translations from the previous model edition are not copied to the next edition. It is necessary to export and import translations between editions.

Deployment

Deployment is the process of installing a model, with its certification process, into a run-time environment (for production or development).

Introduction to Deployment

Deployment consists of deploying a Model Edition in a Data Location (a database schema accessed via a JDBC datasource).

Once this model edition is deployed, it is possible to load, access and manage data in the data location using the applications defined in the model.

In this process, the following components are involved:

  • A Data Location is a database schema into which successive Model Editions will be deployed. This data location is declared in Semarchy xDM, and uses a JDBC datasource defined in the application server.
  • In a data location, a Deployed Model Edition is a model version deployed at a given time in a data location. As a model evolves over time, for example to include new entities or functional areas, new model editions are created and then deployed. The Deployment History tracks the successive model editions deployed in the data location.

In the deployment process, you can maintain as many data locations as you want in Semarchy xDM, but a data location is always attached to one repository. You can deploy successive model editions into a data location, but only the latest model edition deployed is active in the data location.

Data Location Types

There are two types of data locations. The type is selected when the data location is created and cannot be changed afterwards:

  • Development Data Location: A data location of this type supports deploying open or closed model editions. This type of data location is suitable for testing models in development and quality assurance environments.
  • Production Data Location: A data location of this type supports deploying only closed model editions. This type of data location is suitable for deploying data hubs in production environments.

Be cautious when choosing the data location type, as it determines the type of deployment operations that can be performed. It is recommended to use only Production Data Locations for production and user acceptance test environments.

Data Location Contents

A Data Location contains the hub data, stored in the schema accessed using the data location’s datasource. This schema contains database tables and other objects generated from the model edition.

The data location also refers to three types of jobs (stored in the repository):

  • Installation Jobs: The jobs for creating or modifying, in a non-destructive way, the data structures in the schema.
  • Integration Jobs: The jobs for certifying data in these data structures, according to the model job definitions.
  • Purge Jobs: The jobs for purging the logs and data history according to the retention policies.

Creating a Data Location

A data location is a connection to a database schema via a JDBC datasource defined in the application server running Semarchy xDM. Make sure the administrator of this application server creates this datasource, and the database administrator creates the schema for you before creating the data location in Semarchy Application Builder.
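
For reference, a JNDI datasource is declared in the application server and looked up by name at run time. The minimal Java sketch below, which assumes a hypothetical jdbc/SEMARCHY_HUB resource name and code running inside the application server, shows what such a lookup looks like:

import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class DatasourceCheck {
    public static void main(String[] args) throws Exception {
        // The JNDI name must match the datasource declared in the application server.
        // jdbc/SEMARCHY_HUB is a hypothetical name used for this illustration.
        DataSource datasource = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/SEMARCHY_HUB");
        try (Connection connection = datasource.getConnection()) {
            System.out.println("Connected to: " + connection.getMetaData().getURL());
        }
    }
}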

To create a new data location:

  1. In the Management view, right-click the Data Locations node and select New Data Location. The Create New Data Location wizard opens.
  2. In the Create New Data Location wizard, check the Auto Fill option and then enter the following values:
    • Name: Internal name of the object.
    • Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
    • JNDI Datasource Name: Select the JDBC datasource pointing to the schema that will host the data location.
    • In the Description field, optionally enter a description for the Data Location.
    • Select the Location Type for this data location.
    • Select the Deployed Model Edition: This model edition is the first one deployed in the data location.
  3. Click Finish to close the wizard. The data location is created and the first model edition deploys.

To delete a data location:

  1. In the Management view, expand the Data Locations node, right-click the data location and select Delete. The Delete Data Location wizard opens. In this wizard, you only delete the data location definition in Semarchy xDM. Deleting the data stored in the data location is optional.
    • If you do not want to delete all the data in the data location schema, click Finish. The data location is deleted but the data is preserved.
    • If you want to delete all the data in the data location schema:
      1. Select the Drop the content of the database schema option to delete the content of the schema. Note that with this option, you choose to delete all the data stored in the hub, which cannot be undone.
      2. Click Next. The wizard lists the objects that will be dropped from the schema.
      3. In the next wizard step, enter DROP to confirm the choice, and then click Finish. The data location as well as the schema content are deleted.

Deleting a data location is an operation that cannot be undone.
Deleting the data location as well as the schema content is a convenient mechanism to reset the hub content at the early stages of the model design.

Deploying a Model Edition

After the initial model edition is deployed, it is possible to deploy other model editions. This single mechanism is used for example to:

  • Update the deployed model edition with the latest changes performed in an open model edition.
  • Deploy a new closed model version to a production or test environment.
  • Revert a deployed model edition to a previous edition.

Deploying open model editions is only possible in a Development Data Location.

To deploy a model edition:

  1. In the Management view, expand the Data Locations node, right-click the data location node and select Deploy Model Edition….
  2. If you have unsaved editors, select those to save when prompted.
  3. In the Deploy Model Edition wizard, select the model edition you want to deploy in the Deployed Model Edition field.
  4. Leave the Generate Job Definition option checked to generate new integration jobs.
  5. Click Next. The changes to perform on the data location, to support this new model edition, are computed. A second page shows the SQL script to run on the schema to deploy this model edition.
  6. Click Finish to run the script and close the wizard.
    The model edition deploys the jobs first and then runs the SQL code to create or modify the database objects. You can follow this second operation in the Console view at the bottom of the Application Builder.

The new model edition is deployed and the previous model deployment appears under the Deployment History node in the data location.

Deploying a model edition does not modify data already in place in the data location.

Although it is recommended to update both the jobs and schemas at the same time, you may want to update the data structure first and the jobs later, for example if the data currently in the data location is not fit for the new version of the jobs. In that case, you may want to run some transformations on the data with the updated data structures before updating the jobs.
Another use case for not deploying the job definition is when you know that the new and old job definitions are similar and you want to preserve the existing job logs.

It is not possible to deploy a model edition in a data location that requires an upgrade.

When deploying a model edition, Semarchy lists the database objects in the data location schema and detects those it needs to create or modify. If you need to create database objects (such as indexes) in the data location schema, make sure to prefix their names with USR_. Semarchy always ignores objects named with this prefix, as shown in the sketch below.
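
As an illustration under assumed names (the connection URL, credentials, and the GD_CUSTOMER table and EMAIL column are placeholders, and the JDBC driver must be on the classpath), the following Java sketch creates such a USR_-prefixed index in the data location schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateUserIndex {
    public static void main(String[] args) throws Exception {
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/hub", "user", "password");
             Statement statement = connection.createStatement()) {
            // The USR_ prefix ensures Semarchy xDM ignores this object during deployments.
            statement.execute(
                "CREATE INDEX USR_IDX_CUSTOMER_EMAIL ON GD_CUSTOMER (EMAIL)");
        }
    }
}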

After multiple deployments, you may decide to remove old elements in the Deployment History.

To remove historized deployments:

  1. In the Management view, expand the Deployment History node under the data location.
  2. Select one or many historized deployments (hold Shift for multiple selection).
  3. Right-click and select Delete.

The selected historized deployments are deleted.

Advanced Deployment Techniques

Moving Models at Design-Time

At design-time, it is possible to move models from one repository to another design repository using Export/Import:

  • Export is allowed from a design repository, but also from a deployment repository.
  • Import is possible in a design repository, either:
    • to create a new model from the import
    • or to overwrite an existing open model edition.

Import works only if the model was exported using the same version of Semarchy xDM. Exporting a model from one product version and importing it into a different product version (even a more recent one) is not supported. Make sure that both environments have the same version prior to an export/import operation.

Exporting a Model Edition

Export a model edition to download an XML file which may be imported into another repository.

To export the model documentation, refer to the Exporting the Model Documentation section in the Semarchy xDM Developer’s Guide.

To export a model edition:

  1. In the Management view of the Management perspective, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to export.
  2. Select the model edition you want to export, right-click and select Export Model Edition….
  3. In the Model Edition Export dialog, select an Encoding for the export file.
  4. Click OK to download the export file on your local file system.
  5. Click Close.

Importing to a New Model

To import and create a new model:

  1. Open the Model Design view by clicking the Design button in the Application Builder header.
  2. If you are connected to a model edition, click the Switch Model button to close the connected model edition.
  3. In the Model Design view, click the New Model from import… link. The Import to a New Model wizard opens.
  4. Click the Open button and select the export file.
  5. Click Finish to perform the import. The newly imported model opens in the Model Design view.

When importing a model from a different repository, make sure that the correct Target Technology is set for this model. For example, a model developed for the Oracle target technology will not work if you attempt to deploy it on a PostgreSQL data location.

Importing on an Existing Model

To import and replace an existing model:

  1. In the Management view, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to replace.
  2. Select the open model edition you want to replace, right-click and select Import Replace…. The Import-Replace Model Edition wizard opens.
  3. Click the Open button and select the export file.
  4. Click Finish to perform the import.
  5. Click OK to confirm the deletion of the existing model.

The existing model edition is replaced by the content of the export file.

Deploying to a Remote Location

Frequently, the deployment environment is separated from the development environment. For example, the development/QA and production sites are located on different networks or locations. In such cases, it is necessary to use export and import to transfer the model edition before performing the deployment in production.

A remote deployment consists of moving a closed model edition:

  • From a design repository or a deployment repository used for Testing/UAT purposes;
  • To a deployment repository.

Remote Deployment Architecture

In this configuration, two repositories are created instead of one:

  • A Design repository for the development and QA site, with Development data locations attached to this repository.
  • A Deployment repository for the production site. Production data locations are attached to this repository.

The process for deploying a model edition in this configuration is given below:

  1. The model edition is closed in the design repository.
  2. The closed model edition is exported from the design repository to an export file.
  3. The closed model edition is imported from the export file into the deployment repository.
  4. The closed model edition is deployed from the deployment repository into a production data location.

Exporting a Model Edition

To export a model edition:

  1. In the Management view, expand the Model Administration node, then expand the model and the model branch containing the edition that you want to export.
  2. Select the closed model edition you want to export, right-click and select Export Model Edition.
  3. In the Model Edition Export dialog, select an Encoding for the export file.
  4. Click OK to download the export file on your local file system.
  5. Click Close.

Importing a Model Edition in a Deployment Repository

To import a model edition in a deployment repository:

  1. Open the Model Design perspective.
  2. Select the Import Model Edition link. The Import Model Editions wizard opens.
  3. Click the Open button and select the export file.
  4. Click Next.
  5. Review the content of the Import Preview page and then click Finish.

When importing a model edition, the root model and the branches containing this model edition are automatically created as needed.

When importing successive versions of model editions in a deployment repository, it is not recommended to skip intermediate versions, as it is not possible to import these intermediate versions later. For example, if you import version 0.1 of a model and then import version 0.4, the intermediate versions (0.2 and 0.3) can no longer be imported into this repository.

Elements not Exported with the Model Edition

When exporting then importing a model to a different repository, note that some elements are not included in the model and need to be reconfigured in the remote environment. These elements are listed below.

Elements used by the model and applications:

  • Applications Common Configuration
  • Image Libraries
  • Installed Plug-ins
  • Role definitions
  • Variable Value Providers

Elements not used by the model, but required for operations:

  • Notification Servers
  • Notification Policies
  • Continuous Loads
  • Purge Schedules
  • Batch Poller Configuration

Data Location Status

A data location has one of the following statuses:

  • Ready: A data location in this status can be accessed in read/write mode, accepts incoming batches and processes its current batches.
  • Maintenance: A data location in this status cannot be accessed. It does not accept incoming batches but completes its current batches. New loads cannot be initialized and existing loads cannot be submitted. Continuous loads stop processing incoming loads and keep them on hold.

When moving a data location to a Maintenance status, the currently processed batches continue until completion. Loads submitted after the data location is moved to Maintenance will fail. They can be kept open and submitted later, when the data location is restored to a ready status.

When a data location is in maintenance mode, it appears with a disabled avatar on the welcome page.

Changing a Data Location Status

To set a data location status to maintenance:

  1. In the Management view, expand the Data Locations node, right-click the data location and select Set Status to Maintenance.
  2. Click OK in the confirm dialog.

The data location is moved to Maintenance mode.

To set a data location status to ready:

  1. In the Management view, expand the Data Locations node, right-click the data location and select Set Status to Ready.

The data location is moved to a ready state.

Using the Maintenance Mode

The Maintenance status can be used to perform maintenance tasks on the data locations.
For example, if you want to move the data location to a model edition with data structure changes that mandate manual DML commands to be issued on the hub tables, you may perform the following sequence:

  1. Move the data location to Maintenance mode.
  2. Let the currently running batches complete. No new batches can be submitted to this data location.
  3. Deploy the new model edition.
  4. Perform your DML commands.
  5. Move the data location from Maintenance to the Ready status. Batches can now be submitted to this data location.

Using this sequence, you prevent batches from being submitted while the hub is in Maintenance.

The data location is automatically set to Maintenance when deploying a model edition, and then automatically reverted to the Ready status.

Managing Execution

An Integration Job is a job executed by Semarchy xDM to integrate and certify source data into golden records. This job is generated from the rules (enrichment, validation, etc.) defined in the model and deployed with the model edition.

The Execution Engine processes the integration jobs submitted to the Integration Batch Poller when new data or data changes are published to the data location. The engine also processes maintenance jobs such as the Deployment Jobs and Purge Jobs.

Understanding Jobs, Queues and Logs

Jobs are processing units that run in the execution engine. There are three main types of jobs running in the execution engine:

  • Integration Jobs that process incoming batches to perform golden data certification.
  • Deployment Jobs that deploy new model editions in data locations.
  • Purge Jobs that purge the logs and data history according to the retention policies.

Jobs are processed in the engine’s Queues. Queues work in First-In-First-Out (FIFO) mode. When a job runs in the queue, the next jobs are queued and wait for their turn to run. To run two jobs in parallel, it is necessary to distribute them into different queues.

Queues are grouped into Clusters. There is one cluster per Data Location, named after the data location.

System Queues and Clusters

Specific queues and clusters exist for administrative jobs:

  • For each data location cluster, a specific System Queue called SEM_SYS_QUEUE is automatically created. This queue is used to run administrative operations for the data location. For example, this queue processes the deployment jobs updating the data structures in the data location.
  • A specific System Cluster called SEM_SYS_CLUSTER, which contains a single SEM_SYS_QUEUE queue, is used to run platform-level maintenance operations.

Job Priority

As a general rule, integration jobs are processed in their queues, and have the same priority.
There are two exceptions to this rule:

  • Jobs updating the data location, principally model edition Deployment Jobs.
  • Platform Maintenance Jobs that update the entire platform.

Model Edition Deployment Job

When a new model edition is deployed and requires data structure changes, DDL commands are issued as part of a job called DB Install <model edition name>. This job is launched in the SEM_SYS_QUEUE queue of the data location cluster.

This job modifies the tables used by the DML statements from the integration jobs. As a consequence, it needs to run while no integration job runs. This job takes precedence over all other queued jobs in the cluster, which means that:

  1. Jobs currently running in the cluster are completed normally.
  2. All the queues in the cluster, except the SEM_SYS_QUEUE, are moved to a BLOCKED status. Queued jobs remain in the queue and are no longer executed.
  3. The model edition deployment job is executed in the SEM_SYS_QUEUE.
  4. When this job is finished, the other queues return to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimum downtime of the integration activity while avoiding conflicts between integration jobs and model edition deployment.

Platform Maintenance Job

If a job is queued in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE queue, it takes precedence over all other queued jobs in the execution engine.

This means that:

  1. Jobs currently running in all the clusters are completed.
  2. All the clusters and queues except the SEM_SYS_CLUSTER/SEM_SYS_QUEUE are moved to a BLOCKED status. Queued jobs are no longer executed in these queues/clusters.
  3. The job in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE is executed.
  4. When this job is finished, the other queues are moved to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimal disruption of the platform activity while avoiding conflicts between the platform activity and maintenance operations.

Queue Behavior on Error

When a job running in a queue encounters a run-time error, it behaves differently depending on the queue configuration:

  • If the queue is configured to Suspend on Error, the job hangs on the error point, and blocks the rest of the queued jobs. This job can be resumed when the cause of the error is fixed, or can be canceled by user choice.
  • If the queue is not configured to Suspend on Error, the job fails automatically and the next jobs in the queue are executed. The failed job cannot be restarted.

Because the integration jobs are processed in FIFO mode, a job that fails automatically or is canceled by user choice cannot be restarted. To resubmit the source data for certification, the external load needs to be resubmitted entirely as a new load.

The integration job performs a commit after each task. As a consequence, when a job fails or is suspended, already processed entities have their golden data certified and committed in the hub.

A user can explicitly choose to halt a running job by suspending it. When such a user operation is performed, the job is considered in error and can be restarted or canceled.

Suspending a job on error is the preferred configuration under the following assumptions:

  1. All the data in a batch needs to be integrated as one single atomic operation.
    For example, due to referential integrity, it is not possible to integrate contacts without customers and vice versa. Suspending the job guarantees that it can be continued - after fixing the cause of the error - with the data location preserved in the same state.
  2. Batches and integration jobs are submitted in a precise sequence that represents the changes in the source, and need to be processed in the order they were submitted.
    For example, skipping the suspended batch could miss a data value change that impacts the consolidation of future batches. Suspending the job guarantees that the jobs are processed in their exact submission sequence, and no batch is skipped without an explicit user choice.

There may be some cases when this behavior can be changed:

  • If the batches/jobs do not have strong integrity or sequencing requirements, they can be skipped on error by default. These jobs can run in a queue where Suspend on Error is disabled.
  • If the integration velocity is critical for making golden data available as quickly as possible, it is possible to configure the queue running the integration job with Suspend on Error disabled.

Queue Status

A queue is in one of the following statuses:

  • READY: The queue is available for processing jobs.
  • SUSPENDED: The queue is blocked because a job has encountered an error or was suspended by the user. This job remains suspended. Queued jobs are not processed until the queue becomes READY again, either when the job is canceled or finishes successfully. For more information, see the Troubleshooting Errors section.
  • BLOCKED: When a job is running in the SEM_SYS_QUEUE queue of the cluster, the other queues are moved to this status. Jobs cannot be executed in a blocked queue and remain queued until the queue becomes READY again.

A cluster can be in one of the following statuses:

  • READY: The cluster is not blocked by the SEM_SYS_CLUSTER cluster, and queues under this cluster can process jobs.
  • BLOCKED: The cluster is blocked when a job is running in the SEM_SYS_CLUSTER cluster. When a cluster is blocked, all its attached queues are also blocked.

Managing the Execution Engine and the Queues

Accessing the Execution Engine

To access the execution engine:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The Execution Engine editor opens.

The Execution Engine Editor

This editor displays the list of queues grouped by clusters. Queues currently pending on suspended jobs appear in red.

The list of queues and clusters displays the following information:

  • Cluster/Queue Name: the name of the cluster or queue.
  • Status: Status of the queue or cluster. A queue can be either READY, SUSPENDED or BLOCKED. A cluster may be in a BLOCKED or READY status.
  • Queued Jobs: For a queue, the number of jobs queued in this queue. For a cluster, the number of jobs queued in all the queues of this cluster.
  • Running Jobs: For a queue, the number of jobs running in this queue (1 or 0). For a cluster, the number of jobs running in all the queues of this cluster.
  • Suspend on Error: Defines the behavior of the queue on job error. See the Troubleshooting Errors section for more information.

From the Execution Engine editor, you can perform the operations described in the following sections.

Stopping and Starting the Execution Engine

To stop and start the execution engine:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The Execution Engine editor opens.
  2. Use the Stop this component and Start this component buttons in the editor’s toolbar to stop and start the execution engine.

Stopping the execution engine does not kill running jobs. The engine stops after all running jobs are completed. Besides, the content of the queues is persisted. When the execution engine is restarted, the execution of queued jobs proceeds normally.

Changing the Queue Behavior on Error

See the Troubleshooting Errors and the Queue Behavior on Error sections for more information about queue behavior on error and error management.

To change the queue behavior on error:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The Execution Engine editor opens.
  2. Select or de-select the Suspend on Error option for a queue to set its behavior on error or on a cluster to set the behavior of all queues in this cluster.
  3. Press CTRL+S to save the configuration. This configuration is immediately active.

Opening an Execution Console for a Queue

The execution console provides the details of the activity of a given queue. This information is useful to monitor the activity of jobs running in the queue, and to troubleshoot errors.

The content of the execution console is not persisted. Executions prior to opening the console are not displayed in this console. Besides, if the console is closed, its content is lost.

To open the execution console:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The execution engine editor appears.
  2. Select the queue, right-click and select Open Execution Console.
    The Console view for this queue opens. Note that it is possible to open multiple execution consoles to monitor the activity of multiple queues.

In the Console view toolbar you have access to the following operations:

  • The Close Console button closes the current console. The consoles for the other queues remain open.
  • The Clear Console button clears the content of the current console.
  • The Display Selected Log button allows you to select one of the execution consoles currently open.

Suspending a Job Running in a Queue

To suspend a job running in a queue:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The execution engine editor appears.
  2. Select the queue that contains one Running Job.
  3. Right-click and then select Suspend Job.
    The job is suspended and the queue switches to the SUSPENDED status.
Suspending a job is an operation that should be performed with care, as respecting the sequence of the submitted jobs has a strong impact on the consistency of the data in the hub.

Restarting a Suspended Job in a Queue

To restart a suspended job in a queue:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The execution engine editor appears. The suspended queue appears in red.
  2. Select the suspended queue.
  3. Right-click and then select Restart Job.
    The job restarts from the failed step. If the execution console for this queue is open, the details of the execution are shown in the Console.

Canceling a Suspended Job in a Queue

To cancel a suspended job in a queue:

  1. In the Management view, expand the Job Executions node and double-click the Execution Engine node. The execution engine editor appears. The suspended queue appears in red.
  2. Select the suspended queue.
  3. Right-click and then select Cancel Job.
    The job is canceled, and the queue becomes READY and starts processing queued jobs.
    In the job logs, this job appears in Error status.
Canceling a job may leave the data in the data location in an invalid state. This operation is provided as a convenience, mainly for development and test environments. In production environments, jobs should not be canceled. If a job is suspended due to an infrastructure issue (for example, tablespace full, network failure), or if a long-running job is suspended due to suspected performance issues, the infrastructure or database state should be fixed (for example, by extending the tablespaces, restoring the network access, or recomputing the database statistics), and then the job should be restarted.

Managing the Integration Batch Poller

The Integration Batch Poller polls the integration batches submitted to the platform, and starts the integration jobs on the execution engine. The polling action is performed on a schedule configured in the batch poller.

Stopping and Starting the Integration Batch Poller

To stop and start the integration batch poller:

  1. In the Management view, expand the Job Executions node and double-click the Integration Batch Poller node. The Integration Batch Poller editor opens.
  2. Use the Stop this component and Start this component buttons in the editor’s toolbar to stop and start the integration batch poller.
Stopping the batch poller does not kill running jobs, and does not prevent new batches from being submitted. When this component is stopped, the submitted batches are simply not taken into account, and no job is queued on the execution engine until the batch poller is restarted.

Configuring the Integration Batch Poller

The integration batch poller configuration determines the frequency at which submitted batches are picked up for processing.

To configure the integration batch poller:

  1. In the Management view, expand the Job Executions node and double-click the Integration Batch Poller node.
  2. In the Integration Batch Poller editor, in the Configuration section, choose the polling frequency:
    • Weekly at a given day and time.
    • Daily at a given time.
    • Hourly at a given minute.
    • Every n seconds.
    • With a Cron Expression.
  3. Press CTRL+S to save the configuration.
It is not necessary to restart the integration batch poller for configuration changes to take effect.
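
For example, assuming batches should be picked up every 5 minutes during business hours only, a cron expression such as 0 0/5 8-18 ? * MON-FRI does this. See Appendix A: Cron Expressions for the syntax.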

In the Advanced section, optionally set the following logging parameters:

  • Job Log Level: Select the logging level that you want for the jobs:
    • No Logging disables all logging. Jobs and tasks are no longer traced in the job log. Job restartability is not possible. This level is not recommended.
    • No Tasks only logs job information, and not the task details. This mode supports job restartability.
    • Exclude Skipped Tasks (default) logs job information and task details, except for the tasks that are skipped.
    • Include All Tasks logs job information and all task details.
  • Execution Monitor Log Level: Logging level [1…3] for the execution console for all the queues.
  • Enable Conditional Execution: A task may be executed or skipped depending on a condition set on the task. For example, a task may be skipped depending on parameters passed to the job. Disabling this option prevents conditional executions and forces the engine to process all the tasks.
Deployment repositories are created with a Job Log Level value set to No Tasks. Other repositories are created with no configured value, and use the Exclude Skipped Tasks default value.

Managing Job Logs

The job logs list the jobs currently running or previously executed by the execution engine. Reviewing the job logs allows you to monitor the activity of these jobs and troubleshoot execution errors.

Accessing the Job Logs

To access the logs:

  1. In the Management view, expand the Job Executions node and double-click the Executions node. The Job Logs editor opens.

The Job Logs Editor

From this editor you can review the job execution logs and drill down into these logs.

The following actions are available from the Job Logs editor toolbar:

  • Use the Refresh button to refresh the view.
  • Use the Auto Fit Column Width button to adjust the size of the columns.
  • Use the Apply and Manage User-Defined Filters button to filter the log. See the Filtering the Logs section for more information.
  • Use the Purge Selection button to delete the entries selected in the job logs table. See the Purging the Logs section for more information.
  • Use the Purge using a Filter button to purge logs using an existing or a new filter. See the Purging the Logs section for more information.

Drilling Down into the Logs

The Job Logs editor displays the log list. This view includes:

  • The Name, Start Date, End Date and Duration of the job as well as the name of its creator (Created By).
  • The Message returned by the job execution. This message is empty if the job is successful.
  • The rows statistics for the Job:
    • Row Count: Sum of all the select, insert, update, delete, and merge metrics.
    • Select Count, Insert Count, Update Count, Delete Count, Merge Count: number of rows selected, inserted, updated, deleted, or merged as part of this job.

To drill down into the logs:

  1. Double-click on a log entry in the Job Logs editor.
  2. The Job Log editor opens. It displays all the information available in the job logs list, plus:
    • The Job Definition: This link opens the job definition for this log.
    • The Job Log Parameters: The startup parameters for this job. For example, the Batch ID and Load ID.
    • The Current Task information, if the job is still running.
    • The Tasks: In this list, task groups are displayed with the statistics for this integration job instance.
  3. Double-click a task group in the Tasks list to drill down into sub-task groups or tasks.
  4. Click on the Task Definition link to open the definition of a task.

By drilling down from the task groups to the individual tasks, it is possible to monitor the activity of a job and review the executed code or plug-in in its definition.

Filtering the Logs

To create a job log filter:

  1. In the Job Logs editor, click the Apply and Manage User-Defined Filters button and then select Search. The Define Filter dialog opens.
  2. Provide the filtering criteria:
    • Job Name: Name of the job. Use the _ and % wildcards to represent one or any number of characters.
    • Created By: Name of the job creator. Use the _ and % wildcards to represent one or any number of characters.
    • Status: Select the list of job statuses included in the filter.
    • Only Include: Check this option to limit the filter to the logs before/after a certain number of executions or a certain point in time. Note that the time considered is the job start time.
  3. Click the Save as Preferred Filter option and enter a filter name to save this filter.

Saved filters appear when you click the Apply and Manage User-Defined Filters button.
You can enable or disable a filter by marking it as active or inactive from this menu. You can also use the Apply All and Apply None options to enable or disable all saved filters.

Filters are saved in the user preferences and can be shared using preferences import/export.

To manage job log filters:

  1. Click the Apply and Manage User-Defined Filters button, then select Manage Filters. The Manage User Filters editor opens.
  2. From this editor, you can add, delete or edit a filter, and enable or disable filters for the current view.
  3. Click Finish to apply your changes.

Purging the Logs

You can purge selected job logs or all job logs returned by a filter.

To purge selected job logs:

  1. In the Job Logs editor, select the job logs you want to purge. Press the CTRL key to select multiple lines or the SHIFT key to select a range of lines.
  2. Click the Purge Selection button.
  3. Click OK in the confirmation window.
    The selected job logs are deleted.

To purge filtered job logs:

  1. In the Job Logs editor, click the Purge using a Filter button.
    • To use an existing filter:
      1. Select the Use Existing Filter option.
      2. Select a filter from the list and then press Finish.
    • To create a new filter:
      1. Select the Define New Filter option and then click Next.
      2. Provide the filter parameters, as explained in the Filtering the Logs section and then click Finish.
    • To purge all logs (no filter):
      1. Select the Purge All Logs (No Filter) option and then click Finish.

The job logs are purged.

The REST API provides an endpoint to script and automate the log purge.
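
For example, the following Groovy sketch calls a purge endpoint with basic authentication. The endpoint path, credentials and payload are placeholders for illustration; refer to the built-in REST API documentation for the exact URL and request format.

// All values below are placeholders; check the built-in REST API
// documentation for the actual purge endpoint and payload.
def conn = new URL("http://localhost:8088/semarchy/api/rest/purge-logs").openConnection()
conn.requestMethod = "POST"
conn.setRequestProperty("Authorization",
    "Basic " + "semadmin:password".bytes.encodeBase64().toString())
conn.setRequestProperty("Content-Type", "application/json")
conn.doOutput = true
conn.outputStream.withWriter { it << '{ "filterName": "OldLogs" }' } // hypothetical payload
println "HTTP ${conn.responseCode}"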

Troubleshooting Errors

When a job fails, depending on the configuration of the queue in which it runs, it is either in a Suspended or in an Error status.

The status of the job defines the possible actions on this job.

  • A job in Error cannot be continued or restarted. It can be reviewed for analysis, and possible fixes will only affect subsequent jobs.
  • A Suspended job blocks the entire queue, and can be restarted after fixing the problem, or canceled.

Semarchy xDM provides several capabilities to help you troubleshoot issues. You can drill down into the erroneous task to identify the issue, or restart the job with the Execution Console activated.

To troubleshoot an error:

  1. Open the Job Logs.
  2. Double-click the log entry marked as Suspended or in Error.
  3. Drill down into the Task Log, as explained in the Drilling Down into the Logs section.
  4. In the Task Log, review the Message.
  5. Click the Task Definition link to open the task definition and review the SQL Statements involved, or the plug-in called in this task.

Configuring Data Locations

In addition to the deployed model editions and the job execution logs, the data locations also contain the configuration of:

  • The Continuous Loads, used by integration specialists to push data into the data location in a continuous way.
  • The Job Notification Policies, which define notifications sent under certain conditions when an integration job completes, for administration, monitoring, or integration automation purposes.
  • The Data Purge Schedule, to reduce the data location storage volume by pruning the history of data changes and job logs.

This chapter explains how to configure these items.

Configuring Continuous Loads

Continuous loads enable integration developers to push data into the data location in a continuous way without having to take care of Load Initialization or Load Submission.

With continuous loads:

  • Integration developers do not need to initialize and submit individual external loads. They directly load data into the hub using the Load ID or Name of the continuous load.
  • At regular intervals, Semarchy xDM automatically creates then submits an external load with the data loaded in the continuous load. This external load is submitted with a program name, a job, and a submitter name.
  • The continuous load remains, with the same Load ID and Name. Subsequent data loads made with this continuous load are processed at the next interval.

Continuous loads are configured and managed by the data location manager. Unlike external loads, they cannot be created, submitted or canceled via integration points.
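
For example, the following Groovy sketch pushes a record to the hub using a continuous load. This is a minimal sketch: the connection details, the SD_CUSTOMER landing table and its columns are placeholders that depend on your own model and data location.

import groovy.sql.Sql

// Connection details are placeholders; adapt them to your data location.
def sql = Sql.newInstance("jdbc:postgresql://host:5432/mdm",
        "semarchy_dl", "password", "org.postgresql.Driver")

def continuousLoadId = 101 // Load ID noted when the continuous load was created.

// SD_CUSTOMER and its columns are placeholders for a landing table
// generated for your own model.
sql.executeInsert(
    "insert into SD_CUSTOMER (B_LOADID, B_CLASSNAME, ID, CUSTOMER_NAME) " +
    "values (?, ?, ?, ?)",
    [continuousLoadId, "Customer", 1001, "Gadgetron"])
sql.close()

// No load initialization or submission is needed: the data loaded under
// this Load ID is submitted automatically at the configured interval.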

To configure a continuous load:

  1. In the Management view, expand the Data Locations node, then expand the data location for which you want to configure a continuous load.
  2. Right-click the Continuous Loads node and select New Continuous Load. The Create New Continuous Load wizard opens.
  3. Enter the following values:
    • Active: Check this option to make the continuous load active. Only active loads integrate data at a regular interval.
    • Name: Name of the continuous load. This name is used to uniquely identify this load.
    • Program Name: This value is for information only. It describes the submitted external loads.
    • On Submit Job: Integration job submitted with the external loads. This job is selected among those available in the deployed model edition.
    • Submit Interval: Interval in seconds between submissions.
    • Submit As: Name of the user submitting the external loads. This user may or may not be a user defined in the security realm of the application server.
  4. Click Finish to close the wizard. The Continuous Load editor opens.
  5. In the Description field, optionally enter a description for this load.
  6. Press CTRL+S to save the editor.

Note the Load ID and Name values of the continuous load as you will need them for integrating data using this load.

Using the Name instead of the Load ID in your integration processes and flows gives the flexibility to use the same integration process or flow definition regardless of the data location. When deploying a model to another data location (for example, to move from development to production), you just need to create another continuous load with the same Name.

The data location manager can deactivate a continuous load to prevent it from processing its data.

To activate or deactivate continuous loads:

  1. In the Management view, expand the Data Locations node, then expand the data location for which you want to configure a continuous load.
  2. Double-click the Continuous Loads node. The Data Location editor opens on the Continuous Loads tab.
  3. Select one or more continuous loads in the list, and then click the Activate or Deactivate button in the toolbar.
  4. Press CTRL+S to save the editor.
You do not need to restart any other component after creating, activating or deactivating a continuous load. The changes are immediately taken into account.
When deploying a new model edition that deprecates a job, continuous loads using this job are automatically made inactive. They must be updated to use the updated integration job and then reactivated by the data location manager.

Configuring Job Notifications Policies

Notifications tell users or applications when a job completes or when operations are performed in workflows, for example, when a task is assigned to a role.

There are two types of notifications:

  • Job Notifications are issued under certain conditions when an integration job completes. These notifications are used for administration, monitoring, or integration automation. They are configured with Notification Policies in the data locations.
  • Workflow Notifications are emails sent to users when operations are performed in a workflow. They are configured in workflow transitions and tasks.

Both families of Notifications are issued via Notification Servers.

Notification Server Types

Notification recipients may be users or systems. The type of notification sent, as well as the recipient, depends on the type of notification server configured.

Each notification server uses a Notification Plug-in that:

  • defines the configuration parameters for the notification server,
  • defines the configuration and form of the notification,
  • sends the notifications via the notification server.

Semarchy xDM is provided with several built-in notification plug-ins:

  • JavaMail: The notification is sent in the form of an email via a Mail Session server configured in the application server, and referenced in the notification server. For more information about configuring Mail Session, see the Semarchy xDM Installation Guide.
  • SMTP: The notification is sent in the form of an email via an SMTP server entirely configured in the notification server.
  • File: The notification is issued as text in a file stored in a local directory or on an FTP/SFTP file server.
  • HTTP: The notification is issued as a GET or POST request sent to a remote HTTP server. Use this server type to call a web service with the notification information.
  • JMS: The notification is issued as a JMS message in a message queue.
It is possible to develop additional plug-ins to issue other types of notifications. See the Semarchy xDM Plug-in Development Guide for more information about plug-in development.

A single notification server, with either the JavaMail or SMTP type, can be used to send Workflow Notifications. This server is flagged as the Workflow Notification Server.

Any server can be used to send Job Notifications. Each Job Notification Policy specifies the notification server it uses.

Configuring Notification Servers

Notification servers are configured at platform-level. Refer to the Configuring Notification Servers section in the Semarchy xDM Administration Guide for more details about notification servers configuration.

Configuring a Job Notification Policy

With a notification server configured, it is possible to create notification policies using this server.

To create a notification policy:

  1. In the Management view, expand the Data Locations node, then expand the data location for which you want to configure the policy.
  2. Right-click the Job Notification Policies node and select New Job Notification Policy. The Create New Job Notification Policy wizard opens.
  3. In the first wizard page, enter the following information:
    • Name: Internal name of the notification policy.
    • Label: User-friendly label for the notification policy. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
    • Notification Server: Select the notification server that will be used to send these email notifications.
    • Use Complex Condition: Check this option to use a freeform Groovy Condition. Leave it unchecked to define the condition using a form.
  4. Click Next.
  5. Define the job notification condition. This condition applies to a completing job.
    • If you have checked the Use Complex Condition option, enter the Groovy Condition that must be true to issue the notification. See Groovy Condition for more information.
    • If you have not checked the Use Complex Condition option, use the form to define the condition to issue the notification.
      • Job Name Pattern: Name of the job. Use the _ and % wildcards to represent one or any number of characters.
      • Notify on Failure: Select this option to send notification when a job fails or is suspended.
      • Notify on Success: Select this option to send notification when a job completes successfully.
      • … Count Threshold: Select the maximum number of errors, inserts, etc. allowed before a notification is sent.
        If you define a Job Name Pattern, Notify on Failure and a Threshold, a notification is sent if a job matching the pattern fails or reaches the threshold.
  6. Click Next.
  7. Define the job notification Payload. The payload is text content, but you can also use Groovy to generate it programmatically. See Groovy Template for more information.
    This payload has a different purpose depending on the type of notification:
    • JavaMail or SMTP: The body of the email.
    • File: The content written to the target file.
    • JMS: The payload of the JMS message.
    • HTTP: The content of a POST request.
  8. Click Next.
  9. Define the Notification Properties. These properties depend on the type of notification server:
    • JavaMail or SMTP:
      • Subject: Subject of the email. The subject may be a Groovy Template.
      • To, CC: List of recipients of this email. These recipients are roles. Each of these roles points to a list of email addresses.
      • Content Type: Email content type. For example: text/html, text/plain. This content type must correspond to the generated payload.
    • File:
      • Path: Path of the file in the file system. The path may be a Groovy Template. Make sure to use only forward slashes / for this path. Note that this path is a relative path from the Notification Server’s Root Path location. For example, if you set the Path to /new and the Notification Server Root Path to /work/notifications, then the notification files are stored in the /work/notifications/new folder.
      • Append: Check this option to append the payload to the file. Otherwise, the file is overwritten.
      • Charset: Charset used for writing the file. Typically UTF-8, UTF-16 or ISO-8859-1.
      • File Name: Name of the file to write. The file name may be a Groovy Template.
      • Root Path: Provide the root path for storing the notification file.
    • HTTP:
      • Method: HTTP request method (POST or GET).
      • Request Path: Path of the request in the HTTP server. The request path may be a Groovy Template.
      • Parameters: HTTP parameters passed to the request in the form of a list of property=value pairs separated by a & character. If no parameter is passed and the method is GET, all the notification properties are passed as parameters. The parameters may be a Groovy Template.
      • Headers: HTTP headers passed to the request as header=value pairs, with one header per line.
      • Content Type: Content type of the payload. For example: text/html, text/plain. This content type must correspond to the generated payload.
      • Failure Regexp: If the server returns an HTTP Code 200, the response payload is parsed with this regular expression. If the entire payload matches this expression, then the notification is considered failed. For example, to detect the NOTIFICATION FAILED string in the payload, the Failure Regexp value should be (.*)NOTIFICATION FAILED(.*).
    • JMS:
      • JMS Destination: JNDI URL of the JMS topic or queue. The URL is typically java:comp/env/jms/queue/MyQueue if a queue factory is declared as jms/queue/MyQueue in the application server. The destination may be a Groovy Template.
      • Message Type: Type of JMS Message sent: TextMessage, MapMessage or Message. See Message Types for more information. When using a MapMessage, the payload is ignored and all properties are passed in the MapMessage.
      • Set Message Properties: Check this option to automatically set all notification properties as message properties. Passing properties in this form simplifies message filtering.
  10. Press CTRL+S to save the configuration.

Using Groovy for Notifications

The Groovy scripting language is used to customize the notification. See http://groovy-lang.org/documentation.html for more information about this language.

Groovy Condition

When using a complex condition for triggering the notification, the condition is expressed in the form of a Groovy expression that returns true or false. If this condition is true, then the notification is triggered.

This condition may use properties of the job that completes. Each property is available as a Groovy variable.

The available properties are described in the Job Notification Properties section.

You can use the Edit Expression button to open the condition editor.
In the condition editor:

  • Double-click one of the Properties in the list to add it to the condition.
  • Click the Test button to test the condition against the notification properties provided in the Test Values tab.
  • In the Test Values tab, if you enter an existing Batch ID and click the > button, the properties from this batch are retrieved as test values.

Sample conditions are given below:

Trigger a notification if a job has errors.
ErrorCount > 0
Trigger a notification for batches in status DONE, triggered by a workflow whose name contains "Product".
BatchStatus == 'DONE' && WorkflowName.find("Product") != null
Trigger a notification if the batch has processed the "Customers" or "Contacts" entities. EntityNames is a list of the names of the entities processed by the job.
EntityNames.contains("Customers")  || EntityNames.contains("Contacts")

Groovy Template

You can use Groovy to customize some elements of the notification, such as the Payload, the subject or the name of the JMS destination of the notification.

In these cases, a Groovy Template is used to generate a string output from the notification properties.

In the template:

  • The notification properties are available using the $<property_name> syntax.
  • You can also use Groovy code surrounded with <% %> tags.
  • You can use the <%= %> syntax to output a string generated by Groovy.

Use the Edit Expression button to open the expression editor and modify a Groovy template. In the template editor:

  • Double-click one of the Properties in the list to add it to the template. It is added with the $<property_name> syntax.
  • Click the Test button to test the template against the notification properties provided in the Test Values tab.
  • In the Test Values tab, if you enter an existing Batch ID and click the > button, the properties from this batch are retrieved as test values.

Sample templates are given below:

Generating a notification text file named after the Batch ID.
File Name: NotificationFile_${BatchId}.txt
Generating an email subject that contains the Job Name and Batch Status.
Job ($JobName) is finished as: $BatchStatus.
Creating a message with the job name, and extra content if the batch status is not DONE.
Job ($JobName) is complete.
<% if (BatchStatus != 'DONE')  { %> Please review the completed batch: $BatchStatus. <% } %>
Generating HTML content with a formatted list of entities.
<p>Job ($JobName) is complete.</p>
<p>Entities:</p>
<ul>
<% EntityNames.each() { entityName-> %>
    <li>$entityName</li>
<% } %>
</ul>

Job Notification Properties

The following properties are available for job notifications:

  • BatchId (Batch ID): The ID of the batch that submitted the job.
  • BatchStatus (Batch Status): Status of the batch and job:
    • DONE: The job completed successfully with no erroneous data detected by the validations.
    • ERROR: The job did not complete successfully, or it was canceled by a user.
    • SUSPENDED: The job is suspended, either by a user or due to an error. It awaits user intervention.
    • WARNING: The job completed successfully, but some records have caused validation errors.
  • BatchSubmitter (Batch Submitter): The user who submitted this batch.
  • StartDate (Startup Date): Startup timestamp of the batch.
  • EndDate (End Date): Completion timestamp of the batch.
  • JobName (Job Name): Name of the job submitted for this batch.
  • QueueName (Queue Name): Name of the queue that ran the job.
  • DataLocationName (Data Location Name): Name of the data location into which the load was created and the batch submitted.
  • EntityNames (Entity Names): List of the entities processed by the job.
  • TotalRowCount (Total Row Count): The total number of rows counted by the tasks in this job instance.
  • ErrorCount (Error Count): The total number of errors counted by the tasks in this job instance.
  • InsertCount (Insert Count): The total number of inserts counted by the tasks in this job instance.
  • UpdateCount (Update Count): The total number of updates counted by the tasks in this job instance.
  • DeleteCount (Delete Count): The total number of deletes counted by the tasks in this job instance.
  • MergeCount (Merge Count): The total number of merged records counted by the tasks in this job instance.
  • LoadId (Load ID): The ID of the load that was submitted to create this batch.
  • ProgramName (Program Name): Name of the program that created the load.
  • LoadDescription (Load Description): The description given to the load at creation time.
  • ModelName (Model Name): Name of the deployed model containing the job.
  • ModelBranch (Model Branch): Branch number of the deployed model.
  • ModelEdition (Model Edition): Edition (version) number of the deployed model.
  • JobParameters (Job Parameters): List of the parameters of this job.
  • WorkflowName (Workflow Name): Name of the workflow that triggered this job.
  • ActivityLabel (Activity Label): Label of the workflow instance that triggered this job.
  • ActivityPriority (Activity Priority): Priority of the workflow instance that triggered this job: HIGH, LOW or NORMAL.
  • ActivityInitiator (Activity Initiator): The user who created the workflow instance that triggered this job.
  • JobRestartNumber (Job Restart Number): The number of times this job was restarted.
  • ActivityLastComment (Activity Last Comment): Last comment provided when submitting the workflow instance.
  • ServerBaseUrl (Server Base URL): Base URL of the Semarchy xDM server, as configured by the administrator.

Scheduling Data Purges

Data Purge helps you maintain a reasonable storage volume for the data location and the repository by pruning the history of data changes and job logs.

Introduction to Data Purge

The data location stores the lineage and history of the certified golden data, that is, the data that led to the current state of the golden data.

Preserving the lineage and history is a master data governance requirement and is key for regulatory compliance. However, keeping this information may also create a large volume of data in the hub storage.

To make sure lineage and history are preserved according to the data governance and compliance requirements, model designers define Data Retention Policies for the model.

When a model is deployed to a data location, a Purge Job is automatically created. This job prunes the lineage and history data according to the retention policy. Optionally, it also prunes the job logs, batches, loads, direct authoring, duplicate manager and workflow instances when all their data is purged.

To keep a reasonable volume of information, data location managers have to schedule regular executions of this job.

Configuring a Purge Schedule

To create a purge schedule:

  1. In the Management view, expand the Data Locations node.
  2. Expand the data location for which you want to configure a purge.
  3. Double-click the Purge node. The Purge Schedule editor opens.
  4. Select or deselect the Active checkbox to make the purge schedule active or inactive.
  5. Click the Edit button, and set the schedule for the purge with a purge frequency (Monthly, Weekly, Daily) or as a Cron Expression.
  6. Click OK to save the schedule.
  7. Select the Purge Repository Artifacts option to prune the job logs, batches, loads, direct authoring, duplicate manager and workflow instances when all their data is purged.
  8. Press CTRL+S to save the editor.
Regardless of the frequency of the purges scheduled by the data location manager, the data history retained is as defined by the model designer in the data retention policies.
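
For example, a purge schedule defined with the cron expression 0 0 2 ? * SUN runs the purge job every Sunday at 2am. See Appendix A: Cron Expressions for the syntax.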

Scripting Deployment and Management

All application management tasks are exposed by the REST API.

Administrative operations such as creating repositories and creating/importing user roles are also available in the REST API. See the Scripting Administration chapter in the Semarchy xDM Administration Guide for more information.

Scripted Management Overview

The REST API exposes endpoints for the following management operations:

  • Models: List, create, delete, import and export models and model editions. Manage model branches and editions as well as translations.
  • Data Locations: List, create, delete, upgrade data locations, deploy models, manage data location status and purge job logs.
  • Notification Policies: Configure, export, and import (replace) notification policies.
  • Continuous Loads: Configure, export, and import (replace) continuous load definitions.
  • Purge Schedule: Get and set the configuration of the purge schedule.
For details about using the endpoints, see the built-in REST API documentation.
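
The following Groovy sketch illustrates the general call pattern, assuming a hypothetical endpoint that lists data locations; the actual paths and response formats are described in the built-in documentation.

// The endpoint path and credentials are placeholders; see the built-in
// REST API documentation for the actual management endpoints.
def conn = new URL("http://localhost:8088/semarchy/api/rest/data-locations").openConnection()
conn.setRequestProperty("Authorization",
    "Basic " + "semadmin:password".bytes.encodeBase64().toString())
println conn.inputStream.text // JSON response listing the data locations (assumption)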

REST API Documentation

The REST API exposes its built-in documentation as a link in the Welcome page. In addition, this documentation is available for tools as an OpenAPI specification.

To access the REST API documentation:

  1. Log in to Semarchy xDM as a user with one of the Application Design, Application Management, or Platform Administration platform privileges.
  2. Click the REST API link on the Welcome page.
  3. In the upper-right menu, select Model and Application Management.

The documentation exposes the endpoint description and provides request and response samples.

Appendices

Appendix A: Cron Expressions

A cron expression defines the schedule to trigger an action. It is a string composed of 6 or 7 fields separated by white space. Fields can contain any of the allowed values, along with various combinations of special characters for that field.

Cron fields

The fields are as follows:

  • Seconds (mandatory): allowed values 0-59; special characters , - * /
  • Minutes (mandatory): allowed values 0-59; special characters , - * /
  • Hours (mandatory): allowed values 0-23; special characters , - * /
  • Day of month (mandatory): allowed values 1-31; special characters , - * ? / L W
  • Month (mandatory): allowed values 1-12 or JAN-DEC; special characters , - * /
  • Day of week (mandatory): allowed values 1-7 or SUN-SAT; special characters , - * ? / L #
  • Year (optional): allowed values empty or 1970-2099; special characters , - * /

Day of week in Semarchy xDM cron expressions has a value range of 1-7 or SUN-SAT, not 0-6.

Special Characters

The Special Characters available in cron expressions are listed below:

  • * : All values within a field. For example, * in the Minutes field means "every minute".
  • ? : No specific value. For example, to run on the 12th of the month regardless of the day of the week, put "12" in the Day of month field and "?" in the Day of week field.
  • - : Used to specify ranges. For example, "9-12" in the hour field means "the hours 9, 10, 11 and 12".
  • , : Used to specify additional values. For example, "MON,WED,FRI" in the Day of week means "Mondays, Wednesdays, and Fridays".
  • / : Used to specify increments. For example, "0/15" in the seconds field means "the seconds 0, 15, 30, and 45". "5/15" in the seconds field means "the seconds 5, 20, 35, and 50". "1/3" in the Day of month field means "every 3 days starting on the first day of the month".
  • L: "Last" has a different meaning in each of the two fields in which it is allowed:
    • Day of month: "L" means the last day of the month.
    • Day of week: "L" alone means Saturday. When used after a number x, it means "the last x day of the month" - for example, "6L" means "the last Friday of the month".
  • W: Used to specify the weekday (Monday-Friday) nearest the given day. As an example, if you were to specify "25W" as the value for the Day of month field, the meaning is: "the nearest weekday to the 25th of the month".
  • #: Used to specify "the nth" XXX day of the month. For example, "6#3" in the Day of week field means "the third Friday of the month".
Semarchy xDM uses the Quartz Scheduler for cron triggers. The full cron syntax is described in the Quartz Scheduler documentation.
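
Because scheduling relies on Quartz, you can sanity-check an expression with the Quartz library before entering it. The following Groovy sketch assumes the Quartz JAR is available on the classpath.

import org.quartz.CronExpression

// Validate an expression and preview its next three fire times.
assert CronExpression.isValidExpression("0 25 10 ? * MON-FRI")
def expr = new CronExpression("0 25 10 ? * MON-FRI")
def time = new Date()
3.times {
    time = expr.getNextValidTimeAfter(time)
    println time
}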

Examples

  • 0 0 12 * * ? : 12pm (noon) every day.
  • 0 25 10 ? * * : 10:25am every day.
  • 0 25 10 * * ? : 10:25am every day.
  • 0 25 10 * * ? * : 10:25am every day.
  • 0 25 10 * * ? 2005 : 10:25am every day during the year 2005.
  • 0 * 14 * * ? : Every minute starting at 2pm and ending at 2:59pm, every day.
  • 0 0/5 14 * * ? : Every 5 minutes starting at 2pm and ending at 2:55pm, every day.
  • 0 0/5 14,18 * * ? : Every 5 minutes starting at 2pm and ending at 2:55pm, and every 5 minutes starting at 6pm and ending at 6:55pm, every day.
  • 0 0-5 14 * * ? : Every minute starting at 2pm and ending at 2:05pm, every day.
  • 0 10,44 14 ? 3 WED : 2:10pm and 2:44pm every Wednesday in the month of March.
  • 0 25 10 ? * MON-FRI : 10:25am every Monday, Tuesday, Wednesday, Thursday and Friday.
  • 0 25 10 25 * ? : 10:25am on the 25th day of every month.
  • 0 25 10 L * ? : 10:25am on the last day of every month.
  • 0 25 10 L-2 * ? : 10:25am on the 2nd-to-last day of every month.
  • 0 25 10 ? * 6L : 10:25am on the last Friday of every month.
  • 0 25 10 ? * 6L 2002-2005 : 10:25am on the last Friday of every month during the years 2002, 2003, 2004 and 2005.
  • 0 25 10 ? * 6#3 : 10:25am on the third Friday of every month.
  • 0 0 12 1/5 * ? : 12pm (noon) every 5 days every month, starting on the first day of the month.
  • 0 11 11 11 11 ? : Every November 11th at 11:11am.