Configure Semarchy xDM for high availability

Semarchy xDM can be configured to support enterprise-scale deployments and high availability.

Semarchy xDM supports the clustered deployment of the Semarchy xDM web application for high availability and failover. For example, a clustered deployment can be set up to support a large number of concurrent users performing data access and authoring operations.

Reference architecture for high availability

In a clustered deployment, only one node of the Semarchy xDM application manages and runs certification processes. This node is the Active Node. A set of clustered Semarchy xDM applications serves users accessing the user interfaces (Application Builder, Dashboard Builder, xDM Discovery, Configuration or the data management applications) as well as applications accessing data locations via integration points. These are Passive Nodes. The back-end databases hosting the repository and data locations are deployed in a database cluster, and an HTTP load balancer is used as a front-end for users and applications.

The reference architecture for such a configuration is shown in the following figure.

[Figure: high-availability reference architecture]

This architecture is composed of the following components:

  • HTTP Load Balancer: This component manages the sessions coming from within the enterprise network or from the Internet (typically through a firewall). It may be a dedicated hardware load balancer or a software solution, and distributes the incoming sessions across the passive nodes running in the JEE application server cluster.

  • JEE Application Server Cluster + Passive Semarchy xDM Platforms: A Semarchy xDM application instance is deployed on each node of this cluster, which is scaled to manage the volume of incoming requests. In the case of a node failure, the other nodes remain available to serve the sessions. The Semarchy xDM applications deployed in the cluster are Passive Nodes. Such a node provides access to the Semarchy user interfaces and integration endpoints but is unable to manage batches and jobs.

  • JEE Server + Active Semarchy xDM Platform: This single JEE server hosts the only complete Semarchy xDM platform in the architecture. This Active Node is not accessible to users or applications. Its sole purpose is to poll the submitted batches and process jobs. The Active Node does not need to be part of the cluster that contains the Passive Nodes.

  • Database Cluster: This component hosts the Semarchy xDM Repository and the Data Locations databases/schemas in a clustered environment. Both active and passive nodes of the Semarchy xDM Platform connect to this cluster using platform datasources.

In this architecture:

  • Design-time or administrative operations are processed by the passive nodes in the JEE application server cluster.

  • Operations performed on the Data Hubs (data access, steppers, or external loads) are also processed by the passive nodes, but the resulting batches and jobs are always processed by the single active node.

Only one Active Node must be configured; multiple active nodes are not supported.

For Passive Nodes, built-in clustering capabilities offered by application servers, such as Apache Tomcat Clustering, are not supported.

The xDM Dashboard and xDM Discovery components run identically on active or passive nodes. For example, the Discovery profiling processes can run even on passive nodes.

Load balancing

Load balancing ensures optimal usage of resources when a large number of users and applications access Semarchy xDM simultaneously.

Load balancing is performed at two levels:

  • The HTTP load balancer distributes the incoming requests across the nodes of the JEE application server cluster.

  • The JDBC datasource configuration distributes database access to the repository and the data locations across the database cluster nodes. In PostgreSQL and SQL Server environments, use the datasource configuration to enable load balancing across the nodes of the cluster.
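
For illustration, the JDBC URLs below show how a datasource can reference several database cluster nodes. This is only a sketch: the host names, ports, and database names are placeholders, and the available options should be checked against the PostgreSQL and Microsoft SQL Server JDBC driver documentation.

  # PostgreSQL JDBC driver: list several cluster nodes and let the driver
  # balance connections across them while targeting the primary server.
  jdbc:postgresql://db-node1:5432,db-node2:5432/semarchy?targetServerType=primary&loadBalanceHosts=true

  # Microsoft SQL Server JDBC driver: connect through an availability group
  # listener, with faster cross-subnet failover.
  jdbc:sqlserver://ag-listener:1433;databaseName=semarchy;multiSubnetFailover=true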

Clustered mode

Semarchy xDM runs by default in Clustered Mode, which enables nodes to automatically retrieve configuration changes and model deployments.

In this mode, the following changes apply automatically to all the nodes in a cluster:

  • Model deployments in data locations. This automatically refreshes the applications and REST API.

  • Logging configuration changes.

  • New plug-in deployment or plug-in updates.

  • New or updated custom translations.

The engine, batch poller, purge schedules, continuous loads, notifications, and notification server configurations are not affected, as they run and can be configured only on the active node.

Failure and recovery

In the reference architecture, failover is managed for both user and application sessions.

The following sections describe the behavior and the required recovery actions in case of a failure at the various points of the architecture.

Database Failure

In the event of a database cluster node failure, the other nodes take over and process the incoming database requests.

Passive Node Failure

If one of the nodes of the JEE application server cluster fails:

  • Application sessions are moved to the other nodes of the cluster.

  • User sessions on this node are automatically restarted on the other nodes of the cluster.

The only information not recovered is the content of the unsaved editors for the user sessions. All other content is saved in the repository or the data locations. Transactions attached to steppers, for example, are saved in the data locations and are not lost.

Active Node Failure

The purpose of the active node is to process batches and jobs.
If this server fails:

  • Jobs running in the queues are halted.

  • Queued jobs remain in their queue.

  • Incoming batches remain pending until the batch poller processes them.

The active node must be restarted automatically or manually to fully recover from a failure.

When it is restarted, the platform resumes its normal course of activity with no user action required.

A failure of the Active Node does not impact the overall activity of users or applications, as these rely on the (clustered) Passive Nodes. The only impact of such a failure may be a delay in the processing of data changes.
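
To illustrate the automatic restart option, the fragment below assumes the active node runs in an application server managed as a systemd service on Linux; the service itself and its settings are assumptions, not part of the Semarchy xDM distribution.

  # Hypothetical systemd unit fragment for the application server hosting
  # the active node: restart the process automatically if it fails.
  [Service]
  Restart=on-failure
  RestartSec=10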

Configure Semarchy xDM for high availability

Active vs. passive nodes

The Semarchy xDM server comes in two flavors, corresponding to two Web Application Archive (WAR) files:

  • The Active Node (semarchy.war) includes the active application to deploy on the single active node. This WAR includes the batch poller and the engine and can trigger and process the submitted batches.

  • The Passive Node (semarchy-passive.war) includes the passive application to deploy on all the passive nodes of the cluster. This WAR does not include the batch poller and engine services. It is unable to trigger or process submitted batches.

Both files are located in the mdm-server folder of the semarchy-mdm-install-<version tag>.zip archive.

Install and configure Semarchy xDM

The overall installation process for a high-availability configuration is similar to the general installation process:

  1. Create the repository and data location databases/schemas in the database cluster.

  2. Configure the application server security for both the cluster and the active node.

  3. Configure the datasources for the nodes/cluster.
    Refer to your Oracle Database and Application Server documentation for more information about configuring RAC JDBC Datasources. If you are using PostgreSQL or SQL Server, refer to the JDBC driver documentation to configure the datasource for high availability and load balancing (a sample datasource definition is shown after these steps).

  4. Deploy the applications (example deployment commands are shown after these steps):

    1. Deploy the active node.
      The architecture supports only one active node, so there is no need to load-balance it. The active node can be deployed in the semarchy context using the semarchy.war file.
      The active node is available at the https://active-host:active-host-port/semarchy/ URL.

    2. Deploy multiple passive nodes behind the load balancer.
      You can deploy the passive nodes using the same context as the active node, or a different context.

      • Deploying with the same context
        When creating a passive node using semarchy-passive.war, rename this file to semarchy.war before deployment. This keeps the same deployment name (semarchy) for the active and the passive nodes and usually simplifies load balancing configuration.
        In this configuration, the passive nodes are available behind the load balancer at the https://load-balancer-host:load-balancer-port/semarchy/ URL.

      • Deploying with a different context
        When creating a passive node using semarchy-passive.war, keep the semarchy-passive.war file name.
        In this configuration, the passive nodes are available behind the load balancer at the https://load-balancer-host:load-balancer-port/semarchy-passive/ URL.
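
For step 3, the fragment below sketches how the multi-host JDBC URL shown in the load balancing section can be declared as a JNDI datasource on Apache Tomcat. The resource name, credentials, and pool sizing are placeholders to adapt to your own configuration.

  <!-- Hypothetical Tomcat datasource (context.xml) pointing at the database cluster. -->
  <Resource name="jdbc/REPOSITORY_DATASOURCE"
            auth="Container"
            type="javax.sql.DataSource"
            driverClassName="org.postgresql.Driver"
            url="jdbc:postgresql://db-node1:5432,db-node2:5432/semarchy?targetServerType=primary&amp;loadBalanceHosts=true"
            username="semarchy_repo"
            password="change_me"
            maxTotal="50"
            maxIdle="10"/>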

When deploying multiple nodes, make sure to use the same startup configuration for all the nodes, as they all connect to the same repository.
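
The commands below sketch such a deployment on Apache Tomcat, as referenced in step 4 above. This is only an illustration: the Tomcat installation layout and the use of the default webapps folder are assumptions.

  # Extract the installation archive (the WAR files are in mdm-server).
  unzip "semarchy-mdm-install-<version tag>.zip"

  # Active node: deploy semarchy.war as-is in the semarchy context.
  cp mdm-server/semarchy.war "$CATALINA_BASE/webapps/"

  # Passive nodes (same-context option): deploy semarchy-passive.war renamed
  # to semarchy.war, so that all nodes share the semarchy deployment name.
  cp mdm-server/semarchy-passive.war "$CATALINA_BASE/webapps/semarchy.war"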

Configure the HTTP load balancer

Semarchy xDM requires that you configure HTTP Load Balancing with Sticky Sessions (also known as session persistence or session affinity). In this mode, requests from existing sessions are consistently routed to the same server. This is mandatory for the Semarchy xDM user interfaces, but not for integration points.
For example, for Amazon Web Services (AWS) deployments, sticky sessions are configured in the Load Balancer.
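
As an illustration, the following fragment sketches a sticky-session configuration for the Apache HTTP Server with mod_proxy_balancer; the node host names, ports, and route names are placeholders, and any load balancer supporting session affinity can be used instead.

  # Route requests for the semarchy context to the passive nodes, keeping
  # each session on the node that created it (JSESSIONID-based affinity).
  <Proxy "balancer://semarchy-cluster">
      BalancerMember "http://passive-node1:8080" route=node1
      BalancerMember "http://passive-node2:8080" route=node2
      ProxySet stickysession=JSESSIONID|jsessionid
  </Proxy>
  ProxyPass        "/semarchy" "balancer://semarchy-cluster/semarchy"
  ProxyPassReverse "/semarchy" "balancer://semarchy-cluster/semarchy"

With Tomcat, the route values typically match the jvmRoute attribute defined in each node's Engine element, so that the route is appended to the session ID and the balancer can recognize it.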