Configure Platform Datasources

A Platform Datasource defines a connection to a database schema used by Semarchy xDM features. Such a connection uses Java Database Connectivity (JDBC).

Each datasources has an associate Connection Pool to provide Semarchy xDM features with ready-to-use connections.

Overview

Administrators configure platform datasources in the Configuration module for usage by specific platform features, such as data locations, dashboards, xDM Discovery, variable value providers, plug-ins, etc.

Datasources for Dashboards

The following types of databases are supported for creating datasources for dashboards in the Dashboard Builder module:

  • Oracle

  • Postgres

  • MySQL

  • SQLServer

  • Snowflake

Specific Connections

Certain database connections performed by Semarchy xDM are not configured as platform datasources:

Datasource Usages

A datasource is configured with specific database credentials. As such, it has access to a certain number of resources and operations in the database.

You can configure for a datasource a set of usages. Usages restrict the features of Semarchy xDM that can use the datasource.

Datasource Usage Example

A CustomerDataHub datasource is configured to deploy a data location and has the Data Location Usage. The required database privileges for a data location include DDL and CRUD operations on all tables in the schema.

Such privileges are excessive for dashboarding. As a consequence, this datasource should not have the Dashboard Usage.

A second CustomerDataDashboards datasource should be created, using different credentials and fewer privileges - for example, SELECT on a subset of the tables -. This second datasource would have the Dashboard Usage.

Datasources defined for usage with xDM Dashboard or xDM Discovery should preferably authenticate with users having limited privileges. Typically, they should only have read access, and possibly only to a subset of the tables and views.

Connection Pools

Each datasource has a built-in connection pool. The connection pool maintains a set of ready-to-use connections. Features using a datasource request available connections from this pool as needed, and release these connections into the pool when done.

The pool creates new connections or removes idle connections automatically, based on the Minimum Idle, Maximum Pool Size, as well as the Advanced Properties.

Configure a Datasource

A JDBC driver is required to connect a database. Drivers are provided by the database vendors as jar files, that must be installed in the application server. For example, in the lib/ sub-folder of the Tomcat installation. See the application server specific installation instructions for more details.

Datasource configuration is a platform-level administrative tasks performed in Semarchy Configuration.

To create a datasource:

  1. In Configuration, from the navigation drawer, select Datasources.
    The list of Datasources opens.

  2. Click the Add Datasource floating action button in the lower-right corner of the screen.

  3. In the New Datasource dialog, enter the datasource Name.

  4. Select the Database Type corresponding to your database engine technology. If your technology is not listed, select Other.

  5. Click CREATE.

  6. In the Datasource editor, set the following properties:

    • Driver: The fully qualified Java class name of the JDBC driver. For example, org.postgresql.Driver for a PostgreSQL database

    • URL: The JDBC connection URL. This URL depends on the driver. For example, a PostgreSQL database URL has the following format: jdbc:postgresql://<host>:<port>/<database>.

    • Username and Password: The credentials used to establish the connection.

  7. Set the Minimum Idle and Maximum Pool Size to configure the size of the connection pool.

  8. Optionally configure the Advanced Properties of the connection pool.

  9. Optionally configure the Driver Properties specific to the database technology. Refer to the driver documentation for the comprehensive list of properties.

  10. Select the Datasource Usages that you want to allow. properties.

  11. Save the datasource, and then click the check datasource Test button to test it.

Changes performed in the datasource editor immediately apply.

Advanced Properties

The following table lists the advanced properties to configure datasources.

Connection Timeout

Maximum number of milliseconds to wait to establish a connection. The default value is set to 30000 (30 seconds).

Idle Timeout

Amount of time in milliseconds allowed for a connection to sit in the pool before being retired. Set to 0 to disable idle connections retirement. The default value is set to 600000 (10 minutes).

Keep Alive Time

Frequency of the attempts to keep idle connections alive, to prevent database/network timeout, possibly using a connection test. Set this property to 0 to disable that feature.

Max Lifetime

Maximum lifetime of a connection before it is retired from the pool. The default value is set to 1800000 (30 minutes).

Connection Test Query

SQL query used to test or keep a connection alive. When unset, the Connection.isValid() JDBC method is used.

Minimum Idle

Minimum number of idle connections to maintain in the pool. The default value is set to 1.

Maximum Pool Size

Maximum number of idle and in-use connections in the pool. This property must be greater than the Minimum Idle property. The default value is set to 8.

Connection Initialization SQL

SQL query executed when creating a connection and adding it to the pool.

Validation Timeout

Maximum number of milliseconds to wait for a connection test to succeed. The default value is set to 5000 (5 seconds)

Driver Properties

When configuring datasources for certain technologies, specific driver properties must be set.

Note that these properties are automatically set when creating a new datasource.

Oracle

A datasource to connect an Oracle database must be configured with the oracle.jdbc.J2EE13Compliant property set to true, to make the Oracle driver behave in a Java EE-compliant manner.

When this option is not set, errors such as the following one will be raised in the application log.

com.semarchy.mdm.runtime.data.InvalidDataAccessResourceUsageException: java.lang.RuntimeException: Unexpected DB value .... (Class oracle.sql.TIMESTAMP for logicalType TIMESTAMP)

PostgreSQL

The default fetch size of the Postgresql JDBC driver is zero. This means a select statement with no limit or pagination logic returns the entire recordset to the application server and store it in a buffer, possibly causing excessive memory consumption.

It is recommended to set the defaultRowFetchSize property to 2000 for PostgreSQL datasources, particularly for those used for data locations.

Configure Replicas

When deploying Semarchy xDM in distinct environments, you can configure, for each datasource, a different configuration for each environment.

To configure environment-specific configurations for a datasource, you can create Replicas for this datasource, each replica is identified by an Environment ID.

Datasource Replicas for Development and Production

In your original environment, you address a DataHub datasource with a given URL and specific credentials. In the various production environments (PROD_US_WEST, PROD_US_EAST, etc) you want to use different URLs and credentials.

You can configure that datasource by default for the original environment, and configure replicas for the various production environments

Create a Replica

To create a replica:

  1. In the datasource editor, select the Replicas tab. The list of replicas opens.

  2. Click the Add Replica button.

  3. In the New Replica dialog, enter the Environment ID.
    Note that you cannot have two replicas with the same environment ID for a given datasource.

  4. The replica editor opens, seeded with the configuration of the base datasource.
    Modify this configuration to meet the environment requirements. The replica exposes the same properties as the base datasource.

  5. Save the datasource, and then click the check datasource Test button to test the replica configuration.

Use Replicas

When starting an Semarchy xDM instance, you can specify the environment into which the instance runs using the xdm.datasources.environmentid startup configuration property.
You can set this property:

  • Using an environment variable:

    export XDM_DATASOURCES_ENVIRONMENTID=PROD
  • Using a Java system property:

    -Dxdm.datasources.environmentid=PROD
  • In the startup configuration file:

    xdm.datasources.environmentid=PROD

When the instance starts:

  • With no environment ID set, each datasource uses its base configuration, and replicas are not used.

  • with an environment ID set, each datasource uses the configuration of the replica for this environment ID instead of the base datasource configuration. If no replica exists for the environment ID, the base datasource configuration is used instead.

Database Read Replicas

You can use datasource replicas to configure and run Semarchy xDM in environments into which you have Database Read Replicas, that is read-only copies of writable database instances.

Database Read Replicas are supported:

  • In the dashboard datasources.

  • In the data location datasources, only for read-only operations such as querying the hub data with the REST API. Actions requiring to write to the data location using the REST API or applications UI will fail.

Configure Datasources using the REST API

Endpoints are available on the Semarchy xDM REST API to consult and configure datasources.

For more details, refer to the REST API documentation.

Datasource Tuning

When configuring datasources for Semarchy xDM, more specifically in production environments, you must take into account several considerations listed below.

Connection Pool Sizing

Datasources configured for Semarchy use connection pools, sized according to the expected usage.

In the normal course of operation of Semarchy:

  • Each simultaneous user session connected to and interacting with an MDM application makes one connection to the data location datasource and possibly one to the repository when working with steppers.

  • Each simultaneous integration job makes one connection to the data location datasource, plus one to the repository datasource.

  • Each chart rendered by xDM Dashboard makes one connection to the datasource containing the data when it renders.

  • Each profiling process running in xDM Discovery makes one connection to the datasource containing the data it profiles and one connection to the repository.

  • Each chart rendered on an xDM Discovery profiling chart makes one connection to the repository.

You must configure the pool size as a trade-off between two directions:

  • If the pool creates and keeps too many connections, it will be ready to serve any client immediately, but it will also overuse resources by pro-actively creating too many connections.

  • If the pool does not have enough connections available, then the pool may be exhausted at certain times, causing client requests to wait until a connection becomes available in the pool. In a nutshell, the more users try to access a small pool and the longer their queries, the more they will have to wait between each click.

The connection pool size for each datasource is configured using the Minimum Idle and Maximum Pool Size parameters.

Connection Management

When Semarchy takes a connection from the connection pool, it assumes that this connection is a valid one. In certain situations, the connection may be invalid at the database side (for maintenance reasons, etc.). It is a good practice to configure the pool to test connections before serving them.

This capability is enabled using the Connection Test Query parameter, which provides a SQL query used to test or keep a connection alive. If this parameter is unset, the Connection.isValid() JDBC method is used to test the connection before serving it from the pool.