Configure Datasources

A Platform Datasource defines a connection to a database schema used by Semarchy xDM features. Such a connection uses Java Database Connectivity (JDBC).

Each datasources has an associate Connection Pool to provide Semarchy xDM features with ready-to-use connections.

Administrators configure datasources in the Semarchy xDM platform for usage by specific platform features, such as data locations, dashboards, xDM Discovery, variable value providers, plug-ins, etc.

Specific Database Connections

Certain database connections performed by Semarchy xDM are not configured as platform datasources:

Datasource Usages

A datasource is configured with specific database credentials. As such, it has access to a certain number of resources and operations in the database.

You can configure for a datasource a set of usages. Usages restrict the features of Semarchy xDM that can use the datasource.

Datasource Usage Example

A CustomerDataHub datasource is configured to deploy a data location and has the Data Location Usage. The required database privileges for a data location include DDL and CRUD operations on all tables in the schema.

Such privileges are excessive for dashboarding. As a consequence, this datasource should not have the Dashboard Usage.

A second CustomerDataDashboards datasource should be created, using different credentials and fewer privileges - for example, SELECT on a subset of the tables -. This second datasource would have the Dashboard Usage.

Datasources defined for usage with xDM Dashboard or xDM Discovery should preferably authenticate with users having limited privileges. Typically, they should only have read access, and possibly only to a subset of the tables and views.

Connection Pool

Each datasource has a built-in connection pool. The connection pool maintains a set of ready-to-use connections. Features using a datasource request available connections from this pool as needed, and release these connections into the pool when done.

The pool creates new connections or removes idle connections automatically, based on the Configuration Properties.

Configure a Datasource

To connect a database, a driver is required. Drivers are provided by the database vendors as jar files that must be installed in the application server. For example, in the lib/ sub-folder of the Tomcat installation. See the application server specific installation instructions for more details.

Datasource configuration is a platform-level administrative tasks performed in Semarchy Configuration.

To create a datasource:

  1. In Configuration, select Datasources in the navigation drawer.
    The list of Datasources opens.

  2. Click on the Add Datasource floating action button in the lower-right corner of the screen.

  3. In the New Datasource dialog, enter the datasource Name.

  4. Select the Database Type corresponding to your database engine technology. If your technology is not listed, select Other.

  5. Click CREATE.

  6. In the Datasource editor, set the following properties:

    • Driver: The fully qualified Java class name of the JDBC driver. For example, org.postgresql.Driver for a PostgreSQL database

    • URL: The JDBC connection URL. This URL depends on the driver. For example, a PostgreSQL database URL has the following format: jdbc:postgresql://<host>:<port>/<database>.

    • Username and Password: The credentials used to establish the connection.

  7. Set the Minimum Idle and Maximum Pool Size to configure the size of the connection pool.

  8. Select the Datasource Usages that you want to allow.

  9. Optionally configure the Advanced Properties of the connection pool

  10. Optionally configure the Driver Properties. See Database-Specific Driver Properties for required driver properties per database technology, and refer to the driver documentation for the comprehensive list of applicable properties.

  11. Save the datasource, and then click the check datasource Test button to test it.

Changes performed in the datasource editor immediately apply.

Database-Specific Driver Properties

When configuring datasources for certain technologies, specific driver properties must be set.

Note that these properties are automatically set when creating a new datasource.

Oracle

A datasource to connect an Oracle database must be configured with the oracle.jdbc.J2EE13Compliant property set to true, to make the Oracle driver behave in a Java EE-compliant manner.

When this option is not set, errors such as the following one will be raised in the application log.

com.semarchy.mdm.runtime.data.InvalidDataAccessResourceUsageException: java.lang.RuntimeException: Unexpected DB value .... (Class oracle.sql.TIMESTAMP for logicalType TIMESTAMP)

PostgreSQL

The default fetch size of the Postgresql JDBC driver is zero. This means a select statement with no limit or pagination logic returns the entire recordset to the application server and store it in a buffer, possibly causing excessive memory consumption.

It is recommended to set the defaultRowFetchSize property to 2000 for PostgreSQL datasources, particularly for those used for data locations.

Configuration Properties

The following table lists the configuration properties for datasources.

Connection Timeout

Maximum number of milliseconds to wait to establish a connection. The default value is set to 30000 (30 seconds).

Idle Timeout

Amount of time in milliseconds allowed for a connection to sit in the pool before being retired. Set to 0 to disable idle connections retirement. The default value is set to 600000 (10 minutes).

Keep Alive Time

Frequency of the attempts to keep idle connections alive, to prevent database/network timeout, possibly using a connection test. Set this property to 0 to disable that feature.

Max Lifetime

Maximum lifetime of a connection before it is retired from the pool. The default value is set to 1800000 (30 minutes).

Connection Test Query

SQL query used to test or keep a connection alive. When unset, the Connection.isValid() JDBC method is used.

Minimum Idle

Minimum number of idle connections to maintain in the pool. The default value is set to 1.

Maximum Pool Size

Maximum number of idle and in-use connections in the pool. This property must be greater than the Minimum Idle property. The default value is set to 8.

Connection Initialization SQL

SQL query executed when creating a connection and adding it to the pool.

Validation Timeout

Maximum number of milliseconds to wait for a connection test to succeed. The default value is set to 5000 (5 seconds)

Configure Datasources using the REST API

Endpoints are available on the Semarchy xDM REST API to consult and configure datasources.

For more details, refer to the REST API documentation.

Datasource Tuning

When configuring datasources for Semarchy xDM, more specifically in production environments, you must take into account several considerations listed below.

Connection Pool Sizing

Datasources configured for Semarchy use connection pools, sized according to the expected usage.

In the normal course of operation of Semarchy:

  • Each simultaneous user session connected to and interacting with an MDM application makes one connection to the data location datasource and possibly one to the repository when working with steppers.

  • Each simultaneous integration job makes one connection to the data location datasource, plus one to the repository datasource.

  • Each chart rendered by xDM Dashboard makes one connection to the datasource containing the data when it renders.

  • Each profiling process running in xDM Discovery makes one connection to the datasource containing the data it profiles and one connection to the repository.

  • Each chart rendered on an xDM Discovery profiling chart makes one connection to the repository.

You must configure the pool size as a trade-off between two directions:

  • If the pool creates and keeps too many connections, it will be ready to serve any client immediately, but it will also overuse resources by pro-actively creating too many connections.

  • If the pool does not have enough connections available, then the pool may be exhausted at certain times, causing client requests to wait until a connection becomes available in the pool. In a nutshell, the more users try to access a small pool and the longer their queries, the more they will have to wait between each click.

The connection pool size for each datasource is configured using the Minimum Idle and Maximum Pool Size parameters.

Connection Management

When Semarchy takes a connection from the connection pool, it assumes that this connection is a valid one. In certain situations, the connection may be invalid at the database side (for maintenance reasons, etc.). It is a good practice to configure the pool to test connections before serving them.

This capability is enabled using the Connection Test Query parameter, which provides a SQL query used to test or keep a connection alive. If this parameter is unset, the Connection.isValid() JDBC method is used to test the connection before serving it from the pool.