Configure the Database Schemas

This section explains how to configure the databases/schemas for the Semarchy xDM repository and data locations.

Semarchy xDM requires the following four elements set on your database to be installed and boot properly:

  • Repository Storage: Internal storage used by Semarchy xDM to store the design-time and run-time information (recommended name: semarchy_repository)

  • Repository User: User to log in to the Repository storage with write privileges (recommended name: semarchy_repository).

  • Repository Read-only User: User to log in to the Repository storage with read-only priviliges (recommended name: semarchy_repository_readonly). It is used by Semarchy xDM Discovery’s built-in profile dashboards.

  • At least one Data Location Storage: Contains the database tables and other objects generated from the model edition of your application (Examples: customer_mdm, product_data_location).

Configure the Repository Storage

Repository Storage

Before installing Semarchy xDM, you must create a storage for the repository. You can create it manually or use your database administration interface for this purpose. In this section, we provide sample scripts for creating this storage. Make sure to adapt this script to your database configuration.

Example 1. Create the repository schema (Oracle)
CREATE USER <repository_user> IDENTIFIED BY <repository_password>
 DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
GRANT CONNECT, RESOURCE TO <repository_user>;

-- The following command should be used for Oracle 12c and above
GRANT UNLIMITED TABLESPACE TO <repository_user>;
Example 2. Create a database and the repository schema (PostgreSQL)
/* Create a database for the repository and data locations */

CREATE DATABASE <postgresql_database_name> WITH ENCODING 'UTF8';

/*
 * Disconnect and then reconnect using:
 *  the JDBC URL:  jdbc:postgresql://<host>:<port>/<postgresql_database_name>
 *  or using psql with the following command: psql -U postgres <postgresql_database_name>
 */

CREATE SCHEMA extensions;
GRANT USAGE ON SCHEMA extensions TO PUBLIC;
ALTER DEFAULT PRIVILEGES IN SCHEMA extensions GRANT EXECUTE ON FUNCTIONS TO PUBLIC;
ALTER DATABASE <postgresql_database_name> SET SEARCH_PATH TO "$user",public,extensions;

CREATE EXTENSION IF NOT EXISTS "uuid-ossp"     with schema extensions;
CREATE EXTENSION IF NOT EXISTS "fuzzystrmatch" with schema extensions;

/* Create the repository user and schema */

CREATE USER <repository_user> WITH PASSWORD '<repository_password>';

/* Use the following syntax for PostgreSQL 9 */
-- CREATE USER <repository_user> WITH UNENCRYPTED PASSWORD '<repository_password>';

/*
 * The following command is required only for PostgreSQL running on Amazon RDS.
 * It grants access to the repository to the RDS superuser.
 */
-- GRANT <repository_user> TO <rds_superuser_name>

CREATE SCHEMA <repository_user> AUTHORIZATION <repository_user>;
Example 3. Create a repository database, login, and user (SQL Server)
/* Create a database for the repository */

CREATE DATABASE <repository_database_name>
GO

/* Configure the database */

ALTER DATABASE <repository_database_name> SET READ_COMMITTED_SNAPSHOT ON;
GO

ALTER DATABASE <repository_database_name> SET QUOTED_IDENTIFIER ON;
GO

/* Create a login to connect the database */

CREATE LOGIN <repository_user> WITH PASSWORD='<repository_password>', DEFAULT_DATABASE=<repository_database_name>
GO

/* Add a user for that login in the database */

USE <repository_database_name>
GO

CREATE USER <repository_user> FOR LOGIN <repository_user>
GO

/* Make this user database owner */

ALTER ROLE db_owner ADD MEMBER <repository_user>
GO
Store the values of the <repository_user>, <repository_password>, and the <repository_database_name> (for SQL Server), as you will need them later for the startup configuration.

Repository Read-only User

In addition to the repository storage and the credentials with full access to the repository schema, a user with limited access to the repository must be created.

This user is used by xDM Discovery’s built-in profile dashboards and should have at least read-only access to the tables in the repository schema storing the profiling metrics. The tables have the PRF_ prefix.

Since the repository tables are not created yet, you must run the GRANT statements for this user on the repository tables after the repository creation. These grants are also provided in the post-installation steps.
Example 4. Create the repository read-only user (Oracle)
-- Create the repository read-only user
CREATE USER <repository_readonly_user>
 IDENTIFIED BY <repository_readonly_password>
 DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;

-- Grant minimum access to the user
GRANT CONNECT TO <repository_readonly_user>;

-- Run the following commands after the repository creation.
-- Grant select privileges on the profiling tables
GRANT SELECT ON <repository_user>.PRF_PROFILING TO <repository_readonly_user>;
GRANT SELECT ON <repository_user>.PRF_TABLE TO <repository_readonly_user>;
GRANT SELECT ON <repository_user>.PRF_COLUMN TO <repository_readonly_user>;
GRANT SELECT ON <repository_user>.PRF_DIST_VALS TO <repository_readonly_user>;
GRANT SELECT ON <repository_user>.PRF_DIST_PATTERNS TO <repository_readonly_user>;

-- The <repository_readonly_user> user must use the following statement to default his schema to the repository.
-- ALTER SESSION SET CURRENT_SCHEMA = <repository_user>;
Example 5. Create the repository read-only user (PostgreSQL)
/* Create the repository read-only user */
CREATE USER <repository_readonly_user> WITH PASSWORD '<repository_readonly_password>';

/* Use the following syntax for PostgreSQL 9 */
-- CREATE USER <repository_readonly_user> WITH UNENCRYPTED PASSWORD '<repository_readonly_password>';

GRANT CONNECT ON DATABASE <postgresql_database_name> to <repository_readonly_user>;

/* Setting the search path to include the repository */
ALTER ROLE <repository_readonly_user> SET SEARCH_PATH TO "$user", <repository_user>,public,extensions;

/* Run the following commands after the repository creation. */
/* Grant select privileges on the profiling tables */
GRANT USAGE ON SCHEMA <repository_user> TO <repository_readonly_user>;
GRANT SELECT ON TABLE
    <repository_user>.PRF_PROFILING,
    <repository_user>.PRF_TABLE,
    <repository_user>.PRF_COLUMN,
    <repository_user>.PRF_DIST_VALS,
    <repository_user>.PRF_DIST_PATTERNS
TO <repository_readonly_user>;
Example 6. Create the repository read-only user (SQL Server)
/* Create the repository read-only user */
CREATE LOGIN <repository_readonly_user> WITH PASSWORD='<repository_readonly_password>', DEFAULT_DATABASE=<repository_database_name>
GO

/* Add a user for that login in the database */

USE <repository_database_name>
GO

CREATE USER <repository_readonly_user> FOR LOGIN <repository_readonly_user>;
GO

/* Grant minimum access to the user */
GRANT CONNECT TO <repository_readonly_user>;

/* Run the following commands after the repository creation. */
/* Grant select privileges on the profiling tables */
GRANT SELECT ON PRF_PROFILING TO <repository_readonly_user>;
GRANT SELECT ON PRF_TABLE TO <repository_readonly_user>;
GRANT SELECT ON PRF_COLUMN TO <repository_readonly_user>;
GRANT SELECT ON PRF_DIST_VALS TO <repository_readonly_user>;
GRANT SELECT ON PRF_DIST_PATTERNS TO <repository_readonly_user>;
Store the values of the <repository_readonly_user> and <repository_readonly_password> as you will need them later for the startup configuration.

Configure the Data Locations Storage

You do not need to create the data locations' databases/schemas at installation time, but it is recommended to plan them as part of the installation and configuration effort. You can create them manually or use your database administration interface for this purpose. In this section, we provide a sample script for creating a data location database/schema. Make sure to adapt this script to your database configuration and duplicate it to create the storage for all data locations.

Example 7. Create a data location schema (Oracle)
CREATE USER <data_location_user_name> IDENTIFIED BY <data_location_user_password>
 DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP;
GRANT CONNECT, RESOURCE, CREATE VIEW TO <data_location_user_name>;

-- The following command should be used for Oracle 12c and above
GRANT UNLIMITED TABLESPACE TO <data_location_user_name>;
The CREATE VIEW privilege is required to deploy Database Views.
Example 8. Create a data location schema (PostgreSQL)
CREATE USER <data_location_user_name> WITH PASSWORD '<data_location_user_password>';

/* Use the following syntax for PostgreSQL 9 */
-- CREATE USER <data_location_user_name> WITH UNENCRYPTED PASSWORD '<data_location_user_password>';

/*
 * The following command is required only for PostgreSQL running on Azure or Amazon RDS.
 * It grants access to the data location to the Azure or RDS superuser.
 */
-- GRANT <data_location_user_name> TO <azure|rds_superuser_name>

CREATE SCHEMA <data_location_user_name> AUTHORIZATION <data_location_user_name>;
Example 9. Create a data location database, login, and user (SQL Server)
/* Create a database for the data location */

CREATE DATABASE <data_location_database_name>
GO

/* Configure the database */

ALTER DATABASE <data_location_database_name> SET READ_COMMITTED_SNAPSHOT ON;
GO

ALTER DATABASE <data_location_database_name> SET QUOTED_IDENTIFIER ON;
GO

/* Create a login to connect the database */

CREATE LOGIN <data_location_user_name> WITH PASSWORD='<data_location_user_password>', DEFAULT_DATABASE=<data_location_database_name>
GO

/* Add a user for that login in the database */

USE <data_location_database_name>
GO

CREATE USER <data_location_user_name> FOR LOGIN <data_location_user_name>
GO

/* Make this user database owner */

ALTER ROLE db_owner ADD MEMBER <data_location_user_name>
GO
Store the values of the <data_location_user_name>, <data_location_user_password>, and <data_location_database> (for SQL Server), as you will need them later for creating the data location datasource.

Database-Specific Considerations

General Considerations

Configure the database with a charset that supports all languages, such as AL32UTF8 for Oracle and UTF8 for PostgreSQL. Semarchy xDM uses specific characters for storing internal information. Applications in Semarchy xDM natively support multi-lingual data without preventing users from entering accented characters (or Cyrillic or Arabic or Chinese). A database configured for a specific language or with a limited charset may not function optimally with Semarchy xDM.

Oracle

Repositories and data locations should be located in separate schemas. However, they do not necessarily need to be located in the same database.

Semarchy xDM ships with an Oracle JDBC driver (ojdbc8.jar) for Oracle Database version 12c Release 2 (12.2.x). This driver is strongly recommended for all recent database versions. If you are using an older Oracle version (11g), it is recommended to review the compatibility of this driver with your Oracle database version and possibly install an older driver version instead (ojdbc6 or ojdbc7).

PostgreSQL

Repositories and data locations should be located in separate schemas.

SQL Server

The configuration presented above uses logins defined in the database. Change this script to use Windows or AD logins as needed.

Semarchy xDM does not support schemas for SQL Servers. One database is required for each repository and data location, and each user used to log in to these databases should have the db_owner (dbo) role.

Be cautious of the collation when configuring the database instance:

  • Collation defines the code page, the case (CS/CI), and accent (AS/AI) sensitivity:

    • It has strong impacts on comparison functions, order by clauses, and the overall execution performances.

    • During a comparison, a character that is out of the collation codepage is considered as an unknown character and is always different from another unknown character, which may cause issues in comparisons.

  • The collation selected for the database hosting the repository and should be selected carefully to support all the possible characters that you plan to use or store in the hub. In addition, Semarchy xDM internally uses the following special characters (identified with their Unicode number): £ (U+00A3), $ (U+0024), ¤ (U+00A4), • (U+2022). These characters should be also supported by the collation.

You may consider using a UTF-8 collation if supported by your SQL Server version.
  • It is recommended to select a case-sensitive collation for the repository and data locations database. Using a case-insensitive collation may cause unexpected results in search operations.

  • Note that the SEM_NORMALIZE function used is made collation-proof and forces the Latin1_General_100_CS_AS_KS_WS_SC collation.

SQL Server repository and data locations databases should also be configured as follows for Semarchy xDM:

  • QUOTED_IDENTIFIER should be set to ON to force SQL Server to follow the ISO rules for identifiers and literal values quoting, using the following command:

    ALTER DATABASE <database_name> SET QUOTED_IDENTIFIER ON;
  • READ_COMMITTED_SNAPSHOT should be set to ON to allow connections to access the previous (committed) version of the records being modified instead of waiting for them to be unlocked, using the following command:

    ALTER DATABASE <database_name> READ_COMMITTED_SNAPSHOT ON;

Size and Maintain the Databases/Schemas

Repository

The following considerations should be taken into account when sizing the repository database/schema:

  • The repository contains a small volume of information if you exclude the execution log. It can be sized by default from 100 to 200Mb depending on the number of models and number of model versions stored in it.

  • The execution log is the larger part of the data stored in the repository. The volume of data generated in the log depends on the number of job executions executed daily. It is recommended to monitor the job execution and resize the repository according to the log history you want to preserve.

  • To maintain the logs at a reasonable volume, it is recommended to purge them regularly. You should configure reasonable retention policies in the models for your data and schedule regular data purges in the data locations.

Data Location

The following considerations should be taken into account when sizing the data location databases/schema:

  • For each entity of the deployed model, the data location contains several tables that correspond to the data in the various stages of the certification process. Some of these tables also contain the history of the previous iterations of the process.

  • The volume required in the data location depends on the following factors:

    • Number of entities

    • Number of source records pushed for these entities

    • Number of new or updated source records pushed for these entities

A recommended original sizing is to add the source data volume pushed for each entity by all publishers plus one data authoring (the overall input) and multiply it by a factor of 10. It is recommended after the original sizing to monitor the size of the data location in the normal course of operations and adjust the sizing appropriately.

The same sizing considerations apply to both the data and temporary tablespaces/filegroups in the case of the data locations, as the database engine is used for most of the processing effort in the certification process.

Data Retention Policies can be created to define the volume of data to retain in the data locations, and Data Purges can be scheduled to trigger the pruning of unnecessary data.