Semarchy Data Management

This source extracts metadata from Semarchy xDM Data Management.

Overview

This source first connects to the Semarchy Data Management server to retrieve the data location assets (entities, attributes, certification jobs, etc.). It then connects to the database hosting this data location to retrieve the underlying physical assets (tables, columns).

The underlying data location database is configured as an inner_source.

This source supports:

  • Metrics retrieval for data location assets. For example, the number of golden records for entities.

  • Stateful Ingestion for both data location and underlying physical assets.

  • Data Profiling to collect table, row, and column statistics for the underlying physical assets.

  • Domain assignment for the underlying physical assets.

  • Asset filtering for the underlying physical assets.

Sample Recipe

Example 1. Semarchy Data Management Source sample recipe.
source:
  type: semarchy-xdm
  config:
    xdm_base_url: 'http://localhost:8080'
    xdm_dataloc: CustomerB2CDemo
    xdm_api_key: <api-key>
    # Alternative to xdm_api_key: authenticate with a user and password.
    # xdm_api_username: <user>
    # xdm_api_password: <password>
    # disable_ssl_verification: true

    inner_source:
      # Configure the inner source depending on the underlying
      # data location database.
      type: postgres
      config:
        host_port: localhost:5432
        database: semarchyDemoDatabase
        username: username
        password: password
        include_tables: true
        include_views: true
        profiling:
          enabled: true
          profile_table_level_only: false
        schema_pattern:
          allow:
            - semarchy_customer_b2c_mdm

sink:
  # sink config
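
As a variation on the recipe above, the following sketch authenticates with a user and password instead of an API key, using the xdm_api_username and xdm_api_password parameters described in the Parameters section. The server URL, the credentials, and the decision to disable SSL verification are placeholders chosen for illustration, not recommended values.

```yaml
source:
  type: semarchy-xdm
  config:
    # Hypothetical HTTPS endpoint; replace with your own server URL.
    xdm_base_url: 'https://xdm.example.com'
    xdm_dataloc: CustomerB2CDemo
    # User and password authentication, used in place of xdm_api_key.
    xdm_api_username: <user>
    xdm_api_password: <password>
    # Assumption: only useful when the server certificate cannot be
    # verified, for example in a test environment.
    disable_ssl_verification: true

    inner_source:
      type: postgres
      config:
        host_port: localhost:5432
        database: semarchyDemoDatabase
        username: username
        password: password
        schema_pattern:
          allow:
            - semarchy_customer_b2c_mdm

sink:
  # sink config
```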

Parameters

The following table lists the source parameters.

Parameter

Description

xdm_base_url

Base URL of the Semarchy Data Management application server.

xdm_dataloc

Name of the data location to harvest.

xdm_api_key

API key used to connect to the Semarchy Data Management server. Use this parameter instead of the xdm_api_username and xdm_api_password parameters.

API key authentication requires the semarchyConnect and semarchyAdmin roles in xDM.

xdm_api_username

Semarchy Data Management user. Alternatively, use the xdm_api_key parameter to connect to Semarchy Data Management.

xdm_api_password

Password of the user specified in xdm_api_username.

disable_ssl_verification

Set to true to disable SSL verification when connecting to the Data Management server.

inner_source

Source configuration for the database hosting the data location. This configuration is a regular PostgreSQL, Oracle, or Microsoft SQL Server source configuration.

In this configuration, use the schema_pattern parameter to limit inner source harvesting to the tables located in the data location schema.
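
To illustrate the schema_pattern recommendation, here is a minimal sketch of an inner_source fragment restricted to a single schema. It reuses the PostgreSQL parameters from the sample recipe. The schema name my_dataloc_schema is a hypothetical placeholder for the schema of your data location.

```yaml
inner_source:
  type: postgres
  config:
    host_port: localhost:5432
    database: semarchyDemoDatabase
    username: username
    password: password
    schema_pattern:
      allow:
        # Hypothetical schema name; use the schema of your data location.
        - my_dataloc_schema
```

With this fragment in place, the intent is that tables outside the listed schema are left out of the harvest.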

Supported Version

This harvester is compatible with xDM 2023.1.8 LTS and above, and with xDM 2024.1.0 LTS and above.