Integrate with Purview
This page explains how to integrate Semarchy xDM with Microsoft Purview.
Purview is a data governance service and data catalog on Microsoft Azure, designed to store and manage both physical and logical metadata assets.
With xDM’s Purview connector, metadata from Semarchy data hubs can be synchronized with Purview, linking logical model assets (i.e., entities and attributes) and their corresponding physical assets (i.e., tables and columns), and thereby enabling end-to-end data lineage.
Overview
The Purview connector converts Semarchy metadata into Purview assets as follows:
-
xDM instances, data locations, entities, attributes, and relationships are converted into Purview entities using Semarchy-specific asset types.
-
Entities and attributes are related to the corresponding physical tables (GD_, MD_, etc.) and columns, which must have been previously scanned by Purview’s built-in scanners for Microsoft SQL Server, PostgreSQL, or Oracle. These tables and columns are enriched with information from Semarchy.
-
A process is created to associate the physical tables and depict the certification process for each entity.
Configure Purview
Configure a collection
The Purview connector creates xDM assets in a collection. We recommend creating a dedicated collection for these assets. Take note of the collection’s name, as you need it to configure the connector.
Configure the REST API
The Purview connector leverages the REST API to create and update assets in Purview.
To enable this connectivity, follow the instructions in the Purview documentation to configure and use the REST API.
While configuring the REST API, make sure to collect the following details:
-
The tenant ID: sign in to the Azure portal, browse
, and scroll down to the Tenant ID section to find your tenant ID.
-
The name of the Purview account (e.g., SemarchyDemoPurview).
-
The Application (client) ID and client secret of the application registered in Microsoft Entra ID and assigned to a data plane role (Data Curator) for the Purview account.
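Before configuring the connector, you can check that the application credentials work by requesting a token yourself. The sketch below uses the standard Microsoft Entra ID OAuth 2.0 client credentials flow with the https://purview.azure.net/.default scope; the function names are hypothetical. A successful response only proves that the tenant ID, client ID, and client secret are valid: the Data Curator role assignment is evaluated only when a Purview API is called. If the Azure CLI is installed and signed in, az account show --query tenantId --output tsv also prints the tenant ID.

```shell
# Sketch: obtain a Microsoft Entra ID access token for the Purview REST API
# with the OAuth 2.0 client credentials flow (function names are hypothetical).

# Token endpoint of the tenant collected above.
purview_token_url() {
  printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token' "$1"
}

# POST the client credentials; prints the raw JSON token response.
# Arguments: <tenant-id> <client-id> <client-secret>
purview_get_token() {
  curl --silent --show-error --fail \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$2" \
    --data-urlencode "client_secret=$3" \
    --data-urlencode "scope=https://purview.azure.net/.default" \
    "$(purview_token_url "$1")"
}

# Read a token response on stdin and print the access_token value.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, purview_get_token "$TENANT_ID" "$CLIENT_ID" "$CLIENT_SECRET" | extract_access_token prints a token when the credentials are accepted, and curl reports an HTTP error otherwise.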
Configure Semarchy xDM
The Purview connector retrieves metadata from Semarchy xDM through a REST API, and authenticates using an API key.
In Semarchy xDM, create an API key with the Application Management and Repository Information privileges.
Make sure that you have the following information to connect your Semarchy instance:
-
The API key.
-
The xDM instance’s URL, typically
http://<host>:<port>/semarchy
. From xDM’s Welcome page, navigate to to find the base URL.
Scan the data location schema
During execution, the Purview connector searches for the physical assets (tables, columns) within the data locations cataloged in Purview to correlate them with the logical assets.
Before running the Purview connector, you must scan the xDM data location schemas with Purview’s built-in scanners to harvest metadata on the physical assets (table and column definitions).
To scan a data location schema:
-
In Purview, register a new data source pointing to the data location schema.
-
Scan this data source. After the scan, the tables and columns are visible in Purview.
-
Search for a GD_ table corresponding to your data location to confirm that the scan was successful.
-
Navigate to the schema (the container) hosting the tables, and note the Qualified Name of that schema. For example:
postgresql://servers/176.159.263.21:15432/dbs/postgres/schemas/semarchy_product_retail_mdm
Repeat the previous steps for each data location deployed for the Semarchy xDM instance.
Each data source technology has specific configuration steps outlined in the official Purview documentation. For example, some sources require storing the database password in Azure Key Vault.
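If you script the configuration for several data locations, the qualified names can be assembled from the connection details. The helper below is hypothetical and only reproduces the PostgreSQL pattern visible in the example above. SQL Server and Oracle sources use other formats, so the Qualified Name displayed in Purview for the schema remains the reference value.

```shell
# Hypothetical helper: build the Purview qualified name of a PostgreSQL
# schema, following the pattern of the example in this section:
# postgresql://servers/<host>:<port>/dbs/<database>/schemas/<schema>
# Arguments: <host> <port> <database> <schema>
pg_schema_qualified_name() {
  printf 'postgresql://servers/%s:%s/dbs/%s/schemas/%s' "$1" "$2" "$3" "$4"
}
```

Compare the output with the value shown in Purview before using it in the connector configuration.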
Deploy the Purview connector
The Purview connector retrieves metadata from deployed model editions in an xDM instance, creates logical assets in the Purview collection that you created, and links them with the previously scanned physical assets.
The Purview connector is packaged as a Docker image, which can be deployed and executed as an Azure Functions application.
Prepare the configuration file
The Azure Functions app is configured during deployment using a JSON configuration file.
To prepare the configuration file:
-
Download the sample Azure Functions app configuration file (create-azure-function-app-settings.json).
-
Edit this file and set the configuration properties as indicated below.
Property | Value | ||
---|---|---|---|
|
The Purview account name you retrieved when configuring the Purview REST API (e.g., |
||
|
The tenant ID you retrieved when configuring the Purview REST API. |
||
|
The application (client) ID created when configuring the Purview REST API. |
||
|
The client secret created when configuring the Purview REST API. |
||
|
The qualified name of the Purview entity representing the container (database schema) hosting the tables of the data location named Create one property for each Semarchy xDM data location to synchronize. Example
To synchronize two data locations named
|
||
|
The name of the collection you configured for the xDM assets (e.g., |
||
|
The Purview connector creates Semarchy-specific asset types before the Semarchy assets. Set to |
||
|
The Purview connector updates the tables and columns with descriptions based on Semarchy metadata. Set this property to |
||
|
Set this property to |
||
|
Cron schedule for the Purview connector execution (e.g., |
||
|
The Semarchy instance URL, typically |
||
|
The API key used to connect to the Semarchy instance. |
||
|
Set to
|
{
"xdmPurviewAccount": "SemarchyDemoPurview",
"xdmPurviewTenantId": "758077ec-66b9-441c-9537-b0939cb3dfe8",
"xdmPurviewClientId": "xxxxxxxxx",
"xdmPurviewClientSecret": "xxxxxxxxx",
"xdmDataLocationPurviewQualifiedName_CustomerB2CDemo": "postgresql://servers/176.159.263.21:15432/dbs/postgres/schemas/semarchy_customer_b2c_mdm",
"xdmDataLocationPurviewQualifiedName_ProductRetailDemo": "postgresql://servers/176.159.263.21:15432/dbs/postgres/schemas/semarchy_product_retail_mdm",
"xdmPurviewCollectionName": "xDM Assets",
"xdmPurviewSkipTypes": "true",
"xdmPurviewSkipPhysicalAssetsUpdate": "true",
"xdmPurviewDryRun": "true",
"xdmPurviewConnectorSchedule": "0 * * * * *",
"xdmInstanceUrl": "http://176.159.263.21:10081/semarchy",
"xdmInstanceApiKey": "xxxxxxxxx",
"scheduledXdmPurviewConnectorDisabled": "true",
"FUNCTIONS_WORKER_RUNTIME": "java"
}
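The data location properties follow the naming pattern xdmDataLocationPurviewQualifiedName_<DataLocationName> shown in the sample file. The sketch below prints one such property per data location from hypothetical "<name>=<qualified name>" pairs, so that the lines can be pasted into the configuration file. It also checks the schedule, assuming the connector uses a standard Azure Functions timer trigger: such triggers expect six NCRONTAB fields ({second} {minute} {hour} {day} {month} {day-of-week}), so the sample value 0 * * * * * runs the connector every minute, while 0 0 * * * * would run it once per hour.

```shell
# Sketch: print the data location properties for the configuration file.
# Arguments are hypothetical "<DataLocationName>=<qualified name>" pairs.
datalocation_properties() {
  local pair name qname first=1
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    qname="${pair#*=}"   # text after the first "="
    # Separate properties with a comma, as required between JSON members.
    if [ "$first" -eq 0 ]; then printf ',\n'; fi
    printf '  "xdmDataLocationPurviewQualifiedName_%s": "%s"' "$name" "$qname"
    first=0
  done
  printf '\n'
}

# Assumption: the schedule feeds a standard Azure Functions timer trigger,
# which expects six NCRONTAB fields:
# {second} {minute} {hour} {day} {month} {day-of-week}
schedule_has_six_fields() {
  [ "$(printf '%s\n' "$1" | awk '{ print NF }')" -eq 6 ]
}
```

A five-field expression copied from a classic crontab is rejected by schedule_has_six_fields, which helps catch a common mistake before deployment.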
Deploy the Azure Function
-
Download the Azure Functions app creation script (create-azure-function-app.sh).
-
Review and amend the script depending on your Azure and local environments.
-
Run the script to create all the resources required to run the connector.
This script creates or updates the following resources:
|
This script expects and uses the |
./create-azure-function-app.sh \
--name <azure-function-name> \
--storage-account <storage-account-name> \
--resource-group <resource-group-name> \
--docker-image-tag <docker-image-tag>
Main script parameters:
-
--name (required): name of the Azure Functions app to deploy. This name must be unique, as it is used in the endpoint URL: https://<name>.azurewebsites.net/api/
-
--storage-account (required): name of the Azure storage account to create or update. The name must be unique and contain between 3 and 24 characters, using only numbers and lowercase letters.
-
--resource-group (required): name of the Azure resource group in which the function is created. If the group does not exist, the script creates it, provided that the account has sufficient privileges.
-
--docker-image-tag: Docker image tag to use for the connector (defaults to latest). The tag must match the version of the Semarchy instance from which the connector harvests metadata.
To see all available parameters, run |
./create-azure-function-app.sh \
--name xdm-purview-connector \
--storage-account xdmstorageaccount \
--resource-group xdm-group \
--docker-image-tag 2023.2.0
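The storage account name constraints listed above are easy to get wrong, and the script only fails once Azure rejects the name. The hypothetical function below checks the length and character rules locally. It cannot check global uniqueness; the Azure CLI command az storage account check-name --name <name> reports whether a name is still available.

```shell
# Hypothetical check of the storage account name rules stated above:
# 3 to 24 characters, lowercase letters and numbers only.
is_valid_storage_account_name() {
  local name="$1"
  # Length must be between 3 and 24 characters.
  if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 24 ]; then
    return 1
  fi
  # Reject any character outside lowercase ASCII letters and digits.
  # (An explicit list avoids locale-dependent [a-z] ranges.)
  case "$name" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  return 0
}
```

For example, is_valid_storage_account_name xdmstorageaccount succeeds, while names such as xdm-storage or XdmStorage fail.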
For macOS users
The script uses the
|
Run the Purview connector manually
Once configured with a Cron schedule, the Azure Function runs the Purview connector automatically.
You can also run operations using the REST endpoints exposed by the function.
To use the Azure Function endpoints:
-
Retrieve the endpoint URL from the Azure Function.
The function key, passed in the code query parameter of the endpoint URL, can alternatively be passed in a request header named x-functions-key.
-
Use a REST client to perform the request.
-
Review the Semarchy assets created in the collection.
To avoid execution conflicts between your manual operations and the scheduled executions of the Purview connector, change the Azure Functions configuration and set the scheduledXdmPurviewConnectorDisabled property to true before running manual operations.
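The URL copied from the Azure portal usually carries the function key in the code query parameter. The sketch below, with hypothetical function names, splits such a URL into its base URL and key, then sends the key in the x-functions-key header instead. It assumes the key is not percent-encoded in the URL.

```shell
# Sketch (hypothetical names): call a connector endpoint with the function
# key sent in the x-functions-key header rather than in the URL.

# Print the URL without its query string.
function_url_base() {
  printf '%s' "${1%%[?]*}"
}

# Print the value of the "code" query parameter; fail if it is absent.
function_url_key() {
  local rest param
  case "$1" in *[?]*) ;; *) return 1 ;; esac
  rest="${1#*[?]}"
  while [ -n "$rest" ]; do
    param="${rest%%&*}"
    case "$param" in
      code=*) printf '%s' "${param#code=}"; return 0 ;;
    esac
    # Stop after the last parameter, otherwise drop the one just examined.
    if [ "$param" = "$rest" ]; then break; fi
    rest="${rest#*&}"
  done
  return 1
}

# Send a request: <method> <base-url> <function-key> [json-body]
connector_request() {
  local method="$1" url="$2" key="$3" body="${4:-}"
  if [ -n "$body" ]; then
    curl --silent --show-error --fail -X "$method" \
      -H "x-functions-key: $key" \
      -H "Content-Type: application/json" \
      --data "$body" "$url"
  else
    curl --silent --show-error --fail -X "$method" \
      -H "x-functions-key: $key" "$url"
  fi
}
```

For example, with the copied URL in URL: connector_request POST "$(function_url_base "$URL")" "$(function_url_key "$URL")" '{"dryRun": true}'.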
POST operation
The POST operation runs the Purview connector manually. It accepts the properties listed below, in JSON format, in the request body.
Name | Description | ||
---|---|---|---|
|
Set this property to |
||
|
Set this property to |
||
|
The Purview connector updates the tables and columns with descriptions based on Semarchy metadata. Set this property to |
||
|
Array of data location names to synchronize. Use this option to selectively synchronize a subset of the data locations. This list defaults to all Semarchy data locations in the
|
||
|
The name of the collection you configured to receive the Semarchy assets. This parameter defaults to the corresponding property set in the function application configuration. |
POST operation sample request body:
{
"dryRun": true,
"skipTypes": false,
"skipPhysicalAssetsUpdate": false,
"dataLocationNames": ["CustomerB2CDemo"],
"purviewCollectionName": "xDM Assets"
}
The process duration varies with the selected options and with the number and size of the data locations, and typically ranges from 5 to 10 minutes.
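The request body can also be generated, for example by a script that loops over data locations. The hypothetical helper below builds a POST body with the same properties as the sample, and omits the optional dataLocationNames and purviewCollectionName properties when they are not provided, so that their defaults apply. Values are inserted as-is, without JSON escaping, so they must not contain quotes or backslashes.

```shell
# Hypothetical helper: build a POST request body.
# Arguments: <dryRun> <skipTypes> <skipPhysicalAssetsUpdate> <collection> [dataLocationName...]
build_post_body() {
  local dry_run="$1" skip_types="$2" skip_phys="$3" collection="$4"
  shift 4
  local body names="" name
  body=$(printf '"dryRun": %s, "skipTypes": %s, "skipPhysicalAssetsUpdate": %s' \
    "$dry_run" "$skip_types" "$skip_phys")
  # dataLocationNames is optional: omit it to use the default list.
  if [ "$#" -gt 0 ]; then
    for name in "$@"; do
      if [ -n "$names" ]; then names="${names}, "; fi
      names="${names}\"${name}\""
    done
    body="${body}, \"dataLocationNames\": [${names}]"
  fi
  # purviewCollectionName is optional: an empty value keeps the app setting.
  if [ -n "$collection" ]; then
    body="${body}, \"purviewCollectionName\": \"${collection}\""
  fi
  printf '{%s}\n' "$body"
}
```

For example: curl -X POST -H "x-functions-key: <function-key>" -H "Content-Type: application/json" --data "$(build_post_body true false false "xDM Assets" CustomerB2CDemo)" "<endpoint-url>".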
DELETE operation
The DELETE operation deletes all assets created by the Purview connector. It accepts the properties listed below, in JSON format, in the request body.
Name | Description | ||
---|---|---|---|
|
Executes a dry run if set to |
||
|
If set to
|
DELETE operation sample request body:
{
"dryRun": true,
"deletePhysicalAssets": false
}
DELETE operations cannot be undone.
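Because deletions cannot be undone, a script that calls this operation can refuse to send a non-dry-run body by default. The helper below is a hypothetical safeguard, not part of the connector: it builds the DELETE body and fails unless dryRun is true or the CONFIRM_DELETE environment variable (also hypothetical) is set to yes.

```shell
# Hypothetical safeguard: build the DELETE request body, refusing to build a
# non-dry-run body unless CONFIRM_DELETE=yes is set in the environment.
# Arguments: <dryRun> <deletePhysicalAssets>
build_delete_body() {
  local dry_run="$1" delete_physical="$2"
  if [ "$dry_run" != "true" ] && [ "${CONFIRM_DELETE:-}" != "yes" ]; then
    echo "Refusing to build a non-dry-run DELETE body; set CONFIRM_DELETE=yes." >&2
    return 1
  fi
  printf '{"dryRun": %s, "deletePhysicalAssets": %s}\n' "$dry_run" "$delete_physical"
}
```

For example: body=$(build_delete_body true false) && curl -X DELETE -H "x-functions-key: <function-key>" -H "Content-Type: application/json" --data "$body" "<endpoint-url>". Chaining with && ensures that no request is sent when the helper refuses to build the body.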