Harvesting

Harvesting is the mechanism that Semarchy xDG uses to collect metadata from sources and publish it as assets.

Overview

Harvesting is performed by the harvesting client, which is provided as a Docker image. The harvesting client runs recipe files that contain the configuration needed to collect metadata and metrics and push them to Semarchy xDG.

Recipe files

Recipes are YAML files that contain:

  • the source configuration to collect metadata. This configuration differs depending on your source system. Refer to Sources for the configuration details for each supported source technology.

  • the sink configuration to publish this metadata as assets. Refer to Sinks for the configuration details for each type of sink.

  • optionally, transformers that modify this metadata before it is published. Refer to Transformers for more information about transformers.

The following recipe harvests metadata from a PostgreSQL database and sends it to Semarchy xDG.

Example 1. Sample PostgreSQL recipe: postgresql.yaml file.
source: # (1)
  type: postgres
  config:
    host_port: localhost:5432
    database: semarchyDemoDatabase
    username: username
    password: password

sink: # (2)
  type: "datahub-rest"
  config:
    server: "https://<your-tenant-name>.semarchy.net/api/xdg/v1/catalog" # (3)
    token: "<your-personal-access-token>" # (4)
(1) Source configuration. Set the connection information for your source database in the config element.
(2) Sink configuration. Set the connection information for your Semarchy xDG site in the config element.
(3) The server property must point to your Semarchy xDG site.
(4) Create a personal access token and set it in the token property.
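
Because the recipe contains a database password and a personal access token, you may prefer not to store it with real values in version control. The following is a minimal sketch, not part of the product, that writes the recipe from environment variables with a shell here-document. The function name write_recipe and the variable names PG_USER, PG_PASSWORD, XDG_TENANT, and XDG_TOKEN are hypothetical. The host, database, and sink values are copied from the sample recipe.

```shell
#!/bin/sh
# Sketch: generate the sample recipe from environment variables.
# write_recipe and all variable names below are hypothetical, chosen for illustration.

write_recipe() {
  out_file="$1"

  # Stop with an error message if a required variable is unset or empty.
  : "${PG_USER:?PG_USER is not set}" "${PG_PASSWORD:?PG_PASSWORD is not set}"
  : "${XDG_TENANT:?XDG_TENANT is not set}" "${XDG_TOKEN:?XDG_TOKEN is not set}"

  # The unquoted EOF delimiter lets the shell expand the variables in the body.
  cat > "$out_file" <<EOF
source:
  type: postgres
  config:
    host_port: localhost:5432
    database: semarchyDemoDatabase
    username: "${PG_USER}"
    password: "${PG_PASSWORD}"

sink:
  type: "datahub-rest"
  config:
    server: "https://${XDG_TENANT}.semarchy.net/api/xdg/v1/catalog"
    token: "${XDG_TOKEN}"
EOF

  # The generated file holds credentials, so restrict it to the current user.
  chmod 600 "$out_file"
}
```

After setting the four variables, calling write_recipe postgresql.yaml produces a file with the same structure as the sample recipe. This sketch does not escape special characters, so a password that contains a double quote or a backslash would produce invalid YAML.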

Run recipes

Once the harvesting client is configured, run the above recipe with the following command:

./xdg-harvest.sh -c postgresql.yaml
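
If you maintain several recipe files, a small wrapper can run them one after the other. The following sketch assumes only the ./xdg-harvest.sh -c <recipe> invocation shown above, and assumes that the script returns a non-zero exit status when a recipe fails. The function name run_recipes, the HARVEST_CMD override, and the idea of a directory of recipes are hypothetical.

```shell
#!/bin/sh
# Sketch: run every .yaml recipe found in a directory and report failures.
# run_recipes and HARVEST_CMD are hypothetical names used for illustration.

run_recipes() {
  recipe_dir="$1"
  # Default to the command shown above; HARVEST_CMD allows a different path.
  harvest_cmd="${HARVEST_CMD:-./xdg-harvest.sh}"
  failed=0

  for recipe in "$recipe_dir"/*.yaml; do
    # Skip the unexpanded pattern when the directory has no .yaml files.
    [ -e "$recipe" ] || continue

    echo "Harvesting with $recipe"
    if ! "$harvest_cmd" -c "$recipe"; then
      # Assumes a non-zero exit status signals a failed recipe.
      echo "Recipe failed: $recipe" >&2
      failed=$((failed + 1))
    fi
  done

  echo "Finished with $failed failed recipe(s)"
  # Succeed only if every recipe succeeded.
  [ "$failed" -eq 0 ]
}
```

For example, run_recipes ./recipes processes every .yaml file in a recipes directory and keeps going after a failure, so one broken recipe does not block the others.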

You can monitor your harvesting from the Semarchy xDG user interface.