Introduction to Semarchy xDM Discovery

Semarchy xDM Discovery enables data architects and business users to profile any source data and gather metrics in preparation for a data management initiative.

Profiling source data

xDM Discovery connects to datasources containing tables, profiles the data in the tables and persists these profiles in the Semarchy xDM Repository. Users can analyze these profiles using built-in dashboards available from xDM Discovery.

For advanced profiling, data architects can use the built-in dashboards as a seed for fully customizable Semarchy xDM Dashboard applications.

Profile metrics

xDM Discovery scans data tables and gathers the following metrics in the profiles:

  • Table metrics:

    • Number of records

  • Column metrics:

    • Lowest/highest values

    • Most frequent value

    • Uniqueness

    • Null count

    • Minimum, maximum and average value length

    • Number of distinct values and their distribution

    • Pattern distribution and lowest/highest values

Using these metrics, data architects get a clear assessment of the data quality. From this assessment, they can discuss and infer the structure and data rules to apply to the data hub entities. xDM Discovery therefore helps in the design phase of the data hub, both before and while implementing the model in Semarchy xDM.
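To make the column metrics listed above more concrete, the following Python sketch computes comparable values for a single column held in memory. It is an illustration only, not the way xDM Discovery computes its profiles: the function names (profile_column, value_pattern), the result keys, and the pattern notation (digits shown as 9, letters as A) are all assumptions made for this example.

```python
from collections import Counter
import re


def value_pattern(value):
    """Hypothetical pattern notation: digits become 9, letters become A."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", text)


def profile_column(values):
    """Compute illustrative column metrics for a list of values (None = null)."""
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    counts = Counter(non_null)
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        "lowest": min(non_null) if non_null else None,
        "highest": max(non_null) if non_null else None,
        "most_frequent": counts.most_common(1)[0][0] if counts else None,
        "distinct_count": len(counts),
        # A column is unique when no non-null value appears twice.
        "is_unique": len(counts) == len(non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
        "value_distribution": dict(counts),
        "pattern_distribution": dict(patterns),
    }


# Sample data invented for this example.
zip_codes = ["75001", "69002", None, "75001", "1300A"]
profile = profile_column(zip_codes)
print(profile["null_count"], profile["distinct_count"], profile["pattern_distribution"])
```

In this sample, the pattern distribution isolates the single value that does not match the five-digit pattern, which is the kind of observation that can lead to a data rule on the corresponding data hub entity.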

Architecture overview

xDM Discovery uses the following components:

  • Datasources are data stores containing the data to profile. A datasource may connect directly to the application data to profile, or to a staging location into which application data is loaded using an integration layer.

  • The Semarchy xDM Repository stores the datasources' definitions and configuration as well as their profiling metrics.

  • The xDM Discovery Engine runs in the Semarchy xDM platform. It is a multi-threaded engine that runs profiling processes to gather the datasources' metrics (the profiles) into the repository. This engine runs the profiling processes on demand or according to schedules.

  • The xDM Discovery User Interface allows data architects and business users to define and configure datasources, manage profiling processes, and browse profiles.
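The interaction between these components can be sketched in Python as a small multi-threaded profiling run. This is a conceptual model under stated assumptions, not Semarchy's implementation: the datasource is represented as a dictionary of tables, the repository as a plain dictionary, and the names profile_table and run_profiling are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def profile_table(name, rows):
    """Gather a few table and column metrics for one table (a list of dict rows)."""
    columns = sorted({key for row in rows for key in row})
    profile = {"table": name, "record_count": len(rows), "columns": {}}
    for column in columns:
        values = [row.get(column) for row in rows]
        non_null = [v for v in values if v is not None]
        profile["columns"][column] = {
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
        }
    return profile


def run_profiling(datasource, repository, max_workers=4):
    """Profile every table in parallel, then store each profile in the repository."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(profile_table, name, rows)
            for name, rows in datasource.items()
        }
        # Results are written from the calling thread only, so the
        # repository dictionary needs no locking.
        for name, future in futures.items():
            repository[name] = future.result()
    return repository


# Hypothetical datasource content, invented for this example.
datasource = {
    "customers": [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},
    ],
    "orders": [{"id": 10, "customer_id": 1}],
}
repository = run_profiling(datasource, {})
print(repository["customers"])
```

Each table is profiled by a worker thread, while the stored profiles remain available for browsing afterwards, which mirrors the split between the engine, the repository, and the user interface described above.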

The xDM Discovery User Interface comes with built-in charts and dashboards to browse the profiles. These charts and dashboards can be forked into Semarchy xDM Dashboard for further customization. Alternatively, you can connect a third-party visualization platform to create your own dashboards and visualizations on top of the profiles.