xDM Discovery tables reference

xDM Discovery stores profiling metrics in the Semarchy xDM repository, using the tables described in this document. Leverage this reference information for adapting charts or creating custom visualizations.

PRF_PROFILING

This table contains information about the profiling status of the tables.

Column Name Description

DATA_SOURCE

text

Name of the datasource.

TABLE_NAME

text

Name of the table.

STATUS

text

Profiling status of the table in the profiling process.

  • DONE: ready (the latest profiling process ran successfully).

  • IN PROGRESS: profiling (a profiling process is running for the table).

  • QUEUED: queued (a profiling process is queued for the table).

  • ERROR: failed (the latest profiling process failed).

  • CANCELING: temporary status while the engine processes a user cancellation.

  • CANCELED: canceled (the latest profiling process was stopped by the user).

  • NONE: corresponds to a profile that was never executed.

PROGRESS

number

Progress of a running profiling process (1…​ 100).

ERROR_MESSAGE

text

Error message when the profiling process has failed.

QUEUED_DATE

datetime

Timestamp when the profiling process was queued.

START_DATE

datetime

Timestamp when the profiling process was started.

END_DATE

datetime

Timestamp when the profiling process was finished.

PRF_TABLE

This table contains profiling metrics for the tables.

Column Name Description

DATA_SOURCE

text

Name of the datasource.

TABLE_NAME

text

Name of the table.

NUM_ROWS

number

Number of rows in the tables.

UPDATE_DATE

datetime

Timestamp when the profile was updated.

PRF_COLUMN

This table contains profiling metrics for the columns.

These metrics are reported for the entire table. They are not limited by the table’s profiling option.
Column Name Description

DATA_SOURCE

text

Name of the datasource.

TABLE_NAME

text

Name of the table.

COLUMN_NAME

text

Name of the profiled column.

ORDINAL_POSITION

number

Ordinal position of the column in the table.

DATA_TYPE

text

Datatype of the column.

NUM_ROWS

number

Number of rows in the table.

LOW_VAL

text

Lowest value found in the column for all records.

HIGH_VAL

text

Highest value found in the column for all records.

AVG_VAL

text

Average value found for the column for all records.

MOST_FREQ_VAL

text

Most frequent value found for the column for all records.

MIN_LENGTH

number

Minimum value length found for the column for all records.

MAX_LENGTH

number

Maximum value length found for the column for all records.

AVG_LENGTH

number

Average value length found for the column for all records.

NUM_NULLS

number

Number of null value occurrences found in the column for all records.

NUM_DIST_VALS

number

Number of distinct occurrences (different values) found in the column for all records.

NUM_UNIQUE_VALS

number

Number of singular occurrences (unique values) found in the column for all records.

NUM_UNIQUE_COUNT

number

Number of records with values unique for the column.

By design, NUM_UNIQUE_VALS and NUM_UNIQUE_COUNT yield identical results. This duplication is intentional to mirror NUM_NON_UNIQUE_VALS and NUM_NON_UNIQUE_COUNT, which are two distinct metrics.

NUM_NON_UNIQUE_VALS

number

Number of repeated occurrences (non-unique values) found in the column for all records.

NUM_NON_UNIQUE_COUNT

number

Number of records with non-unique values for the column.

PERCENTILE

text

Not loaded yet. Reserved for future use.

Not all metrics are available for all datatypes.

The following table lists the metrics available depending on the column datatype.

Metric Applicable datatype

NUM_ROWS, NUM_NULLS

All

LOW_VAL, HIGH_VAL

text, numeric, date, and datetime

AVG_VAL

numeric, date, and datetime

MOST_FREQ_VAL

text, numeric, date, and datetime (boolean)

MIN_LENGTH, MAX_LENGTH, AVG_LENGTH

text, longtext, binary

NUM_DIST_VALS, NUM_UNIQUE_VALS, NUM_UNIQUE_COUNT, NUM_NON_UNIQUE_VALS, NUM_NON_UNIQUE_COUNT

text, numeric, date, and datetime (boolean)

PERCENTILE

Not available yet.

Value distribution

text, numeric, date, and (datetime)

Pattern distribution

text

PRF_DIST_VALS

This table contains the list of distinct values in the profiled columns.

Only a subset of the column values are reported. The distinct values reported for each column are limited by the table’s Maximum distinct values stored profiling option.
Column Name Description

DATA_SOURCE

text

Name of the datasource.

TABLE_NAME

text

Name of the table.

COLUMN_NAME

text

Name of the profiled column.

DIST_VAL

text

Column value.

NUM_ROWS

number

Number of occurrences of the value found in the table.

PRF_DIST_PATTERNS

This table contains the list of distinct patterns found in the values of the profiled columns.

Only a subset of the column value patterns are reported. The patterns reported for each column are limited by the table’s Maximum patterns stored profiling option.
Column Name Description

DATA_SOURCE

text

Name of the datasource.

TABLE_NAME

text

Name of the table.

COLUMN_NAME

text

Name of the profiled column.

DIST_PATTERN

text

Pattern found in the column values.

NUM_ROWS

number

Number of occurrences of the pattern in the column.

HIGH_VAL

text

Highest value matching this pattern in the column.

LOW_VAL

text

Lowest value matching this pattern in the column.