Filter Assets

Semarchy xDG harvesters support filtering the assets to harvest by using patterns.

You can filter databases, schemas, tables, views, and other assets by using regular expressions that match the names of the assets to allow or deny in the harvesting process.

This feature applies only to specific sources. Each source's page indicates whether that source supports it.

Sample Recipe

The following sample recipe harvests a subset of the assets by using regular expression patterns.

Example 1. Filtering sample recipe.
source:
    type: postgres # A source that supports asset filtering.
    config:
        # Connection parameters for the source
        # ...
        # Harvest all databases except temp
        database_pattern: "{'allow': ['.*'],
                            'deny': ['temp'],
                            'ignoreCase': True}"

        # Harvest only the customerB2C schema (case sensitive)
        schema_pattern: "{'allow': ['customerB2C'],
                           'deny': [],
                           'ignoreCase': False}"

        # Harvest all tables except those prefixed with MTA_ (case sensitive)
        table_pattern: "{'allow': ['.*'],
                         'deny': ['semarchy.public.customerB2C.MTA_.*'],
                         'ignoreCase': False}"
sink:
  # sink configuration
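The following Python sketch shows how the sample database_pattern could be evaluated. Python is used because the quoted pattern values use Python literal syntax (single-quoted strings, True). The function name is_allowed and the matching rules are assumptions, not documented harvester behavior: deny patterns are assumed to take precedence over allow patterns, and names are assumed to be matched with Python's re.match, which anchors the regex at the start of the name only.

```python
import ast
import re

# Pattern value copied from the sample recipe's database_pattern.
# YAML folds the multi-line quoted string into this single line.
# Assumption: the harvester parses it as a Python dict literal.
DATABASE_PATTERN = "{'allow': ['.*'], 'deny': ['temp'], 'ignoreCase': True}"


def is_allowed(name: str, pattern: dict) -> bool:
    """Return True if name passes the allow/deny pattern (hypothetical helper).

    Assumptions: deny patterns win over allow patterns, and re.match is used,
    so each regex is anchored at the start of the name but not at the end.
    """
    flags = re.IGNORECASE if pattern.get("ignoreCase", True) else 0
    if any(re.match(p, name, flags) for p in pattern.get("deny", [])):
        return False
    return any(re.match(p, name, flags) for p in pattern.get("allow", [".*"]))


pattern = ast.literal_eval(DATABASE_PATTERN)
for db in ["semarchy", "temp", "TEMP", "temp_archive"]:
    print(db, is_allowed(db, pattern))
# Prints:
# semarchy True
# temp False
# TEMP False
# temp_archive False
```

Under these assumptions, the temp deny pattern also excludes any database whose name starts with temp, such as temp_archive. A pattern such as temp$ would exclude only the database named exactly temp.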

Configure Assets Filters

The following source parameters configure asset filtering with patterns:

database_pattern

Lists of regular expression patterns that define the databases to include (allow) or exclude (deny) in the harvesting process.
The default value is {'allow': ['.*'], 'deny': [], 'ignoreCase': True}.
When the ignoreCase option is True, pattern matching is case-insensitive. Database patterns are not used if the database is provided by another configuration parameter.
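A minimal Python sketch of how the documented default value and the ignoreCase option could combine with a pattern that sets only some keys. It assumes that each key missing from a user pattern falls back to its documented default independently and that deny patterns take precedence over allow patterns; effective_pattern and is_allowed are hypothetical helpers, and the staging database name is illustrative.

```python
import re

# Default pattern value, as documented for database_pattern.
DEFAULT_PATTERN = {"allow": [".*"], "deny": [], "ignoreCase": True}


def effective_pattern(user_pattern: dict) -> dict:
    """Fill the keys missing from a user pattern with the documented defaults.

    Assumption: each key falls back to its default independently.
    """
    return {**DEFAULT_PATTERN, **user_pattern}


def is_allowed(name: str, pattern: dict) -> bool:
    # Assumption: deny wins over allow; re.match anchors at the start only.
    flags = re.IGNORECASE if pattern["ignoreCase"] else 0
    if any(re.match(p, name, flags) for p in pattern["deny"]):
        return False
    return any(re.match(p, name, flags) for p in pattern["allow"])


# Only deny is given, so allow stays ['.*'] and ignoreCase stays True.
pattern = effective_pattern({"deny": ["staging$"]})
print(is_allowed("Sales", pattern))    # True
print(is_allowed("STAGING", pattern))  # False, because matching ignores case
```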

schema_pattern

Lists of regular expression patterns that define the schemas to include (allow) or exclude (deny) in the harvesting process.
The regular expression is matched against the schema name only. For example, to harvest all tables in the customerB2C schema, use the customerB2C regex.
The default value is {'allow': ['.*'], 'deny': [], 'ignoreCase': True}.
When the ignoreCase option is True, pattern matching is case-insensitive.

table_pattern

Lists of regular expression patterns that define the tables to include (allow) or exclude (deny) in the harvesting process.
The regular expression must match the full table name (database.schema.table). For example, to match all tables in the customerB2C schema of the public schema in the semarchy database, use the semarchy.public.customerB2C.* regex.
The default value is {'allow': ['.*'], 'deny': [], 'ignoreCase': True}.
When the ignoreCase option is True, pattern matching is case-insensitive.
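Because table_pattern is matched against the full database.schema.table name, a regex written for the bare table name does not match. The Python sketch below illustrates this, assuming re.match semantics (anchored at the start of the name); full_table_name, matches, and the orders table are hypothetical. It also shows that an unescaped dot in a regex matches any character, and that re.escape produces literal dots.

```python
import re


def full_table_name(database: str, schema: str, table: str) -> str:
    # table_pattern is matched against database.schema.table.
    return f"{database}.{schema}.{table}"


def matches(regex: str, name: str, ignore_case: bool) -> bool:
    # Assumption: re.match semantics (anchored at the start of the name).
    return re.match(regex, name, re.IGNORECASE if ignore_case else 0) is not None


name = full_table_name("semarchy", "public", "orders")

# A bare table name does not match, because the full name starts with the database.
print(matches("orders", name, False))                   # False
print(matches("semarchy.public.orders", name, False))   # True

# An unescaped "." matches any character, including a non-dot character.
print(matches("semarchy.public", "semarchyXpublic", False))  # True
# re.escape turns the dots into literal dots.
print(matches(re.escape("semarchy.public."), "semarchyXpublic.orders", False))  # False
```

Escaping the dots is optional in most recipes, since database, schema, and table names rarely differ only at those positions, but it makes the pattern match exactly what it appears to match.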

view_pattern

Lists of regular expression patterns that define the views to include (allow) or exclude (deny) in the harvesting process.
The regular expression must match the full view name (database.schema.view). For example, to match all views in the customerB2C schema of the public schema in the semarchy database, use the semarchy.public.customerB2C.* regex.
The default value is {'allow': ['.*'], 'deny': [], 'ignoreCase': True}.
When the ignoreCase option is True, pattern matching is case-insensitive. If view_pattern is not set, it defaults to the value of table_pattern, if table_pattern is set.
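The fallback rule for view_pattern can be sketched in Python as follows. resolve_view_pattern is a hypothetical helper that encodes the rule described above: an explicit view_pattern is used as-is, otherwise table_pattern is reused, otherwise the documented default applies. The tmp_ prefix in the example pattern is illustrative.

```python
from typing import Optional

# Documented default pattern value.
DEFAULT_PATTERN = {"allow": [".*"], "deny": [], "ignoreCase": True}


def resolve_view_pattern(view_pattern: Optional[dict],
                         table_pattern: Optional[dict]) -> dict:
    """Return the pattern applied to views (hypothetical helper)."""
    if view_pattern is not None:
        return view_pattern   # an explicit view_pattern always wins
    if table_pattern is not None:
        return table_pattern  # otherwise views reuse the table filter
    return DEFAULT_PATTERN    # otherwise every view is allowed


table_pattern = {"allow": [".*"], "deny": [r"semarchy\.public\.tmp_.*"], "ignoreCase": False}
print(resolve_view_pattern(None, table_pattern) is table_pattern)  # True
print(resolve_view_pattern(None, None) == DEFAULT_PATTERN)         # True
```

In practice, this means a table_pattern that excludes a prefix also excludes views with that prefix unless a separate view_pattern is set.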