Replicator Rdbms to Snowflake

Tools

Description

Replicates data to Snowflake using Snowpipe. It also provides a possibility to replicate the list of changes registered in the CDC tables into a target database for later analysis.

Parameters

Parameter Mandatory Description

Name

True

Name given to the tool.

Drop Tables

False

Enabling this parameter drops the target tables before processing.

Tables to be dropped are calculated from the list of replicated source tables.

A drop is performed on the target for each of those tables.

Create Pk

False

When this parameter is enabled, the replicator replicates the primary key defined on the source tables in the target tables.

Drop Pk

False

When this parameter is enabled, the replicator drops the primary keys of target tables before processing.

Primary keys to be dropped are calculated from the list of source tables replicated.

A drop is performed on the target for all the corresponding primary keys defined on the source.

Create Fk

False

When this parameter is enabled, the replicator tries to replicate foreign keys defined on source tables in target tables.

Foreign keys are created after loading the data to make sure data is loaded in primary tables before creating them.

Drop Fk

False

When this parameter is enabled, the replicator drops the foreign keys of target tables before processing.

Foreign keys to be dropped are calculated from the list of source tables replicated.

A drop is performed on the target for all corresponding foreign keys defined on the source.

Enable Constraints

False

When this parameter is enabled, the replicator tries to enable the primary foreign key constraints on target tables. This is performed at the end of the replication, i.e., after replicating the structure and data.

This can be used together with the Disable Constraints parameter if you want to disable primary and foreign key constraints before processing and then re-enable them at the end. This can be useful to avoid unicity checks while loading tables.

Disable Constraints

False

When this parameter is enabled, the replicator tries to disable primary and foreign key constraints on target tables.

This is performed at the very beginning of the replication, i.e., before replicating the structure and data.

This can be used together with the Enable Constraints parameter if you want to disable primary and foreign keys constraints before processing and then re-enable them at the end. This can be useful to avoid unicity checks while loading tables.

Add Comments

False

Enables adding comments.

Delete Tables

False

When this parameter is enabled, the replicator removes the target table data before processing.

Truncate Tables

False

When this parameter is enabled, the replicator truncates the target tables before processing.

Normalize Column Names

False

When this parameter is enabled, the column names are normalized.

Use CDC

False

When this parameter is enabled, the corresponding CDC table is used as source. This requires that the CDC be set up on the source table before.

Drop Snowpipes

False

Allows dropping existing snowpipes.

Synchronize Deletions From CDC

False

Only applies if the Use CDC and Load Data parameters are enabled.

Enable this parameter if the records deleted on the source table need to be deleted on the target table.

Create Stages

False

Enabling this parameter creates stages before the replication.

Drop Stages

False

Enabling this parameter drops the stages at the end of replication.

Create Tables

False

When this parameter is set to true, the replicator tries to create the tables on the target. The table structure is retrieved from the source metadata. Note that this does not imply replicating data of the tables.

When this parameter is set to for historization, the replicator creates two additionnal columns CDC_TIMESTAMP and CDC_CHANGE_TYPE in the target table that can be later loaded using historize from Cdc option in the Load Data parameter.

Load Data

True

This parameter determines whether the source data should be replicated on the target. Possible options are:

  • false: Source data is not replicated.

  • true: Source data is replicated on the target.

  • synchronize: Target data is synchronized with the source data.

    The goal of this mode is to keep a target environment synchronized with a source environment.

    All the source data is first loaded into temporary work tables on target, and target tables are then synchronized with those. Existing data is updated and new data is inserted.

  • historize from cdc: Latest modifications from the CDC table are inserted into the target table with two technical columns CDC_TIMESTAMP (which represents the date of the modification) and CDC_CHANGE_TYPE (which represents the change type Delete/Insert/Update).

Load Method

True

Specifies the method to use for loading data into Snowflake. Possible options are:

  • copy: The COPY statement is used to load the temporary files into Snowflake.

    The data is first extracted from the source table into a temporary delimited file.

    The file is then sent to Snowflake (or another supported storage provider) and loaded into the target table from a stage.

  • generic: The data is sent through a jdbc connection and inserted into the target table with a standard INSERT statement.

  • storage: The data is first extracted from the source table into a temporary delimited file. The file is then sent to Snowflake (or another supported storage provider).

Query

True

Path to the metadata nodes on which to perform the replication.

The value to specify depends on what is the source Metadata, and the syntax to use is XPath. It must start with $SOURCE. This corresponds to the metadata dropped on the replicator.

The following examples illustrate some of the most commonly used types of nodes:

* If the metadata dropped is a database node: $SOURCE/schema/datastore

* If the metadata dropped is a schema / folder / hierarchical file node: $SOURCE//(datastore | file | record)

* If the metadata dropped is a table or file node: $SOURCE

NOTE: $SOURCE corresponds to the metadata link dropped on the template.

Create Snowpipes

True

Allows initializing Snowpipes so that they can be later used by the Snowpipe load method. Possible options are:

  • true: One Snowpipe per target table is created. The Snowpipe definition is based on the source table structure.

  • false: No Snowpipes are created.

  • for historization: One Snowpipe per target table is created. The Snowpipe definition contains two technical columns CDC_CHANGE_TYPE and CDC_TIMESTAMP.

    This option should be used if you want later use the Snowpipe to load historized CDC data, i.e., when the Load Data parameter is set to historize from cdc.

Snowpipe Acknowledge Mode

True

Defines how the replicator checks the status of the files loaded through Snowpipe when using the standaloneIngestOperation. Possible options are:

* enabled: The replicator checks that the files have been properly ingested before proceeding. + This implies waiting and performing status checks with the Snowpipe APIs. + If at least one file has not been correctly ingested the replicator throws an error listing the files that have failed. The files that were successfully ingested are automatically deleted from the Snowflake stage.

* disabled: The replicator proceeds without checking the files' ingestion status. The files that were sent to the Snowflake stages are not deleted automatically.

Force Table Case

True

Determines how the table name case is handled by the replicator. Possible options are:

  • false: Table name case is preserved as on the source metadata.

  • lower: Table name is forced to lowercase on the target.

  • upper: Table name is forced to uppercase on the target.

Force Column Case

True

Determines how the column name case is handled by the replicator. Possible options are:

  • false: Column name case is preserved as on the source metadata.

  • lower: Column name is forced to lowercase on the target.

  • upper: Column name is forced to uppercase on the target.

Force Constraint Case

True

Determines how constraint name case is handled by the replicator. Possible options are:

* false: Constraint name case is preserved as on the source metadata. * lower: Constraint name is forced to lowercase on the target. * upper: Constraint name is forced to uppercase on the target.

Create Table Syntax

False

Syntax used to create table statements on the target.

These are the keywords added at the beginning of the statement. They can be useful to add hints to create table statements for instance.

Select Column Pattern

False

Offers the possibility to change the expression used for columns in the select query performed to load source data into the target.

Insert Values Pattern

False

Only applies if the Load Method parameter is set to generic.

Applies a transformation on the insert into the target table.

Column Transformation Rules

False

Rules used to change the name of source columns on the fly based on matching expressions.

If the expression matches while computing the column name, the replacement is performed. Multiple column transformation rules can be specified by separating them with a ';' character.

The rules should be specified in the XPath language using the following syntax: <matching_expression>/<final_column_name>

Example 1. Changing column name based on its name
tech:name()='CUS_LAST_NAME'/'CUSTOMER_LAST_NAME' ⇒all columns named 'CUS_LAST_NAME' on the source will be called 'CUSTOMER_LAST_NAME' on the target
Example 2. Changing column name based on its datatype
@TYPE_NAME = 'VARCHAR'/concat(tech:name(),'_V') ⇒ when a column of VARCHAR datatype is encountered, the '_V' suffix is added at the end of the column name
Example 3. Changing multiple column names based on its datatypes
@TYPE_NAME = 'VARCHAR'/concat(tech:name(),'_V');@TYPE_NAME = 'INTEGER'/concat(tech:name(),'_I') ⇒ same as the previous example, except that we have defined two transformation rules here, separated with ";"
Example 4. Adding a prefix on all columns
1=1/concat('REPL_',tech:name()) ⇒ '1=1' expression will always be true so matching for all columns

Ignored Tables

False

Comma-separated list of tables to ignore from the replication. The names must be fully qualified table names.

For example, for Microsoft SQL Server: [CATALOG].[SCHEMA].[TABLE01],[CATALOG].[SCHEMA].[TABLE02]

Field Separator

False

ASCII field separator used to separate fields in exported temporary files.

String Delimiter

False

ASCII character used to delimit strings in the exported file.

Target Object Prefix

False

Defines a prefix to add to target object names.

The prefix is added to table names and constraint names (PK, AK, FK, CK, Indexes).

Temporary Object Prefix

False

Defines a prefix to be added to temporary object names. This can be used when loading data in the synchronize mode.

Split Parallel Degree

False

Number of subsets into which the data in the source tables should be split and loaded in parallel based on the values in a column selected by expression in the Split By Pattern parameter.

Split By Pattern

False

Pattern used to select a column that is used to divide data in the source table into subsets.

[NOTE] ==== The selected column must be of numeric type. By default, it is the column that you have selected by positioning SPLIT_BY tag in the metadata.

If you do not have a valid numeric column, you can try to use an expression that returns a unique numeric value for each row. ====

Transaction Type

False

Type of transaction. Possible values are:

  • AUTOCOMMIT: Transactions are committed as soon as they execute successfully.

  • NOCOMMIT: Data is inserted into the target table in one transaction that is committed at the end of replication once all the queries are successfully executed.

Batch Size

False

JDBC Batch size (number of lines) used when loading data into target tables.

Larger values provide better performance but consume more memory in the runtime as the data is stored in the memory before being loaded.

Fetch Size

False

JDBC Fetch size (number of lines) used when querying source database.

Larger values provide better performance but consume more memory in the runtime as the data is stored in the memory before being sent to the target.

Cdc Subscriber

False

Subscriber that should be used by the replication process, when the Use Cdc parameter is enabled.

Cdc Max Timestamp

False

Limits the CDC data being consumed up to a certain time.