Replicator Rdbms to Snowflake

Description

This tool replicates data to Snowflake, optionally using Snowpipe.

You can also use this tool to replicate a list of changes from Change Data Capture tables to a target database, for later analysis.

Parameters

Name (Required: Yes, Default: Replicator Rdbms to Snowflake)

Friendly display name for the tool.

Table 1. Switches

Drop Tables (Default: Off)

Enabling this parameter drops the target tables before processing.

Tables to be dropped are calculated from the list of replicated source tables.

A drop is performed on the target for each of those tables.

Create Pk (Default: Off)

When this parameter is enabled, the replicator replicates the primary key defined on the source tables in the target tables.

Drop Pk (Default: Off)

When this parameter is enabled, the replicator drops the primary keys of target tables before processing.

Primary keys to be dropped are calculated from the list of source tables replicated.

A drop is performed on the target for all the corresponding primary keys defined on the source.

Create Fk (Default: Off)

When this parameter is enabled, the replicator tries to replicate the foreign keys defined on the source tables in the target tables.

Foreign keys are created after the data is loaded, to make sure the referenced tables are populated before the constraints are created.

Drop Fk (Default: Off)

When this parameter is enabled, the replicator drops the foreign keys of target tables before processing.

Foreign keys to be dropped are calculated from the list of source tables replicated.

A drop is performed on the target for all corresponding foreign keys defined on the source.
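
As an illustration of the statements behind these four switches, the Snowflake DDL involved has the following general shape. This is a sketch only: the table, column, and constraint names are hypothetical, and the exact statements generated by the replicator may differ.

  -- Drop Pk / Drop Fk: remove the existing keys before processing
  ALTER TABLE ORDERS DROP CONSTRAINT FK_ORDERS_CUSTOMER;
  ALTER TABLE CUSTOMER DROP PRIMARY KEY;

  -- Create Pk: replicate the primary key defined on the source
  ALTER TABLE CUSTOMER ADD CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMER_ID);

  -- Create Fk: created after the data load, so that the referenced
  -- table is already populated
  ALTER TABLE ORDERS ADD CONSTRAINT FK_ORDERS_CUSTOMER
    FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (CUSTOMER_ID);

Note that in Snowflake, primary and foreign key constraints are informational and not enforced (only NOT NULL is enforced).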

Enable Constraints (Default: Off)

When this parameter is enabled, the replicator tries to enable the primary and foreign key constraints on target tables. This is performed at the end of the replication, that is, after replicating the structure and data.

This can be used together with the Disable Constraints parameter if you want to disable primary and foreign key constraints before processing and then re-enable them at the end. This can be useful to avoid unicity checks while loading tables.

Disable Constraints (Default: Off)

When this parameter is enabled, the replicator tries to disable primary and foreign key constraints on target tables.

This is performed at the start of replication, that is, before replicating the structure and data.

This can be used together with the Enable Constraints parameter if you want to disable primary and foreign key constraints before processing and then re-enable them at the end. This can be useful to avoid unicity checks while loading tables.

Add Comments (Default: Off)

When this parameter is enabled, the replicator replicates the comments defined on the source tables to the target tables.

Delete Tables (Default: Off)

When this parameter is enabled, the replicator removes the target table data before processing.

Truncate Tables (Default: Off)

When this parameter is enabled, the replicator truncates the target tables before processing.
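
These two switches map to the two standard SQL statements below; TARGET_TABLE is a placeholder name, and the sketch only illustrates the semantics, not the exact statements generated.

  -- Delete Tables: removes the rows with a DML statement
  DELETE FROM TARGET_TABLE;

  -- Truncate Tables: clears the table
  TRUNCATE TABLE TARGET_TABLE;

TRUNCATE is generally faster than DELETE for clearing a whole table.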

Normalize Column Names (Default: Off)

When this parameter is enabled, the replicator normalizes the column names when creating them on the target.

Use CDC (Default: Off)

When this parameter is enabled, the corresponding CDC table is used as the source. This requires CDC to be set up on the source tables beforehand.

Drop Snowpipes (Default: Off)

When this parameter is enabled, the replicator drops the existing Snowpipes before processing.

Synchronize Deletions From CDC (Default: Off)

Only applies if the Use CDC and Load Data parameters are enabled.

Enable this parameter if the records deleted on the source table need to be deleted on the target table.
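
For illustration only, and assuming hypothetical TARGET_TABLE and CDC_TABLE names with a CUSTOMER_ID key, synchronizing deletions conceptually amounts to a statement like the following; the actual queries and the deletion marker used by the CDC implementation may differ.

  -- Remove target rows recorded as deleted in the CDC table
  DELETE FROM TARGET_TABLE
  WHERE EXISTS (
    SELECT 1
    FROM CDC_TABLE c
    WHERE c.CUSTOMER_ID = TARGET_TABLE.CUSTOMER_ID
      AND c.CDC_CHANGE_TYPE = 'Delete'   -- hypothetical deletion marker
  );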

Create Stages (Default: On)

Enabling this parameter creates stages before the replication.

Drop Stages (Default: On)

Enabling this parameter drops the stages at the end of replication.

Empty Field As Null (Default: Off)

Enabling this parameter makes the Snowflake loader treat empty field values as NULL values when loading data.

Table 2. Other fields

Create Tables (Default: false)

When this parameter is set to true, the replicator tries to create the tables on the target. The table structure is retrieved from the source metadata. This does not imply replicating the data of the tables.

When this parameter is set to for historization, the replicator additionally creates two technical columns, CDC_TIMESTAMP and CDC_CHANGE_TYPE, in the target tables. These columns can later be loaded using the historize from cdc option of the Load Data parameter, as sketched below.
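
As a sketch, a table created with for historization could look like the following; T_CUSTOMER and its business columns are hypothetical, and the datatypes actually generated may differ.

  CREATE TABLE T_CUSTOMER (
    CUSTOMER_ID     NUMBER,
    CUSTOMER_NAME   VARCHAR,
    CDC_TIMESTAMP   TIMESTAMP,       -- date of the modification
    CDC_CHANGE_TYPE VARCHAR          -- change type: Delete/Insert/Update
  );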

Load Data (Required: Yes, Default: false)

This parameter determines whether the source data should be replicated on the target. Possible options are:

  • false: source data is not replicated.

  • true: source data is replicated on the target.

  • synchronize: target data is synchronized with the source data (see the sketch after this list).

    The goal of this mode is to keep a target environment synchronized with a source environment.

    All the source data is first loaded into temporary work tables on target, and target tables are then synchronized with those. Existing data is updated and new data is inserted.

  • historize from cdc: latest modifications from the CDC table are inserted into the target table with two technical columns CDC_TIMESTAMP (which represents the date of the modification) and CDC_CHANGE_TYPE (which represents the change type Delete/Insert/Update).
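
As a sketch of the synchronize mode, assume a hypothetical target table T_CUSTOMER keyed on CUSTOMER_ID and a temporary work table S_T_CUSTOMER already loaded with the source data (the S_ prefix echoes the Temporary Object Prefix parameter described below). The final step resembles a MERGE statement; the replicator's actual statements may differ.

  MERGE INTO T_CUSTOMER t
  USING S_T_CUSTOMER s
    ON t.CUSTOMER_ID = s.CUSTOMER_ID
  WHEN MATCHED THEN UPDATE SET           -- existing data is updated
    CUSTOMER_NAME = s.CUSTOMER_NAME
  WHEN NOT MATCHED THEN INSERT           -- new data is inserted
    (CUSTOMER_ID, CUSTOMER_NAME)
    VALUES (s.CUSTOMER_ID, s.CUSTOMER_NAME);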

Load Method (Required: Yes, Default: copy)

Specifies the method to use for loading data into Snowflake. Possible options are:

  • copy: the COPY statement is used to load the temporary files into Snowflake.

    The data is first extracted from the source table into a temporary delimited file.

    The file is then uploaded to a stage and loaded into the target table from that stage (see the sketch after this list).

  • generic: the data is sent through a JDBC connection and inserted into the target table with a standard INSERT statement.

  • snowpipe: use Snowpipe to load data from files as soon as they are available in a stage, rather than manually executing COPY statements on a schedule to load larger batches.

  • snowpipe streaming: use Snowpipe Streaming to write rows to the target table over a streaming channel as soon as they are available, without staging intermediate files.

  • storage: the data is extracted from the source table into a temporary delimited file, then sent to a supported storage provider, such as a Snowflake stage.
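
For the copy method, the stage upload and the load correspond to Snowflake statements of the following general shape; the file, stage, and table names are hypothetical, and the statements actually generated may differ.

  -- Upload the temporary delimited file to a stage
  -- (PUT compresses the file to .gz by default)
  PUT file:///tmp/t_customer.csv @REPL_STAGE;

  -- Load the staged file into the target table
  COPY INTO T_CUSTOMER
  FROM @REPL_STAGE/t_customer.csv.gz
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' NULL_IF = ('\\N'));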

Query (Required: Yes, Default: $SOURCE//(datastore | file | record))

Path to the metadata nodes on which to perform the replication.

The value to specify depends on the source metadata, and the syntax to use is XPath. It must start with $SOURCE, which corresponds to the metadata link dropped on the replicator.

The following examples show some of the most commonly used types of nodes:

  • If the metadata dropped is a database node: $SOURCE/schema/datastore

  • If the metadata dropped is a schema / folder / hierarchical file node: $SOURCE//(datastore | file | record)

  • If the metadata dropped is a table or file node: $SOURCE


Create Snowpipes (Required: Yes, Default: false)

Allows initializing Snowpipes so that they can later be used by the snowpipe load method. Possible options are:

  • true: one Snowpipe per target table is created. The Snowpipe definition is based on the source table structure.

  • false: no Snowpipes are created.

  • for historization: one Snowpipe per target table is created. The Snowpipe definition contains the two technical columns CDC_CHANGE_TYPE and CDC_TIMESTAMP.

    Use this option if you later want to use the Snowpipe to load historized CDC data, that is, when the Load Data parameter is set to historize from cdc. A sketch follows this list.
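
For reference, the Snowpipe objects this parameter initializes are created with CREATE PIPE statements of roughly this shape; the pipe, table, and stage names are hypothetical.

  CREATE PIPE REPL_T_CUSTOMER_PIPE AS
    COPY INTO T_CUSTOMER
    -- with for historization, the pipe definition also covers the
    -- CDC_TIMESTAMP and CDC_CHANGE_TYPE technical columns
    FROM @REPL_STAGE
    FILE_FORMAT = (TYPE = CSV);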

Snowpipe Acknowledge Mode (Required: Yes, Default: enabled)

Defines how the replicator checks the status of the files loaded through Snowpipe when using the standaloneIngestOperation. Possible options are:

  • enabled: the replicator checks that the files have been properly ingested before proceeding.

    This implies waiting and performing status checks with the Snowpipe APIs.

If at least one file has not been correctly ingested, the replicator throws an error listing the files that failed. The files that were successfully ingested are automatically deleted from the Snowflake stage.

  • disabled: the replicator proceeds without checking the files' ingestion status. The files that were sent to the Snowflake stages are not deleted automatically.

Force Table Case (Required: Yes, Default: false)

Determines how the table name case is handled by the replicator. Possible options are:

  • false: table name case is preserved as on the source metadata.

  • lower: table name is forced to lowercase on the target.

  • upper: table name is forced to uppercase on the target.

Force Column Case (Required: Yes, Default: false)

Determines how the column name case is handled by the replicator. Possible options are:

  • false: column name case is preserved as on the source metadata.

  • lower: column name is forced to lowercase on the target.

  • upper: column name is forced to uppercase on the target.

Force Constraint Case (Required: Yes, Default: false)

Determines how constraint name case is handled by the replicator. Possible options are:

  • false: constraint name case is preserved as on the source metadata.

  • lower: constraint name is forced to lowercase on the target.

  • upper: constraint name is forced to uppercase on the target.

Create Table Syntax (Default: Create table)

Syntax used for the CREATE TABLE statements issued on the target.

These are the keywords added at the beginning of the statement. They can be useful to add hints to the CREATE TABLE statements, for instance.

Select Column Pattern (Default: [columnName])

Allows changing the expression used for each column in the SELECT query performed to load the source data into the target.

Insert Values Pattern (Default: [value])

Only applies if the Load Method parameter is set to generic.

Applies a transformation to the values inserted into the target table.

Column Transformation Rules

Rules used to change the name of source columns in real time based on matching expressions.

If the expression matches while computing the column name, the replacement is performed. Multiple column transformation rules can be specified by separating them with a ';' character.

The rules should be specified in the XPath language using the following syntax: <matching_expression>/<final_column_name>

  • Changing a column name based on its name: tech:name()='CUS_LAST_NAME'/'CUSTOMER_LAST_NAME' renames all columns named CUS_LAST_NAME on the source to CUSTOMER_LAST_NAME on the target.

  • Changing a column name based on its datatype: @TYPE_NAME = 'VARCHAR'/concat(tech:name(),'_V') adds the '_V' suffix at the end of the name of every column of VARCHAR datatype.

  • Changing multiple column names based on their datatypes: @TYPE_NAME = 'VARCHAR'/concat(tech:name(),'_V');@TYPE_NAME = 'INTEGER'/concat(tech:name(),'_I') works like the previous example, except that two transformation rules are defined, separated with ';'.

  • Adding a prefix to all columns: 1=1/concat('REPL_',tech:name()) matches all columns, since the '1=1' expression is always true, and adds the REPL_ prefix to their names.

Ignored Tables

Comma-separated list of tables to ignore during the replication. The names must be fully qualified table names.

For example, for Microsoft SQL Server, a fully qualified name follows the database.schema.table pattern, such as mydb.dbo.mytable.

Field Separator

ASCII field separator used to separate fields in exported temporary files.

String Delimiter

ASCII character used to delimit strings in the exported file.

Null Value Replacement (Default: \N)

String used to represent SQL NULL values in the exported file. The generated Snowflake query escapes '\' characters automatically.

Timestamp Format (Default: yyyy-MM-dd HH:mm:ss.SSS)

Format used for timestamp columns in the exported file, and when creating Snowflake stages.
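
Field Separator, String Delimiter, Null Value Replacement, and Timestamp Format together describe the exported temporary files. Expressed as a Snowflake file format, they would translate roughly as follows; the format name is hypothetical, and the separator and delimiter shown are arbitrary examples, since those two parameters have no default.

  CREATE FILE FORMAT REPL_FILE_FORMAT
    TYPE = CSV
    FIELD_DELIMITER = ','                    -- Field Separator (example)
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'       -- String Delimiter (example)
    NULL_IF = ('\\N')                        -- Null Value Replacement
    TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF3';  -- Timestamp Format,
                                                     -- in Snowflake notation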

Target Object Prefix

Defines a prefix to add to target object names.

The prefix is added to table names and constraint names (PK, AK, FK, CK, Indexes).

Temporary Object Prefix (Default: S_)

Defines a prefix to be added to temporary object names. This can be used when loading data in the synchronize mode.

Split Parallel Degree (Default: 0)

Number of subsets into which the data of the source tables should be split, to be loaded in parallel, based on the values of the column selected by the expression in the Split By Pattern parameter.

Split By Pattern (Default: {(ref:columns()[tech:tag()='SPLIT_BY'])[1]/@name})

Pattern used to select a column that is used to divide data in the source table into subsets.

The selected column must be of a numeric type. By default, it is the column that you have selected by positioning the SPLIT_BY tag in the metadata.

If you do not have a valid numeric column, you can try to use an expression that returns a unique numeric value for each row.
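
As an illustration of the splitting idea only (not the replicator's actual queries), a Split Parallel Degree of 4 combined with a numeric key column could partition the source reads with modulo filters; T_CUSTOMER and CUSTOMER_ID are hypothetical.

  -- Each of the 4 parallel readers fetches one subset (k = 0..3)
  SELECT * FROM T_CUSTOMER WHERE MOD(CUSTOMER_ID, 4) = 0;
  SELECT * FROM T_CUSTOMER WHERE MOD(CUSTOMER_ID, 4) = 1;
  -- ... and so on for k = 2 and k = 3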

Transaction Type (Default: AUTOCOMMIT)

Type of transaction. Possible values are:

  • AUTOCOMMIT: Transactions are committed as soon as they execute successfully.

  • NOCOMMIT: Data is inserted into the target table in one transaction that is committed at the end of the replication, once all the queries have executed successfully.

Batch Size (Default: 30000)

JDBC batch size (number of lines) used when loading data into the target tables.

Larger values provide better performance but consume more memory in the runtime, as the data is stored in memory before being loaded.

Fetch Size (Default: 30000)

JDBC fetch size (number of lines) used when querying the source database.

Larger values provide better performance but consume more memory in the runtime, as the data is stored in memory before being sent to the target.

Cdc Subscriber (Default: defaultSubscriber)

Subscriber that should be used by the replication process when the Use CDC parameter is enabled.

Cdc Max Timestamp

Limits the CDC data being consumed up to a certain time.

Snowpipe Streaming Channel Name (Default: ${/CORE_SESSION_NAME}$)

When the Load Method parameter is set to snowpipe streaming, this parameter sets the name of the channel used to stream source data to the target Snowflake tables. The default value uses the process name.

If you want to run multiple instances of your process in parallel, use a unique ID for the channel name, such as the session ID (${/CORE_SESSION_ID}$).

Snowpipe Report Timeout (Default: 150000)

When using Snowpipe, the amount of time to wait for an operation to finish before stopping with an error. The value is defined in milliseconds; the default is two and a half minutes.