Getting started with the Snowflake database

This page contains information to help you get started with Snowflake in Semarchy xDI.

Connect to your data

The database structure can be entirely reverse engineered into metadata and then used in mappings and processes to design business rules that meet your requirements.

Refer to the Connect to your data page, which explains the procedure.

Below is an example of reversed Snowflake metadata.

[Image: reversed Snowflake metadata overview]

Temporary storage

Overview

To optimize data loading into Snowflake, Semarchy xDI uses a temporary storage location to hold intermediate files before they are loaded into the database.

When necessary, source data is first extracted to temporary files, which are sent to this temporary storage location before being loaded into Snowflake with the dedicated Snowflake loaders.

Currently, Semarchy xDI supports the following storage locations:

  • Internal Snowflake Stage

  • External Storage

    • Amazon S3

    • Google Cloud Storage

    • Microsoft Azure Storage
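
As an illustration of the internal stage approach, the sequence below is a minimal sketch of what a staged load looks like with the Snowflake JDBC driver: the file is first uploaded to a stage with PUT, then loaded with COPY INTO. The connection URL, file, stage, and table names are placeholders, and the actual statements generated by Semarchy xDI may differ.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;
  import java.util.Properties;

  public class InternalStageLoadSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("user", "<user>");
          props.put("password", "<password>");
          try (Connection cnx = DriverManager.getConnection(
                  "jdbc:snowflake://<account>.snowflakecomputing.com/", props);
               Statement stmt = cnx.createStatement()) {
              // 1. Upload the temporary file to the user's internal stage.
              stmt.execute("PUT file:///tmp/customers.csv @~/xdi_tmp AUTO_COMPRESS=TRUE");
              // 2. Load the staged (gzipped) file into the target table.
              stmt.execute("COPY INTO CUSTOMERS FROM @~/xdi_tmp/customers.csv.gz"
                      + " FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1)");
          }
      }
  }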

Storage definition

The storage information is defined in the Snowflake metadata.

On the Snowflake metadata server node:

  1. Select the Storage tab.

  2. Define the Storage Method property.

  3. Define the External Storage property if you are using external storage.

  4. (Optional) Change the Compression Behavior.

[Image: Snowflake metadata Storage tab overview]

The following properties are available:


Storage Method

Default storage method used in mappings that load data into Snowflake.

  • internalStage: temporary files are stored in a Snowflake internal stage.

  • externalStorage: temporary files are stored in an external storage system, which must be defined with the External Storage property.

This is the default method for all schemas and tables of this metadata.

External Storage

Metadata link to the external storage container used with the external storage method. Select the external storage from the list, or drag and drop it onto the property.

This property supports:

  • An Amazon S3 Folder

  • A Google Cloud Storage bucket

  • A Microsoft Azure Storage Container

Note that this will be the default for all the schemas and tables of this metadata.

Storage Access Method

Defines the method used to access the storage. Possible options are:

  • Credentials: the external storage is accessed by providing the credentials directly in the queries sent to Snowflake.

  • Storage Integration: the external storage is accessed through a storage integration defined on the Snowflake side.

    The storage integration to use must be defined with the dedicated Storage Integration property.

When this property is empty, the value of the parent node is used. On the server node, which has no parent, the Credentials method is used by default for compatibility purposes. A sketch illustrating both access methods follows the property list below.

Storage Integration

Only available when Storage Access Method is set to Storage Integration.

Free text field defining the name of a Snowflake storage integration.

Compression Type

Defines the compression type used when loading files (source files or temporarily generated files) into this database.

Compression Strategy

Defines the compression behavior when loading files (source files or temporarily generated files) into this database.

  • compress at execution: use this option when you want Semarchy xDI to automatically compress the files sent to Snowflake during execution. Templates use the appropriate Snowflake options to specify the compression type, compress the files with a Gzip Process Action, and then send the compressed files to Snowflake.

  • delegate compression to driver: use this option when you want the files sent to Snowflake to be compressed automatically by the Snowflake driver, using the AUTO_COMPRESS option of the PUT statement. This is supported only with the Snowflake internal stage (see the Storage Method property).

  • assume already compressed: use this option when source files are already compressed before the Mapping is executed. Templates use the appropriate Snowflake options to specify the compression type and send the files as they are, without compressing them again.
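
To illustrate the difference between these strategies on a Snowflake internal stage, the sketch below shows the underlying PUT statements as they could be issued through JDBC. File and stage names are placeholders, and the actual statements generated by the templates may differ.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;
  import java.util.Properties;

  public class CompressionStrategySketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("user", "<user>");
          props.put("password", "<password>");
          try (Connection cnx = DriverManager.getConnection(
                  "jdbc:snowflake://<account>.snowflakecomputing.com/", props);
               Statement stmt = cnx.createStatement()) {
              // "delegate compression to driver": the driver gzips the file while running PUT.
              stmt.execute("PUT file:///tmp/orders.csv @~/xdi_tmp AUTO_COMPRESS=TRUE");
              // "compress at execution" or "assume already compressed": the file is already
              // a .gz archive when PUT runs, so the driver only uploads it.
              stmt.execute("PUT file:///tmp/orders.csv.gz @~/xdi_tmp AUTO_COMPRESS=FALSE");
          }
      }
  }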

These properties can be overridden per schema and per table. To do so, go to the desired schema or table node and open the Storage tab, where the same properties are available; values set there override the values defined on the parent nodes.

These properties can also be overridden directly in mappings through parameters on the Load Templates, which lets you define a behavior different from the one specified in the metadata.

The External Storage property is not tied to the Storage Method. You can use the internalStage storage method and still provide an External Storage link for specific cases where you override the default storage method.
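
To make the two storage access methods more concrete, the sketch below shows generic Snowflake statements loading data from an external Amazon S3 location, first with inline credentials and then with a named storage integration. These are illustrative statements executed through JDBC, not the exact queries generated by Semarchy xDI; the bucket, table, integration name, and keys are placeholders.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;
  import java.util.Properties;

  public class StorageAccessMethodSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("user", "<user>");
          props.put("password", "<password>");
          try (Connection cnx = DriverManager.getConnection(
                  "jdbc:snowflake://<account>.snowflakecomputing.com/", props);
               Statement stmt = cnx.createStatement()) {
              // Credentials method: the cloud credentials travel with the query itself.
              stmt.execute("COPY INTO CUSTOMERS FROM 's3://my-bucket/xdi_tmp/'"
                      + " CREDENTIALS = (AWS_KEY_ID='<key>' AWS_SECRET_KEY='<secret>')"
                      + " FILE_FORMAT = (TYPE = CSV)");
              // Storage Integration method: an administrator creates the integration once on
              // the Snowflake side; queries only reference it by name.
              stmt.execute("COPY INTO CUSTOMERS FROM 's3://my-bucket/xdi_tmp/'"
                      + " STORAGE_INTEGRATION = MY_S3_INTEGRATION"
                      + " FILE_FORMAT = (TYPE = CSV)");
          }
      }
  }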

Definition of an external storage

As indicated in the property descriptions above, you can define the external storage location by selecting it from the list, or by dragging and dropping it directly onto the property.

To define an external storage, the corresponding metadata must already exist in your workspace.

Example through a drag and drop:

[Image: defining the external storage by drag and drop]

Refer to Getting started with the Microsoft Azure component and Amazon component to learn how to create this metadata.

Override of storage configuration

The storage configuration defined on the server node can be overridden on schema and table nodes, and directly in mappings.

This lets you define a common default behavior and override it when required for specific developments or use cases.

Example of an override on a schema node:

[Image: storage override on a schema node]

Example of an override in a mapping:

[Image: storage override in a mapping]

VARIANT data type

Overview

The VARIANT data type allows storing structured data such as JSON, AVRO, or XML.

Semarchy xDI supports reading and writing data inside columns with this data type.

Read data in a VARIANT

To read and manipulate structured data contained in a VARIANT column, use a Deserializer. See Deserialize source data.

Write data in a VARIANT

In a Mapping, map any source data to the target Snowflake VARIANT column. Depending on the source data type, you may need to use Snowflake’s conversion functions to convert the source data to a format matching Snowflake’s VARIANT data type. See Create mappings.
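
For example, Snowflake's PARSE_JSON function converts a JSON string into a VARIANT value. The sketch below shows the idea through JDBC; the table, column, and payload are illustrative, and in a Mapping such a conversion function would typically be placed in the target column's expression.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;
  import java.util.Properties;

  public class VariantWriteSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("user", "<user>");
          props.put("password", "<password>");
          try (Connection cnx = DriverManager.getConnection(
                  "jdbc:snowflake://<account>.snowflakecomputing.com/", props);
               Statement stmt = cnx.createStatement()) {
              // PARSE_JSON turns the JSON text into a VARIANT value before it is stored.
              stmt.execute("INSERT INTO EVENTS (PAYLOAD)"
                      + " SELECT PARSE_JSON('{\"id\": 42, \"status\": \"ok\"}')");
          }
      }
  }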

Semarchy xDI also provides a feature to seamlessly write JSON or XML data in a VARIANT column. To do this:

  1. Create a JSON or XML Metadata:

    1. Define the expected data structure. See Metadata and reverse engineering.

    2. Right-click the JSON or XML node representing the data structure and choose New > Property Field.

    3. Give a Name to the property field and select stringContent in the Property attribute. See JSON property fields and XML property fields.

  2. Create a mapping:

    1. Add the JSON or XML data structure as the source.

    2. Add the Snowflake table as the target.

    3. Map the source stringContent property to the target Snowflake VARIANT column.

      In addition to mapping the stringContent property to the VARIANT column, you can also map any of the JSON or XML fields to the other columns of the Snowflake table.
  3. Run the Mapping.
    The source data is written in the target Snowflake table.

Create your first mappings

Your metadata is ready and your tables are reverse engineered; you can now create your first mappings.

The Snowflake technology can be used like any other database in Semarchy xDI.

Drag and drop your sources and targets, map the columns as usual, and configure the templates according to your requirements.

Loading data from a database into Snowflake

[Image: mapping loading data from a database into Snowflake]

Loading data from a Delimited File into Snowflake

[Image: mapping loading data from a delimited file into Snowflake]

Loading data from Snowflake into another database

[Image: mapping loading data from Snowflake into another database]

Performing Reject detection while loading a Snowflake Table

[Image: mapping with reject detection on a Snowflake table]

Loading structured data contained in a VARIANT field into another database

[Image: mapping loading structured VARIANT data into another database]

The warnings on the target fields are displayed because the hierarchical structure is unknown to the expression parser. They can be safely ignored.

Replicating a source database into Snowflake

[Image: process replicating a source database into Snowflake]

Fetching results using the JDBC driver

As of Snowflake JDBC driver version 3.11.0, Snowflake returns query results in the Apache Arrow data format by default. To fetch results properly when using the Snowflake JDBC driver, you must either change the result format to JSON or make sure the Apache Arrow library works.

Using JSON results

To have Snowflake return data in the JSON format, open the database connection in Designer, and add a new property. Give the property the following name and value:

  • Name: jdbc_query_result_format

  • Value: json

[Image: JDBC query result format property set to json in the connection properties]

Save the connection information. Snowflake now returns all its results in JSON format.
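
If you connect with the Snowflake JDBC driver directly, for instance in a test program outside the Designer, the same property can be supplied as a connection property. The snippet below is a minimal sketch; the account, user, and password are placeholders.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.util.Properties;

  public class JsonResultFormatSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("user", "<user>");
          props.put("password", "<password>");
          // Ask the driver to return query results as JSON instead of Apache Arrow.
          props.put("jdbc_query_result_format", "json");
          try (Connection cnx = DriverManager.getConnection(
                  "jdbc:snowflake://<account>.snowflakecomputing.com/", props)) {
              System.out.println("Connected with JSON result format");
          }
      }
  }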

Using the Arrow library

Apache Arrow has not yet been updated to work with Java 16 and higher. On those Java versions, Apache Arrow fails with an exception, so you need to set up a workaround.

  1. Download the Arrow library from the Apache Arrow website.

  2. Add the library to the Snowflake module.

  3. Add the Java command line option --add-opens=java.base/java.nio=ALL-UNNAMED in one of the following places:

    1. For the Designer, in the INI file with the startup settings.

    2. For the Runtime, in the XDI_RUNTIME_OPTS environment variable (see the example after this list).

    3. In a command-line script used to launch the applications.
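
For example, for the Runtime on a Linux system, the environment variable could be set as follows before starting the Runtime (a sketch; adapt the syntax to your shell and installation):

  export XDI_RUNTIME_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED"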

For more information, refer to this article on the Snowflake community website.

Sample project

The Snowflake component is distributed with sample projects that contain various examples and files. Use these projects to better understand how the component works, and to get a head start on implementing it in your projects.

Refer to Install components in Semarchy xDI Designer to learn about importing sample projects.