Getting started with Apache Parquet

This article explains how to reverse-engineer Apache Parquet files in Semarchy xDI and use them in mappings to send and receive data. Apache Parquet is an open-source, column-oriented file format for storing data.
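To make "columnar" concrete, the following minimal Python sketch pivots a few row-oriented records into one list per column. It is only a conceptual illustration: the sample records and the `to_columns` helper are hypothetical, and real Parquet files add row groups, pages, encodings, and compression on top of this idea.

```python
# Row-oriented layout: one record per entry (hypothetical sample data).
rows = [
    {"id": 1, "city": "Lyon", "amount": 12.5},
    {"id": 2, "city": "Paris", "amount": 7.0},
    {"id": 3, "city": "Lyon", "amount": 3.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict that maps column name to values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one column are stored together.
columns = to_columns(rows)

# Aggregating a single column only touches that column's values.
total = sum(columns["amount"])
```

Storing the values of each column together is what lets a reader load only the columns a query needs.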

Create a metadata

To create an Apache Parquet metadata:

  1. Right-click a folder in your project and then select New > Metadata.

  2. In the New Metadata wizard, select Parquet and then click Next.

  3. Name the metadata and click Next.

  4. Select the installed module and click Finish.

The metadata is created with a root Schemas node. This node will contain the Apache Parquet files that you reverse-engineer.

Reverse-engineer Apache Parquet files

To reverse-engineer an Apache Parquet file:

  1. Right-click the Schemas node and select New > Schema.

  2. Select the newly created Schema node. Set the File Path property to the path of the Apache Parquet file.

  3. Right-click the Schema node and select Action > Reverse.

The Apache Parquet file is reverse-engineered as a schema node and is ready to use in mappings.
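If the reverse-engineering step fails, it can help to confirm outside Semarchy xDI that the file set in File Path really is a Parquet file. The sketch below is not part of xDI. It relies only on the Parquet file layout, in which a file starts and ends with the 4-byte marker `PAR1`, and the 4 bytes before the trailing marker hold the footer length as a little-endian integer. The function name `parquet_footer_length` is a hypothetical helper chosen for this example.

```python
import struct
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(path):
    """Return the footer length in bytes, or None if the file does not
    look like a standard Parquet file (hypothetical helper)."""
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if Path(path).stat().st_size < 12:
        return None
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # Position 8 bytes before the end of the file.
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        return None
    # The footer length is stored as a 4-byte little-endian unsigned integer.
    (footer_length,) = struct.unpack("<I", tail[:4])
    return footer_length
```

A return value of `None` suggests that the path points to something other than a Parquet file, such as a CSV file with a misleading extension. This check does not validate the schema itself.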

Create mappings

Drag and drop the Apache Parquet files (the schema nodes) defined in your metadata into mappings to integrate data from and to these Apache Parquet files.

Write data to Parquet files

The following example shows how to integrate data from a database table to an Apache Parquet file.

[Figure: example mapping (send)]

Read data from Parquet files

The following example shows how to integrate data from an Apache Parquet file to a database table.

[Figure: example mapping (receive)]