Spark Component

Overview

Semarchy xDI lets you work with Spark to produce fully customized Data Flows.

Install the Spark Component

If you have not installed it yet, install the Spark component in Designer by following the component installation process.

Third-party libraries

Due to licensing restrictions, we cannot distribute the Java libraries for Spark. You must obtain them yourself.

After you have obtained the necessary Spark libraries, create a Spark module for your project and add the libraries to the module.

Supported Features

Spark 2

Feature	Description
LOAD	Data can be loaded into Spark from HBase, HDFS, Hive, RDBMS, Vertica, Parquet, and Elasticsearch. Data can also be loaded from Spark into Hive, RDBMS, Vertica, Parquet, and Elasticsearch.
INTEGRATE	Data can be integrated from Spark into HDFS, Hive, and RDBMS.
STAGE	Spark Metadata can be used as a stage (between loading and integration) to boost Hadoop Mappings. A Spark stage can be SQL or Java.
