Getting Started with Impala

Overview

This guide explains how to start working with Impala in Semarchy xDM Data Integration.

Connect to your Data

The first step, when you want to work with Impala in Semarchy xDM Data Integration, consists of creating and configuring the Impala Metadata.

Below is a quick overview of a fully configured Impala Metadata.

[Figure: Impala Metadata overview]

Create the Metadata

First, create the Impala Metadata as usual by selecting the Impala technology in the Metadata Creation Wizard.

Choose a name for this Metadata and go to the next step.

Configure the Metadata

Kerberos Security

When working with Kerberos-secured Hadoop clusters, connections are protected, so you need to specify the credentials and other information required to establish the Kerberos connection.

If your cluster is NOT secured with Kerberos, you can skip to the next section.

If your cluster is secured with Kerberos, close the server Wizard popup (if it is displayed), and follow the steps below before trying to connect and reverse Impala objects.

  1. Create a new Kerberos Metadata (or use an existing one)

  2. Define in it the Kerberos Principal to use for Impala

  3. Drag and drop it in the Impala Metadata

  4. Rename the Metadata Link to 'KERBEROS'

[Figure: Kerberos Metadata Link in the Impala Metadata]

Refer to Getting Started With Kerberos for more information.

Server Properties

You are now ready to configure the JDBC properties that will be used to connect to Impala.

We’re going to use the Server Wizard to configure everything.

Define the JDBC properties to connect to Impala, then click Connect when you are done.

[Figure: Impala Metadata server properties]

If the Server Wizard popup is not displayed (because you closed it to configure Kerberos, or for any other reason), you can open it again by right-clicking the server node and selecting Actions > Launch Server Wizard.

When using Kerberos authentication, the user and password properties are not required, as authentication is delegated to Kerberos.

Once the connection properties are set, and Kerberos is configured if needed, you can click Connect and reverse your schemas and tables as usual.

Simply follow the wizard as for any other traditional database:

[Figure: reversing Impala schemas and tables]

JDBC URL Syntax

Defining the correct JDBC URL and parameters can be delicate, because they depend heavily on the Impala server and network configuration, on whether Kerberos is used, on the Hadoop distribution, and more.

This section gives advice and example URLs, with explanations of their structure.

First, the Impala JDBC URL must follow the given syntax in Semarchy xDM Data Integration:

<jdbc:semarchy:handler1>:<Impala JDBC Driver Class>:<JDBC URL>

Example:

 jdbc:semarchy:handler1:com.cloudera.impala.jdbc4.Driver:jdbc:impala://quickstart.cloudera:21050/default;OptimizedInsert=0;UseNativeQuery=0

Part | Description | Example
--- | --- | ---
jdbc:semarchy:handler1 | This prefix is present because a custom Semarchy xDM Data Integration driver is used to handle Kerberos security seamlessly. It is required in order to use Kerberos. | jdbc:semarchy:handler1
Impala JDBC Driver Class | The Impala JDBC driver class name. | com.cloudera.impala.jdbc4.Driver
JDBC URL | The Impala JDBC URL itself. | jdbc:impala://quickstart.cloudera:21050/default;OptimizedInsert=0;UseNativeQuery=0
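The three-part structure described above can be checked mechanically. The following is a minimal Python sketch, not part of the product: the helper names `split_semarchy_url` and `build_semarchy_url` are hypothetical, and the sketch assumes the driver class name contains no colon.

```python
SEMARCHY_PREFIX = "jdbc:semarchy:handler1"


def split_semarchy_url(url):
    """Split a full URL into (prefix, driver class, Impala JDBC URL)."""
    head = SEMARCHY_PREFIX + ":"
    if not url.startswith(head):
        raise ValueError(f"URL must start with {SEMARCHY_PREFIX!r}")
    rest = url[len(head):]
    # Assumption: the driver class has no colon, so the first colon ends it.
    driver_class, sep, jdbc_url = rest.partition(":")
    if not sep or not jdbc_url.startswith("jdbc:"):
        raise ValueError("expected <driver class>:<JDBC URL> after the prefix")
    return SEMARCHY_PREFIX, driver_class, jdbc_url


def build_semarchy_url(driver_class, jdbc_url):
    """Assemble the three parts back into one URL."""
    return f"{SEMARCHY_PREFIX}:{driver_class}:{jdbc_url}"


# Example URL taken from this page.
url = (
    "jdbc:semarchy:handler1:com.cloudera.impala.jdbc4.Driver:"
    "jdbc:impala://quickstart.cloudera:21050/default;"
    "OptimizedInsert=0;UseNativeQuery=0"
)
prefix, driver_class, jdbc_url = split_semarchy_url(url)
print(prefix)
print(driver_class)
print(jdbc_url)
```

Splitting a URL this way before pasting it into the Server Wizard can help spot a missing prefix or driver class.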

JDBC URL Examples

Example of a URL to connect to an Impala server that is not secured with Kerberos:

 jdbc:semarchy:handler1:com.cloudera.impala.jdbc4.Driver:jdbc:impala://quickstart.cloudera:21050/default;OptimizedInsert=0;UseNativeQuery=0

Example of a URL to connect to an Impala server that is secured with Kerberos:

 jdbc:semarchy:handler1:com.cloudera.impala.jdbc4.Driver:jdbc:impala://quickstart.cloudera:21050/default;OptimizedInsert=0;UseNativeQuery=0;KrbHostFQDN=quickstart.cloudera;KrbServiceName=impala;AuthMech=1;principal=impala/quickstart.cloudera@CLOUDERA

Below are some JDBC URL properties that are usually required when using Kerberos:

Property | Description | Example
--- | --- | ---
AuthMech | Specifies the authentication mechanism to use when connecting. For Kerberos, this should be set to 1. | AuthMech=1
principal | Kerberos principal to connect with | principal=impala/quickstart.cloudera@CLOUDERA
KrbServiceName | Impala service’s principal name | impala
KrbHostFQDN | Fully qualified domain name of the Impala Server host | quickstart.cloudera
KrbRealm | Kerberos realm used to connect | CLOUDERA
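As an illustration of how these properties fit onto a URL, here is a minimal Python sketch that appends the Kerberos properties to the unsecured example URL and reads them back. The function names `add_kerberos_properties` and `url_properties` are hypothetical helpers written for this example, and the host, service, and principal values are the sample values used on this page, not values to reuse as-is.

```python
def add_kerberos_properties(url, host_fqdn, service_name, principal):
    """Append the Kerberos-related properties to an Impala JDBC URL."""
    properties = [
        ("KrbHostFQDN", host_fqdn),
        ("KrbServiceName", service_name),
        ("AuthMech", "1"),  # 1 selects Kerberos, as described above
        ("principal", principal),
    ]
    return url + "".join(f";{name}={value}" for name, value in properties)


def url_properties(url):
    """Return the ';name=value' properties of a URL as a dict."""
    # Everything after the first ';' is treated as the property list.
    _, _, tail = url.partition(";")
    pairs = (item.split("=", 1) for item in tail.split(";") if "=" in item)
    return {name: value for name, value in pairs}


base_url = (
    "jdbc:semarchy:handler1:com.cloudera.impala.jdbc4.Driver:"
    "jdbc:impala://quickstart.cloudera:21050/default;"
    "OptimizedInsert=0;UseNativeQuery=0"
)
secured_url = add_kerberos_properties(
    base_url,
    host_fqdn="quickstart.cloudera",
    service_name="impala",
    principal="impala/quickstart.cloudera@CLOUDERA",
)
print(secured_url)
print(url_properties(secured_url))
```

With the sample values, the result matches the Kerberos example URL shown earlier; reading the properties back is a quick way to confirm that none of them is missing or misspelled.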

HDFS Temporary Storage

Most Impala Templates use HDFS operations to optimize processing and to take advantage of the native loaders.

The Impala Metadata therefore requires an HDFS connection to create temporary files while processing.

Follow these steps to configure the HDFS Temporary folder:

  1. Create an HDFS Metadata or use an existing one

  2. Define in this Metadata the temporary HDFS folder where these operations should be performed

  3. Drag and drop the HDFS Folder Metadata in the Impala Metadata

  4. Rename the Metadata Link to 'HDFS'

Impala must have permission to access this folder.

[Figure: HDFS Metadata Link in the Impala Metadata]
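Before relying on the temporary folder, it can be useful to check from a machine with a Hadoop client that the folder exists, and to look at its owner and permissions. The sketch below is only an illustration under stated assumptions: the folder path `/tmp/semarchy_impala` and the helper names are hypothetical, and it assumes the standard `hdfs dfs` command-line client is installed and configured for your cluster.

```python
import shutil
import subprocess


def hdfs_check_commands(folder):
    """Return the `hdfs dfs` commands used to inspect a temporary folder."""
    return [
        ["hdfs", "dfs", "-test", "-d", folder],  # exit code 0 if it is a directory
        ["hdfs", "dfs", "-ls", "-d", folder],    # prints owner, group, permissions
    ]


def check_hdfs_folder(folder):
    """Run the checks; return None when the hdfs client is not installed."""
    if shutil.which("hdfs") is None:
        return None
    test_cmd, ls_cmd = hdfs_check_commands(folder)
    if subprocess.run(test_cmd).returncode != 0:
        return False
    subprocess.run(ls_cmd)
    return True


# Hypothetical folder name, used only for illustration.
for command in hdfs_check_commands("/tmp/semarchy_impala"):
    print(" ".join(command))
```

Calling `check_hdfs_folder("/tmp/semarchy_impala")` runs both commands. Whether the listed owner and permissions actually let Impala read and write the folder still has to be judged against your cluster's user setup.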

Create your first Mappings

Now that your Metadata is ready and your tables are reversed, you can start creating your first Mappings.

You can use the Impala technology in Semarchy xDM Data Integration the same way as any other database.

Drag and drop your sources and targets, map the columns as usual, and configure the Templates according to your requirements.

Example of Mapping loading data from HSQL into Impala:

[Figure: Impala mapping example 1]

Example of Mapping loading data from HSQL to Impala with rejects enabled:

[Figure: Impala mapping example 2]

Example of Mapping loading a delimited file into Impala:

[Figure: Impala mapping example 3]

Example of Mapping loading data from Impala to HSQL, using a filter and performing joins:

[Figure: Impala mapping example 4]

For further information, consult the descriptions of the Templates and their parameters.

Sample Project

The Hadoop Component ships sample project(s) that contain various examples and use cases.

Have a look at these projects for samples and examples that show how to use the component.

Refer to Install Components to learn how to import sample projects.