Hadoop Component Release Notes

This page lists the main features added to the Hadoop Component.

Feature Highlights

Version 2023.1.0

Ability to choose Hive Table Type

The Hive metadata now supports defining the Hive Table Type, which can be EXTERNAL or MANAGED.

The default type is EXTERNAL, which requires a specific configuration in the metadata. Update your Hive metadata to define the external storage location, or change the table type to MANAGED, depending on your configuration.
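As an illustration of the difference between the two types, the following Python sketch builds the kind of HiveQL statement each type usually corresponds to. It is not the statement generated by the Component. The function name, table name, columns, and storage path are hypothetical, and the rule that an EXTERNAL table must have a location simply mirrors the note above.

```python
def build_create_table(name, columns, table_type="EXTERNAL", location=None):
    """Return a HiveQL CREATE TABLE statement for the given table type.

    Hypothetical helper for illustration only; `columns` is a list of
    (column_name, hive_type) pairs.
    """
    table_type = table_type.upper()
    if table_type not in ("EXTERNAL", "MANAGED"):
        raise ValueError(f"Unsupported Hive table type: {table_type}")
    # Assumption: mirror the release note and insist on a storage location
    # whenever the table is EXTERNAL.
    if table_type == "EXTERNAL" and not location:
        raise ValueError("An EXTERNAL table needs a storage location")

    column_list = ", ".join(f"`{col}` {hive_type}" for col, hive_type in columns)
    keyword = "CREATE EXTERNAL TABLE" if table_type == "EXTERNAL" else "CREATE TABLE"
    ddl = f"{keyword} `{name}` ({column_list})"
    if table_type == "EXTERNAL":
        ddl += f" LOCATION '{location}'"
    return ddl


# EXTERNAL table: Hive only references the data at the given location.
external_ddl = build_create_table(
    "customers",
    [("id", "INT"), ("name", "STRING")],
    location="hdfs:///data/customers",
)

# MANAGED table: Hive owns the data, so no location is passed here.
managed_ddl = build_create_table("customers", [("id", "INT")], table_type="MANAGED")
```

In this sketch, leaving the default type without a location raises an error, which is the situation the metadata update described above is meant to avoid.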

Version 2.2.1

Minor improvements and fixed issues

This version contains some minor improvements and fixed issues, which can be found in the complete changelog.

Version 2.2.0

This version contains some minor improvements and fixed issues, which can be found in the complete changelog.

Version 2.1.1

This component version requires Semarchy xDI Designer version 20.4.1 or higher.

Query Editor support

The SQL Editor has been replaced in Semarchy xDI Designer by a tool named "Query Editor".

Some improvements have been made in the Hadoop Component to support this new editor.

Version 2.1.0

Support of EMF Compare

EMF Compare has been added to Semarchy xDI Designer.

Several improvements have been made in the Hadoop Component to support it.

Version 2.0.5

Minor improvements and fixed issues

This version contains some minor improvements and fixed issues, which can be found in the complete changelog.

Version 2.0.4

Minor improvements and fixed issues

This version contains some minor improvements and fixed issues, which can be found in the complete changelog.

Version 2.0.3

New Template Load Salesforce to Hive

A new dedicated Template to Load data from Salesforce to Hive has been added.

It provides better support and optimization than the generic Templates, and handles Hive specifics when loading data from Salesforce.

HDFS File Connect Tool

When a connection issue occurs in the "HDFS File Connect Tool", a session variable named "HDFS Error" is now published to provide a better error message.

Change Data Capture (CDC)

Multiple improvements have been made to harmonize the usage of Change Data Capture (CDC) across the various Components.

Parameters have been harmonized so that all Templates now have the same CDC Parameters and support the same features.

Multiple fixes have also been made to correct CDC issues. Refer to the changelog for the exact list of changes.

Version 2.0.2

Sample project

The Component Sample Project can now be imported directly from the "New" menu of the Project Explorer.

Version 2.0.1

HDFS Tool

A new tool named "Tool HDFS File Get Properties" has been added. It retrieves information about files.

In this first version, it retrieves the size of a remote file.

The size is stored in the process variable "HDFS_BYTES".
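To give an idea of what retrieving a file size can look like, the Python sketch below reads the "length" field from a WebHDFS GETFILESTATUS response and stores it under the "HDFS_BYTES" name. This is an assumption made for illustration: the tool may rely on a different API, and the function name, the sample response, and the process_variables dictionary are hypothetical.

```python
import json


def extract_file_size(status_json):
    """Return the size in bytes from a WebHDFS GETFILESTATUS JSON response."""
    status = json.loads(status_json)["FileStatus"]
    # A directory also returns a FileStatus, so only regular files are accepted.
    if status.get("type") != "FILE":
        raise ValueError("Path is not a regular file")
    return int(status["length"])


# Hypothetical response body, reduced to the fields used in this sketch.
sample_response = '{"FileStatus": {"length": 24930, "type": "FILE", "pathSuffix": ""}}'

# Stand-in for the process variable mentioned above.
process_variables = {"HDFS_BYTES": extract_file_size(sample_response)}
```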

Fixed issue with WebHDFS mode and file names containing a space character

There was an issue when managing HDFS files through WebHDFS mode if a file name contained a space character.

This is now fixed.
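This note does not describe how the issue was fixed. As background, WebHDFS places the file path inside the request URL, so a space in a file name has to be percent-encoded. The Python sketch below shows that encoding only; it is not the Component's implementation, and the host, port, and file path are hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, operation):
    """Build a WebHDFS REST URL, percent-encoding the HDFS path."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    # quote() keeps "/" as-is by default and turns a space into %20.
    encoded_path = quote(hdfs_path)
    query = urlencode({"op": operation})
    return f"http://{host}:{port}/webhdfs/v1{encoded_path}?{query}"


# Hypothetical NameNode address and file name containing a space.
url = webhdfs_url("namenode.example.com", 9870, "/data/my file.csv", "GETFILESTATUS")
```

Without the encoding step, the space would end up unescaped in the request line, which HTTP clients and servers commonly reject or misparse.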

Change Log

Version 2023.1.3

Bug Fix

  • DI-6521: Jackson Third-Party library upgrade.

Version 2023.1.0

Breaking Changes

  • DI-5390: The metadata now supports defining the Hive Table Type, which can be EXTERNAL or MANAGED.
    The default type is EXTERNAL, which requires a specific configuration in the metadata. Update your Hive metadata to define the external storage location, or change the table type to MANAGED, depending on your configuration.
    See External and Managed Tables.

New Features

  • DI-5312: The metadata now supports hexadecimal properties.

  • DI-5389: The metadata reverse-engineering has been improved.

  • DI-5653: Log4j version 1 has been removed from the dependencies.

  • DI-5817: Multiple third-party libraries have been upgraded.

  • DI-6225: The Post Processing Operation option has been added to the Load Xml To Hive template.

Bug Fixes

  • DI-4000: The specific Hive SerDe is missing from the Hadoop module.

  • DI-6077: Hadoop tools are missing from the process palette.

  • DI-6234: Some built-in templates cannot be saved after being modified and a NullPointerException is thrown.

Version 5.3.7 (Component Pack)

Bug Fixes

  • DI-6077: Hadoop tools are missing from the process palette.

Version 3.0.0 (Component Pack)

New Features

  • DI-4053: Query Editor menu renamed to "Launch Query Editor"

  • DI-4508: Update Components and Designer to take into account dedicated license permissions

  • DI-4727: Rebranding: Templates and sample projects

  • DI-4731: Rebranding: Template messages

  • DI-4813: Rebranding: Drivers classes and URLs

  • DI-4962: Improved component dependencies and requirements management

Version 2.2.1 (Hadoop Component)

Bug Fixes

  • DI-4559: Hive - table and column names were unexpectedly truncated to 30 characters

Version 2.2.0 (Hadoop Component)

New Features

  • DI-3713: Internal change on how some libraries are built to ease maintenance

Version 2.1.1 (Hadoop Component)

New Features

  • DI-3959: Component updated to support the replacement of SQL Explorer with the Stambia Query Editor

Version 2.1.0 (Hadoop Component)

New Features

  • DI-3510: EMF Compare utility - Component has been updated to support the EMF Compare comparison utility

Version 2.0.5 (Hadoop Component)

New Features

  • DI-3614: New TOOL "HDFS Get File List" that retrieves a list of HDFS files and stores the result in a table

Version 2.0.4 (Hadoop Component)

Bug Fixes

  • DI-2736: Template - LOAD Rdbms to Hive - generated temporary file names may unexpectedly contain object delimiters

  • DI-2737: Template - LOAD Rdbms to Impala - generated temporary file names may unexpectedly contain object delimiters

Version 2.0.3 (Hadoop Component)

New Features

  • DI-1775: New template Load Salesforce to Hive

  • DI-1777: TOOL Hdfs File Connect - an "HDFS Error" session variable is now published when a connection fails, providing more detailed information about the error

  • DI-1910: Templates updated - New parameter 'Cdc Subscriber' added to the Templates that did not yet handle it

  • DI-1909: Templates updated - New Parameters 'Unlock Cdc Table' and 'Lock Cdc Table' to configure the locking behaviour of CDC tables

Bug Fixes

  • DI-1729: HDFS File Get Process Tool - Module specification was missing when using WebHDFS mode, which was causing errors as the required dependencies could not be found

  • DI-1908: Templates updated - The 'Cdc Subscriber' parameter was ignored in some Templates on Lock / Unlock CDC steps

  • DI-1907: Templates updated - The 'Cdc Subscriber' parameter was ignored in some Templates when querying the source data