Welcome to Semarchy Convergence for Data Integration.
Preface
Audience
This document is intended for users interested in deploying and optimizing the usage of the Semarchy Convergence for Data Integration components on production servers.
Document Conventions
This document uses the following formatting conventions:
Convention | Meaning
---|---
**boldface** | Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept.
*italic* | Italic type indicates special emphasis or a placeholder variable that you need to provide.
`monospace` | Monospace type indicates code examples, text, or commands that you enter.
Other Semarchy Resources
In addition to the product manuals, Semarchy provides other resources available on its web site: http://www.semarchy.com.
Obtaining Help
There are many ways to contact Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see http://www.semarchy.com.
Feedback
We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please email support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.
Overview
Using this guide, you will:
- Discover the technical architecture of Semarchy Convergence for Data Integration
- Learn how to install the components of Semarchy Convergence for Data Integration
- Learn how to deploy and schedule deliverables for Semarchy Convergence for Data Integration in production
This guide contains information about using the product to manage production tasks, such as:
- installation
- deployment
- runtime parameters management
- deliverable scheduling
Semarchy Convergence for Data Integration Architecture
Architecture Overview
Semarchy Convergence for Data Integration is composed of three components:
- The Designer: This component is a thick client used for configuring and developing data integration flows.
- The Runtime Engines (also called Runtimes): These components execute the data integration flows.
- Analytics: This component enables monitoring and administrative tasks in production through a web interface.
The Runtime Engine
Overview
A runtime is a Java component in charge of executing the data integration flows. Execution reports (session logs) as well as schedules handled by the runtime are saved in a database.
Runtime Engine Services
The runtime exposes different services:
- An RMI service: enables Java applications (for example, the Designer) to communicate with the runtime. Its default port is 42000.
- A database: used as the default storage for session logs and schedules. Its default ports are 42100 and 42101.
- A scheduler service.
- A Web Services API (SOAP): available to start the execution of a deliverable from a remote location. Its default port is 42200.
Runtime Engine Ports
The runtime may open several ports on startup:
- The RMI port, by default 42000
- The SOAP port, by default 42200
- The database port for JDBC access, by default 42100
- The database HTTP listening port, by default 42101
Depending on the configuration and the runtime engine usage, these ports may need to be accessible from the network, and the network should be configured accordingly. Changing the default values for these ports is explained later in this document.
Configuring the Runtime
By default, the runtime stores session logs and schedules in an embedded H2 database.
The various parameters of the runtime are stored in files in the `properties` sub-folder:
- `properties\engineParameters.xml` contains the location of all other runtime parameters (see below).
- `properties\engineScheduler.properties` defines the location where schedules are saved. This file is optional.

The `engineParameters.xml` file refers to other configuration files, including:

- `properties\engineParameters.xml`
- `properties\engines\commonParameters.xml`
- `properties\engines\engineParameters42000.xml`
- `properties\logs\logH2.xml`
When modifying these files, the runtime must be restarted. The parameters available in these files are described later in this document.
Configuring Debug Logs
The runtime also stores debug logs in the `log` folder. Logging uses Apache Log4j and is configured via the `log4j.xml` file. By default, the debug log files are rotated, which means that the `log` folder cannot exceed a certain size.
The debug logs configuration may be changed by the administrator, or when Semarchy support requests it.
Semarchy Convergence for DI Analytics
The Semarchy Convergence for DI Analytics component runs in an Application Server such as Apache Tomcat.
It is used to:
- Deploy and schedule deliverables
- Consolidate the session logs from several runtime engines
- Manage the runtime engines
Semarchy Convergence for DI Analytics is provided as a WAR file to deploy in the application server.
For more information about installing and configuring Semarchy Convergence for DI Analytics, refer to this product’s documentation.
System Requirements
Before installing Semarchy Convergence for Data Integration, you should read the system requirements and certification documents to ensure that your environment meets the minimum installation requirements.
Runtime Engine
The runtime engine has the following requirements:
- A JDK (Java Development Kit) version 1.6.07 or above.
- 80 MB of available disk space.

CPU and memory requirements depend on the data integration flows that are executed by the engine.
If using a JRE (Java Runtime Environment) instead of a JDK, advanced features such as Web Services generation are not available. It is recommended to use the most recent Java Development Kit version.
Semarchy Convergence for DI Analytics
Semarchy Convergence for DI Analytics has the following requirements:
- A JEE web application server, such as Tomcat or JBoss, is installed and configured.
- A Java Virtual Machine or Java Development Kit version 1.6 or above, supported by the application server, is installed.
- A folder is available on the server to store parameters and data for Semarchy Convergence for DI Analytics.

For more information about installing and configuring Semarchy Convergence for DI Analytics, refer to this product’s documentation.
Installation
Installing the Designer
In the following section:

- The `semarchy-di-designer.zip` file refers to the Convergence for Data Integration - Full Setup file that you can download to install Semarchy Convergence for Data Integration. The name of this file varies, as it includes the platform information, product version, and build number.
- `<semarchy_di>` refers to the installation folder of Convergence for Data Integration.
To install the Designer:

1. Download the Convergence for Data Integration distribution (`semarchy-di-designer.zip`) corresponding to your platform and to your default Java Machine (32 vs. 64 bits).
2. Uncompress the `semarchy-di-designer.zip` file on your machine. This creates a `semarchy_di` sub-folder, referred to as `<semarchy_di>` (the Convergence for Data Integration installation directory).
3. Start the Designer:
   - On Windows platforms:
     1. Open Windows Explorer and go to the `<semarchy_di>` folder.
     2. Run `semarchy.exe`. The Designer starts.
   - On UNIX/Linux platforms:
     1. Open a shell window and go to the `<semarchy_di>` folder.
     2. Run `./semarchy`. The Designer starts.
     3. In a shell window, go to the `<semarchy_di>/runtime` folder and run `chmod 755 *.sh` to make the runtime scripts executable.
4. When the Designer starts, it prompts you for the license key.
5. In the Please validate your product dialog, enter in the Key field the key string that was provided to you by Semarchy.
6. Click the Apply button.
7. After registering the license key, you must create the folder into which the Designer stores its data. This folder on your local machine is the Workspace. By default, the Designer creates a `workspace` folder in its installation directory. To create the workspace in a different location:
   1. In the Workspace Launcher window, click the Browse button.
   2. In the Select Workspace Directory dialog, select the folder into which the workspace will be created.
   3. Click OK to create the workspace and open it. The Convergence for Data Integration Designer window opens on the Introduction page. This page provides access to the Overview, Tutorials, and Web Resources pages.
8. Click the Workbench link to open the newly created workbench.
Directories Contents
The `<semarchy_di>` directory contains the following sub-folders:

- `/samples` contains the files for running the getting started tutorial and other samples.
- `/workspace` contains the workspace that you have created. Note that you can have several workspaces for a single installation, and you can locate these workspaces anywhere in your file system.
- `/templates` contains the templates provided out of the box with Convergence for Data Integration.
- `/runtime` contains the Convergence for Data Integration runtime engine binaries and startup scripts.
- `/plugins` and `/configuration` contain the binaries and configuration files for the Designer.
Installing the Runtime Engine
The Runtime Engine is not shipped as a standalone installer or package. It is deployed in the `/runtime` sub-folder the first time you start a Designer. You must first install and start a Designer in order to deploy runtime engines on your servers.
Before you begin the installation:

- Install a supported version of the JDK (Java Development Kit).
- Create or set the `STAMBIA_JAVA_HOME` environment variable to the folder located above the `bin` folder of your JDK.
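As an illustration, the variable can be exported from a shell profile. The JDK path below is only an assumption and depends on your system:

```shell
# Example only: STAMBIA_JAVA_HOME points to the JDK root (the folder above bin/).
# The path below is an assumption; adjust it to your actual JDK location.
export STAMBIA_JAVA_HOME=/usr/lib/jvm/java-6-openjdk
# The runtime then uses "$STAMBIA_JAVA_HOME/bin/java" to start its JVM.
echo "$STAMBIA_JAVA_HOME"
```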
Installing the Runtime Engine on Linux/UNIX
To install the runtime engine:

1. Transfer the `runtime` directory from a Semarchy Convergence for Data Integration Designer installation to the target folder.
2. Grant execution permissions for the SH files in the target `runtime` folder.
3. Grant write permission on the `temp`, `build` and `sessions` sub-directories.
4. Create and set a `STAMBIA_HOME` environment variable pointing to the `runtime` folder.
5. If required, edit the engine configuration, as explained in the following sections.
6. Start the engine using the `./startengine.sh` command from the `runtime` folder.
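The steps above can be sketched as a shell session. All paths here are illustrative assumptions, and a placeholder source tree is created so the sketch is self-contained; substitute your actual Designer installation and target folder:

```shell
# Sketch of the install steps above; all paths are examples, not required locations.
SRC="${SRC:-/tmp/designer/runtime}"     # runtime folder from a Designer install (assumption)
DEST="${DEST:-/tmp/semarchy/runtime}"   # target installation folder (assumption)

# Placeholder source tree so this sketch runs as-is:
mkdir -p "$SRC/temp" "$SRC/build" "$SRC/sessions"
touch "$SRC/startengine.sh"

mkdir -p "$DEST"
cp -r "$SRC/." "$DEST"                                    # 1. transfer the runtime directory
chmod 755 "$DEST"/*.sh                                    # 2. make the SH files executable
chmod -R u+w "$DEST/temp" "$DEST/build" "$DEST/sessions"  # 3. write permissions
export STAMBIA_HOME="$DEST"                               # 4. STAMBIA_HOME variable
# 5./6. edit the configuration if required, then: cd "$DEST" && ./startengine.sh
```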
Installing the Runtime on Windows
To install the runtime engine on Windows:

1. Transfer the `runtime` directory from a Semarchy Convergence for Data Integration Designer installation to the target folder.
2. If required, edit the engine configuration, as explained in the following sections.
3. Start the engine using the `startengine.bat` command from the `runtime` folder.
To configure the runtime engine as a Windows service, use the following scripts available in the `runtime` directory:

- `installEngineAsService.bat -i` installs the runtime as a service.
- `installEngineAsService.bat -r` removes the service.

Semarchy support services may request that you modify the service configuration using the `external\stambiaService\conf\stambiaEngine.conf` file. You do not need to edit this file for a standard configuration.
Any parameter change in the `properties` sub-folder requires that you remove and reinstall the Windows service to take these changes into account.
The user account used to start the Runtime Engine as a Windows service should be changed to a user with network capabilities, usually a domain user.
Installing and Configuring Additional Drivers
To connect to or use database technologies with their own drivers, you must add these drivers to your Convergence for DI installation on each machine running a Runtime Engine or Designer.
To install and configure an additional driver:

1. Copy the driver file (.jar) into the `runtime/lib/jdbc/` folder.
2. Stop and restart the Runtime Engine and/or the Designer. The runtime automatically takes the new driver from that folder into account.
3. If the Designer is installed, you must declare the driver in the workspace:
   1. Open the Designer and connect to your workspace.
   2. Select Window > Preferences.
   3. In the preferences, select SQL Explorer > JDBC Drivers.
   4. Click the Add button.
   5. In the Name field, enter the name of your driver.
   6. In the Example URL field, enter an example URL for this driver.
   7. Select the Extra Class Path tab and then click the Add JARs… button.
   8. Browse and select the JAR file(s) required for your driver. They should be located in the `runtime/lib/jdbc/` sub-folder of your Convergence for DI installation.
   9. Click List Drivers and then select your driver class name in the Driver Class Name field.
   10. Click OK to create the new driver.
   11. Click OK to close the preferences.
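For the runtime side, steps 1 and 2 amount to a file copy and a restart. The sketch below uses a hypothetical driver jar name and hypothetical paths, and creates a placeholder tree so it runs as-is:

```shell
# All paths and the driver jar name below are examples (assumptions).
DRIVER_JAR="${DRIVER_JAR:-/tmp/drivers/ojdbc6.jar}"      # hypothetical JDBC driver jar
RUNTIME_HOME="${RUNTIME_HOME:-/tmp/semarchy/runtime}"    # hypothetical runtime install

# Placeholder files/folders so this sketch is self-contained:
mkdir -p "$(dirname "$DRIVER_JAR")" "$RUNTIME_HOME/lib/jdbc"
touch "$DRIVER_JAR"

# Copy the driver; the runtime picks it up from lib/jdbc at the next restart.
cp "$DRIVER_JAR" "$RUNTIME_HOME/lib/jdbc/"
ls "$RUNTIME_HOME/lib/jdbc/"
```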
Configuring the Engine
Environment Variables
Two environment variables are used by the runtime engine and may have to be set specifically for your configuration:

- `STAMBIA_JAVA_HOME`: path to the JVM used by the runtime.
- `STAMBIA_PROPERTIES_LOCATION`: path to the `properties` folder.

In order to facilitate future upgrades of the runtime engine, it is recommended to copy the `properties` sub-directory to another location and to set the `STAMBIA_PROPERTIES_LOCATION` environment variable to point to this new location. With such a configuration, an entire replacement of the `runtime` folder during an upgrade will not erase the configuration.
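A possible way to apply this recommendation is sketched below. The paths are illustrative assumptions, and a placeholder properties folder is created so the sketch runs as-is:

```shell
RUNTIME_HOME="${RUNTIME_HOME:-/tmp/semarchy/runtime}"    # hypothetical runtime folder
CONF_HOME="${CONF_HOME:-/tmp/semarchy/runtime_conf}"     # hypothetical external location

mkdir -p "$RUNTIME_HOME/properties" "$CONF_HOME"         # placeholder tree for this sketch

# Copy the properties sub-directory outside the runtime folder...
cp -r "$RUNTIME_HOME/properties" "$CONF_HOME/"

# ...and point the runtime at the relocated copy:
export STAMBIA_PROPERTIES_LOCATION="$CONF_HOME/properties"
```

With this setup, replacing the whole `runtime` folder during an upgrade leaves the relocated configuration untouched.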
Log Storage Database
If you want to store logs in a specific database schema, you must configure a connection to this database.
To configure this connection, add a file defining the connection to the database in the `properties/logs` sub-folder, and name it `log[name].xml`, where `[name]` describes the database to connect to.
Sample connection files are available in the `properties/logs/samples` sub-folder.
The example below is a connection to an Oracle database.
<?xml version="1.0" encoding="UTF-8"?>
<repository>
  <logs>
    <log userLogName="oraclePROD" autoUpdate="true" userLogClass="com.indy.engine.userLog.RdbmsUserLog" enable="false">
      <parameter name="userLogRdbmsDriver" value="oracle.jdbc.driver.OracleDriver"/>
      <parameter name="userLogRdbmsUrl" value="jdbc:oracle:thin:@[host]:[port]:[sid]"/>
      <parameter name="userLogRdbmsUser" value="user"/>
      <parameter name="userLogRdbmsPassword" value="password"/>
      <!-- parameter name="userLogRdbmsEncryptedPassword" value="password"/ -->
      <parameter name="userLogRdbmsVarcharType" value="varchar2"/>
      <parameter name="userLogRdbmsVarcharMaxSize" value="4000"/>
      <parameter name="userLogRdbmsClobType" value="clob"/>
      <parameter name="userLogRdbmsBlobType" value="blob"/>
      <parameter name="userLogRdbmsNumericType" value="number"/>
      <parameter name="userLogRdbmsDeleteSyntaxe" value="Delete from"/>
      <parameter name="userLogRdbmsdeliverableFormat" value="text"/>
      <parameter name="userLogRdbmsPropertyMaxVarcharSize" value="1000"/>
      <parameter name="userLogRdbmsPropertyMaxClobSize" value="10000"/>
      <parameter name="userLogRdbmsPropertyBinaryFormat" value="compressed"/>
    </log>
  </logs>
</repository>
In the example, you can modify the following parameters to connect to your own Oracle database and schema:
- userLogName
- userLogRdbmsUrl
- userLogRdbmsUser
- userLogRdbmsPassword or userLogRdbmsEncryptedPassword
- userLogSchemaName (if the schema to use is not the default one for the connected user)
After creating the connection file, reference it from the `runtime/properties/engines/commonParameters.xml` file by adding the following line:
<include file="../logs/log[name].xml"/>
Finally, in the `commonParameters.xml` file, specify the default logging storage. Note that the value specified must match the `userLogName` value from the connection configuration file.
<engineParameters>
  <parameter name="userLogDefaultName" value="oraclePROD"/>
  <!-- The value specified must match the userLogName value from the connection configuration file -->
  …
</engineParameters>
Listening Ports
In the `engineParameters42000.xml` file, or in a copy of this file, change the RMI, SOAP, and other listening ports as needed.
<engineParameters>
  …
  <parameter name="rmiPort" value="42000"/>
  <!--<parameter name="rmiCallbackPort" value="42000"/>-->
  <parameter name="internalDbTcpPort" value="42100"/>
  <parameter name="internalDbWebPort" value="42101"/>
  <parameter name="soapServerPort" value="42200"/>
  …
</engineParameters>
When changing the ports, it is recommended to copy the file and name the copy after the new RMI port number. For example: `engineParameters5500.xml`.
Schedules Storage
By default, the scheduler stores the schedules in an embedded database. It is possible to change this storage to another location.
When the `org.quartz.jobStore.dataSource` property is set to `internal` (the default value), the scheduler stores its schedules in the runtime embedded database.
To store schedules in a different database, you must first create a database schema, and run in this schema the script that seeds the storage structure. Scripts for the supported database technologies are available in the `/scripts/scheduler/` sub-folder.
Then configure the `/properties/engineScheduler.properties` file to connect to this new storage, as shown in the example below for an Oracle server.
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.useProperties=false
# org.quartz.jobStore.dataSource=internal
org.quartz.jobStore.dataSource=database01
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.isClustered=false
#============================================================================
# Configure Datasources
#============================================================================
org.quartz.dataSource.database01.driver = oracle.jdbc.driver.OracleDriver
org.quartz.dataSource.database01.URL = jdbc:oracle:thin:@[host]:[port]:[sid]
org.quartz.dataSource.database01.user = oracle_user
org.quartz.dataSource.database01.password = oracle_password
org.quartz.dataSource.database01.maxConnections = 5
org.quartz.dataSource.database01.validationQuery=
Other Parameters
Services Startup
It is possible to select which services (Scheduler, SOAP, embedded H2 database) are started with the runtime engine.
The parameters controlling the services are located in the `engineParameters42000.xml` and `commonParameters.xml` files. They control whether to:

- start the embedded H2 database
- start the SOAP service
- start the scheduler
- start the execution engine
- start reporting

A standard runtime engine should at least start the execution engine and reporting.
Folders Configuration
In the `commonParameters.xml` file, two folder parameters should be reviewed and modified as needed:

- the location of the deliveries
- the runtime temporary folder
Start an Engine with a Specific Configuration File
A runtime can start with a specific configuration file.
To specify this file, use the `STAMBIA_CONF_FILE_LOCATION` environment variable.
By default, its value is configured in the `initvariable.sh|bat` file in the following line:
STAMBIA_CONF_FILE_LOCATION=$STAMBIA_PROPERTIES_LOCATION/engineParameters.xml
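For instance, the variable can be exported before starting the engine to point at a relocated configuration file. The path below is an illustrative assumption:

```shell
# Example: derive STAMBIA_CONF_FILE_LOCATION from a relocated properties folder.
# The default path below is an assumption for illustration.
export STAMBIA_PROPERTIES_LOCATION="${STAMBIA_PROPERTIES_LOCATION:-/tmp/semarchy/runtime_conf/properties}"
export STAMBIA_CONF_FILE_LOCATION="$STAMBIA_PROPERTIES_LOCATION/engineParameters.xml"
echo "$STAMBIA_CONF_FILE_LOCATION"
```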
Automating Log Purge
You can configure an automatic purge of the runtime session logs.
To automate the log purge:
1. Run the `startcommand.sh` (Linux/UNIX) or `startcommand.bat` (Windows) script.
2. Use the following command to schedule a log purge:
schedule purge keep <number> <minute|hour|day|session>
cron <cronExpression>
[sessionname <name,name2,...>]
[status <done,error,killed>]
[on host <hostname>]
[port <hostport>]
The command defines:

- the amount of log information to keep, expressed as a number of days, hours, or minutes, or as a number of sessions,
- which jobs are impacted,
- the frequency at which the purge should be started.
For example, to keep 90 days of log history for all jobs and schedules and purge every evening at 23:00:
schedule purge keep 90 day cron "0 0 23 * * ?"
Upgrading Semarchy Convergence for Data Integration
This chapter explains how to plan and perform the upgrade of Semarchy Convergence for Data Integration for development and production environments.
Before the Upgrade
Before starting the upgrade, you should review the following documents:
- The Semarchy Convergence for Data Integration Release Notes provide the latest information about the release, including new features and bug fixes.
- Depending on your current version and the upgrade version, some actions may be required after the upgrade process. Review these Post-Upgrade Actions before starting the upgrade process.
Upgrading the Designer
The upgrade path is as follows:

1. Stop the local Runtime.
2. Exit Convergence for Data Integration Designer.
3. Backup your existing Convergence for Data Integration Designer folder.
4. Backup your workspace folder.
5. Install the new version of Convergence for Data Integration Designer in a separate folder.
6. Start the new Designer and select your existing workspace directory. Execute all Post-Upgrade Actions required for your workspace.
7. Launch a "Rebuild Cache" operation from the Impact view’s menu.
8. Re-install your specific versioning system plugins, if any.
9. Install the updated templates into your workspace.

You can check your Designer version in the Help > About Convergence for DI Designer menu.
Upgrading Runtime Engines
We assume your current Runtime is installed in a directory named `semarchy_runtime`.
The upgrade path is as follows:

1. Backup your existing `semarchy_runtime` runtime directory.
2. Install the new Runtime in a new directory, for example `semarchy_runtime_new`. On UNIX/Linux systems, make sure that all the `semarchy_runtime_new/*.sh` files are executable.
3. Stop the previous Runtime.
4. Copy the content of the following directories from the `semarchy_runtime` directory to the new `semarchy_runtime_new` runtime directory:
   - `semarchy_runtime\build\deliveries`
   - `semarchy_runtime\build\packages`
   - `semarchy_runtime\lib\jdbc`. Do not overwrite any file already present in the new runtime directory, as these files contain newer versions of the Convergence for DI drivers.
   - `semarchy_runtime\lib\addons`, if you have added additional libraries to the Runtime.
   - `semarchy_runtime\properties`, if you have performed specific configurations.
   - `semarchy_runtime\scheduler`, if you are using the Runtime’s scheduler.
   - `semarchy_runtime\sessions`, if you use the Runtime’s internal log database.
   - `semarchy_runtime\temp`, if you use this temporary folder in your processes.
5. Rename the old Runtime directory to a different name, for example `semarchy_runtime_backup`.
6. Rename the new Runtime directory to `semarchy_runtime`.
7. Restart the Runtime.

The Semarchy Convergence for Data Integration runtime in version 3.2 is able to run deliveries produced with previous releases of the Semarchy Convergence for Data Integration Designer. You do not necessarily need to re-deploy deliverables when you upgrade the runtime.
Upgrading Semarchy Convergence for DI Analytics
The upgrade path is as follows:

1. Backup your existing Semarchy Convergence for DI Analytics installation, including its SEMARCHY_DI_ANALYTICS_WEBAPP_HOME directory.
2. Install the new version of Semarchy Convergence for DI Analytics. See the Semarchy Convergence for DI Analytics User’s Guide for more information.
3. Open and save your Semarchy Convergence for DI Analytics repositories in order to upgrade them.

Once opened and saved with the new release of Semarchy Convergence for DI Analytics, a repository can no longer be opened with a previous release.
Post-Upgrade Actions
Upgrading to Version 3.2.x
Upgrading the Workspace
Make sure to backup your workspace before opening it with the new product release.
You do not need to start the new version of Semarchy Convergence for Data Integration Designer with a new workspace. When launched on an existing workspace, the Designer automatically upgrades this workspace.
Switch to the New Internal Resource Management
The internal resources define how the Designer reverse-engineers a technology, transforms XPath expressions into SQL queries, converts datatypes, etc. Before version 3.2, these resources were stored in the ".tech" project and hidden by default. Since version 3.2, the Designer provides a new storage for internal resources, which gives you more control over them.
When opening a workspace created with a previous version for the first time with Designer version 3.2, you are prompted to choose an option for managing internal resources:

- Close (Recommended): the .tech project is preserved in your workspace, but in the "closed" state. Choose this option if you know that you made modifications in your .tech project and you want to keep them for future reference.
- Delete from Workspace: the .tech project is removed from your workspace, but not from your hard drive.
- Delete permanently: the .tech project is removed from your hard drive and from your workspace. Choose this option if you did not know that it existed or you know you never modified it.
- Keep: the .tech project remains untouched and active. Choose this option if you made changes in your .tech project and you really need to keep them active.

If you have never modified the content of the .tech folder, we recommend that you choose the Close option. You will be able to delete the .tech project later as needed.
Upgrading the Convergence for MDM Template
Review the Semarchy Convergence for Data Integration Release Notes for possible changes in the Convergence for MDM integration templates and the GetLoadID, Submit Load, and Cancel Load process templates, and upgrade these templates accordingly.
The Convergence for MDM INTEGRATION template provided in versions before 3.2.0 included an incorrect value in certain tasks for the Core > Nb Cycles parameter. When used in version 3.2.0 and above, the resulting process never executes the INTEGRATION task (it remains grey after execution). Upgrade the template to the latest version to fix this error. If you face similar symptoms with other templates, review the value of the Core > Nb Cycles parameter in the Properties view for the task. If this value is set to 0, the task will never execute. It should be set to -1 instead.
To upgrade the Convergence for MDM templates:

1. In the Project Explorer, expand the global project.
2. Select the templates.semarchy folder, right-click and select Delete.
3. Click OK to confirm the deletion. Note that removing the templates does not modify the mappings using them, but makes these mappings invalid.
4. In the Project Explorer, right-click the global project and then select Import….
5. In the Import Wizard, select General > Archive File for the import source.
6. Click Next.
7. Use the Browse button to select the `<semarchy_di>/templates/templates.semarchy.zip` archive file in the From Archive File: field.
8. Expand the tree view and make sure that all the templates and folders are selected.
9. Click Finish. The import process imports all the templates from the archive file into the global project.

With this import, the mappings are now valid and can be executed, re-built, and deployed.
Upgrading the Mappings
The following issues may appear on existing mappings:

- Warning on all mappings: The upgrade process leaves your existing mapping files unchanged; only the internal files are re-generated. Warning icons appear on mappings that still use the previous version’s architecture. These mappings can still execute (directly or by executing a parent process), and the generated code remains unchanged and continues to work exactly as before. If you edit and save them, the Designer silently converts them to the new architecture and the warning icon disappears. Note that the presence of non-migrated mappings in a workspace may produce errors when trying to move Metadata nodes to Sub-metadata files. Therefore, we recommend opening and saving all the mappings that use a metadata file before moving it.
- Source in more than one Load: In previous releases, when a source table was used in more than one Load Template, the mapping would silently compile, and sometimes produce unexpected behavior at execution. The new release detects the error before compilation: the developer is now informed with a "Problem" icon on the join. Such a mapping must be fixed.
- Cross joins: In previous versions, when adding source tables without an explicit join between them, a cross join was automatically created. In the new release, these mappings display a "Problem" icon (red triangle) on the target. The new mapping model requires that joins are designed explicitly. These mappings should be modified in order to design the join explicitly as a cross join.
- Mappings referencing process parameters: If a mapping contains expressions (filters, mapped fields, joins, etc.) that reference the parent process’ parameters using a relative path scheme such as `${../../../../MY_PARAM}$`, it should be updated to take into account a new level introduced during code generation: `${../../../../../MY_PARAM}$`. We recommend switching those mappings from the relative path scheme to an absolute path scheme when referencing parameters. For example: `${~/MY_PARAM}$`.
Upgrading the Processes
The following issues may appear on existing processes:

- Processes referencing template variables: Processes that reference a template variable, for example `${~/mapping/I_TARGET_TABLE - INTEGRATION/T - Insertion of rows in target/SQL_STAT_INSERT}$`, have to be modified to take into account a new depth level introduced during code generation. In the example above, you may prefer to retrieve the statistic using `ctx.sumVariable("SQL_STAT_INSERT", "~/mapping")`.
Upgrading Version Controlled Workspaces
Semarchy Convergence for Data Integration generates internal files in the indy.build folder. In previous releases, this folder existed as a sibling of each mapping in a project. In the new release, there is only a single indy.build folder under the root of each project.
If you configured your versioning system to ignore the indy.build folders, you now have to configure it to ignore the new indy.build folder located at the root of the project.
All Designers sharing the same workspace through a versioning system should be upgraded together. A Designer in a previous version cannot open mappings created or modified by a newer Designer.
Using Templates for Stages
Semarchy Convergence for Data Integration introduces a new feature called Stages. In order to use this feature, the new templates supporting it must be imported into the workspace.
Deploying Deliverables
Terminology
Configuration
A configuration is a set of values specific to an environment. For example, the Production environment is configured with specific servers (host), users, passwords, folders, etc. These values (or properties) differ from one environment to another.
In Semarchy Convergence for Data Integration, every property can be configured and can take a different value from one environment to another:
- URL (server, port, …)
- Schema
- User and password
- Table name
- Column size, type, or name
- etc.
The specific values of each environment are used when an integration job runs in this environment.
Deliverables
A deliverable is what is ultimately executed or scheduled. It is an XML file which contains the complete sequence of tasks to execute. It also contains pre-defined connections (server, ports, passwords, etc.). The deliverable is entirely configured for runtime. It is a self-sufficient artifact that only needs the runtime engine for execution.
Bringing an integration job to production is a two-step process:
1. Generate a deliverable from the package.
2. Schedule or execute the deliverable.
The development team may deliver already configured deliverables. However, in most cases, the production team has to manage the configuration for production.
Packages
A package is an archive containing all the required files and folders to generate a deliverable. When the development team generates the package, it already contains a default configuration. There is no need to uncompress a package. The operations required to configure and generate a deliverable are performed through a command-line interface.
Deploying a Package
When the development team generates a package, this package contains a default configuration. In the following section, you will learn the operations required to go to production. The commands and scripts used are available in the runtime root installation folder.
The following instructions explain how to deploy from the command line. It is also possible to use Semarchy Convergence for DI Analytics to deploy the deliverables from a graphical user interface.
Commands are given using the Linux/UNIX syntax; Windows users have access to similar scripts in .bat form for their platform.
Preparing a Configuration
Extracting the Configuration File
To set the configuration-specific properties, you must extract the configuration file using the following command:
./buildDelivery.sh [PACKAGE_NAME] -conf [CONFIGURATION_NAME] -extract
or
./buildDelivery.sh [PACKAGE_NAME] -confFile [FILE_NAME] -extract
This command generates a file in which you can change the properties. The configuration name can be any text. The file's default name is [PACKAGE_NAME].[CONFIGURATION_NAME].conf. You can choose a different file name with the -confFile option.
Editing the Configuration
The extracted file contains properties that can either be left as is or modified. The properties to change depend on the data integration job and on the differences between the development and production configurations.
#################################################################
### Name: super/Rdbms MetaData/Hypersonic SQL/ServerDatamart_HSQL.rdbms
### Type: com.stambia.rdbms.server
#_hXz80FvYEeGmhptGa6rXTA/url=jdbc:hsqldb:hsql://localhost:62211
#_hXz80FvYEeGmhptGa6rXTA/user=sa
#_hXz80FvYEeGmhptGa6rXTA/password=3951C0D79B227B95C1DC348DD0BCE8F1
#################################################################
### Name: super/Rdbms MetaData/Hypersonic SQL/ServerDatamart_HSQL.rdbms/DATAMART
### Type: com.stambia.rdbms.schema
#_ha9XcFvYEeGmhptGa6rXTA/TABLE_SCHEM=DATAMART
#################################################################
### Name: super/Rdbms MetaData/Hypersonic SQL/ServerMotel_HSQL.rdbms
### Type: com.stambia.rdbms.server
#_Pu1T8FvYEeGmhptGa6rXTA/url=jdbc:hsqldb:hsql://localhost:62210
#_Pu1T8FvYEeGmhptGa6rXTA/user=sa
#_Pu1T8FvYEeGmhptGa6rXTA/password=3951C0D79B227B95C1DC348DD0BCE8F1
#################################################################
### Name: super/Rdbms MetaData/Hypersonic SQL/ServerMotel_HSQL.rdbms/MOTEL
### Type: com.stambia.rdbms.schema
#_P0qPIFvYEeGmhptGa6rXTA/TABLE_SCHEM=MOTEL
Usually, elements to modify include URLs, users, passwords, schemas, folders and such.
To modify this file, un-comment the relevant lines (remove the "#" at the beginning of the line) and modify the values. You can also specify a property that is not visible in this file, using the following syntax:
[object identifier]/[property name]=[value]
Object identifiers and property names are unique. This means you can reuse a configuration file for several deliveries if they use the same servers, schemas, etc., without having to extract the configuration file for each delivery.
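For example, un-commenting and editing the first server's properties from the extracted file could produce the following (the host and user values are purely illustrative):

```
### Name: super/Rdbms MetaData/Hypersonic SQL/ServerDatamart_HSQL.rdbms
### Type: com.stambia.rdbms.server
_hXz80FvYEeGmhptGa6rXTA/url=jdbc:hsqldb:hsql://prod-server:62211
_hXz80FvYEeGmhptGa6rXTA/user=prod_user
_hXz80FvYEeGmhptGa6rXTA/password=3951C0D79B227B95C1DC348DD0BCE8F1
```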
Encrypting Passwords
The password
property requires an encrypted password.
To encrypt a password, open the command line with:
./startcommand.sh
Then in the command line, enter:
encrypt PASS_WORD
This command returns the encrypted password as a string, which can be copied into the configuration file.
Generating Deliverables
Use the following command to generate a deliverable from a package and a configuration:
./buildDelivery.sh [PACKAGE_NAME] -conf [CONFIGURATION_NAME]
Or
./buildDelivery.sh [PACKAGE_NAME] -confFile [CONF_FILE_NAME]
Example:
./buildDelivery.sh TEST_LAUNCH.pck -conf PROD
or
./buildDelivery.sh TEST_LAUNCH.pck -confFile production.conf
The deliverable is generated in the build/deliveries
sub-folder.
When using the -confFile option, you can point to a configuration file that includes all the elements required for the production environment.
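Because a configuration file is reusable across packages, a deployment script can build several deliverables against one shared file. Below is a minimal sketch, assuming hypothetical package names (TEST_LAUNCH.pck, LOAD_DWH.pck) and configuration file name; the echo prefix makes it a dry run:

```shell
#!/bin/sh
# Dry-run sketch: build several packages against one shared
# configuration file. Package and file names are illustrative;
# remove the "echo" to actually invoke buildDelivery.sh.
CONF=production.conf
BUILT=0
for PCK in TEST_LAUNCH.pck LOAD_DWH.pck; do
  echo ./buildDelivery.sh "$PCK" -confFile "$CONF"
  BUILT=$((BUILT + 1))
done
echo "$BUILT packages processed"
```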
Executing a Deliverable
To execute a deliverable, use the following command line:
./startdelivery.sh -name [DELIVERY_NAME]
Or
./startdelivery.sh -file [DELIVERY_FILE]
Certain deliverables can be parameterized with variables that you provide on the command line as shown below:
./startdelivery.sh -name [DELIVERY_NAME] -var [VAR1] [VALUE1] … -var [VARn] [VALUEn]
This script is synchronous and waits for the execution to complete. The return code is one of the following:
-
1 if the execution was successful
-
-1 if the execution failed
-
-2 if the session was killed
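Note that POSIX shells report a process's exit code modulo 256, so -1 appears to the shell as 255 and -2 as 254. The sketch below is a hypothetical wrapper that maps these codes to a readable status; a stub command stands in for startdelivery.sh:

```shell
#!/bin/sh
# Hypothetical wrapper mapping the documented return codes
# (1 = success, -1 = failure, -2 = killed) to a readable status.
# Negative codes are seen modulo 256 by the shell: 255 and 254.
run_delivery() {
  "$@"   # the real call would be: ./startdelivery.sh -name [DELIVERY_NAME]
  case $? in
    1)   echo "success" ;;
    255) echo "failure" ;;  # -1 modulo 256
    254) echo "killed"  ;;  # -2 modulo 256
    *)   echo "unknown" ;;
  esac
}

# Stub standing in for a successful execution:
run_delivery sh -c 'exit 1'   # prints "success"
```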
Third-Party Scheduling
It is possible to use third-party schedulers to start the startdelivery.sh script.
In addition to the return code, the command provides additional information on the standard output, as shown in the example below:
##### BEGIN #####
04/05/2011 17:22:11,718 - SESSION: e5b70658db3117952ad056f12fbb9a21e08000 is started
-- DURATION = 00:00:11,907
##### STATISTICS #####
SQL_NB_ROWS=177051
SQL_STAT_INSERT=37969
SQL_STAT_UPDATE=0
SQL_STAT_DELETE=37972
04/05/2011 17:22:23,671 - SESSION: e5b70658db3117952ad056f12fbb9a21e08000 is ended
##### END #####
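A scheduler wrapper can parse these KEY=VALUE statistics lines with standard tools. In the sketch below, the session output is simulated with a here-document rather than a real run:

```shell
#!/bin/sh
# Sketch: extract the statistics printed on the standard output.
# A here-document simulates the output of startdelivery.sh.
OUTPUT=$(cat <<'EOF'
##### BEGIN #####
##### STATISTICS #####
SQL_NB_ROWS=177051
SQL_STAT_INSERT=37969
SQL_STAT_UPDATE=0
SQL_STAT_DELETE=37972
##### END #####
EOF
)

# Keep only the KEY=VALUE lines, then pick a single value out.
STATS=$(printf '%s\n' "$OUTPUT" | grep '^SQL_')
NB_ROWS=$(printf '%s\n' "$STATS" | sed -n 's/^SQL_NB_ROWS=//p')
echo "rows processed: $NB_ROWS"   # prints "rows processed: 177051"
```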
Built-In Scheduling
Scheduling Basics
You can schedule deliverables using the built-in scheduler. Deliverables scheduled this way appear as jobs named after the deliverable. However, you can also schedule a deliverable several times as jobs with different names, and attach one or more triggers to one job. Job names are unique.
Scheduling From the Command Line
Once you are in the command line (using startcommand.bat or startcommand.sh), connect to the runtime in which you wish to schedule a job.
To connect to the local runtime, use the command:
>connect
You can also use this command to connect to a remote runtime. Use the help
command for more details about all commands.
To schedule a job with a single schedule:
>schedule delivery MY_DELIVERY cron "0 15 10 * * ? *"
To schedule a job with different names and schedules:
>schedule delivery MY_DELIVERY with name NAME1 cron "0 15 10 * * ? *"
>schedule delivery MY_DELIVERY with name NAME1 cron "0 20 10 * * ? *"
>schedule delivery MY_DELIVERY with name NAME2 cron "0 15 11 * * ? *"
To schedule a job with a starting date and/or an ending date:
>schedule delivery MY_DELIVERY start "2009/12/10 12:55:22" cron "0 15 10 * * ? *"
>schedule delivery MY_DELIVERY start "2009/12/10 12:55:22" end "2009/12/25 12:55:22" cron "0 15 10 * * ? *"
To retrieve the list of schedules for a deliverable:
> get delivery schedules MY_DELIVERY
Getting schedules for MY_DELIVERY
-- Trigger Name: CRON_MY_DELIVERY-0
-- Job Name: MY_DELIVERY
[...]
Using the Trigger Name returned in the list of schedules, you can remove a schedule.
> remove trigger CRON_MY_DELIVERY-0
If you need more information about the command line:
>help
Cron Trigger Tutorial
General syntax
A Cron expression is a character string containing 6 or 7 fields separated by spaces.
These fields can contain the characters listed in this table, or a combination of them.
Field name | Mandatory | Authorized values | Authorized special characters |
---|---|---|---|
Seconds | YES | 0-59 | , - * / |
Minutes | YES | 0-59 | , - * / |
Hours | YES | 0-23 | , - * / |
Day of the month | YES | 1-31 | , - * ? / L W |
Month | YES | 1-12 or JAN-DEC | , - * / |
Weekday | YES | 1-7 or SUN-SAT | , - * ? / L # |
Year | NO | empty, 1970-2099 | , - * / |
Special Characters:
-
*
("all values") – used to select all the values for this field. For example, * in the minutes field means "every minute".
-
?
("no specific value") – useful if you need to specify something in one of the two fields relevant to this special character, but not in the other one. For example, if you wish to set a trigger for a specific day of the month (say the 10th), whatever the weekday: put '10' in the 'day of the month' field, and '?' in the 'weekday' field. For further understanding, check the examples.
-
-
– used to specify an interval. For example, 10-12 in the 'hour' field means "hours 10, 11 and 12".
-
,
– used to add more values. For example, "MON,WED,FRI" in the 'weekday' field means "Mondays, Wednesdays and Fridays".
-
/
– used to specify repetition increments. For example, "0/15" in the 'seconds' field means "seconds 0, 15, 30 and 45", in other words every 15 seconds, starting at 0 included. "5/15" in the same field means "seconds 5, 20, 35 and 50". A '/' with no number before it (for example '/5') is equivalent to putting a 0 before the '/' (i.e. '0/5'). Another example: '1/3' in the 'day of the month' field means "trigger every 3 days, starting on the 1st of the month".
-
L
(Last) – this character has different meanings depending on the field it is used in. "L" in the 'day of the month' field means "the last day of the month", i.e. the 31st for January, the 28th for February in non-leap years. If 'L' is used alone in the 'weekday' field, it means the 7th day, i.e. Saturday (SAT). However, if 'L' follows a number in the 'weekday' field, it means "the last X day of the month"; for example, "6L" means "the last Friday of the month". To avoid ambiguity, do not use the 'L' character in value lists.
-
W
(weekday) – used to specify the working weekday (Monday to Friday) nearest to a given date. For example, "15W" in the 'day of the month' field means "the working weekday closest to the 15th". If the 15th happens to be a Saturday, the trigger fires on Friday the 14th; if the 15th happens to be a Sunday, the job triggers on Monday the 16th. Take care, though: if you specify "1W" and the 1st happens to be a Saturday, the job is only triggered on Monday the 3rd, since the trigger cannot cross into the previous month. Also, 'W' only works with single values, not with intervals. The 'L' and 'W' characters can be combined in the 'day of the month' field: "LW" means "the last working weekday of the month".
-
#
– used to specify "the n-th day XXX of the month". For example, "6#3" in the 'weekday' field means "the 3rd Friday of the month" (day 6 = Friday, and #3 = the 3rd one in the month).
Day names are not case-sensitive. This means ‘MON’ and ‘mon’ are identical.
Examples
Expression | Meaning for the trigger |
---|---|
0 0 12 * * ? | At 12:00 (noon) every day |
0 15 10 ? * * | At 10:15 every day |
0 15 10 * * ? | At 10:15 every day |
0 15 10 * * ? * | At 10:15 every day |
0 15 10 * * ? 2005 | At 10:15 every day of the year 2005 |
0 * 14 * * ? | Every minute, between 14:00 and 14:59, every day |
0 0/5 14 * * ? | Every 5 minutes from 14:00 to 14:55, every day |
0 0/5 14,18 * * ? | Every 5 minutes from 14:00 to 14:55 and from 18:00 to 18:55, every day |
0 0-5 14 * * ? | Every minute from 14:00 to 14:05, every day |
0 10,44 14 ? 3 WED | At 14:10 and 14:44 every Wednesday in March |
0 15 10 ? * MON-FRI | At 10:15 every Monday, Tuesday, Wednesday, Thursday and Friday |
0 15 10 15 * ? | At 10:15 on the 15th of each month |
0 15 10 L * ? | At 10:15 on the last day of each month |
0 15 10 ? * 6L | At 10:15 on the last Friday of each month |
0 15 10 ? * 6L 2002-2005 | At 10:15 on the last Friday of each month, for years 2002 to 2005 |
0 15 10 ? * 6#3 | At 10:15 on the third Friday of each month |
0 0 12 1/5 * ? | At 12:00 every 5 days, each month, starting on the 1st |
0 11 11 11 11 ? | Every November 11th at 11:11 |
Runtime Engine Monitoring
You can monitor the sessions running in a runtime engine using the Designer thick client, or from a web UI using Semarchy Convergence for DI Analytics.
It is also possible to monitor runtime engine activity from the runtime engine console.
Connecting to the Runtime
Once you are in the command line (using startcommand.bat or startcommand.sh), connect to the runtime running the session.
To connect to a remote runtime, use the command:
>connect to <host_name> port <port_number>
where <host_name>
is the name of the host running the runtime engine and <port_number>
is the port of the runtime engine.
To connect to a runtime from its server's command line, just run the following:
>connect
Managing Runtime Services
The runtime runs several services which can be started and stopped from the console.
To retrieve the status of a given service (or all services) in a given format:
>get services [name <name>] [format <format>]
To stop/start/restart services:
><start|stop|restart> <name> service
Managing Sessions
The following commands allow you to monitor and manage the list of sessions handled by the runtime.
To retrieve a list of sessions by name, id, status or duration:
>get sessions [name <name>] [id <id1,id2,idn>] [status <running,error,done,stopped>] [duration <min> [to <max>]] [limit <limit>] [format <format>]
To stop, restart or wait for the end of a given session identified by its ID:
><stop|restart|wait> session <id>
Stopping the Runtime
To stop the runtime, optionally waiting for all sessions to complete (recommended):
>stop runtime [wait sessions]
To kill the runtime process (not recommended):
>kill runtime [wait sessions]
You can stop a runtime remotely, but you need to connect to this runtime's server (using SSH, for example) to restart it.
Batch Commands
You can create batches of commands and run them using the startcommand script.
This command supports the following syntaxes.
startcommand[.sh|.bat] [-separator <separator>] [<command1>;<command2>;...;<commandx>]
startcommand[.sh|.bat] [-file <commands file>]
The first syntax allows you to start a sequence of commands separated by a separator (";" by default). The second syntax allows you to specify a file containing a sequence of commands.
Note that the first command in the sequence should be a connect
command.
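For example, a maintenance batch could chain console commands documented earlier in a single call. This is only a sketch (the echo makes it a dry run, and the command sequence is illustrative):

```shell
#!/bin/sh
# Dry-run sketch of a command batch for startcommand.sh: connect to
# the local runtime, list its services, then stop the runtime cleanly.
BATCH="connect;get services;stop runtime wait sessions"
echo ./startcommand.sh "$BATCH"
```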
Appendix A: Runtime Engine Parameters Reference
This chapter provides a list of all parameters for the runtime engine.
Parameters Listed in engineParameters42000.xml
Parameter | Default value | Comment |
---|---|---|
startInternalDb | true | Condition for starting the internal database used for the session logs and the scheduler |
startSoapServer | true | Condition for starting the SOAP services that enable Semarchy Convergence for Data Integration to expose web services |
rmiPort | 42000 | IP port of the Java RMI service used for communication between the runtime and graphical interfaces |
rmiCallbackPort | | In the RMI protocol, the client can also receive queries from the server. In this case, it uses the rmiCallbackPort. The default value is that of the rmiPort. |
internalDbTcpPort | 42100 | IP port of the internal database |
internalDbWebPort | 42101 | IP port of the Web interface of the internal database |
soapServerPort | 42200 | IP port used by the SOAP server |
soapServerUser | | Optional user for queries to the SOAP server |
soapServerPassword | | Optional password for queries to the SOAP server |
soapServerUncryptedPassword | | Optional unencrypted password for queries to the SOAP server |
soapServerThreadPoolSize | | Maximum number of concurrent threads on the SOAP server. If the number of queries exceeds this size, sessions are put on hold |
Parameters Listed in commonParameters.xml
Parameter | Default value | Comment |
---|---|---|
userLogDefaultName | internalDB | Name of the log where session logs are written. This log must exist in the includes. |
debugLevel | 0 | Debug level of the runtime |
launchSchedulerEngine | true | Condition for starting the scheduler built into the runtime. |
launchExecutionEngine | true | Condition for starting the execution part of the runtime. Should be set to true. |
launchReportEngine | true | Condition for starting the reporting part of the runtime. Should be set to true. |
memoryScanDelay | 1000 | Deprecated. |
memoryLogScanDelay | 10000 | Scan delay of the logs brought back to memory so as to be purged, in milliseconds. |
memoryLogCacheDelay | 300000 | Time during which logs stay in memory, in milliseconds. |
sessionFolder | sessions | Folder (relative or absolute) used to store the session logs. Now used to store the data of the internal database. |
rmiHost | | The RMI host is automatically calculated. If specified, this parameter skips this step. Useful when there are multiple domains, or address translations that generate different IP addresses for the same host. The host indicated (IP or name) must be reachable by the client. |
soapHost | | See above. |
deliveryFolder | build/deliveries | Folder where the runtime finds the deliverables |
temporaryFolder | temp | Temporary folder for the runtime. Used by developers as a workspace, and by the runtime to store some temporary files. |
deliveryExtension | deliv | Extension of the deliverables. Do not change this value. |
defaultFetchSize | 1000 | Default fetch size for reading data from databases. This value may be overridden by the developers, or directly in the deliverables when going to production. |
defaultBatchSize | 1000 | Default batch update size for writing data into databases. This value may be overridden by the developers, or directly in the deliverables when going to production. |
defaultJdbcConnectionTimeout | 100 | Default timeout value for connections to databases, in seconds |
defaultJdbcQueryTimeout | 10 | Default timeout value for queries to databases, in seconds |
defaultSessionReportNumber | 10 | Default number of sessions for the reporting part of the runtime. Usually overridden by the graphical interface. |
stackTraceOnStdOutput | true | Writes the stack trace on the standard output if there is an error. |
statisticsOnStdOutput | true | Writes the statistics on the standard output at the end of a session. |
sumVariables | … | List of the variables used to calculate the session's statistics |
Logs parameters
In this section, the example values are those used for the logs in H2 (the internal database). For other log examples, please refer to the example files provided.
Log header
Parameter | Example value | Comment |
---|---|---|
userLogName | internalDB | Name of the log, which is then used in engineParameters.xml |
autoUpdate | true | Condition for the automatic update of the log structures |
userLogClass | com.indy.engine.userLog.RdbmsUserLog | Java class that is used (do not change) |
Internal parameters
Parameter | Example value | Comment |
---|---|---|
userLogRdbmsDriver | org.h2.Driver | Java driver to use (this file must be in the runtime folder lib/jdbc) |
userLogRdbmsUrl | jdbc:h2:tcp://localhost:42100/sessions/internalDb/sessionLogs | Connection URL |
userLogRdbmsUser | sa | Connection user |
userLogRdbmsPassword | | Connection password (unencrypted) |
userLogRdbmsEncryptedPassword | | Connection password (encrypted) |
userLogRdbmsVarcharType | varchar | Type used when the data is a character string |
userLogRdbmsVarcharMaxSize | 1000 | Maximum length of character strings with the type defined above |
userLogRdbmsNumericType | numeric | Type used when the data is numeric |
userLogRdbmsClobType | clob | Type used when the data is text (CLOB, limitless text) |
userLogRdbmsBlobType | blob | Type used when the data is binary (BLOB) |
userLogRdbmsSchemaName | logs | Database schema used to create the tables |
userLogRdbmsUseSchemaNameForIndexCreation | true | Condition for prefixing indexes with the schema name during creation |
userLogRdbmsDeleteSyntaxe | Delete from | Syntax of the delete commands. The name of the table is appended after it. |
userLogRdbmsCompressedLevel | bestCompression | Type of compression used (if activated). Possible values: bestCompression, bestSpeed or default |
userLogRdbmsDeliveryFormat | compressed | Storage format of the deliverable in the database. Possible values: text, binary or compressed |
userLogRdbmsPropertyMaxVarcharSize | 1000 | Size of character strings beyond which the data is stored as CLOB |
userLogRdbmsPropertyMaxClobSize | 10000 | Size of the CLOB beyond which the data is stored as BLOB. -1 means "infinite" |
userLogRdbmsPropertyBinaryFormat | compressed | Specifies the compression of the BLOB. Possible values: binary or compressed |
userLogRdbmsTimestampQuery | select now() | SQL order to retrieve the current timestamp. |
userLogRdbmsInactivityDetectionPeriod | 90000 | |
userLogRdbmsActivityRefreshInterval | 60000 | |
userLogRdbmsIndexCreationOption | | Character string added after the SQL order that creates indexes. Useful to specify physical storage parameters such as tablespaces or underlying physical types. For example, with MySQL: ENGINE = InnoDB |
userLogRdbmsTableCreationOption | | Character string added after the SQL order that creates tables. Useful to specify physical storage parameters such as tablespaces or underlying physical types. For example, with MySQL: ENGINE = InnoDB |