Change Data Capture (CDC)

Change Data Capture (CDC) is a widely used technique that detects and captures data changes so that actions can be performed on them. In a data integration context, CDC is mostly used to replicate data between databases or other systems.

Change Data Capture in Semarchy xDI is:

  • Managed using dedicated process actions.

  • Activated and used in mappings.

Install the CDC Component

Using Change Data Capture in Semarchy xDI requires the CDC Component.

Configure CDC

Depending on the database technology, CDC in Semarchy xDI relies either on the native CDC features offered by the technology (if available) or, by default, on a generic trigger-based mechanism. With the trigger-based mechanism, Semarchy xDI automatically generates the tables and triggers required to enable CDC on the database.

Refer to the Components reference documentation for more information on CDC capabilities per technology.
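
To illustrate the general idea behind trigger-based capture, here is a minimal sketch using Python's built-in sqlite3 module. It is not the set of objects that Semarchy xDI generates: the table names (customer, cdc_customer), the trigger names, and the change-table columns are all hypothetical, chosen only to show how triggers can record each insert, update, and delete in a separate change table.

```python
import sqlite3

# Hypothetical names: "customer" is the source table and "cdc_customer"
# is the change table. The objects generated by the product will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Change table: one row per captured change.
CREATE TABLE cdc_customer (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    operation TEXT NOT NULL,      -- 'I', 'U' or 'D'
    id        INTEGER NOT NULL    -- key of the changed source row
);

CREATE TRIGGER customer_ai AFTER INSERT ON customer
BEGIN
    INSERT INTO cdc_customer (operation, id) VALUES ('I', NEW.id);
END;

CREATE TRIGGER customer_au AFTER UPDATE ON customer
BEGIN
    INSERT INTO cdc_customer (operation, id) VALUES ('U', NEW.id);
END;

CREATE TRIGGER customer_ad AFTER DELETE ON customer
BEGIN
    INSERT INTO cdc_customer (operation, id) VALUES ('D', OLD.id);
END;
""")

# Ordinary changes on the source table are recorded by the triggers.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customer SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")

captured = conn.execute(
    "SELECT operation, id FROM cdc_customer ORDER BY change_id"
).fetchall()
print(captured)
```

In this sketch, the application writing to customer needs no changes: the triggers alone populate the change table, which is why no separate capture step is needed once the tables and triggers exist.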

Create a CDC Management Process

CDC in Semarchy xDI is managed using dedicated process actions.

To manage CDC for a given datastore table:

  1. Create a new process.

  2. In the process editor’s palette, expand the Tools accordion.

  3. Select the CDC xxxxx tool that corresponds to the datastore’s technology and, depending on the technology, to the expected operation (for example, CDC PostgreSQL, or CDC Mssql - Start or Stop…).

    If CDC tools are not available in the palette, first install the CDC component. Refer to Install the CDC Component for more information.

  4. Click inside an empty area of the process diagram to add the action.

  5. Drag and drop the concerned datastore table from the Project Explorer onto the REF element of the added process step.

    [Figure: Add CDC Action]

  6. In the Properties view, click the Cdc Operation property to enable it and select the operation to execute.

CDC Operations

The following sections describe the typical operations that can be performed using a CDC management process.

The list of available operations and parameters may differ depending on the technology. Refer to the Components reference documentation for more information.

Set Up CDC (trigger-based)

By default, CDC uses dedicated tables and triggers to track and store the data changes that occur on a datastore table. These technical elements must be created before CDC can be used.

To create CDC tables and triggers for one or several datastore tables:

  1. Create a CDC Management Process or open an existing one from the Project Explorer.

  2. In the Properties view, click the Cdc Operation property to enable it and select START in the dropdown list.

  3. Right-click in an empty area of the diagram and click Run to execute the process.

TIP

  • To remove tables and triggers, select the STOP operation instead.

  • To remove and re-create tables and triggers, create a process composed of a CDC STOP step followed by a START step.

Add Subscribers

CDC events for data changes are published for and consumed by identified subscribers. By default, events are published for a subscriber named defaultSubscriber.

To add a CDC subscriber:

  1. Create a CDC Management Process or open an existing one from the Project Explorer.

  2. In the Properties view:

    1. Click the Cdc Operation property label to enable it and select ADD SUBSCRIBER in the dropdown list.

    2. Click the Subscriber property label and enter the name of the subscriber to create.

  3. Right-click in an empty area of the diagram and click Run to execute the process.

For the target to consume captured data changes, the same subscriber name must be set on the target datastore’s load template (or on the stage template if the source datastore is mapped with a stage).

To remove a subscriber, select the REMOVE operation instead.
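
The role of subscribers can be pictured with a small in-memory model. This is only a conceptual sketch, not how Semarchy xDI stores subscriptions: the ChangeLog class and its methods are hypothetical, and it assumes that each subscriber consumes changes independently of the others. Only the default subscriber name, defaultSubscriber, comes from the product.

```python
class ChangeLog:
    """Toy change log keeping one pending queue per subscriber (hypothetical)."""

    def __init__(self):
        # Mirrors the documented default subscriber name.
        self.pending = {"defaultSubscriber": []}

    def add_subscriber(self, name):
        # A new subscriber starts with an empty queue.
        self.pending.setdefault(name, [])

    def remove_subscriber(self, name):
        self.pending.pop(name, None)

    def publish(self, change):
        # Each change is published once for every registered subscriber.
        for queue in self.pending.values():
            queue.append(change)

    def consume(self, name):
        # Return and clear the changes pending for this subscriber only.
        changes, self.pending[name] = self.pending[name], []
        return changes


log = ChangeLog()
log.add_subscriber("warehouse")          # hypothetical subscriber name
log.publish(("I", 1))
log.publish(("U", 1))
first_batch = log.consume("warehouse")   # warehouse consumes two changes
log.publish(("D", 1))
default_view = log.consume("defaultSubscriber")
second_batch = log.consume("warehouse")
print(first_batch, default_view, second_batch)
```

Under that assumption, a change consumed by one subscriber stays pending for the others, which is why the subscriber name set on the template determines which pending changes a mapping receives.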

Use CDC in Mappings

Once CDC has been set up for a datastore table, it can be used in a mapping so that only the data changes for the table flow to a target datastore. When CDC is enabled, the mapping automatically reads from the CDC table corresponding to the source datastore at execution time, instead of reading from the source table.

To enable CDC in a mapping:

  1. Right-click the concerned datastore on the mapping diagram and, from the contextual menu, click Consume CDC Data (enable/disable).

    Alternatively, you can enable CDC by selecting Use CDC in the datastore’s Properties view.

  2. Click the load template icon on the target datastore, or the stage template icon if the source datastore is mapped with a stage.

  3. In the Properties view of the Load template, review and edit the CDC Template Parameters.

CDC Template Parameters

| Property | Default Value | Description |
|---|---|---|
| Cdc Subscriber | empty | Defines the name of the CDC subscriber. This name must match a subscriber name set in the concerned datastore’s CDC configuration (see Add Subscribers). |
| Cdc Wait Mode | empty | If selected, the process waits until a defined number of rows have changed. |
| Cdc Wait Poll Interval | empty | If Cdc Wait Mode is selected, this parameter defines the wait interval between two checks for data changes. |
| Cdc Wait Rows Number | empty | If Cdc Wait Mode is selected, this parameter defines the minimum number of changed rows required to execute the mapping. |
| Cdc Wait Timeout | empty | If Cdc Wait Mode is selected, this parameter defines the duration after which the mapping is executed even if fewer than Cdc Wait Rows Number rows have been detected. |
| Lock Cdc Table | True | When selected, the source table is locked to make changed data available to the mapping. If this option is not selected, the lock must be performed by an earlier mapping. |
| Unlock Cdc Table | True | When selected, the source table is unlocked after processing to mark changed data as consumed. If this option is not selected, the unlock must be performed by a later mapping. |

These parameters only apply if CDC is enabled. Refer to Use CDC in Mappings for more information.

These parameters are the most common CDC parameters available on Load and/or Stage templates. Their availability depends on the technology and the template type.
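
The way the three Cdc Wait parameters interact can be sketched as a polling loop. This is an illustration of the behavior described in the table, not the template's actual implementation: the wait_for_changes function, its arguments, and the use of seconds as the time unit are assumptions made for the example.

```python
import time


def wait_for_changes(count_pending, rows_number, poll_interval, timeout):
    """Wait until enough changes are pending or the timeout elapses.

    count_pending: callable returning the current number of pending changes.
    rows_number:   minimum number of changes expected (Cdc Wait Rows Number).
    poll_interval: delay between two checks, assumed here to be in seconds.
    timeout:       maximum total wait, assumed here to be in seconds.
    Returns the number of pending changes seen at the last check.
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = count_pending()
        # Proceed when enough rows are available, or when time has run out.
        if pending >= rows_number or time.monotonic() >= deadline:
            return pending
        time.sleep(poll_interval)


# Case 1: the threshold is reached on the third check.
counts = iter([0, 2, 5])
reached = wait_for_changes(lambda: next(counts), rows_number=5,
                           poll_interval=0.01, timeout=1.0)

# Case 2: the threshold is never reached, so the timeout ends the wait.
timed_out = wait_for_changes(lambda: 1, rows_number=5,
                             poll_interval=0.01, timeout=0.05)
print(reached, timed_out)
```

In both cases the function returns and the caller proceeds; the timeout only bounds how long the wait can last when fewer rows than expected arrive.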

Capture Data Changes

Start Data Capture

  • If CDC uses the generic trigger-based mechanism to capture the changes for a given technology, all changes in a database table are automatically captured as soon as the CDC table and triggers have been created for this datastore. Refer to Set Up CDC (trigger-based) for more information.

  • If CDC uses native CDC features offered by the datastore’s technology, it may be necessary, depending on the technology, to start the capture of data changes with a process that runs technology-specific actions. Refer to the Components reference documentation for more information.

View Captured Data

To view the data changes captured by CDC on a source datastore, right-click this datastore on the mapping diagram and click Actions > Consult Captured Data. This runs a SELECT query on the CDC table.