Change Data Capture (CDC)

Change Data Capture (CDC) is a widely used technique that detects and captures changes made to data so that actions can be performed on them. In a data integration context, CDC is mostly used to replicate data between databases or systems.

Change Data Capture in Semarchy xDI is:

  • Managed using dedicated process actions.

  • Activated and used in mappings.

Install the CDC Component

Using Change Data Capture in Semarchy xDI requires the CDC Component.

Configure CDC

Depending on the database technology, CDC in Semarchy xDI relies either on the native CDC features of that technology, when available, or, by default, on a generic trigger-based mechanism. With the trigger-based mechanism, Semarchy xDI automatically generates the tables and triggers required to enable CDC on the concerned database.

For more details about CDC capabilities per technology, refer to the Components reference documentation.

Create a CDC Management Process

CDC in Semarchy xDI is managed using dedicated process actions.

To manage CDC for a given datastore table:

  1. Create a new process.

  2. Expand the Tools accordion in the process editor’s palette.

  3. Select the CDC xxxxx tool that corresponds to the datastore’s technology and, depending on the technology, to the expected operation (for example, CDC PostgreSQL, or CDC Mssql - Start or Stop…).

    If CDC tools are not available in the palette, refer to Install the CDC Component.
  4. Click on an empty area of the process diagram to add the action.

  5. Drag and drop the concerned datastore table from the Project Explorer onto the REF element of the added process step.

  6. In the Properties view, click the Cdc Operation property to enable it and select the operation to execute.

    The following sections describe the typical operations that can be performed using a CDC management process.
    The list of available operations and parameters may differ depending on the technology. For more details, refer to the Components reference documentation.

Set Up CDC (Trigger-Based)

By default, CDC uses dedicated tables and triggers to track and store the data changes that occur on a datastore table. These technical elements must be created before CDC can be used.

To create CDC tables and triggers for one or several datastore tables:

  1. Create a CDC Management Process or open an existing one from the Project Explorer.

  2. In the Properties view, click the Cdc Operation property to enable it and select START in the dropdown list.

  3. Right-click an empty area of the diagram and select Run to execute the process.

TIP

  • To remove tables and triggers, select the STOP operation instead.

  • To remove and re-create tables and triggers, create a process composed of a CDC STOP step followed by a CDC START step.
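
As an illustration of what a generic trigger-based setup involves, the following Python sketch uses SQLite to create a change table and three triggers on a source table, and then removes them again. It is not the SQL that Semarchy xDI generates: the customer and cdc_customer tables, the trigger names, and the start_cdc and stop_cdc helpers are hypothetical names chosen only to mirror the START and STOP operations.

```python
import sqlite3


def start_cdc(conn):
    """Create the change table and triggers (hypothetical equivalent of START)."""
    conn.executescript("""
        CREATE TABLE cdc_customer (
            change_id INTEGER PRIMARY KEY,  -- order in which changes were captured
            operation TEXT NOT NULL,        -- 'I', 'U' or 'D'
            id        INTEGER NOT NULL      -- key of the changed source row
        );
        CREATE TRIGGER trg_customer_ins AFTER INSERT ON customer
        BEGIN
            INSERT INTO cdc_customer (operation, id) VALUES ('I', NEW.id);
        END;
        CREATE TRIGGER trg_customer_upd AFTER UPDATE ON customer
        BEGIN
            INSERT INTO cdc_customer (operation, id) VALUES ('U', NEW.id);
        END;
        CREATE TRIGGER trg_customer_del AFTER DELETE ON customer
        BEGIN
            INSERT INTO cdc_customer (operation, id) VALUES ('D', OLD.id);
        END;
    """)


def stop_cdc(conn):
    """Drop the triggers and the change table (hypothetical equivalent of STOP)."""
    conn.executescript("""
        DROP TRIGGER IF EXISTS trg_customer_ins;
        DROP TRIGGER IF EXISTS trg_customer_upd;
        DROP TRIGGER IF EXISTS trg_customer_del;
        DROP TABLE IF EXISTS cdc_customer;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

start_cdc(conn)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customer SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")

# Every change made after start_cdc has been recorded by a trigger.
captured = conn.execute(
    "SELECT operation, id FROM cdc_customer ORDER BY change_id"
).fetchall()
print(captured)  # [('I', 1), ('U', 1), ('D', 1)]

stop_cdc(conn)
triggers_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'trigger'"
).fetchall()
```

Calling stop_cdc followed by start_cdc corresponds to the remove and re-create sequence described in the tip above. Any changes stored in the change table are lost in that sequence, since the table itself is dropped.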

Add Subscribers

CDC events for data changes are published for and consumed by identified subscribers. By default, events are published for a subscriber named defaultSubscriber.

To add a CDC subscriber:

  1. Create a CDC Management Process or open an existing one from the Project Explorer.

  2. In the Properties view:

    • Click the Cdc Operation property label to enable it and select ADD SUBSCRIBER in the dropdown list.

    • Click the Subscriber property label and enter the name of the subscriber to create.

  3. Right-click an empty area of the diagram and select Run to execute the process.

The same subscriber name must be set on the target datastore’s load template (or on the stage template if the source datastore is mapped with a stage) for captured data changes to be consumed by the target.
To remove a subscriber, select the REMOVE operation instead.
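
The subscriber concept can be pictured with a small in-memory model. The Python sketch below assumes that each subscriber receives its own copy of every captured change and consumes it independently of the others. This is an assumption made for illustration, not a description of how Semarchy xDI stores events. The ChangeFeed class and the warehouse subscriber name are hypothetical.

```python
class ChangeFeed:
    """Toy model of a CDC change feed with named subscribers."""

    def __init__(self):
        # A subscriber named defaultSubscriber exists by default.
        self._pending = {"defaultSubscriber": []}

    def add_subscriber(self, name):
        # Adding an existing subscriber leaves its pending events untouched.
        self._pending.setdefault(name, [])

    def remove_subscriber(self, name):
        self._pending.pop(name, None)

    def publish(self, operation, key):
        # Each captured change is published once per subscriber.
        for events in self._pending.values():
            events.append((operation, key))

    def consume(self, name):
        # Return the pending events of one subscriber and mark them consumed.
        events = self._pending[name]
        self._pending[name] = []
        return events


feed = ChangeFeed()
feed.add_subscriber("warehouse")
feed.publish("I", 1)
feed.publish("U", 1)

first = feed.consume("warehouse")                 # both changes
second = feed.consume("warehouse")                # nothing left for this subscriber
default_view = feed.consume("defaultSubscriber")  # still sees both changes
print(first, second, default_view)
```

In this model, a consumer that asks for a subscriber name that was never added gets a KeyError, which is the counterpart of the rule above: the name set on the template has to match a subscriber that exists.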

Use CDC in Mappings

Once CDC is set up for a datastore table, it can be used in a mapping so that only the changes made to this table flow to the target datastore. When CDC is enabled in the mapping, the CDC table that corresponds to the source datastore is automatically read in place of the source table when the mapping is executed.
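
To make the idea of reading the CDC table in place of the source table concrete, here is a minimal Python and SQLite sketch. It assumes a change table that stores an operation code and the key of the changed row. The customer, customer_copy, and cdc_customer tables and the apply_changes function are hypothetical, and the actual templates may proceed differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer      (id INTEGER PRIMARY KEY, name TEXT);  -- source
    CREATE TABLE customer_copy (id INTEGER PRIMARY KEY, name TEXT);  -- target
    CREATE TABLE cdc_customer  (operation TEXT, id INTEGER);         -- captured changes

    INSERT INTO customer      VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO customer_copy VALUES (1, 'Ada'), (2, 'Grce'),  (4, 'Old');

    -- Changes captured since the last run:
    -- row 2 was updated, row 3 was inserted, row 4 was deleted.
    INSERT INTO cdc_customer VALUES ('U', 2), ('I', 3), ('D', 4);
""")


def apply_changes(conn):
    """Replicate only the captured changes instead of scanning the whole source."""
    changes = conn.execute(
        "SELECT operation, id FROM cdc_customer ORDER BY rowid"
    ).fetchall()
    for operation, row_id in changes:
        if operation == "D":
            conn.execute("DELETE FROM customer_copy WHERE id = ?", (row_id,))
        else:
            # Inserts and updates both copy the current source row to the target.
            conn.execute(
                "INSERT OR REPLACE INTO customer_copy (id, name) "
                "SELECT id, name FROM customer WHERE id = ?",
                (row_id,),
            )
    return len(changes)


applied = apply_changes(conn)
target_rows = conn.execute(
    "SELECT id, name FROM customer_copy ORDER BY id"
).fetchall()
print(applied, target_rows)
```

Row 1 is never touched because no change was captured for it, which is the point of using CDC: the work done is proportional to the number of changes, not to the size of the source table.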

To enable CDC in a mapping:

  1. Right-click the concerned datastore on the mapping diagram, then select Consume CDC Data (enable/disable) in the contextual menu.

    You can alternatively enable CDC by selecting Use CDC in the datastore’s Properties view.
  2. Select the load template icon on the target datastore, or the stage template icon if the source datastore is mapped with a stage.

  3. In the Properties view of the Load template, review and edit the CDC parameters (see below).

CDC Template Parameters

| Property | Default Value | Description |
|---|---|---|
| Cdc Subscriber | empty | Defines the name of the CDC subscriber. This name must match a subscriber name set in the concerned datastore’s CDC configuration (see Add Subscribers). |
| Cdc Wait Mode | empty | If selected, the process waits until a defined number of rows have been changed. |
| Cdc Wait Poll Interval | empty | If Cdc Wait Mode is selected, this parameter defines the wait interval between two checks for data changes. |
| Cdc Wait Rows Number | empty | If Cdc Wait Mode is selected, this parameter defines the minimum number of changed rows required to execute the mapping. |
| Cdc Wait Timeout | empty | If Cdc Wait Mode is selected, this parameter defines the duration after which the mapping must be executed even if fewer than Cdc Wait Rows Number rows have been detected. |
| Lock Cdc Table | True | When selected, a lock is performed on the source table to make changed data available for the mapping. If this option is not selected, the lock must be performed by a preceding mapping. |
| Unlock Cdc Table | True | When selected, the source table is unlocked after processing to mark changed data as consumed. If this option is not selected, the unlock must be performed by a subsequent mapping. |

These parameters apply only if CDC is enabled (see Use CDC in Mappings).
These are the most common CDC parameters found on Load and Stage templates. Their availability depends on the technology and on the template type.
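
The following Python sketch shows one plausible way the wait and lock parameters could fit together in a single consumption cycle: wait for enough changes, lock the current batch, process it, then unlock it. It is an interpretation of the parameter descriptions above, not the template implementation. The wait_for_changes, lock_changes, and unlock_changes functions, the cdc_customer table, and its locked column are all hypothetical, and the time unit (seconds) is an assumption.

```python
import sqlite3
import time


def wait_for_changes(count_rows, min_rows, poll_interval, timeout):
    """Poll until min_rows changes are pending or timeout (seconds) has elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        pending = count_rows()
        if pending >= min_rows or time.monotonic() >= deadline:
            return pending
        time.sleep(poll_interval)


def lock_changes(conn):
    """Flag the pending changes; later changes are left for the next run."""
    conn.execute("UPDATE cdc_customer SET locked = 1 WHERE locked = 0")


def unlock_changes(conn):
    """Discard the flagged changes, which marks them as consumed."""
    conn.execute("DELETE FROM cdc_customer WHERE locked = 1")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cdc_customer (operation TEXT, id INTEGER, locked INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO cdc_customer (operation, id) VALUES (?, ?)",
    [("I", 1), ("U", 1)],
)


def count_pending():
    return conn.execute(
        "SELECT COUNT(*) FROM cdc_customer WHERE locked = 0"
    ).fetchone()[0]


pending = wait_for_changes(count_pending, min_rows=2, poll_interval=0.1, timeout=1.0)
lock_changes(conn)

# A change captured while the batch is being processed is not part of it.
conn.execute("INSERT INTO cdc_customer (operation, id) VALUES ('D', 1)")

batch = conn.execute(
    "SELECT operation, id FROM cdc_customer WHERE locked = 1 ORDER BY rowid"
).fetchall()
unlock_changes(conn)
remaining = conn.execute("SELECT operation, id FROM cdc_customer").fetchall()
print(pending, batch, remaining)
```

Separating the lock and unlock steps is what allows several mappings to share one batch: under this reading, a first mapping would lock without unlocking, and the last one would unlock without locking.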

Capture Data Changes

Start Data Capture

  • If CDC uses the generic trigger-based mechanism to capture the changes for a given technology, all changes in a database table are automatically captured as soon as the CDC table and triggers have been created for this datastore (see Set Up CDC (Trigger-Based)).

  • If CDC uses native CDC features offered by the datastore’s technology, some technologies require the capture of data changes to be started explicitly, with a process that runs technology-specific actions. In this case, refer to the Components reference documentation.

View Captured Data

To view the data changes captured by CDC on a source datastore, right-click this datastore on the mapping diagram and then select Actions > Consult Captured Data. This runs a SELECT query on the CDC table.