Slowly Changing Dimensions

Overview

Slowly Changing Dimension (SCD) are used in DataWarehouses to track the changes in the databases.

Tracking the updates of customers' addresses to keep a trace of it, for example.

With SCD there are different ways of doing it:

  • Insert a new row every time the address is changed (historize)

  • Only update the row

  • Use a start date, end date, etc.

For further information on what is SCD, there are good articles you can find on the internet.

Semarchy xDM Data Integration supports using SCD with the Oracle database Metadata and Templates. We will see in this article how to do that with examples.

Goal

We will use SCD to track the changes of the cities' names of the T_ADDRESS demo tables.

Create the dimension table

The first step consists of the creation of a new table in oracle, the dimension table, in which we will track the updates.

We put in this table the informations about addresses.

Here is the code for our example:

CREATE TABLE XDI_TEMPLATE_ORACLE_DEMO.dim_geo
(
      id_geo INTEGER,
      ADR_id INTEGER,
      city VARCHAR2(200) ,
      zipcode CHAR(5) ,
      dim_version INTEGER ,
      start_date TIMESTAMP,
      end_date TIMESTAMP ,
      dim_comment VARCHAR2(200)
)

We will use a sequence for the id_geo column:

CREATE SEQUENCE XDI_TEMPLATE_ORACLE_DEMO.seq_dim_geo minvalue 0;

Then, the table can be reversed on the oracle metadata.

SCD will be managed by Semarchy xDM Data Integration and the Oracle Integration Template, this table is nothing more than a usual Oracle table.

Configure the Metadata

The next step is the configuration of the columns of the dimension table, directly in the metadata. The purpose is to indicate the SCD behaviour for each column.

scd metadata

The different modes are:

Mode Description

historizedIfModified

If a change is detected on this column a new row will be inserted with the new value and this row will become the current one.

updatedIfModified

If a change is detected on this column the existing record will be updated with the new value.

startDate

The date or timestamp when the record was inserted.

endDate

The date or timestamp when the record was outdated for a newer version.

recordVersion

A flag indicating the status of the record (0=history, 1=current)

surrogateKey

A unique specific key to each record. This column will typically be loaded with a Sequence.

naturalKey

The key which identifies logically the record, regardless of the version that have been historized (typically the primary key of source table)

Below the modes used for our example:

Column Mode

ID_GEO

surrogateKey

ADR_ID

naturalKey

CITY

historizedIfModified

ZIPCODE

historizedIfModified

DIM_VERSION

recordVersion

START_DATE

startDate

END_DATE

endDate

DIM_COMMENT

updateIfModified

Create a Mapping to load the table

Mapping Overview

scd mapping overview

CUS_START_DATE, CUS_END_DATE , CUS_VERSION are left empty. These columns are used for historization and will be managed automatically by the template. The Integration Strategy has to be set to SlowlyChangingDimension and the Useless Update Detection Method to notExists. We are using a sequence for the ID_GEO column. This expression must be executed on Target.

After executing the mapping, the dimension is loaded with data.

Exemple:

scd mapping result

The DIM_VERSION is 1 and the END_DATE is null because it is the current version so it does not have yet an end date.

Update an historized field

Our mapping is ready, we are now going to update a city, which is set to historizedIfModified mode in the dimension table.

For that, we update the city in the in the T_ADDRESS table with an update statement in the SQL Editor for example:

UPDATE HOTEL_MANAGEMENT.T_ADDRESS SET ADR_CITY='BOSTONnew' WHERE ADR_ID=1;

Be careful, the dimension table is just used to track changes and is mostly used for historization.

So the updates must still be done in T_ADDRESS in our example.

Then we execute the mapping again, which will historize the changes of the source table.

Here is an example of the result, after updating the city and executing the mapping:

scd mapping updated adr result

A new record has been added with the DIM_VERSION 1 and the old record has been updated with and END_DATE and its DIM_VERSION is set to 0.

Update an updateIfModified field

We will now try to update the DIM_comment column, which is set to updateIfModified mode in the dimension table.

This column is filled manually in mapping in this exemple so we have to update our mapping

scd mapping updated comment

Example of result:

scd mapping updated comment result

The comment has been directly updated in the current version of the record and no new record has been added.