Slowly Changing Dimensions
Overview
Slowly Changing Dimension (SCD) are used in DataWarehouses to track the changes in the databases.
Tracking the updates of customers' addresses to keep a trace of it, for example.
With SCD there are different ways of doing it:
-
Insert a new row every time the address is changed (historize)
-
Only update the row
-
Use a start date, end date, etc.
For further information on what is SCD, there are good articles you can find on the internet.
Semarchy xDI supports using SCD with the Oracle database Metadata and Templates. We will see in this article how to do that with examples.
Goal
We will use SCD to track the changes of the cities' names of the T_ADDRESS demo tables.
Create the dimension table
The first step consists of the creation of a new table in oracle, the dimension table, in which we will track the updates.
We put in this table the informations about addresses.
Here is the code for our example:
CREATE TABLE XDI_TEMPLATE_ORACLE_DEMO.dim_geo
(
id_geo INTEGER,
ADR_id INTEGER,
city VARCHAR2(200) ,
zipcode CHAR(5) ,
dim_version INTEGER ,
start_date TIMESTAMP,
end_date TIMESTAMP ,
dim_comment VARCHAR2(200)
)
We will use a sequence for the id_geo column:
CREATE SEQUENCE XDI_TEMPLATE_ORACLE_DEMO.seq_dim_geo minvalue 0;
Then, the table can be reversed on the oracle metadata.
SCD will be managed by Semarchy xDI and the Oracle Integration Template, this table is nothing more than a usual Oracle table. |
Configure the metadata
The next step is the configuration of the columns of the dimension table, directly in the metadata. The purpose is to indicate the SCD behaviour for each column.
The different modes are:
Mode | Description |
---|---|
historizedIfModified |
If a change is detected on this column a new row will be inserted with the new value and this row will become the current one. |
updatedIfModified |
If a change is detected on this column the existing record will be updated with the new value. |
startDate |
The date or timestamp when the record was inserted. |
endDate |
The date or timestamp when the record was outdated for a newer version. |
recordVersion |
A flag indicating the status of the record (0=history, 1=current) |
surrogateKey |
A unique specific key to each record. This column will typically be loaded with a Sequence. |
naturalKey |
The key which identifies logically the record, regardless of the version that have been historized (typically the primary key of source table) |
Below the modes used for our example:
Column | Mode |
---|---|
ID_GEO |
surrogateKey |
ADR_ID |
naturalKey |
CITY |
historizedIfModified |
ZIPCODE |
historizedIfModified |
DIM_VERSION |
recordVersion |
START_DATE |
startDate |
END_DATE |
endDate |
DIM_COMMENT |
updateIfModified |
Create a mapping to load the table
Mapping Overview
CUS_START_DATE, CUS_END_DATE , CUS_VERSION are left empty. These columns are used for historization and will be managed automatically by the template. The Integration Strategy has to be set to SlowlyChangingDimension and the Useless Update Detection Method to notExists. We are using a sequence for the ID_GEO column. This expression must be executed on Target. |
After executing the mapping, the dimension is loaded with data.
Exemple:
The DIM_VERSION is 1 and the END_DATE is null because it is the current version so it does not have yet an end date.
Update a historized field
Our mapping is ready, we are now going to update a city, which is set to historizedIfModified mode in the dimension table.
For that, we update the city in the in the T_ADDRESS table with an update statement in the SQL Editor for example:
UPDATE HOTEL_MANAGEMENT.T_ADDRESS SET ADR_CITY='BOSTONnew' WHERE ADR_ID=1;
Be careful, the dimension table is just used to track changes and is mostly used for historization.
So the updates must still be done in T_ADDRESS in our example.
Then we execute the mapping again, which will historize the changes of the source table.
Here is an example of the result, after updating the city and executing the mapping:
A new record has been added with the DIM_VERSION 1 and the old record has been updated with and END_DATE and its DIM_VERSION is set to 0.
Update an updateIfModified field
We will now try to update the DIM_comment column, which is set to updateIfModified mode in the dimension table.
This column is filled manually in mapping in this exemple so we have to update our mapping
Example of result:
The comment has been directly updated in the current version of the record and no new record has been added.