Integration Process Design
Introduction to the Integration Process
The Integration Process transforms Source Records pushed into the hub by several Publishers into consolidated and certified Golden Records. This automated process involves several phases, generated from the rules and constraints defined in the model, which capture the functional knowledge of the entities and the publishers involved.
Integration Process Overview
The integration process involves the following steps:
- Enrichment: During this step, the source data is enriched and standardized using SemQL and Plug-in Enrichers.
- Pre-Consolidation Validation: The quality of the enriched data is checked against the various Constraints executed pre-consolidation.
- Matching: This process runs in two phases: first, a binning phase creates small groups of records; then a matching phase performs the matching within these smaller bins and detects duplicates.
- Consolidation: This process consolidates the duplicates detected in the matching phase into a single record. It performs field-level or record-level consolidation.
- Post-Consolidation Validation: This process is similar to the pre-consolidation validation, but is executed on the consolidated records.
Rules Involved in the Process
The rules involved in the process include:
- Enrichers: Sequences of transformations performed on the source data to make it complete and standardized.
- Data Quality Constraints: Checks done on the source and/or consolidated data to isolate erroneous rows. These include Referential Integrity, Unique Keys, Mandatory Attributes, Lists of Values, and SemQL/Plug-in Validations.
- Matcher: Process to bin (group) then match similar records, detecting them as duplicates.
- Consolidator: Process to reconcile data from duplicate records (detected by the matcher) into a single (golden) record.
Integration Jobs
When all the rules are defined, one or more Integration Jobs can be defined for the model.
An integration job runs the integration process, using the hub's database engine for most of the processing (including SemQL processing) and Semarchy Convergence for MDM for running the plug-in code.
Integration jobs are triggered to integrate data published in batch by data integration/ETL tools, or to process data handled by users in human workflows.
When pushing data into the hub, a data integration or ETL product performs the following:
- It requests a Load ID to identify the data load and initiate a transaction with Semarchy Convergence for MDM.
- It loads data into the landing tables of Semarchy Convergence for MDM, possibly from several sources identified as Publishers.
- It submits the load identified by the Load ID, providing the name of the Integration Job that must be executed to process the incoming data.
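The sketch below illustrates this lifecycle as a database script. The package and procedure names, signatures, landing table, and column names (INTEGRATION_LOAD, GET_NEW_LOADID, SUBMIT_LOAD, SA_CUSTOMER, B_LOADID, B_PUBID) are illustrative assumptions only; refer to the Semarchy Convergence for MDM documentation for the exact API exposed by your hub.

-- Hypothetical batch load lifecycle (all names are illustrative assumptions).
DECLARE
  v_load_id  NUMBER;
  v_batch_id NUMBER;
BEGIN
  -- 1. Request a Load ID to open a transaction with the hub.
  v_load_id := INTEGRATION_LOAD.GET_NEW_LOADID(
      'ETL_NIGHTLY',            -- calling program name
      'Nightly customer load',  -- load description
      'SEMARCHY_USER'           -- user requesting the load
  );

  -- 2. Load the landing tables, tagging each row with the Load ID
  --    and the publisher code ('CRM' in this sketch).
  INSERT INTO SA_CUSTOMER (B_LOADID, B_PUBID, ID, CUSTOMER_NAME)
  SELECT v_load_id, 'CRM', CRM_ID, CRM_NAME FROM STG_CRM_CUSTOMERS;

  -- 3. Submit the load, naming the integration job that processes it.
  v_batch_id := INTEGRATION_LOAD.SUBMIT_LOAD(
      v_load_id, 'INTEGRATE_CUSTOMERS', 'SEMARCHY_USER');
  COMMIT;
END;
/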
Similarly, when a user starts a human workflow for data entry and duplicate management:
- A transaction is created and attached to the workflow instance, identified by a Load ID.
- The user performs the data entry and duplicate management operations in the graphical user interface. All the data manipulations are performed within the transaction.
- When the activity is finished, the transaction is submitted. This triggers the Integration Job specified in the workflow definition.
Human Workflows
Human Workflows allow users to perform data entry or duplicate management operations in the MDM hub:
- In a Data Entry Workflow, a user enters new master data or modifies existing master data. In such a workflow, the user is identified with a given Publisher, and the data provided by this publisher is processed normally using the integration job. Note that to modify existing master data sent by another publisher, the data entry workflow creates a copy of the other publisher's data and publishes it under its own publisher's name.
- In a Duplicate Management Workflow, a user validates or invalidates matches automatically detected by the matching process, and can create or split matching groups. Such operations override the matching rules and are not identified with a publisher. A user decision taken in a duplicate management workflow is enforced in subsequent integration job executions.
When a human workflow completes, an integration job is triggered to process the data entered or the duplicate management choices made by the user.
Publishers
Publishers are the applications and users that provide source data to the MDM hub. They identify themselves using a code when pushing batches of data.
The publisher does not represent the technical provider of the data (the ETL or Data Integration product), but the source of the data (the CRM application, the Sales Management system, etc.). Examples of publishers: CRM, Sales, Marketing, Finance, etc.
Consolidation makes certain choices depending on the publisher, and publishers are tracked to identify the origin of the golden data certified by Semarchy Convergence for MDM.
Note: Identifying clearly and declaring the publishers is important in the design of the integration process. Make sure to identify the publishers when starting an MDM project.
Important: As a general rule, use a dedicated publisher for data entry operations. Such a publisher can be set as the preferred publisher in all consolidation rules, so that user-entered data takes precedence over application-provided data.
To create a publisher:
- Right-click the Publishers node and select Add Publisher.... The Create New Publisher wizard opens.
- In the Create New Publisher wizard, check the Auto Fill option and then enter the following values:
  - Name: Internal name of the object.
  - Code: Code of the publisher. This code is used by the integration process pushing data to the hub to identify records originating from this publisher.
  - Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
  - Active: Check this box to make this publisher active. An inactive publisher is simply declared but not used in the consolidation rules.
- Click Finish to close the wizard. The Publisher editor opens.
- In the Description field, optionally enter a description for the Publisher.
- Press CTRL+S to save the editor.
- Close the editor.
Enrichment
Enrichers normalize, standardize, and enrich the source data (attribute values) pushed by the Publishers into the hub.
Enrichers have the following characteristics:
- Several enrichers can be defined for an entity, and they are executed in a sequence. They run in the order in which they are defined in the model.
- Enrichers can be enabled or disabled.
- Enrichers apply to data from all publishers. It is possible to define a filter on each enricher, so that only the filtered records are modified by the enricher.
There are two types of enrichers:
- SemQL Enrichers express the enrichment rule in the SemQL language. These enrichers are executed in the hub's database.
- Plug-in Enrichers use a plug-in developed in Java. These enrichers are executed by Semarchy Convergence for MDM. Transformations that cannot be done within the database (for example, those involving calls to an external API) can be implemented using plug-in enrichers.
Creating SemQL Enrichers
A SemQL Enricher enriches several attributes of an entity using attributes from this entity, transformed using SemQL expressions and functions.
You will find SemQL examples for enrichers in the Introduction to the Semarchy Workbench chapter.
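For example, assuming a Customer entity with a CustomerName attribute (names here are illustrative), an enricher expression that trims and upper-cases the name could be:

Enriched Attribute: CustomerName
Expression: UPPER(TRIM(CustomerName))

Since SemQL enrichers run in the hub's database, such an expression can typically use any function supported by the database engine (UPPER and TRIM in this sketch).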
To create a SemQL enricher:
- Expand the entity node, right-click the Enrichers node and select Add SemQL Enricher.... The Create New SemQL Enricher wizard opens.
- In the Create New SemQL Enricher wizard, check the Auto Fill option and then enter the following values:
  - Name: Internal name of the object.
  - Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
- Click Next.
- In the Enricher Expressions page, select the Available Attributes that you want to enrich and click the Add >> button to add them to the Used Attributes.
- Click Next.
- Optionally click the Edit Expression button to open the expression editor and define a filter. The enricher will only enrich the records matching this filter. Skip this task if you want to enrich all the records.
- Click Finish to close the wizard. The SemQL Enricher editor opens.
- In the Description field, optionally enter a description for the SemQL Enricher.
- Set the enricher expressions:
  - In the Enricher Expressions table, select the Expression column for the attribute that you want to enrich, and then click the Edit Expression button. The SemQL editor opens.
  - Create a SemQL expression to load the attribute to enrich, and then click OK to close the SemQL Editor. This expression may use any attribute of the current entity.
  - Repeat the previous steps to set an expression for each attribute to enrich.
- Press CTRL+S to save the editor.
- Close the editor.
Creating Plug-in Enrichers
A Plug-in Enricher enriches several attributes of an entity using attributes from this entity, transformed using a plug-in developed in Java.
A plug-in enricher takes:
- a list of Plug-in Inputs: attributes, possibly transformed using SemQL.
- a list of Plug-in Parameter values.
It returns a list of Plug-in Outputs, which must be mapped to the entity attributes.
Attributes are mapped on the input to feed the plug-in, and on the output to enrich the entity with the data transformed by the plug-in.
Note: Before using a plug-in enricher, make sure the plug-in was added to the platform by the administrator. For more information, refer to the "Semarchy Convergence for MDM Administration Guide".
To create a plug-in enricher:
- Expand the entity node, right-click the Enrichers node and select Add Plug-in Enricher.... The Create New Plug-in Enricher wizard opens.
- In the Create New Plug-in Enricher wizard, check the Auto Fill option and then enter the following values:
  - Name: Internal name of the object.
  - Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.
  - Plug-in ID: Select the plug-in from the list of plug-ins installed in the platform.
- Click Next.
- Optionally click the Edit Expression button to open the expression editor and define a filter. The enricher will only enrich the records matching this filter. Skip this task if you want to enrich all the records.
- Click Finish to close the wizard. The Plug-in Enricher editor opens. The Plug-in Params, Plug-in Inputs and Plug-in Outputs tables show the parameters and inputs/outputs for this plug-in.
- You can optionally add parameters to the Plug-in Params list:
  - In the Plug-in Params table, click the Define Plug-in Parameters button.
  - In the Parameters dialog, select the Available Parameters that you want to add and click the Add >> button to add them to the Used Parameters.
  - Click Finish to close the dialog.
- Set the values for the parameters:
  - Click the Value column in the Plug-in Params table in front of a parameter. The cell becomes editable.
  - Enter the value of the parameter in the cell, and then press Enter.
  - Repeat the previous steps to set the value of each parameter.
- You can optionally add inputs to the Plug-in Inputs list:
  - In the Plug-in Inputs table, click the Define Plug-in Inputs button.
  - In the Input Bindings dialog, select the Available Inputs that you want to add and click the Add >> button to add them to the Used Inputs.
  - Click Finish to close the dialog.
- Set the expressions for the inputs:
  - Click the Expression column in the Plug-in Inputs table in front of an input, and then click the Edit Expression button. The SemQL editor opens.
  - Edit the SemQL expression using the attributes to feed the plug-in input, and then click OK to close the SemQL Editor.
  - Repeat the previous steps to set an expression for each input.
- Select the attributes to bind to the Plug-in Outputs:
  - In the Plug-in Outputs table, click the Define Plug-in Outputs button.
  - In the Output Bindings dialog, select the Available Attributes that you want to enrich and click the Add >> button to add them to the Attributes Used.
  - Click Finish to close the dialog.
  - For each attribute in the Plug-in Outputs table, select in the Output Name column the plug-in output that you want to use to enrich the attribute shown in the Attribute Name column.
- Press CTRL+S to save the editor.
- Close the editor.
Pre- and Post-Consolidation Validation
Several validations can be defined per entity. Validations check attribute values and reject invalid records. All validations are executed on each record.
Validations can take place at either or both of the following stages:
- Pre-Consolidation: Applies to source data pushed by any publisher, after the enrichment phase and before the consolidation phase. All the source records pass through all the pre-consolidation checks, and records failing a check are isolated from the integration flow. All the errors for each record are raised.
- Post-Consolidation: Applies to the data de-duplicated and consolidated from the various sources. All the consolidated records pass through all the post-consolidation checks, and records failing a check are isolated from the integration flow. All the errors for each record are raised.
Note: Unique Keys are only checked post-consolidation, as they only make sense on consolidated records.
Pre vs. Post Validation
Pre-Consolidation Validation is done on the data from all publishers to this entity, after enrichment.
Post-Consolidation Validation is done on data de-duplicated and consolidated.
Choosing whether a validation runs pre- and/or post-consolidation has an impact on the behavior of the integration hub. The following examples illustrate this choice.
Example #1:
- The CheckNullRevenue validation checks that Customer.revenue is not null.
- Customer data is published from the CRM and Sales applications.
- Only the Sales publisher loads revenue data. CRM leaves it null.
- The consolidation needs critical information from the CRM application (email, name, address, etc.).
- If CheckNullRevenue is executed pre-consolidation, all data from the CRM will be rejected, as revenue is null.
- At consolidation, no data will be consolidated from the CRM.
In this example, CheckNullRevenue should be executed post-consolidation to avoid rejecting information required later in the integration process.
Example #2:
- The matching process for Customer uses the GeocodedAddress to match customers from all the sources.
- An IsValidGeocodedAddress validation checks that GeocodedAddress is not empty and GeocodedAddress.Quality is high enough.
- Enrichers create a GeocodedAddress, if possible.
- If the resulting GeocodedAddress is empty or not good enough, then these customers should not be processed further.
In this example, IsValidGeocodedAddress should be executed pre-consolidation to avoid the performance cost of matching records with addresses not meeting the entity requirements.
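To make this concrete, a SemQL validation condition for IsValidGeocodedAddress could be a sketch like the following (the attributes reuse the example above; the quality threshold of 80 is an illustrative assumption):

GeocodedAddress.Quality is not null
and GeocodedAddress.Quality >= 80

Records failing this condition are rejected before matching, which is exactly the pre-consolidation behavior sought in Example #2.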
Matching
The matching phase detects duplicates in order to consolidate them into a single golden record.
Matching works differently for Fuzzy Matching and ID Matching entities.
- Fuzzy Matching Entities use a Matcher to automatically detect duplicates using fuzzy matching algorithms.
- ID Matching Entities perform an exact match on the user-provided ID value, as this ID is a primary key that is unique across all systems.
Note: You can define a matcher on an ID Matching Entity. It is used for the sole purpose of detecting duplicates when creating new records in a data entry workflow. Such a matcher interactively warns the user when a new entry matches existing records.
Matcher
A Matcher is a two-phase process:
- Binning: This phase uses a set of expressions to group the records into bins. Binning is a "divide and conquer" approach that avoids an excessive number of comparisons.
- Matching: This phase uses a SemQL condition that compares all the pairs of records (Record1 is compared with Record2) within a bin. When this condition returns true, the pair of records is considered duplicates.
The Binning Phase
The binning phase divides the source records into bins so that the matching phase runs only within a given bin. As the matching phase can be resource-consuming, reducing the number of comparisons is important for high-performance matching.
Examples:
- Instead of trying to match all customers, we only try to match customers located in the same country. Binning takes the Country attribute as the binning expression.
- To make the matching phase even more efficient, we could match only customers with the same Country and SalesRegion.
Binning is done using several expressions defined in the matcher. The records for which all binning expressions give the same results belong to the same bin.
For example, to bin customers by Country and by the first letter of the Region in the GeocodedAddress complex field, we would use:
- Binning Expression #1: GeocodedAddress.Country
- Binning Expression #2: SUBSTR(GeocodedAddress.Region,1,1)
Warning: Smaller bins mean faster processing, but you must make sure that binning does not exclude possible matches. For example, binning customers by the first four letters of the last name will put "Mr. Bill Jones-Smith" and "Mr. Bill Jonnes-Smith" into different bins. These two duplicates caused by a typo will never be matched. For this specific case, you may consider a different attribute, or a SOUNDEX on the name, as illustrated below.
It is recommended to use very reliable fields for binning, such as CountryName, ZipCode, etc.
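Following the warning above, a phonetic binning expression (assuming the database exposes a SOUNDEX function, as Oracle does; CustomerName is an illustrative attribute) could be a better fit for names:

Binning Expression: SOUNDEX(CustomerName)

With this expression, "Jones-Smith" and "Jonnes-Smith" should fall into the same bin, at the cost of somewhat larger bins than a four-letter prefix would produce.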
The Matching Phase
The matching phase uses a condition that compares two records.
This condition uses two pseudo-records named Record1 and Record2, corresponding to the two records being matched. If the condition is true, then the two records are considered matched.
For example, the following matching condition matches customers whose records meet the two following requirements:
- Their names sound the same in English (SOUNDEX), OR their names are more than 80% similar by edit distance.
- Their city names and addresses are more than 65% similar by edit distance.

(SOUNDEX(Record1.CustomerName) = SOUNDEX(Record2.CustomerName)
  OR SEM_EDIT_DISTANCE_SIMILARITY(Record1.CustomerName, Record2.CustomerName) > 80)
and SEM_EDIT_DISTANCE_SIMILARITY(Record1.InputAddress.Address, Record2.InputAddress.Address) > 65
and SEM_EDIT_DISTANCE_SIMILARITY(Record1.InputAddress.City, Record2.InputAddress.City) > 65
Creating a Matcher
Note: Only one matcher can be created for each entity.
To create a matcher:
- Expand the entity node, right-click the Matcher node and select Define SemQL Matcher.... The Create New SemQL Matcher wizard opens.
- In the Description field, optionally enter a description for the Matcher.
- Click Finish to close the wizard. The SemQL Matcher editor opens.
- Define the Binning Expressions:
  - In the Binning Expressions table, click the Add Binning Expression button. The SemQL editor opens.
  - Create a SemQL expression used to bin records for this entity, and then click OK to close the SemQL Editor. This expression may use any attribute of the current entity.
  - Repeat the previous steps to create all your binning expressions.
- Define the Matching Condition:
  - In the Matching Condition section, click the Edit Expression button. The SemQL editor opens.
  - Create a SemQL condition used to match records for this entity, and then click OK to close the SemQL Editor. This condition may use any attribute of the current entity.
- Press CTRL+S to save the editor.
- Close the editor.
Consolidation
Consolidation merges fields from all the detected duplicates into a single golden record. It is defined in the Consolidator of the entity.
Consolidation Type
Consolidation uses one of the following methods:
- Record Level Consolidation: Using this method, all fields are consolidated from one of the duplicates using a given strategy.
- Field Level Consolidation: Using this method, a strategy can be defined for each attribute of the entity.
Consolidation Strategies
A consolidation strategy defines how to choose the best record or field value in the consolidation process. The available consolidation strategies depend on the consolidation method.
Record Level Consolidation
Record Level Consolidation supports the following strategies:
- Any Value: The first record in the list.
- Custom Ranking: A SemQL expression is used to rank duplicates, and the value of the first duplicate is used for all fields. The expression is an order by clause and can specify ascending (ASC) or descending (DESC) order.
- Preferred Publisher: Publishers are manually ordered. The first one in the list returning a record is used.
Record level consolidation uses an Additional Order By option. This option is a SemQL expression used to sort records in the event of an ambiguity after the first strategy, for example, when two duplicate records come from the same publisher and the Preferred Publisher strategy is used.
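For example, a Custom Ranking expression that prefers the most recently updated duplicate (LastUpdateDate is an assumed attribute of the entity) could be:

LastUpdateDate DESC

Combined with an Additional Order By expression over an identifying attribute of your entity, records that tie on the ranking expression are then ordered deterministically.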
Field Level Consolidation
With this method, a different strategy can be selected for each attribute.
Field Level Consolidation supports the strategies listed in the table below. The table indicates, for each strategy, whether null values are skipped or taken into account, and shows the equivalent SemQL expression for the strategy.
Strategy | Description | Nulls | Expression
Any Value | The first value in the list, ordered by Publisher and SourceID. | Preserved | PubID ASC, SourceID ASC
Custom Ranking | A SemQL expression is used to rank duplicates, and the first value by rank is used. The expression is an order by clause and can specify ascending (ASC) or descending (DESC) order. | User choice | [semQL expression], PubID ASC, SourceID ASC
Largest/Smallest Value | Values are sorted using their type-specific sort method (alphabetical for strings, for example): "Mozart" is larger than "Beethoven" ("M" comes after "B" in the alphabet). LOBs are not supported. | Skipped | [value] ASC or [value] DESC
Longest/Shortest Value | The lengths of the values are compared: "Mozart" is shorter than "Beethoven" (string length). | Skipped | LENGTH([value]) ASC or LENGTH([value]) DESC
Most Frequent Value | The first most frequent non-null value. | Skipped | Specific
Preferred Publisher | Publishers are manually ordered. The first one returning a value for the field is used. | User choice | Specific
A global Additional Order By option stores a SemQL expression used to sort records in the event of an ambiguity after the first strategy, for example, when two fields with different values come from duplicates of the same publisher and a Preferred Publisher strategy is used. The expression is an order by clause and can specify ascending (ASC) or descending (DESC) order. Note that the additional order by clause is not supported for the Most Frequent Value consolidation strategy.
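As an illustration (attribute names are assumptions), a Customer entity could consolidate Revenue with the Largest Value strategy, CustomerName with Preferred Publisher (CRM ranked before Sales), and Email with a Custom Ranking such as:

UpdateDate DESC

with Skip Nulls selected, so that the most recently updated non-null email is retained in the golden record.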
Creating a Consolidator
To create a record-level consolidator:
- Expand the entity node, right-click the Consolidator node and select Define Consolidator.... The Create New Consolidator wizard opens.
- In the Create New Consolidator wizard, select Record Level Consolidation as the Consolidator Type.
- In the Description field, optionally enter a description for the Consolidator.
- Click Finish to close the wizard. The Consolidator editor opens.
- In the Record Level Consolidation Strategy, select the consolidation strategy.
- Set the parameters depending on the selected strategy:
  - Any Value: No parameter is required.
  - Custom Ranking:
    - On the Custom Ranking Expression field, click the Edit Expression button. The SemQL editor opens.
    - Create a SemQL expression used to rank the records, and then click OK to close the SemQL Editor. Note that this expression is an order by clause and can specify ascending (ASC) or descending (DESC) order.
  - Preferred Publisher:
    - In the Publisher Ranking table, click the Add Publisher Ranking button. The Manage Consolidators dialog opens.
    - Double-click the first publisher in the Available Publishers list to add it to the Publishers list.
    - Repeat this operation to add the other publishers in order of preference.
    - Use the Move Up and Move Down buttons to order the publishers.
    - Click Finish to close the dialog.
- Set the Additional Order By expression:
  - On the Additional Order By field, click the Edit Expression button. The SemQL editor opens.
  - Create a SemQL expression used to disambiguate the consolidation, and then click OK to close the SemQL Editor.
- Press CTRL+S to save the editor.
- Close the editor.
To create a field-level consolidator:
- Expand the entity node, right-click the Consolidator node and select Define Consolidator.... The Create New Consolidator wizard opens.
- In the Create New Consolidator wizard, select Field Level Consolidation as the Consolidator Type.
- In the Description field, optionally enter a description for the Consolidator.
- Click Finish to close the wizard. The Consolidator editor opens. All the fields appear in the Field Level Consolidators table and are defined with the Any Value strategy.
- To modify the consolidation strategy for a field:
  - Double-click the Attribute Name in the Field Level Consolidators table.
  - In the Define Field-Level Consolidator wizard, select a consolidation strategy.
  - Set the parameters depending on the selected strategy:
    - Any Value, Largest Value, Longest Value, Most Frequent Value, Shortest Value and Smallest Value: No parameter is required. Click Finish to close the wizard.
    - Custom Ranking:
      - Click Next.
      - On the Custom Ranking Expression field, click the Edit Expression button. The SemQL editor opens.
      - Create a SemQL expression used to rank the records, and then click OK to close the SemQL Editor.
      - Select the Skip Nulls option if you want to skip null values and pick the highest-ranking non-null value.
      - Click Finish to close the wizard.
    - Preferred Publisher:
      - Click Next.
      - In the Publisher Ranking table, click the Add Publisher Ranking button. The Manage Consolidators dialog opens.
      - Double-click the first publisher in the Available Publishers list to add it to the Publishers list.
      - Repeat this operation to add the other publishers in order of preference.
      - Use the Move Up and Move Down buttons to order the publishers.
      - Click Finish to close the dialog.
      - Select the Skip Nulls option if you want to skip null values returned by publishers.
      - Click Finish to close the wizard.
- Set the Additional Order By expression:
  - On the Additional Order By field, click the Edit Expression button. The SemQL editor opens.
  - Create a SemQL expression used to disambiguate the consolidation, and then click OK to close the SemQL Editor.
- Press CTRL+S to save the editor.
- Close the editor.
Creating Integration Jobs
An Integration Job is a job executed by Semarchy Convergence for MDM to integrate and certify source data into golden records. This job uses the rules defined as part of the integration process, and contains a sequence of Tasks running these rules. Each task addresses one entity, and performs several processes (Enrichment, Validation, etc.) for this entity.
Creating Jobs
To create a job:
- Right-click the Jobs node and select Add Job.... The Create New Job wizard opens.
- In the Create New Job wizard, check the Auto Fill option and then enter the following values:
  - Name: Internal name of the object.
  - Description: Optionally enter a description for the Job.
  - Queue Name: Name of the queue that will contain this job.
- Click Next.
- In the Tasks page, select the Available Entities that you want to process in this job and click the Add >> button to add them to the Selected Entities.
- Click Finish to close the wizard. The Job editor opens.
- Select Tasks in the editor sidebar. The list of Tasks shows the entities involved in each task, as well as the processes (Enrichers, Matchers, etc.) that will run for these entities.
- Use the Move Up and Move Down buttons to order the tasks.
- To edit the processes involved in one task:
  - Double-click the entity Name in the Tasks table. The editor switches to the Task editor.
  - Select the processes that you want to enable for this task.
  - Use the editor breadcrumb to go back to the Job editor.
- Press CTRL+S to save the editor.
- Close the editor.
Job Parameters
Jobs can be parameterized to optimize their execution.
To change a job parameter:
- In the job editor, select Job Parameters in the editor sidebar.
- In the Job Parameters table, click the Add Parameter button. The Create New Job Parameter wizard opens.
- In the Name field, enter the name of the parameter.
- In the Value field, enter the value for this parameter.
- Click Finish to close the wizard.
- Press CTRL+S to save the editor.
- Close the editor.
The following table lists the parameters available to customize the jobs.
Parameter Name | Values | Description
PARAM_RECYCLE_ERRORS | 0 or 1 | If this parameter is set to 1, error recycling is triggered and rejects from previous job executions are recycled in this job.
PARAM_ANALYZE_STATS | 0 or 1 | If this parameter is set to 1, statistics collection is triggered on the MDM hub tables to optimize processing. This option is useful to accelerate the processing of large data sets.
PARAM_AGGREGATE_JOB_ENRICHERS | 0 or 1 | If this parameter is set to 1, consecutive SemQL enrichers are merged into a single SQL statement when executed. This applies to all entities.
PARAM_AGGREGATE_ENTITY_ENRICHERS_<entity_name> | 0 or 1 | If this parameter is set to 1, consecutive SemQL enrichers are merged into a single SQL statement when executed. This applies only to the entity whose name is provided as <entity_name>.
Jobs Sequencing and Parallelism
Jobs are a sequence of tasks. These tasks must be ordered to handle referential integrity. For example, if you perform all the tasks on Contact before those on Customer, it is likely that new contacts attached to new customers will not be integrated, as the new golden customers are not created yet.
Jobs themselves are executed sequentially within a given Queue, in FIFO (First-In First-Out) mode.
If two jobs can run simultaneously, they should be in different queues. For example, if two jobs address two different areas of the same model, then these jobs can run simultaneously in different queues.
Designing Integration Jobs
It is recommended to create jobs specific to the data loads performed by the data integration batches, and dedicated jobs for human workflows.
Integration Jobs for Data Integration
Data published in batch may target several entities.
It is recommended to define jobs specific to the data loads targeting the hub:
- Such jobs should include all the entities loaded by the data load process.
- In addition, it is recommended to include the entities referencing the entities loaded by the data load process.
Integration Jobs for Data Entry Workflows
A data entry workflow uses a Business Object composed of several entities.
It is recommended to define a specific job for each data entry workflow:
- Such a job should process all the entities of the business object involved in the data entry workflow.
- In addition, if the business object contains fuzzy matching entities, the job should process all the entities referencing these entities.
Integration Jobs for Duplicate Management Workflows
A duplicate management workflow handles duplicates detected on a specific entity.
It is recommended to define a specific job for each duplicate management workflow:
- Such a job should process the entity involved in the duplicate management workflow.
- In addition, if the entity involved in the duplicate management workflow is a fuzzy matching entity, the job should process all the entities referencing this entity.