Enrichers

Enrichers normalize, standardize and enrich data loaded or authored in the hub. Additional Post-Consolidation enrichers can apply to consolidated data resulting from the match and merge process.

Overview

Enrichers have the following characteristics:

  • Several enrichers can be defined for an entity. They are executed in sequence, in the order in which they are defined in the model.

  • Enrichers can be enabled or disabled for integration jobs. Enrichers disabled for the jobs can be used for data authoring.

  • Enrichers can be configured to run on the source data (pre-consolidation) and/or the consolidated (post-consolidation) data.

  • Enrichers running pre-consolidation apply to all incoming data. You can define a filter on each enricher, in which case the enricher modifies only the records that match the filter.

Basic entities only use pre-consolidation enrichers. Post-consolidation enrichers will not run on basic entities.
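
The ordering, filter and scope rules above can be illustrated with a small Python sketch. This is not how Semarchy xDM implements enrichers (SemQL enrichers run in the database engine). The Enricher class, the run_enrichers function and the "pre"/"post" phase names are hypothetical and exist only to make the rules concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Enricher:
    """Hypothetical model of an enricher, for illustration only."""
    name: str
    # Maps each enriched attribute to a function computing its new value.
    expressions: Dict[str, Callable[[Record], object]]
    # Optional filter; None means every record is enriched.
    record_filter: Optional[Callable[[Record], bool]] = None
    # Phases in which the enricher runs: "pre", "post", both, or none.
    scope: frozenset = frozenset({"pre"})


def run_enrichers(records: List[Record], enrichers: List[Enricher], phase: str) -> List[Record]:
    # Enrichers run one after the other, in the order they are defined.
    for enricher in enrichers:
        if phase not in enricher.scope:
            continue  # not enabled for this phase
        for record in records:
            if enricher.record_filter and not enricher.record_filter(record):
                continue  # records outside the filter are left untouched
            new_values = {attr: expr(record) for attr, expr in enricher.expressions.items()}
            record.update(new_values)
    return records


enrichers = [
    Enricher("TrimName", {"name": lambda r: r["name"].strip()}),
    Enricher(
        "UpperUSName",
        {"name": lambda r: r["name"].upper()},
        record_filter=lambda r: r["country"] == "US",
    ),
]
records = [{"name": "  alice ", "country": "US"}, {"name": " bob", "country": "FR"}]
result = run_enrichers(records, enrichers, "pre")
```

In this sketch, the second enricher sees the value already trimmed by the first one, and only the record matching its filter is uppercased.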

There are two types of enrichers:

  • SemQL Enrichers express the enrichment rule in the SemQL language. The hub’s database engine executes these enrichers.

  • API Enrichers use Java plug-ins or REST clients. The Semarchy xDM engine runs such enrichers. API enrichers let you perform data transformations that cannot be carried out within the database, for example when they require an online API or an external library.

Create SemQL Enrichers

A SemQL Enricher enriches several attributes of an entity using attributes from this entity, transformed using SemQL expressions and functions.

To create a SemQL enricher:

  1. Expand the entity node, right-click the Enrichers node and select Add SemQL Enricher…. The Create New SemQL Enricher wizard opens.

  2. In the Create New SemQL Enricher wizard, check the Auto Fill option and then enter the following values:

    • Name: Internal name of the object.

    • Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

  3. Click Next.

  4. In the Enricher Expressions page, select the Available Attributes you want to enrich and click the Add >> button to add them to the Used Attributes.

  5. Click Next.

  6. Optionally, click the Edit Expression button to open the expression editor and define a filter. The enricher will only enrich the records that match this filter. Skip this task if you want to enrich all the records.

  7. Click Finish to close the wizard. The SemQL Enricher editor opens.

  8. In the Description field, optionally enter a description for the SemQL Enricher.

  9. Select the Enrichment Scope for this enricher. The scope may be Pre-Consolidation Only, Post-Consolidation Only, Pre and Post Consolidation or None (not executed in the jobs).

  10. Set the enricher expressions:

    1. In the Enricher Expressions table, select the Expression column for the attribute you want to enrich and then click the Edit Expression button. The SemQL editor opens.

    2. Create a SemQL expression to load the attribute to enrich, and then click OK to close the SemQL Editor.

    3. Repeat the previous steps to set an expression for each attribute to enrich.

  11. Press CTRL+S to save the editor.

  12. Close the editor.

When running multiple SemQL enrichers on the same entity, you can configure the enricher aggregation in the integration jobs running these enrichers for faster processing.

Create API Enrichers

An API enricher enriches and standardizes data in an entity, using values from this entity, which are transformed using a Java plug-in or a REST client.

An API enricher has:

  • a list of Inputs, which are mapped on source attributes or SemQL expressions.

  • a list of Parameter values, for plug-ins only, which configure the plug-in behavior.

  • a list of Outputs which are mapped on target attributes.

An API enricher receives the inputs and parameters, processes them and issues outputs which are then loaded into the target attributes.
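
The flow of inputs, parameters and outputs can be sketched in Python as follows. This is a conceptual illustration only, not the Semarchy xDM plug-in API. The split_full_name function stands in for a plug-in or REST client, and every name here (FULL_NAME, FIRST, LAST, separator, apply_api_enricher) is hypothetical.

```python
from typing import Callable, Dict

Record = Dict[str, object]


def split_full_name(inputs: Dict[str, object], params: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical stand-in for a plug-in or REST client."""
    separator = str(params.get("separator", " "))
    first, _, last = str(inputs["FULL_NAME"]).partition(separator)
    return {"FIRST": first, "LAST": last}


def apply_api_enricher(
    record: Record,
    input_bindings: Dict[str, Callable[[Record], object]],
    params: Dict[str, object],
    output_bindings: Dict[str, str],
    service: Callable[[Dict[str, object], Dict[str, object]], Dict[str, object]],
) -> Record:
    # Inputs: evaluate each bound expression against the record.
    inputs = {name: expr(record) for name, expr in input_bindings.items()}
    # The service receives the inputs and parameters and issues outputs.
    outputs = service(inputs, params)
    # Outputs: load the selected output into each target attribute.
    for attribute, output_name in output_bindings.items():
        record[attribute] = outputs[output_name]
    return record


record = {"ContactName": "Ada Lovelace", "FirstName": None, "LastName": None}
enriched = apply_api_enricher(
    record,
    input_bindings={"FULL_NAME": lambda r: r["ContactName"]},
    params={"separator": " "},
    output_bindings={"FirstName": "FIRST", "LastName": "LAST"},
    service=split_full_name,
)
```

Here the output bindings are keyed by target attribute, mirroring the editor, where each enriched attribute is assigned one output name.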

  • Semarchy xDM has a large set of built-in plug-ins. See Plug-ins Reference for the list of built-in plug-ins.

  • You can also develop your own plug-ins, as described in Plug-in Development.

  • Before creating an API enricher based on a plug-in that is not built into Semarchy xDM, make sure that the administrator has added this plug-in to the platform, as explained in Manage Plug-ins.

  • Before using a REST client, make sure that it is defined in the platform. Refer to REST clients for more information.

To create an API enricher:

  1. Expand the entity node, right-click the Enrichers node and select Add API Enricher…. The Create New API Enricher wizard opens.

  2. In the Create New API Enricher wizard, select REST Client or Java Plug-in.

  3. Select the REST client or Java plug-in in the drop-down list.
    This list shows the built-in plug-ins and those installed in the platform, or the list of REST clients available in the platform.

  4. Check the Auto Fill option and then enter the following values:

    • Name: Internal name of the object.

    • Label: User-friendly label for this object. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

  5. Click Next.

  6. The enricher can enrich a filtered subset of the records. Optionally, click the Edit Expression button to open the expression editor and define a filter. Skip this task if you want to enrich all the records.

  7. Click Finish to close the wizard. The API Enricher editor opens. The Plug-in Params, Inputs and Outputs tables show the parameters (for a Java plug-in only) and inputs/outputs for the selected Java plug-in or REST client.

  8. Select the Enrichment Scope for this enricher. The scope may be Pre-Consolidation Only, Post-Consolidation Only, Pre and Post Consolidation or None (not executed in the jobs).

  9. For a Java plug-in, the mandatory parameters are listed in the Plug-in Params table. Optionally add the other parameters that you need to set:

    1. In the Plug-in Params table, click the Define Parameters button.

    2. In the Parameters dialog, select the Available Parameters you want to add and click the Add >> button to add them to the Used Parameters.

    3. Click Finish to close the dialog.

  10. Set the values for the Java plug-in parameters:

    1. In the Plug-in Params table, click the Value column for a parameter. The cell becomes editable.

    2. Enter the value of the parameter in the cell, and then press Enter.

    3. Repeat the previous steps to set the value for the other parameters.

  11. Define the Inputs of the enricher. For a Java plug-in, the mandatory inputs are automatically listed in the Inputs table.
    Add the other inputs that you need to set for the enricher:

    1. In the Inputs table, click the Define Inputs button.

    2. In the Define Input Bindings dialog, select the Available Inputs you want to add and click the Add >> button to add them to the Used Inputs.

    3. Click Finish to close the dialog.

  12. Set the values for the inputs:

    1. Click the Expression column in the Inputs table for an input and then click the Edit Expression button. The SemQL editor opens.

    2. Edit the SemQL expression using the attributes to feed the plug-in or REST client input and then click OK to close the SemQL Editor.

    3. Repeat the previous steps to set an expression for other inputs.

  13. Define the attributes to enrich in the Outputs table:

    1. In the Outputs table, click the Define Outputs button.

    2. In the Output Bindings dialog, select the attributes that you want to enrich in the Available Attributes list and then click the Add >> button to add them to the Attributes Used.

    3. Click Finish to close the dialog.

  14. For each attribute in the Outputs table, select in the Output Name column the plug-in or REST client output used to enrich that attribute.

  15. Optionally, you can use Advanced Configuration properties to optimize and configure the API enricher execution.

  16. Press CTRL+S to save the editor.

  17. Close the editor.

Advanced Enricher Configuration

Plug-in and REST Clients

The enrichers using plug-ins and REST clients provide options for optimizing and configuring their execution.
The following properties appear in the Advanced Configuration section of the editor:

  • Max Retries: If the execution of the REST client or plug-in fails, it is retried up to this number of times.

  • Behavior on Error: If the execution still fails after the Max Retries have been attempted, the plug-in or REST client either skips the current record, skips the entire task, or stops the whole job, depending on this property.

  • Thread Pool Size: This property defines the number of parallel threads used when running the plug-in or REST client. For plug-ins, this option is taken into account only if the plug-in used is thread-safe and declared as such.

  • Batch Update Size: This property defines the batch update size used by an enricher to write records to the database.

    If Batch Update Size is left empty, the batch update size is set to 1000 to optimize performance.

  • Processing Batch Size: This property defines the size of the record batches processed by each thread of a Java plug-in enricher. When configuring this option, bear in mind that records in a batch are processed together. If one record in a batch fails, the entire batch fails and all the records in this batch are processed according to the Max Retries and Behavior on Error properties. This property is not available for REST clients.
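
The interaction between Processing Batch Size, Max Retries and Behavior on Error can be sketched in Python as follows. This is a simplified, single-threaded model based only on the descriptions above, not the actual engine logic. The function name, the JobStopped exception and the SKIP_RECORD, SKIP_TASK and STOP_JOB values are hypothetical labels, and the thread pool is not modeled.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class JobStopped(Exception):
    """Hypothetical signal that the whole job must stop."""


def process_in_batches(
    records: List[Record],
    process_batch: Callable[[List[Record]], List[Record]],
    batch_size: int,
    max_retries: int,
    on_error: str,
) -> List[Record]:
    """Return the records that were successfully processed."""
    done: List[Record] = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        failures = 0
        while True:
            try:
                done.extend(process_batch(batch))
                break
            except Exception as exc:
                failures += 1
                if failures <= max_retries:
                    continue  # retry the entire batch
                if on_error == "SKIP_RECORD":
                    break  # every record of the failed batch is skipped
                if on_error == "SKIP_TASK":
                    return done  # abandon the rest of this task
                raise JobStopped("enricher failed after retries") from exc
    return done


calls: List[int] = []


def uppercase_names(batch: List[Record]) -> List[Record]:
    calls.append(len(batch))
    # Raises AttributeError when a name is None, failing the whole batch.
    return [{**r, "name": r["name"].upper()} for r in batch]


records = [{"name": "ann"}, {"name": "bob"}, {"name": None}, {"name": "dan"}, {"name": "eve"}]
result = process_in_batches(records, uppercase_names, batch_size=2, max_retries=1, on_error="SKIP_RECORD")
```

In this run, the valid record "dan" is skipped along with the invalid one because both sit in the same batch. A smaller batch size limits that effect, at the cost of more calls to the plug-in.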

In addition, Semarchy xDM comes with features to optimize the execution of enrichers, including: