Optimize enricher execution with aggregation

Enrichers can be aggregated to enable faster processing.

When not aggregated, enrichers run one after the other, each one reading the data, modifying it, and writing it back to the database. Aggregation reduces database read/write operations by executing multiple enrichers in a single operation.

This page explains how enricher execution can be optimized using aggregation.

Whether to aggregate enrichers depends on factors such as the volume of data, the complexity of data sources, and the type and number of enrichers involved. With a large volume of data and a high number of enrichers, aggregation can reduce the number of database queries or external calls. However, when there are only a few enrichers, or when successive enrichers are not of the same type (SemQL or API), aggregation might not provide a noticeable performance improvement.

Enable enricher aggregation

SemQL enricher aggregation

Multiple consecutive SemQL enrichers can be aggregated using the PARAM_AGGREGATE_JOB_ENRICHERS and PARAM_AGGREGATE_ENTITY_ENRICHERS_<entity_name> job parameters. Aggregation converts these SemQL enrichers into a single SQL statement that the database processes, which avoids consecutive database read/write operations.

API enricher aggregation

Multiple consecutive API enrichers (Java plug-ins and REST clients) can be aggregated using the PARAM_AGGREGATE_JOB_PLUGIN_ENRICHERS and PARAM_AGGREGATE_ENTITY_PLUGIN_ENRICHERS_<entity_name> job parameters. This process creates a memory-efficient chain that processes data in a single pass, avoiding successive database read/write operations.
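The single-pass idea can be illustrated with a minimal Python sketch. This is a conceptual model only, not the product's implementation: the functions read_records, trim_name, upper_country, and run_chain, as well as the sample records, are hypothetical names introduced for illustration.

```python
def read_records():
    """Stand-in for a single database read (hypothetical sample data)."""
    yield {"name": " alice ", "country": "fr"}
    yield {"name": "BOB", "country": "us"}


def trim_name(record):
    """Hypothetical enricher: strip surrounding whitespace from the name."""
    record["name"] = record["name"].strip()
    return record


def upper_country(record):
    """Hypothetical enricher: upper-case the country code."""
    record["country"] = record["country"].upper()
    return record


def run_chain(records, enrichers):
    """Apply every enricher to each record in one pass over the data.

    Records are streamed one at a time, so intermediate results are
    never written back and re-read between enrichers.
    """
    for record in records:
        for enrich in enrichers:
            record = enrich(record)
        yield record


# One read, one pass through both enrichers, one final result to write.
result = list(run_chain(read_records(), [trim_name, upper_country]))
print(result)
```

Without chaining, each enricher in this model would need its own read and write of the full record set; with chaining, the records are read once and written once regardless of how many enrichers are in the chain.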

Enricher aggregation rules and limitations

Enricher aggregation adheres to the following rules:

  • Only successive enrichers of the same type (SemQL or API) can be aggregated. For example, in a sequence of enrichers such as SEMQL_1, SEMQL_2, PLUGIN_1, PLUGIN_2, and SEMQL_3, you can aggregate SEMQL_1 with SEMQL_2, and PLUGIN_1 with PLUGIN_2. SEMQL_3 cannot be aggregated with the first two SemQL enrichers because it does not directly follow them.

  • API enricher aggregation stops when:

    • An API enricher has a filter that uses an attribute updated by a previous enricher in the chain.

    • An API enricher has an input that contains a complex SemQL expression with attributes updated by a previous enricher in the chain.

    • An API enricher has one of the Thread pool size, Max retry, Behavior on error, Batch update size, or Processing batch size options set to a different value than in a previous enricher in the chain.
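The "successive enrichers of the same type" rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's algorithm: the enricher names (S1, S2, P1, P2, S3) and the aggregation_groups function are hypothetical, and the sketch models only the type rule, not the filter, input, or option conditions that stop API enricher aggregation.

```python
from itertools import groupby

# Hypothetical enricher sequence as (name, type) pairs:
# two SemQL enrichers, two API enrichers, then one more SemQL enricher.
enrichers = [
    ("S1", "SemQL"),
    ("S2", "SemQL"),
    ("P1", "API"),
    ("P2", "API"),
    ("S3", "SemQL"),
]


def aggregation_groups(sequence):
    """Group successive enrichers that share the same type.

    groupby only merges adjacent items with equal keys, so enrichers of
    the same type separated by another type end up in different groups.
    """
    return [
        [name for name, _ in run]
        for _, run in groupby(sequence, key=lambda item: item[1])
    ]


groups = aggregation_groups(enrichers)
print(groups)  # [['S1', 'S2'], ['P1', 'P2'], ['S3']]
```

In this model, S3 stays in its own group even though it is a SemQL enricher, because the two API enrichers separate it from S1 and S2.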