Optimize Enricher Execution with Aggregation
Enrichers can be aggregated to enable faster processing.
When not aggregated, enrichers run one after the other: each one reads the data, modifies it, and writes it back to the database. Aggregation reduces these database read/write operations by executing multiple enrichers in a single operation.
This page explains how enricher execution can be optimized using aggregation.
Multiple consecutive SemQL enrichers can be aggregated using the PARAM_AGGREGATE_ENTITY_ENRICHERS_<entity_name> job parameters. Aggregation converts these SemQL enrichers into a single SQL statement that the database processes in one operation, instead of performing one read/write operation per enricher.
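To illustrate why a single statement helps, the following Python sketch uses the standard-library sqlite3 module with a hypothetical customer table and two hypothetical enrichers written directly as SQL. It is not the SQL that the platform generates; it only shows that two chained updates can be folded into one UPDATE statement, provided the second expression inlines the first one. In SQLite, as in standard SQL, every right-hand side of a SET clause sees the row values from before the update.

```python
import sqlite3

# Hypothetical data, for illustration only.
ROWS = [(1, "alice"), (2, "bob")]


def setup() -> sqlite3.Connection:
    """Create an in-memory table with a few unenriched rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, code TEXT)")
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", ROWS)
    return conn


# Not aggregated: one statement per enricher, each reading and rewriting the table.
SEQUENTIAL = [
    "UPDATE customer SET name = upper(name)",               # hypothetical enricher 1
    "UPDATE customer SET code = substr(name, 1, 3) || id",  # hypothetical enricher 2, uses enricher 1's output
]

# Aggregated: one statement. Enricher 2's reference to `name` is replaced by
# enricher 1's expression, because both SET expressions see the pre-update row.
AGGREGATED = [
    "UPDATE customer SET name = upper(name), code = substr(upper(name), 1, 3) || id",
]


def run(statements):
    """Apply the statements to a fresh table and return the enriched rows."""
    conn = setup()
    for sql in statements:
        conn.execute(sql)
    return conn.execute("SELECT id, name, code FROM customer ORDER BY id").fetchall()
```

Both variants return [(1, 'ALICE', 'ALI1'), (2, 'BOB', 'BOB2')], but the aggregated one touches the table once instead of twice.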
Multiple consecutive API enrichers (Java plug-ins and REST clients) can be aggregated using the PARAM_AGGREGATE_ENTITY_PLUGIN_ENRICHERS_<entity_name> job parameters. Aggregation creates a memory-efficient chain that processes the data in a single pass, avoiding successive database read/write operations.
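The single-pass chain can be pictured with Python generators. In this sketch, FakeStore is a hypothetical stand-in for the database that counts full read and write passes, and normalize_email and add_domain are hypothetical enrichers; the actual plug-in API, REST calls, and batching behavior are not modeled.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Enricher = Callable[[Record], Record]


class FakeStore:
    """Hypothetical database stand-in that counts full read and write passes."""

    def __init__(self, records: List[Record]) -> None:
        self.records = records
        self.reads = 0
        self.writes = 0

    def read_all(self) -> Iterator[Record]:
        self.reads += 1
        for record in self.records:
            yield dict(record)  # hand out copies, like rows fetched from a database

    def write_all(self, records: Iterable[Record]) -> None:
        self.writes += 1
        self.records = list(records)


def normalize_email(r: Record) -> Record:
    return {**r, "email": r["email"].strip().lower()}


def add_domain(r: Record) -> Record:
    return {**r, "domain": r["email"].split("@", 1)[1]}


ENRICHERS: List[Enricher] = [normalize_email, add_domain]


def run_sequential(store: FakeStore, enrichers: List[Enricher]) -> None:
    """Not aggregated: one full read/write pass per enricher."""
    for enrich in enrichers:
        store.write_all(enrich(r) for r in store.read_all())


def run_chained(store: FakeStore, enrichers: List[Enricher]) -> None:
    """Aggregated: each record flows through every enricher in one pass."""
    def apply_chain(r: Record) -> Record:
        for enrich in enrichers:
            r = enrich(r)
        return r

    store.write_all(apply_chain(r) for r in store.read_all())
```

With two enrichers, run_sequential performs two read passes and two write passes, while run_chained performs one of each and produces the same records.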
Enricher aggregation adheres to the following rules:
Only successive enrichers of the same type (SemQL or API) can be aggregated. For example, in the sequence SEMQL_1, SEMQL_2, PLUGIN_1, PLUGIN_2, SEMQL_3, you can aggregate SEMQL_1 with SEMQL_2, and PLUGIN_1 with PLUGIN_2. SEMQL_3 is not aggregated with the other SemQL enrichers because the API enrichers separate it from them.
API enricher aggregation stops when:
An API enricher has a filter that uses an attribute updated by a previous enricher in the chain.
An API enricher has an input that contains a complex SemQL expression with attributes updated by a previous enricher in the chain.
An API enricher has one of the Thread pool size, Max retry, Behavior on error, Batch update size, or Processing batch size options set to a different value from that of a previous enricher in the chain.
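The rules above can be summarized as a grouping procedure. The following Python sketch is a hypothetical model, not the platform's implementation: each enricher is described by its type, the attributes it updates, the attributes used by its filter and by complex SemQL input expressions, and a tuple standing for its Thread pool size, Max retry, Behavior on error, Batch update size, and Processing batch size options. It assumes that an enricher that stops an API chain starts a new chain, and it models only the rules listed on this page.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Enricher:
    name: str
    kind: str  # "SEMQL" or "API"
    updates: FrozenSet[str] = frozenset()              # attributes this enricher writes
    filter_attrs: FrozenSet[str] = frozenset()         # attributes used in its filter
    complex_input_attrs: FrozenSet[str] = frozenset()  # attributes in complex SemQL inputs
    options: Tuple = ()  # stands for the thread pool, retry, error, and batch options


def breaks_api_chain(chain: List[Enricher], candidate: Enricher) -> bool:
    """True if the candidate API enricher cannot join the current API chain."""
    updated = set().union(*(e.updates for e in chain))
    return bool(
        candidate.filter_attrs & updated           # filter uses an updated attribute
        or candidate.complex_input_attrs & updated # complex input uses an updated attribute
        or candidate.options != chain[0].options   # all options in a chain are identical
    )


def plan_groups(enrichers: List[Enricher]) -> List[List[Enricher]]:
    """Split an ordered list of enrichers into hypothetical aggregation groups."""
    groups: List[List[Enricher]] = []
    for e in enrichers:
        current = groups[-1] if groups else None
        same_type = current is not None and current[0].kind == e.kind
        if same_type and not (e.kind == "API" and breaks_api_chain(current, e)):
            current.append(e)
        else:
            groups.append([e])  # different type or a stop condition: start a new group
    return groups
```

Applied to the sequence SEMQL_1, SEMQL_2, PLUGIN_1, PLUGIN_2, SEMQL_3, plan_groups returns three groups: [SEMQL_1, SEMQL_2], [PLUGIN_1, PLUGIN_2], and [SEMQL_3].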