Optimize Enricher Execution with Aggregation
Enrichers support aggregation for faster processing.
When not aggregated, enrichers run one after another, each one reading the data, modifying it, and writing the result back to the database. Aggregation avoids these repeated database reads and writes by running several enrichers in a single operation.
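The difference in database round trips can be illustrated with a toy model. The sketch below is not the product's implementation: `CountingStore`, `run_sequentially`, `run_aggregated` and the three enricher functions are hypothetical names introduced only to count reads and writes in each mode.

```python
# Toy model (hypothetical, for illustration only): a store that counts
# how many times it is read from and written to.
class CountingStore:
    def __init__(self, rows):
        self.rows = rows
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return [dict(r) for r in self.rows]

    def write(self, rows):
        self.writes += 1
        self.rows = [dict(r) for r in rows]


def run_sequentially(store, enrichers):
    # Non-aggregated: every enricher does its own read and write.
    for enrich in enrichers:
        rows = store.read()
        store.write([enrich(r) for r in rows])


def run_aggregated(store, enrichers):
    # Aggregated: one read, all enrichers applied in order, one write.
    rows = store.read()
    for enrich in enrichers:
        rows = [enrich(r) for r in rows]
    store.write(rows)


# Three hypothetical enrichers applied to a "name" attribute.
def trim_name(row):
    return {**row, "name": row["name"].strip()}


def upper_name(row):
    return {**row, "name": row["name"].upper()}


def add_length(row):
    return {**row, "name_length": len(row["name"])}


ENRICHERS = [trim_name, upper_name, add_length]


def demo():
    sequential = CountingStore([{"name": "  acme "}])
    aggregated = CountingStore([{"name": "  acme "}])
    run_sequentially(sequential, ENRICHERS)
    run_aggregated(aggregated, ENRICHERS)
    return sequential, aggregated
```

In this model, both stores end with the same enriched rows, but the sequential run performs one read and one write per enricher while the aggregated run performs one of each in total.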
Multiple consecutive SemQL enrichers are aggregated using the PARAM_AGGREGATE_ENTITY_ENRICHERS_<entity_name> job parameters. This aggregation converts the SemQL enrichers into a single SQL statement processed by the database, which prevents successive database reads and writes.
Multiple consecutive API enrichers (Java plug-ins and REST clients) are aggregated using the PARAM_AGGREGATE_ENTITY_PLUGIN_ENRICHERS_<entity_name> job parameters. This aggregation creates a processing chain that processes the data in memory in a single pass, which avoids successive database reads and writes.
Enricher aggregation follows the rules listed below:
Only successive enrichers of the same type (SemQL or API) can be aggregated. For example, given the sequence of enrichers SEMQL_1, SEMQL_2, PLUGIN_1, PLUGIN_2, SEMQL_3, you can aggregate SEMQL_1 with SEMQL_2 and PLUGIN_1 with PLUGIN_2. SEMQL_3 cannot be aggregated with the first two SemQL enrichers because the API enrichers run between them.
API enricher aggregation stops when any of the following is true:
An API enricher has a filter that uses an attribute updated by a previous enricher in the chain.
An API enricher has an input that contains a complex SemQL expression with attributes updated by a previous enricher in the chain.
An API enricher has one of the Thread Pool Size, Max Retry, Behavior on Error, Batch Update Size, or Processing Batch Size options set to a different value than in a previous enricher in the chain.
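The rules above can be sketched as a small planning function. This is a simplified model under stated assumptions, not the product's algorithm: the `Enricher` class, its fields and `plan_chains` are hypothetical names, and the sketch assumes that an enricher that stops a chain starts a new chain of its own.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Enricher:
    # Hypothetical model of an enricher, for illustration only.
    name: str
    kind: str  # "SEMQL" or "API"
    filter_attrs: FrozenSet[str] = frozenset()         # attributes used in the filter
    complex_input_attrs: FrozenSet[str] = frozenset()  # attributes used in complex SemQL inputs
    updated_attrs: FrozenSet[str] = frozenset()        # attributes this enricher updates
    # (thread pool size, max retry, behavior on error,
    #  batch update size, processing batch size)
    options: Tuple = ()


def plan_chains(enrichers: List[Enricher]) -> List[List[str]]:
    chains: List[List[Enricher]] = []
    updated = set()  # attributes updated so far in the current chain
    for e in enrichers:
        chain = chains[-1] if chains else None
        # Rule 1: only successive enrichers of the same type aggregate.
        can_join = chain is not None and chain[-1].kind == e.kind
        if can_join and e.kind == "API":
            # Rule 2: an API chain stops on a filter or complex input that
            # depends on an updated attribute, or on differing options.
            can_join = (
                not (e.filter_attrs & updated)
                and not (e.complex_input_attrs & updated)
                and e.options == chain[-1].options
            )
        if can_join:
            chain.append(e)
        else:
            # Assumption: the enricher that breaks a chain starts a new one.
            chains.append([e])
            updated = set()
        updated |= e.updated_attrs
    return [[e.name for e in c] for c in chains]


OPTS = (4, 3, "SKIP", 100, 500)  # hypothetical option values

SEQUENCE = [
    Enricher("SEMQL_1", "SEMQL"),
    Enricher("SEMQL_2", "SEMQL"),
    Enricher("PLUGIN_1", "API", updated_attrs=frozenset({"City"}), options=OPTS),
    Enricher("PLUGIN_2", "API", options=OPTS),
    Enricher("SEMQL_3", "SEMQL"),
]
```

Calling `plan_chains(SEQUENCE)` groups the two leading SemQL enrichers, then the two plug-ins, and leaves `SEMQL_3` on its own. Giving `PLUGIN_2` a filter on `City`, or different option values, splits the plug-in chain in this model.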