Email Enricher Domain Name Cache
The email enricher uses a local cache to avoid repeating MX record lookups to check the validity of an email domain.
The enricher consults this domain name cache first: if a record for the domain is found in the cache, the enricher uses the information available locally and does not issue an MX record lookup.
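The cache-first behavior described above can be pictured with a minimal sketch. This is not the enricher's actual code: the function name `is_domain_valid`, the dictionary standing in for the cache table, and the injected `mx_lookup` callable are all assumptions made for illustration, as is the choice to store each new lookup result in the cache.

```python
from typing import Callable, Dict


def is_domain_valid(
    domain: str,
    cache: Dict[str, bool],
    mx_lookup: Callable[[str], bool],
) -> bool:
    """Return the cached validity if present; otherwise look it up once and cache it.

    `cache` is a hypothetical in-memory stand-in for the cache table, and
    `mx_lookup` is a hypothetical callable that performs the real MX record lookup.
    """
    key = domain.lower()  # assumption: domains are compared case-insensitively
    if key in cache:
        # Cache hit: use local information, no MX record lookup is issued.
        return cache[key]
    # Cache miss: perform the lookup and remember the result for next time.
    valid = mx_lookup(key)
    cache[key] = valid
    return valid
```

With this shape, a second call for the same domain never reaches `mx_lookup`, which is the saving the cache is meant to provide.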
The plug-in stores the cache in a table named EXT_EMAIL_DOMAINS. This table is created on the first run of the enricher, by default in the data location served by the enricher.
You can specify a different datasource in which to store this table using the Datasource enricher parameter.
The structure of the EXT_EMAIL_DOMAINS table is the following:
Domain name. e.g. "gmail.com"
First two letters of the domain name. e.g. "gm"
Last two letters of the domain name. e.g. "om"
Number of times this domain name was processed by the enricher. This value is automatically incremented by the enricher.
Indicates whether this record was part of the seeded data or was created by the enricher. The value is
Indicates whether the domain name is valid.
Latest correction found for an invalid domain.
Additional date information used to reconsider a domain validity after a certain period of time.
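As a small illustration of the two derived values listed above (the first two and last two letters of the domain name), the following sketch computes them for a given domain. The function name `domain_keys` is hypothetical, and trimming and lowercasing the input is an assumption rather than documented enricher behavior.

```python
from typing import Tuple


def domain_keys(domain_name: str) -> Tuple[str, str]:
    """Return (first two letters, last two letters) of a domain name."""
    # Assumption: surrounding whitespace is ignored and case is normalized.
    normalized = domain_name.strip().lower()
    return normalized[:2], normalized[-2:]


print(domain_keys("gmail.com"))  # ('gm', 'om')
```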
The enricher automatically fixes invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:
The Edit Distance between the invalid domain and cached domain.
The hit count of the cached domain.
A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
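The exact built-in algorithm is not documented here, so the following is only a sketch of how the two criteria could be combined. It assumes the Levenshtein edit distance, a hypothetical cutoff `max_distance`, and a simple ranking rule (smallest distance first, highest hit count as the tie-breaker). The names `edit_distance` and `suggest_fix` are illustrative, not part of the enricher.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def suggest_fix(
    invalid_domain: str,
    valid_hits: Dict[str, int],
    max_distance: int = 2,
) -> Optional[str]:
    """Pick the closest cached valid domain, preferring frequently seen ones.

    `valid_hits` maps each valid cached domain to its hit count.
    Assumed ranking: lowest edit distance wins, ties go to the higher hit count.
    """
    best_domain = None
    best_rank = None
    for domain, hits in valid_hits.items():
        distance = edit_distance(invalid_domain, domain)
        if distance > max_distance:
            continue  # too different to be a plausible correction
        rank = (distance, -hits)
        if best_rank is None or rank < best_rank:
            best_domain, best_rank = domain, rank
    return best_domain
```

For example, with a cache containing "gmail.com" (1000 hits) and "gmx.com" (50 hits), the misspelled "gmial.com" is two edits away from "gmail.com" and three from "gmx.com", so `suggest_fix` returns "gmail.com".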
It is possible to force the creation of new records in the cache, for example to create new fix suggestions.
To manually insert a domain correction <domain_name_replacement> for an invalid domain <invalid_host_name>, use the following sample query:
INSERT INTO EXT_EMAIL_DOMAINS (
SUBSTR(<invalid_host_name>, 0, 2),
SUBSTR(<invalid_host_name>, -2, 2),
The email enricher refreshes each local cache record after 3 months. This duration is not configurable. Each cache record stores date information, and once a record is older than 3 months the enricher issues a new MX record lookup to refresh it.
If there is good evidence that the cache is wrong about a domain’s validity, or if business users are certain they want to override the cache’s decision, the developer can manually set the Valid flag to 0 or 1. To prevent the enricher from overwriting this manual change, it is also important to set the date field to NULL so that the email enricher does not refresh the cache for that domain.
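The refresh rule and the NULL-date override can be summarized in a short sketch. It assumes calendar months, and it assumes a record becomes eligible for refresh on the day it reaches the 3-month mark; the enricher's exact boundary handling is not documented here. The names `add_months` and `needs_refresh` are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional

REFRESH_MONTHS = 3  # fixed, per the documentation; not configurable


def add_months(day: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return day.replace(year=year, month=month, day=min(day.day, last_day))


def needs_refresh(last_checked: Optional[date], today: date) -> bool:
    """Decide whether a cache record should trigger a new MX record lookup.

    A NULL (None) date marks a manually overridden record that is never refreshed.
    """
    if last_checked is None:
        return False
    return today >= add_months(last_checked, REFRESH_MONTHS)
```

Under these assumptions, a record checked on 2024-01-15 is reused until 2024-04-14 and refreshed from 2024-04-15 onward, while a record whose date is NULL keeps its manually set validity indefinitely.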
It is safe for developers to periodically clear the cache table if they want the cache to refresh its results sooner than the automatic 3-month period. Developers can either drop the table entirely, or delete only the records they do not want, keeping the seeded data and any crucial domains they have manually overridden.