Email enricher domain name cache
The Email enricher uses a local cache to prevent redundant MX record lookups when checking the validity of an email domain.
This domain name cache takes precedence: if a record exists in the cache, the enricher uses the locally available information and skips the MX record lookup.
The plugin saves the cache in a table named EXT_EMAIL_DOMAINS. This table is created during the first execution of the enricher and is stored by default in the data location served by the enricher.
You can designate a specific datasource location to store this table by configuring the enricher’s Datasource parameter.
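The cache-first behavior described above can be sketched as follows. This is a minimal illustration, not the enricher's actual implementation: the function name `is_domain_valid`, the in-memory dictionary standing in for the EXT_EMAIL_DOMAINS table, the `mx_lookup` callable, and the lowercasing of domain names are all assumptions made for the example.

```python
from typing import Callable, Dict


def is_domain_valid(
    domain: str,
    cache: Dict[str, bool],
    mx_lookup: Callable[[str], bool],
) -> bool:
    """Return the cached validity if present; otherwise look it up and cache it."""
    key = domain.lower()  # assumption: domains are compared case-insensitively
    if key in cache:
        # Cache hit: the cached answer takes precedence, no MX lookup is made.
        return cache[key]
    # Cache miss: fall back to the (hypothetical) MX lookup and remember the result.
    valid = mx_lookup(key)
    cache[key] = valid
    return valid
```

Passing the lookup as a callable keeps the sketch independent of any DNS library: a stub can be supplied for testing, and the second call for the same domain never reaches it.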
Domain name cache table structure
The EXT_EMAIL_DOMAINS table has the following structure:
Column name | Description |
---|---|
HOST_NAME | Domain name. |
PREFIX | First two letters of the domain name. |
SUFFIX | Last two letters of the domain name. |
HIT_COUNT | Number of times this hostname was processed by the enricher. This value is automatically incremented by the enricher. |
SEED_DATA | Indicates whether this record was part of the seeded data, or created by the enricher. |
VALID | Indicates whether the domain name is valid. |
SUGGESTION | Latest correction found for an invalid domain. |
FIRST_INVALID_DATE, LAST_INVALID_DATE | Additional date information used to reconsider a domain's validity after a certain period. |
Correcting domain names
The enricher automatically rectifies invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:
- The edit distance between the invalid domain and the cached domain.
- The hit count of the cached domain.
A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
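To make the two criteria concrete, here is a hedged sketch of one way to rank candidates. The enricher's built-in algorithm and its exact weighting are not documented here, so this is only an illustration: it uses the Levenshtein edit distance, prefers the smallest distance, and breaks ties with the highest hit count. The function names, the `max_distance` threshold, and the sample domains are all hypothetical.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest_correction(
    invalid_domain: str,
    valid_hit_counts: Dict[str, int],
    max_distance: int = 2,  # assumption: ignore candidates that are too different
) -> Optional[str]:
    """Pick the closest cached domain; ties go to the higher hit count."""
    best, best_key = None, None
    for candidate, hits in valid_hit_counts.items():
        distance = edit_distance(invalid_domain, candidate)
        if distance == 0 or distance > max_distance:
            continue
        key = (distance, -hits)  # smaller distance first, then more hits
        if best_key is None or key < best_key:
            best, best_key = candidate, key
    return best
```

For instance, `suggest_correction("gmial.com", {"gmail.com": 500, "gmx.com": 40})` selects `gmail.com` under these assumptions, because it is within two edits while `gmx.com` is three edits away.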
Adding records to the cache
It is possible to force the creation of new records in the cache (e.g., to create new fix suggestions).
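To manually insert a correction <host_name_replacement> for an invalid domain <invalid_host_name>, use whichever of the following sample queries matches your database's SQL dialect. The three variants differ only in their substring functions (SUBSTR, or LEFT and RIGHT) and in their current-timestamp function (CURRENT_TIMESTAMP, NOW(), or GETDATE()):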
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
'<invalid_host_name>',
SUBSTR('<invalid_host_name>', 0, 2),
SUBSTR('<invalid_host_name>', -2, 2),
0,
'1',
'0',
'<host_name_replacement>',
CURRENT_TIMESTAMP,
CURRENT_TIMESTAMP
);
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
'<invalid_host_name>',
LEFT('<invalid_host_name>', 2),
RIGHT('<invalid_host_name>', 2),
0,
'1',
'0',
'<host_name_replacement>',
NOW(),
NOW()
);
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
'<invalid_host_name>',
LEFT('<invalid_host_name>', 2),
RIGHT('<invalid_host_name>', 2),
0,
'1',
'0',
'<host_name_replacement>',
GETDATE(),
GETDATE()
);
In online mode, MX record lookups are resolved through DNS, so any domain that actually exists (such as gail.com) is flagged as valid. Consequently, the Email enricher does not apply user-created suggestions to replace such domains. Suggestions are only applied to syntactically invalid domains in offline mode, or to nonexistent domains in online mode.
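The manual insert shown in the sample queries can also be scripted with bound parameters, which avoids pasting the host name into the statement three times. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real datasource, the column types in `create_demo_table` are assumptions, and the helper names and sample domains are hypothetical. The `?` placeholder style is specific to `sqlite3`; other database drivers use different placeholder styles.

```python
import sqlite3

INSERT_SQL = """
INSERT INTO EXT_EMAIL_DOMAINS (
    HOST_NAME, PREFIX, SUFFIX, HIT_COUNT, SEED_DATA,
    VALID, SUGGESTION, FIRST_INVALID_DATE, LAST_INVALID_DATE
) VALUES (?, ?, ?, 0, '1', '0', ?, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
"""


def create_demo_table(connection: sqlite3.Connection) -> None:
    """Create a stand-in table; the real column types are assumptions."""
    connection.execute("""
        CREATE TABLE EXT_EMAIL_DOMAINS (
            HOST_NAME TEXT PRIMARY KEY,
            PREFIX TEXT,
            SUFFIX TEXT,
            HIT_COUNT INTEGER,
            SEED_DATA TEXT,
            VALID TEXT,
            SUGGESTION TEXT,
            FIRST_INVALID_DATE TEXT,
            LAST_INVALID_DATE TEXT
        )
    """)


def add_correction(
    connection: sqlite3.Connection,
    invalid_host_name: str,
    host_name_replacement: str,
) -> None:
    """Insert a manual correction, deriving PREFIX and SUFFIX from the host name."""
    connection.execute(INSERT_SQL, (
        invalid_host_name,
        invalid_host_name[:2],   # first two letters
        invalid_host_name[-2:],  # last two letters
        host_name_replacement,
    ))
    connection.commit()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    create_demo_table(demo)
    add_correction(demo, "exmaple.test", "example.test")
    print(demo.execute("SELECT * FROM EXT_EMAIL_DOMAINS").fetchall())
```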
Cache refresh
The Email enricher refreshes local cache records after three months. This duration is not configurable. The cache stores date information for each record, and once a record is older than three months, the enricher performs a new MX record lookup and updates the record.
If there is good evidence that the cache is wrong about a domain’s validity, or if business users are certain they want to override the cache’s decision, developers can manually set the VALID column to 0 or 1. To prevent the cache from overwriting this manual change, also set the date field to NULL so that the Email enricher does not refresh the cache for that domain.
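The refresh rule and the NULL override can be summarized in a short sketch. It is an assumption-laden illustration rather than the enricher's code: the function names are hypothetical, "three months" is interpreted as three calendar months, and a missing date (`None`, standing in for NULL) is treated as "never refresh", following the override described above.

```python
import calendar
from datetime import datetime
from typing import Optional

REFRESH_MONTHS = 3  # fixed by the enricher according to the text; not configurable


def add_months(moment: datetime, months: int) -> datetime:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def needs_refresh(recorded: Optional[datetime], now: datetime) -> bool:
    """Return True when a cached record is due for a new MX lookup."""
    if recorded is None:
        # NULL date: the record was manually overridden and is never refreshed.
        return False
    return now >= add_months(recorded, REFRESH_MONTHS)
```

Under these assumptions, a record dated 15 January becomes due for refresh on 15 April of the same year, while a record whose date was set to NULL keeps its manually assigned validity indefinitely.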
Developers can safely truncate the cache table periodically if they want results refreshed sooner than the automatic three-month cycle. They can either drop the table entirely or delete only the unwanted records, keeping the seeded data and any domains whose validity they have manually overridden.