Email Enricher Domain Name Cache

The email enricher uses a local cache to avoid repeating MX record lookups when checking the validity of an email domain.
This domain name cache takes priority: if a record for a domain is found in the cache, the enricher uses the locally stored information and does not issue an MX record lookup.
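The cache-first behavior can be sketched as follows. This is a minimal illustration, not the enricher's code: the in-memory dictionary stands in for the EXT_EMAIL_DOMAINS table, and `check_domain` and the injected `mx_lookup` callable are hypothetical names. Host names are lowercased because DNS names are case-insensitive.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-in for EXT_EMAIL_DOMAINS: host name -> validity
# (True = valid, False = invalid, None = unknown).
DomainCache = Dict[str, Optional[bool]]


def check_domain(
    host: str,
    cache: DomainCache,
    mx_lookup: Callable[[str], bool],
) -> Optional[bool]:
    """Return the domain's validity, consulting the cache before any MX lookup."""
    key = host.lower()  # DNS names are case-insensitive
    if key in cache:
        # Cache hit: use the local information; no MX record lookup is issued.
        return cache[key]
    # Cache miss: run the (hypothetical) MX lookup and remember the result.
    valid = mx_lookup(key)
    cache[key] = valid
    return valid
```

A second call for the same domain is answered from the cache, so `mx_lookup` runs at most once per host name.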

The plug-in stores the cache in a table named EXT_EMAIL_DOMAINS. This table is created the first time the enricher runs, by default in the data location served by the enricher. To store this table in a different datasource, set the Datasource enricher parameter.

Domain Name Cache Table Structure

The EXT_EMAIL_DOMAINS table contains the following columns:

  • HOST_NAME: Domain name, for example "gmail.com".

  • PREFIX: First two letters of the domain name, for example "gm".

  • SUFFIX: Last two letters of the domain name, for example "om".

  • HIT_COUNT: Number of times the enricher has processed this host name. The enricher increments this value automatically.

  • SEED_DATA: Indicates whether the record was part of the seeded data or was created by the enricher. The value is 1 for seeded data and 0 otherwise.

  • VALID: Indicates whether the domain name is valid (1) or invalid (0). The value is N/A if the validity is unknown, for example when a new domain is added to the cache in offline mode.

  • SUGGESTION: Most recent correction found for an invalid domain.

  • FIRST_INVALID_DATE, LAST_INVALID_DATE, LAST_VALID_DATE: Dates used to re-check a domain's validity after a certain period of time.
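For experimenting locally, the table can be approximated in SQLite. The column names come from the column list above; the column types and the primary key on HOST_NAME are assumptions, since the real DDL depends on the datasource the enricher uses. The helper `prefix_suffix` is hypothetical and shows how PREFIX and SUFFIX derive from HOST_NAME.

```python
import sqlite3
from typing import Tuple

# Illustrative SQLite approximation of EXT_EMAIL_DOMAINS.
# Column types and the primary key are assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS EXT_EMAIL_DOMAINS (
    HOST_NAME          TEXT PRIMARY KEY,
    PREFIX             TEXT,
    SUFFIX             TEXT,
    HIT_COUNT          INTEGER DEFAULT 0,
    SEED_DATA          TEXT,      -- '1' seeded, '0' created by the enricher
    VALID              TEXT,      -- '1' valid, '0' invalid; unknown shown as N/A
    SUGGESTION         TEXT,
    FIRST_INVALID_DATE TIMESTAMP,
    LAST_INVALID_DATE  TIMESTAMP,
    LAST_VALID_DATE    TIMESTAMP
)
"""


def prefix_suffix(host_name: str) -> Tuple[str, str]:
    """Return the first two and last two letters of a domain name."""
    return host_name[:2], host_name[-2:]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
prefix, suffix = prefix_suffix("gmail.com")
conn.execute(
    "INSERT INTO EXT_EMAIL_DOMAINS (HOST_NAME, PREFIX, SUFFIX, SEED_DATA, VALID) "
    "VALUES (?, ?, ?, '1', '1')",
    ("gmail.com", prefix, suffix),
)
```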

Fixing Domain Names

The enricher automatically fixes invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:

  • The edit distance between the invalid domain and the cached domain.

  • The hit count of the cached domain.

A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
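The enricher's built-in algorithm is not documented beyond the two criteria above, so the following is only a sketch of one way to combine them: it computes the Levenshtein edit distance to each valid cached domain, keeps candidates within a hypothetical `max_distance` cut-off, and breaks ties by the highest hit count. The names `edit_distance` and `suggest_fix`, and the cut-off itself, are assumptions.

```python
from typing import Iterable, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest_fix(
    invalid_host: str,
    cached: Iterable[Tuple[str, int, bool]],
    max_distance: int = 2,
) -> Optional[str]:
    """Pick the closest valid cached domain; break ties by highest hit count.

    `cached` holds (host_name, hit_count, valid) tuples. `max_distance` is a
    hypothetical cut-off, not a documented enricher parameter.
    """
    best = None
    best_key = None
    for host, hits, valid in cached:
        if not valid or host == invalid_host:
            continue
        dist = edit_distance(invalid_host, host)
        if dist > max_distance:
            continue
        key = (dist, -hits)  # smaller distance first, then more hits
        if best_key is None or key < best_key:
            best, best_key = host, key
    return best
```

With a cache containing "gmail.com" (120 hits) and "gmx.com" (15 hits), the typo "gmial.com" maps to "gmail.com" (distance 2, versus 3 for "gmx.com"). A weighted score blending distance and hit count is an equally plausible design; how the enricher actually balances the two is not stated here.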

Adding Records to the Cache

You can also insert records into the cache manually, for example to add new fix suggestions.

To manually insert a correction <host_name_replacement> for an invalid domain <invalid_host_name>, use a query such as the following, replacing both placeholders with quoted string literals (for example, 'gmial.com' and 'gmail.com'):

INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	<invalid_host_name>,
	SUBSTR(<invalid_host_name>, 1, 2),
	SUBSTR(<invalid_host_name>, -2, 2),
	0,
	'1',
	'0',
	<host_name_replacement>,
	CURRENT_TIMESTAMP,
	CURRENT_TIMESTAMP
	);
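The same insert can be tried against a throwaway SQLite database before running it on the real datasource. In this sketch the table is a minimal stand-in with assumed column types, `add_correction` is a hypothetical helper, and the placeholders are bound as query parameters instead of being pasted into the SQL text. SQLite's SUBSTR accepts a negative start position, counted from the right, so the SUFFIX expression works unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for EXT_EMAIL_DOMAINS (column types are assumptions).
conn.execute(
    """CREATE TABLE EXT_EMAIL_DOMAINS (
        HOST_NAME TEXT, PREFIX TEXT, SUFFIX TEXT, HIT_COUNT INTEGER,
        SEED_DATA TEXT, VALID TEXT, SUGGESTION TEXT,
        FIRST_INVALID_DATE TIMESTAMP, LAST_INVALID_DATE TIMESTAMP,
        LAST_VALID_DATE TIMESTAMP)"""
)


def add_correction(conn: sqlite3.Connection, invalid_host: str, replacement: str) -> None:
    """Insert a manual fix suggestion using the same column values as the query above."""
    conn.execute(
        """INSERT INTO EXT_EMAIL_DOMAINS (
               HOST_NAME, PREFIX, SUFFIX, HIT_COUNT, SEED_DATA, VALID,
               SUGGESTION, FIRST_INVALID_DATE, LAST_INVALID_DATE)
           VALUES (?, SUBSTR(?, 1, 2), SUBSTR(?, -2, 2), 0, '1', '0', ?,
                   CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)""",
        (invalid_host, invalid_host, invalid_host, replacement),
    )
    conn.commit()


add_correction(conn, "gmial.com", "gmail.com")
```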

Cache Refresh

The email enricher refreshes cached records after three months; this duration is not configurable. The enricher uses the dates stored with each record to determine when the record is due, and then performs a new MX record lookup to refresh it.
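The refresh decision might look like the sketch below. The exact rule is not documented, so two assumptions are made and flagged in the code: "three months" is approximated as 90 days, and the most recent of LAST_VALID_DATE and LAST_INVALID_DATE is taken as the last check. In line with the manual-override advice in this section, a record with no dates is never refreshed. The function name `needs_refresh` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "three months" approximated as 90 days.
REFRESH_AFTER = timedelta(days=90)


def needs_refresh(
    last_valid: Optional[datetime],
    last_invalid: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True when the cached validity is old enough to re-check via MX lookup."""
    # Assumption: the most recent recorded date is the time of the last check.
    dates = [d for d in (last_valid, last_invalid) if d is not None]
    if not dates:
        # No dates recorded (for example, a manual override): leave the record alone.
        return False
    return now - max(dates) >= REFRESH_AFTER
```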

If there is good evidence that the cache is wrong about a domain's validity, or if business users are certain they want to override the cache's decision, the developer can manually set the VALID column to 0 or 1. To prevent the enricher from overwriting this manual change, also set the record's date columns (FIRST_INVALID_DATE, LAST_INVALID_DATE, and LAST_VALID_DATE) to NULL so that the enricher does not refresh the cache for that domain.
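A manual override combines both steps in a single UPDATE. The sketch runs it against a small SQLite stand-in table (column types assumed); `override_validity` and the sample domain "example-corp.com" are hypothetical.

```python
import sqlite3


def override_validity(conn: sqlite3.Connection, host_name: str, valid: bool) -> int:
    """Force VALID for one domain and clear its dates so it is not refreshed.

    Returns the number of rows updated (0 if the domain is not cached).
    """
    cur = conn.execute(
        """UPDATE EXT_EMAIL_DOMAINS
           SET VALID = ?,
               FIRST_INVALID_DATE = NULL,
               LAST_INVALID_DATE = NULL,
               LAST_VALID_DATE = NULL
           WHERE HOST_NAME = ?""",
        ("1" if valid else "0", host_name),
    )
    conn.commit()
    return cur.rowcount


# Usage against a minimal stand-in table (column types are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EXT_EMAIL_DOMAINS (HOST_NAME TEXT, VALID TEXT, "
    "FIRST_INVALID_DATE TIMESTAMP, LAST_INVALID_DATE TIMESTAMP, LAST_VALID_DATE TIMESTAMP)"
)
conn.execute(
    "INSERT INTO EXT_EMAIL_DOMAINS VALUES ('example-corp.com', '0', "
    "CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, NULL)"
)
updated = override_validity(conn, "example-corp.com", True)
```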

Developers can safely purge the cache table periodically if they want results refreshed sooner than the automatic three-month refresh. They can either drop the table entirely, or delete only the records they no longer want while keeping the seeded data and any crucial domains they have manually overridden.
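A selective purge can be sketched as a DELETE that spares seeded rows (SEED_DATA = '1') plus an explicit keep-list of overridden host names. The helper `purge_cache`, the keep-list approach, and the sample rows are assumptions for illustration. Corrections inserted with the manual query in this section also carry SEED_DATA = '1', so they survive the purge as well.

```python
import sqlite3
from typing import Iterable


def purge_cache(conn: sqlite3.Connection, keep_hosts: Iterable[str] = ()) -> int:
    """Delete non-seeded cache records, except the host names listed in keep_hosts.

    Returns the number of deleted rows.
    """
    keep = list(keep_hosts)
    # Treat a NULL SEED_DATA as "not seeded".
    sql = "DELETE FROM EXT_EMAIL_DOMAINS WHERE COALESCE(SEED_DATA, '0') <> '1'"
    if keep:
        placeholders = ", ".join("?" for _ in keep)
        sql += f" AND HOST_NAME NOT IN ({placeholders})"
    cur = conn.execute(sql, keep)
    conn.commit()
    return cur.rowcount


# Usage against a minimal stand-in table (columns reduced for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EXT_EMAIL_DOMAINS (HOST_NAME TEXT, SEED_DATA TEXT)")
conn.executemany(
    "INSERT INTO EXT_EMAIL_DOMAINS VALUES (?, ?)",
    [("gmail.com", "1"), ("gmial.com", "0"), ("example-corp.com", "0"), ("old-domain.net", "0")],
)
deleted = purge_cache(conn, keep_hosts=["example-corp.com"])
```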