Email enricher domain name cache

The Email enricher uses a local cache to prevent redundant MX record lookups when checking the validity of an email domain.
This domain name cache takes precedence, meaning that if a record exists in the cache, the enricher will use the locally available information, thus bypassing the need for an MX records lookup.

The plugin saves the cache in a table named EXT_EMAIL_DOMAINS. This table is created during the first execution of the enricher and is stored by default in the data location served by the enricher. You can designate a specific datasource location to store this table by configuring the enricher’s Datasource parameter.

Domain name cache table structure

The structure of the EXT_EMAIL_DOMAINS table is the following:

Column name Description

HOST_NAME

Domain name (e.g., gmail.com).

PREFIX

First two letters of the domain name (e.g., gm).

SUFFIX

Last two letters of the domain name (e.g., om).

HIT_COUNT

Number of times this hostname was processed by the enricher. This value is automatically incremented by the enricher.

SEED_DATA

Indicates whether this record was part of the seeded data, or created by the enricher. The value is 1 for seeded data, 0 otherwise.

VALID

Indicates whether the domain name is valid 1 or invalid 0. The value is N/A if the validity is unknown (e.g., when a new domain is added in the cache in offline mode).

SUGGESTION

Latest correction found for an invalid domain.

FIRST_INVALID_DATE
LAST_INVALID_DATE
LAST_VALID_DATE

Additional date information used to reconsider a domain validity after a certain period.

Correcting domain names

The enricher automatically rectifies invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:

  • The edit distance between the invalid domain and cached domain.

  • The hit count of the cached domain.

A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.

Adding records to the cache

It is possible to force the creation of new records in the cache (e.g., to create new fix suggestions).

To manually insert a domain correction <domain_name_replacement> for a <domain_host_name> invalid domain, use the following query sample:

INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	'<invalid_host_name>',
	SUBSTR('<invalid_host_name>', 0, 2),
	SUBSTR('<invalid_host_name>', -2, 2),
	0,
	'1',
	'0',
	'<host_name_replacement>',
	CURRENT_TIMESTAMP,
	CURRENT_TIMESTAMP
	);
For PostgreSQL
INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	'<invalid_host_name>',
	LEFT('<invalid_host_name>', 2),
	RIGHT('<invalid_host_name>', 2),
	0,
	'1',
	'0',
	'<host_name_replacement>',
	NOW(),
	NOW()
	);
For SQL Server
INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	'<invalid_host_name>',
	LEFT('<invalid_host_name>', 2),
	RIGHT('<invalid_host_name>', 2),
	0,
	'1',
	'0',
	'<host_name_replacement>',
	GETDATE(),
	GETDATE()
	);
In online mode, MX record lookups are resolved by the DNS and flag any valid domain (such as gail.com) as valid. Consequently, the Email enricher does not apply user-created suggestions to replace such domains. Suggestions are only applied to syntactically invalid domains in offline mode or non-existing domains in online mode.

Cache refresh

The Email enricher refreshes the local cache records after three months. This time duration is not configurable. The cache records the date information and will make a new call to the MX server to refresh the cache.

If there is good evidence that the cache is wrong about a domain’s validity, or if business users are certain they want to override the cache’s decision, the developer can set the Valid flag to 0 or 1 manually. To avoid the cache overriding this manual change, it is also important to set the date field to NULL so that the email enricher does not refresh the cache for that domain.

It is safe for developers to periodically truncate the cache table if they want the cache to refresh its results sooner than the three-month period when the enricher automatically refreshes the cache. Developers can either drop the table entirely or delete the values they do not want and keep the seeded data as well as any other crucial domains they have manually overridden to keep this information.