Semarchy Email Enricher

The Semarchy Email Enricher standardizes and improves the quality of email addresses.

Plug-in ID

Semarchy Email Enricher - com.semarchy.engine.plugins.convergence.email

Description

This enricher takes an Input Email Address and splits it into the local-part (user name) and the domain name. Both parts are checked syntactically, and syntax errors are fixed automatically. The validity of the domain name is also checked with an MX record lookup. The plug-in uses a Domain Name Cache for faster checks and automated fixes on domain names.
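
The split-and-check step described above can be sketched as follows. This is a minimal illustration, not the plug-in's implementation: the function names `split_email` and `check_syntax` and both regular expressions are assumptions, and the patterns are deliberately simpler than the full email address grammar. The MX record lookup and the automated fixes are not covered here.

```python
import re

# Simplified patterns for illustration only; these are NOT the plug-in's actual rules.
LOCAL_RE = re.compile(
    r"^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*$"
)
DOMAIN_RE = re.compile(
    r"^(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,}$"
)


def split_email(address):
    """Split an address at the last '@' into (local_part, domain)."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep:
        # No '@' at all: treat the whole input as the local-part.
        return address.strip(), ""
    return local, domain


def check_syntax(address):
    """Return both parts plus a syntax flag for each of them."""
    local, domain = split_email(address)
    return {
        "local_part": local,
        "domain": domain,
        "valid_username_syntax": bool(LOCAL_RE.match(local)),
        "valid_domain_syntax": bool(DOMAIN_RE.match(domain)),
    }
```

For instance, `check_syntax("john..doe@semarchy")` reports both flags as false: the local-part contains two consecutive dots, and the domain has no top-level label.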

This plug-in is thread-safe and supports parallel execution.

Domain Name Cache

The plug-in uses several mechanisms for faster checks and automated fixes on domain names:

  • Domain names already validated by an MX record lookup are persisted in a domain name cache stored in a Datasource. This avoids repeating the MX lookup for these domains.

  • A list of known domains (e.g., hotmail.com, gmail.com) is automatically seeded in the host name validation cache.

  • Common domain mistakes are fixed using a seeded replace list. For example, gmai.com is automatically fixed to gmail.com using the cache.

  • Invalid domains are automatically fixed to similar valid domains already present in the cache. For example, semarcyh.com is fixed to semarchy.com as semarchy.com was previously checked as a valid domain name.

See domain name cache for more information about the domain name cache.
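
The fixing mechanisms listed above can be approximated with a short sketch. Everything in it is an assumption made for illustration: the seed data, the `fix_domain` function, the 0.85 similarity cutoff, and the use of Python's `difflib` for similarity matching. The document does not state which similarity measure the plug-in actually uses.

```python
import difflib

# Hypothetical seed data; the plug-in's real lists are not shown in this document.
KNOWN_VALID = {"gmail.com", "hotmail.com", "semarchy.com"}
REPLACEMENTS = {"gmai.com": "gmail.com"}


def fix_domain(domain, valid_domains=KNOWN_VALID, replacements=REPLACEMENTS, cutoff=0.85):
    """Return a corrected domain, or the normalized input if no fix applies."""
    domain = domain.strip().lower()
    if domain in valid_domains:
        return domain  # already known as valid, nothing to fix
    if domain in replacements:
        return replacements[domain]  # common mistake from the replace list
    # Fall back to the most similar valid domain already in the cache.
    matches = difflib.get_close_matches(domain, sorted(valid_domains), n=1, cutoff=cutoff)
    return matches[0] if matches else domain
```

With this seed data, `fix_domain("semarcyh.com")` resolves to `semarchy.com` through the similarity fallback, while an unrelated domain such as `unknown-domain.org` is returned unchanged.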

Plug-in Parameters

The following table lists the plug-in parameters.

| Parameter Name | Mandatory | Type | Description |
|---|---|---|---|
| Datasource | No | String | Name of the Datasource used to store the host name validation cache. This datasource must be configured in the platform. If no datasource is specified, the data location's datasource is used. |
| Lowercase User Name | No | String | Set to '1' to transform the local-part (user name) to lowercase in the cleansed email address. |
| Offline Mode | No | String | Set to '1' to query only the local domain cache. In this mode, the plug-in does not perform the MX record lookup. |
| Processing Mode | No | String | Processing mode: DATABASE (default) or MEMORY. MEMORY mode is faster but requires more memory because it loads the entire host name validation cache into memory. |
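
The trade-off between the two processing modes can be illustrated with a small sketch. It is only an analogy under stated assumptions: SQLite stands in for the Datasource, and the `domain_cache` table, its columns, and the `make_lookup` function are hypothetical names, not the plug-in's actual schema or API.

```python
import sqlite3


def make_lookup(conn, mode="DATABASE"):
    """Return a function mapping a domain to its cached flag ('1'/'0') or None."""
    if mode == "MEMORY":
        # Load the whole cache once: fast lookups, memory grows with cache size.
        cache = dict(conn.execute("SELECT domain, valid FROM domain_cache"))
        return cache.get

    def lookup(domain):
        # DATABASE mode: one query per lookup, nothing held in memory.
        row = conn.execute(
            "SELECT valid FROM domain_cache WHERE domain = ?", (domain,)
        ).fetchone()
        return row[0] if row else None

    return lookup


# Hypothetical cache content used only to exercise both modes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_cache (domain TEXT PRIMARY KEY, valid TEXT)")
conn.executemany(
    "INSERT INTO domain_cache VALUES (?, ?)",
    [("gmail.com", "1"), ("gmai.com", "0")],
)
db_lookup = make_lookup(conn, "DATABASE")
mem_lookup = make_lookup(conn, "MEMORY")
```

Both lookups return the same answers; they differ only in where the work happens, which is the speed versus memory trade-off the Processing Mode parameter describes.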

Plug-in Inputs

The following table lists the plug-in inputs.

| Input Name | Mandatory | Type | Description |
|---|---|---|---|
| Input Email Address | Yes | String | Input email address to cleanse. |

Plug-in Outputs

The following table lists the plug-in outputs.

| Output Name | Type | Description |
|---|---|---|
| Cleansed Email Address | String | Cleansed email address returned by the enricher. This address may or may not be valid. Its syntactic validity and domain name validity are indicated by the other plug-in outputs. |
| Valid Domain | String | Flag (0 or 1) indicating whether the domain name in the cleansed email address is valid (based on syntax and MX record lookup). In Offline mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns null if the domain name does not exist in the cache and the MX lookup was not issued. |
| Valid Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid in the cleansed email address. |
| Valid Email Syntax | String | Flag (0 or 1) indicating whether the cleansed email address is syntactically valid. |
| Valid Username Syntax | String | Flag (0 or 1) indicating whether the local-part (user name) syntax is valid in the cleansed email address. |
| Valid Input Domain | String | Flag (0 or 1) indicating whether the domain name in the input email address is valid (based on syntax and MX record lookup). In Offline mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns null if the domain name does not exist in the cache and the MX lookup was not issued. |
| Valid Input Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid in the input email address. |
| Valid Input Email Syntax | String | Flag (0 or 1) indicating whether the input email address is syntactically valid. |
| Valid Input Username Syntax | String | Flag (0 or 1) indicating whether the local-part (user name) syntax is valid in the input email address. |
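
The three possible values of the Valid Domain and Valid Input Domain outputs (1, 0, or null) can be summarized in a short sketch. It is an interpretation of the descriptions above, not the plug-in's code: `valid_domain_flag` is a hypothetical function, the cache is modeled as a plain dictionary, and `mx_lookup` is a caller-supplied stand-in for the real DNS query. The sketch also assumes that only successful lookups are written back to the cache.

```python
def valid_domain_flag(domain, cache, offline, mx_lookup=None):
    """Return '1', '0', or None (null) for a domain name.

    cache maps a domain to True (valid) or False (invalid).
    mx_lookup is a hypothetical callable returning a truthy value
    when the domain has MX records.
    """
    if domain in cache:
        # The cache answers first in both online and offline modes.
        return "1" if cache[domain] else "0"
    if offline:
        # Not in the cache and no MX lookup issued: the result is null.
        return None
    has_mx = bool(mx_lookup(domain))
    if has_mx:
        # Assumption: only domains checked as valid are persisted.
        cache[domain] = True
    return "1" if has_mx else "0"
```

A consumer of these outputs should therefore treat null as "unknown" rather than "invalid" when Offline Mode is enabled.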