Semarchy Email enricher

The Semarchy Email enricher standardizes and improves the quality of email addresses.

Plug-in ID

Semarchy Email Enricher - com.semarchy.engine.plugins.convergence.email

Description

This enricher splits an input email address into a local-part (username) and a domain name, and automatically corrects syntax errors in either part. It then verifies the validity of the domain name using an MX records lookup. To speed up checks and automated fixes on domain names, the plug-in uses a domain name cache.
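The split into local-part and domain name can be pictured with the following minimal Python sketch. It is an illustration only, not the plug-in's implementation: the function names `split_email` and `syntax_flags` are hypothetical, and the two regular expressions are simplified assumptions that are stricter than the full email syntax rules.

```python
import re

# Simplified patterns for illustration only (assumptions, not the plug-in's rules).
LOCAL_RE = re.compile(
    r"^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*$"
)
DOMAIN_RE = re.compile(
    r"^([A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def split_email(address, lowercase_user_name=False):
    """Split an address at its last '@' into (local_part, domain)."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    if lowercase_user_name:
        local = local.lower()
    return local, domain


def syntax_flags(address):
    """Return syntax checks for the username, the domain, and the whole address."""
    local, domain = split_email(address)
    valid_user = bool(LOCAL_RE.match(local))
    valid_domain = bool(DOMAIN_RE.match(domain))
    return {
        "username": valid_user,
        "domain": valid_domain,
        "email": valid_user and valid_domain,
    }
```

In this sketch, the `lowercase_user_name` argument mirrors the idea of the Lowercase User Name parameter: the domain name is always lowercased, while the local-part is lowercased only on request.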

This plug-in is thread-safe and supports parallel execution.

Domain name cache

The plug-in uses several mechanisms for faster checks and automated fixes on domain names:

  • Domain names already checked as valid (based on MX records lookup) are persisted in a domain name cache stored in a datasource. This avoids repeating MX lookups.

  • A list of known domains (such as hotmail.com and gmail.com) is automatically seeded in the hostname validation cache.

  • Common domain mistakes are fixed using a seeded replace list. For example, gmai.com is automatically fixed to gmail.com using the cache.

  • Invalid domains are automatically fixed to similar valid domains already present in the cache. For example, semarcyh.com is fixed to semarchy.com because semarchy.com was previously checked as a valid domain name.

For more information about the domain name cache, see Domain name cache.
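The mechanisms listed above can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the seed data, the function name `fix_domain`, and the `cutoff` threshold are hypothetical, and the similarity measure (the standard library's `difflib`) is a stand-in, since the plug-in's actual matching algorithm is not described here.

```python
import difflib

# Hypothetical seed data; the real plug-in ships and maintains its own lists.
KNOWN_VALID = {"gmail.com", "hotmail.com", "semarchy.com"}
REPLACEMENTS = {"gmai.com": "gmail.com"}


def fix_domain(domain, valid_cache=KNOWN_VALID, replacements=REPLACEMENTS, cutoff=0.85):
    """Return a corrected domain name, or the input unchanged if no fix applies."""
    domain = domain.lower()
    # 1. Already known as valid: nothing to fix.
    if domain in valid_cache:
        return domain
    # 2. Known common mistake: use the seeded replace list.
    if domain in replacements:
        return replacements[domain]
    # 3. Otherwise, look for a sufficiently similar valid domain in the cache.
    matches = difflib.get_close_matches(domain, sorted(valid_cache), n=1, cutoff=cutoff)
    return matches[0] if matches else domain
```

For example, `fix_domain("gmai.com")` goes through the replace list, whereas `fix_domain("semarcyh.com")` falls through to the similarity step and resolves to the cached semarchy.com. A high cutoff keeps unrelated domains from being rewritten.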

Plug-in parameters

The following table lists the plug-in parameters.

| Parameter name | Mandatory | Type | Description |
|---|---|---|---|
| Datasource | No | String | Name of the datasource used to store the hostname validation cache. This datasource must be configured in the platform. If no datasource is specified, the data location’s datasource is used. |
| Lowercase User Name | No | String | Set to 1 to transform the local-part (username) to lowercase in the cleansed email address. |
| Offline Mode | No | String | Set to 1 to query only the local domain cache. In this mode, the plug-in does not perform the MX records lookup. |
| Processing Mode | No | String | DATABASE (default) or MEMORY. MEMORY mode is faster but requires more memory because it loads the entire hostname validation cache into memory. |
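The effect of Offline Mode on domain validation can be illustrated with the Python sketch below. It is not the plug-in's code: `check_domain` and the injected `mx_lookup` callable are hypothetical names, the cache is modeled as a plain dictionary, and storing negative lookup results in the cache is an assumption made for the example.

```python
def check_domain(domain, cache, offline=False, mx_lookup=None):
    """Return True/False for a known verdict, or None when no check was possible.

    cache     -- dict mapping domain name to True (valid) or False (invalid)
    offline   -- when True, only the cache is consulted
    mx_lookup -- hypothetical callable returning a truthy value if MX records exist
    """
    # Cached verdicts are returned without any network access.
    if domain in cache:
        return cache[domain]
    # Offline mode (or no resolver available): the domain stays unchecked.
    if offline or mx_lookup is None:
        return None
    # Online mode: perform the lookup and remember the verdict (assumption).
    result = bool(mx_lookup(domain))
    cache[domain] = result
    return result
```

The `None` result corresponds to the case where a domain is absent from the cache and no MX lookup is issued. Passing the resolver as an argument keeps the sketch independent of any particular DNS library.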

Plug-in inputs

The following table lists the plug-in inputs.

| Input name | Mandatory | Type | Description |
|---|---|---|---|
| Input Email Address | Yes | String | Input email address to cleanse. |

Plug-in outputs

The following table lists the plug-in outputs.

| Output name | Type | Description |
|---|---|---|
| Cleansed Email Address | String | Cleansed email address returned by the enricher. This address is not necessarily valid: the other plug-in outputs indicate its syntactic validity and its domain name validity. |
| Valid Domain | String | Flag (0 or 1) indicating whether the domain name in the cleansed email address is valid, based on syntax and MX records lookup. In Offline mode, this output returns 1 or 0 when the domain name appears in the local domain cache as valid or invalid, respectively. It returns null if the domain name does not exist in the cache and no MX lookup was issued. |
| Valid Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid in the cleansed email address. |
| Valid Email Syntax | String | Flag (0 or 1) indicating whether the cleansed email address is syntactically valid. |
| Valid Username Syntax | String | Flag (0 or 1) indicating whether the local-part (username) syntax is valid in the cleansed email address. |
| Valid Input Domain | String | Flag (0 or 1) indicating whether the domain name in the input email address is valid, based on syntax and MX records lookup. In Offline mode, this output returns 1 or 0 when the domain name appears in the local domain cache as valid or invalid, respectively. It returns null if the domain name does not exist in the cache and no MX lookup was issued. |
| Valid Input Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid in the input email address. |
| Valid Input Email Syntax | String | Flag (0 or 1) indicating whether the input email address is syntactically valid. |
| Valid Input Username Syntax | String | Flag (0 or 1) indicating whether the local-part (username) syntax is valid in the input email address. |
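Because the flags are returned as strings and the domain flags may be null, downstream logic has to handle three states. The following Python sketch shows one possible way to interpret them. The functions `parse_flag` and `classify`, the result labels, and the idea of receiving the outputs as a dictionary keyed by output name are all assumptions made for illustration.

```python
def parse_flag(value):
    """Map a string flag to True/False, keeping None for a null value."""
    if value is None:
        return None
    return value == "1"


def classify(outputs):
    """Summarize the cleansed address from a dict of plug-in outputs (hypothetical shape)."""
    syntax_ok = parse_flag(outputs.get("Valid Email Syntax"))
    domain_ok = parse_flag(outputs.get("Valid Domain"))
    if not syntax_ok:
        return "invalid syntax"
    if domain_ok is None:
        # Typical in Offline mode when the domain is not in the cache.
        return "domain unchecked"
    return "valid domain" if domain_ok else "invalid domain"
```

Treating a null domain flag as "unchecked" rather than "invalid" avoids rejecting addresses only because no MX lookup was issued.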