Welcome to Semarchy xDM.

Preface

Overview

This guide provides reference information about the plug-ins delivered with the Semarchy xDM Platform.
Using this guide, you will learn how to use these plug-ins in your MDM projects.

Audience

This document is intended for integration architects and developers setting up an MDM hub as part of their enterprise integration architecture.

To discover Semarchy xDM, you can watch our tutorials.
The Semarchy xDM Documentation Library, including the development, administration, and installation guides, is available online.

Document Conventions

This document uses the following formatting conventions:

  • boldface: Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept.
  • italic: Italic type indicates special emphasis or a placeholder variable that you need to provide.
  • monospace: Monospace type indicates code examples, text, or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: https://www.semarchy.com.

Obtaining Help

There are several ways to contact Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see https://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find an error or have a suggestion for improvement, please email support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Introduction to Semarchy xDM

Semarchy xDM is the Intelligent Data Hub platform for Master Data Management (MDM), Reference Data Management (RDM), Application Data Management (ADM), Data Quality, and Data Governance.
It provides all the features for data quality, data validation, data matching, de-duplication, data authoring, workflows, and more.

Semarchy xDM brings extreme agility for defining and implementing data management applications and releasing them to production. The platform can be used as the target deployment point for all the data in the enterprise or in conjunction with existing data hubs to contribute to data transparency and quality.
Its powerful and intuitive environment covers all use cases for setting up a successful data governance strategy.

Semarchy xDM Plug-ins

Semarchy xDM implements plug-ins that use external services or information systems to contribute to the master data processing and enrichment.

Plug-ins are used in Semarchy xDM in:

  • Enrichers: By adding new enrichers, you can perform record-level enrichment to update, augment or standardize existing attribute values, or create content in new attributes. For example, you can connect to an external web service to retrieve stock ticker symbols from company names.
  • Validations: By adding new validations, you can perform record-level checks, that is, check the values of attributes in a record against complex rules. For example, you can connect to an external provider to check whether a billing or shipping address is valid.

INFO: Using Plug-ins is explained in the Semarchy xDM Developer’s Guide, in the Certification Process Design chapter. Installing plug-ins to your Semarchy xDM instance is explained in the Semarchy xDM Administration Guide, in the Configuring the Platform chapter.

The plug-ins are designed using the Open Plug-In Architecture. Plug-in design is covered in the Semarchy xDM Plug-in Development Guide.

Text Normalization and Transliteration

This plug-in applies normalization, transliteration and phonetic transformations to text strings.

Semarchy Text Enricher

Plug-in ID

Semarchy Text Enricher - com.semarchy.engine.plugins.convergence.text

Description

This enricher applies normalization, transliteration and phonetic transformations to text strings. It takes an Input Text and applies an Input Filter to this text, for example to remove all characters but letters. Then it applies a series of transformations defined in the Transformation parameter and returns a Transformed Text.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following list describes the plug-in parameters.

  • Input Filter (Mandatory: No; Type: String): Filter applied to the input text before the transformation. Valid values are NONE, which applies no filter; LETTERS, which removes all non-letter characters from the input string; and STANDARD, which tokenizes the input text by splitting words.
  • Transformation (Mandatory: Yes; Type: String): A pipe-separated sequence of transformation definitions. See the Transformations section for a detailed description of each transformation. Transformations include:
    • NORMALIZE
    • TRANSLITERATE [<Id>]
    • PHONETIC <Type> [<MaxCodeLength>]
    • BEIDERMORSE [Split] [RuleType] [MaxPhonemes] [NameType]
    • DOUBLEMETAPHONE [<max_code_length>] [split]
  • Synonyms Separator (Mandatory: No; Type: String): Separator used between the synonyms returned by the enricher. Default value is a pipe (|).

Plug-in Inputs

The following list describes the plug-in inputs.

  • Input Text (Mandatory: Yes; Type: String): Text to transform.

Plug-in Outputs

The following list describes the plug-in outputs.

  • Transformed Text (Type: String): Filtered and transformed text.
  • Secondary Transformed Text (Type: String): Secondary transformed text. This text may contain the secondary result of a Beidermorse or Double Metaphone transformation. See Other Transformations for more information.

Input Filters

The following input filters are supported by the enricher:

  • NONE: No filter is applied to the input text.
  • LETTERS: Removes all non-letter characters from the input string.
  • STANDARD: Breaks words in the input text according to the rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

Transformations

The following transformation definitions are supported by the enricher:

  • Normalization
    • NORMALIZE: applies the Normalization transformation to the string.
  • Phonetic Transformation
    • PHONETIC [SOUNDEX | REFINEDSOUNDEX | METAPHONE [<max_code_length>] | DOUBLEMETAPHONE [<max_code_length>] | CAVERPHONE | CAVERPHONE1 | NYSIIS | MRA | COLOGNE | BEIDERMORSE ]: applies one of the Phonetic Transformations to the string.
  • Other Transformations
    • BEIDERMORSE [Split] [RuleType] [MaxPhonemes] [NameType]
    • DOUBLEMETAPHONE [<max_code_length>] [split]
  • Transliteration
    • TRANSLITERATE [<ID>]: applies a Transliteration transformation to the string. The transliteration is identified by an ID. If no ID is provided, the Any-Latin transliteration is used.

It is possible to sequence transformations. Successive transformations are separated by a pipe | sign.
Examples of transformations:

  • Normalize and apply Phonetic Soundex: NORMALIZE | PHONETIC SOUNDEX
  • Normalize and then transliterate to Latin script: NORMALIZE | TRANSLITERATE Any-Latin
  • Normalize, transliterate to Latin script and then apply Metaphone with a maximum resulting length of 5 characters: NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5
  • Perform a BEIDERMORSE transformation for family names with an approximate transformation on generic name types: BEIDERMORSE APPROX 10 FALSE GENERIC
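
The sequencing rule above can be illustrated with a small sketch. It is not the plug-in's parser: the function name `parse_transformations` is hypothetical, and the sketch only assumes what the text states, namely that steps are separated by a pipe and that each step is a keyword followed by optional space-separated arguments.

```python
def parse_transformations(definition):
    """Split a pipe-separated definition into (keyword, arguments) steps."""
    steps = []
    for part in definition.split("|"):
        tokens = part.split()
        if not tokens:
            # Two consecutive pipes or a trailing pipe leave an empty step.
            raise ValueError("empty transformation in sequence")
        steps.append((tokens[0].upper(), tokens[1:]))
    return steps


steps = parse_transformations(
    "NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5"
)
for keyword, arguments in steps:
    print(keyword, arguments)
```

Each step would then be applied in order, the output of one step being the input of the next.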

Normalization

The NORMALIZE transformation normalizes the string by applying a series of transformations, which map similar characters to a common target, to ignore certain distinctions between similar characters. This includes accent removal, case folding, etc.

Examples of transformations (original text → normalized text):

  • ‒ – — ― → - - - - (four different dashes converted to four identical dashes)
  • AbSoLuteLy TRUE → absolutely true (case folding)
  • … → ... (the single ellipsis character is converted to three dots)
  • ½ Tsp → 1/2 tsp (symbol folding)
  • Æsop → aesop
  • Äsop → asop
  • Dürst → durst
  • Encyclopædia → encyclopaedia
  • œuvre → oeuvre
  • poſt → post
  • résumé français → resume francais (accent removal and case folding)
  • Straße → strasse
  • ٣ is a magic number → 3 is a magic number (native digit folding)
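
As an illustration of two of these foldings, the sketch below approximates accent removal and case folding with the Python standard library. It is an assumption-laden approximation, not the plug-in's implementation: the function name `fold` is hypothetical, and foldings such as dash, symbol, multigraph, or native digit folding are deliberately not covered.

```python
import unicodedata


def fold(text):
    """Approximate NORMALIZE: case folding plus accent removal only."""
    # casefold() lowercases and expands characters such as "ß" to "ss".
    folded = text.casefold()
    # NFKD separates base letters from their combining accents.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop the combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for sample in ("résumé français", "Straße", "Dürst", "AbSoLuteLy TRUE"):
    print(sample, "->", fold(sample))
```

Characters such as Æ or ½ are not folded by this sketch in the way the examples above show, which is why a complete folding needs the full set of transformations listed below.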

The complete list of transformations is given below:

Accent removal

Hebrew Alternates folding

Overline folding

Suzhou Numeral folding

Case folding

Jamo folding

Positional forms folding

Symbol folding

Canonical duplicates folding

Letterforms folding

Small forms folding

Underline folding

Dashes folding

Math symbol folding

Space folding

Vertical forms folding

Diacritic removal (including stroke, hook, descender)

Multigraph Expansions: All

Spacing Accents folding

Width folding

Greek letterforms folding

Native digit folding

Subscript folding

Han Radical folding

For more information about these transformations, see Unicode Technical Report #30 (UTR #30), Character Foldings.

Phonetic Transformations

A phonetic transformation converts the string into a string that represents its pronunciation. The default phonetic transformation is PHONETIC METAPHONE.

Phonetic transformations include:

  • PHONETIC SOUNDEX and PHONETIC REFINEDSOUNDEX: Phonetic algorithms for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
  • PHONETIC METAPHONE and PHONETIC DOUBLEMETAPHONE: Algorithms for indexing words by their English pronunciation. They are suitable for use with most English words, not just names. Double Metaphone can return both a primary and a secondary code for an input string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. These algorithms support a Max Code Length parameter, which defines the maximum length of the encoded result. This value defaults to 4.
  • PHONETIC CAVERPHONE and PHONETIC CAVERPHONE1: Algorithms for data matching for electoral rolls, optimized for accents present in parts of New Zealand.
  • PHONETIC NYSIIS: New York State Identification and Intelligence System (NYSIIS), which maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding.
  • PHONETIC MRA: Match Rating Approach developed by Western Airlines. This algorithm has an encoding and range comparison technique.
  • PHONETIC COLOGNE: Phonetic algorithm optimized for the German language (Kölner Phonetik).
  • PHONETIC BEIDERMORSE: Phonetic algorithm supporting greater accuracy in matching Slavic and Yiddish surnames with similar pronunciation but differences in spelling. It returns a list of tokens (separated by the string specified in the Synonyms Separator parameter): first the transformed input text, then the transformed synonyms of the input text.
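
To show how homophones end up with the same code, here is a minimal sketch of the classic American Soundex rules. It is an illustration only: the variant and implementation used by the plug-in are not specified in this guide, and the function name `soundex` is hypothetical.

```python
# Letter groups sharing the same Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Encode a word as one letter followed by three digits."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Both names are encoded to the same value, so a match rule comparing the transformed text would treat them as candidates despite the spelling difference.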

Other Transformations

These transformations return a list of tokens, which can be split between the Transformed Text and Secondary Transformed Text outputs.

They should preferably be used at the end of the transformation sequence, because their secondary transformed text is not processed by subsequent transformations in the sequence.

Other transformations include:

  • BEIDERMORSE [<split>] [<rule_type>] [<max_phonems>] [<name_type>]: The Beidermorse transformation returns a list of tokens: first the transformed input text, then the transformed synonyms of the input text. Beidermorse supports the following parameters:
    • split: If this parameter is set to true, all synonyms after the first one are concatenated in the Secondary Transformed Text output. If it is set to false (default value), all synonyms are appended to the first token in the Transformed Text output.
    • rule_type: EXACT for an exact phonetic transformation, or APPROX for an approximate one.
    • max_phonems: Maximum number of synonyms returned. Default is 20.
    • name_type: Default value is GENERIC. Use ASHKENAZI or SEPHARDIC if you specifically want phonetic encodings optimized for Ashkenazi or Sephardic Jewish family names.
  • DOUBLEMETAPHONE [<max_code_length>] [<split>]: This transformation encodes the input string with the Double Metaphone algorithm and returns a primary code and a secondary code. If split is set to true, the secondary code is pushed to the Secondary Transformed Text output. Otherwise, it is concatenated to the primary code in the Transformed Text output.
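
The effect of the split option on the two outputs can be sketched as follows. The function name `route_tokens` and the placeholder tokens are hypothetical, and joining tokens with the Synonyms Separator is an assumption drawn from the parameter description, not a statement about the plug-in's internals.

```python
def route_tokens(tokens, split, separator="|"):
    """Return (Transformed Text, Secondary Transformed Text) for a token list."""
    if not tokens:
        return (None, None)
    if split:
        # First token goes to the primary output, the rest to the secondary one.
        secondary = separator.join(tokens[1:]) or None
        return (tokens[0], secondary)
    # Without split, every token is appended to the primary output.
    return (separator.join(tokens), None)


# Placeholder tokens, not real phonetic codes.
tokens = ["code1", "code2", "code3"]
print(route_tokens(tokens, split=True))
print(route_tokens(tokens, split=False))
```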

Transliteration

The TRANSLITERATE transformation converts a text from one script to another, for example, Traditional to Simplified Chinese, Japanese Hiragana to Katakana, or Cyrillic to Latin script.
Each source/target transliteration is identified by an ID. The supported transliteration IDs are provided in the list below. If no ID is provided, the Any-Latin transliteration is used.

Each ID represents a transliteration from one script/language to another, for example, Katakana-Latin or Latin-Thai. The special tag Any stands for any script/language. For example, Any-Latin converts any input script to Latin script.

Accents-Any

Any-Name

Devanagari-Bengali

Han-Latin

Latin-Greek

Pinyin-NumericPinyin

Amharic-Latin/BGN

Any-NFC

Devanagari-Gujarati

Han-Latin/Names

Latin-Greek/UNGEGN

pl_FONIPA-ja

Any-Accents

Any-NFD

Devanagari-Gurmukhi

Hangul-Latin

Latin-Gujarati

pl-ja

Any-am

Any-NFKC

Devanagari-Kannada

Hans-Hant

Latin-Gurmukhi

pl-pl_FONIPA

Any-Arabic

Any-NFKD

Devanagari-Latin

Hant-Hans

Latin-Han

Publishing-Any

Any-Armenian

Any-Null

Devanagari-Malayalam

Hebrew-Latin

Latin-Hangul

ro_FONIPA-ja

Any-Bengali

Any-Oriya

Devanagari-Oriya

Hebrew-Latin/BGN

Latin-Hebrew

ro-ja

Any-Bopomofo

Any-pl_FONIPA

Devanagari-Tamil

Hex-Any

Latin-Hiragana

ro-ro_FONIPA

Any-CaseFold

Any-Publishing

Devanagari-Telugu

Hex-Any/C

Latin-Jamo

ru-ja

Any-cs_FONIPA

Any-Remove

Digit-Tone

Hex-Any/Java

Latin-Kannada

ru-zh

Any-Cyrillic

Any-ro_FONIPA

es_419-ja

Hex-Any/Perl

Latin-Katakana

Russian-Latin/BGN

Any-Devanagari

Any-ru

es_419-zh

Hex-Any/Unicode

Latin-Malayalam

Serbian-Latin/BGN

Any-es_419_FONIPA

Any-sk_FONIPA

es_FONIPA-am

Hex-Any/XML

Latin-NumericPinyin

Simplified-Traditional

Any-es_FONIPA

Any-Syriac

es_FONIPA-es_419_FONIPA

Hex-Any/XML10

Latin-Oriya

sk_FONIPA-ja

Any-FCC

Any-Tamil

es_FONIPA-ja

Hiragana-Katakana

Latin-Syriac

sk-ja

Any-FCD

Any-Telugu

es_FONIPA-zh

Hiragana-Latin

Latin-Tamil

sk-sk_FONIPA

Any-Georgian

Any-Thaana

es-am

IPA-XSampa

Latin-Telugu

Syriac-Latin

Any-Greek

Any-Thai

es-es_FONIPA

it-am

Latin-Thaana

Tamil-Bengali

Any-Greek/UNGEGN

Any-Title

es-ja

it-ja

Latin-Thai

Tamil-Devanagari

Any-Gujarati

Any-Upper

es-zh

ja_Latn-ko

Macedonian-Latin/BGN

Tamil-Gujarati

Any-Gurmukhi

Any-zh

Fullwidth-Halfwidth

ja_Latn-ru

Malayalam-Bengali

Tamil-Gurmukhi

Any-Han

Arabic-Latin

Georgian-Latin

Jamo-Latin

Malayalam-Devanagari

Tamil-Kannada

Any-Hangul

Arabic-Latin/BGN

Georgian-Latin/BGN

JapaneseKana-Latin/BGN

Malayalam-Gujarati

Tamil-Latin

Any-Hans

Armenian-Latin

Greek-Latin

Kannada-Bengali

Malayalam-Gurmukhi

Tamil-Malayalam

Any-Hant

Armenian-Latin/BGN

Greek-Latin/BGN

Kannada-Devanagari

Malayalam-Kannada

Tamil-Oriya

Any-Hebrew

ASCII-Latin

Greek-Latin/UNGEGN

Kannada-Gujarati

Malayalam-Latin

Tamil-Telugu

Any-Hex

Azerbaijani-Latin/BGN

Gujarati-Bengali

Kannada-Gurmukhi

Malayalam-Oriya

Telugu-Bengali

Any-Hex/C

Belarusian-Latin/BGN

Gujarati-Devanagari

Kannada-Latin

Malayalam-Tamil

Telugu-Devanagari

Any-Hex/Java

Bengali-Devanagari

Gujarati-Gurmukhi

Kannada-Malayalam

Malayalam-Telugu

Telugu-Gujarati

Any-Hex/Perl

Bengali-Gujarati

Gujarati-Kannada

Kannada-Oriya

Maldivian-Latin/BGN

Telugu-Gurmukhi

Any-Hex/Plain

Bengali-Gurmukhi

Gujarati-Latin

Kannada-Tamil

Mongolian-Latin/BGN

Telugu-Kannada

Any-Hex/Unicode

Bengali-Kannada

Gujarati-Malayalam

Kannada-Telugu

Name-Any

Telugu-Latin

Any-Hex/XML

Bengali-Latin

Gujarati-Oriya

Katakana-Hiragana

NumericPinyin-Latin

Telugu-Malayalam

Any-Hex/XML10

Bengali-Malayalam

Gujarati-Tamil

Katakana-Latin

NumericPinyin-Pinyin

Telugu-Oriya

Any-Hiragana

Bengali-Oriya

Gujarati-Telugu

Kazakh-Latin/BGN

Oriya-Bengali

Telugu-Tamil

Any-ja

Bengali-Tamil

Gurmukhi-Bengali

Kirghiz-Latin/BGN

Oriya-Devanagari

Thaana-Latin

Any-Kannada

Bengali-Telugu

Gurmukhi-Devanagari

Korean-Latin/BGN

Oriya-Gujarati

Thai-Latin

Any-Katakana

Bopomofo-Latin

Gurmukhi-Gujarati

Latin-Arabic

Oriya-Gurmukhi

Tone-Digit

Any-ko

Bulgarian-Latin/BGN

Gurmukhi-Kannada

Latin-Armenian

Oriya-Kannada

Traditional-Simplified

Any-Latin (default)

cs_FONIPA-ja

Gurmukhi-Latin

Latin-ASCII

Oriya-Latin

Turkmen-Latin/BGN

Any-Latin/BGN

cs_FONIPA-ko

Gurmukhi-Malayalam

Latin-Bengali

Oriya-Malayalam

Ukrainian-Latin/BGN

Any-Latin/Names

cs-cs_FONIPA

Gurmukhi-Oriya

Latin-Bopomofo

Oriya-Tamil

Uzbek-Latin/BGN

Any-Latin/UNGEGN

cs-ja

Gurmukhi-Tamil

Latin-Cyrillic

Oriya-Telugu

XSampa-IPA

Any-Lower

cs-ko

Gurmukhi-Telugu

Latin-Devanagari

Pashto-Latin/BGN

zh_Latn_PINYIN-ru

Any-Malayalam

Cyrillic-Latin

Halfwidth-Fullwidth

Latin-Georgian

Persian-Latin/BGN

Lookup

This plug-in performs a data lookup on a mapping table.

Semarchy Lookup Enricher

Plug-in ID

Semarchy Lookup Enricher - com.semarchy.engine.plugins.convergence.text

Description

This enricher performs a data lookup on a mapping table accessed via a JDBC datasource.

The mapping table is located in a datasource provided using the Datasource parameter, which defaults to the data location’s datasource. The mapping table is declared to the enricher in one of two ways:

  • By giving a Mapping Table as well as a Lookup Column and a list of (up to 20) Output Columns from this table. The input lookup value is searched in the Lookup Column and the corresponding values from the Output Columns are returned.
  • By giving a Custom SQL select statement executed on the datasource, which must return columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN1, …, OUTPUT_COLUMN20. These columns are used as the lookup and output columns.

You must either set Mapping Table, Lookup Column, and Output Columns, or set only Custom SQL.

The lookup is performed on the mapping table with an optional memory cache configured with the Cache Lookup Data parameter.

When a null value is passed as the Lookup Value or when the lookup finds no matching value in the lookup column, the enricher returns the Fallback Value or the Lookup Value, depending on the Fallback Behavior parameter.

The lookup value expected and the output values emitted by this plug-in are string values. Any other datatype passed as the input should be converted to a string using SemQL, and outputs should be mapped to string attributes. Output values mapped to non-string output attributes rely on the database implicit conversion, which may give unexpected results.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following list describes the plug-in parameters.

  • Cache Lookup Data (Mandatory: No; Type: String): Use this parameter to optionally use a memory cache for the lookup process. Possible values are:
    • NO_CACHE: Do not use a cache. The mapping table is queried for each lookup.
    • LOAD_ON_START (default): Cache all lookup data in memory at initialization. All lookups are made using the memory cache.
    • LOAD_ON_DEMAND: Cache data after it is looked up. Lookups are first attempted on the memory cache, then on the mapping table if the lookup value is not present in the cache.
    Use the cache only to process batches of records. Do not use it when processing one record at a time. For example, it is recommended to set this parameter to NO_CACHE for enrichers running in steppers. If you configure the cache in such a situation, it is loaded every time the stepper triggers the enricher, which degrades performance.
  • Custom SQL (Mandatory: No; Type: String): Leave this parameter empty to use a generated SQL query. Use this parameter instead of Mapping Table, Lookup Column and Output Columns to define the lookup dataset with a select statement in the following form:

    select
        <lookup_column> LOOKUP_COLUMN,
        <output_column> OUTPUT_COLUMN1,
        <output_column> OUTPUT_COLUMN2,
        <output_column> OUTPUT_COLUMN3,
        ...
    from <mapping_table>
    where ...

    The number of OUTPUT_COLUMN<N> columns is limited to 20 (from OUTPUT_COLUMN1 to OUTPUT_COLUMN20).
    This query must return a dataset with n+1 columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN1 to OUTPUT_COLUMNn. These columns are used instead of the Lookup Column and Output Columns.
  • Datasource (Mandatory: No; Type: String): JNDI name of the datasource containing the lookup data. If this parameter is not defined, the enricher uses the data location datasource. This parameter should contain the full path of the datasource, for example: java:comp/env/jdbc/SEMARCHY_STAGING.
  • Fallback Behavior (Mandatory: No; Type: String): Behavior when the lookup value is not found in the lookup column. Possible values are:
    • USE_FALLBACK (default): Returns the fallback value, or null if the fallback value is not specified.
    • USE_LOOKUP_VALUE: Returns the lookup value.
    When multiple output columns are specified, the same value (the fallback or the lookup value) is sent to all these columns.
  • Fallback Value (Mandatory: No; Type: String): Value to return if the lookup value is not found in the lookup column. Default value: NULL.
  • Lookup Column (Mandatory: No; Type: String): Physical name of the column containing the lookup values. Default value: NONE.
  • Mapping Table (Mandatory: No; Type: String): Physical name of the mapping table containing the lookup and output columns. Default value: NONE.
  • Output Columns (Mandatory: No; Type: String): Comma-separated list of the physical names of the columns containing the values returned by the enricher. Default value: NONE. The (singular) Output Column parameter available in previous versions of this plug-in is deprecated and replaced by this parameter.

Plug-in Inputs

The following list describes the plug-in inputs.

  • Lookup Value (Mandatory: Yes; Type: String): Value to look for in the mapping table’s lookup column.

Plug-in Outputs

The following list describes the plug-in outputs.

  • Output Value<N> (Type: String): Nth value returned by the lookup.
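
The interaction between the cache modes and the fallback behavior can be sketched with an in-memory SQLite database. This is a simplified illustration, not the enricher's code: the class `LookupSketch`, the table `country_map`, and its contents are hypothetical, and SQLite stands in for the JDBC datasource.

```python
import sqlite3


class LookupSketch:
    """Toy lookup following the documented cache and fallback options."""

    def __init__(self, conn, sql, cache_mode="LOAD_ON_START",
                 fallback_behavior="USE_FALLBACK", fallback_value=None):
        self.conn, self.sql = conn, sql
        self.cache_mode = cache_mode
        self.fallback_behavior = fallback_behavior
        self.fallback_value = fallback_value
        self.cache = {}
        # Output columns = returned columns minus LOOKUP_COLUMN.
        self.n_out = len(conn.execute(sql).description) - 1
        if cache_mode == "LOAD_ON_START":
            for row in conn.execute(sql):
                self.cache.setdefault(row[0], tuple(row[1:]))

    def _query(self, value):
        wrapped = f"select * from ({self.sql}) t where LOOKUP_COLUMN = ?"
        row = self.conn.execute(wrapped, (value,)).fetchone()
        return tuple(row[1:]) if row else None

    def lookup(self, value):
        found = None
        if value is not None:
            if self.cache_mode == "NO_CACHE":
                found = self._query(value)
            else:
                found = self.cache.get(value)
                if found is None and self.cache_mode == "LOAD_ON_DEMAND":
                    found = self._query(value)
                    if found is not None:
                        self.cache[value] = found
        if found is not None:
            return found
        # Not found or null input: same value sent to every output column.
        if self.fallback_behavior == "USE_LOOKUP_VALUE":
            return (value,) * self.n_out
        return (self.fallback_value,) * self.n_out


conn = sqlite3.connect(":memory:")
conn.execute("create table country_map (code text, name text, continent text)")
conn.executemany("insert into country_map values (?, ?, ?)",
                 [("FR", "France", "Europe"), ("US", "United States", "America")])
CUSTOM_SQL = ("select code LOOKUP_COLUMN, name OUTPUT_COLUMN1, "
              "continent OUTPUT_COLUMN2 from country_map")

print(LookupSketch(conn, CUSTOM_SQL).lookup("FR"))
print(LookupSketch(conn, CUSTOM_SQL, fallback_value="N/A").lookup("XX"))
```

With NO_CACHE every call runs a query, which matches the recommendation above for enrichers that process one record at a time.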

Translation

Google Translate Enricher

Plug-in ID

Google Translate Enricher - com.semarchy.engine.plugins.convergence.translate.v2

Description

This enricher translates an Input Text from a Source Language to a Target Language using the Google Translate service. The source language is automatically detected if unspecified. This enricher requires a valid Google Key.

This plug-in must be used in compliance with the Google Translate APIs Terms of Service.

This enricher uses the Google Translate Service, which must be accessible from the Semarchy xDM application at the following URL: https://www.googleapis.com/language/translate/v2?<parameters>. Make sure that this URL is accessible through your firewalls.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following list describes the plug-in parameters.

  • Application Name (Mandatory: Yes; Type: String): Name of the client application accessing the Google Translate service. Application names should preferably have the format <company-id>_<app-name>_<app-version>. The name is used by the Google servers to monitor the source of authentication.
  • Google Key (Mandatory: Yes; Type: String): Google API key. It is a unique key that you generate using the Google API Console.

Plug-in Inputs

The following list describes the plug-in inputs.

  • Input Text (Mandatory: Yes; Type: String): Text to translate.
  • Source Language (Mandatory: No; Type: String): Language of the input text. If it is unspecified, it is detected from the input text.
  • Target Language (Mandatory: Yes; Type: String): Target language for the translation.

Plug-in Outputs

The following list describes the plug-in outputs.

  • Translated Text (Type: String): Translated text.
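
To check firewall access, it can help to know what a request to this URL looks like. The sketch below only builds the URL and sends nothing. The function name `build_translate_url` is hypothetical, and the query parameter names (key, q, source, target) are those of Google's public Translate v2 REST interface; how the plug-in itself composes its requests is not documented here, so treat this as an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(google_key, input_text, target_language,
                        source_language=None):
    """Build a Translate v2 request URL without sending it."""
    params = {"key": google_key, "q": input_text, "target": target_language}
    if source_language:
        # When the source is omitted, the service detects the language.
        params["source"] = source_language
    return BASE_URL + "?" + urlencode(params)


print(build_translate_url("MY_KEY", "Bonjour le monde", "en"))
```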

Name Processing

Semarchy Person Name Enricher

Plug-in ID

Semarchy Person Name Enricher - com.semarchy.engine.plugins.convergence.personname.PersonNameEnricher

Description

This enricher extracts the Given Name, Surname and Gender from a person’s full name. It parses the Input Name and identifies a Given Name and a Surname (with a Names Parsing Score confidence percentage). Then the given name is searched in a database of names for the source country code provided in the input. If a given name is matched, a Gender and a Most Frequent Gender (if the given name is unisex) are returned.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following list describes the plug-in parameters.

  • Surname Position (Mandatory: Yes; Type: String): Position of the surname. This parameter is used for parsing the input name to detect the first and last names, and for generating the Full Name output. Possible values: SURNAME_LAST and SURNAME_FIRST.
  • Case Transformation (Mandatory: Yes; Type: String): Case transformation for the name. Possible values: NONE, UPPER_CASE, LOWER_CASE and CAMEL_CASE.

Plug-in Inputs

The following list describes the plug-in inputs.

  • Input Name (Mandatory: Yes; Type: String): Full name of the person to enrich.
  • Source Country Code (Mandatory: Yes; Type: String): Code of the country of origin for the name. This code indicates the database of names to search to determine a gender for the given name. Built-in databases include fr for France, us for the USA and ru for Russia.

Plug-in Outputs

The following list describes the plug-in outputs.

  • Full Name (Type: String): The reconstructed full name, with the surname positioned according to the Surname Position parameter.
  • Gender (Type: String): The gender of the Matched Given Name. One of MALE, FEMALE, UNISEX, UNKNOWN.
  • Gender Score (Type: String): Confidence with which the Most Frequent Gender can be used (0-100).
  • Given Name (Type: String): The part identified as the given name in the input name.
  • Matched Given Name (Type: String): Given name matched in the given name database.
  • Most Frequent Gender (Type: String): The most frequent gender of the Matched Given Name for the given country. One of MALE, FEMALE, UNKNOWN.
  • Names Parsing Score (Type: String): Names parsing confidence (0-100).
  • Surname (Type: String): The part identified as the surname in the input name.
  • Surname Position (Type: String): Position at which the surname was detected.
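
The role of the two parameters can be illustrated with a deliberately naive sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the parsing simply takes the first or last word as the surname, and CAMEL_CASE is assumed to capitalize each word. The real enricher also computes scores and looks up genders in its name databases, which this sketch does not attempt.

```python
def split_name(full_name, surname_position="SURNAME_LAST"):
    """Naively split a full name into (given name, surname)."""
    parts = full_name.split()
    if not parts:
        return ("", "")
    if surname_position == "SURNAME_FIRST":
        return (" ".join(parts[1:]), parts[0])
    return (" ".join(parts[:-1]), parts[-1])


def apply_case(name, case_transformation="NONE"):
    """Apply the case transformation (CAMEL_CASE behavior is assumed)."""
    if case_transformation == "UPPER_CASE":
        return name.upper()
    if case_transformation == "LOWER_CASE":
        return name.lower()
    if case_transformation == "CAMEL_CASE":
        return " ".join(word.capitalize() for word in name.split())
    return name


def build_full_name(given, surname, surname_position, case_transformation):
    """Rebuild the full name with the surname at the configured position."""
    if surname_position == "SURNAME_FIRST":
        ordered = (surname, given)
    else:
        ordered = (given, surname)
    return apply_case(" ".join(p for p in ordered if p), case_transformation)


given, surname = split_name("john smith")
print(build_full_name(given, surname, "SURNAME_FIRST", "CAMEL_CASE"))
```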

International Phone Numbers Plug-In

The International Phone Numbers Plug-In for Semarchy xDM provides two features:

  • An enricher to standardize and improve phone number formatting.
  • A validator to check the validity of phone numbers.

Semarchy Phone Enricher

Plug-in ID

Semarchy Phone Enricher - com.semarchy.engine.plugins.convergence.phone

Description

This enricher takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Region Code. It returns a standardized Enriched Phone Number in the Enriched Phone Format. Geocoding Data is also returned and includes (depending on the country) the country, the region/state and the city name.

If a phone number is not valid, the enricher returns the original phone value in the Enriched Phone Number, a Status Code as well as a Status Text describing the issue with the input phone number.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

This plug-in does not use any parameter.

Plug-in Inputs

The following list describes the plug-in inputs.

  • Input Phone Number (Mandatory: Yes; Type: String): Input phone number.
  • Region Code (Mandatory: No; Type: String): Two-letter region code for a national phone number, according to the ISO 3166-1 standard. If this input is left empty, the phone number provided in the Input Phone Number should include the international country calling code.
  • Enriched Phone Format (Mandatory: No; Type: String): Format of the Enriched Phone Number. Possible values are INTERNATIONAL (default), NATIONAL, E123_INTERNATIONAL, E123_NATIONAL, E164 and RFC3966. See Phone Formats for more information.
  • Region of Origin (Mandatory: No; Type: String): Formats the phone output for international dialing from the country or region provided in this input, for example: US, FR, GB, DE. Use ZZ for an unknown region.

Phone Formats

The following standards are supported to format the enriched phone number, shown here with examples:

  • E123_NATIONAL (E.123 - National Notation): (042) 123 4594
  • E123_INTERNATIONAL (E.123 - International Notation): +31 42 123 4567
  • NATIONAL (E.123 - National Notation with hyphens): (042) 123-4594
  • INTERNATIONAL (E.123 - International Notation with hyphens): +31 42-123-4567
  • E164 (E.164 - International Notation): +31421234567 (equivalent to E.123 with no formatting)
  • RFC3966 (RFC3966 - International Notation): +31-42-123-4567 (equivalent to E.123 with hyphens instead of spaces)
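
The differences between these formats come down to separators and prefixes, as the following sketch shows. It is a toy formatter, not the enricher: the function name `format_phone` is hypothetical, the number is already split into country code, area code, and subscriber groups, and the national trunk prefix 0 is an assumption that fits the example above. Real formatting depends on per-country numbering rules.

```python
def format_phone(country_code, area_code, groups, fmt):
    """Format an already-parsed number following the examples above."""
    if fmt == "E123_INTERNATIONAL":
        return f"+{country_code} {area_code} " + " ".join(groups)
    if fmt == "INTERNATIONAL":
        return f"+{country_code} {area_code}-" + "-".join(groups)
    if fmt == "E123_NATIONAL":
        return f"(0{area_code}) " + " ".join(groups)
    if fmt == "NATIONAL":
        return f"(0{area_code}) " + "-".join(groups)
    if fmt == "E164":
        return f"+{country_code}{area_code}" + "".join(groups)
    if fmt == "RFC3966":
        return f"+{country_code}-{area_code}-" + "-".join(groups)
    raise ValueError(f"unknown format: {fmt}")


for fmt in ("E123_INTERNATIONAL", "INTERNATIONAL", "E164", "RFC3966"):
    print(fmt, format_phone("31", "42", ["123", "4567"], fmt))
```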

Plug-in Outputs

The following list describes the plug-in outputs.

  • Enriched Phone Number (Type: String): Phone number returned by the enricher in the format specified in the Enriched Phone Format input. This string is null if the enricher was not able to process the input phone number. The Status Code and Status Text values help troubleshoot such issues.
  • Geocoding Data (Type: String): Geocoding data computed for a given number and country. Depending on the country and phone number, this value includes the country, region/state and city information. This string is null if the enricher was not able to process the input phone number. The Status Code and Status Text values help troubleshoot such issues.
  • Status Code (Type: String): Return code for the phone number processing. See Status Codes for more details.
  • Status Text (Type: String): Text explaining the status code.
  • International Phone Prefix (Type: String): International phone prefix for worldwide dialing.
  • National Number (Type: String): National number part of a phone number in international format. It is often the international number without the country prefix.
  • Extension (Type: String): Extension part of the phone number.
  • Country Code Source (Type: String): Explains how the country code was retrieved. Possible values are FROM_NUMBER_WITH_PLUS_SIGN, FROM_NUMBER_WITH_IDD, FROM_NUMBER_WITHOUT_PLUS_SIGN and FROM_DEFAULT_COUNTRY.
  • Leading Zero (Type: String): Returns 0 or 1 to specify whether the leading zero is mandatory for foreign calls.
  • Possible Phone Number (Type: String): Returns 0 or 1 to indicate whether a phone number is a possible number, and the region where the number could be dialed from.
  • Possible Phone Number Reason (Type: String): Detailed explanation of why a phone number is a possible number or not. Possible values are INVALID_COUNTRY_CODE, IS_POSSIBLE, TOO_LONG and TOO_SHORT.
  • Valid Phone Number (Type: String): Returns 0 or 1 to indicate whether a phone number matches a valid pattern.
  • Valid Phone Number For Region (Type: String): Returns 0 or 1 to indicate whether a phone number is valid for the specified Region Code.
  • Phone Line Type (Type: String): Provides the line type of a phone number. Possible values are FIXED_LINE, FIXED_LINE_OR_MOBILE, MOBILE, PAGER, PERSONAL_NUMBER, PREMIUM_RATE, SHARED_COST, TOLL_FREE, UAN, UNKNOWN and VOIP.
  • Region Code (Type: String): Returns the region code for the phone number.
  • International Phone Number (Type: String): Phone number formatted for international dialing.
  • Time Zones (Type: String): List of corresponding time zones for a given number, for example: Europe/Paris. If the time zone is unknown, returns Etc/Unknown.
  • First Time Zone (Type: String): First time zone from the list of corresponding time zones for a given number.
  • Carrier Name (Type: String): Name of the carrier for the phone number.

Status Codes

The following status codes are returned by the enricher:

  • 0 - OK: Optimal execution. No error detected.
  • 1 - INPUT_WAS_NULL: Input phone number was not set.
  • 2 - PARSING FAILED: The string supplied did not seem to be a phone number. Review the Status Text for more information.
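
A consumer of the enricher outputs might check the Status Code before trusting the other outputs. The following Python sketch is illustrative only and is not part of the product. The names STATUS_LABELS and describe_status are hypothetical; only the three codes and their labels come from the list above.

```python
# Illustrative sketch only: maps the documented Status Code values to labels.
# STATUS_LABELS and describe_status are hypothetical helper names.
STATUS_LABELS = {
    "0": "OK",
    "1": "INPUT_WAS_NULL",
    "2": "PARSING FAILED",
}


def describe_status(status_code, status_text=None):
    """Return (is_ok, message) for a Status Code output value (a string)."""
    label = STATUS_LABELS.get(str(status_code).strip(), "UNKNOWN")
    is_ok = label == "OK"
    # Append the Status Text, when present, to help troubleshooting.
    message = label if not status_text else f"{label}: {status_text}"
    return is_ok, message
```

With this helper, a record could be routed to a review queue whenever the first element of the returned pair is False.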

Semarchy Phone Extractor

Plug-in ID

Semarchy Phone Extractor - com.semarchy.engine.plugins.convergence.phone.extractor

Description

This enricher extracts a list of phone numbers from an Input Text and returns them as a Phone List, in a given Extraction Format.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

Matching Leniency

No

String

Defines the phone number extraction leniency. Possible values are POSSIBLE (default), VALID_FOR_REGION (according to the Accepted Region) and VALID.

Extraction Format

No

String

Format of the extracted phone numbers. Possible values are RAW (default), INTERNATIONAL, NATIONAL, E164 and RFC3966.

List Separator

No

String

Defines the separator character used in the extracted phones list.

Maximum Invalid Numbers

No

String

Maximum number of invalid numbers allowed before the plug-in stops processing the text. This covers cases where the text contains many false positives.

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Input Text

Yes

String

Input text to search for phone numbers.

Accepted Region

No

String

Defines the region used when Matching Leniency is set to VALID_FOR_REGION.

Plug-in Outputs

The following table lists the plug-in outputs.

Output Name | Type | Description

Extracted Phone List

String

List of phone numbers extracted.

Phone 1 to Phone 5

String

First, second… extracted phone number in the list.
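
The relationship between the Extracted Phone List and the Phone 1 to Phone 5 outputs can be pictured with a short Python sketch. This is not the plug-in code: the function name split_phone_list is hypothetical, and the ";" separator is an assumed value for the List Separator parameter, whose default is not stated here.

```python
def split_phone_list(extracted_phone_list, separator=";", slots=5):
    """Split a separated phone list into a fixed number of slots.

    The ";" separator is an assumption made for this example. Missing
    slots are filled with None, extra numbers beyond the slots are dropped.
    """
    if not extracted_phone_list:
        return [None] * slots
    phones = [p.strip() for p in extracted_phone_list.split(separator) if p.strip()]
    phones = phones[:slots]
    return phones + [None] * (slots - len(phones))
```

For a list containing two numbers, the first two slots are filled and the remaining three stay empty.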

Semarchy Phone Validator

Plug-in ID

Semarchy Phone Validator - com.semarchy.engine.plugins.convergence.phone

Description

This validator takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Country Code. The validator checks whether this phone number is a valid international or national phone number.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

Validation Leniency

No

String

Validation leniency for phone numbers. Possible values are VALID (default), POSSIBLE and VALID_FOR_REGION.

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Input Phone Number

Yes

String

Input Phone Number.

Country Code

No

String

Two-letter country code for a national phone number, according to the ISO 3166-1 standard. If this input is left empty, the phone number provided in the Input Phone Number should include the international country calling code.
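
The dependency between the two inputs can be checked before data reaches the validator. The Python sketch below is a simplified pre-check and is not the validator logic. The function name check_validator_inputs is hypothetical, and recognizing only a leading "+" as the international form is an assumption made to keep the example short.

```python
import re


def check_validator_inputs(phone_number, country_code=None):
    """Return a list of warnings about the two validator inputs."""
    warnings = []
    if not phone_number or not phone_number.strip():
        warnings.append("Input Phone Number is mandatory")
        return warnings
    if country_code:
        # ISO 3166-1 alpha-2 codes are made of two letters.
        if not re.fullmatch(r"[A-Za-z]{2}", country_code.strip()):
            warnings.append("Country Code should be a two-letter ISO 3166-1 code")
    elif not phone_number.strip().startswith("+"):
        # Simplification: only the "+" international form is recognized here.
        warnings.append("No Country Code: number should include the country calling code")
    return warnings
```

An empty list means the pair of inputs looks consistent. It says nothing about whether the number itself is valid.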

Email Plug-In

The Email Plug-In for Semarchy xDM provides an enricher to improve the quality of email addresses and a validator to check email validity.

Semarchy Email Enricher

Plug-in ID

Semarchy Email Enricher - com.semarchy.engine.plugins.convergence.email

Description

This enricher takes an Input Email Address and splits this address into the local-part (user name) and the domain name. Both these parts are checked syntactically and syntax errors are fixed automatically. The domain name validity is also checked using MX records lookup. The plug-in uses a Domain Name Cache for faster checks and automated fixes on domain names.

This plug-in is thread-safe and supports parallel execution.

Domain Name Cache

The plug-in uses several mechanisms for faster checks and automated fixes on domain names:

  • Domain names already checked as valid (MX record lookup) are persisted in a domain name cache stored in a JDBC Datasource. This avoids repeating MX lookup.
  • A list of known domains (e.g.: hotmail.com, gmail.com, etc.) is automatically seeded in the host name validation cache.
  • Common domain mistakes are fixed using a seeded replace list. For example gmai.com is automatically fixed to gmail.com using the cache.
  • Invalid domains are automatically fixed to similar valid domains already present in the cache. For example, semarcyh.com is fixed to semarchy.com as semarchy.com was previously checked as a valid domain name.

See Appendix A: Semarchy Email Enricher Domain Name Cache for more information about the domain name cache.
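
The lookup order described above can be sketched in Python. This is a rough illustration under stated assumptions, not the plug-in implementation: the in-memory sets stand in for the JDBC-backed cache, the function name resolve_domain is hypothetical, and the use of difflib for the similarity match is an assumption, since the actual similarity measure is not documented here.

```python
import difflib

# Hypothetical in-memory stand-ins for the seeded data described above.
KNOWN_VALID_DOMAINS = {"hotmail.com", "gmail.com", "semarchy.com"}
SEEDED_REPLACEMENTS = {"gmai.com": "gmail.com"}


def resolve_domain(domain, valid_domains=KNOWN_VALID_DOMAINS,
                   replacements=SEEDED_REPLACEMENTS, cutoff=0.85):
    """Return (resolved_domain, how) following a cache-first lookup order."""
    domain = domain.strip().lower()
    if domain in valid_domains:
        return domain, "cache"
    if domain in replacements:
        return replacements[domain], "replace-list"
    # Fuzzy match against already validated domains (difflib is an assumption).
    close = difflib.get_close_matches(domain, sorted(valid_domains), n=1, cutoff=cutoff)
    if close:
        return close[0], "similar"
    # A real implementation would attempt an MX record lookup at this point.
    return domain, "unknown"
```

The second element of the result indicates which mechanism produced the answer, which can be useful when auditing automated fixes.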

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

Datasource

No

String

Full name of the JDBC Datasource used to store the host name validation cache. For example: java:comp/env/jdbc/email_cache.
If no datasource is specified, the data location’s datasource is used.

Lowercase User Name

No

String

Set to 1 to transform the local-part (user name) to lowercase in the cleansed email address.

Offline Mode

No

String

Set to 1 to query only the local domain cache. The plug-in does not perform the MX Record Lookup.

Processing Mode

No

String

Processing mode: DATABASE (default) or MEMORY. Memory mode is faster but requires more memory as it loads the entire host name validation cache into memory.

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Input Email Address

Yes

String

Input email address to cleanse.

Plug-in Outputs

The following table lists the plug-in outputs.

Output Name | Type | Description

Cleansed Email Address

String

Cleansed email address returned by the enricher. This address may be valid or not. The syntactic validity or domain name validity of the email address is indicated in the other plug-in outputs.

Valid Domain

String

Flag (0 or 1) indicating whether the domain name is valid or not (based on syntax and MX records lookup) in the cleansed email address. In Offline mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns null if the domain name does not exist in the cache and the MX Lookup was not issued.

Valid Domain Syntax

String

Flag (0 or 1) indicating whether the domain name syntax is valid or not in the cleansed email address.

Valid Email Syntax

String

Flag (0 or 1) indicating whether the cleansed email address is syntactically valid or not.

Valid Username Syntax

String

Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the cleansed email address.

Valid Input Domain

String

Flag (0 or 1) indicating whether the domain name is valid or not (based on syntax and MX records lookup) in the input email address. In Offline mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns null if the domain name does not exist in the cache and the MX Lookup was not issued.

Valid Input Domain Syntax

String

Flag (0 or 1) indicating whether the domain name syntax is valid or not in the input email address.

Valid Input Email Syntax

String

Flag (0 or 1) indicating whether the input email address is syntactically valid or not.

Valid Input Username Syntax

String

Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the input email address.
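
The outputs above are strings, and the domain flags can take three states (1, 0 or null) in Offline mode. The Python sketch below shows one way a consumer could interpret them, together with the local-part and domain split mentioned in the description. The function names split_email and flag_to_bool are hypothetical, and splitting at the last "@" is an assumption about how an address is divided.

```python
def split_email(address):
    """Split an address into (local_part, domain) at the last '@'.

    Returns (address, None) when there is no '@' at all.
    """
    local_part, sep, domain = address.strip().rpartition("@")
    if not sep:
        return address.strip(), None
    return local_part, domain


def flag_to_bool(flag):
    """Map a '0'/'1' output flag to False/True, and null (None) to None."""
    if flag is None:
        return None
    return str(flag).strip() == "1"
```

Keeping None distinct from False preserves the difference between a domain known to be invalid and a domain that was simply not found in the cache.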

Semarchy Email Validator

Plug-in ID

Semarchy Email Validator - com.semarchy.engine.plugins.convergence.email

Description

This validator takes an Input Email Address and checks its syntactic validity. The domain name validity is optionally also checked using MX records lookup.

The plug-in uses the same mechanisms as the Semarchy Email Enricher for checking the email validity, except that it does not modify the incoming email.

This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

Accepted Domains

No

String

Value tolerated for the email domain. Possible values:

  • ALL_DOMAINS accepts all syntactically valid domains.
  • VALID_DOMAINS accepts only domains that are known to be valid (found in the local cache as being valid or for which the MX lookup was successful).
  • VALID_AND_UNKNOWN is used in Offline Mode to accept/reject records based on their status (valid/invalid) found in the local cache. Unknown domains (not found in the local cache) are accepted.
    Syntax checking is always done and an email with an invalid syntax will always be rejected.

Offline Mode

No

String

Set to 1 to query only the local domain cache. The plug-in does not perform the MX Record Lookup.

Processing Mode

No

String

Processing mode: DATABASE (default) or MEMORY. Memory mode is faster but requires more memory as it loads the entire host name validation cache into memory.

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Input Email Address

Yes

String

Input email address to check.
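
The effect of the Accepted Domains parameter can be summarized as a small decision function. The Python sketch below is an interpretation of the three documented values, not the validator code. The function name accept_email and the representation of the domain status as True, False or None (unknown) are assumptions made for illustration.

```python
def accept_email(syntax_valid, domain_status, accepted_domains):
    """Decide whether a record would be accepted.

    domain_status is True (known valid), False (known invalid)
    or None (unknown, for example not found in the local cache).
    """
    if not syntax_valid:
        return False  # an email with an invalid syntax is always rejected
    if accepted_domains == "ALL_DOMAINS":
        return True
    if accepted_domains == "VALID_DOMAINS":
        return domain_status is True
    if accepted_domains == "VALID_AND_UNKNOWN":
        return domain_status is not False
    raise ValueError(f"Unsupported Accepted Domains value: {accepted_domains}")
```

The only difference between VALID_DOMAINS and VALID_AND_UNKNOWN in this sketch is the treatment of the unknown state.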

Melissa Plug-ins

The Melissa Plug-in for Semarchy xDM provides enrichers to fix and complete contact data for US/Canada using the Personator service, and to validate international addresses in 240 countries using the Global Address Verification service.

Melissa Global Address Enricher

Plug-in ID

Melissa Global Address Enricher - com.semarchy.engine.plugins.melissa.GlobalAddressVerificationEnricher

Description

The Melissa Global Address Enricher validates international addresses in 240 countries using the Global Address Verification service.

This plug-in requires a valid license string to access the Melissa service. Contact Melissa for the license.
For more details about the service, the parameters, inputs and outputs, refer to the Melissa Global Address Documentation.
This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

License String

Yes

String

Your license string. This must be valid for you to access the Melissa Service.

Delivery Lines

No

Boolean

This option specifies whether Address Lines 1-8 should contain only the delivery address or the entire address.

Line Separator

No

String

Possible values: SemiColon, Pipe, CR, LF, CRLF, Tab, BR. This is the line separator used for the FormattedAddress result.

Output Script

No

String

Possible values: NoChange, Latn, Native. This is the script type used for all applicable fields.

Country Of Origin

No

String

Must contain a valid ISO-3166-1 Alpha-2, ISO-3166-1 Alpha-3, or ISO-3166-1 Numeric code. This is used to determine whether or not to include the country name as the last line in FormattedAddress.

SSL Connection

No

Boolean

Default is true. Set to false if you don’t wish to use a secure connection.

Failure Error Codes

No

String

Comma-separated list of codes (AE01, AE02) or code families (AE). When one of these result codes is returned by the API, the enrichment fails.

Requests Limit

No

Number

When set, this numeric value limits the number of requests made to the Melissa API and the number of enriched records. Records after this limit are not enriched and the plug-in returns blank outputs. This parameter is intended for testing purposes only.
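
The Failure Error Codes parameter accepts both full codes and code families. The Python sketch below shows one plausible way to match them against a comma-separated list of result codes. It is not the plug-in code: the function name is_failure is hypothetical, and treating a family such as AE as a prefix of codes such as AE01 is an assumption.

```python
def is_failure(results, failure_error_codes):
    """Return True if any result code matches a listed code or code family."""
    failing = [c.strip() for c in failure_error_codes.split(",") if c.strip()]
    returned = [c.strip() for c in results.split(",") if c.strip()]
    # Assumption: a family such as "AE" matches any code starting with "AE".
    return any(code.startswith(f) for code in returned for f in failing)
```

In this sketch, "XX00" in the test data below is a made-up placeholder code and not a real Melissa result code.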

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

AddressLine1

No

String

The input field for the address line 1. This should contain the delivery address information (house number, street, building, suite, etc.) but should not contain locality information (city, state, postal code, etc.) which have their own inputs.

AddressLine2

No

String

The input field for the address line 2. This can be a continuation of AddressLine1 (ex: suite) or another address.

AddressLine3
…​
AddressLine8

No

String

The input field for the address. This should contain the delivery address information (house number, thoroughfare, building, suite, etc.) but should not contain locality information (locality, administrative area, postal code, etc.) which have their own inputs.

DependentLocality

No

String

The smaller population center data element. This depends on the Locality element.

DoubleDependentLocality

No

String

The smallest population center data element. This depends on the Locality and DependentLocality elements.

Locality

No

String

The most common population center data element.

PostalCode

No

String

The postal code.

SubAdministrativeArea

No

String

The smallest geographic data element.

SubNationalArea

No

String

The administrative region within a country on an arbitrary level below that of the sovereign state.

Country

No

String

The country.

Plug-in Outputs

The following table lists the plug-in outputs.

Output Name | Type | Description

AddressKey

String

Returns a unique identifier for an address. This key can be used with other current and future Melissa services.

AddressLine1
…​
AddressLine8

String

These are the string values that will return the standardized or corrected contents of the input address. These lines will include the entire address including the locality, administrative area, and postal code.

AddressType

String

Returns the Address Type for US and Canada.

AdministrativeArea

String

The most common geographic data element.

Building

String

Descriptive name identifying an individual location. This is a string value that is the parsed Building element from the output.

CountryISO3166_1_Alpha2

String

ISO 3166 2-character country code.

CountryISO3166_1_Alpha3

String

ISO 3166 3-character country code.

CountryISO3166_1_Numeric

String

ISO 3166 3-digit numeric country code.

CountryName

String

Returns the country name for the record.

DependentLocality

String

A dependent locality is a logical area unit that is smaller than a locality but larger than a double dependent locality or thoroughfare. It can often be associated with a neighborhood or sector. Great Britain is an example of a country that uses dependent locality. In the United States, this would correspond to Urbanization, which is used only in Puerto Rico.

DependentThoroughfare

String

Block data element or dependent street. This is used when there is more than one thoroughfare with the same name in one locality. An adjoining thoroughfare is used to uniquely identify the target thoroughfare. This is rarely used.

DependentThoroughfareLeadingType

String

Thoroughfare type at the beginning of the dependent thoroughfare. The leading type is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "St. Hickory E," the dependent thoroughfare leading type would be "St."

DependentThoroughfareName

String

Dependent thoroughfare name parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "E Hickory Ln," the dependent thoroughfare name would be "Hickory."

DependentThoroughfarePostDirection

String

Cardinal directional at the end of the dependent thoroughfare. The postfix directional is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "Hickory Ln N," the dependent thoroughfare post direction would be "N."

DependentThoroughfarePreDirection

String

Cardinal directional at the beginning of the dependent thoroughfare. The prefix directional is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "W Hickory Ln," the dependent thoroughfare pre direction would be "W."

DependentThoroughfareTrailingType

String

Thoroughfare type at the end of the dependent thoroughfare. The trailing type is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "W Hickory Ln," the dependent thoroughfare trailing type would be "Ln."

DoubleDependentLocality

String

A double dependent locality is a logical area unit that is smaller than a dependent locality but bigger than a thoroughfare. This field is very rarely used. Great Britain is an example of a country that uses double dependent locality.

FormattedAddress

String

Mailing address. The full mailing address in the preferred format for the country of the address. This includes the Organization as the first line, one or more lines in the origin country’s format, and the destination country (if required). Lines are delimited by the separator specified in the Line Separator parameter.

Latitude

String

Returns the geocoded latitude for the address entered in the AddressLine field.

Locality

String

This is the most common geographic area and used by virtually all countries. This is usually the value that is written on a mailing label and referred to by terms like City, Town, Postal Town, etc.

Longitude

String

Returns the geocoded longitude for the address entered in the AddressLine field.

Organization

String

This is a string value that matches the Organization request element. It is not modified or populated by the service.

PostBox

String

Post box information for a particular delivery point.

PostalCode

String

Returns the 9-digit postal code for U.S. addresses and 6-digit postal code for Canadian addresses.

PremisesNumber

String

Alphanumeric indicator within premises field. Parsed from the premises parameter.

PremisesType

String

Leading premise type indicator within premises field. Parsed from the premises parameter.

Results

String

String value containing a comma-separated list of status, error codes, and change codes for the record. Refer to the Melissa documentation for more details.

SubAdministrativeArea

String

The smallest geographic data element.

SubNationalArea

String

A sub-national area is a logical area unit that is larger than an administrative area but smaller than the country itself. It is extremely rarely used.

SubPremises

String

Alphanumeric code identifying an individual location. More specific than premises.

SubPremisesNumber

String

Sub premises number indicator within premises field. Parsed from the subPremises parameter.

SubPremisesType

String

Sub premises type indicator within premises field. Parsed from the subPremises parameter.

Thoroughfare

String

This value is a part of the address lines and contains all the sub-elements of the thoroughfare like trailing type, thoroughfare name, pre direction, post direction, etc.

ThoroughfareLeadingType

String

Leading thoroughfare type indicator parsed from the thoroughfare parameter. A leading type is a thoroughfare type that is placed before the thoroughfare. This value is a part of the Thoroughfare field. For example, the thoroughfare type of "Rue" in Canada and France is placed before the thoroughfare, making it a leading type.

ThoroughfareName

String

Name indicator parsed from the thoroughfare parameter.

ThoroughfarePostDirection

String

Postfix directional parsed from the thoroughfare parameter.

ThoroughfarePreDirection

String

Prefix directional parsed from the thoroughfare parameter.

ThoroughfareTrailingType

String

Trailing thoroughfare type indicator parsed from the thoroughfare parameter. A trailing type is a thoroughfare type that is placed after the thoroughfare. This value is a part of the Thoroughfare field. For example, the thoroughfare type of "Avenue" in the US is placed after the thoroughfare, making it a trailing type.

TransmissionResults

String

This is a string value that lists error codes from any errors caused by the most recent request as a whole.

Melissa Personator Enricher

Plug-in ID

Melissa Personator Enricher - com.semarchy.engine.plugins.melissa.PersonatorConsumerEnricher

Description

The Melissa Personator Enricher fixes and completes contact data for US/Canada using the Personator Consumer service.

This plug-in requires a valid license string to access the Melissa service. Contact Melissa for the license.
For more details about the service, the parameters, inputs and outputs, refer to the Melissa Personator Consumer Documentation.
This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

License String

Yes

String

Your license string. This must be valid for you to access the Melissa Service.

Action Append

No

Boolean

The Append Action will return elements based on the selected point of centricity which can either be the address, email or phone. For example, an address centric Append will return the name, company, phone and email associated with the given address. US only.

Action Check

No

Boolean

The Check Action will validate the individual input data pieces for validity and correct them if possible. If the data is correctable, additional information

Action Move

No

Boolean

The Move Action will return the latest address for an individual or business if a previous address was entered. Move requires either a Last Name and Address, or a Business/Company Name and Address as inputs. US only.

Action Verify

No

Boolean

The Verify Action will return to you the relationships between your different input data pieces. It can show you if your name,

Advanced Address Correction

No

Boolean

Uses the name input to perform more advanced address corrections. This can correct or append house numbers, street names, cities, states, and ZIP codes.

Append Options

No

String

Possible values: blank, checkError, always. Setting the Append option to Blank will cause the service to return information only when the input address, phone, email, name or company is blank.

Centric Hint

No

String

Possible values: auto, address, phone, email. Default value is Auto. When set to Auto, it first uses Address if available, followed by Phone if no Address is available, and lastly Email if neither Address nor Phone are available. Use this to tell the service which piece of information to use as the primary point of reference when appending or verifying data.

Columns

No

String

By default, the requested columns are restricted to the mapped outputs. This parameter allows you to specify (force) which columns are requested. See the Melissa documentation.

Diacritics

No

String

Possible values: auto, on, off. Determines whether or not French language characters are returned. If set to auto, those characters are only returned if they are in the input.

Failure Error Codes

No

String

Comma-separated list of codes (AE01, AE02) or code families (AE). When one of these result codes is returned by the API, the enrichment fails.

SSL Connection

No

Boolean

Default is true. Set to false if you don’t wish to use a secure connection.

Use Preferred City

No

Boolean

There is an official city name that is preferred by the USPS, and there may be one or more unofficial "vanity" names in use. Normally, Personator allows you to verify addresses using known vanity names. Setting this to true returns the preferred city.

Requests Limit

No

Number

When set, this numeric value limits the number of requests made to the Melissa API and the number of enriched records. Records after this limit are not enriched and the plug-in returns blank outputs. This parameter is intended for testing purposes only.
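
The fallback order described for the Centric Hint parameter when set to auto (Address, then Phone, then Email) can be pictured with a short Python sketch. The resolution is performed by the Melissa service and not by this code. The function name resolve_centric is hypothetical and the sketch only mirrors the documented order.

```python
def resolve_centric(address=None, phone=None, email=None, centric_hint="auto"):
    """Return the point of centricity implied by the hint and the inputs."""
    hint = centric_hint.strip().lower()
    if hint not in {"auto", "address", "phone", "email"}:
        raise ValueError(f"Unsupported Centric Hint: {centric_hint}")
    if hint != "auto":
        return hint
    # Auto: Address first, then Phone, then Email, as described above.
    if address:
        return "address"
    if phone:
        return "phone"
    if email:
        return "email"
    return None
```

When no usable input is available, the sketch returns None. What the service does in that case is not covered here.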

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

AddressLine1

No

String

The input field for the address line 1. This should contain the delivery address information (house number, street, building, suite, etc.) but should not contain locality information (city, state, postal code, etc.) which have their own inputs.

AddressLine2

No

String

The input field for the address line 2. This can be a continuation of AddressLine1 (ex: suite) or another address.

City

No

String

The city.

CompanyName

No

String

The company name.

Country

No

String

The country.

Email

No

String

The email address.

FirstName

No

String

The given (first) name.

FreeForm

No

String

Single line contact information. Address, phone, email could be all in a single field and they will be parsed out. Please don’t map any other fields if using FreeForm.

FullName

No

String

This field can contain a full name. The API will parse and check Names only if the First Name and Last Name fields are left blank.

LastLine

No

String

The city, state, and ZIP.

LastName

No

String

The family (last) name.

Phone

No

String

The phone number.

PostalCode

No

String

The postal code.

State

No

String

The US state.
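
Two of the inputs above carry usage constraints: FreeForm should be mapped alone, and FullName is parsed only when FirstName and LastName are blank. The Python sketch below shows a simple check of an input mapping against these two rules. The function name check_personator_mapping is hypothetical. It works on the names of mapped inputs only, which is a simplification, since a mapped input can still be blank at run time.

```python
def check_personator_mapping(mapped_inputs):
    """Return warnings for a collection of mapped input names."""
    mapped = set(mapped_inputs)
    warnings = []
    if "FreeForm" in mapped and len(mapped) > 1:
        others = ", ".join(sorted(mapped - {"FreeForm"}))
        warnings.append(f"FreeForm should be mapped alone (also mapped: {others})")
    if "FullName" in mapped and mapped & {"FirstName", "LastName"}:
        warnings.append("FullName is only parsed when FirstName and LastName are blank")
    return warnings
```

An empty list means that neither of the two rules is violated by the mapping.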

Plug-in Outputs

The following table lists the plug-in outputs.

Output Name | Type | Description

AddressDeliveryInstallation

String

Returns the parsed delivery installation for the address entered in the AddressLine field.

AddressExtras

String

Any extra information that does not fit in the AddressLine fields.

AddressHouseNumber

String

Returns the parsed house number for the address entered in the AddressLine field.

AddressKey

String

Returns a unique identifier for an address. This key can be used with other current and future Melissa services.

AddressLine1

String

These are the string values that will return the standardized or corrected contents of the input address. These lines will include the entire address including the locality, administrative area, and postal code.

AddressLine2

String

These are the string values that will return the standardized or corrected contents of the input address. These lines will include the entire address including the locality, administrative area, and postal code.

AddressLockBox

String

Returns the parsed lock box number for the address entered in the AddressLine field.

AddressPostDirection

String

Returns the parsed post-direction for the address entered in the AddressLine field.

AddressPreDirection

String

Returns the parsed pre-direction for the address entered in the AddressLine field.

AddressPrivateMailboxName

String

Returns the parsed private mailbox name for the address entered in the AddressLine field.

AddressPrivateMailboxRange

String

Returns the parsed private mailbox range for the address entered in the AddressLine field.

AddressRouteService

String

Returns the parsed route service number for the address entered in the AddressLine field.

AddressStreetName

String

Returns the parsed street name for the address entered in the AddressLine field.

AddressStreetSuffix

String

Returns the parsed street suffix for the address entered in the AddressLine field.

AddressSuiteName

String

Returns the parsed suite name for the address entered in the AddressLine field.

AddressSuiteNumber

String

Returns the parsed suite number for the address entered in the AddressLine field.

AddressTypeCode

String

Returns a code for the address type in the AddressLine field.

CBSACode

String

Census Bureau’s Core Based Statistical Area (CBSA). Returns the 5-digit code for the CBSA associated with the requested record.

CBSADivisionCode

String

Returns the code for a division associated with the requested record, if any.

CBSADivisionLevel

String

Returns whether the CBSA division, if any, is metropolitan or micropolitan.

CBSADivisionTitle

String

Returns the title for the CBSA division, if any.

CBSALevel

String

Returns whether the CBSA is metropolitan or micropolitan.

CBSATitle

String

Returns the title for the CBSA.

CarrierRoute

String

Returns a 4-character code defining the carrier route for this record.

CensusBlock

String

Returns a 4-digit string containing the census block number associated with the requested record.

CensusTract

String

Returns a 4- to 6-digit string containing the census tract number associated with the requested record.

City

String

Returns the city entered in the City field.

CityAbbreviation

String

Returns an abbreviation for the city entered in the City field, if any.

CompanyName

String

Returns the company name.

CongressionalDistrict

String

Returns the 2-digit congressional district that belongs to the requested record.

CountryCode

String

Returns the country code for the country in the Country field.

CountryName

String

Returns the country name for the record.

DeliveryIndicator

String

Returns an indicator of whether an address is a business address or residential address.

DeliveryPointCheckDigit

String

Returns a string value containing the 1-digit delivery point check digit.

DeliveryPointCode

String

Returns a string value containing the 2-digit delivery point code.

EmailAddress

String

Returns the email address entered in the Email field.

EmailDomainName

String

Returns the parsed domain name for the email entered in the Email field.

EmailMailboxName

String

Returns the parsed mailbox name for the email entered in the Email field.

EmailTopLevelDomain

String

Returns the parsed top-level domain name for the email entered in the Email field.

FormattedAddress

String

Mailing address. The full mailing address in the preferred format for the country of the address. This includes the Organization as the first line, one or more lines in the origin country’s format, and the destination country (if required). Separate lines will be delimited by what is specified in the option.

Gender

String

Returns a gender for the name in the FullName field.

Gender2

String

Only used if 2 names are in the FullName field. Returns a gender for the second name in the FullName field.

Latitude

String

Returns the geocoded latitude for the address entered in the AddressLine field.

Longitude

String

Returns the geocoded longitude for the address entered in the AddressLine field.

NameFirst

String

Returns the first name in the FullName field.

NameFirst2

String

Only used if 2 names are in the FullName field. Returns the second name in the FullName field.

NameFull

String

Returns the full name for the record.

NameLast

String

Returns the last name in the FullName field.

NameLast2

String

Only used if 2 names are in the FullName field. Returns a last name for the second name in the FullName field.

NameMiddle

String

Returns a middle name for the name in the FullName field.

NameMiddle2

String

Only used if 2 names are in the FullName field. Returns a middle name for the second name in the FullName field.

NamePrefix

String

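Returns a prefix for the name in the FullName field.

NamePrefix2

String

Only used if 2 names are in the FullName field. Returns a prefix for the second name in the FullName field.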

NameSuffix

String

Returns a suffix for the name in the FullName field.

NameSuffix2

String

Only used if 2 names are in the FullName field. Returns a suffix for the second name in the FullName field.

PhoneAreaCode

String

Returns the parsed area code for the phone number entered in the Phone field.

PhoneExtension

String

Returns the parsed extension for the phone number entered in the Phone field.

PhoneNewAreaCode

String

Returns the parsed new area code for the phone number entered in the Phone field.

PhoneNumber

String

Returns the standardized phone number for the record.

PhonePrefix

String

Returns the parsed prefix for the phone number entered in the Phone field.

PhoneSuffix

String

Returns the parsed suffix for the phone number entered in the Phone field.

PlaceCode

String

When ZIP codes overlap, the City field will always return the city that covers most of the ZIP area. If the address is located outside of that city but within the ZIP Code, Place Code will refer to that area.

PlaceName

String

When ZIP codes overlap, the City field will always return the city that covers most of the ZIP area. If the address is located outside of that city but within the ZIP Code, Place Name will refer to that area.

PostalCode

String

Returns the 9-digit postal code for U.S. addresses and 6-digit postal code for Canadian addresses.

Results

String

String value containing a comma-separated list of status, error codes, and change codes for the record. Refer to the Melissa documentation for more details.

Salutation

String

Returns a salutation for the name in the FullName field.

State

String

Returns the state for the record.

StateName

String

Returns the full name of the state entered in the State field.

TransmissionResults

String

This is a string value that lists error codes from any errors caused by the most recent request as a whole.

UTC

String

Returns the time zone of the requested record. All Melissa products express time zones in UTC (Coordinated Universal Time).

UrbanizationName

String

Returns the urbanization name for the address entered in the AddressLine field. Usually only used if the address is in Puerto Rico.
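
Because the Results output is a comma-separated list of codes, downstream logic usually needs to split it before testing for a given code. The following is a minimal sketch of that step. The function names and the sample value are hypothetical illustrations, not part of the plug-in; refer to the Melissa documentation for the actual codes and their meaning.

```javascript
// Split a comma-separated Results value into a clean list of codes.
function parseResultCodes(results) {
  if (!results) {
    return [];
  }
  return results
    .split(",")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);
}

// Check whether a given code is present in a Results value.
function hasResultCode(results, code) {
  return parseResultCodes(results).includes(code);
}

// Illustrative sample value only.
const sampleResults = "AS01, NS01,PS01";
console.log(parseResultCodes(sampleResults));
console.log(hasResultCode(sampleResults, "NS01"));
```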

Google Maps Plug-in

The Google Maps Plug-in for Semarchy xDM provides an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal addresses with geocoding information.

Google Maps Enricher

Plug-in ID

Google Maps Enricher - com.semarchy.integration.rowTransformers.googleMapsEnricher

Description

This enricher takes an input address, then enriches and validates this postal address using the Google Geocoding Service.

This plug-in must be used in compliance with the Google Maps/Google Earth APIs Terms of Service.
This enricher uses the Google Geocoding Service, which must be accessible from the Semarchy xDM Application at the following URL: http://maps.googleapis.com/maps/api/geocode/json?<parameters>. Make sure that this URL is accessible through your firewalls.
This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

Client ID or API Key

No

String

This parameter may contain either an API Key (for Standard API usage) or the Client ID (for Premium Usage), both provided by Google. The Client ID should begin with the gme- prefix. When providing a Client ID, the signature (Private Key) is required.

Channel

No

String

This parameter assigns a specific channel name, which allows tracking the usage of this plug-in in the Google Maps usage reports.

Default Language

No

String

Code of the default language used for the returned results. For example, for the same address, "Rue Mathieu Misery" would appear in French and "Mathieu Misery Street" in English. This code can be overridden by the Language plug-in input. See the list of supported domain languages for more information.

Private Key

No

String

Cryptographic signature key provided by Google with the Client ID.

Request per Second

No

Integer

This parameter limits the number of requests per second made by the enricher to remain within the limits of the API. It defaults to 50 requests per second.

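You can use the Google Maps service with one of the following authentication methods:

  • API Key (Standard API usage): set the API Key in the Client ID or API Key parameter.
  • Client ID and signature (Premium usage): set the Client ID in the Client ID or API Key parameter, and the signature key in the Private Key parameter.

Keyless access to this API is not supported by Google.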
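
To reason about the Request per Second parameter, it can help to translate a rate into the minimum spacing between two consecutive requests. The sketch below illustrates that arithmetic only. It is not the plug-in's implementation, and the function names are hypothetical.

```javascript
// Minimum spacing, in milliseconds, between two requests for a given rate.
function minIntervalMs(requestsPerSecond) {
  if (!Number.isFinite(requestsPerSecond) || requestsPerSecond <= 0) {
    throw new RangeError("requestsPerSecond must be a positive number");
  }
  return 1000 / requestsPerSecond;
}

// Earliest send time (milliseconds from the start) for each of `count`
// requests when they are evenly spaced at the given rate.
function sendOffsetsMs(count, requestsPerSecond) {
  const interval = minIntervalMs(requestsPerSecond);
  return Array.from({ length: count }, (_, i) => i * interval);
}

// Spacing for the default value of 50 requests per second.
console.log(minIntervalMs(50));
console.log(sendOffsetsMs(3, 50));
```

With such even spacing, a lower parameter value directly lengthens the time needed to enrich a batch: 10,000 addresses at 50 requests per second take at least 200 seconds.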

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Address Line

Yes

String

Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.

Language

No

String

Code of the language for the returned result for this record. This language overrides the Default Language parameter. See the list of supported domain languages for more information.

The state, region or province information can be passed in the City input, concatenated with the city name. For example: Address.City || ' ' || Address.State
The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.
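
To make the relationship between the inputs and the geocoding request more concrete, the sketch below joins the inputs into a single address string and builds a request URL for the Google Geocoding Service. It is an illustration under assumptions, not the plug-in's code: the helper name and input object are hypothetical, the `address`, `language` and `key` query parameters are those of the public Geocoding API, and the HTTPS form of the endpoint is used.

```javascript
// Hypothetical helper: builds a Geocoding request URL from enricher-like inputs.
function buildGeocodeUrl(input, apiKey, defaultLanguage) {
  // Join the non-empty inputs into a single comma-separated address string.
  const address = [input.addressLine, input.postalCode, input.city, input.country]
    .filter((part) => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join(", ");

  const url = new URL("https://maps.googleapis.com/maps/api/geocode/json");
  url.searchParams.set("address", address);

  // The record-level language takes precedence over the default language.
  const language = input.language || defaultLanguage;
  if (language) {
    url.searchParams.set("language", language);
  }
  if (apiKey) {
    url.searchParams.set("key", apiKey);
  }
  return url.toString();
}

console.log(
  buildGeocodeUrl(
    { addressLine: "Rue Mathieu Misery", country: "France", language: "fr" },
    "YOUR_API_KEY",
    "en"
  )
);
```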

Plug-in Outputs

The following table lists the plug-in outputs. Outputs marked with an * appear in a Full and a Short form in the output list.

Output Name | Type | Description

Address Types

String

Comma-separated list of address types (see Address Types for more information).

Administrative Area Level 1*

String

First-order civil entity below the country level. Within the United States, these administrative levels are states. Not all countries exhibit these administrative levels.

Administrative Area Level 2*

String

Second-order civil entity below the country level. Within the United States, these administrative levels are counties. Not all countries exhibit these administrative levels.

Administrative Area Level 3*

String

Third-order civil entity below the country level. Not all countries exhibit these administrative levels.

Airport

String

Indicates an airport. NOTE: This output is deprecated.

Country*

String

The national political entity.

East Bound Longitude

String

Bounding box eastern limit.

Floor*

String

Indicates the floor of a building address.

Formatted Address

String

Human-readable version of the geocoded address.

Intersection

String

Major intersection, usually of two major roads. NOTE: This output is deprecated.

Latitude

String

Latitude of the address.

Locality*

String

Incorporated city or town political entity.

Longitude

String

Longitude of the address.

Natural Feature*

String

Prominent natural feature.

Neighborhood*

String

Named neighborhood.

North Bound Latitude

String

Bounding box northern limit.

Park*

String

Named park.

Point of Interest*

String

Named point of interest.

Post Box*

String

Specific postal box.

Postal Code*

String

Postal code as used to address postal mail within the country.

Premise*

String

Named location, usually a building or collection of buildings with a common name.

Quality

String

Defines the granularity of the location described by an address. This quality is expressed as a value between 0 and 100 (100 being the best quality).

Room*

String

The room of a building address.

Route*

String

Named route (such as US 401).

South Bound Latitude

String

Bounding box southern limit.

Status

String

Status of the request. OK indicates that no error occurred and the address was geocoded. ZERO_RESULTS indicates that no error occurred but the address was not geocoded. See the API documentation for a list of status and error codes.

Street Address

String

Precise street address. NOTE: This output is deprecated.

Street Number*

String

Precise street number.

Sub-Locality*

String

First-order civil entity below a locality.

Sub-Premise*

String

First-order entity below a named location, usually a singular building within a collection of buildings with a common name.

West Bound Longitude

String

Bounding box western limit.
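
The outputs listed above correspond to elements of the JSON response returned by the Google Geocoding Service. The sketch below shows one plausible way to read such a response. It assumes that the Full and Short forms map to the `long_name` and `short_name` attributes of the response's address components; the function name, the output property names and the sample response (hand-written, with illustrative values) are hypothetical.

```javascript
// Hand-written sample shaped like a Geocoding API JSON response.
const sampleResponse = {
  status: "OK",
  results: [
    {
      formatted_address: "Sample Street 1, Sample City, Sample Country",
      address_components: [
        { long_name: "Sample Country", short_name: "SC", types: ["country", "political"] },
        { long_name: "Sample City", short_name: "Sample City", types: ["locality", "political"] },
      ],
      geometry: { location: { lat: 1.5, lng: 2.5 } },
      types: ["street_address"],
    },
  ],
};

// Extract a flat set of outputs from the first result of a response.
function extractOutputs(response) {
  const results = response.results || [];
  if (response.status !== "OK" || results.length === 0) {
    // For example ZERO_RESULTS: only the status is available.
    return { status: response.status };
  }
  const first = results[0];
  const outputs = {
    status: response.status,
    formattedAddress: first.formatted_address,
    latitude: String(first.geometry.location.lat),
    longitude: String(first.geometry.location.lng),
    addressTypes: (first.types || []).join(","),
  };
  for (const component of first.address_components || []) {
    for (const type of component.types) {
      if (type === "political") {
        continue; // generic marker, not a specific component type
      }
      outputs[type + "Full"] = component.long_name;
      outputs[type + "Short"] = component.short_name;
    }
  }
  return outputs;
}

console.log(extractOutputs(sampleResponse));
```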

Embedding a Google Map in a Form

The Google Geocoding service data must be used to display maps rendered with the Google Maps service.

You can display such a map in a Semarchy xDM form by embedding generated HTML and JavaScript.

  1. Create a new form field with the SemQL expression given below.
  2. In the SemQL expression, modify the following line to concatenate your address information:
var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";
  3. If you are a Google Maps API for Work customer, modify the URL to the Google Maps service in the code to include your Google Client ID. Note that the embedded map will stop working after adding the client ID. You must register authorized URLs with Google by following the instructions given on the Google Maps API for Work site:
<script src="https://maps.googleapis.com/maps/api/js?client=YOUR_CLIENT_ID&v=3.20&sensor=false"></script>
  4. Edit the field:
    • In the Display Properties, set the Component Type to Object, and in Data, set the Source Type to Content.
      This configuration tells Semarchy xDM to interpret this code as HTML and JavaScript in the browser.
'<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <script src="https://maps.googleapis.com/maps/api/js?sensor=false"></script>
    <script>

/* Modify the line below */
var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";

var zoom = 18;
var mapType = google.maps.MapTypeId.ROADMAP;
var useMarker = true;
var map;

function initialize() {
	var geocoder = new google.maps.Geocoder();
	geocoder.geocode( { "address": address}, function(results, status) {
	 if (status == google.maps.GeocoderStatus.OK) { displayMap(results[0].geometry.location); }
	});
	window.onresize = resize;
}

/* Create the map centered on the geocoded location. */
function displayMap(latlng) {
	var mapOptions = { zoom: zoom, center: latlng, mapTypeId: mapType };
	map = new google.maps.Map(document.getElementById("map_canvas"), mapOptions);
	if (useMarker) {
		var marker = new google.maps.Marker({ map: map, position: latlng});
	}
	resize();
}

/* Stretch the map to the window size and keep it centered. */
function resize(e) {
	/* The window can be resized before the geocoder has returned and the map exists. */
	if (!map) { return; }
	var center = map.getCenter();
	map.getDiv().style.height = window.innerHeight +"px";
	map.getDiv().style.width = window.innerWidth +"px";
	google.maps.event.trigger(map, ''resize'');
	map.setCenter(center);
}

google.maps.event.addDomListener(window, "load", initialize);
    </script>
  </head>
  <body style="margin:0px;">
    <div id="map_canvas" style="margin:0px;"></div>
  </body>
</html>'

OpenStreetMap Plug-in

The OpenStreetMap Plug-in for Semarchy xDM uses the OpenStreetMap API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address.

OpenStreetMap Enricher

Plug-in ID

OpenStreetMap Enricher - com.semarchy.engine.plugins.openstreetmap

Description

This enricher takes an input address, then enriches and validates this postal address using the OpenStreetMap Service.

This enricher uses the OpenStreetMap Service, which must be accessible from the Semarchy xDM Application at the URL specified in the OpenStreetMap URL parameter. Make sure that this URL is accessible through your firewalls.
This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

OpenStreetMap URL

Yes

String

URL used to query the OpenStreetMap API, typically http://nominatim.openstreetmap.org/.

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Address Line

Yes

String

Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.

The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.

Plug-in Outputs

The following table lists the plug-in outputs.

Output Name | Type | Description

Address

String

Complete address of the location.

City

String

City of the location.

Country

String

Country of the location.

Country Code

String

Country code of the location.

County

String

County of the location.

Latitude

String

Latitude of the location.

Longitude

String

Longitude of the location.

Postal Code

String

Postal code of the location.

Process Code

String

Code that indicates the result status of the address processing.

State

String

State of the location.

Street Number

String

Street number of the location.

Street Name

String

Street name of the location.
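
The outputs above resemble the fields that a Nominatim search returns when address details are requested. The sketch below shows one plausible mapping from such a result to the output names of this table. The mapping itself is an assumption made for illustration, not a description of the plug-in's internals; the function name and the sample result (hand-written, with illustrative values) are hypothetical.

```javascript
// Hand-written sample shaped like one entry of a Nominatim JSON search
// result returned with address details.
const sampleResult = {
  lat: "1.5",
  lon: "2.5",
  display_name: "1, Sample Road, Sample City, Sample County, Sample State, 00000, Sample Country",
  address: {
    house_number: "1",
    road: "Sample Road",
    city: "Sample City",
    county: "Sample County",
    state: "Sample State",
    postcode: "00000",
    country: "Sample Country",
    country_code: "sc",
  },
};

// Map a Nominatim result to the output names listed in the table above.
function toEnricherOutputs(result) {
  const address = result.address || {};
  return {
    "Address": result.display_name,
    // Nominatim reports the place as city, town or village depending on its size.
    "City": address.city || address.town || address.village,
    "Country": address.country,
    "Country Code": address.country_code,
    "County": address.county,
    "Latitude": result.lat,
    "Longitude": result.lon,
    "Postal Code": address.postcode,
    "State": address.state,
    "Street Number": address.house_number,
    "Street Name": address.road,
  };
}

console.log(toEnricherOutputs(sampleResult));
```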

Microsoft Bing Maps Plug-in

The Microsoft Bing Maps Plug-in for Semarchy xDM uses the Bing Location API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address with geocoding information.

Bing Maps Enricher

Plug-in ID

Bing Maps Enricher - com.semarchy.engine.plugins.bing.address

Description

This enricher takes an input address, then enriches and validates this postal address using the Bing Maps Service.

This plug-in must be used in compliance with the Microsoft Bing Maps APIs Terms of Service.
This enricher uses the Bing Maps Service, which must be accessible from the Semarchy xDM Application at the URL specified in the Bing Location URL parameter. Make sure that this URL is accessible through your firewalls.
This plug-in is thread-safe and supports parallel execution.

Plug-in Parameters

The following table lists the plug-in parameters.

Parameter Name | Mandatory | Type | Description

Bing Maps Key

Yes

String

To use the Bing Maps Services, you must have a Bing Maps Key.

Bing Location URL

Yes

String

URL used to query the Bing Location API.

Plug-in Inputs

The following table lists the plug-in inputs.

Input Name | Mandatory | Type | Description

Address Line

Yes

String

Address line to process.

Postal Code

No

String

Postal code of the address.

City

No

String

City of the address.

Country

No

String

Country of the address.

The entire address, including the Address Line, Postal Code, City and Country values can be passed to the plug-in as a single concatenated string in the Address Line input. If the source data contains the address in a single string, then you can pass this string directly in the Address Line input.

Plug-in Outputs

The following table lists the plug-in outputs.

Output Name | Type | Description

Administrative District

String

The subdivision name within the country or region for an address, such as the abbreviation of a US state.

Administrative District 2

String

The subdivision name within the administrative district for an address.

Confidence

String

Defines the confidence of the location match found by the geocoding service. Possible values: High, Medium, Low.

Country or Region

String

The country or region name of the address.

Formatted Address

String

A string specifying the complete address. This address may not include the country or region.

Status Code

String

The HTTP Status code for the request.

Status Description

String

A description of the HTTP status code.

Latitude

String

Latitude of the location.

Locality

String

The locality, such as the primary city, that corresponds to an address.

Longitude

String

Longitude of the location.

Match Code

String

Defines the geocoding level of the location match found by the geocoder. One or more of the following values: Good, Ambiguous, UpHierarchy.

Postal Code

String

The postal code of the address.

Process Code

String

Code that indicates the result status of the process.
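
The Confidence and Match Code outputs can be combined to decide whether an enriched address is trustworthy enough to use. The sketch below shows one possible rule, which accepts a match only when the confidence is High and every match code is Good. The rule, the function name and the assumption that multiple match codes arrive as a comma-separated string are illustrative choices, not behavior documented for the plug-in.

```javascript
// Decide whether a geocoding match is reliable enough to be accepted.
// Assumption: multiple match codes are provided as a comma-separated string.
function isReliableMatch(confidence, matchCode) {
  const codes = (matchCode || "")
    .split(",")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);
  return (
    confidence === "High" &&
    codes.length > 0 &&
    codes.every((code) => code === "Good")
  );
}

console.log(isReliableMatch("High", "Good"));
console.log(isReliableMatch("High", "Good, Ambiguous"));
console.log(isReliableMatch("Medium", "Good"));
```

A stricter or looser rule may suit your data: for example, accepting UpHierarchy matches when only the city or postal code level is needed.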

Appendices

Appendix A: Semarchy Email Enricher Domain Name Cache

The Semarchy Email Enricher uses a local cache to avoid repeating MX record lookups to check the validity of an email domain.
This domain name cache is used in priority: if a record is found in the cache, the enricher uses the information available locally and does not issue an MX record lookup.

The plug-in stores the cache in a table named EXT_EMAIL_DOMAINS. This table is created on the first run of the enricher, by default in the data location served by the enricher. You can specify a different datasource to store this table in the Datasource enricher parameter.

Domain Name Cache Table Structure

The structure of the EXT_EMAIL_DOMAINS table is the following:

Column Name | Description

HOST_NAME

Domain name, for example "gmail.com".

PREFIX

First 2 letters of the domain name, for example "gm".

SUFFIX

Last 2 letters of the domain name, for example "om".

HIT_COUNT

Number of times this host name was processed by the enricher. This value is automatically incremented by the enricher.

SEED_DATA

Indicates whether this record was part of the seeded data, or created by the enricher. The value is 1 for seeded data, 0 otherwise.

VALID

Indicates whether the domain name is valid (1) or invalid (0). The value is N/A if the validity is unknown (for example, when a new domain is added to the cache in offline mode).

SUGGESTION

Latest correction found for an invalid domain.

FIRST_INVALID_DATE
LAST_INVALID_DATE
LAST_VALID_DATE

Additional date information used to reconsider a domain validity after a certain period of time.
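
As an illustration of how the PREFIX and SUFFIX columns relate to HOST_NAME, the following sketch derives both values from a domain name, following the examples given in the table. The function name is hypothetical.

```javascript
// Derive the PREFIX and SUFFIX values of a cache record from its host name.
function cacheKeyParts(hostName) {
  const host = hostName.trim();
  return {
    hostName: host,
    prefix: host.slice(0, 2), // first 2 letters
    suffix: host.slice(-2),   // last 2 letters
  };
}

console.log(cacheKeyParts("gmail.com"));
```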

Fixing Domain Names

The enricher automatically fixes invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:

  • The Edit Distance between the invalid domain and cached domain.
  • The hit count of the cached domain.

A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
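
The two criteria above can be sketched as follows. The built-in algorithm is not documented here, so this example is only an assumption of how such a ranking could work: it uses the Levenshtein distance as the edit distance, prefers the closest cached domain, and breaks ties with the highest hit count. The function names, the `maxDistance` threshold and the sample hit counts are hypothetical.

```javascript
// Levenshtein edit distance between two strings (row-by-row dynamic programming).
function editDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,       // deletion
        current[j - 1] + 1,    // insertion
        previous[j - 1] + cost // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Pick the closest cached domain; on equal distance, prefer the highest hit count.
function suggestDomain(invalidDomain, cachedDomains, maxDistance = 2) {
  let best = null;
  for (const candidate of cachedDomains) {
    const distance = editDistance(invalidDomain, candidate.hostName);
    if (distance === 0 || distance > maxDistance) {
      continue;
    }
    if (
      best === null ||
      distance < best.distance ||
      (distance === best.distance && candidate.hitCount > best.hitCount)
    ) {
      best = { hostName: candidate.hostName, distance, hitCount: candidate.hitCount };
    }
  }
  return best ? best.hostName : null;
}

// Illustrative cache content only.
const cachedDomains = [
  { hostName: "gmail.com", hitCount: 100 },
  { hostName: "gmx.com", hitCount: 5 },
];
console.log(suggestDomain("gmal.com", cachedDomains));
```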

Adding Records to the Cache

It is possible to force the creation of new records in the cache, for example to create new fix suggestions.

To manually insert a domain correction <host_name_replacement> for an invalid domain <invalid_host_name>, use the following sample query:

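INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	<invalid_host_name>,
	-- First 2 letters of the domain name (string positions start at 1)
	SUBSTR(<invalid_host_name>, 1, 2),
	-- Last 2 letters of the domain name (in Oracle, a negative position counts from the end of the string)
	SUBSTR(<invalid_host_name>, -2, 2),
	0,    -- HIT_COUNT
	'1',  -- SEED_DATA
	'0',  -- VALID: the domain is invalid
	<host_name_replacement>,
	CURRENT_TIMESTAMP,
	CURRENT_TIMESTAMP
	);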

Cache Refresh

The Email enricher refreshes the local cache records after 3 months. This duration is not configurable. The cache stores date information for each record, and the enricher makes a new call to the MX server to refresh a record once this period has elapsed.

If there is good evidence that the cache is wrong about a domain’s validity, or if business users are certain they want to override the cache’s decision, the developer can set the VALID flag to 0 or 1 manually. To prevent the cache from overriding this manual change, it is also important to set the date fields to NULL so that the Email enricher does not refresh the cache for that domain.
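
The refresh rule described above can be sketched as a simple date comparison. The exact date column used by the enricher is not specified here, so this example assumes that the most recent of the three date columns drives the decision, and that a record with no dates is never refreshed. The function name and record shape are hypothetical.

```javascript
const REFRESH_MONTHS = 3;

// Decide whether a cache record is due for a new MX record lookup.
// Assumption: the most recent of the three date columns drives the decision.
function needsRefresh(record, now) {
  const dates = [record.firstInvalidDate, record.lastInvalidDate, record.lastValidDate]
    .filter((date) => date !== null && date !== undefined);

  // Manually overridden record: all dates are NULL, so it is never refreshed.
  if (dates.length === 0) {
    return false;
  }

  const latest = Math.max(...dates.map((date) => date.getTime()));
  const threshold = new Date(now.getTime());
  threshold.setMonth(threshold.getMonth() - REFRESH_MONTHS);
  return latest < threshold.getTime();
}

const now = new Date(2024, 5, 15); // June 15, 2024
console.log(needsRefresh({ lastValidDate: new Date(2024, 0, 1) }, now));
console.log(needsRefresh({ firstInvalidDate: null, lastInvalidDate: null, lastValidDate: null }, now));
```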

It is safe for developers to periodically truncate the cache table if they want the cache to refresh its results sooner than the 3-month automatic refresh period. Developers can either drop the table entirely, or delete only the records they do not want, keeping the seeded data and any crucial domains they have manually overridden.