Functions and Other Constructs
SemQL supports built-in and customized functions that return a value in expressions and conditions.
Functions differ from comparison operators in that they return a non-boolean value. They cannot be used as is in conditions; they must be combined with a comparison operator.
The functions available in Semarchy xDM include functions in the following categories:
Date & Time
The complete set of built-in functions with their description is available in SemQL functions list.
When you use a function, Semarchy xDM executes it with the connection information of the data location’s datasource.
Useful and Noteworthy Functions
The following list contains noteworthy functions and expressions:
TO_NUMBER and related functions to perform conversion across data types.
RPAD and related functions to trim or pad with blanks.
SUBSTR to retrieve a part of a string.
REGEXP_REPLACE to replace part of a string.
INSTR to find the location of a substring in a string.
NVL to handle null values.
GREATEST and LEAST to return the greatest and least of a list of expressions.
SYSDATE to retrieve the system date.
You have a StoreLocation attribute containing values such as '5433 - Midtown'. To extract the 'Midtown' StoreName, use the following function combination in an enricher:
SUBSTR(StoreLocation, STRPOS(StoreLocation, ' - ') + 3)
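To see how the positions in this expression add up, here is a minimal Python sketch of the same extraction. The helpers `strpos` and `substr` are hypothetical stand-ins that mimic 1-based SQL string positions; they are not part of SemQL or Semarchy xDM.

```python
def strpos(text: str, pattern: str) -> int:
    """Return the 1-based position of pattern in text, or 0 if absent (SQL style)."""
    # str.find returns -1 when the pattern is absent, so adding 1 yields 0.
    return text.find(pattern) + 1


def substr(text: str, start: int) -> str:
    """Return the part of text from the 1-based position start to the end."""
    return text[start - 1:]


store_location = "5433 - Midtown"
# The separator ' - ' starts at position 5 and is 3 characters long,
# so the store name starts at position 5 + 3 = 8.
separator_pos = strpos(store_location, " - ")
store_name = substr(store_location, separator_pos + 3)
print(store_name)  # Midtown
```

In this sketch, a value without the separator yields position 0, so the result would start at the third character of the value. Values that may lack the separator probably need a guard.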
Functions for Matching
Certain functions are key in a fuzzy matching process.
Functions for normalizing or transforming values to reduce the noise during fuzzy matching:
INITCAP and related functions absorb case differences in strings.
DMETAPHONE and related functions return phonetic representations (phonetization) of strings, absorbing typos.
SEM_NORMALIZE returns a string with non-ASCII characters transformed to ASCII-equivalent or a blank.
Soundex is not recommended as a general purpose method for phonetizing strings. Phonetization methods such as CAVERPHONE or METAPHONE for person names and METAPHONE or REFINEDSOUNDEX for organization names give better results for matching. These methods are available as functions for certain databases, and in the Text Normalization and Transliteration plug-in.
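The following Python sketch illustrates the kind of noise reduction that the case and normalization functions listed above aim for. It is an illustration only, not the implementation of SEM_NORMALIZE: the function name `normalize_for_matching` is hypothetical, and the use of Unicode NFKD decomposition is an assumption about how an "ASCII-equivalent" could be derived.

```python
import unicodedata


def normalize_for_matching(value: str) -> str:
    """Upper-case value and reduce it to ASCII, blanking what cannot be mapped."""
    # NFKD decomposition splits accented letters into a base letter
    # followed by combining marks (assumed approach, see lead-in).
    decomposed = unicodedata.normalize("NFKD", value)
    result = []
    for char in decomposed:
        if unicodedata.combining(char):
            continue  # drop the accent; the base letter was already kept
        # Keep ASCII characters and replace anything else with a blank.
        result.append(char if ord(char) < 128 else " ")
    return "".join(result).upper()


print(normalize_for_matching("Café Müller"))  # CAFE MULLER
print(normalize_for_matching("cafe muller"))  # CAFE MULLER
```

Both spellings reduce to the same string, so a later comparison no longer depends on case or accents.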
Functions that implement fuzzy matching capabilities:
SEM_EDIT_DISTANCE_SIMILARITY returns the percentage of similarity between two strings according to the Levenshtein distance algorithm; a companion function returns the distance itself.
SEM_JARO_WINKLER_SIMILARITY returns the percentage of similarity between two strings according to the Jaro-Winkler distance algorithm; a companion function returns the distance itself.
SEM_NGRAMS_SIMILARITY returns the percentage of similarity of two strings according to Dice’s coefficient similarity measure applied to the n-grams of the strings.
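To make two of these measures concrete, here is a minimal Python sketch of a Levenshtein-based similarity and of Dice’s coefficient over n-grams. It is not the code behind the SEM_ functions. The function names are hypothetical, and two choices are assumptions: the edit distance is scaled by the length of the longer string, and the n-grams default to bigrams with no padding.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits from a to b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def edit_distance_similarity(a: str, b: str) -> float:
    """Similarity percentage; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


def ngrams(text: str, n: int = 2) -> list:
    """All contiguous substrings of length n (bigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngrams_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice's coefficient over the n-gram multisets, as a percentage."""
    grams_a, grams_b = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:  # both strings are shorter than n
        return 100.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())  # n-grams common to both
    return 100.0 * 2 * shared / total


print(edit_distance("kitten", "sitting"))   # 3
print(ngrams_similarity("night", "nacht"))  # 25.0
```

In a match rule, a similarity of this kind is typically compared against a threshold, since a function alone cannot serve as a condition.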
With Oracle and PostgreSQL data locations, matching functions rely on database native capabilities. For SQL Server, matching functions rely on Transact-SQL implementations, which do not perform as well as native capabilities.
For large data volumes, it is recommended to use third-party common language runtime (CLR) implementations of these functions for better performance, for example the Fastenshtein implementation of the Levenshtein algorithm. These functions must be installed in the SQL Server instance, and then declared and used as custom functions.
SemQL allows you to use custom database functions implemented in the database instance hosting the hub.
You must declare these functions in the model to have them appear in the list of functions. See Database Functions and Procedures for more information about declaring customized functions.
Functions that are not declared can still be used in SemQL, but will not be recognized by the SemQL parser and will cause validation warnings.
Call these functions as regular functions by prefixing them with their schema and (optionally) their package name.
The database user of the schema hosting the hub must have sufficient privileges to execute the customized functions.
Database functions process data with the database engine. For processing that involves, for example, algorithms, libraries, or services that are not easily implemented with the database capabilities, it is preferable to opt for the Java plug-in or REST client option.
The CASE expression selects a result from one or more alternatives, and returns this result.
This syntax returns the first result for which the expression matches the selector. If none match, it returns the default result.
CASE selector WHEN expression_1 THEN result_1 ... WHEN expression_n THEN result_n [ELSE default_result] END
This syntax returns the first result for which the condition is true. If none is true, it returns the default result.
CASE WHEN condition_1 THEN result_1 ... WHEN condition_n THEN result_n [ELSE default_result] END
CASE PublisherID WHEN 'CRM' THEN Upper(CustomerName) WHEN 'MKT' THEN Upper(Replace(CustomerName, '-', ' ')) ELSE CustomerName END
CASE WHEN PublisherID='CRM' THEN Upper(CustomerName) WHEN PublisherID='MKT' THEN Upper(Replace(CustomerName, '-', ' ')) ELSE CustomerName END
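Both CASE examples above express the same first-match logic. As a rough illustration, the Python sketch below mirrors them with a hypothetical function `standardized_name`; it is meant only to show the evaluation order, not how SemQL evaluates the expression.

```python
def standardized_name(publisher_id: str, customer_name: str) -> str:
    """Mirror the CASE examples: branches are tested top to bottom, first match wins."""
    if publisher_id == "CRM":
        return customer_name.upper()
    if publisher_id == "MKT":
        # Replace dashes with blanks before upper-casing, as in the MKT branch.
        return customer_name.replace("-", " ").upper()
    return customer_name  # ELSE branch: the default result


print(standardized_name("CRM", "acme-corp"))  # ACME-CORP
print(standardized_name("MKT", "acme-corp"))  # ACME CORP
print(standardized_name("ERP", "acme-corp"))  # acme-corp
```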
SemQL supports searching for an expression’s value in the values returned by a table function, using the following syntax:
expression IN table_function(parameter_1, parameter_2 ...)
For example, the following condition uses a table function named SEARCH_FOR_IDS that returns a list of IDs from a customer name.
CUSTOMER_ID in SEARCH_FOR_IDS(CUSTOMER_NAME)