Welcome to Semarchy xDM.
This guide contains information about the SemQL language which is used across Semarchy xDM.
Preface
Overview
Using this guide, you will learn the SemQL language and its usage in Semarchy xDM.
Audience
Document Conventions
This document uses the following formatting conventions:
Convention | Meaning |
---|---|
boldface | Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept. |
italic | Italic type indicates special emphasis or placeholder variable that you need to provide. |
| Monospace type indicates code example, text or commands that you enter. |
Other Semarchy Resources
In addition to the product manuals, Semarchy provides other resources available on its web site: https://www.semarchy.com.
Obtaining Help
There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see https://www.semarchy.com.
Feedback
We welcome your comments and suggestions on the quality and usefulness
of this documentation.
If you find any error or have any suggestion for improvement, please
mail support@semarchy.com and indicate the title of the documentation
along with the chapter, section, and page number, if available. Please
let us know if you want a reply.
Introduction to Semarchy xDM
Semarchy xDM is the Intelligent Data Hub platform for Master Data
Management (MDM), Reference Data Management (RDM), Application Data Management
(ADM), Data Quality, and Data Governance.
It provides all the features for data
quality, data validation, data matching, de-duplication, data authoring,
workflows, and more.
Semarchy xDM brings extreme agility for defining and implementing data management applications and releasing them to production. The platform can be used as the target deployment point for all the data in the enterprise or in conjunction with existing data hubs to contribute to data transparency and quality.
Its powerful and intuitive environment covers all use cases for setting up a successful data governance strategy.
Introduction to SemQL
SemQL is a language to express declarative rules in Semarchy xDM. It is used for example to define:
- Enrichers, Matchers, Validations and Survivorship Rules involved in the Certification Process.
- Composite and transformed attributes in form and collections appearing in MDM Application.
- Filters when browsing data in the MDM Hub.
It has the following main characteristics:
- The syntax is close to the SQL language and most SemQL functions map to Database functions.
- SemQL is converted on the fly and executed by the hub database.
- SemQL uses Qualified Attribute Names instead of columns names. The code remains implementation-independent.
The following sections describe the main characteristics of the SemQL language.
SQL-Like Clauses
The SemQL Language allows users to define SQL-Like clauses. Depending on the context, these clauses may be one of the following:
- Condition: A clause that returns a boolean result from the evaluation of expressions using operators. A condition can be used for example for filtering or validating data records (if the clause is false, then the record is filtered out or considered invalid).
- Expression: A clause that returns a value. In the context of a SemQL Enricher for example, an expression transforms, standardizes and enriches source attributes.
- Order By Clause: An expression used to sort records. In consolidation rules, such a clause is used to manage the consolidation conflicts. For example, consider a consolidation made by Most Frequent Value. When multiple values occur with equal frequency, then the SemQL in the Ranking Expression determines which value is used.
TO_CHAR(IsValidCustomer)
is a valid expression, but not a valid
condition. TO_CHAR(IsValidCustomer)='1'
is a valid condition.Qualified Attribute Names
SemQL clauses manipulate attributes and variables defined in the Semarchy xDM model. Attributes are accessed through an unambiguous Qualified Attribute Name. The Qualified Attribute Name is the path to an attribute from the entity being processed.
Built-in and Customized Functions
In expressions, conditions and order by clauses, it is possible to use the built-in SemQL functions. Most of these functions are functions built in the hub’s database and processed by it. Other functions (for example, matching functions) are specific to Semarchy xDM.
You can also use in SemQL customized functions implemented in the database. You must declare these functions in the model to have them appear in the list of functions. See the Declaring Database Functions and Procedures section in the Semarchy xDM Developer’s Guide for more information about declaring customized functions.
SemQL Syntax
Language Elements
The SemQL syntax supports the equivalent of SQL Expressions, Conditions or Order By Clause, which are a combination of one or more Values, Operators, and Functions.
Values, Operators and Functions
Values, operators and functions are the tokens in the SemQL language.
- Values are simple expressions. They may be literals, attributes or variables.
- Operators modify or compare expressions. SemQL support most SQL
operators, including arithmetic and character operators
(
+
,-
,*
,/
,||
), comparison operators (=
,!=
,>
,>=
,<
,⇐
,IN
,BETWEEN
,LIKE
,REGEXP_LIKE
,IS NULL
) and logical operators (AND
,OR
,NOT
). - Functions & Expression Constructs combine other tokens to create new expressions. They include most functions available in the hub’s database, plus the functions implemented by the user.
Operators are not case-sensitive. Functions are case-sentive for SQL Server, and Values are always case-sensitive.
For example:
StartYear BETWEEN 2012 and 2014
is equivalent toStartYear Between 2012 AND 2014
UPPER( CustomerName )
is equivalent toUpper( CustomerName )
FirstName LIKE 'Unknown%'
is NOT equivalent toFirstName LIKE 'UNKNOWN%'
Expressions, Conditions, Order By Clause
Expressions, Conditions and Order By Clauses are the phrases supported by the SemQL Language.
- Expressions combine values, operators and functions to return a non-boolean value.
- Conditions combine values, operators and functions to return a boolean value (true or false).
- Order By Clauses are expressions used to sort data by ascending or
descending values of the expression.
TheASC
orDESC
post-fix define the sort order. Default sort order is ascending.
Order by clauses in the consolidation rules' Ranking Expression also support theNULLS FIRST
orNULLS LAST
clause to specifies whether NULL values should be ordered before or after non-NULL values. By default, the order by clause usesNULLS LAST
if the sort isASC
andNULLS FIRST
if the sort isDESC
.
Examples of Expressions
FirstName
is an attribute.'USA'
is a string literal.Initcap(FirstName)
is a function using an attribute.Initcap(FirstName) || ' ' || Initcap(LastName)
is a combination of operators, functions, attributes and literals.
Examples of Conditions
1=1
compares two literals.Country='USA'
compares an attribute and a literal.Upper(Country) in ('USA', 'US', 'CANADA')
uses a function.
Examples of Order By Clauses
Country
sorts by the Country attribute (ascending by default)Country DESC
sorts by the Country attribute (descending). Nulls values are sorted (default behavior forDESC
) before all non-null values.Country DESC NULLS LAST
sorts by the Country attribute (descending). Nulls values are sorted after all non-null values.CASE PublisherID WHEN 'MKT' THEN 1 WHEN 'CRM' THEN 2 ELSE 99 END ASC
sorts records where PublisherID equalsMKT
, thenCRM
, then the rest.
The following sections detail the elements of a SemQL clause.
Comments
Comments in SemQL are surrounded by /*
and */
.
ANY Contacts HAVE (1=1) /* customer with contacts */ AND NOT ANY Contacts HAVE ( IsInfluencer = 1 ) /* and no contact is an influencer */
Values
The values in a SemQL expression may be literals, attributes or model variables:
- literals are constant values. Numeric are provided as is, other
literals must be surrounded by single quotes
'
.
Examples:
'John'
-42
'1994-11-07'
- Attributes refer to attributes of the entities in the model.
- Model Variables contain values that are used to customize the user experience or parameterize an integration job. Variable values are local to the user session or executed job. They are set either via a job parameter (for jobs), or retrieved from remote servers (declared as Variable Value Providers) when the user opens his session. Variables can be used in SemQL filters and expressions created at design and run-time.
- Search Parameters store the values entered in search form and submitted to the search condition attached to the search form. Search parameters are available only in their own search form’s condition.
Operators
Operators are used to:
- Combine expressions to create new expressions (Arithmetic or Character Operators).
- Evaluate expressions to return a boolean value. Such operators are used to create conditions.
This section details the operators supported in SemQL.
Arithmetic Operators
Operator | Description |
---|---|
| Addition |
| Subtraction |
| Multiplication |
| Division |
Character Operators
The ||
(double pipe) is used for string concatenation.
In SQL Server, the ||
operator is automatically rewritten to the database +
operator. Using +
also works but is not recommended.
Note that concatenation and most operators in SQL Server do not do implicit type conversion, so concatenating two attributes with incompatible types will raise no issue for the the SemQL language but will fail at execution-time.
Comparison Operators
Operator | Description |
---|---|
| Equality |
| Inequality |
| Greater than, greater than or equal |
| Smaller than, smaller than or equal |
| Compares a value with each value in the list, returns true if one value matches. |
| Greater than or equal to value_1 and less than or equal to value_2 |
| TRUE if value matches the pattern. Within the pattern, the character |
| Returns true if the expression matches the pattern. REGEXP_LIKE has a boolean result and is considered a condition and not a function.The parameter value may contain one or more of the following options:
This function behaves differently depending on the data location database technology:
|
| Tests for nulls |
| Condition that returns true if any of child records - in a one to many relationship - meet the given condition. For more information, see Using Related Entities' Attributes. |
| Condition that returns true if all of child records - in a one to many relationship - meet the given condition. For more information, see Using Related Entities' Attributes. |
| Condition that returns true if the two values are equal or one of them is null. |
| Condition that returns true if the two values are equal or both are null. |
Logical Operators
Operator | Description |
---|---|
| Return true if both conditions are true. |
| Return true if one condition of the other is true. |
| Returns true if the following condition is false. |
Functions & Expression Constructs
SemQL support functions and expression constructs, that is elements that return a value.
TO_CHAR(IsValidCustomer)
is a valid expression, but not a valid
condition. TO_CHAR(IsValidCustomer)='1'
is a valid condition.Built-in Functions
The functions available in Semarchy xDM include functions in the following categories:
- Strings
- Comparison
- Conversion
- Date & Time
- Matching
- Miscellaneous
- Null Management
- Numeric
When you use a function, Semarchy xDM executes it with the connection information of the data location’s datasource:
- For Oracle, the function is looked for in the default schema accessed by this connection. If the function that does not exist in that schema, then the public synonym for the function is used.
- For PostgreSQL, the function is looked for in the search path of the user configured for the JNDI datasource. This will determine where the function is called from.
- For SQL Server, a function called in SemQL is automatically prefixed with
dbo
.
Useful & Noteworthy Functions
The following list contains noteworthy functions and expressions:
TO_CHAR
,TO_DATE
,TO_NUMBER
functions to perform conversion across data types.TRIM
,LTRIM
,RTRIM
,PAD
LPAD
,RPAD
to trip or pad with blanks.SUBSTR
to retrieve a part of a string.REPLACE
,REGEXP_REPLACE
to replace part of a strings.INSTR
to find the location of a substring in a string.NULLIF
,COALESCE
andNVL
to handle null values.GREATEST
andLEAST
to return the greatest and least of a list of expressions.SYSDATE
to retrieve the system date.
You have a StoreLocation attribute containing values such as '5433 - Midtown'. To extract the 'Midtown' StoreName, use the following function combination in an enricher:
SUBSTR(StoreLocation, STRPOS(StoreLocation, ' - ') + 3)
The complete set of built-in functions with their description is available in Appendix A
Functions for Matching
Certain functions are key in a fuzzy matching process.
Functions for normalizing of transforming values to reduce the noise during fuzzy matching:
UPPER
,LOWER
andINITCAP
absorb the case-sensitivity differences in strings.SOUNDEX
,METAPHONE
andDMETAPHONE
return phonetic representations (phonetization) of strings, absorbing typos.SEM_NORMALIZE
returns a string with non-ASCII characters transformed to ASCII-equivalent or a blank.
Functions that implement fuzzy matching capabilities:
SEM_EDIT_DISTANCE
andSEM_EDIT_DISTANCE_SIMILARITY
respectively returns the distance and percentage of similarity between two strings according to the Levenshtein distance algorithm.SEM_JARO_WINKLER
andSEM_JARO_WINKLER_SIMILARITY
respectively return the distance and percentage of similarity between two strings according to the Jaro-Winkler distance algorithm.SEM_NGRAMS_SIMILARITY
returns the percentage of similarity of two strings according to the Dice’s coefficient similarity measure applied to the n-grams of the strings.
*_SIMILARITY
functions return a value between 0 (no match) and 100 (perfect match). If one or both strings are null, the returned value is 0.Other Constructs
CASE Expression
The CASE
expression selects a result from one or more alternatives,
and returns this result.
This syntax returns the first result for which the expression matches the selector. If none match, it returns the default result.
CASE selector WHEN expression_1 THEN result_1 ... WHEN expression_n THEN result_n [ELSE default_result] END
This syntax returns the first result for which the condition is true. If none is true, it returns the default result.
CASE WHEN condition_1 THEN result_1 ... WHEN condition_n THEN result_n [ELSE default_result] END
CASE PublisherID WHEN 'CRM' THEN Upper(CustomerName) WHEN 'MKT' THEN Upper(Replace(CustomerName, '-', ' ')) ELSE CustomerName END
CASE WHEN PublisherID='CRM' THEN Upper(CustomerName) WHEN PublisherID='MKT' THEN Upper(Replace(CustomerName, '-', ' ')) ELSE CustomerName END
Table Functions
SemQL supports searching for an expression’s value in the values returned by a table function, using the following syntax:
expression IN table_function(parameter_1, parameter_2 ...)
SEARCH_FOR_IDS
that returns a list of IDs from a customer name.CUSTOMER_ID in SEARCH_FOR_IDS(CUSTOMER_NAME)
Customized Functions
SemQL allows you to access database functions implemented in the database instance hosting the hub.
Call these functions as regular functions by prefixing them with their schema and (optionnally) their package name: <schema>.<package>.<function>
.
COMMON_TOOLS.CUST001.CUSTFUNC(CustomerName)
Attributes and Variables/Search Parameters
Variables
Variables store values that are integration job-specific or user session-specific. Variable are either Built-in Platform Variables, or Model Variables.
Built-in Platform Variables
Built-in Platform Variables are built-in the platform. They contain information about the load or batch being processed or about the user connected or performing the operations. They are listed below:
Variable Name | Definition |
---|---|
| ID of the batch running the integration job. This variable is not available in expressions used outside the certification process, such as enrichers or validations triggered in steppers. |
| Name of the deployment data location. |
| ID of the external load that has submitted the integration job. This variable is not available in expressions used outside the certification process, such as enrichers or validations triggered in steppers. |
| Base URL of the Semarchy xDM server, as configured by the administrator. |
| For the certification process, name of the user who has submitted the integration job. For a user data session, the name of the connected user. |
| Comma-separated list of roles of the connected user. Note that this variable only returns user roles that have corresponding roles declared in Semarchy xDM (in the Administration view, under the Roles node.). For the Tomcat application server specifically, all roles are returned, including those with no corresponding role declared. |
| User Address, configured by the user in his profile. |
| User City, configured by the user in his profile. |
| User Company, configured by the user in his profile. |
| User Country, configured by the user in his profile. This variable uses the ISO 3166-1 alpha-2 country code format (e.g.: |
| User Decimal Separator, configured by the user in his profile. |
| User Department, configured by the user in his profile. |
| User Email, configured by the user in his profile. |
| User First Name, configured by the user in his profile. |
| User Job Title, configured by the user in his profile. |
| User Language Tag, configured by the user in his profile. This variable uses IETF BCP 47 language tag format (e.g.: |
| User Last Name, configured by the user in his profile. |
| User Postal Code, configured by the user in his profile. |
| User Primary Phone, configured by the user in his profile. |
| User Secondary Phone, configured by the user in his profile. |
| User Thousand Separator, configured by the user in his profile. |
| User Timezone, configured by the user in his profile. The time zone is one of the TZ Database time zones (e.g.: |
Model Variables
Model Variables can be used:
- In user sessions: They are set when the user accesses an application, using a variable value provider. In this context, variables are used to parameterize the user experience (for example, in filters restricting the user privileges).
- In integration jobs: In an integration job, a variable value is usually set using a job parameter. If no job parameter is set, the value is set using the variable value provider. In this context, variables are used to parameterize the job execution (for example, in an enricher’s filter expression to prevent the enricher from processing any record depending on the value).
Model variables set from a Variable Value Providers are refreshed when the user accesses Semarchy xDM. In the context of a job, the variables are refreshed as if the user starting the job had accessed Semarchy xDM.
Using the built-in platform variables V_USERNAME
, it is
possible to query (via a Variable Value Provider definition) the
corporate LDAP directory and retrieve the email of the connected user,
and then store this value in a model variable called CORPORATE_USER_EMAIL
.
Using Variables
Variables are used with the following syntax: :<variable_name>
, for
example: :CORPORATE_USER_EMAIL
.
Search Parameters
Search Parameters store the values entered into a search form and submitted to the search condition attached to that search form. They are available only for their own search form’s condition.
Using Search Parameters
Parameters are used using their defined Binding, using the following syntax: :<binding_name>
, for example: :SEARCHED_NAME
.
Named Queries Parameters
Query Parameters store the values passed to a request made to a named query. It is used via its binding name, similarly to a Search parameter.
Attribute Qualified Names
An Attribute Qualified Name is the path to an attribute from the current entity being processed. This path not only allows accessing the attributes of the entity. It also allows access to:
- The attributes of the entities related to the current entity (parent and child entities).
- The Lineage. For example, to access the attributes of the golden record a master record relates to.
customerName
and CustomERName
do not represent the same attribute.Using Current Entity’s Attributes
A given SemQL clause is expressed for a given entity. For example, an enricher, a validation or a filter is performed on a pre-defined entity. Such a clause always has access to the attributes of this entity. These attributes may be simple or complex attributes.
Simple Attributes
Simple attributes can be accessed using their Name.
FirstName
returns the value of the FirstName simple
attribute of the current entity (Employee).
Complex Attributes
Display Name
Complex attributes can be accessed using their Attribute Name. This returns the value of the complex attribute in the format of the corresponding complex type’s Display Name.
The SimpleAddressType complex type is defined with a
display type that shows the Address, City and Country definition
attributes separated by a space. This type is used for the
InputAddress attribute of the Customer entity. The InputAddress
qualified name therefore returns a string containing
<Address> <City> <Country>
value for each Customer.
Definition Attribute
It is also possible to retrieve the value of each definition attribute of a complex type by prefixing this definition attribute name by the name of the complex attribute.
The SimpleAddressType complex type includes the Country
definition attribute. This type is used for the InputAddress attribute
of the Custome entity. The InputAddress.Country
qualified name
therefore returns the Country stored in the InputAddress complex
type value for each Customer.
ID Attributes
SemQL exposes several ID attributes to identify a record at certain phases in the certification process.
For example:
- A customer golden record is identified by the value in the
CustomerID
attribute. - Each master record consolidated into this golden is identified by the
PublisherID
,SourceID
pair.
The table below lists the attributes representing a record ID and the value they take depending on the data access view. In this table, <IDAttribute>
refers to the name of the ID attribute of the entity.
ID Attribute | Label | Description |
---|---|---|
| Golden ID | The ID of the golden record or ID of the golden record related to the current record if any. This attribute is not available for basic entities. For fuzzy matched entities, this attribute is completed by the following attributes that track changes due to duplicates management operations:
|
| Record Source ID | The ID of the source record. This attribute is available only for views exposing master or source records, for fuzzy matched entities. |
| Record Publisher | The Code of the publisher of the source record. This attribute is available only for views exposing master or source records for fuzzy or ID matched matched. |
| Record ID | The ID of the current record. Its value depends on the view:
|
Foreign Attributes
SemQL exposes multiple attributes representing a related (parent) record ID at certain phases of the certification process.
The table below lists the various attributes representing a related record ID the value they take depending on the data access view. In this table, <ForeignAttribute>
refers to the name of the foreign attribute in the reference.
ID Attribute | Label | Description |
---|---|---|
| Referenced Record Source ID | The ID of the referenced record. This attribute is available when referring to fuzzy matched entities. |
| Referenced Record Publisher | The code of the publisher of referenced record. This attribute is available when referring to fuzzy matched entities. |
| Referenced Record ID | The ID of the referenced record. Its value depends on the referenced record type: When referencing a golden record or a basic entity record it exposes the ID of the referenced golden record. |
| Referenced Record | In the forms and collections, this attribute returns a pointer to the referenced record. You can use it as is in a form attribute to display the reference in a user-friendly format — like a hyperlink of a chip — using the format defined in a display card configured for the referenced entity. When using this attribute in SemQL expressions, the value returned is the primary text of the referenced entity’s default display card. |
Built-in Attributes
Built-in attributes are provided by Semarchy xDM to support the certification process. They appear in addition to the attributes designed in the data model.
Data Access Views
Data is accessed or manipulated using SemQL via a data access view.
A data access view represents the data at a particular phase of its lifecycle in the MDM hub. For example, Golden Data, Master Data, Source Error, etc. Views are available depending on the type of the entity. For example, the Master Data view does not exist for a basic entity.
Each view is identified by an alias (GD for Golden Data, SE for source error, etc). This alias is used in the REST API to query this view, and as a prefix for the physical table containing this view’s data, if such a table exists.
Available attributes depend on the data access view running the SemQL expression and on the type of entity (basic, ID of fuzzy matched).
For example, a clause that involves the Source Data view (for example, an enricher on source data) will support built-in attributes such as the SourceID
(ID of the source record) or the PublisherID
(Code of the
application that published the source record). On Golden Records - which are not directly related to one source - these built-in attributes no longer make sense.
Built-In Attributes List
Built-in attributes are available depending on the type of entity (basic, ID or fuzzy matched), and view being used.
Using Related Entities' Attributes
Related entities may be either parent entities (for a given relation, the current entity has zero or one parent), or child entities (for a given relation, the current entity has zero or more children).
Parent Entities
It is possible to access attributes of a parent entity by prefixing this attribute by the Role of this parent in the relation.
The current Customer entity references the Employee entity in a relation. In this relation, the Role of this second entity is AccountManager. The AccountManager.FirstName
attribute refers to the FirstName of the Employee that is the parent - in this relation, the AccountManager - of the current Customer.
Referring to parent entities can follow a chain of relations.
AccountManager.CostCenter.CostCenterName
follows two relations to return the CostCenterName of the CostCenter to which the AcountManager of the current Customer reports to.
Child Entities
You may access, from an entity, information in its child entities with two types of expressions:
- The
any/all
conditions allows you to filter using data from child entities. This syntax works in conditions only. - The
lookup
expression allows you to lookup and aggregate information from child entities.
Any/All Conditions
The any
condition returns true if one of the child records meets the given condition.
any <child_entity_role> have ( <condition_on_child_entity> )
or
any <child_entity_role> has ( <condition_on_child_entity> )
any Contacts have ( FirstName = 'John' )
The all
condition returns true if all the child records meet the given condition.
all <child_entity_role> have ( <condition_on_child_entity> )
or
all <child_entity_role> has ( <condition_on_child_entity> )
all Contacts have ( isInfluencer = '0' )
It is possible to cascade through several relations' roles using the any/all
conditions.
any Customers.Contacts have ( IsInfluencer = '1' )
In the any/all
conditions, it is possible to access the direct parent’s record from the condition on the child entity, through the ParentRecord
reserved keyword or through the parent’s role name in the relation.
any Contacts have ( any ParentRecord.Contacts have ( ParentRecord.ContactID != ContactID and ParentRecord.FirstName = FirstName ) )
Lookup Expression
The lookup
expression returns values or aggregated values from a child entity.
lookup <lookup_type> <expression_on_child_entity> separated by <separator> from <child_entity_role> where <condition_on_child_entity> order by <order_by_clause_on_child_entity> end
The components of the lookup syntax are:
<child_entity_role>
: This path points to a child entity via the relationships. It may be a direct path through the child entity role (e.g.:Contacts
), or a path (e.g.:Contacts.Emails
).<expression_on_child_entity>
is a SemQL expression to return. This expression is resolved for the child record.<lookup_type>
defines which expression or aggregation is returned from the child records:min
: The smallest value.max
: The largest value.first
: The value in the first record. Records are ordered according to the<order_by_clause_on_child_entity>
.sum
: The sum of the values.average
: The average of the values.count
: The count of values.distinct_count
: The distinct count of values.string_agg
: A concatenation of the values, ordered by the<order_by_clause_on_child_entity>
, and separated by the<separator>
.distinct_string_agg
: A concatenation of distinct values, ordered by the<order_by_clause_on_child_entity>
, and separated by the<separator>
.Note that:
- Only
string_agg
anddistinct_string_agg
use theseparated by <separator>
clause. - The
order by
clause is supported indistinct_string_agg
only for PostgreSQL and Oracle 19c+. - The
<separator>
value may be a literal or an expression. Expressions may only use attributes in the context of the parent entity of the lookup. You cannot use in these expressions attributes from the child entity. Note that expression separators are not supported in SQL Server.
- Only
<condition_on_child_entity>
is a filtering condition: only child records matching this condition are considered for the lookup.<order_by_clause_on_child_entity>
is a sorting condition for the expressions, used in thefirst
andstring_agg
lookup types.
lookup sum Value from Orders where Status = 'Paid' end
lookup first Email from Contacts where Primary = 1 order by CreationDate desc end
lookup string_agg ErrorType || - || ErrorName separated by ',' from Errors order by ErrorType, ErrorName end
Using the Lineage
You can navigate records lineage using SemQL. Using this navigation, you can for example access the master and the golden records consolidated from a source record, or you can access all the attached master records from a golden record.
Lineage Parent Records
You can access the attributes of a parent record related to your current record in the lineage.
This navigation is possible using a pseudo-role name representing the parent relation in the lineage, such as GoldenRecord
(of a master record for example), MasterRecord
or SourceRecord
.
To filter out the master records that are singletons, you can access the golden record using the GoldenRecord
pseudo-role and then use its number of masters MastersCount
:
GoldenRecord.MastersCount > 1
To get the CustomerName
consolidated in the golden record that results from a given source record:
MasterRecord.GoldenRecord.CustomerName
Lineage Child Records
You can access the attributes of child records related to your current record in the lineage.
This navigation is possible using a pseudo-role name representing the child records in the lineage relation, such as MasterRecords
(the master records attached to a golden record) or SourceRecords
(the source records attached to a master record).
You can use the lineage child records similarly to the Child Entities records, using the SemQL any
and all
syntax.
To filter golden records that have duplicates only from the CRM publisher:
MastersCount > 1 and all MasterRecords have ( PublisherID = 'CRM' )
To filter all master records created from source records older than 5 days.
all SourceRecords have (CreationDate < SYSDATE() - 5)
Special Cases
Attributes of Duplicates
Certain SemQL expressions manipulate two similar records simultaneously:
- The SemQL condition that defines the match rule in a Matcher.
- The SemQL condition used to filter duplicates.
For these expressions, the two similar records are identified by the RECORD1
and RECORD2
pseudo-record prefixes.
The following condition returns the duplicates with the same InputAddress.Address (complex type) but a different CustomerName
Record1.CustomerName <> Record2.CustomerName and Record1.InputAddress.Address = Record2.InputAddress.Address
Attributes in Reference Pickers Filters
In Steppers, SemQL conditions can filter the records selectable with reference pickers. These conditions usually manipulate two different records.
In this context, the records are identified using pseudo-record prefixes:
Record
represents the record being edited. It is the referencing record.Referenced
represents the selectable referenced record (the one filtered in the reference picker).
The following condition is set on a reference picker that enables selecting an Account Manager (Employee) from a Customer entity managed in a stepper. It reduces the selectable employee records to only those in the same country as the Customer being edited.
Record.Country = Referenced.Country
The following condition is set on a reference picker to select the Manager (an instance of Employee) of a given Employee. It filters the selectable managers (Referenced
) so that they are in the same CostCenter as the the current manager (Record.Manager
).
Record.Manager.FID_CostCenter = Referenced.FID_CostCenter
See the Reference Selection section in the Steppers chapter in the Semarchy xDM Developer’s Guide for more information.
Using SemQL in Semarchy xDM
This section describe the various uses of SemQL in Semarchy xDM, as well as the attributes available for these uses.
Using SemQL at Design-Time
At design-time, SemQL is used:
- In the entities, to generate automatically the Source ID of an entity when creating or importing new records.
- In the various rules defining the Certification Process, that is the enrichers, validations, matchers and survivorship rules.
- In the Applications to define the filters applied to the lists in the business views and the form/collections attributes.
- In properties that support SemQL in addition to literal values.
- In the conditions used for securing data Row-Based Filtering.
SemQL for ID Generation
The Source ID of an entity can be generated using a SemQL expression. This expression is executed and the ID generated when a record form is saved for the first time in a stepper or when a record is imported from a file containing no IDs.
This expression can use:
- Attributes from the entity.
- Attributes of the referenced (parent) entities.
- SemQL functions and operators. You can for example use the
SEQ_NEXTVAL
function to retrieve the next increment of a named sequence.
Examples of SemQL for ID Generation:
- To create an ID using a literal concatenated with a sequence increment, for a customer entity:
'CUST_' || SEQ_NEXTVAL('CUST_SEQ')
- To create an ID using values from the record itself, for a customer entity:
Upper(LastName) || '_' || CustomerNumber
- To create an ID concatenating the IDs of two related records. Typically, for an entity that relates products and distribution markets:
DistributedProduct.ProductID || '_' || DistributionMarket.MarketID
SemQL in the Certification Process
The certification process takes source records pushed to the MDM Hub by identified Publishers and creates enriched, validated, matched and consolidated golden records. SemQL is involved in the various phases of this process.
Certification Process Clauses
The following table describes the expressions, condition and order by clauses used in the certification process.
Process Step | Clause Type | Description |
---|---|---|
SemQL Enricher Filter | Condition | This condition filters the source records of the entity that must go through this enricher. |
SemQL Enricher Expressions | Expression | The result of the expressions load the enriched entity’s attributes. |
API Enricher Filter | Condition | This condition filters the source records of the entity that must go through this enricher. |
API Enricher Inputs | Expression | Each expression result is pushed to an enricher input. The outputs of the enricher load the entity’s attributes with enriched values. |
SemQL Validation Condition | Condition | This condition defines which source or golden records pass the validation. |
Plug-in Validation Input | Expression | Each expression result is pushed to a plug-in input. The plug-in outputs a boolean defining the records that pass the validation. |
Matcher Binning Expression | Expression | Master records having the same value for all binning expressions are in the same bin, and will be matched using the matching condition. |
Matcher Matching Condition | Condition | Master records in the same bin for which the matching condition returns true are considered as matches. |
Consolidation Rule Ranking Expression | Order By | For Custom Ranking strategy, the consolidation uses values from the first of the records ordered by this expression. For other strategies, if the consolidation strategy returns two records with the same rank, the consolidation sorts them by this expression and uses the value of the first one. |
Available Attributes by Clause
In the certification process:
- Attributes from the entity being processed are always available.
- Built-in attributes availability depends on the type of entity and on the clause being created.
- Attributes from related entities (parent or child entities as well as Lineage parent and child records) availablity also depends on the type of the entity and on the clause being created.
Certification Process SemQL Example
Enricher Expressions
Examples of enricher conditions:
- FirstName:
InitCap(FirstName)
- Name:
InitCap(FirstName) || ' ' || Upper(LastName)
- City:
Replace(Upper(InputAddress.City),'CEDEX','')
- EmployeeCostCenterName :
CostCenter.CostCenterName
: Current Employee entity references the CostCenter entity and this expression returns an employee’s cost center name
Validation Conditions
Checking the Customer’s InputAddress complex attribute validity:
InputAddress.Address is not null and ( InputAddress.PostalCode is not null or InputAddress.City is not null)
In this example, the IS NOT NULL
, AND
and OR
SemQL operators are
used to build the condition.
It is possible to use child records in validations ( See Child Entities for more information).
For example, a Customer is valid only if all its attached Contacts have an email:
ALL Contacts HAVE (EmailAddress is not null)
Matcher
- Binning Expression to group customers by their Country/PostalCode:
InputAddress.Country || InputAddress.PostalCode
- Matching Condition: Matching two customer records by name, address and city name similarity:
SEM_EDIT_DISTANCE_SIMILARITY( Record1.CustomerName, Record2.CustomerName ) > 65 and SEM_EDIT_DISTANCE_SIMILARITY( Record1.InputAddress.Address, Record2.InputAddress.Address ) > 65 and SEM_EDIT_DISTANCE_SIMILARITY( Record1.InputAddress.City, Record2.InputAddress.City ) > 65
In this second example, SEM_EDIT_DISTANCE_SIMILARITY
is a SemQL
function. Record1
and Record2
are predefined names for qualifying
the two records to match.
Matching on child entities is possible by configuring the rule to use Child Records.
For example, to match two Customers when they have a Contact with the same FirstName and LastName and a CEO title, and the Customer names are 50% similar, configure the match rule with the Child Records set to Contact and use the following clause :
Record1.FirstName = Record2.FirstName AND Record1.LastName = Record2.LastName AND Record1.JobTitle = 'CEO' and Record2.JobTitle = 'CEO' AND SEM_EDIT_DISTANCE_SIMILARITY (Record1.Customer.Name, Record2.Customer.Name) > 50
This rule matches Customers but its base entity is Contact. So the Record1.Customer.Name
expression returns the name of the customer the contact we match is attached to.
SemQL in Privilege Grants
Entity privileges support Row-Level Filtering, to apply privileges only
to a subset of the records of the entity. The subsets are defined using
SemQL filters.
Each entity privilege grant may be associated with a filtering
Condition.
This condition has access to all attributes of the entity, its related
entities and built-in attributes ClassName, CreationDate,
UpdateDate, Creator and Updator.
To grant a given privilege to a user whose email address
(stored in a CORPORATE_USER_EMAIL
variable) appears in the current record in the
EmailAddress attribute, the filter is EmailAddress = :CORPORATE_USER_EMAIL
.
SemQL in Applications
Business View Transition Path
Business entities in business views are related using transition paths. A transition path is an expression pointing to records in a directly or indirectly related entity.
For example, to display all the siblings of an employee in a manager/report hierarchy, the following transition path is used:
Manager.Reports
.
Business View Filters
Each business entity of a business view supports a Filter condition that
filters the records displayed in this business entity.
This condition has access to all attributes of the entity, its related
entities and built-in attributes ClassName, CreationDate,
UpdateDate, Creator and Updator.
To filter at the root of the hierarchy of cost centers only
those with no parent, the following filter is applied:
FID_ParentCostCenter is null
. FID_ParentCostCenter is the attribute
representing the relation between a cost center and its parent cost
center.
Form Fields/Table Columns/Properties
Form fields and table columns, as well as application properties support SemQL Expressions. These expressions allow building composite attributes for display purposes and customizing the appearance of the application.
This expression has access to all attributes of the entity, its related entities and built-in attributes.
To display a custom attribute called ContactsStatus on a Customer form which displays whether this customer has contacts and whether these contacts have the IsInfluencer flag set to '1':
CASE WHEN ANY Contacts HAVE (IsInfluencer = 1 ) THEN 'One or More Influencers' WHEN NOT ANY Contacts HAVE (1=1) THEN 'No Contacts' ELSE 'No Influencer Contact' END
Predefined Sort
In Business Views, you can define an Expression that is used to sort the records under the node in the business view.
Reference Picker Filters
In Business Views, SemQL conditions can filter the records selectable with reference pickers. See Reference Pickers for more information.
Search Forms
In Search Forms, you define a SemQL condition to filter records based on the search parameters values entered in the form.
This condition has access to all attributes of the searched entity, its related entities and built-in attributes. Note that certain built-in attributes should be used with caution as some do not apply to all the views the search form may apply to.
Using SemQL at Run-Time
Filters
All date view in an application support filters in the form of
Conditions.
This condition has access to all attributes of the entity, its related
entities and built-il attributes that depend on the view that is
accessed.
For example, on a Golden Record: BatchID, ClassName, CreationDate, UpdateDate, Creator and Updator
are available.
Duplicates Filters
When filtering duplicates, for example to filter records to checkout in
a duplicate management activity, it is possible to use Record1
and
Record2
as in a Matcher, except that the comparison
takes place on the pairs of records that have matched.
To show matching records, but with the CustomerName fields less similar by less than 95%:
SEM_EDIT_DISTANCE_SIMILARITY(Record1.CustomerName, Record2.CustomerName) < 95
The SemQL Editor
The SemQL editor can be called when a SemQL expression, condition or clause needs to be built.
This editor is organized as follows:
- Attributes available for the expression appear in left panel. Double-click an attribute to add it to the expression.
- Functions declared in SemQL appear in the left bottom panel, grouped in function groups. Double-click a function to add it to the expression.
- Variables available for the expression appear in the bottom center panel.
- Messages appear in the right bottom panel, showing parsing errors and warnings.
- Description for the selected function or attribute appear at the bottom of the editor.
- The Toolbar allows to indent the code or hide/display the various panels of the editor and to undo/redo code edits.
Appendix A: SemQL Functions List
The following tables lists the built-in functions available in SemQL.
Functions for Oracle
The following functions are available when using Oracle.
Function | Description |
---|---|
| Returns the absolute value of number. |
| Returns the arc cosine of number. The argument number must be in the range of -1 to 1, and the function returns a value in the range of 0 to pi, expressed in radians. |
| Returns the date "date" plus "integer" months. The date argument can be a datetime value or any value that can be implicitly converted to DATE. The integer argument can be an integer or any value that can be implicitly converted to an integer. |
| Returns the decimal representation in the database character set of the first character of string. |
| Takes as its argument a string, or an expression that resolves to a string, in any character set and returns an ASCII version of the string in the database character set. Non-ASCII characters are converted to the form xxxx, where xxxx represents a UTF-16 code unit. |
| Returns the arc sine of number. The argument number must be in the range of -1 to 1, and the function returns a value in the range of -pi/2 to pi/2, expressed in radians. |
| Returns the arc tangent of number. The argument number can be in an unbounded range and the function returns a value in the range of -pi/2 to pi/2, expressed in radians. |
| Returns the arc tangent of number1 and number2. The argument number1 can be in an unbounded range and the function returns a value in the range of -pi to pi, depending on the signs of number1 and number2, expressed in radians. ATAN2(n1,n2) is the same as ATAN2(n1/n2). |
| Converts a bit vector to its equivalent number. Each argument to this function represents a bit in the bit vector. This function takes as arguments any numeric data type, or any nonnumeric data type that can be implicitly converted to a number. Each expr must evaluate to 0 or 1. |
| Computes an AND operation on the bits of expr1 and expr2, both of which must resolve to nonnegative integers, and returns an integer. |
| Returns the smallest integer greater than or equal to number. |
| Returns the character having the binary equivalent to number as a string value in the database character set. |
| Returns the first non-null expr in the expression list. At least one expr must not be the literal NULL. If all occurrences of expr evaluate to null, then the function returns null. |
| Takes as its argument a string, or an expression that resolves to a string, in any data type, and returns a Unicode string in its fully normalized form in the same character set as the input. |
| Returns string1 concatenated with string2. This function is equivalent to the concatenation operator. |
| Converts a character string from one character set to another. |
| Returns the cosine of number (an angle expressed in radians). |
| Returns the hyperbolic cosine of number. |
| Returns the current date in the session time zone, in a value in the Gregorian calendar. |
| Returns the current date and time in the session time zone. If you omit precision, then the default is 6. |
| Returns the value of the database time zone. The return type is a time zone offset (a character type in the format '+/- HH:MM') or a time zone region name. |
| Compares expr to each search value one by one. If expr is equal to a search, then it returns the corresponding result. If no match is found, then it returns default. If default is omitted, then Oracle returns null. |
| Takes as its argument a string in any data type and returns a Unicode string after decomposition in the same character set as the input. For example, an o-umlaut code point will be returned as the "o" code point followed by an umlaut code point. |
| Returns e raised to the number-th power, where e = 2.71828183 … The function returns a value of the same type as the argument. |
| Extracts and returns the day from expr. expr must be a valid ANSI date. |
| Extracts and returns the hour from expr. expr must be a valid ANSI datetime. |
| Extracts and returns the minute from expr. expr must be a valid ANSI datetime. |
| Extracts and returns the month from expr. expr must be a valid ANSI date. |
| Extracts and returns the second from expr. expr must be a valid ANSI datetime. |
| Extracts and returns the abbreviation of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
| Extracts and returns the hour of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
| Extracts and returns the minute of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
| Extracts and returns the region of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
| Extracts and returns the year from expr. expr must be a valid ANSI date. |
| Returns largest integer equal to or less than number. |
| Converts a timestamp value and a time zone to a TIMESTAMP WITH TIME ZONE value. Time_zone_value is a character string in the format 'TZH:TZM' or a character expression that returns a string in TZR with optional TZD format. |
| Returns the greatest of the list of one or more expressions. |
| Converts string containing hexadecimal digits to a raw value. |
| Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by white space or characters that are not alphanumeric. |
| Searches string for substring using the input character set. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
| Searches string for substring using UC2 code points. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
| Searches string for substring using UC4 code points. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
| Searches string for substring using bytes instead of character. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
| Searches string for substring using Unicode complete characters. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
| Returns the date of the last day of the month that contains date. |
| Returns the least of the list of expressions. |
| Returns the length of string. Length is calculated using characters as defined by the input character set. |
| Returns the length of string. Length is calculated using characters as defined by the UC2 code point. |
| Returns the length of string. Length is calculated using characters as defined by the UC4 code point. |
| Returns the length of string. Length is calculated using bytes instead of characters. |
| Returns the length of string. Length is calculated using Unicode complete characters. |
| Returns the natural logarithm of number, where number is greater than 0. |
| Returns the current date and time in the session time zone. |
| Returns the logarithm, base number2, of number1. The base number1 can be any positive value other than 0 or 1 and number2 can be any positive value. |
| Returns char, with all letters lowercase. |
| Returns expr1, left-padded to length number characters with the sequence of characters in expr2. If you do not specify expr2, then the default is a single blank. |
| Removes from the left end of string all of the characters contained in set_of_chars. If you do not specify set_of_chars, it defaults to a single blank. |
| Returns the remainder of number2 divided by number1. Returns number2 if number1 is 0. |
| Returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. |
| If number1 is not a number (NaN) then NANVL returns number2. Otherwise, it returns number1. |
| Returns the character having the binary equivalent to number as a string value in the national character set. |
| Returns the date and time (given in timezone 1) converted in time zone timezone2. |
| Returns the date of the first weekday named by day_name that is later than the date date. |
| Returns a collation key for string, that is a string of bytes used to sort strings. lsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. This function manages language-specific sorting. |
| Returns string with the first letter of each word in uppercase and all other letters in lowercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. This function manages language-specific characters case changes. |
| Returns string with all letters in lowercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. |
| Returns string with all letters in uppercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. |
| Compares expr1 and expr2. If they are equal, then the function returns null. If they are not equal, then the function returns expr1. You cannot specify the literal NULL for expr1. The NULLIF function is logically equivalent to the following CASE expression: CASE WHEN expr1 = expr 2 THEN NULL ELSE expr1 END |
| Converts number to an INTERVAL DAY TO SECOND literal. The value for interval_unit specifies the unit of number and must resolve to one of the following string values: 'DAY', 'HOUR', 'MINUTE', 'SECOND'. |
| Converts number to an INTERVAL YEAR TO MONTH literal. The value for interval_unit specifies the unit of number and must resolve to one of the following string values: 'YEAR', 'MONTH'. |
| If expr1 is null, then NVL returns expr2. If expr1 is not null, then NVL returns expr1. |
| If expr1 is not null, then NVL2 returns expr2. If expr1 is null, then NVL2 returns expr3. |
| Computes a hash value for a given expression. The expr argument determines the data for which you want to compute a hash value. The optional max_bucket argument determines the maximum bucket value returned by the hash function. You can specify any value between 0 and 4294967295. The default is 4294967295. The optional seed_value argument produces many different results for the same set of data. |
| Returns number2 raised to the number1 power. The base number2 and the exponent number1 can be any numbers, but if number2 is negative, then number1 must be an integer. |
| Converts raw to a character value containing its hexadecimal equivalent. |
| Extends the functionality of the REGEXP_INSTR function by returning the number of times a pattern occurs in a source string, starting at position. match_param contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match, n - Allow '.' to match even the newline character, m - treat input as multiple lines, x - ignore whitespaces. |
| Extends the functionality of the INSTR function by letting you search a string for a regular expression pattern. It returns an integer indicating the beginning or ending position of the matched substring, depending on the value of the return_option argument. If no match is found, the function returns 0. match_param contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match, n - Allow '.' to match even the newline character, m - treat input as multiple lines. |
| Extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string. match_param contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match, n - Allow '.' to match even the newline character, m - treat input as multiple lines. |
| Extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself. This function is useful if you need the contents of a match string but not its position in the source string. match_param contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match, n - Allow '.' to match even the newline character, m - treat input as multiple lines. |
| Returns the remainder of number2 divided by number1. |
| Returns string with every occurrence of search_string replaced with replacement_string. If replacement_string is omitted or null, then all occurrences of search_string are removed. If search_string is null, then string is returned. |
| Returns date_or_number rounded to the unit specified by the format model fmt_or_integer. |
| Returns expr1, right-padded to length number characters with expr2, replicated as many times as necessary. If you do not specify expr2, then it defaults to a single blank. |
| Removes from the right end of string all of the characters that appear in set_of_chars. If you do not specify set_of_chars, then it defaults to a single blank. |
| Converts a boolean to its standard string representation. The returned value is "true" or "false". |
| Returns a string concatenating the input values with the separator. Set the skip_null parameter to '1' to skip null values. Set the sep parameter to an empty string '' to have no separator. |
| Calculates the distance between two strings, that is the number of insertions, deletions or substitutions required to transform string1 into string2. If one or both of the strings are null the distance will be the largest integer value (2147483647). Note that this function measures the distance in number of bytes and not in characters. As a consequence, strings stored using variable-width characters sets (UTF-8, for example) can cause counter-intuitive results. It is recommended to convert these strings to a fixed-width character set (AL16UTF16, for example) prior to passing them to this function. |
| Calculates the distance between string1 into string2 (as described in the SEM_EDIT_DISTANCE function), and returns the Normalized value of the Edit Distance between two Strings. The value is between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. |
| Finds the location of a substring within a specified string. |
| Calculates the measure of agreement between two strings using Jaro-Winkler method. The value is between 0 (no match) and 1 (perfect match). If one or both strings are null the result will be 0. |
| Calculates the measure of agreement between two strings using Jaro-Winkler method, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. |
| Calculates the measure of agreement between two strings using the Dice’s coefficient similarity measure applied to the n-grams of the strings, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. The ngrams_length parameter defines the length of the n-grams (2 by default). |
| Returns a string with Latin (supplement, Extended-A and Extended-B) characters converted into their ASCII equivalents, and other (space and non-alphanumeric) characters eliminated. |
| Converts a number to its standard string representation. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set. |
| Converts a timestamp to its standard string representation. |
| Converts a string or a number to its standard string representation. |
| Generic conversion to a string representation. |
| Converts a UUID to its standard string representation. |
| Get the next value of a sequence. Note that this function is not supported in Enrichers. |
| Returns the sign of number. The sign is: -1 if n<0, 0 if n=0, 1 if n>0. |
| Returns the sine of number (an angle expressed in radians). |
| Returns the hyperbolic sine of number. |
| Returns a character string containing the phonetic representation of string. This function lets you compare words that are spelled differently, but sound alike in English. Note that phonetization methods such as CAVERPHONE or METAPHONE for person names and METAPHONE or REFINEDSOUNDEX for organization names give better results for matching. These methods are available in the Text Normalization and Transliteration plugin. |
| Returns the square root of number. |
| Computes a hash value for a given expression and returns it in a RAW value. The optional method lets you choose the hash algorithm (defaults to SHA1) in the following list: SHA1, SHA256, SHA384, SHA512 and MD5. This function requires Oracle version 12c or above. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using UCS2 code points. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using UCS4 code points. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using bytes instead of characters. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using Unicode complete characters. |
| Returns the current date and time set for the operating system on which the database resides. |
| Returns the system date, including fractional seconds and time zone, of the system on which the database resides. |
| Returns the value of parameter associated with the context namespace. |
| Extracts the UTC (Coordinated Universal Time—formerly Greenwich Mean Time) from a datetime value with time zone offset or time zone region name. |
| Generates and returns a globally unique identifier (RAW value) made up of 16 bytes. |
| Returns the tangent of number (an angle expressed in radians). |
| Returns the hyperbolic tangent of number. |
| Returns a double-precision floating-point number. |
| Returns a single-precision floating-point number. |
| Converts expr to its string representation optionally using fmt and nlsparam for the conversion. |
| Converts expr to a CLOB (large string) |
| Converts string to a date value. The fmt is a datetime model format specifying the format of string. If you omit fmt, then string must be in the default date format. If fmt is J, for Julian, then string must be an integer. |
| Converts a character string to an INTERVAL DAY TO SECOND value. |
| Returns string with all of its single-byte characters converted to their corresponding multibyte characters. |
| Converts expr to a number value using the optional format model fmt and nlsparam. |
| Returns string with all of its multibyte characters converted to their corresponding single-byte characters. |
| Converts string a timestamp value. The optional fmt specifies the format of string. |
| Converts string to a TIMESTAMP WITH TIME ZONE value. The optional fmt specifies the format of string. |
| Converts string to an INTERVAL YEAR TO MONTH type. |
| Returns expr with all occurrences of each character in from_string replaced by its corresponding character in to_string. Characters in expr that are not in from_string are not replaced. The argument from_string can contain more characters than to_string. In this case, the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in string, then they are removed from the return value. |
| Removes from the left and right ends of string all of the characters contained in set_of_chars. If you do not specify set_of_chars, it defaults to a single blank. |
| When expr is a date, returns expr with the time portion of the day truncated to the unit specified by the format model fmt_or_number. If you omit fmt_or_number, then date is truncated to the nearest day. When expr is a number, returns expr truncated to fmt_or_number decimal places. If fmt_or_number is omitted, then expr is truncated to 0 places. |
| Takes as its argument a text literal or an expression that resolves to character data and returns it in the national character set. The national character set of the database can be either AL16UTF16 or UTF8. UNISTR provides support for Unicode string literals by letting you specify the Unicode encoding value of characters in the string. |
| Returns string with all letters uppercase. |
| Lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. For a given expression, the function returns the bucket number into which the value of this expression would fall after being evaluated. Expr must evaluate to a numeric or datetime. Min_value and max_value are expressions that resolve to the end points of the acceptable range for expr. Both of these expressions must also evaluate to numeric or datetime values, and neither can evaluate to null. |
Functions for PostgreSQL
The following functions are available when using PostgreSQL.
Function | Description |
---|---|
| Returns the absolute value |
| Returns the inverse cosine |
| Subtracts arguments, producing a symbolic result that uses years and months, rather than just days |
| Concatenates input values, including nulls, into an array. If the inputs are arrays, concatenates them into an array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or NULL) |
| ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. |
| Returns the inverse sine |
| Returns the inverse tangent |
| Returns the inverse tangent of number_1/number_2 |
| Returns the average (arithmetic mean) of all input values |
| Returns the bitwise AND of all non-null input values, or null if none |
| Returns the number of bits in string |
| Returns the bitwise OR of all non-null input values, or null if none |
| Returns true if all input values are true, otherwise false |
| Returns true if at least one input value is true, otherwise false |
| Removes the longest string consisting only of characters in characters (a space by default) from the start and end of string |
| Returns the cube root |
| Returns the nearest integer greater than or equal to argument |
| Returns the nearest integer greater than or equal to argument (same as ceil) |
| Returns the number of characters in string |
| Returns the character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. |
| Returns the current date and time (changes during statement execution) |
| The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null. |
| Concatenates the text representations of all the arguments. NULL arguments are ignored. |
| Concatenates all but the first argument with separators. The first argument is used as the separator string. NULL arguments are ignored. |
| Converts string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. |
| Converts string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. |
| Converts string to dest_encoding. |
| Returns the cosine |
| Returns the cotangent |
| Returns the number of input rows for which the value of expression is not null |
| Returns the current date |
| Returns the current time of day |
| Returns the current date and time (at the beginning of current the transaction) |
| Return value most recently obtained with nextval for specified sequence |
| Creates a range of dates. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'. |
| Get subfield from a timestamp or an interval. The part to extract is defined by the text. |
| Truncate timestamp or interval to the precision specified in the text. |
| Decodes binary data from textual representation in string. Options for format are same as in encode. |
| Converts radians to degrees |
| Converts two strings to their Soundex codes and then reports the number of matching code positions. Since Soundex codes have four characters, the result ranges from zero to four, with zero being no match and four being an exact match. |
| Returns the integer quotient of number_1/number_2 |
| Returns a character string containing the phonetic representation of string using the Double Metaphone algorithm. This function returns the primary code for the string. See also: DMETAPHONE_ALT. |
| Returns a character string containing the phonetic representation of string using the Double Metaphone algorithm. This function returns the secondary or alternate code for the string. See also: DMETAPHONE. |
| Encodes binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. |
| Equivalent to bool_and |
| Returns the exponential |
| Returns the nearest integer less than or equal to argument |
| Formats the arguments according to format_string. This function is similar to the C function sprintf. |
| Extract bit from string |
| Extract byte from string |
| The GREATEST function selects the largest value from a list of any number of expressions. |
| Converts the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. |
| Creates a range of integers. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'. |
| Creates a range of bigints. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'. |
| Returns a boolean indicating whether the range empty. |
| Tests for finite date, interval or timestamp (not +/-infinity) |
| Aggregates values as a JSON array |
| Aggregates name/value pairs as a JSON object |
| Aggregates values as a JSON array |
| Aggregates name/value pairs as a JSON object |
| Adjust interval so 30-day time periods are represented as months |
| Adjust interval so 24-hour time periods are represented as days |
| Adjust interval using justify_days and justify_hours, with additional sign adjustments |
| Return value most recently obtained with nextval for any sequence |
| The LEAST function selects the smallest value from a list of any number of expressions. |
| Returns the first n characters in the string. When n is negative, return all but last n characters. |
| Returns the number of characters in string with an optional given encoding. The string must be valid in this encoding. |
| Returns the Levenshtein distance between two strings, computed according to the cost specified for a character insertion, deletion, or substitution, respectively (you may set these values to 1). |
| This function is an accelerated version of the LEVENSTHTEIN function that returns accurate values for actual distances smaller than max_distance. |
| Returns the natural logarithm. |
| Returns the current time of day |
| Return the current date and time (at start of current transaction) |
| Returns the logarithm in base 10. |
| Convert string to lower case |
| Returns a boolean indicating whether the lower bound of the range is inclusive. |
| Returns a boolean indicating whether the lower bound of the range is infinite. |
| Fills up the string to length length by prepending the characters fill_text (a space by default). If the string is already longer than length then it is truncated (on the right). |
| Removes the longest string containing only characters from characters (a space by default) from the start of string |
| Create date from integer year, month and day fields |
| Create interval from years, months, weeks, days, hours, minutes and seconds fields. If a field is left empty, it defaults to zero. |
| Create time from hour, minute and seconds fields |
| Create timestamp from year, month, day, hour, minute and seconds fields |
| Create timestamp with time zone from year, month, day, hour, minute and seconds fields; if timezone is not specified, the current time zone is used |
| Returns the maximum value of expression across all input values |
| Calculates the MD5 hash of string, returning the result in hexadecimal |
| Returns a character string containing the phonetic representation of string using the Metaphone algorithm, with a maximum length equal to the integer argument. |
| Returns the minimum value of expression across all input values |
| Returns the remainder of number_1/number_2 |
| Advance sequence and return new value |
| Returns the current date and time (start of current transaction) |
| The NULLIF function returns a null value if value1 equals value2; otherwise it returns value1. |
| Creates a range of numerics. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'. |
| Returns the number of non-null values |
| Returns the number of null values |
| Returns the number of bytes in string |
| Overlays string with overlay_string, starting at start_position and for length characters. |
| Returns the PI constant. |
| Returns number_1 raised to the power of number_2 |
| Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. |
| Returns the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. |
| Returns the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. |
| Converts degrees to radians |
| Returns a random value in the [0, 1] range. |
| Returns the lower bound of range |
| Returns the smallest range which includes both of the given ranges |
| Returns the upper bound of range |
| Returns the captured substring(s) resulting from the first match of a POSIX regular expression to the string. flags contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match. |
| Replaces the first substring matching a POSIX regular expression. flags contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match. To replace all substrings, add 'g' to the flags. |
| Splits the string using a POSIX regular expression as the delimiter. flags contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match. |
| Repeats string the specified number of times |
| Replaces all occurrences in string of substring from with substring to |
| Returns the reversed string. |
| Returns last n characters in the string. When n is negative, return all but first |n| characters. |
| Rounds the number to the nearest integer or to int decimal places |
| Fills up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated. |
| Removes the longest string containing only characters from characters (a space by default) from the end of string |
| Returns the scale of the argument (the number of decimal digits in the fractional part) |
| Converts a boolean to its standard string representation. The returned value is "true" or "false". |
| Explicitly cast its argument as a boolean. This can help the database or JDBC driver to infer the type of a bound parameter. |
| Explicitly cast its argument as a numeric value (decimal or integer). This can help the database or JDBC driver to infer the type of a bound parameter. |
| Explicitly cast its argument as a string (varchar). This can help the database or JDBC driver to infer the type of a bound parameter. |
| Explicitly cast its argument as a timestamp. This can help the database or JDBC driver to infer the type of a bound parameter. |
| Explicitly cast its argument as a UUID. This can help the database or JDBC driver to infer the type of a bound parameter. |
| Returns a string concatenating the input values with the separator. Set the skip_null parameter to '1' to skip null values. Set the sep parameter to an empty string '' to have no separator. |
| Calculates the number of insertions, deletions or substitutions required to transform string1 into string2. If one or both of the strings are null the distance will be the largest integer value (2147483647). |
| Calculates the number of insertions, deletions or substitutions required to transform string1 into string2, and returns the Normalized value of the Edit Distance between two Strings. The value is between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. |
| Finds the location of a substring within a specified string. |
| Calculates the measure of agreement between two strings using the Dice’s coefficient similarity measure applied to the n-grams of the strings, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. The ngrams_length parameter defines the length of the n-grams (2 by default). |
| Returns a string with Latin (supplement, Extended-A and Extended-B) characters converted into their ASCII equivalents, and other (space and non-alphanumeric) characters eliminated. |
| Converts a number to its standard string representation. |
| Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set. |
| Converts a timestamp to its standard string representation. |
| Converts a string or a number to its standard string representation. |
| Generic conversion to a string representation. |
| Converts a UUID to its standard string representation. |
| Get the next value of a sequence. Note that this function is not supported in Enrichers. |
| Sets the seed for subsequent random() calls (value between -1.0 and 1.0, inclusive) |
| Set sequence’s current value |
| Set bit in string |
| Set byte in string |
| Returns the sign of the argument (-1, 0, +1) |
| Returns the sine. |
| Returns a character string containing the phonetic representation of string. This function lets you compare words that are spelled differently, but sound alike in English. Note that phonetization methods such as CAVERPHONE or METAPHONE for person names and METAPHONE or REFINEDSOUNDEX for organization names give better results for matching. These methods are available in the Text Normalization and Transliteration plugin. |
| Splits string on delimiter and return the element at position (counting from one) |
| Returns the square root |
| Returns the current date and time (start of current statement) |
| Return input values concatenated into a string, separated by delimiter |
| Returns the location of specified substring in string |
| Extracts a substring from string starting at from position and for count characters. |
| Extracts from string a substring matching the pattern (a POSIX regular expression). |
| Extracts from string a substring matching an SQL regular expression. |
| Returns the sum of expression across all input values |
| Returns the tangent. |
| Returns the current date and time (like clock_timestamp, but as a text string) |
| Converts string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings) |
| Convert the value (time stamp, interval, integer, real/double, numeric, string) to string according to the format. See https://www.postgresql.org/docs/current/functions-formatting.html for more information about format patterns. |
| Converts the string to date. See https://www.postgresql.org/docs/current/functions-formatting.html for more information about format patterns. |
| Converts number to its equivalent hexadecimal representation |
| Converts the string to numeric. See https://www.postgresql.org/docs/current/functions-formatting.html for more information about format patterns. |
| Converts the string to time stamp. See https://www.postgresql.org/docs/current/functions-formatting.html for more information about format patterns. |
| Returns the current date and time (start of current transaction) |
| Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed. |
| Truncates the number toward zero or to int decimal places |
| Creates a range of timestamps without time zone. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'. |
| Creates a range of timestamps with time zone. Inclusivity is a string with an opening/closing square bracket or parenthesis indicating inclusiveness, with a default of '(]'. |
| Converts string to upper case |
| Returns a boolean indicating whether the upper bound of the range is inclusive. |
| Returns a boolean indicating whether the upper bound of the range is infinite. |
| Returns the bucket number to which operand would be assigned in a histogram having count equal-width buckets spanning the range b1 to b2; returns 0 or count+1 for an input outside the range. |
| Returns the bucket number to which operand would be assigned given an array listing the lower bounds of the buckets; returns 0 for an input less than the first lower bound; the thresholds array must be sorted, smallest first, or unexpected results will be obtained. |
| Returns the concatenation of XML values |
Functions for SQLServer
The following functions are available when using SQLServer.
Function | Description |
---|---|
| A mathematical function that returns the absolute (positive) value of the specified numeric expression. |
| A function that returns the angle, in radians, whose cosine is the specified float expression. |
| This function returns the approximate number of unique non-null values in a group. |
| Returns the ASCII code value of the leftmost character of a character expression. |
| A function that returns the angle, in radians, whose sine is the specified float expression. |
| A function that returns the angle, in radians, whose tangent is a specified expression. |
| Returns the angle, in radians, between the positive x-axis and the ray from the origin to the point (y, x), where x and y are the values of the two specified float expressions. |
| This function returns the average of the values in a group. It ignores null values. |
| This function returns the smallest integer greater than, or equal to, the specified numeric expression. |
| This function searches for one character expression inside a second character expression, returning the starting position of the first expression if found. |
| This function returns the checksum of the values in a group. |
| Returns the item at the specified index from a list of values in SQL Server. |
| Evaluates the arguments in order and returns the current value of the first expression that initially does not evaluate to NULL. |
| This function concatenates two or more strings together. |
| A mathematical function that returns the trigonometric cosine of the specified angle - measured in radians - in the specified expression. |
| A mathematical function that returns the trigonometric cotangent of the specified angle - in radians - in the specified float expression. |
| Returns the number of input rows for which the value of expression is not null. |
| This function adds a specified number value (as a signed integer) to a specified datepart of an input date value, and then returns that modified value. Note that datepart must be one of the following values, provided between quotes (e.g.: 'year'): year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond. |
| This function returns the count (as a signed integer value) of the specified datepart boundaries crossed between the specified startdate and enddate. Note that datepart must be one of the following values, provided between quotes (e.g.: 'year'): year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond. |
| This function returns a date value that maps to the specified year, month, and day values. |
| This function returns a character string representing the specified datepart of the specified date. Note that datepart must be one of the following values, provided between quotes (e.g.: 'year'): year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond. |
| This function returns an integer representing the specified datepart of the specified date. Note that datepart must be one of the following values, provided between quotes (e.g.: 'year'): year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond. |
| This function returns a datetime2 value for the specified date and time arguments. The returned value has a precision specified by the precision argument. |
| This function returns an integer that represents the day (day of the month) of the specified date. |
| This function returns the corresponding angle, in degrees, for an angle specified in radians. |
| This function returns an integer value measuring the difference between the SOUNDEX values of two different character expressions. |
| This function returns the last day of the month containing a specified date, with an optional offset. |
| Returns the exponential value of the specified float expression. |
| Returns the largest integer less than or equal to the specified numeric expression. |
| Returns a value formatted with the specified format and optional culture in SQL Server 2017. Use the FORMAT function for locale-aware formatting of date/time and number values as strings. For general data type conversions, use CAST or CONVERT. |
| Returns the current database system timestamp as a datetime value without the database time zone offset. This value is derived from the operating system of the computer on which the instance of SQL Server is running. |
| Returns the MD2, MD4, MD5, SHA, SHA1, or SHA2 hash of its input in SQL Server. |
| Returns 1 if the expression is a valid date, time, or datetime value; otherwise, 0. |
| Replaces NULL with the specified replacement value. |
| Returns the left part of a character string with the specified number of characters. |
| Returns the number of characters of the specified string expression, excluding trailing blanks. |
| Returns the length of string. Length is calculated using characters as defined by the input character set. |
| Returns the natural logarithm of the specified float expression. |
| Returns the base-10 logarithm of the specified float expression. |
| Returns a character expression after converting uppercase character data to lowercase. |
| Returns the string, left-padded to length number characters with the sequence of characters in expr2. If you do not specify expr2, then the default is a single blank. |
| Returns a character expression after it removes leading blanks. |
| Returns the maximum value in the expression. |
| Returns the minimum value in the expression. |
| Returns an integer that represents the month of the specified date. |
| Returns the Unicode character with the specified integer code, as defined by the Unicode standard. |
| Creates a unique value of type uniqueidentifier. |
| Returns a null value if the two specified expressions are equal. |
| Returns the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found, on all valid text and character data types. |
| Returns the constant value of PI. |
| Returns the value of the specified expression to the specified power. |
| Returns a Unicode string with the delimiters added to make the input string a valid SQL Server delimited identifier. |
| Returns radians when a numeric expression, in degrees, is entered. |
| Returns a pseudo-random float value from 0 through 1, exclusive. |
| Replaces all occurrences of a specified string value with another string value. |
| Repeats a string value a specified number of times. |
| Returns the reverse order of a string value. |
| Returns the right part of a character string with the specified number of characters. |
| Returns a numeric value, rounded to the specified length or precision. |
| Returns a character string after truncating all trailing spaces. |
| Converts a boolean to its standard string representation. The returned value is "true" or "false". |
| Returns a string concatenating the input values with the separator. Set the skip_null parameter to '1' to skip null values. Set the sep parameter to an empty string '' to have no separator. |
| Converts a timestamp to its standard string representation. |
| Calculates the number of insertions, deletions or substitutions required to transform string1 into string2. If one or both of the strings are null the distance will be the largest integer value (2147483647). |
| Calculates the number of insertions, deletions or substitutions required to transform string1 into string2, and returns the Normalized value of the Edit Distance between two Strings. The value is between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. |
| Finds the location of a substring within a specified string. |
| Calculates the measure of agreement between two strings using the Dice’s coefficient similarity measure applied to the n-grams of the strings, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. The ngrams_length parameter defines the length of the n-grams (2 by default). |
| Returns a string with Latin (supplement, Extended-A and Extended-B) characters converted into their ASCII equivalents, and other (space and non-alphanumeric) characters eliminated. |
| Converts a number to its standard string representation. |
| Returns part of a character, binary, text, or image expression. |
| Converts a timestamp to its standard string representation. |
| Converts a string or a number to its standard string representation. |
| Generic conversion to a string representation. |
| Converts a UUID to its standard string representation. |
| Get the next value of a sequence. Note that this function is not supported in Enrichers. |
| Returns the positive (+1), zero (0), or negative (-1) sign of the specified expression. |
| Returns the trigonometric sine of the specified angle, in radians, and in an approximate numeric, float, expression. |
| Returns a four-character (SOUNDEX) code to evaluate the similarity of two strings. |
| Returns a string of repeated spaces. |
| Returns the square root of the specified float value. |
| Returns the square of the specified float value. |
| Returns the statistical standard deviation of all values in the specified expression. |
| Returns the statistical standard deviation for the population for all values in the specified expression. |
| Returns character data converted from numeric data. |
| Concatenates the values of string expressions and places separator values between them. The separator is not added at the end of string. |
| Escapes special characters in texts and returns text with escaped characters. STRING_ESCAPE is a deterministic function. |
| Splits the character expression using specified separator. |
| The STUFF function inserts a string into another string. It deletes a specified length of characters in the first string at the start position and then inserts the second string into the first string at the start position. |
| Returns part of a character, binary, text, or image expression in SQL Server. |
| Returns the sum of all the values, or only the DISTINCT values, in the expression. |
| Returns a datetime2(7) value that contains the date and time of the computer on which the instance of SQL Server is running. |
| Returns the tangent of the input expression. |
| Returns the string provided as a first argument after some characters specified in the second argument are translated into a destination set of characters. |
| Removes the characters (by default a space) from the start or end of the string expression. |
| Returns the integer value, as defined by the Unicode standard, for the first character of the input expression. |
| Returns a character expression with lowercase character data converted to uppercase. |
| Returns the statistical variance of all values in the specified expression. |
| Returns the statistical variance for the population for all values in the specified expression. |
| Returns an integer that represents the year of the specified date. |
Appendix B: Data Access Views, Built-in Attributes and Lineage Navigation
The following tables lists the built-in views, attributes and lineage navigation transitions available in SemQL.
Data Access Views
The following table lists the available data access views. Each view is identified by an alias (for example SD for the Source Data) that corresponds to the physical table storing this data if such a table exists.
Alias | Name | Description |
---|---|---|
AE | Source Authoring Errors | Errors detected on source authoring data. |
DU | Duplicates | Duplicates detected by the matching process. |
GD | Golden Data | Golden consolidated and certified records. |
GDWE | Golden Data with Errors | Golden data with errors. |
GE | Post-Consolidation Errors | Errors detected after master data consolidation (post-consolidation). |
GH | Golden History | Golden records history. |
GH4B | Golden Data As of Batch | Golden data history as of batch or golden data if not historized. |
GI | Golden Integration | Latest consolidated (integrated) golden records before post-consolidation validation. |
GX | Deleted Golden | Deleted golden records logs (and data for soft delete). |
MD | Master Data | Enriched, validated and cleansed master records. |
MH | Master History | Master records history. |
MH4B | Master Data As of Batch | Master data history as of batch or master data if not historized. |
MI | Master Integration | Latest integrated master records. |
MX | Deleted Master | Deleted master records logs (and data for soft delete). |
SA | Source Authoring | Source data authored by users. |
SA4L | Source Authoring for Load | Source data authored by users for a given load. |
SA4LK | Source Authoring Lookup | References lookup for source authoring data. |
SAWE | Source Authoring with Errors | Source data authored by users with errors. |
SD | Source Data | Source data successively loaded by the publishers. |
SD4L | Source Data for Load | Source data loaded by the publishers for a given load. |
SD4LK | Source Data Lookup | References lookup for source data loaded by the publishers for a given load. |
SDWE | Source Data with Errors | Source data loaded by the publishers with errors. |
SE | Source Errors | Errors detected on source data loaded by the publishers (pre-consolidation). |
UG | Duplicates Management Consolidated Data | Data consolidated from user decisions taken in a duplicates manager. |
UM | Duplicates Management User Decisions | Master records changed by a user using duplicates management. |
Built-in Attributes
The following table lists the attributes, with the views into which they are available.
Name | Label | Views | Description |
---|---|---|---|
| Authoring Type | SA, SA4L, SAWE | For matched entities, type of authoring operation:
|
| Batch Date | SA, SAWE, SD, SDWE | Date of execution of the batch. |
| Batch ID | AE, GD, GDWE, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SAWE, SD, SD4L, SDWE, SE, UM | ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected. |
| Batch Submitter | SA, SAWE, SD, SDWE | User who submitted this batch. |
| Class Name | AE, GD, GDWE, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM | Unique name of class / entity to which this record belongs |
| Confidence Score | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM | Confidence Score of the golden record. It is the average of the match scores in the match group. |
| Confidence Score Type | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM | Algorithm used to compute Confidence Score. |
| Confirmation Status | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM | Confirmation status for duplicate management:
|
| Constraint Name | AE, GE, SE | For error records, name of the constraint causing this error. |
| Constraint Type | AE, GE, SE | For error records, type of the constraint causing this error. |
| Created On | GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM | Creation date of a record |
| Created By | GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM | User who created the record |
| Deleted By | GX, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE | User for deleted the record |
| Deleted On | GX, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE | Deletion date of a record |
| Delete Operation | GX, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE | Delete operation ID. |
| Delete Path | GX, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE | Cascade path through which the record was reached during a delete. Null for record directly selected for deletion. |
| Delete Type | GX, MX | Delete Type ( |
| Checkout Cause | UM | Cause that made the record part of the duplicate management transaction. Possible causes are:
|
| Dups Operation ID | UM | Identifier for a duplicate management operation |
| Error Status | GD, GDWE, GH4B, GI, SA, SAWE, SD, SDWE | Error Status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are:
|
| Exclusion Group | MD, MH, MH4B, MI, MX, UM | Exclusion Group ID. An exclusion group represents a group of records for which a user has taken split decisions. |
| From Batch ID | GH, GH4B, MH, MH4B | Batch at which the history record was created. |
| Golden Type | GD, GDWE, GH4B | For fuzzy matching and ID matching entities, indicates whether the golden record was created and authored only in the MDM ( |
| Has Override | GD, GH4B | For fuzzy matching and ID matching entities, this flag ( |
| Has Sugg. Merge | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM | Flag ( |
| Is Confirmed | GD, GDWE, GH4B, GI | Flag ( |
| Load ID | AE, SA, SA4L, SAWE, SD, SD4L, SDWE, SE, UM | Load Identifier used as the unique transaction ID for external application pushing data to the platform |
| Masters Count | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM | Number of master records contributing to the golden record. |
| Match Group ID | MD, MH, MH4B, MI, MX | ID of the match group for the master record. This column is set when matching takes place. |
| Old Match Group ID | MI | Previous identifier of the match group for the master record. |
| Original Batch ID | SA, SA4L, SAWE, SD, SD4L, SDWE, UM | Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager. |
| Original Confidence Score | UM | Confidence Score of the original golden in a duplicate management operation. |
| Original Confidence Score Type | UM | Algorithm used to compute Original Confidence Score. |
| Original Confirmation Status | UM | Original Confirmation Status in a duplicate management operation. |
| Original Exclusion Group | UM | Original Exclusion Group in a duplicate management operation. |
| Original Masters Count | UM | Number of master records in the original golden in a duplicate management operation. |
| Publisher ID | AE, GD, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM | For matching entities, code of the publisher that published the record. |
| Source ID | AE, GD, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM | ID of the source record in the source publisher system (Fuzzy Matched entities only). |
| Sugg. Merge Confidence Score | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM | Confidence Score for the suggested match group. |
| Sugg. Merge Confidence Score Type | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM | Algorithm used to compute Confidence Score for the suggested match group. |
| Sugg. Merge ID | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UG, UM | ID of the merge suggested by the automated matching. |
| Sugg. Merge Masters Count | GD, GDWE, GH4B, GI, MD, MH, MH4B, MI, MX, UM | Number of master records in the suggested merge. |
| To Batch ID | GH, GH4B, MH, MH4B | Batch at which history record stopped being current or null if the records is still current. |
| Updated On | GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM | Update date of a record |
| Updated By | GD, GDWE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, UM | User who updated the record |
| View Type | AE, GD, GDWE, GE, GH, GH4B, GI, GX, MD, MH, MH4B, MI, MX, SA, SA4L, SA4LK, SAWE, SD, SD4L, SD4LK, SDWE, SE, UG, UM | Returns the current view type for the record. e.g. GD, MD… |
The following attributes are deprecated.
Name | Label | Views | Description |
---|---|---|---|
| Branch ID | AE, GD, GDWE, GE, GI, MD, MH4B, MI, SD, SD4L, SD4LK, SDWE, SE | Branch ID to which this record belongs. This attribute is deprecated and always returns |
| From Edition | GD, GDWE, MD, MH4B | Data Edition ID in which this record was created or last updated. This attribute is deprecated and always returns |
| To Edition | GD, GDWE, MD, MH4B | Data Edition ID into which this record was deleted or closed. This attribute is deprecated and always returns null. |
Built-in Lineage Navigation
Parent Navigation
The following table lists the built-in lineage navigation to a parent record.
Pseudo-Role | Navigation | Description | Navigation Path |
---|---|---|---|
| Current Golden Record | Current golden record a golden history record is attached to. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Current Master Record | Current master record a master history record is attached to. | Fuzzy matched entity:
ID matched entity:
|
| Golden Record | Golden record into which the master, source or golden integration record consolidates. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Master Record | Master record corresponding to a source record. | Fuzzy matched entity:
ID matched entity:
|
| Record with Error | Authoring record with error attached to a source authoring error. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Source Authoring Record | Authoring record attached to a source authoring error. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Source Record | Source Record attached to the current error record. | Fuzzy matched entity:
ID matched entity:
|
Child Navigation
The following table lists the built-in lineage navigation to child records.
Pseudo-Role | Navigation | Description | Navigation Path |
---|---|---|---|
| Errors | Errors detected for a given source or golden record. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Golden History Records | Golden History Records | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Integration Master Records | Master integration record attached to the golden integration record. Available in post-consolidation enrichers and validations. | Fuzzy matched entity:
ID matched entity:
|
| Master History Records | Master History Records | Fuzzy matched entity:
ID matched entity:
|
| Master Records | Master Records consolidated in the golden record. | Fuzzy matched entity:
ID matched entity:
|
| Source Authoring Records | Source data authored by users attached to a golden record. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|
| Source Records | Source Records attached to the current golden record (for ID matching) or master record. | Basic entity:
Fuzzy matched entity:
ID matched entity:
|