Publishing Data into the Hub | ||
---|---|---|
Previous | Next | |
Integration Job | Consuming Data from the Hub |
Publishing data into the hub is done using either:
In this approach, external loads are handled via PL/SQL and SQL interface. It works as follows:
INTEGRATION_LOAD.GET_NEW_LOADID
function on the repository schema. This function call returns a Load ID.
INTEGRATION_LOAD.SUBMIT_LOAD
to submit a load identified by its Load ID. This function call returns a Batch ID.
INTEGRATION_LOAD.CANCEL_LOAD
to cancel a load identified by its Load ID.
Tip: This approach is recommended when the middleware is an ETL or data integration product.
In this approach, external loads are handled via the Integration Load Web Service, and the data is loaded using SQL. It works as follows:
getNewLoadID
operation on the Integration Load Web Service. This call returns a Load ID.
submitLoad
to submit a load identified by its Load ID. This call returns a Batch ID.
cancelLoad
to cancel a load identified by its Load ID.
Tip: This approach may be preferred when the middleware is an ESB product, or when the load management is part of a business process. This approach is also preferred to avoid granting database access to the repository schema.
The Integration Load Web Service is accessed at the following URL:
http://<semarchy_server>/ws/1.1/platform/IntegrationLoadService?wsdl
Note that accessing the web service requires WS-Security authentication with a user having the Read/Write privileges on Data Location. See the "Managing the Security" chapter in the "Semarchy Convergence for MDM Administration Guide" for more information on privilege grants.
Initializing an external load uses the following parameters:
data_location_name
: name of the data location. This data location is the one hosting the MDM hub into which the external load is performed.
branch_id
: ID of the data edition branch. It should be set to 0 (zero). Note that the ID of the data edition is not required. By default, the external load is initialized on open (latest data edition).
program_name
: This variable is for information only. It is used to identify the middleware performing the external load. For example
'ETL Custom Script'.
load_description
: This variable is for information only. It is used to describe the nature of this load. For example:
'Daily updates for Customers and Contacts'.
user_name
: name of the user initializing the external load. This user may or may not be a user defined in the security realm of the application server. With the web service operation, the user is not needed as a call to the web service requires authentication to the application server.
Note: To mandate the use of an authenticated user, you must use the web service interface.
This function call to
INTEGRATION_LOAD.GET_NEW_LOADID
initializes an external load and returns a Load ID. It is performed on the repository database schema.
vLoad_id := <semarchy_repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
'<data_location_name>' /* Data Location Name (As in the UI) */
<branch_id> /* Branch ID (0, 1, ...) */
'<program_name>' /* Informational. Identifies the Middleware*/
'<load_description>' /* Informational. Describes the load. */
'<user_name>' /* User initializing the load. */
);
The following example performs a function call initializing an external load on the
CustomerMDMHub data location, on the latest data edition of the data branch 0. The repository schema is
SEMREPO
. The returned Load ID is stored in the
vLoad_id
variable.
vLoad_id := SEMREPO.INTEGRATION_LOAD.GET_NEW_LOADID(
'CustomerMDMHub',
0,
'Custom ETL',
'Initial Load for the Hub',
'John Doe' );
This web service sample call on the
getNewLoadID
operation initializes an external load and returns a Load ID.
<soapenv:Envelope .../>
<soapenv:Header/>
<soapenv:Body>
<ws:getNewLoadID>
<dataLocationName><!-- data_location_name --></dataLocationName>
<dataBranchID><!-- branch_id --></dataBranchID>
<programName><!-- program_name --></programName>
<loadDescription><!-- load_description --></loadDescription>
</ws:getNewLoadID>
</soapenv:Body>
</soapenv:Envelope>
Note: No user name required in the payload since the Web Service will use the credentials of the connected user (WS_Security)
The following example is a request initializing an external load on the CustomerMDMHub data location, on the data edition [0.0] (branch 0, version 0).
<soapenv:Envelope .../>
<soapenv:Header/>
<soapenv:Body>
<int:getNewLoadID>
<dataLocationName>CustomerMDMHub</dataLocationName>
<dataBranchID>0</dataBranchID>
<programName>Custom ETL</programName>
<loadDescription>Initial Load for the Hub</loadDescription>
</int:getNewLoadID>
</soapenv:Body>
</soapenv:Envelope>
The response to this request contains the Load ID: 141.
<soap:Envelope .../>
<soap:Body>
<ns1:getNewLoadIDResponse ...>
<return>141</return>
</ns1:getNewLoadIDResponse>
</soap:Body>
</soap:Envelope>
Loading data consists in inserting new source records in the SD tables for the set of entities taken into account in the integration job.
When loading data in the SD tables:
Make sure to all the load tables that will be taken into account in the integration job. For example, if the integration job processes the
Customer and
Contact entities, then you should load the
SD_CUSTOMER
and
SD_CONTACT
tables.
If you insert data into the SD table of an entity that is not taken into account by the integration job, this data will be ignored. For example, if the integration job processes the
Customer and
Contact entities, data loaded the
SD_EMPLOYEE
table will not be taken into account by the integration job to certify golden records for the
Employee entity.
There is no required order to load the SD_ tables, as no foreign keys are implemented on the SD tables.
Reference validation is performed by the integration job, as a consequence references between entities must be loaded as indicated in the
SD Columns to Load section.
It is recommended to use a database transaction when writing to the SD tables just before submitting the load.
Canceling an external load – using the INTEGRATION_LOAD.CANCEL_LOAD() function or the cancelLoad() web service call – cancels the load but
does not delete records from the SD tables. Writing for example in an auto-commit transaction in the SD tables then canceling the load will leave useless information in the SD tables.
Using a transaction gives the capability to rollback all changes on the SD_ tables when canceling a load.
This section provides some guidance for loading the columns of the SD tables.
The following system columns must be loaded as indicated:
B_LOADID
: This column must be loaded with the
Load ID provided by
INTEGRATION_LOAD.GET_NEW_LOADID()
or
getNewLoadID()
call.
B_BRANCHID
: This column must be loaded with the ID of the data branch into which data is loaded. This value is usually
0
(zero).
B_CLASSNAME
: Name of the entity (or class) being loaded. When inheritance is used, the same SD table stores data for all parent and child classes. Set explicitly
B_CLASSNAME
to the name of the entity for which data is being published. For example:
Person
,
Party
,
Company
. Seethe
Class Name and Inheritance section for more details.
B_PUBID
: This column must be loaded with a
Publisher Code. For example:
CRM
,
MKT
, etc. This publisher code identifies the publisher (application that publishes the data) and should be declared in the model edition. The list of publisher codes is available by double-clicking the
Publishers node in the
Model Edition view in the
Model Design perspective. If the publisher code is unknown to the model, data from this publisher is processed, but this publisher will have the lowest ranking in a
Preferred Source consolidation strategy.
Note: Within a single load, you can load the records from various publishers, using the
B_PUBID
column to identify each publisher
The primary key that you load in the SD table allows identifying the source record from the source system – identified by Publisher Code in
B_PUDIB
.
The primary key column to load depends on the Matching Behavior of the entity.
If the entity uses ID Matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.
If the entity uses Fuzzy Matching, then you must load in the
B_SOURCEID
column the value of the primary key from the source system. If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them in the
B_SOURCEID
column.
Note: The
B_SOURCEID
column is aVARCHAR(128)
column. Make sure to perform the appropriate conversions for loading this column.
When loading data for entities that are related by a reference relationship, you must load the referencing entity with the value of the referenced primary key. The columns to load differ depending on the matching behavior the referenced entity.
If the referenced entity is an ID Matching entity, then you need to load the column representing the referencing attribute. This column is
F_<Physical Name of the Reference To Role Name>
.
For example, if
Customer references
Employee and this entity uses ID Matching, then you must load in
SD_CUSTOMER
the
F_EMPLOYEE
column with the primary key of the source employee record referenced by each customer record.
If the referenced entity is a Fuzzy Matching entity, then you need to load two columns:
FS_<Physical Name of the Referenced To Role Name>
: Load this column with the Source ID of the referenced record.
FP_<Physical Name of the Referenced To Role Name>
: Code of the publisher of the referenced record.
Warning: Note that these columns should be considered together. You should not load the
FP_
column with a publisher code and leaveFS_
to a null value, and vice versa.
For example, if
Contact references
Customer and this entity use Fuzzy Matching, you must load in
SD_CONTACT
the following columns:
FP_CUSTOMER
: Code of the publisher providing the customer referenced by the given contact. E.g.:
MKT
.
FS_CUSTOMER
: Source ID of the customer referenced by the given contact. E.g.:
81239
.
You should load the attribute columns relevant for the entity that you are loading.
Make sure to load:
B_CLASSNAME
) that you are loading.
Note: Attributes may be loaded with null or incorrect values if the values are set or modified by the enrichers. Enrichers are executed before any validation.
The following columns do not need to be loaded or can be optionally loaded:
B_STATUS
: This column should not be loaded as it will be internally set by the integration job.
B_ORIGINALBATCHID
: This column is not used for external loads and should not be loaded.
B_CREATOR
,
B_UPDATOR
: The columns can be optionally loaded. Their value will default to the name of user submitting the load.
B_CREDATE
,
B_UPDDATE
: The columns can be optionally loaded. Their value will default to the submission timestamp.
Initializing an external load uses the following parameters:
load_id
: Load ID returned by the load initialization.
integration_job
: Name of the integration job to process this load.
user_name
: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server. With the web service operation, the user is not needed as a call to the web service requires authentication to the application server.
Note: To mandate the use of an authenticated user, you must use the web service interface.
The
INTEGRATION_LOAD.SUBMIT_LOAD
function call submits an external load identified by its Load ID and returns a Batch ID. It is performed on the repository database schema.
vBatch_id := <semarchy_repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
<load_id> /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
'<integration_job>' /* Name of the Integration Job to trigger. */
'<user_name>' /* User who has initialized the load. */
);
The following example performs a function call to submit an external load identified by the Load ID 22. It submits it with the job name
INTEGRATE_DATA
. The repository schema is
SEMREPO
. The returned Batch ID is stored in the
vBatch_id
variable.
vBatch_id := SEMREPO.INTEGRATION_LOAD.SUBMIT_LOAD(
22,
'INTEGRATE_DATA',
'John Doe' );
This web service sample call on the
submitLoad
operation submits an external load identified by its Load ID and returns a Batch ID in the response.
<soapenv:Envelope ...>
<soapenv:Header/>
<soapenv:Body>
<ws:submitLoad>
<loadID><!-- load_id --></loadID>
<jobName><!-- integration_job --></jobName>
</ws:submitLoad>
</soapenv:Body>
</soapenv:Envelope>
The following example is a request submitting an external load identified by the Load ID 125, with the
INTEGRATE_DATA
job.
<soapenv:Envelope .../>
<soapenv:Body>
<int:submitLoad>
<loadID>125</loadID>
<jobName>INTEGRATE_DATA</jobName>
</int:submitLoad>
</soapenv:Body>
</soapenv:Envelope>
The response to this request contains the Batch ID: 456.
<soap:Envelope .../>
<soap:Body>
<ns1:submitLoadResponse ...>
<return>456</return>
</ns1:submitLoadResponse>
</soap:Body>
</soap:Envelope>
Note: No user name required here since the Web Service will use the credentials of the connected user (WS_Security).
Note: You can only submit the authenticated user’s loads.
Canceling a load is performed using a PL/SQL function call or a web service call, with the following parameters:
load_id
: Load ID returned by the load initialization.
user_name
: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server. With the web service operation, the user is not needed as a call to the web service requires authentication to the application server.
Note: To mandate the use of an authenticated user, you must use the web service interface.
The following example performs a function call to
INTEGRATION_LOAD.CANCEL_LOAD
to cancel an external load identified by the Load ID 22.
SEMREPO.INTEGRATION_LOAD.CANCEL_LOAD(
22,
'John Doe' );
This web service sample call on the
cancelLoad
operation cancels a load identified by its Load ID.
<soapenv:Envelope ...>
<soapenv:Header/>
<soapenv:Body>
<ws:cancelLoad>
<loadID><!-- load_id --></loadID>
</ws:cancelLoad>
</soapenv:Body>
</soapenv:Envelope>
Note: No user name required here since the Web Service will use the credentials of the connected user (WS_Security).
Note: You can only cancel the authenticated user’s loads.
The following example is a request canceling an external load identified by the Load ID 125.
<soapenv:Envelope .../>
<soapenv:Body>
<int:cancelLoad>
<loadID>125</loadID>
</int:cancelLoad>
</soapenv:Body></soapenv:Envelope>
Previous | Top | Next |
Integration Job | Consuming Data from the Hub |