Introduction to job execution

Jobs are processing units that run in the execution engine.

Job types

There are three main types of jobs running in the execution engine:

  • Integration Jobs that process incoming batches to perform golden data certification. An Integration Job is a job executed by Semarchy xDM to integrate and certify source data into golden records. This job is generated from the rules (enrichment, validation, etc) defined in the model and deployed with the model edition.

  • Deployment Jobs that deploy new model editions in data locations.

  • Purge Jobs: The jobs for purge the logs and data history according to the retention policies.

Queues

The Execution Engine processes the jobs in Queues. Queues work in First-In-First-Out (FIFO) mode. When a job runs in the queue, the next jobs are queued and wait for their turn to run. To run two jobs in parallel, it is necessary to distribute them into different queues.

Queues are grouped into Clusters. There is one cluster per Data Location, named after the data location.

Specific queues and cluster exist for administrative jobs:

  • For each data location cluster, a specific System Queue called SEM_SYS_QUEUE is automatically created. This queue is used to run administrative operations for the data location. For example, this queue processes the deployment jobs updating the data structures in the data location.

  • A specific System Cluster cluster called SEM_SYS_CLUSTER, which contains a single SEM_SYS_QUEUE queue, is used to run platform-level maintenance operations.

Job execution order

As a general rule, integration jobs are processed in their queues, and have the same priority.
There are two exceptions to this rule:

  • Jobs updating the data location, principally model edition Deployment Jobs.

  • Platform Maintenance Jobs that updating the entire platform.

Deployment jobs

When a new model edition is deployed and requires data structure changes, DDL commands are issued as part of a job called DB Install<model edition name>. This job is launched in the in the SEM_SYS_QUEUE queue of the data location cluster.

This job modifies the tables used by the DML statements from the integration jobs. As a consequence, it needs to run while no integration job runs. This job takes precedence over all other queued jobs in the cluster, which means that:

  1. Jobs currently running in the cluster are completed normally.

  2. All the queues in the cluster, except the SEM_SYS_QUEUE are moved to a BLOCKED status. Queued jobs remain in the queue and are no longer executed.

  3. The model edition deployment job is executed in the SEM_SYS_QUEUE.

  4. When this job is finished, the other queues return to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimum downtime of the integration activity while avoiding conflicts between integration jobs and model edition deployment.

Platform maintenance jobs

If a job is queued in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE queue, it takes precedence over all other queued jobs in the execution engine.

This means that:

  1. Jobs currently running in all the clusters are completed.

  2. All the clusters and queues except the SEM_SYS_CLUSTER/SEM_SYS_QUEUE are moved to a BLOCKED status. Queued jobs are no longer executed in these queues/clusters.

  3. The job in the in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE is executed.

  4. When this job is finished, the other queues are moved to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimal disruption of the platform activity while avoiding conflicts between the platform activity and maintenance operations.

Queue behavior on error

When a job running in a queue encounters a run-time error, it behaves differently depending on the queue configuration:

  • If the queue is configured to Suspend on Error, the job hangs on the error point, and blocks the rest of the queued jobs. This job can be resumed when the cause of the error is fixed, or can be canceled by user choice.

  • If the queue is not configured to Suspend on Error, the job fails automatically and the next jobs in the queue are executed. The failed job cannot be restarted.

The integration jobs are processed in a FIFO mode, a job that is failed automatically or canceled by user choice cannot be restarted. To resubmit the source data for certification, the external load needs to be resubmitted entirely as a new load.
The integration job performs a commit after each task. As a consequence, when a job fails or is suspended, already processed entities have their golden data certified and committed in the hub.
A user can explicitly choose to halt a running job by suspending it. When such a use operation is performed, the job is considered in error and can be restarted or canceled.

Suspending a job on error is the preferred configuration under the following assumptions:

  1. All the data in a batch needs to be integrated as one single atomic operation.
    For example, due to referential integrity, it is not possible to integrate contacts without customers and vice versa. Suspending the job guarantees that it can be continued - after fixing the cause of the error - with the data location preserved in the same state.

  2. Batches and integration jobs are submitted in a precise sequence that represents the changes in the source, and need to be processed in the order they were submitted.
    For example, missing a data value change in the suspended batch that may impact the consolidation of future batches. Suspending the job guarantees that the jobs are processed in their exact submission sequence, and no batch is skipped without an explicit user choice.

There may be some cases when this behavior can be changed:

  • If the batches/jobs do not have strong integrity or sequencing requirement, then they can be skipped on error by default. These jobs can run in a queue where Suspend on Error is disabled.

  • If the integration velocity is critical for making golden data available as quickly as possible, it is possible to configure the queue running the integration job with Suspend on Error disabled.

Queue status

A queue is in one the following statuses:

  • READY: The queue is available for processing jobs.

  • SUSPENDED: The queue is blocked because a job has encountered an error or was suspended by the user. This job remains suspended. Queued jobs are not processed until the queues becomes READY again, either when the job is cancelled or finishes successfully. For more information, see Manage job execution.

  • BLOCKED: When a job is running in the SEM_SYS_QUEUE queue of the cluster, the other queues are moved to this status. Jobs cannot be executed in a blocked queue and remain queued until the queue becomes READY again.

A cluster can be in one the following statuses:

  • READY: The cluster is not blocked by the SEM_SYS_CLUSTER cluster, and queues under this cluster can process jobs.

  • BLOCKED: The cluster is blocked when a job is running in the SEM_SYS_CLUSTER cluster. When a cluster is blocked, all its attached queues are also blocked.