Monitoring

For more information on the design decisions behind Pipeline monitoring, see Design Documents - Monitoring.

Overview

Monitoring Data Pipelines is intended to answer the following questions:

  1. For a given Pipeline, what is its top-level status? Is it running, completed, or in a degraded state?

  2. For a given Pipeline, what are the individual statuses of each Transformer in it? Are they running, completed, or in a degraded state?

  3. Why did a Pipeline’s status transition from one state to another?

  4. When did a Pipeline’s status transition?

This document outlines how status is reported for a Pipeline. It is intended for developers working on SDL Data Pipelines.

Where Status is Reported

The operator code is the source of truth for the Pipeline CR.

Pipelines are instantiated via the Pipeline Kubernetes CR, which tracks both the spec for the Pipeline (i.e. how each resident Transformer should be instantiated) and the status of the Pipeline (via the Status subresource). Three top-level fields denote status:

  • currentStatus: A top-level indication of the current status of the Pipeline. Options are Inactive, Active, Finished, Degraded, or Invalid.

  • conditions: An array of Kubernetes Conditions referring to the object. Note that Kubernetes Conditions serve a very specific role and may not be what you expect.

  • stages: Individual statuses for each Transformer in this Pipeline.

See Pipeline Status for more information.

A Transformer status has the following structure:

  • status: The status of this individual Transformer.

  • message: A human-readable message referring to the status of this Transformer.

  • started: When this Transformer started (if at all).

  • stopped: When this Transformer stopped (if at all).

See Transformer Status for more information.

Pipeline Status

A Pipeline’s currentStatus may be one of a few values:

  • Active: The Pipeline is currently active without errors. Note that for periodic tasks (e.g. cron jobs), this is a higher-level judgment than simply "are all stages running?"

  • Finished: All stages in the Pipeline have finished.

  • Degraded: One or more Pipeline stages have failed, though all were instantiated successfully.

  • Invalid: One or more Pipeline stages failed to be instantiated, which caused a rollback.

Transformer Status

A Transformer’s status may be one of a few values:

  • Unknown: The controller failed to determine the current state of the stage.

  • FailedToInstantiate: The stage failed to be instantiated.

  • Active: The stage is currently running.

  • Complete: All work for this stage has completed.

  • Degraded: One or more operations in this stage have failed. Note that this doesn’t mean ALL have failed.
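
To illustrate how the Transformer statuses relate to the Pipeline statuses, the definitions in these two sections can be read as a simple precedence rule. The sketch below is one possible reading, not the operator's actual logic. The summarize helper and the constant names are hypothetical, it assumes at least one stage, it treats Unknown stages as non-failing, and it ignores the periodic-task nuance noted for Active.

```go
package main

import "fmt"

// Transformer (stage) status values, as listed above.
const (
	StageUnknown             = "Unknown"
	StageFailedToInstantiate = "FailedToInstantiate"
	StageActive              = "Active"
	StageComplete            = "Complete"
	StageDegraded            = "Degraded"
)

// Pipeline status values, as listed in the Pipeline Status section.
const (
	PipelineActive   = "Active"
	PipelineFinished = "Finished"
	PipelineDegraded = "Degraded"
	PipelineInvalid  = "Invalid"
)

// summarize is a hypothetical helper that derives a Pipeline status from
// its stage statuses using the precedence Invalid > Degraded > Finished > Active.
func summarize(stages []string) string {
	allComplete := len(stages) > 0
	degraded := false
	for _, s := range stages {
		switch s {
		case StageFailedToInstantiate:
			// A stage that could not be instantiated makes the Pipeline Invalid.
			return PipelineInvalid
		case StageDegraded:
			degraded = true
		}
		if s != StageComplete {
			allComplete = false
		}
	}
	switch {
	case degraded:
		return PipelineDegraded
	case allComplete:
		return PipelineFinished
	default:
		// Unknown and Active stages land here; treating Unknown as
		// non-failing is an assumption of this sketch.
		return PipelineActive
	}
}

func main() {
	fmt.Println(summarize([]string{StageComplete, StageComplete}))
	fmt.Println(summarize([]string{StageActive, StageDegraded}))
	fmt.Println(summarize([]string{StageDegraded, StageFailedToInstantiate}))
}
```

Checking FailedToInstantiate first reflects the distinction drawn above: a Pipeline is Degraded only when every stage was instantiated successfully.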