Inputs and Outputs

The inputs and outputs sections define the connections a transformer requires. For Kubernetes Jobs, each input or output maps to an environment variable whose value is provided by the client. Think of these as the arrows pointing into and out of the transformer in a drag-and-drop UI:

*(figure: transformer I/O)*

Because the data pipeline engine is dynamic by design, the exact references behind inputs and outputs (e.g. Kafka topics, locations on S3) cannot be known ahead of time (otherwise, what would connecting them to other components mean?).

Inputs and outputs are therefore always configured dynamically at runtime. Configuration that should stay fixed can instead be set statically; for a Kubernetes app, add it under configuration.static_environment.
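For instance, a fixed setting can sit alongside the dynamic connections in the app spec. A minimal sketch (the LOG_LEVEL key is illustrative, not part of any required schema):

```json
{
  "configuration": {
    "static_environment": {
      "LOG_LEVEL": "INFO"
    }
  }
}
```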

Inputs and outputs are defined as a map where each key is the connection name (which becomes an environment variable in the transformer):

"inputs": {
  "KAFKA_TOPIC_NAME": {
    "display_name": "Destination Topic",
    "conn_type": "INTERNAL_KAFKA",
    "arity": { "min": 1, "max": 1 }
  }
}
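Inside the transformer, the connection name above is simply an environment variable to read. A minimal sketch, assuming a Python transformer; the helper name and the example topic value are hypothetical, not part of the engine's API:

```python
import os

def read_connection(name, required=True):
    """Return the runtime value the engine injected for a connection.

    'name' is a key from the inputs/outputs map, e.g. KAFKA_TOPIC_NAME.
    An arity with min >= 1 means the connection must be wired up, so a
    missing variable is treated as an error.
    """
    value = os.environ.get(name)
    if value is None and required:
        raise RuntimeError("missing required connection: " + name)
    return value

# At runtime the engine would have set something like:
# os.environ["KAFKA_TOPIC_NAME"] = "orders-topic"
```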

conn_type can be one of these values:

  • INTERNAL_KAFKA - the input/output maps to a Kafka topic

  • INTERNAL_MINIO - the input/output maps to a path on MinIO, e.g. s3://abc

  • INTERNAL_ICEBERG - the input/output maps to an Iceberg table

  • DATASET - the input/output references a dataset registered in the catalog

  • TRANSFORMER - the input/output is not mapped to an underlying data storage driver; instead, the two connected transformers communicate with each other directly

  • RDP_WORKFLOW - the input/output is handled by the SDL workflow runtime. Connections of this type must define workflow_accepted_types, which specifies the data types the connection can accept (e.g. string, number, double, boolean, map, array)
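Following the earlier inputs example, an RDP_WORKFLOW connection could be declared like this (the connection name, display_name, and accepted types are illustrative):

```json
"inputs": {
  "THRESHOLD": {
    "display_name": "Alert Threshold",
    "conn_type": "RDP_WORKFLOW",
    "arity": { "min": 1, "max": 1 },
    "workflow_accepted_types": ["number", "double"]
  }
}
```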