Transformer Basics

Each transformer is a function of:

  • Code (container, Flink SQL, etc.)

  • Inputs and Outputs (Kafka Topics, etc.)

  • Configuration supplied by the pipeline engine

  • Configuration supplied by clients when instantiating it as part of a pipeline

A transformer can be defined to take any number of inputs and outputs, where each input or output can be a Kafka Topic, a path on MinIO, or a direct connection to another Transformer. For Kubernetes-based transformers (i.e. containers), environment variables are the main way dynamic configuration is provided. Input and output dataset references (e.g. a Kafka Topic to publish or subscribe to, or a MinIO file path) are provided by the pipeline engine as environment variables, and the names of those variables are set by the transformer’s JSON configuration.
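As a rough illustration of the container side, the sketch below reads its input and output references from the environment. The variable names SOURCE_TOPIC and DEST_TOPIC match the keys used in the example template in this document. The helper name read_connections and the idea that each value is a plain string are assumptions, since the exact value format is not specified here.

```python
import os


def read_connections(env=None):
    """Read the input/output references the engine is assumed to inject.

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env

    # SOURCE_TOPIC is declared with arity min=1 in the example template,
    # so this sketch treats a missing value as a startup error.
    source = env.get("SOURCE_TOPIC")
    if not source:
        raise RuntimeError("SOURCE_TOPIC was not provided")

    # DEST_TOPIC is declared with arity min=0, so it may be absent.
    dest = env.get("DEST_TOPIC") or None
    return source, dest
```

Failing fast on a missing required input keeps configuration problems visible at container start rather than at first message.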

The Transformer Template JSON is how the pipeline engine maps a client’s request to a running transformer. An example template is shown below, followed by a breakdown of its fields:

{
    "uid": "649152b4-ac10-4e2d-9a58-f3ec00d6d1c1",
    "name": "My Transformer",
    "description": "Transforms data",
    "status": "available",
    "security_markings": "UNCLASSIFIED",
    "types": ["sink"],
    "inputs": {
        "SOURCE_TOPIC": {
            "display_name": "Source Topic",
            "conn_type": "INTERNAL_KAFKA",
            "arity": {
                "min": 1,
                "max": 1
            }
        }
    },
    "outputs": {
        "DEST_TOPIC": {
            "display_name": "Destination Topic",
            "conn_type": "INTERNAL_KAFKA",
            "arity": {
                "min": 0,
                "max": 1
            }
        }
    },
    "configuration": {
        "environment": [
            {
                "name": "MY_VAR_1",
                "description": "Something useful for UIs",
                "default_value": "83412"
            },
            {
                "name": "MY_VAR_2",
                "description": "A required variable",
                "required": true
            },
            {
                "name": "MY_SECRET",
                "description": "A sensitive variable",
                "sensitive": true
            }
        ],
        "static_environment": [
            {"name": "STATIC_VAR_1", "value": "sdl-backend"},
            {"name": "KEYCLOAK_CLIENT_SECRET", "valueFrom": {"secretKeyRef": {"key": "rdpPlatformClientSecret", "name": "keycloak-realm-init"}}}
        ],
        "engine_provided_environment": [
            "KEYCLOAK_REALM_URL",
            "KEYCLOAK_URL",
            "KEYCLOAK_REALM"
        ],
        "environment_mapping": {
            "KEYCLOAK_REALM_URL": "MY_CUSTOM_REALM_URL"
        }
    },
    "instantiation": {
        "job_image": {
            "image": "ghcr.io/raft-tech/my-transformer:v1.0",
            "pull_policy": "IfNotPresent",
            "image_pull_secret": "regcred-default",
            "default_replicas": 1,
            "args": ["--arg", "val"]
        }
    }
}

The key sections are:

uid

Unique identifier for the transformer. If one is not specified, a random UUID will be generated and assigned.
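The defaulting behavior can be pictured with a small sketch. The function name ensure_uid is hypothetical, and the use of a version 4 UUID is an assumption; the engine's actual generation logic is not described here.

```python
import uuid


def ensure_uid(template):
    """Return a copy of the template with a uid filled in when missing."""
    result = dict(template)  # shallow copy so the caller's dict is untouched
    if not result.get("uid"):
        # Assumption: "random UUID" means a version 4 UUID.
        result["uid"] = str(uuid.uuid4())
    return result
```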

name, description, status

Basic metadata shown in the UI.

security_markings

Classification level applied to this template (e.g. UNCLASSIFIED, SECRET).

types

Array of transformer types. Options: sink, source, ai_agent, ai_svc, rdp_workflow, geoserver_client.

inputs / outputs

Named data connections. Each key becomes an environment variable name for container-based transformers. conn_type specifies the storage medium (e.g. INTERNAL_KAFKA). arity controls how many connections are valid (min/max).
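To make the arity rule concrete, here is a minimal sketch of a check a client or UI might run before submitting a pipeline. The function name check_arity is hypothetical. Treating both bounds as inclusive, a missing min as 0, and a missing max as unbounded are all assumptions rather than documented engine behavior.

```python
def check_arity(spec, connection_count):
    """Return True when connection_count falls within the declared arity."""
    arity = spec.get("arity", {})
    minimum = arity.get("min", 0)   # assumption: absent min means 0
    maximum = arity.get("max")      # assumption: absent max means no limit

    if connection_count < minimum:
        return False
    if maximum is not None and connection_count > maximum:
        return False
    return True


# The SOURCE_TOPIC input from the example template: exactly one connection.
source_topic = {"conn_type": "INTERNAL_KAFKA", "arity": {"min": 1, "max": 1}}
# The DEST_TOPIC output from the example template: zero or one connection.
dest_topic = {"conn_type": "INTERNAL_KAFKA", "arity": {"min": 0, "max": 1}}
```

Under these assumptions, check_arity(source_topic, 0) is rejected while check_arity(dest_topic, 0) is accepted.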

configuration.environment

Array of environment variables that clients fill in when creating a pipeline. Each can have a default_value, be marked required, or flagged sensitive.
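One plausible way these three flags interact is sketched below. The function names resolve_environment and redact are hypothetical, and the precedence shown (client value, then default_value, then an error if required) is an assumption about how the engine resolves values, not a statement of its implementation.

```python
def resolve_environment(declared, supplied):
    """Merge client-supplied values with the template's declared variables."""
    resolved = {}
    for var in declared:
        name = var["name"]
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default_value" in var:
            resolved[name] = var["default_value"]
        elif var.get("required", False):
            raise ValueError(f"missing required variable: {name}")
        # Optional variables with no default are simply left unset.
    return resolved


def redact(declared, resolved):
    """Mask values flagged sensitive, e.g. before logging or display."""
    sensitive = {v["name"] for v in declared if v.get("sensitive", False)}
    return {k: ("***" if k in sensitive else v) for k, v in resolved.items()}


# The declarations from the example template.
declared = [
    {"name": "MY_VAR_1", "default_value": "83412"},
    {"name": "MY_VAR_2", "required": True},
    {"name": "MY_SECRET", "sensitive": True},
]
```

With these declarations, a client that supplies only MY_VAR_2 still gets MY_VAR_1 from its default, while omitting MY_VAR_2 raises an error.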

configuration.static_environment

Environment variables set as-is on the container. Entries use the Kubernetes corev1.EnvVar format, so both literal values and secret references are supported.
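The two entry shapes can be told apart by which field is present. The sketch below is a hypothetical helper (describe_static_entry is not part of the engine) that handles only the value and valueFrom.secretKeyRef forms shown in the example template; other valueFrom sources in the EnvVar format are deliberately out of scope.

```python
def describe_static_entry(entry):
    """Summarize a static_environment entry as a literal or a secret reference."""
    name = entry["name"]
    if "value" in entry:
        return f"{name}: literal"
    ref = entry.get("valueFrom", {}).get("secretKeyRef")
    if ref:
        # A secretKeyRef names the Secret object and the key inside it.
        return f"{name}: secret {ref['name']}/{ref['key']}"
    raise ValueError(f"unsupported static_environment entry: {name}")


# The two entries from the example template.
entries = [
    {"name": "STATIC_VAR_1", "value": "sdl-backend"},
    {
        "name": "KEYCLOAK_CLIENT_SECRET",
        "valueFrom": {
            "secretKeyRef": {"key": "rdpPlatformClientSecret", "name": "keycloak-realm-init"}
        },
    },
]
```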

configuration.engine_provided_environment

Environment variable names populated automatically by the pipeline engine (e.g. Keycloak URLs).

configuration.environment_mapping

Renames engine-provided variables to custom names expected by your code.
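The renaming can be sketched as a simple key substitution. The function name apply_mapping and the example URL values are hypothetical, and this sketch assumes the original name is dropped once a variable is renamed; whether the engine also keeps the original name is not stated here.

```python
def apply_mapping(engine_env, mapping):
    """Rename engine-provided variables according to environment_mapping."""
    # Variables without a mapping entry keep their engine-provided name.
    return {mapping.get(name, name): value for name, value in engine_env.items()}


# Hypothetical engine-provided values; the names come from the example template.
engine_env = {
    "KEYCLOAK_REALM_URL": "https://keycloak.example/realms/demo",
    "KEYCLOAK_URL": "https://keycloak.example",
    "KEYCLOAK_REALM": "demo",
}
mapping = {"KEYCLOAK_REALM_URL": "MY_CUSTOM_REALM_URL"}
container_env = apply_mapping(engine_env, mapping)
```

Here the container would read MY_CUSTOM_REALM_URL in place of KEYCLOAK_REALM_URL, while KEYCLOAK_URL and KEYCLOAK_REALM pass through unchanged.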

instantiation.job_image

Container image configuration: image name, pull policy, pull secret, default replica count, and optional args.