Transformer Basics
Each transformer is a function of:
- Code (container, Flink SQL, etc.)
- Inputs and Outputs (Kafka Topics, etc.)
- Configuration supplied by the pipeline engine
- Configuration supplied by clients when instantiating it as part of a pipeline
A transformer can be defined to take any number of inputs and outputs, where each input or output can be a Kafka Topic, a path on MinIO, or a direct connection to another Transformer. For Kubernetes-based transformers (i.e. containers), environment variables are the main way dynamic configuration is provided. Input and output dataset references (e.g. a Kafka Topic to publish or subscribe to, or a MinIO file path) are provided by the pipeline engine as environment variables, and the names of those variables are controlled by the transformer’s JSON configuration.
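As a rough illustration of the environment-variable approach described above, a container-based transformer might collect its connection references as follows. This is a minimal sketch in Python, not the platform's actual client code: the helper `read_connections` is hypothetical, the variable names `SOURCE_TOPIC` and `DEST_TOPIC` are borrowed from the example template in this section, and it assumes that an optional connection which is not wired up simply leaves its variable unset.

```python
import os
from typing import Iterable, Mapping, Optional


def read_connections(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Collect connection references from environment variables.

    Hypothetical helper: variables that are not set are omitted from the
    result (assumed behavior for optional connections).
    """
    source = os.environ if env is None else env
    return {name: source[name] for name in names if name in source}


if __name__ == "__main__":
    # Names taken from the example template; the values depend on the pipeline.
    connections = read_connections(["SOURCE_TOPIC", "DEST_TOPIC"])
    print(connections)
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.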
The Transformer Template JSON is how the pipeline engine maps a client’s request to a running transformer. An example template is shown here, followed by a breakdown of its fields:
{
  "uid": "649152b4-ac10-4e2d-9a58-f3ec00d6d1c1",
  "name": "My Transformer",
  "description": "Transforms data",
  "status": "available",
  "security_markings": "UNCLASSIFIED",
  "types": ["sink"],
  "inputs": {
    "SOURCE_TOPIC": {
      "display_name": "Source Topic",
      "conn_type": "INTERNAL_KAFKA",
      "arity": {
        "min": 1,
        "max": 1
      }
    }
  },
  "outputs": {
    "DEST_TOPIC": {
      "display_name": "Destination Topic",
      "conn_type": "INTERNAL_KAFKA",
      "arity": {
        "min": 0,
        "max": 1
      }
    }
  },
  "configuration": {
    "environment": [
      {
        "name": "MY_VAR_1",
        "description": "Something useful for UIs",
        "default_value": "83412"
      },
      {
        "name": "MY_VAR_2",
        "description": "A required variable",
        "required": true
      },
      {
        "name": "MY_SECRET",
        "description": "A sensitive variable",
        "sensitive": true
      }
    ],
    "static_environment": [
      {"name": "STATIC_VAR_1", "value": "sdl-backend"},
      {"name": "KEYCLOAK_CLIENT_SECRET", "valueFrom": {"secretKeyRef": {"key": "rdpPlatformClientSecret", "name": "keycloak-realm-init"}}}
    ],
    "engine_provided_environment": [
      "KEYCLOAK_REALM_URL",
      "KEYCLOAK_URL",
      "KEYCLOAK_REALM"
    ],
    "environment_mapping": {
      "KEYCLOAK_REALM_URL": "MY_CUSTOM_REALM_URL"
    }
  },
  "instantiation": {
    "job_image": {
      "image": "ghcr.io/raft-tech/my-transformer:v1.0",
      "pull_policy": "IfNotPresent",
      "image_pull_secret": "regcred-default",
      "default_replicas": 1,
      "args": ["--arg", "val"]
    }
  }
}
The key sections are:
- uid
  Unique identifier for the transformer. If one is not specified, a random UUID will be generated and assigned.
- name, description, status
  Basic metadata shown in the UI.
- security_markings
  Classification level applied to this template (e.g. UNCLASSIFIED, SECRET).
- types
  Array of transformer types. Options: sink, source, ai_agent, ai_svc, rdp_workflow, geoserver_client.
- inputs / outputs
  Named data connections. Each key becomes an environment variable name for container-based transformers. The conn_type field specifies the storage medium (e.g. INTERNAL_KAFKA). The arity field controls how many connections are valid (min/max).
- configuration.environment
  Array of environment variables that clients fill in when creating a pipeline. Each can have a default_value, be marked required, or be flagged sensitive.
- configuration.static_environment
  Environment variables set as-is on the container. Supports both literal values and Kubernetes secret references (corev1.EnvVar format).
- configuration.engine_provided_environment
  Environment variable names populated automatically by the pipeline engine (e.g. Keycloak URLs).
- configuration.environment_mapping
  Renames engine-provided variables to custom names expected by your code.
- instantiation.job_image
  Container image configuration: image name, pull policy, pull secret, default replica count, and optional args.
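To make the interaction between the four configuration sections more concrete, the sketch below shows one way the final container environment could be assembled from them. It is an illustration only and not the pipeline engine's implementation: the functions `resolve_environment` and `redact` are hypothetical, the order in which sections are applied is an assumption, environment_mapping is assumed to map an engine-provided name (key) to the custom name (value), and masking sensitive values for display is just one plausible use of the sensitive flag. Secret references (valueFrom) are skipped because Kubernetes resolves them at container start.

```python
from typing import Any, Mapping

# Trimmed copy of the "configuration" section from the example template.
EXAMPLE_CONFIGURATION = {
    "environment": [
        {"name": "MY_VAR_1", "default_value": "83412"},
        {"name": "MY_VAR_2", "required": True},
        {"name": "MY_SECRET", "sensitive": True},
    ],
    "static_environment": [
        {"name": "STATIC_VAR_1", "value": "sdl-backend"},
    ],
    "engine_provided_environment": ["KEYCLOAK_REALM_URL", "KEYCLOAK_URL"],
    "environment_mapping": {"KEYCLOAK_REALM_URL": "MY_CUSTOM_REALM_URL"},
}


def resolve_environment(
    configuration: Mapping[str, Any],
    client_values: Mapping[str, str],
    engine_values: Mapping[str, str],
) -> dict[str, str]:
    """Build literal environment variables for a container (hypothetical)."""
    env: dict[str, str] = {}

    # Client-supplied variables: use the client's value, else the default,
    # and fail if a required variable has neither.
    for spec in configuration.get("environment", []):
        name = spec["name"]
        if name in client_values:
            env[name] = client_values[name]
        elif "default_value" in spec:
            env[name] = spec["default_value"]
        elif spec.get("required", False):
            raise ValueError(f"missing required variable: {name}")

    # Static variables with literal values; entries using valueFrom are
    # left for Kubernetes to resolve and are skipped here.
    for entry in configuration.get("static_environment", []):
        if "value" in entry:
            env[entry["name"]] = entry["value"]

    # Engine-provided variables, renamed when a mapping entry exists.
    mapping = configuration.get("environment_mapping", {})
    for name in configuration.get("engine_provided_environment", []):
        if name in engine_values:
            env[mapping.get(name, name)] = engine_values[name]

    return env


def redact(configuration: Mapping[str, Any], env: Mapping[str, str]) -> dict[str, str]:
    """Mask values of variables flagged sensitive, e.g. before logging."""
    sensitive = {
        spec["name"]
        for spec in configuration.get("environment", [])
        if spec.get("sensitive", False)
    }
    return {k: ("***" if k in sensitive else v) for k, v in env.items()}


if __name__ == "__main__":
    resolved = resolve_environment(
        EXAMPLE_CONFIGURATION,
        client_values={"MY_VAR_2": "abc", "MY_SECRET": "hunter2"},
        engine_values={"KEYCLOAK_REALM_URL": "https://keycloak.example/realms/demo"},
    )
    print(redact(EXAMPLE_CONFIGURATION, resolved))
```

In this sketch a client that omits MY_VAR_2 gets a ValueError, while omitting MY_VAR_1 falls back to its default_value; the Keycloak URL shown is a placeholder.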