`file_configuration`

Most Transformers are configured through environment variables by default (especially Kubernetes-based apps), but environment variables aren’t suitable for every use case. The pipeline engine also supports configuration via files whose contents are passed to the Transformer at runtime. For a Kubernetes-based Transformer, the file contents are stored in a Secret and volume-mounted into the Transformer when it is created.

The model for file configuration is meant to be as flexible as possible without letting users mount files at arbitrary locations in a Transformer, primarily for security reasons. All file configuration must therefore be declared in the Transformer’s template by the Transformer author, similar to how environment configuration is declared. Both the values that users provide and the default values in a Transformer template are base64-encoded strings, which lets them contain newlines without breaking JSON syntax.

A Transformer template that uses file configuration adds a file_configuration block under the main configuration block.

Example

"configuration": {
  // ...
  "file_configuration": [
    {
      "name": "config-1",
      "path": "/config/config-1.yaml", // mounted at /config
      "default_value_base64": "SGVsbG8gZnJvbSBjb25maWctMQo="
    },
    {
      "name": "config-2",
      "path": "/config/config-2.yaml", // mounted at /config (sharing Volume with config-1)
      "default_value_base64": "SGVsbG8gZnJvbSBjb25maWctMgo="
    },
    {
      "name": "config-3",
      "path": "config-3.yaml", // mounted at /opt/transformer-configuration
      "default_value_base64": "SGVsbG8gZnJvbSBjb25maWctMwo="
    },
    {
      "name": "config #4",
      "path": "config-4.yaml", // mounted at /opt/transformer-configuration (sharing Volume with config-3)
      "required": true // The user MUST provide a value (similarly to environment)
    }
  ]
  // ...
}
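As a rough illustration of how the default_value_base64 strings in a template can be produced and inspected, the sketch below uses Python's standard base64 module. The helper names encode_file_value and decode_file_value are hypothetical and not part of the pipeline engine, and UTF-8 text content is an assumption.

```python
import base64


def encode_file_value(text: str) -> str:
    """Encode file contents as a base64 string (hypothetical helper).

    Assumes the file is UTF-8 text; newlines survive because the result
    contains only base64 characters, which are safe inside a JSON string.
    """
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_file_value(value_base64: str) -> str:
    """Decode a base64 string back into the original file contents."""
    return base64.b64decode(value_base64).decode("utf-8")


# The default for "config-1" in the template decodes to a single line of text
# followed by a newline.
encoded = encode_file_value("Hello from config-1\n")
print(encoded)
print(repr(decode_file_value(encoded)))
```

Encoding the whole file, including its trailing newline, keeps the mounted file byte-for-byte identical to the original.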

file_configuration is a list of objects, each describing a single file that a user can configure when submitting a pipeline. Each item of file_configuration has the following fields:

name: A human-readable name for the configuration. Does not need to be a filename and can include spaces. When submitting pipelines, users will use this field to tell the engine which configuration they’re supplying.
description: A description of the file configuration.
path: The path in the Transformer where this configuration will be mounted. If path is absolute, the file is mounted at exactly that path. In Kubernetes Transformers, if path is relative, the file is mounted under /opt/transformer-configuration.
default_value_base64: The base64-encoded contents to use if the user doesn’t supply a value.
required: Whether a pipeline submission should fail if the user does not provide a value. This field has no effect if default_value_base64 is also supplied.
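The field semantics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's implementation: the function names are hypothetical, and returning None when an optional entry has neither a user value nor a default is an assumption, since that case is not described here.

```python
import posixpath
from typing import Optional

# Directory used for relative paths in Kubernetes Transformers (from the text above).
DEFAULT_MOUNT_DIR = "/opt/transformer-configuration"


def resolve_mount_path(path: str) -> str:
    """Return the absolute mount path for a file_configuration entry."""
    if posixpath.isabs(path):
        return path
    return posixpath.join(DEFAULT_MOUNT_DIR, path)


def resolve_value(entry: dict, user_values: dict) -> Optional[str]:
    """Pick the base64 value for one template entry (hypothetical helper).

    Precedence: user-supplied value, then the template default, then an
    error if the entry is required. Returning None otherwise is an assumption.
    """
    supplied = user_values.get(entry["name"])
    if supplied is not None:
        return supplied["base64_encoded_text"]
    if "default_value_base64" in entry:
        # A default makes "required" irrelevant: there is always a value.
        return entry["default_value_base64"]
    if entry.get("required", False):
        raise ValueError(f"missing required file configuration: {entry['name']!r}")
    return None


print(resolve_mount_path("/config/config-1.yaml"))
print(resolve_mount_path("config-3.yaml"))
```

With the example template, "config-3.yaml" resolves to /opt/transformer-configuration/config-3.yaml, and submitting without a value for "config #4" raises an error in this sketch.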

The pipeline engine does not allow user-defined mount paths, as that would weaken security. It also does not support variadic arguments, i.e. a directory into which the user can place as many configuration files as they want. That use case is better served by storing the files on MinIO/S3 and loading them into the Transformer as a dataset.

When submitting a pipeline, a user provides file configuration values as a JSON map keyed by name, similar to how environment configuration is provided:

"configuration": {
  // ...
  "file_configuration": {
    "config #4": { // Corresponds to 'name'
      "base64_encoded_text": "T3ZlcnJpZGVuIGJ5IHVzZXIgYXQgc3VibWlzc2lvbiB0aW1lIQo="
    }
  }
  // ...
}
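A submission block of this shape can be assembled programmatically. The following is a minimal sketch in Python, assuming UTF-8 text files; build_file_configuration is a hypothetical helper, and only the file_configuration portion of the submission is shown.

```python
import base64
import json
from typing import Dict


def build_file_configuration(files: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Map each configuration name to its base64-encoded contents.

    Keys must match the 'name' fields declared in the Transformer template.
    """
    return {
        name: {
            "base64_encoded_text": base64.b64encode(text.encode("utf-8")).decode("ascii")
        }
        for name, text in files.items()
    }


# Only the file_configuration part of a submission; other fields are omitted.
payload = {
    "configuration": {
        "file_configuration": build_file_configuration(
            {"config #4": "key: value\nother: setting\n"}
        )
    }
}
print(json.dumps(payload, indent=2))
```

Because the contents are base64-encoded before serialization, multi-line YAML like the value above passes through JSON unchanged.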