Configuration Management

This document outlines the configuration management strategy for SDL. The primary approach is decentralized configuration management, which emphasizes distributed control, autonomy, and flexibility while maintaining consistency and governance across all deployments.

Whenever possible, follow GitOps principles and manage configurations as code in a Git repository.

Key Principles

  1. Autonomy: Each instance of SDL has control over its own services and configurations

  2. Consistency: Shared standards and best practices across all instances of SDL

  3. Transparency: Configurations are visible and auditable across the organization

  4. Flexibility: Each instance of SDL can adapt configurations to its specific needs

  5. Governance: Central oversight ensures security and compliance across instances of SDL

Implementation Details

  • SDL uses Helm for packaging and deploying applications, with sensible defaults and minimal configuration overrides

  • Short-lived environments should leverage the Hacks Scripts and [Override Structure] to deploy

  • Long-lived environments should leverage ArgoCD and IaC for their deployment

  • Each instance of SDL requires a Git repository for its unique configurations, leveraging Kustomize overlays to apply environment- and instance-specific customizations

  • Changes made in the root Helm repository for SDL propagate to every instance, unless that instance has an override specific to it

  • SDL can leverage HashiCorp Vault with a federated setup for distributed secrets management

  • SDL can also leverage Terraform to manage infrastructure as code
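The base-plus-override model described above might look like the following in an instance's Git repository. This is a hypothetical sketch: the directory layout, resource names, and patch target are illustrative, not taken from an actual SDL repository.

```yaml
# overlays/instance-a/kustomization.yaml -- hypothetical instance overlay
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Shared base rendered from the root SDL Helm chart (illustrative path).
resources:
  - ../../base

# Instance-specific overrides. Changes in the root chart propagate to this
# instance unless a patch here overrides the corresponding field.
patches:
  - path: patch-resources.yaml
    target:
      kind: Deployment
      name: sdl-api
```

With this layout, `kubectl apply -k overlays/instance-a` (or an ArgoCD Application pointing at the overlay path) renders the shared base plus only the patches that instance owns.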

Disaster Recovery

The ability to recreate an instance of SDL is built into the system. SDL uses Velero for cross-cluster backup and restore. See df-backup-restore for more details.
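As a rough illustration of the Velero-based approach, a recurring backup for an instance could be declared as a Velero Schedule. The names, namespace, cron expression, and retention below are illustrative assumptions, not values from df-backup-restore.

```yaml
# Hypothetical Velero Schedule for recurring instance backups.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: sdl-daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # daily at 02:00
  template:
    includedNamespaces:
      - sdl                    # illustrative namespace for SDL workloads
    ttl: 720h0m0s              # retain backups for 30 days
```

Restoring into a fresh cluster then amounts to installing Velero against the same backup storage location and creating a Restore that references the most recent backup.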

Challenges and Mitigations

Challenge                                   Mitigation
---------                                   ----------
Configuration drift                         Regular compliance checks and automated remediation
Increased complexity                        Comprehensive documentation and training programs
Performance impact of federated services    Careful performance testing and optimization
Security risks from decentralized control   Strict policy enforcement and regular security audits
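One concrete way the "automated remediation" mitigation for configuration drift can be realized is with ArgoCD's automated sync. The Application below is a hypothetical sketch; the repository URL, paths, and names are placeholders.

```yaml
# Hypothetical ArgoCD Application with self-healing enabled, so manual
# changes in the cluster are reverted to the state declared in Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sdl-instance-a
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/sdl/instance-a-config.git  # placeholder
    targetRevision: main
    path: overlays/instance-a
  destination:
    server: https://kubernetes.default.svc
    namespace: sdl
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert out-of-band changes, remediating drift
```

With `selfHeal` enabled, drift detection and remediation happen continuously rather than through periodic manual compliance checks alone.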

Edge Deployments and Policy Management

Edge deployments in our Kubernetes-based data platform introduce unique challenges and opportunities for configuration management. This section outlines how we handle policy changes at the edge and the process of suggesting these changes back to the central deployment.

Edge Deployment Architecture

Our edge deployments consist of:

  • Lightweight Kubernetes clusters (e.g., KinD)

  • Local data processing and storage capabilities

  • Reduced set of services compared to central deployment

  • Local policy enforcement mechanisms

Policy Management at the Edge

  1. Local Policy Enforcement:

    • Use Open Policy Agent (OPA) for policy enforcement at edge locations

    • Deploy a subset of central policies to edge clusters

  2. Policy Customization:

    • Allow limited policy customization for edge-specific requirements

    • Use Kustomize overlays for edge-specific policy adjustments

  3. Policy Synchronization:

    • Implement a pull-based model for edge clusters to fetch policy updates

    • Use GitOps principles for policy distribution
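To make the OPA-based enforcement concrete, one common pattern is to ship policies as Gatekeeper ConstraintTemplates, which edge clusters can pull via the same GitOps flow, with Kustomize overlays adjusting parameters per site. The required-labels policy below is a generic illustrative example, not an actual SDL policy.

```yaml
# Hypothetical Gatekeeper ConstraintTemplate that could be part of the
# reduced policy set synced to edge clusters.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # Deny objects that are missing any label listed in the
        # constraint's parameters.
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
```

Edge-specific Constraints (instances of `K8sRequiredLabels`) can then live in the edge overlay, so the template is shared while its parameters vary per location.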

Suggesting Policy Changes from Edge to Central

We implement a feedback loop that allows edge deployments to suggest policy changes back to the central deployment:

  1. Change Proposal Process:

    • Edge cluster administrators can propose policy changes through a Git-based workflow

    • Use pull requests to submit change proposals

  2. Automated Validation:

    • Implement CI/CD pipelines to validate proposed policy changes

    • Use policy simulation tools to assess the impact of changes

  3. Review and Approval:

    • Central platform team reviews proposed changes

    • Collaborate with edge administrators to refine proposals

  4. Integration and Testing:

    • Merge approved changes into a staging environment

    • Conduct thorough testing, including simulation of edge scenarios

  5. Gradual Rollout:

    • Implement a phased rollout of policy changes

    • Use feature flags or policy versioning for controlled deployment
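The automated-validation step above could be wired into CI roughly as follows. This workflow is a hypothetical sketch (repository layout, workflow name, and action versions are assumptions), using OPA's own test and formatting tools against a `policies/` directory.

```yaml
# Hypothetical CI workflow that validates policy change proposals
# submitted via pull request.
name: validate-policy-change
on:
  pull_request:
    paths:
      - "policies/**"
jobs:
  policy-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: open-policy-agent/setup-opa@v2
      - name: Run OPA unit tests
        run: opa test policies/ -v
      - name: Check policy formatting
        run: opa fmt -l policies/
```

A richer pipeline might add policy simulation (for example, replaying recorded admission requests against the proposed policies) before the central platform team's review.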

Example: Edge-to-Central Policy Suggestion Workflow

  1. Edge administrator identifies a need for a policy adjustment

  2. Administrator creates a branch in the policy Git repository

  3. Policy changes are made and committed to the branch

  4. A pull request is opened with the proposed changes

  5. CI/CD pipeline runs automated tests and policy simulations

  6. The central platform team reviews the proposal

  7. Iterative feedback and adjustments are made

  8. Approved changes are merged into the staging environment

  9. Comprehensive testing is performed

  10. Changes are gradually rolled out to production, including edge deployments

Challenges and Considerations

  • Connectivity: Ensure the policy suggestion process works with intermittent connectivity

  • Conflict Resolution: Develop strategies for resolving conflicting policy suggestions from multiple edge deployments

  • Performance Impact: Assess the performance impact of policy changes on edge deployments with limited resources

  • Compliance: Ensure all policy changes meet regulatory and compliance requirements across different edge locations