Data Tiering & Storage

SDL implements a multi-tier storage architecture optimized for different access patterns and data freshness requirements. Data moves through well-defined tiers — from raw ingestion through normalization and fusion — with each tier providing the appropriate balance of latency, throughput, and storage efficiency.

Storage Tiers

The platform organizes operational data into three access tiers based on age and query frequency:

Tier	Latency	Description	Capacity Profile
Hot	<2ms	Last 10 minutes of operational data. Optimized for real-time queries and the highest throughput requirements.	Highest throughput
Warm	<2 sec	Last 2 weeks of data. Balances query speed with increased storage capacity for recent historical access.	Balanced speed and capacity
Cold	<5 sec	Older data beyond the warm window. Optimized for storage efficiency while maintaining query latency under 5 seconds.	Optimized for storage efficiency

Tier

Latency

Description

Capacity Profile

Hot

<2ms

Last 10 minutes of operational data. Optimized for real-time queries and the highest throughput requirements.

Highest throughput

Warm

<2 sec

Last 2 weeks of data. Balances query speed with increased storage capacity for recent historical access.

Balanced speed and capacity

Cold

<5 sec

Older data beyond the warm window. Optimized for storage efficiency while maintaining query latency under 5 seconds.

Optimized for storage efficiency

Tier boundaries are configurable per deployment. Data transitions between tiers automatically based on age, with no manual intervention required.

Metadata Store

Beyond the operational access tiers, the platform maintains a structured metadata pipeline that tracks data quality and provenance:

Bronze (RAW): Ingested data in its original format. This tier serves as the immutable audit copy, preserving the exact representation received from the source system. No transformations are applied at this stage.
Silver (Standard Schema): Data normalized to the canonical data model. Records are validated against the schema, enriched with standard metadata (timestamps, source identifiers, classification markings), and made available for downstream processing.
Gold (Fused): Conflict-resolved, multi-source fused entities with full provenance chains. When multiple sources report on the same real-world entity, the fusion process reconciles conflicts, merges attributes, and produces a single authoritative record that retains links to every contributing source.

Object Storage

SDL provides S3-compatible object storage for bulk data, attachments, and artifacts that fall outside the structured entity model:

S3-compatible APIs for programmatic access using standard tooling and client libraries.
LZ4 compression applied to stored objects for storage efficiency without significant CPU overhead.
Lifecycle policies for automatic tier migration, moving infrequently accessed objects to lower-cost storage classes on a configurable schedule.
Multi-region replication where the deployment topology and network connectivity support it, ensuring object availability across geographically distributed nodes.

ETL & Enrichment Pipelines

Data movement through the tier hierarchy is managed by configurable pipelines. Operators build, monitor, and manage pipelines through the pipeline catalog, which provides a centralized view of all active, inactive, and degraded pipelines across the deployment.

Figure 1. Pipeline Catalog — centralized view of all configured pipelines with status, scheduling, and lifecycle controls

Each pipeline is assembled visually as a directed graph of stages. Operators chain together data generators, format transformers, storage connectors, and enrichment steps using a drag-and-drop builder. The built-in data preview shows live streaming records flowing through the pipeline, enabling operators to verify transformations in real time before promoting a pipeline to production.

Figure 2. Pipeline Detail — visual stage builder with live data preview showing a CoT-to-GeoServer transformation chain

Low/No-Code ETL configuration: Pipeline definitions use declarative configuration that allows operators to define extraction, transformation, and loading workflows without writing custom code.
Enrichment pipelines: Dedicated enrichment stages add context to records as they progress through the tiers. Enrichments include geospatial correlation, temporal alignment, and classification marking assignment based on source and content analysis.
Schema validation at tier boundaries: Each transition between tiers — Bronze to Silver, Silver to Gold — includes schema validation to ensure data integrity. Records that fail validation are flagged and routed to exception handling rather than silently propagated.
Lineage tracking: The platform maintains a complete lineage record from raw ingestion through fusion. Every transformation, enrichment, and merge operation is logged, enabling operators to trace any fused entity back to its original source records.