Federated Query & Virtual Knowledge Graph

{rdp-product-acronym} provides a "zero ETL" federated query capability that lets operators and analysts query data wherever it lives — without copying, transforming, or centralizing it first. A Virtual Knowledge Graph (VKG) engine virtualizes heterogeneous data sources into a single semantic layer, translating user queries into optimized native calls against each source system.

Why Federated Query

Traditional data integration requires extracting data from operational systems, transforming it into a common schema, and loading it into a centralized warehouse. This ETL approach introduces latency, duplicates data, and creates maintenance overhead every time a source schema changes.

{rdp-product-acronym} eliminates this burden by querying data in place:

No data movement — data remains in its source system. No copies, no synchronization jobs, no stale extracts.
Real-time access — queries hit live operational data, not a batch-loaded snapshot from hours or days ago.
Semantic abstraction — the VKG layer hides physical data locations, schemas, and query dialects behind a unified ontology.
Broad user access — technical users write SPARQL queries; non-technical users ask questions in natural language and the VKG translates them automatically.

How It Works

The federated query engine operates in three stages.

Stage 1: Query Submission

Users submit queries through one of three interfaces:

SPARQL endpoint — for analysts and developers who want precise control over the query structure.
Natural-language interface — for operators who describe what they need in plain language. The VKG translates the natural-language request into a structured query against the ontology.
Pre-built query templates — parameterized queries for common operational patterns such as "show all tracks within 50 km of a reference point in the last 4 hours."

Stage 2: Query Translation & Optimization

The VKG engine receives the semantic query and performs the following steps:

Ontology mapping — the query is matched against the platform’s ontology (BFO/CCO-compliant with DICO and JIKO extensions) to determine which source systems hold the requested data.
Query decomposition — the engine breaks the query into sub-queries, one per source system, each expressed in that source’s native query dialect.
Cross-source planning — when a query spans multiple sources, the engine determines the optimal join strategy: push joins down to source systems where possible, or pull partial results and join them in the federation layer.
Optimization — predicate pushdown, projection pruning, and partition elimination reduce the data transferred from each source.
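
The decomposition and pushdown steps above can be pictured with a small sketch. The Python below is illustrative only: the SOURCE_OF routing table, the property names, and the decompose helper are hypothetical and do not describe the engine's actual planner. It shows how a query's requested properties and filters might be grouped into one sub-query per source, so that each source receives only the columns and predicates that concern it.

```python
from collections import defaultdict

# Hypothetical routing table: which source holds each ontology property.
SOURCE_OF = {
    "track:position": "tracks_db",
    "track:observedAt": "tracks_db",
    "unit:callsign": "units_store",
}

def decompose(properties, filters):
    """Group requested properties and filters into one sub-query per source.

    properties: ontology properties the query asks for
    filters: (property, operator, value) triples from the query
    """
    plan = defaultdict(lambda: {"project": [], "where": []})
    for prop in properties:
        # Projection pruning: each source is asked only for what the query needs.
        plan[SOURCE_OF[prop]]["project"].append(prop)
    for prop, op, value in filters:
        # Predicate pushdown: the filter travels with the sub-query to its source.
        plan[SOURCE_OF[prop]]["where"].append((prop, op, value))
    return dict(plan)

plan = decompose(
    ["track:position", "unit:callsign"],
    [("track:observedAt", ">=", "2024-01-01T00:00:00Z")],
)
```

Here the time filter ends up only in the tracks_db sub-query, while units_store is asked for a single property with no filter.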

Stage 3: Execution & Assembly

Sub-queries execute in parallel against their respective source systems. The federation engine collects partial results, performs any remaining joins or aggregations, and returns a unified result set to the caller.
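
As a rough illustration of this stage, the sketch below runs two sub-queries concurrently and joins their partial results in memory. The fetch functions are stand-ins for real connectors, and the hash join is one common way to combine partial results; the source does not state which join algorithm the federation layer uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for source connectors; each returns rows as dictionaries.
def fetch_tracks():
    return [{"unit_id": 1, "lat": 10.0}, {"unit_id": 2, "lat": 20.0}]

def fetch_units():
    return [{"unit_id": 1, "callsign": "ALPHA"}, {"unit_id": 3, "callsign": "CHARLIE"}]

def hash_join(left, right, key):
    """Inner join two row lists on `key` using an in-memory hash table."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def run_federated():
    # Sub-queries run concurrently; results are joined once both have returned.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tracks_future = pool.submit(fetch_tracks)
        units_future = pool.submit(fetch_units)
        return hash_join(tracks_future.result(), units_future.result(), "unit_id")

result = run_federated()
```

Only unit 1 appears in both partial results, so the joined result contains a single row carrying both the latitude and the callsign.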

Connector Architecture

The VKG engine communicates with data sources through a set of typed connectors. Each connector translates federated sub-queries into the source’s native protocol and query language.

Connector Type	Description
Relational Database	Connects to SQL-compatible databases. Translates VKG sub-queries into native SQL and pushes predicates, joins, and aggregations to the database engine.
Streaming Topics	Queries the latest state or a windowed range of records from event streaming topics. Supports time-range and key-based filtering.
Object Storage	Queries structured files (Parquet, CSV, JSON) stored in S3-compatible object storage. Leverages columnar metadata for predicate pushdown and partition pruning.
Geospatial Services	Connects to geospatial databases and services for spatial queries (bounding box, radius, polygon intersection). Translates spatial predicates into the service’s native format.
Knowledge Graph / SPARQL	Federates to external SPARQL endpoints or RDF stores, enabling cross-graph queries.

Connectors are configured declaratively. Operators register a data source by specifying its type, connection parameters, and an ontology mapping that describes how the source’s schema maps to the platform’s semantic model.
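
The shape of such a registration might look like the following Python sketch. The field names, the connector type identifiers, and the example host are assumptions made for illustration; the platform's real configuration format is not shown in this document.

```python
from dataclasses import dataclass, field

# Connector types taken from the table above; the identifiers are illustrative.
CONNECTOR_TYPES = {"relational", "streaming", "object_storage", "geospatial", "sparql"}

@dataclass(frozen=True)
class DataSourceRegistration:
    name: str
    connector_type: str
    connection: dict  # connection parameters, e.g. host and port
    ontology_mapping: dict = field(default_factory=dict)  # source field -> ontology property

    def __post_init__(self):
        # Reject registrations that name an unknown type or omit the mapping.
        if self.connector_type not in CONNECTOR_TYPES:
            raise ValueError(f"unknown connector type: {self.connector_type}")
        if not self.ontology_mapping:
            raise ValueError("an ontology mapping is required")

registration = DataSourceRegistration(
    name="tracks_db",
    connector_type="relational",
    connection={"host": "db.example.internal", "port": 5432},
    ontology_mapping={"tracks.lat": "ex:latitude", "tracks.lon": "ex:longitude"},
)
```

Validating at registration time means a misconfigured source is rejected before any query is planned against it.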

Query Capabilities

Ad-Hoc Interactive Queries

Analysts submit free-form queries and receive results in seconds. The VKG engine handles query planning, optimization, and execution transparently. Results can be returned as tables, GeoJSON for map display, or structured data for downstream processing.

Pre-Built Query Templates

Common operational queries are captured as parameterized templates. Operators fill in mission-specific values — time range, geographic bounds, entity type — and execute the template without writing any query syntax.

Templates are version-controlled and can be shared across deployments through the platform’s configuration management system.
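
A parameterized template could be sketched as follows, using Python's standard string.Template to fill values into a SPARQL query. The template text, the ex: vocabulary, and the render helper are hypothetical; the platform's own template syntax is not documented here.

```python
from string import Template

# Hypothetical template; the ex: vocabulary is a placeholder, not the platform ontology.
TRACKS_SINCE = Template("""\
PREFIX ex: <http://example.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?track ?observedAt WHERE {
  ?track a ex:$entity_type ;
         ex:observedAt ?observedAt .
  FILTER (?observedAt >= "$since"^^xsd:dateTime)
}
LIMIT $limit
""")

def render(template, **params):
    """Fill in a template; substitute() raises KeyError if a parameter is missing."""
    return template.substitute(**params)

query = render(TRACKS_SINCE, entity_type="Track", since="2024-01-01T00:00:00Z", limit=100)
```

Plain string substitution is enough to show the idea, but a real template engine would need to validate or escape parameter values before placing them in a query.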

Asynchronous Query Scheduling

Long-running analytical queries can be submitted asynchronously. The platform queues the query, executes it when resources are available, and notifies the user when results are ready. Scheduled queries support recurring execution for periodic reporting.
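
The queue-then-notify pattern described above can be sketched with asyncio. Everything here is an assumption made for illustration: the worker function, the job identifier, and the use of a list as a stand-in for user notification are not platform APIs.

```python
import asyncio

async def worker(queue, notifications):
    """Run queued queries one at a time and record a notification for each."""
    while True:
        job_id, run_query = await queue.get()
        result = await run_query()
        notifications.append((job_id, result))  # stand-in for notifying the user
        queue.task_done()

async def long_query():
    await asyncio.sleep(0.01)  # placeholder for a slow analytical query
    return 42

async def main():
    queue = asyncio.Queue()
    notifications = []
    task = asyncio.create_task(worker(queue, notifications))
    await queue.put(("job-1", long_query))  # submission returns immediately
    await queue.join()  # wait for completion, for the purposes of this demo only
    task.cancel()
    return notifications

notifications = asyncio.run(main())
```

The submitter is decoupled from execution: it places the job on the queue and is free to continue, while the worker picks the job up when it is able to.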

Result Caching

Frequently accessed query results are cached at the federation layer. Each cache entry expires after a configurable time-to-live (TTL) and is invalidated earlier if the underlying source data changes. Caching reduces load on source systems and improves response times for repeated queries.
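
A minimal sketch of a cache with both behaviors, TTL expiry and invalidation on source change, is shown below. The ResultCache class and its method names are hypothetical, and the injectable clock exists only so the example can be exercised without waiting.

```python
import time

class ResultCache:
    """Query-result cache with a TTL and per-source invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # query text -> (expires_at, sources, result)

    def put(self, query, sources, result):
        self._entries[query] = (self.clock() + self.ttl, frozenset(sources), result)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, _, result = entry
        if self.clock() >= expires_at:
            del self._entries[query]  # entry has outlived its TTL
            return None
        return result

    def invalidate_source(self, source):
        """Drop every cached result that read from `source`."""
        self._entries = {
            q: e for q, e in self._entries.items() if source not in e[1]
        }

now = [0.0]  # fake clock so expiry can be demonstrated deterministically
cache = ResultCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("q1", ["tracks_db"], [1, 2])
cache.put("q2", ["units_store"], [3])
hit = cache.get("q1")
cache.invalidate_source("tracks_db")
after_invalidation = cache.get("q1")
now[0] = 61.0
after_expiry = cache.get("q2")
```

Recording which sources each result depends on is what allows a change in one source to evict only the affected entries.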

Virtual Knowledge Graph in Detail

The VKG layer is the semantic backbone of {rdp-product-acronym}'s federated query capability.

Ontology Layer

The platform’s ontology is built on established upper-ontology standards:

BFO (Basic Formal Ontology) — top-level categories for entities and processes.
CCO (Common Core Ontologies) — mid-level extensions for information, agents, events, and artifacts.
DICO / JIKO — domain-specific extensions for defense intelligence and joint interoperability.

The ontology defines the vocabulary that users query against. Physical data sources are mapped to ontology concepts through OBDA (Ontology-Based Data Access) virtual mappings, which describe how rows, columns, and fields in source systems correspond to ontology classes and properties.

Virtual Mappings

Virtual mappings are declarative rules that connect the ontology to physical data. They are "virtual" because no data is materialized — the mappings are applied at query time to translate semantic queries into native source queries.

A mapping specifies:

The ontology class or property being mapped
The source system and table or topic
The column-to-property correspondence
Any transformation or type-casting rules

Mappings are defined once per source and updated only when the source schema changes. Adding a new data source to the VKG requires only a new mapping definition — no changes to existing queries or to the ontology itself.
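
The four parts of a mapping listed above could be represented along these lines. The VirtualMapping class, its field names, and the ex: property names are illustrative assumptions, not the platform's mapping language.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class VirtualMapping:
    ontology_class: str          # ontology class being mapped
    source: str                  # source system
    table: str                   # table or topic within the source
    columns: Dict[str, str]      # ontology property -> source column
    casts: Dict[str, Callable]   # ontology property -> type-casting rule

    def to_ontology(self, row):
        """Apply the mapping to one source row at query time."""
        out = {}
        for prop, column in self.columns.items():
            cast = self.casts.get(prop, lambda v: v)
            out[prop] = cast(row[column])
        return out

TRACK_MAPPING = VirtualMapping(
    ontology_class="ex:Track",
    source="tracks_db",
    table="tracks",
    columns={"ex:latitude": "lat", "ex:longitude": "lon"},
    casts={"ex:latitude": float, "ex:longitude": float},
)

mapped = TRACK_MAPPING.to_ontology({"lat": "10.5", "lon": "-3.25", "internal_id": 7})
```

Source columns that have no entry in the mapping, such as internal_id here, never surface in the semantic layer.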

Query Translation

When a user submits a SPARQL or natural-language query, the VKG engine:

Resolves the query against the ontology to identify which concepts and properties are referenced.
Looks up the virtual mappings for those concepts to determine which source systems are involved.
Generates native queries (SQL, key-value lookups, spatial queries) for each source.
Executes the native queries and assembles the results into a unified response that conforms to the ontology’s structure.
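
For the relational case, the step that generates a native query might resemble the sketch below. The to_sql function, the column names, and the use of ? placeholders are assumptions; a real translator would handle joins, dialect differences, and far richer query shapes.

```python
def to_sql(table, columns, requested, filters):
    """Translate requested ontology properties and filters into parameterized SQL.

    columns: ontology property -> source column (taken from a virtual mapping)
    filters: (property, operator, value) triples
    """
    allowed_ops = {"=", "<", "<=", ">", ">="}
    select = ", ".join(columns[p] for p in requested)
    clauses, params = [], []
    for prop, op, value in filters:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        # Values are bound as parameters rather than spliced into the SQL text.
        clauses.append(f"{columns[prop]} {op} ?")
        params.append(value)
    sql = f"SELECT {select} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = to_sql(
    table="tracks",
    columns={"ex:latitude": "lat", "ex:longitude": "lon", "ex:observedAt": "observed_at"},
    requested=["ex:latitude", "ex:longitude"],
    filters=[("ex:observedAt", ">=", "2024-01-01T00:00:00Z")],
)
```

The caller works entirely in ontology terms; physical column names such as observed_at appear only in the generated SQL.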

Security Integration

Federated query results are subject to the same data-policy enforcement as all other {rdp-product-acronym} data flows.

Classification markings are propagated from source records through the query result.
Row-level and column-level policies are evaluated before results are returned, ensuring that users only see data they are authorized to access.
Audit logging records every query submission, the sources accessed, and the policy decisions applied.
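
The three behaviors above can be pictured with a short sketch. The marking levels, the enforce function, and the audit record layout are hypothetical and far simpler than a real policy engine; they only show the order of operations on a result set.

```python
RANK = {"PUBLIC": 0, "RESTRICTED": 1}  # hypothetical marking levels

def enforce(rows, user, clearance, hidden_columns, audit_log):
    """Apply row-level and column-level policy, then record an audit entry."""
    visible = []
    for row in rows:
        # Row-level policy: drop records marked above the user's clearance.
        if RANK[row["marking"]] > RANK[clearance]:
            continue
        # Column-level policy: strip columns the user may not see.
        # The marking itself is kept so it propagates with the result.
        visible.append({k: v for k, v in row.items() if k not in hidden_columns})
    audit_log.append(
        {"user": user, "returned": len(visible), "withheld": len(rows) - len(visible)}
    )
    return visible

audit_log = []
rows = [
    {"id": 1, "source_ref": "a", "marking": "PUBLIC"},
    {"id": 2, "source_ref": "b", "marking": "RESTRICTED"},
]
visible = enforce(rows, "analyst1", "PUBLIC", {"source_ref"}, audit_log)
```

Filtering happens before anything is returned, and the audit entry is written whether or not any rows were withheld.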

See Policy Engine & Data Governance for details on the platform’s policy enforcement model.

Next Steps

Understand the data formats and real-time streaming that feed the VKG: Streaming & Data Processing
Learn about data-policy enforcement on query results: Policy Engine & Data Governance
Explore the ontology and semantic interoperability layer: Semantic Interoperability & Ontology