Federated Query & Virtual Knowledge Graph Demo

This guide demonstrates SDL federated query capabilities step by step. It showcases "zero ETL": querying data where it lives, without moving or duplicating it.

Prerequisites

  • SDL deployment with federated SQL and VKG engine enabled

  • Sample data loaded in at least two different data sources (for example, a relational database and a streaming topic)

  • SPARQL endpoint access

  • curl or an SDK client installed (the curl examples also pipe responses through jq for formatting)

Demo Overview

This demo illustrates the following capabilities:

  • Federated SQL — query across multiple heterogeneous data sources with a single SQL statement.

  • Cross-source joins — join data from different source types without pre-staging or ETL.

  • Virtual Knowledge Graph (VKG) — query an ontology-mapped knowledge graph using SPARQL.

  • Natural language query — ask questions in plain English and receive structured results.

  • Asynchronous analytics — submit long-running queries and retrieve results when ready.

Step 1: Identify Data Sources

List the data sources registered with the federated query engine.

curl -s https://sdl.example.com/api/v1/query/sources | jq .

Example response:

{
  "sources": [
    {
      "name": "operations_db",
      "type": "relational",
      "description": "Operational relational database"
    },
    {
      "name": "sensor_stream",
      "type": "streaming",
      "description": "Real-time sensor data topic"
    },
    {
      "name": "reports_store",
      "type": "object-storage",
      "description": "Archived reports in object storage"
    }
  ]
}
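
The prerequisites call for sample data in at least two different kinds of source, and the response above is enough to confirm that before moving on. The following is a minimal sketch, assuming jq is installed and that the response has the shape shown above; the function names are illustrative and not part of SDL.

```shell
# Read a /query/sources response on stdin and print the number of
# distinct source types it lists.
distinct_source_types() {
  jq -r '[.sources[].type] | unique | length'
}

# Succeed only if the response on stdin lists at least two distinct types.
has_two_source_types() {
  [ "$(distinct_source_types)" -ge 2 ]
}

# Usage (endpoint as in Step 1):
#   curl -s https://sdl.example.com/api/v1/query/sources | has_two_source_types \
#     && echo "enough source types for a cross-source join"
```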

Step 2: Simple Federated SQL Query

Run a SQL query that targets a single data source through the federated engine.

curl -X POST https://sdl.example.com/api/v1/query/sql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT sensor_id, temperature, location, timestamp FROM operations_db.sensor_readings WHERE temperature > 80.0 ORDER BY timestamp DESC LIMIT 10"
  }'

The federated engine routes the query to the appropriate data source and returns results in a unified format.
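
Embedding SQL by hand inside a JSON string breaks as soon as the statement contains double quotes, backslashes, or line breaks. One way to avoid that is to let jq build the request body. The helper below is a sketch under that assumption; build_sql_payload is a hypothetical name, and the only thing taken from this guide is the {"query": ...} body shape.

```shell
# Wrap a SQL string in the {"query": ...} body used by the SQL endpoint.
# jq escapes double quotes, backslashes, and newlines in the statement.
build_sql_payload() {
  jq -cn --arg query "$1" '{query: $query}'
}

# Usage (endpoint as in Step 2):
#   build_sql_payload 'SELECT sensor_id FROM operations_db.sensor_readings LIMIT 10' \
#     | curl -X POST https://sdl.example.com/api/v1/query/sql \
#         -H "Content-Type: application/json" --data-binary @-
```

Passing the body with --data-binary @- makes curl send exactly what jq produced on standard input.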

Step 3: Cross-Source Join

Join data from a relational database and a streaming topic in a single query.

curl -X POST https://sdl.example.com/api/v1/query/sql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT r.sensor_id, r.location, s.current_temp, s.event_time FROM operations_db.sensor_registry r JOIN sensor_stream.readings s ON r.sensor_id = s.sensor_id WHERE s.current_temp > r.threshold ORDER BY s.event_time DESC LIMIT 20"
  }'

This query joins static registration data from the relational database with live readings from the streaming topic — without moving data between systems.

Step 4: SPARQL Ontology Query

Query the virtual knowledge graph using SPARQL.

curl -X POST https://sdl.example.com/api/v1/query/sparql \
  -H "Content-Type: application/sparql-query" \
  -d 'PREFIX sdl: <http://sdl.example.com/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?entity ?type ?location
WHERE {
  ?entity a sdl:SensorReading ;
          sdl:locatedIn ?location ;
          sdl:readingType ?type .
  FILTER (?type = sdl:Temperature)
}
LIMIT 25'

The VKG engine maps the SPARQL query to the underlying data sources through the ontology, returning results without requiring the user to know where the data physically resides.
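
This guide does not show the SPARQL response format. If the endpoint returns the standard W3C SPARQL JSON results format (an assumption, typically requested with an Accept: application/sparql-results+json header), the bindings can be flattened into tab-separated rows as sketched below. The function name is illustrative.

```shell
# Convert a SPARQL JSON results document on stdin to tab-separated rows,
# one row per binding, with columns in the order given by head.vars.
# Variables that are unbound in a row become empty fields.
sparql_json_to_tsv() {
  jq -r '.head.vars as $vars
         | .results.bindings[]
         | [ $vars[] as $v | (.[$v].value // "") ]
         | @tsv'
}

# Usage: pipe the output of the Step 4 curl command into sparql_json_to_tsv.
```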

Step 5: Natural Language Query

Submit a question in plain English and let the platform translate it into the appropriate query.

curl -X POST https://sdl.example.com/api/v1/query/natural-language \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Which sensors in building 7 have reported temperatures above 80 degrees in the last hour?"
  }'

Example response:

{
  "answer": "3 sensors in building 7 reported temperatures above 80\u00b0F in the last hour.",
  "generated_query": "SELECT sensor_id, temperature, timestamp FROM operations_db.sensor_readings WHERE location = 'building-7' AND temperature > 80.0 AND timestamp >= NOW() - INTERVAL '1 hour'",
  "results": [
    { "sensor_id": "sensor-alpha-01", "temperature": 82.1, "timestamp": "2026-02-26T14:32:00Z" },
    { "sensor_id": "sensor-alpha-04", "temperature": 85.7, "timestamp": "2026-02-26T14:28:00Z" },
    { "sensor_id": "sensor-alpha-07", "temperature": 80.3, "timestamp": "2026-02-26T14:15:00Z" }
  ]
}
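
Because the SQL in generated_query is produced by the platform rather than written by the user, it is worth reviewing before relying on the answer. A small sketch, assuming jq and the response shape shown above (the function name is hypothetical), pulls out the generated statement and the row count so they can be compared with the prose answer:

```shell
# Read a natural-language query response on stdin. Print the generated SQL
# on the first line and the number of result rows on the second.
summarize_nl_response() {
  jq -r '.generated_query, (.results | length)'
}

# Usage: pipe the output of the Step 5 curl command into summarize_nl_response.
```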

Step 6: Asynchronous Analytics

Submit a long-running analytical query and retrieve results asynchronously.

# Submit the query
curl -X POST https://sdl.example.com/api/v1/query/async \
  -H "Content-Type: application/json" \
  -d '{
    "query": "SELECT location, AVG(temperature) as avg_temp, MAX(temperature) as max_temp, COUNT(*) as reading_count FROM operations_db.sensor_readings GROUP BY location ORDER BY avg_temp DESC"
  }'

Example response:

{
  "query_id": "q-abc123-def456",
  "status": "RUNNING",
  "submitted_at": "2026-02-26T14:35:00Z"
}

Poll for completion and retrieve results:

# Check query status
curl -s https://sdl.example.com/api/v1/query/async/q-abc123-def456/status | jq .

# Retrieve results once status is COMPLETED
curl -s https://sdl.example.com/api/v1/query/async/q-abc123-def456/results | jq .
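
The two commands above can be combined into a polling loop. The following is a minimal sketch, not a definitive client: it assumes the status endpoint returns JSON with a top-level "status" field like the submit response, and it treats any value other than RUNNING or COMPLETED as a terminal failure, since those are the only two statuses this guide names. SDL_BASE, fetch_status, and wait_for_query are hypothetical names.

```shell
# Base URL of the API; override via the environment if needed (assumption).
SDL_BASE="${SDL_BASE:-https://sdl.example.com/api/v1}"

# Print the current status string for a query ID.
fetch_status() {
  curl -s "$SDL_BASE/query/async/$1/status" | jq -r '.status'
}

# Poll until the query is COMPLETED.
# Returns 0 on COMPLETED, 1 on any status other than RUNNING or COMPLETED,
# and 2 if the query is still RUNNING after max_attempts checks.
wait_for_query() {
  local query_id="$1" max_attempts="${2:-30}" delay="${3:-2}"
  local attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(fetch_status "$query_id")
    case "$status" in
      COMPLETED) return 0 ;;
      RUNNING) ;;
      *) echo "query $query_id stopped with status: $status" >&2; return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "query $query_id still running after $max_attempts checks" >&2
  return 2
}

# Usage:
#   wait_for_query q-abc123-def456 \
#     && curl -s "$SDL_BASE/query/async/q-abc123-def456/results" | jq .
```

Capping the number of checks keeps a stuck query from blocking a script indefinitely; the default of 30 checks two seconds apart is an arbitrary choice and should be tuned to the expected query duration.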

Expected Results

Step | Expected Behavior
Step 1: Identify Data Sources | All registered data sources are listed with their names, types, and descriptions.
Step 2: Simple Federated SQL Query | Results are returned from the target data source in a unified JSON format.
Step 3: Cross-Source Join | Data from the relational database and streaming topic is joined and returned as a single result set.
Step 4: SPARQL Ontology Query | The VKG engine resolves the SPARQL query against the ontology and returns matching entities.
Step 5: Natural Language Query | The platform translates the English question into a structured query, executes it, and returns both the answer and the generated query.
Step 6: Asynchronous Analytics | The query is accepted and assigned a query_id; results are available once the status transitions to COMPLETED.