Pinot End-to-End Test Steps
Setup (Pre-demo)
- Update the `df-raft-trino-coordinator` ConfigMap to enable group-based access control:
apiVersion: v1
data:
access-control.properties: |
fine-grained-access-control-enabled=true
access-control.name=OPA
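Before pasting the fragment into the ConfigMap, it can help to sanity-check that it parses as plain `key=value` properties lines (stray indentation or smart quotes from copy-paste are a common failure mode). A standalone sketch, using a scratch file path:

```shell
# Write the fragment to a scratch file
cat > /tmp/access-control.properties <<'EOF'
fine-grained-access-control-enabled=true
access-control.name=OPA
EOF
# Count well-formed key=value lines; this should match the line count (2)
grep -c '^[A-Za-z0-9.-]*=' /tmp/access-control.properties
# → 2
```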
Demo (Part 1) – Viewing Pinot Data with Proper Access Control
- Create an enablement for mock UDL.
- Enable the `track` data set once it is available.
- Next, configure a local terminal to connect to SDL:
# Set environment variables for password and client secret
DF_ADMIN_PASSWORD="admin-user-pw"
DF_BACKEND_CLIENT_SECRET=$(kubectl get secret -n data-fabric keycloak-realm-init \
-o jsonpath='{.data.keycloakAdminClientSecret}' | base64 -d)
BASE_URL="localhost"
# Get access token from Keycloak and save to variable
AUTH_TOKEN=$(curl --request POST \
--url http://$BASE_URL/auth/realms/data-fabric/protocol/openid-connect/token \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data username=admin \
--data password=$DF_ADMIN_PASSWORD \
--data client_secret=$DF_BACKEND_CLIENT_SECRET \
--data client_id=df-backend \
--data scope=openid \
--data grant_type=password | jq -r '.access_token')
echo "Access token: $AUTH_TOKEN"
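To sanity-check the token before using it, you can decode the JWT payload (the second dot-separated segment) and inspect the claims. The sketch below builds a sample token on the fly so it runs anywhere; in the demo, pipe `$AUTH_TOKEN` instead (note that real Keycloak tokens are base64url-encoded and may need padding before `base64 -d` accepts them; `preferred_username` is the standard Keycloak username claim):

```shell
# Build a sample token so the snippet is runnable standalone;
# in the demo, use "$AUTH_TOKEN" instead.
SAMPLE_PAYLOAD=$(printf '{"preferred_username":"admin"}' | base64 | tr -d '\n')
SAMPLE_TOKEN="header.${SAMPLE_PAYLOAD}.signature"
# Extract the payload segment, decode it, and pull out the username claim
echo "$SAMPLE_TOKEN" | cut -d. -f2 | base64 -d | jq -r '.preferred_username'
# → admin
```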
- Fetch the information necessary to create a Pinot table. It’s best to get the Kafka topic value from the webpage:
# The name of the Pinot schema, NOT the schema in the catalog!
PINOT_SCHEMA=udl
# The name of the Pinot table
NEW_TABLE_NAME=udltrack
RETENTION_DAYS=7
# This command will only fetch the correct Kafka topic if there is only a single mock UDL enabled (and nothing else). It's best to copy this value from the frontend manually.
KAFKA_TOPIC=$(kubectl get datasets -A -o jsonpath='{.items[0].metadata.labels.datafabric\.goraft\.tech/dataset}')
# Get kafka password from secret
KAFKA_PASSWORD=$(kubectl -n data-fabric get secrets/df-kafka-user-internal --template={{.data.password}} | base64 -d)
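The jsonpath and secret lookups above can silently return an empty string, which would produce a broken table config later. A small bash guard that fails fast if any required value is missing (sample values are substituted here so the snippet runs standalone; in the demo the variables are already set):

```shell
# Sample values stand in for the real lookups above
PINOT_SCHEMA="udl"
NEW_TABLE_NAME="udltrack"
KAFKA_TOPIC="udl-track-topic"
KAFKA_PASSWORD="example-pw"
# Abort early if any required value is empty (uses bash indirect expansion)
for var in PINOT_SCHEMA NEW_TABLE_NAME KAFKA_TOPIC KAFKA_PASSWORD; do
  [ -n "${!var}" ] || { echo "ERROR: $var is empty" >&2; exit 1; }
done
echo "all required values set"
# → all required values set
```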
- `POST` the schema to SDL. This schema should match the shape of the data the Pinot table is supposed to fetch from Kafka.
# Create Pinot schema
curl -X POST http://$BASE_URL/api/internal/pinot/schemas \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-d '{
"schemaName": "udl",
"dimensionFieldSpecs": [
{"name": "origin", "dataType": "STRING"},
{"name": "dataMode", "dataType": "STRING"},
{"name": "asset", "dataType": "INT"},
{"name": "objType", "dataType": "STRING"},
{"name": "id", "dataType": "STRING"},
{"name": "createdBy", "dataType": "STRING"},
{"name": "classificationMarking", "dataType": "STRING"},
{"name": "objIdent", "dataType": "STRING"},
{"name": "origNetwork", "dataType": "STRING"},
{"name": "trkId", "dataType": "STRING"},
{
"name": "__security__",
"dataType": "JSON"
}
],
"metricFieldSpecs": [
{"name": "alt", "dataType": "DOUBLE"},
{"name": "lon", "dataType": "DOUBLE"},
{"name": "lat", "dataType": "DOUBLE"}
],
"dateTimeFieldSpecs": [
{
"name": "time",
"dataType": "LONG",
"notNull": false,
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}
]
}'
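If the POST returns a 400, the first thing to rule out is malformed JSON in the request body. Piping the body through `jq -e` locally catches syntax errors and confirms the required `schemaName` field is present (a trimmed body is used here so the check runs standalone):

```shell
# A trimmed copy of the schema body; lint the full one the same way
SCHEMA_JSON='{"schemaName":"udl","dimensionFieldSpecs":[{"name":"origin","dataType":"STRING"}]}'
# jq -e exits non-zero if the JSON is invalid or schemaName is null/missing
echo "$SCHEMA_JSON" | jq -e '.schemaName' >/dev/null \
  && echo "body OK: schemaName=$(echo "$SCHEMA_JSON" | jq -r '.schemaName')"
# → body OK: schemaName=udl
```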
- `POST` the table configuration to SDL. The referenced schema should be the one that was saved in the prior step.
curl -X POST http://$BASE_URL/api/internal/pinot/tables \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-d "{
\"tableName\": \"$NEW_TABLE_NAME\",
\"tableType\": \"REALTIME\",
\"segmentsConfig\": {
\"timeColumnName\": \"time\",
\"timeType\": \"MILLISECONDS\",
\"retentionTimeUnit\": \"DAYS\",
\"retentionTimeValue\": $RETENTION_DAYS,
\"replication\": 1,
\"replicasPerPartition\": 1,
\"schemaName\": \"$PINOT_SCHEMA\"
},
\"tableIndexConfig\": {
\"loadMode\": \"MMAP\",
\"streamConfigs\": {
\"streamType\": \"kafka\",
\"stream.kafka.consumer.type\": \"lowLevel\",
\"stream.kafka.topic.name\": \"$KAFKA_TOPIC\",
\"stream.kafka.decoder.class.name\": \"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder\",
\"stream.kafka.consumer.factory.class.name\": \"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory\",
\"stream.kafka.broker.list\": \"df-kafka-bootstrap:9092\",
\"key.serializer\": \"org.apache.kafka.common.serialization.StringDeserializer\",
\"value.serializer\": \"org.apache.kafka.common.serialization.StringDeserializer\",
\"stream.kafka.consumer.prop.auto.offset.reset\": \"smallest\",
\"security.protocol\": \"SASL_PLAINTEXT\",
\"sasl.jaas.config\": \"org.apache.kafka.common.security.scram.ScramLoginModule required username=\\\"internalkafkauser\\\" password=\\\"$KAFKA_PASSWORD\\\";\",
\"sasl.mechanism\": \"SCRAM-SHA-512\",
\"stream.kafka.decoder.prop.format\": \"JSON\",
\"stream.kafka.decoder.prop.schema.registry.rest.url\": \"http://df-schema-registry:8081\"
}
},
\"ingestionConfig\": {
\"transformConfigs\": [
{
\"columnName\": \"time\",
\"transformFunction\": \"now()\"
}
]
},
\"tenants\": {
\"broker\": \"DefaultTenant\",
\"server\": \"DefaultTenant\"
},
\"metadata\": {
\"customConfigs\": {}
}
}"
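The `sasl.jaas.config` line is the easiest place for this request to go wrong: the body is a double-quoted shell string, so the inner quotes need `\\\"` to survive as `\"` in the JSON. A standalone check that the escaping round-trips through `jq` (sample password substituted):

```shell
KAFKA_PASSWORD="example-pw"
# Same escaping as in the table config body above
BODY="{\"sasl.jaas.config\": \"org.apache.kafka.common.security.scram.ScramLoginModule required username=\\\"internalkafkauser\\\" password=\\\"$KAFKA_PASSWORD\\\";\"}"
# If the escaping is right, jq parses the body and prints the JAAS line
echo "$BODY" | jq -r '."sasl.jaas.config"'
# → org.apache.kafka.common.security.scram.ScramLoginModule required username="internalkafkauser" password="example-pw";
```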
- With `df-backend` versions older than `1.15.165`, you need to update the catalog data set storages manually. This will trigger the authorization service to create permissions for the data set in Pinot based on the data set enablement’s groups:
# Get information about enabled data sets.
RESPONSE=$(curl "http://$BASE_URL/api/v2/catalog/datasources/all/enablements/all/datasets" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $AUTH_TOKEN")
# Parse the id of the storage for the enablement. NOTE: This assumes there's ONE enablement only!
STORAGE_ID=$(echo $RESPONSE | jq -r '.[] | select(.name == "track") | .storage[] | select(.name == "default") | .id')
echo $STORAGE_ID
# Parse the "self" link that makes up /datasources/{datasource-id}/enablements/{enablement-id}/datasets/{dataset-id}
PREFIX_URL=$(echo $RESPONSE | jq -r '.[0].links[] | select(.rel == "self") | .href')
echo $PREFIX_URL
STORAGE_URL="${PREFIX_URL}/storages/${STORAGE_ID}"
echo $STORAGE_URL
curl -vvv "http://{sdl-host}/api/v2/catalog/$STORAGE_URL" -H 'Content-Type: application/json' -H "Authorization: Bearer $AUTH_TOKEN" | jq
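The jq expressions above can be dry-run against a minimal sample response to confirm the URL assembly before touching the live API (the field values here are made up; only the shape mirrors the real enablements response):

```shell
# Minimal sample of the enablements response used above
RESPONSE='[{"name":"track","storage":[{"name":"default","id":"stor-123"}],"links":[{"rel":"self","href":"/datasources/ds-1/enablements/en-1/datasets/dset-1"}]}]'
# Same jq filters as in the real steps
STORAGE_ID=$(echo "$RESPONSE" | jq -r '.[] | select(.name == "track") | .storage[] | select(.name == "default") | .id')
PREFIX_URL=$(echo "$RESPONSE" | jq -r '.[0].links[] | select(.rel == "self") | .href')
echo "${PREFIX_URL}/storages/${STORAGE_ID}"
# → /datasources/ds-1/enablements/en-1/datasets/dset-1/storages/stor-123
```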
- Connect to Trino and try to view data from the Pinot catalog. There should be a `default` schema and a `udltrack` table available. Query the data and confirm the results are visible to the user.
- To test the group-based access control, sign out of SDL and sign back in as a user that is NOT in any of the groups the enablement was created for.
- Repeat the Trino step above. This time, no schema or table suggestions should show up in the dropdown.
- Try to run the query `SELECT * FROM default.udltrack` and confirm that a Trino error is returned to the user.
Demo (Part 2) – Creating a Dashboard with Pinot Data
- Switch back to the Admin user (or whichever user can view the enabled data set).
- Create a data set from a SQL query:
  - Run the `SELECT * FROM default.udltrack LIMIT 100` query.
  - On the “Save” button, hit the dropdown and save the results as a data set named “test.”
  - Hit “Save and explore” to get redirected to the chart editing page.
- Press “View all charts.”
- On the left side of the screen, click “Map.”
- Select the `deck.gl Scatterplot` and press the “Select” button.
- Update the “Longitude & Latitude” fields to point to `lon` and `lat`, respectively.
- On the “Map” dropdown, play around with the `https://map.localhost/geoserver/wms` URL until you get the `nasa:blue_marble` WMS Layer value.
- Increase the point size from `1000` to `7000`.
- Change the point color to bright red.
- Click “Update Chart” and wait for the map to render.
- Click “Save” on the top-right to save the chart to an existing dashboard, or to save it to a new dashboard.