Lakehouse
SDL provides a set of APIs, defined in its Open API Spec, that expose Lakehouse data to users.
Purpose
The Lakehouse APIs let users interact with the SDL Lakehouse.
The Lakehouse follows a medallion architecture, and its layers are exposed through various schemas.
The SDL uses Trino as a distributed query engine with a direct connection to the Lakehouse. Trino suits this architecture for several reasons:
. Trino has out-of-the-box support for interacting with the Lakehouse.
. Trino integrates with Open Policy Agent (OPA), which lets us enforce tighter controls on data access.
. Trino Gateway acts as a load balancer, proxy server, and configurable routing gateway for multiple Trino clusters.
. Trino handles large workloads and queries by streaming results.
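The streaming point is easier to see in code. This sketch illustrates Trino's own client REST protocol, not the SDL Lakehouse API surface. In that protocol, each JSON response may carry a `data` array of rows and a `nextUri` pointing at the next page, and the query is finished when no `nextUri` is returned. The function `stream_rows` and its `fetch_next` parameter are hypothetical: `fetch_next` stands in for an authenticated HTTP GET so the sketch runs without a cluster. Rows are yielded as each page arrives instead of after the whole result set is materialized.

```python
from typing import Callable, Iterator, Optional


def stream_rows(first_page: dict, fetch_next: Callable[[str], dict]) -> Iterator[list]:
    """Yield result rows page by page by following Trino's `nextUri` links.

    `fetch_next` is a hypothetical caller-supplied function that GETs a URI
    and returns the decoded JSON page.
    """
    page: Optional[dict] = first_page
    while page is not None:
        # Trino reports query failures in an `error` object on the page.
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # A page can arrive without data (e.g. while the query is still queued).
        for row in page.get("data", []):
            yield row
        next_uri = page.get("nextUri")
        # No `nextUri` means the result set is complete.
        page = fetch_next(next_uri) if next_uri else None
```

Because `stream_rows` is a generator, a caller can process or forward the first rows while later pages are still being produced, which is what lets large results flow through without being buffered in full.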
Endpoints
| Operation | Endpoint | Description |
|---|---|---|
| | | Exposes the available schemas. |
| | | Exposes all of the tables that belong to a particular schema. |
| | | Exposes all of the column data for a particular table. |
| | | Allows the user to submit custom queries to the Lakehouse. |
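The actual endpoint paths and request formats are defined in the Open API Spec and are not reproduced here. The sketch below only shows the schema, table, column, and query hierarchy as a client might call it. Everything in it is hypothetical: the base URL, the paths (`/schemas`, `/schemas/{schema}/tables`, `/schemas/{schema}/tables/{table}/columns`, `/query`), the use of `POST` with a JSON body containing a `query` field, and the function names. It builds requests without sending them and authenticates with a JWT as a Bearer token, one of the options described under Enablement.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# HYPOTHETICAL base URL: take the real one from the SDL Open API Spec.
BASE_URL = "https://sdl.example.com/lakehouse"


def _request(path: str, token: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated Lakehouse API request."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers=headers,
        method="GET" if body is None else "POST",
    )


def _seg(name: str) -> str:
    # Percent-encode a schema or table name so it is a single safe path segment.
    return urllib.parse.quote(name, safe="")


# HYPOTHETICAL paths below, one per row of the endpoint table.
def schemas_request(token: str) -> urllib.request.Request:
    return _request("/schemas", token)


def tables_request(token: str, schema: str) -> urllib.request.Request:
    return _request(f"/schemas/{_seg(schema)}/tables", token)


def columns_request(token: str, schema: str, table: str) -> urllib.request.Request:
    return _request(f"/schemas/{_seg(schema)}/tables/{_seg(table)}/columns", token)


def query_request(token: str, sql: str) -> urllib.request.Request:
    # HYPOTHETICAL body shape: {"query": "<SQL text>"}.
    return _request("/query", token, body={"query": sql})
```

To send a request, pass it to `urllib.request.urlopen` and decode the JSON response. Keeping the paths in small builder functions means only those functions need to change once they are matched against the Open API Spec.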
Enablement
To use the SDL Lakehouse APIs, users must have an account in either the program's Keycloak or SDL's Keycloak. The Open API Spec expects either a username and password or a JSON Web Token (JWT) to authenticate with the Lakehouse APIs. During onboarding, users must be placed in a group that has an associated Open Policy Agent (OPA) policy governing data access. If you need access to the APIs, membership in a group, or an update to a group's policy, contact our help desk.
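One way to get a JWT from a username and password is Keycloak's OpenID Connect token endpoint with the OAuth 2.0 password grant. This is a sketch under assumptions: the Keycloak host, realm (`sdl`), and client ID (`lakehouse-api`) are hypothetical placeholders, and it assumes the client has Keycloak's "Direct access grants" enabled, which may not be how SDL's Keycloak is configured. Keycloak versions before 17 prefix the path with `/auth`.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL values: use the Keycloak host, realm, and client ID
# provided during onboarding.
KEYCLOAK_URL = "https://keycloak.example.com"
REALM = "sdl"
CLIENT_ID = "lakehouse-api"


def token_request(username: str, password: str) -> urllib.request.Request:
    """Build a Keycloak token request using the OAuth 2.0 password grant."""
    url = f"{KEYCLOAK_URL}/realms/{REALM}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": CLIENT_ID,
        "username": username,
        "password": password,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(username: str, password: str) -> dict:
    """Send the token request and return Keycloak's decoded JSON response."""
    with urllib.request.urlopen(token_request(username, password)) as resp:
        return json.load(resp)


def bearer_header(token_response: dict) -> dict:
    """Turn a Keycloak token response into an Authorization header for API calls."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

The returned `access_token` is the JWT. It expires after the lifetime configured in Keycloak, so long-running clients need to request a new one.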