Lakehouse

SDL provides a set of APIs, defined in its OpenAPI specification, that expose Lakehouse data to users.

Lakehouse Overview

Purpose

The Lakehouse APIs let users interact with the SDL Lakehouse.

The Lakehouse follows a medallion architecture, and its layers are exposed through schemas.

The SDL uses Trino as a distributed query engine with a direct connection to the Lakehouse. Trino is useful in this architecture for several reasons:

. Trino has out-of-the-box support for interacting with the Lakehouse.
. Trino allows us to implement Open Policy Agent for tighter controls on data access.
. Trino Gateway acts as a load balancer, proxy server, and configurable routing gateway for multiple Trino clusters.
. Trino supports large workloads and queries by streaming results.

Endpoints

Operation  Endpoint                                   Description
GET        api/v1/lakehouse/schemas                   Exposes the available schemas.
GET        api/v1/lakehouse/schemas/{schema}          Exposes all of the tables which belong to the particular {schema}.
GET        api/v1/lakehouse/schemas/{schema}/{table}  Exposes all of the column data belonging to the {schema}/{table}.
POST       api/v1/lakehouse/schemas/{schema}          Allows the user to provide custom queries to the Lakehouse.
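
As a rough illustration of how the endpoints listed above fit together, the following Python sketch builds (but does not send) one request per endpoint using only the standard library. The base URL, the schema name "gold", the table name "orders", and the shape of the POST body are all assumptions made for the example; consult the OpenAPI specification for the actual request and response schemas.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL; replace with the host of your SDL deployment.
BASE_URL = "https://sdl.example.com"


def lakehouse_url(schema=None, table=None, base_url=BASE_URL):
    """Build the URL for one of the Lakehouse endpoints."""
    if table is not None and schema is None:
        raise ValueError("a table requires a schema")
    parts = ["api", "v1", "lakehouse", "schemas"]
    if schema is not None:
        parts.append(quote(schema, safe=""))
    if table is not None:
        parts.append(quote(table, safe=""))
    return base_url.rstrip("/") + "/" + "/".join(parts)


def build_request(method, url, headers=None, body=None):
    """Create (but do not send) a request; body is JSON-encoded if given."""
    data = None
    all_headers = dict(headers or {})
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        all_headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=all_headers, method=method)


# GET requests for browsing schemas, tables, and columns.
list_schemas = build_request("GET", lakehouse_url())
list_tables = build_request("GET", lakehouse_url("gold"))  # assumed schema name
list_columns = build_request("GET", lakehouse_url("gold", "orders"))  # assumed table name

# POST a custom query. The {"query": ...} body shape is an assumption,
# not something the endpoint description confirms.
custom_query = build_request(
    "POST",
    lakehouse_url("gold"),
    body={"query": "SELECT * FROM orders LIMIT 10"},
)
```

Each request object can be sent with `urllib.request.urlopen(...)` once authentication headers have been added; see Enablement for the credentials the APIs expect.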

Enablement

To use the SDL Lakehouse APIs, users must have an account with either the program Keycloak or SDL’s Keycloak. The APIs accept either a username and password or a JSON Web Token (JWT) for authentication. During onboarding, users must be added to a group that has an associated Open Policy Agent (OPA) policy for data access. If you need an account, membership in a group, or an update to a group’s policy, reach out to our help desk.
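
The two credential options can be sketched in Python as follows. This is a minimal sketch under several assumptions that the text above does not confirm: that a JWT is sent as a standard `Authorization: Bearer` header, that a username and password are sent as HTTP Basic authentication, and that a JWT can be obtained from Keycloak's standard OpenID Connect token endpoint with the password grant enabled for the client. The host, realm, and client ID are hypothetical placeholders.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def bearer_auth_header(jwt):
    """Header for JWT authentication (assumes the standard Bearer scheme)."""
    return {"Authorization": f"Bearer {jwt}"}


def basic_auth_header(username, password):
    """Header for username/password authentication (assumes HTTP Basic)."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def keycloak_token_request(keycloak_url, realm, client_id, username, password):
    """Build (but do not send) a password-grant token request for Keycloak.

    The path is Keycloak's standard OpenID Connect token endpoint; older
    Keycloak versions prefix it with /auth. Realm and client ID are
    deployment-specific and are assumptions here.
    """
    url = f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = urlencode(
        {
            "grant_type": "password",
            "client_id": client_id,
            "username": username,
            "password": password,
        }
    ).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=form, headers=headers, method="POST")


# Hypothetical values for illustration only.
token_request = keycloak_token_request(
    "https://keycloak.example.com", "sdl", "lakehouse-client", "alice", "s3cret"
)
```

If the token request succeeds, the JSON response from Keycloak carries the JWT in its `access_token` field, which can then be passed to `bearer_auth_header`.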