Catalog

The Catalog service is the central repository for managing datasets and data sources within the SOF Data Layer. It provides APIs and interfaces for discovering, organizing, and accessing data assets across the platform.

Key Features

  • Data Source Management: Register and manage connections to various data sources

  • Dataset Organization: Organize datasets with metadata, labels, and hierarchical structures

  • Schema Management: Integration with Schema Registry for schema versioning and validation

  • Dataset Filtering: Create filtered, real-time views of datasets defined by query expressions

  • Access Control: Fine-grained permissions for dataset access and management
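The dataset-organization feature combines metadata, labels, and a hierarchy. The toy model below is a minimal sketch of that idea. It assumes datasets are addressed by slash-separated paths and tagged with free-form labels. The `Dataset` and `InMemoryCatalog` names and their methods are hypothetical illustrations, not the Catalog service's actual API or data model.

```python
from dataclasses import dataclass, field


# Hypothetical model: class, field, and method names are illustrative only,
# not the Catalog service's real API.
@dataclass
class Dataset:
    path: str  # assumed hierarchical location, e.g. "sales/emea/orders"
    labels: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


class InMemoryCatalog:
    """Toy catalog that organizes datasets by hierarchical path and labels."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        # Registering an existing path replaces the previous entry.
        self._datasets[dataset.path] = dataset

    def under(self, prefix: str) -> list[Dataset]:
        # Every dataset located below `prefix` in the hierarchy, sorted by path.
        prefix = prefix.rstrip("/") + "/"
        return sorted(
            (d for d in self._datasets.values() if d.path.startswith(prefix)),
            key=lambda d: d.path,
        )

    def with_label(self, label: str) -> list[Dataset]:
        # Every dataset carrying `label`, sorted by path.
        return sorted(
            (d for d in self._datasets.values() if label in d.labels),
            key=lambda d: d.path,
        )


catalog = InMemoryCatalog()
catalog.register(Dataset("sales/emea/orders", {"pii"}))
catalog.register(Dataset("sales/apac/orders"))
catalog.register(Dataset("ops/telemetry", {"realtime"}))
print([d.path for d in catalog.under("sales")])   # ['sales/apac/orders', 'sales/emea/orders']
print([d.path for d in catalog.with_label("pii")])  # ['sales/emea/orders']
```

Appending "/" to the prefix keeps `under("sales")` from also matching a sibling such as `salesforce/...`.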

Components

  • API Documentation

  • User Interface

Getting Started

To start using the Catalog service:

  1. Browse Available Data: Use the Catalog UI to discover available datasets and data sources

  2. Access Dataset Schemas: Query dataset schemas to understand how the data is structured

  3. Create Filtered Views: Use the Dataset Filtering API to create customized data streams

  4. Monitor Data Flow: Track ingestion status and data pipeline health
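Step 3 describes a filtered view as a customized stream derived from a dataset. The snippet below sketches that concept. This page does not describe the Dataset Filtering API's actual expression syntax, so a plain Python predicate stands in for a query expression. The `filtered_view`, `field_equals`, and `all_of` helpers are hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]
Predicate = Callable[[Record], bool]


# Hypothetical helpers: a Python predicate stands in for the service's
# (undocumented here) query-expression language.
def filtered_view(records: Iterable[Record], predicate: Predicate) -> Iterator[Record]:
    """Lazily yield matching records as they arrive, like a live filtered stream."""
    for record in records:
        if predicate(record):
            yield record


def field_equals(name: str, value: Any) -> Predicate:
    # Equality test on one field, e.g. region == "emea".
    return lambda record: record.get(name) == value


def all_of(*predicates: Predicate) -> Predicate:
    # Logical AND of several predicates.
    return lambda record: all(p(record) for p in predicates)


incoming = iter([
    {"region": "emea", "amount": 120},
    {"region": "apac", "amount": 80},
    {"region": "emea", "amount": 40},
])
view = filtered_view(incoming, all_of(field_equals("region", "emea"),
                                      lambda r: r["amount"] > 50))
print(list(view))  # [{'region': 'emea', 'amount': 120}]
```

Because `filtered_view` is a generator, it processes records one at a time and never buffers the whole source. This mirrors how a real-time view consumes an unbounded stream.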