Catalog
The Catalog service is the central repository for managing datasets and data sources within the SOF Data Layer. It provides APIs and interfaces for discovering, organizing, and accessing data assets across the platform.
Key Features
- Data Source Management: Register and manage connections to various data sources
- Dataset Organization: Organize datasets with metadata, labels, and hierarchical structures
- Schema Management: Integration with Schema Registry for schema versioning and validation
- Dataset Filtering: Create filtered, real-time views of datasets using powerful query expressions
- Access Control: Fine-grained permissions for dataset access and management
Components
API Documentation
- Dataset Schema API - Retrieve and understand dataset structures and field information
- Dataset Filtering API - Create filtered datasets with KSQL-based stream processing
User Interface
- Catalog UI Components - Web interface for browsing and managing catalog entries
Getting Started
To start using the Catalog service:
- Browse Available Data: Use the Catalog UI to discover available datasets and data sources
- Access Dataset Schemas: Query dataset schemas to understand data structure
- Create Filtered Views: Use the Dataset Filtering API to create customized data streams
- Monitor Data Flow: Track ingestion status and data pipeline health
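
To make the "Create Filtered Views" step more concrete, the sketch below shows the kind of KSQL statement a filtered dataset could correspond to. It is only an illustration: the helper `build_filter_statement`, its parameters, and the stream names `orders` and `large_orders` are hypothetical, and the request format the Dataset Filtering API actually accepts may differ. The generated text uses the standard ksqlDB `CREATE STREAM ... AS SELECT` form.

```python
import re

# Conservative pattern for unquoted stream names (an assumption made for
# this sketch, not a rule taken from the Catalog service).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_filter_statement(source: str, target: str, condition: str) -> str:
    """Return a KSQL statement that derives a filtered stream from a source stream.

    All arguments are hypothetical inputs chosen for illustration.
    """
    for name in (source, target):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid stream name: {name!r}")
    condition = condition.strip()
    if not condition:
        raise ValueError("condition must not be empty")
    if ";" in condition:
        # Reject anything that could smuggle in a second statement.
        raise ValueError("condition must be a single expression")
    return (
        f"CREATE STREAM {target} AS "
        f"SELECT * FROM {source} "
        f"WHERE {condition} "
        f"EMIT CHANGES;"
    )
```

Calling `build_filter_statement("orders", "large_orders", "amount > 1000")` produces a single statement that selects every field from `orders` and keeps only rows matching the condition. The identifier and semicolon checks are deliberately strict, since filter expressions usually originate from user input.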