Overview
Data Fabric (DF) is the architecture for Raft’s secure and scalable data platform, R[DP], capable of data streaming, storage, data collection and distribution, transformations, and much more.
Data Fabric’s primary focus is to enable seamless data discovery, access, and sharing while maintaining data dissemination and access control at a granular level.
It is modular, containerized, and decentralized by design, capable of running at the tactical edge, on-prem, or on cloud services. DF is built on open source projects and can be deployed in 100% air-gapped environments. Additionally, DF can federate data across DF nodes while maintaining data governance. DF enables DOD’s VAULTIS, which makes data accessible, usable, and insightful. DF is the bridge that brings data from source to insights.
Architecture High Level Overview
Source: DF Arch
User Personas
DF Admin / Dev
- Needs to know technical details regarding deployment and configuration
- New deployment → all green
- Maintenance and updates
Package Management
DF utilizes [Helm](https://helm.sh) to package its components. The available charts are listed in DF_HOME/kubernetes/helm/charts.
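To see which charts are available, something like the following could be used. This is a minimal sketch, not part of DF itself: the `list_charts` helper is hypothetical, reading `DF_HOME` from an environment variable is an assumption, and it relies only on the standard Helm convention that every chart directory contains a `Chart.yaml` file.

```python
import os
from pathlib import Path
from typing import List


def list_charts(df_home: str) -> List[str]:
    """Return the names of chart directories under DF_HOME/kubernetes/helm/charts.

    A directory counts as a chart only if it contains a Chart.yaml file,
    which is the standard Helm chart metadata file.
    """
    charts_dir = Path(df_home) / "kubernetes" / "helm" / "charts"
    if not charts_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in charts_dir.iterdir()
        if entry.is_dir() and (entry / "Chart.yaml").is_file()
    )


if __name__ == "__main__":
    # Assumption: DF_HOME is exported as an environment variable.
    for name in list_charts(os.environ.get("DF_HOME", ".")):
        print(name)
```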
Default Settings
A default deployment of DF deploys a minimal chart collection and configuration, which can be found in DF_HOME/kubernetes/defaults/. The default deployment uses 3 files:
1. chart_order.txt
The list of Helm charts to deploy, in the order they are deployed.
2. global.yaml
The default values file for Helm.
3. versions.yaml
The specific versions of Raft-built images to use.
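The way these three files fit together can be sketched as follows. This is an illustration under stated assumptions, not DF's actual deployment tooling: it assumes `chart_order.txt` holds one chart name per line (blank lines and `#` comments are skipped), that each chart lives in a directory of the same name, and that both `global.yaml` and `versions.yaml` are passed to Helm as values files. The function names and the `df` namespace are hypothetical. The sketch only builds the `helm upgrade --install` commands; it does not run them.

```python
from pathlib import Path
from typing import List


def read_chart_order(defaults_dir: Path) -> List[str]:
    """Read chart names from chart_order.txt, skipping blank lines and # comments."""
    lines = (defaults_dir / "chart_order.txt").read_text().splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def build_helm_commands(
    charts_dir: Path, defaults_dir: Path, namespace: str = "df"
) -> List[List[str]]:
    """Build one `helm upgrade --install` command per chart, preserving file order."""
    commands = []
    for chart in read_chart_order(defaults_dir):
        commands.append([
            "helm", "upgrade", "--install",
            chart,                          # release name (assumed equal to chart name)
            str(charts_dir / chart),        # path to the chart directory
            "--namespace", namespace,
            "--create-namespace",
            "-f", str(defaults_dir / "global.yaml"),
            "-f", str(defaults_dir / "versions.yaml"),
        ])
    return commands
```

Each returned command is an argument list that could be passed to `subprocess.run(cmd, check=True)`. Running them one after another keeps the ordering from `chart_order.txt`.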
DF Data User
- Needs to know how to interact with DF to create and maintain data products
The DF Catalog home page, DF_URL/catalog, provides an entry point for interacting with data. On this page
DataSources
DF Catalog uses DataSources to curate its data connections with the world. Note that a DataSource requires an Enablement to set up the data connection information, i.e., URL, access key, configuration, etc. This step is typically done by a DF admin or a user with an enablement role before a data user interacts with DF.
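The relationship between a DataSource and its Enablement can be pictured with a small data model. This is purely illustrative: the class and field names below are hypothetical and are not DF's actual schema or API. The only things taken from the text are that an Enablement carries the URL, access key, and configuration, and that a DataSource is not usable until one is set up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Enablement:
    """Connection details set up by a DF admin or enablement-role user (hypothetical shape)."""
    url: str
    access_key: str
    configuration: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataSource:
    """A curated data connection; usable only once an Enablement is attached."""
    name: str
    enablement: Optional[Enablement] = None

    @property
    def is_enabled(self) -> bool:
        return self.enablement is not None

    def connection_url(self) -> str:
        """Return the connection URL, or fail if the enablement step has not been done."""
        if self.enablement is None:
            raise RuntimeError(f"DataSource {self.name!r} has no Enablement yet")
        return self.enablement.url
```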
DataSets
DataSets in DF are related to a DataSource. A DataSet may be a streaming topic, Excel file, FMV, etc. A DataSet may be local or remote and will offer
Interacting with Data in DF
DF connects to data using one of a few methods:
- proxy: At this stage, DF proxies services while maintaining the same APIs as upstream. This allows consumers to keep working with their current code base without requiring any code changes. The proxy has a configurable cache argument that can be passed in to set how long the cache is maintained and to invalidate the cache when a fresh data pull from the source is required.
- onboard: Onboarding data to DF localizes the data to DF. Onboarded datasets are those recognized as mission-critical that must be available in DDIL situations. Additionally, DF may localize data to improve performance for data availability and any data transformations/ETL.
- ingested: An ingested DataSet in DF is mapped to an ontology to allow graph querying and data correlation. (IIR / Graph stuff)
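The cache behavior described for the proxy method (a configurable time to keep cached data, plus a way to force a fresh pull from the source) can be sketched as a small time-to-live cache. This is a generic illustration, not DF's proxy implementation: the `TTLCache` class, its parameters, and the `refresh` flag are all hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache upstream responses for ttl_seconds; allow forced refresh and invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, fetch: Callable[[], Any], refresh: bool = False) -> Any:
        """Return the cached value if still fresh; otherwise call fetch() and store the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if not refresh and entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        value = fetch()  # pull from the upstream source
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a cached entry so the next get() pulls fresh data from the source."""
        self._entries.pop(key, None)
```

A caller would wrap each upstream request in `cache.get(request_key, fetch_from_upstream)`, passing `refresh=True` or calling `invalidate` when a fresh pull is required.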