Administering ArcadeDB

Overview

ArcadeDB is integrated with Keycloak for user authentication, attribution, and Role-Based Access Control (RBAC). This document describes the Keycloak role and permission structure, how to create users, and how to create databases.

Permissions

ArcadeDB roles are currently managed under the df-backend client, and take the following structure:

arcade__[role type]__[permission args...]
arcade__sa__[createDatabase|dropDatabase|*]
arcade__dba__d-[database]__[updateSecurity|updateSchema|updateDatabaseSettings|*]
arcade__dataSteward__d-[database]__t-[type]
arcade__user__d-[database]__t-[type]__p-[CRUD|*]
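
For example, with a hypothetical database named intel and a hypothetical type named Person, the following roles would grant, respectively: database creation on the cluster, schema management on intel, stewardship of Person objects in intel, and create/read access to Person objects in intel:

arcade__sa__createDatabase
arcade__dba__d-intel__updateSchema
arcade__dataSteward__d-intel__t-Person
arcade__user__d-intel__t-Person__p-CR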

The 4 types of roles are:

  1. Server Admin. Server admins perform operations across the cluster, such as managing cluster nodes and databases. Currently delineated SA roles for Arcade are:

    1. createDatabase - grants the role bearer the ability to create databases on the cluster

    2. dropDatabase - grants the role bearer the ability to drop databases on the cluster

    3. * - grants the role bearer the ability to perform any action on the cluster described in the ArcadeDB docs, including managing databases.

  2. Database Admin. Database admins can manage schema, tuning, and maintenance of individual databases. DBA roles are authorized on a per-database basis. DBA roles include:

    1. updateSecurity - grants the ability to directly manage Arcade’s internal ACLs. This is granted for troubleshooting purposes; DBAs are not expected to manage Arcade users directly in Arcade. Please use Keycloak roles.

    2. updateSchema - grants the ability to manage type schemas. By default, nodes/vertices and edges in Arcade are schemaless, allowing users to write any shape of data within the available node and edge types. This can be changed on a per-type or per-field basis to enforce required properties, data types, validation, etc.

    3. updateDatabaseSettings - grants the ability to manage database tuning and maintenance activities.

    4. * - grants the ability to perform all database-level actions

  3. Data Steward. Data stewards are able to manage data objects in ArcadeDB that lack proper classification markings. Ordinary users are prevented from creating objects that aren’t properly marked, but service accounts are allowed to write objects with missing or incomplete classification markings. These unmarked objects are hidden from ordinary users until a data steward cleans them up. Data stewards are authorized on a per-type, per-database basis.

  4. User. Users can be assigned any combination of the following permissions in a user role:

    1. C: Create

    2. R: Read

    3. U: Update

    4. D: Delete

    5. *: All

These permitted actions apply to all nodes and edges within the object types the user is authorized for at the database level.
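
Because these roles live under the df-backend client, they can be created and assigned with Keycloak’s kcadm.sh CLI. The following is a minimal sketch using the hypothetical intel/Person role from the example above; the realm is a placeholder and the client UUID must be looked up first:

# Look up the internal id of the df-backend client
kcadm.sh get clients -r <realm> -q clientId=df-backend --fields id

# Create the client role under df-backend (substitute the id from above)
kcadm.sh create clients/<client-uuid>/roles -r <realm> -s name=arcade__user__d-intel__t-Person__p-CR

# Assign the role to a user
kcadm.sh add-roles -r <realm> --uusername jdoe --cclientid df-backend --rolename arcade__user__d-intel__t-Person__p-CR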

Users

ArcadeDB users need the following in the SDL Keycloak:

  1. an active account

  2. at least one valid Arcade role, in order to log into ArcadeDB Studio. ArcadeDB API actions are authorized based on the presence of the role required for the requested action.

  3. a clearance-USA attribute (clearance-USA is the key; enter the user’s clearance abbreviation as the value, and leave off SAPs or other markings)

  4. (future) ACCM attributes. These will include things such as nationality, SAPs, etc.
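
For illustration, a minimal Keycloak user representation satisfying items 1 and 3 might look like the following (the username and clearance value are hypothetical; note that attribute values are lists in Keycloak’s admin REST API):

{
  "username": "jdoe",
  "enabled": true,
  "attributes": {
    "clearance-USA": ["S"]
  }
}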

Databases

New databases can be created through the GUI or via an API call to the server command endpoint, as described in the ArcadeDB docs.

Creating a database requires additional options properties beyond what ArcadeDB itself requires (see the sketch after this list):

  1. owner: the name(s) of the owning people or organizations

  2. visibility: public or private

  3. classification: the classification of the database. Options are U, CUI, C, S, TS
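
A minimal sketch of the API route, assuming ArcadeDB’s standard server command endpoint and that the extra properties are passed alongside the command in the request body. The host, credentials, database name, and exact nesting of the options are assumptions; verify against the Arcade docs and this deployment before relying on it:

curl -X POST "https://<arcade-host>/api/v1/server" \
  -u "<admin-user>:<password>" \
  -H "Content-Type: application/json" \
  -d '{
        "command": "create database intel",
        "owner": "J2 Analytics",
        "visibility": "private",
        "classification": "U"
      }'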

In the GUI, there is a checkbox option to auto-create the main Palantir Wikidata-based ontology types in Arcade. Only the vertex types are included; no edges yet.

When a new database is created, corresponding roles are auto-created in Keycloak and auto-assigned to the user who created the database, so they can start using it right away. Those roles may need to be pared back by a Keycloak admin after the initial setup is completed.

When a user connects to Arcade Studio, they will need to switch to the database tab (second from the top in the left-side menu) and select the database they want to connect to from the dropdown. From that tab they can also view the available types and schema information.

Importing Data

Import Palantir IIRs

Palantir IIRs are a key intelligence report dataset. To import this data, follow the requirements and steps below.

This requires the following Helm charts:

  1. raft-lakehouse

  2. metastore

  3. raft-trino

  4. cert-manager

  5. flink

  6. raft-arcadedb

Steps:

  1. Enable the Palantir datasource

  2. Enable the IIR dataset

  3. Wait for the ingestors to spin up and start consuming data. In a non-prod environment, you may need to tweak the classification level of the ingestor deployment to handle test data.

  4. Grab the name of the Kafka topic/bucket for the IIR dataset.

  5. Navigate to ~[SDL repo]/kubernetes/helm/charts/raft-arcadedb/disabled-templates/palantir-iir.yaml

  6. Update the deployed ArcadeDB jobs image version on line 9 as necessary

  7. Update the bucket name in the deltaLakePath on line 42. The name you copied should replace just the inner UUID in the value (see the sketch after these steps).

  8. Run the following command: kubectl apply -f palantir-iir.yaml

  9. Give Flink/ArcadeDB a few minutes to load the data

  10. In the ArcadeDB GUI, switch the query type to Gremlin and run g.V().bothE(). You should see IIR data.
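
For orientation, a purely hypothetical excerpt of palantir-iir.yaml illustrating the edits in steps 6 and 7; the actual keys, values, and surrounding structure will differ, and the placeholders are not real values:

# line 9: bump the ArcadeDB jobs image tag as needed
image: <registry>/arcadedb-jobs:<new-version>

# line 42: the bucket name copied in step 4 replaces only the inner UUID
deltaLakePath: s3a://<prefix>-<bucket-uuid-from-step-4>/<suffix>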

Import BlackCape Marble Madness

BlackCape Marble Madness is a key intel-ops fusion dataset. The following procedure imports the dataset and maps the majority of it to the Common Core Ontology (CCO)-based JIKO ontology.

This requires the following Helm charts:

  1. raft-lakehouse

  2. metastore

  3. raft-trino

  4. triplifier

  5. raft-arcadedb

Steps:

  1. Enable the BlackCape datasource

  2. Enable the Marble Madness dataset

  3. Wait for the ingestors to spin up and start consuming data. In a non-prod environment, you may need to tweak the classification level of the ingestor deployment to handle test data.

  4. Confirm data is moving to Kafka/Delta (see the Trino sketch after these steps)

  5. Wait for the triplifier to auto-detect the MM data and load it.

  6. Give ArcadeDB an extra minute

  7. In the ArcadeDB GUI, switch the query type to Gremlin and run g.V().bothE(). You should see MM data.
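
For step 4, one quick way to confirm data is landing in Delta is a row count through Trino. The catalog, schema, and table names below are hypothetical; adjust them to whatever the lakehouse actually registers:

SELECT count(*) FROM delta.blackcape.marble_madness;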