Protocol Documentation

flyteidl/datacatalog/datacatalog.proto

AddTagRequest

Request message for tagging an Artifact.

.. csv-table:: AddTagRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "tag", "Tag", "", ""

AddTagResponse

Response message for tagging an Artifact.

Artifact

Artifact message. It is composed of several string fields.

.. csv-table:: Artifact type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "id", "string", "", "The unique ID of the artifact"
   "dataset", "DatasetID", "", "The Dataset that the artifact belongs to"
   "data", "ArtifactData", "repeated", "A list of data that is associated with the artifact"
   "metadata", "Metadata", "", "Free-form metadata associated with the artifact"
   "partitions", "Partition", "repeated", ""
   "tags", "Tag", "repeated", ""
   "created_at", "Timestamp", "", "creation timestamp of artifact, autogenerated by service"

ArtifactData

ArtifactData that belongs to an artifact

.. csv-table:: ArtifactData type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "name", "string", "", ""
   "value", "Literal", "", ""

ArtifactPropertyFilter

Artifact properties we can filter by

.. csv-table:: ArtifactPropertyFilter type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "artifact_id", "string", "", ""

CreateArtifactRequest

Request message for creating an Artifact and its associated artifact Data.

.. csv-table:: CreateArtifactRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "artifact", "Artifact", "", ""

CreateArtifactResponse

Response message for creating an Artifact.

CreateDatasetRequest

Request message for creating a Dataset.

.. csv-table:: CreateDatasetRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "dataset", "Dataset", "", ""

CreateDatasetResponse

Response message for creating a Dataset

Dataset

Dataset message. It is uniquely identified by DatasetID.

.. csv-table:: Dataset type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "id", "DatasetID", "", ""
   "metadata", "Metadata", "", ""
   "partitionKeys", "string", "repeated", ""

DatasetID

DatasetID message that is composed of several string fields.

.. csv-table:: DatasetID type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "project", "string", "", "The name of the project"
   "name", "string", "", "The name of the dataset"
   "domain", "string", "", "The domain (e.g., environment)"
   "version", "string", "", "Version of the data schema"
   "UUID", "string", "", "UUID for the dataset (if set, the above fields are optional)"
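The UUID rule above can be sketched as a small check. This is only an illustration: the `DatasetID` dataclass is a plain-Python stand-in rather than the generated protobuf class, `is_addressable` is a hypothetical helper, and the reading "either the UUID is set, or all four descriptive fields are set" is an assumption about how the service interprets the fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetID:
    """Plain-Python stand-in for the DatasetID message (not the generated class)."""
    project: str = ""
    name: str = ""
    domain: str = ""
    version: str = ""
    UUID: str = ""


def is_addressable(dataset_id: DatasetID) -> bool:
    """Return True if the ID carries enough information to identify a dataset.

    Assumption: a UUID alone is sufficient; otherwise project, name,
    domain, and version must all be non-empty.
    """
    if dataset_id.UUID:
        return True
    return all([dataset_id.project, dataset_id.name,
                dataset_id.domain, dataset_id.version])
```

For example, `is_addressable(DatasetID(UUID="abc"))` is true, while a `DatasetID` carrying only a project is not considered addressable under this assumption.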

DatasetPropertyFilter

Dataset properties we can filter by

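.. csv-table:: DatasetPropertyFilter type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "project", "string", "", ""
   "name", "string", "", ""
   "domain", "string", "", ""
   "version", "string", "", ""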

FilterExpression

Filter expression that is composed of a combination of single filters

.. csv-table:: FilterExpression type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "filters", "SinglePropertyFilter", "repeated", ""

GetArtifactRequest

Request message for retrieving an Artifact. Retrieve an artifact based on a query handle that can be one of artifact_id or tag. The result returned will include the artifact data and metadata associated with the artifact.

.. csv-table:: GetArtifactRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "dataset", "DatasetID", "", ""
   "artifact_id", "string", "", ""
   "tag_name", "string", "", ""
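Because the query handle is one of `artifact_id` or `tag_name`, a client may want to validate a request before sending it. The sketch below uses a simplified, hypothetical `GetArtifactRequest` dataclass (the dataset is reduced to a string) and a hypothetical `query_handle` helper; neither is part of the generated API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GetArtifactRequest:
    """Simplified stand-in; the real message carries a DatasetID, not a string."""
    dataset: str
    artifact_id: Optional[str] = None
    tag_name: Optional[str] = None


def query_handle(req: GetArtifactRequest) -> Tuple[str, str]:
    """Return (field name, value) for the single handle that is set.

    Empty strings count as unset, mirroring proto3 default values.
    """
    candidates = (("artifact_id", req.artifact_id), ("tag_name", req.tag_name))
    handles = [(name, value) for name, value in candidates if value]
    if len(handles) != 1:
        raise ValueError("set exactly one of artifact_id or tag_name")
    return handles[0]
```

Rejecting requests that set both handles is a design choice of this sketch; the service itself may resolve such requests differently.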

GetArtifactResponse

Response message for retrieving an Artifact. The result returned will include the artifact data and metadata associated with the artifact.

.. csv-table:: GetArtifactResponse type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "artifact", "Artifact", "", ""

GetDatasetRequest

Request message for retrieving a Dataset. The Dataset is retrieved by its unique identifier, which is a combination of several fields.

.. csv-table:: GetDatasetRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "dataset", "DatasetID", "", ""

GetDatasetResponse

Response message for retrieving a Dataset. The response will include the metadata for the Dataset.

.. csv-table:: GetDatasetResponse type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "dataset", "Dataset", "", ""

GetOrExtendReservationRequest

Try to acquire or extend an artifact reservation. If an active reservation exists, retrieve that instance.

.. csv-table:: GetOrExtendReservationRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "reservation_id", "ReservationID", "", ""
   "owner_id", "string", "", ""
   "heartbeat_interval", "Duration", "", "Requested reservation extension heartbeat interval"

GetOrExtendReservationResponse

Response including either a newly minted reservation or the existing reservation

.. csv-table:: GetOrExtendReservationResponse type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "reservation", "Reservation", "", ""

KeyValuePair

.. csv-table:: KeyValuePair type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "key", "string", "", ""
   "value", "string", "", ""

ListArtifactsRequest

List the artifacts that belong to the Dataset, optionally filtered using a filter expression.

.. csv-table:: ListArtifactsRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "dataset", "DatasetID", "", "Use a datasetID for which you want to retrieve the artifacts"
   "filter", "FilterExpression", "", "Apply the filter expression to this query"
   "pagination", "PaginationOptions", "", "Pagination options to get a page of artifacts"

ListArtifactsResponse

Response to list artifacts

.. csv-table:: ListArtifactsResponse type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "artifacts", "Artifact", "repeated", "The list of artifacts"
   "next_token", "string", "", "Token to use to request the next page; pass this into the next request's PaginationOptions"
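The `next_token` handshake can be sketched as a client-side loop. Everything here is illustrative: `iter_all` and `fake_list_artifacts` are hypothetical names, the fake stands in for a real ListArtifacts call, and treating an empty `next_token` as "no more pages" is an assumption rather than something the message definitions state.

```python
from typing import Callable, Iterator, List, Tuple

# (limit, token) -> (items on this page, next_token)
ListPage = Callable[[int, str], Tuple[List[str], str]]


def iter_all(list_page: ListPage, limit: int = 2) -> Iterator[str]:
    """Yield every item by feeding each next_token into the following request."""
    token = ""
    while True:
        items, next_token = list_page(limit, token)
        yield from items
        if not next_token:  # assumption: empty token marks the last page
            return
        token = next_token


def fake_list_artifacts(limit: int, token: str) -> Tuple[List[str], str]:
    """In-memory stand-in for the service; the token is just a list offset."""
    data = ["a1", "a2", "a3", "a4", "a5"]
    start = int(token) if token else 0
    page = data[start:start + limit]
    end = start + len(page)
    return page, (str(end) if end < len(data) else "")
```

Calling `list(iter_all(fake_list_artifacts, limit=2))` walks three pages and collects all five items. A real token should be treated as opaque; the integer offset is only a convenience of the fake.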

ListDatasetsRequest

List the datasets for the given query

.. csv-table:: ListDatasetsRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "filter", "FilterExpression", "", "Apply the filter expression to this query"
   "pagination", "PaginationOptions", "", "Pagination options to get a page of datasets"

ListDatasetsResponse

List the datasets response with token for next pagination

.. csv-table:: ListDatasetsResponse type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "datasets", "Dataset", "repeated", "The list of datasets"
   "next_token", "string", "", "Token to use to request the next page; pass this into the next request's PaginationOptions"

Metadata

Metadata representation for artifacts and datasets

.. csv-table:: Metadata type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "key_map", "Metadata.KeyMapEntry", "repeated", "key map is a dictionary of key/val strings that represent metadata"

Metadata.KeyMapEntry

.. csv-table:: Metadata.KeyMapEntry type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "key", "string", "", ""
   "value", "string", "", ""

PaginationOptions

Pagination options for making list requests

.. csv-table:: PaginationOptions type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "limit", "uint32", "", "the max number of results to return"
   "token", "string", "", "the token to pass to fetch the next page"
   "sortKey", "PaginationOptions.SortKey", "", "the property that we want to sort the results by"
   "sortOrder", "PaginationOptions.SortOrder", "", "the sort order of the results"

Partition

An artifact could have multiple partitions and each partition can have an arbitrary string key/value pair

.. csv-table:: Partition type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "key", "string", "", ""
   "value", "string", "", ""

PartitionPropertyFilter

Partition properties we can filter by

.. csv-table:: PartitionPropertyFilter type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "key_val", "KeyValuePair", "", ""

ReleaseReservationRequest

Request to release reservation

.. csv-table:: ReleaseReservationRequest type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "reservation_id", "ReservationID", "", ""
   "owner_id", "string", "", ""

ReleaseReservationResponse

Response to release reservation

Reservation

A reservation including owner, heartbeat interval, expiration timestamp, and various metadata.

.. csv-table:: Reservation type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "reservation_id", "ReservationID", "", ""
   "owner_id", "string", "", ""
   "heartbeat_interval", "Duration", "", "Recommended heartbeat interval to extend reservation"
   "expires_at", "Timestamp", "", "Expiration timestamp of this reservation"
   "metadata", "Metadata", "", ""

ReservationID

ReservationID message that is composed of several string fields.

.. csv-table:: ReservationID type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "dataset_id", "DatasetID", "", ""
   "tag_name", "string", "", ""

SinglePropertyFilter

A single property to filter on.

.. csv-table:: SinglePropertyFilter type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "tag_filter", "TagPropertyFilter", "", ""
   "partition_filter", "PartitionPropertyFilter", "", ""
   "artifact_filter", "ArtifactPropertyFilter", "", ""
   "dataset_filter", "DatasetPropertyFilter", "", ""
   "operator", "SinglePropertyFilter.ComparisonOperator", "", "Assigned field number 10 in case more entities to query are added"
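To make the filter messages concrete, here is a minimal sketch of evaluating a list of single-property filters against an artifact. The classes are simplified, hypothetical stand-ins (`ArtifactView` is not a message in this file, and the dataset filter is omitted). Combining the filters with a logical AND is an assumption: the definitions only say that a FilterExpression is "a combination of single filters".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ArtifactView:
    """Hypothetical flattened view of an artifact, for matching only."""
    artifact_id: str
    tags: Set[str] = field(default_factory=set)
    partitions: Dict[str, str] = field(default_factory=dict)


@dataclass
class SinglePropertyFilter:
    """Stand-in: exactly one of the three properties is expected to be set."""
    tag_name: Optional[str] = None
    partition: Optional[Tuple[str, str]] = None  # (key, value)
    artifact_id: Optional[str] = None
    operator: str = "EQUALS"  # the only operator defined so far


def matches_single(artifact: ArtifactView, flt: SinglePropertyFilter) -> bool:
    if flt.operator != "EQUALS":
        raise ValueError(f"unsupported operator: {flt.operator}")
    if flt.tag_name is not None:
        return flt.tag_name in artifact.tags
    if flt.partition is not None:
        key, value = flt.partition
        return artifact.partitions.get(key) == value
    if flt.artifact_id is not None:
        return artifact.artifact_id == flt.artifact_id
    raise ValueError("no property filter set")


def matches(artifact: ArtifactView, filters: List[SinglePropertyFilter]) -> bool:
    """Assumption: all filters must hold (logical AND); an empty list matches."""
    return all(matches_single(artifact, f) for f in filters)
```

With this sketch, an artifact tagged `latest` in partition `region=us` matches a list containing both a tag filter and a partition filter, and stops matching as soon as either value differs.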

Tag

Tag message that is unique to a Dataset. It is associated with a single artifact and can be retrieved by name later.

.. csv-table:: Tag type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "name", "string", "", "Name of tag"
   "artifact_id", "string", "", "The tagged artifact"
   "dataset", "DatasetID", "", "The Dataset that this tag belongs to"

TagPropertyFilter

Tag properties we can filter by

.. csv-table:: TagPropertyFilter type fields
   :header: "Field", "Type", "Label", "Description"
   :widths: auto

   "tag_name", "string", "", ""

PaginationOptions.SortKey

.. csv-table:: Enum PaginationOptions.SortKey values
   :header: "Name", "Number", "Description"
   :widths: auto

   "CREATION_TIME", "0", ""

PaginationOptions.SortOrder

.. csv-table:: Enum PaginationOptions.SortOrder values
   :header: "Name", "Number", "Description"
   :widths: auto

   "DESCENDING", "0", ""
   "ASCENDING", "1", ""

SinglePropertyFilter.ComparisonOperator

As use cases come up, more operators can be added, e.g. gte, like, not eq.

.. csv-table:: Enum SinglePropertyFilter.ComparisonOperator values
   :header: "Name", "Number", "Description"
   :widths: auto

   "EQUALS", "0", ""

DataCatalog

Data Catalog service definition. Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. Artifacts are associated with a Dataset, and can be tagged for retrieval.

.. csv-table:: DataCatalog service methods
   :header: "Method Name", "Request Type", "Response Type", "Description"
   :widths: auto

   "CreateDataset", "CreateDatasetRequest", "CreateDatasetResponse", "Create a new Dataset. Datasets are unique based on the DatasetID. Datasets are logical groupings of artifacts. Each dataset can have one or more artifacts."
   "GetDataset", "GetDatasetRequest", "GetDatasetResponse", "Get a Dataset by the DatasetID. This returns the Dataset with the associated metadata."
   "CreateArtifact", "CreateArtifactRequest", "CreateArtifactResponse", "Create an artifact and the artifact data associated with it. An artifact can be a Hive partition or arbitrary files or data values."
   "GetArtifact", "GetArtifactRequest", "GetArtifactResponse", "Retrieve an artifact by an identifying handle. This returns an artifact along with the artifact data."
   "AddTag", "AddTagRequest", "AddTagResponse", "Associate a tag with an artifact. Tags are unique within a Dataset."
   "ListArtifacts", "ListArtifactsRequest", "ListArtifactsResponse", "Return a paginated list of artifacts."
   "ListDatasets", "ListDatasetsRequest", "ListDatasetsResponse", "Return a paginated list of datasets."
   "GetOrExtendReservation", "GetOrExtendReservationRequest", "GetOrExtendReservationResponse", "Attempts to get or extend a reservation for the corresponding artifact. If one already exists (i.e., another entity owns the reservation), then that reservation is retrieved. Once you acquire a reservation, you need to periodically extend the reservation with an identical call. If the reservation is not extended before the defined expiration, it may be acquired by another task. Note: Multiple concurrent tasks with the same signature and the same input may try to populate the same artifact at the same time. With a reservation, only one task can run at a time, until the reservation expires. Note: If task A does not extend the reservation in time and the reservation expires, another task B may take over the reservation, resulting in two tasks A and B running in parallel. A third task C may then get the Artifact from A or B, whichever writes last."
   "ReleaseReservation", "ReleaseReservationRequest", "ReleaseReservationResponse", "Release the reservation when the task holding the spot fails so that the other tasks can grab the spot."
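The reservation semantics described for GetOrExtendReservation and ReleaseReservation can be sketched with an in-memory model. This is not the service implementation: `InMemoryReservations`, the string reservation IDs, the explicit `now` argument, and in particular the rule that a reservation expires after `heartbeat_interval * grace_multiplier` seconds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Reservation:
    reservation_id: str
    owner_id: str
    heartbeat_interval: float  # seconds
    expires_at: float          # seconds on the caller-supplied clock


class InMemoryReservations:
    """Toy model: one active reservation per reservation_id."""

    def __init__(self, grace_multiplier: float = 3.0) -> None:
        # Assumption: expiry is a multiple of the heartbeat interval.
        self._grace = grace_multiplier
        self._by_id: Dict[str, Reservation] = {}

    def get_or_extend(self, reservation_id: str, owner_id: str,
                      heartbeat_interval: float, now: float) -> Reservation:
        current = self._by_id.get(reservation_id)
        # Another owner holds an unexpired reservation: return theirs unchanged.
        if (current is not None and current.owner_id != owner_id
                and now < current.expires_at):
            return current
        # Otherwise acquire (new or expired) or extend (same owner).
        fresh = Reservation(reservation_id, owner_id, heartbeat_interval,
                            now + heartbeat_interval * self._grace)
        self._by_id[reservation_id] = fresh
        return fresh

    def release(self, reservation_id: str, owner_id: str) -> None:
        # Only the current owner can release; other callers are ignored.
        current = self._by_id.get(reservation_id)
        if current is not None and current.owner_id == owner_id:
            del self._by_id[reservation_id]
```

A caller can tell whether it holds the reservation by comparing the returned `owner_id` with its own. The takeover case from the notes above appears when owner A stops extending: once `now` passes `expires_at`, a call from owner B replaces A's reservation even though A may still be running.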
