Protocol Documentation#

flyteidl/datacatalog/datacatalog.proto#

AddTagRequest#

Request message for tagging an Artifact.

AddTagRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| tag | Tag | | |

AddTagResponse#

Response message for tagging an Artifact.

Artifact#

Artifact message. It is composed of several fields.

Artifact type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| id | string | | The unique ID of the artifact |
| dataset | DatasetID | | The Dataset that the artifact belongs to |
| data | ArtifactData | repeated | A list of data associated with the artifact |
| metadata | Metadata | | Free-form metadata associated with the artifact |
| partitions | Partition | repeated | |
| tags | Tag | repeated | |
| created_at | Timestamp | | Creation timestamp of the artifact, autogenerated by the service |

ArtifactData#

ArtifactData that belongs to an artifact

ArtifactData type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| name | string | | |
| value | Literal | | |

ArtifactPropertyFilter#

Artifact properties we can filter by

ArtifactPropertyFilter type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| artifact_id | string | | |

CreateArtifactRequest#

Request message for creating an Artifact and its associated artifact Data.

CreateArtifactRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| artifact | Artifact | | |

CreateArtifactResponse#

Response message for creating an Artifact.

CreateDatasetRequest#

Request message for creating a Dataset.

CreateDatasetRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset | Dataset | | |

CreateDatasetResponse#

Response message for creating a Dataset.

Dataset#

Dataset message. It is uniquely identified by DatasetID.

Dataset type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| id | DatasetID | | |
| metadata | Metadata | | |
| partitionKeys | string | repeated | |

DatasetID#

DatasetID message that is composed of several string fields.

DatasetID type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| project | string | | The name of the project |
| name | string | | The name of the dataset |
| domain | string | | The domain (e.g., environment) |
| version | string | | Version of the data schema |
| UUID | string | | UUID for the dataset (if set, the above fields are optional) |

DatasetPropertyFilter#

Dataset properties we can filter by

DatasetPropertyFilter type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| project | string | | |
| name | string | | |
| domain | string | | |
| version | string | | |

FilterExpression#

Filter expression that is composed of a combination of single filters

FilterExpression type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| filters | SinglePropertyFilter | repeated | |
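To make the filter messages concrete, here is a minimal in-memory sketch of how a FilterExpression might be evaluated against artifacts. The dataclasses are hypothetical stand-ins for the generated protobuf classes, only the EQUALS operator is supported (the only ComparisonOperator defined today), dataset filters are omitted, and the rule that every filter in `filters` must match (logical AND) is an assumption, not something this page states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ArtifactView:
    """Hypothetical, simplified view of an Artifact for filtering."""
    artifact_id: str
    tags: List[str] = field(default_factory=list)
    partitions: Dict[str, str] = field(default_factory=dict)


@dataclass
class SinglePropertyFilter:
    """Hypothetical stand-in: set exactly one of the property fields."""
    tag_name: Optional[str] = None                # TagPropertyFilter.tag_name
    partition: Optional[Tuple[str, str]] = None   # PartitionPropertyFilter.key_val
    artifact_id: Optional[str] = None             # ArtifactPropertyFilter.artifact_id
    operator: str = "EQUALS"                      # ComparisonOperator


def matches(artifact: ArtifactView, f: SinglePropertyFilter) -> bool:
    if f.operator != "EQUALS":
        raise ValueError(f"unsupported operator: {f.operator}")
    if f.tag_name is not None:
        return f.tag_name in artifact.tags
    if f.partition is not None:
        key, value = f.partition
        return artifact.partitions.get(key) == value
    if f.artifact_id is not None:
        return artifact.artifact_id == f.artifact_id
    return True  # a filter with no property set constrains nothing


def apply_filter_expression(artifacts: List[ArtifactView],
                            filters: List[SinglePropertyFilter]) -> List[ArtifactView]:
    # Assumption: an artifact must satisfy every filter (AND semantics).
    return [a for a in artifacts if all(matches(a, f) for f in filters)]
```

For example, filtering on the partition `region=us` keeps only artifacts whose `region` partition equals `us`; an empty filter list returns every artifact.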

GetArtifactRequest#

Request message for retrieving an Artifact. Retrieve an artifact based on a query handle that can be one of artifact_id or tag. The result returned will include the artifact data and metadata associated with the artifact.

GetArtifactRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset | DatasetID | | |
| artifact_id | string | | |
| tag_name | string | | |
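The query handle in GetArtifactRequest can be illustrated with a toy in-memory catalog. `InMemoryCatalog`, `NotFound`, and all method names below are hypothetical and are not the real client API. The sketch only encodes what this page describes: datasets are unique by DatasetID, tags are unique within a dataset and point at a single artifact, and a lookup uses exactly one of `artifact_id` or `tag_name`. Rejecting a reused tag name is an assumption of the sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key mirroring DatasetID(project, name, domain, version).
DatasetKey = Tuple[str, str, str, str]


class NotFound(Exception):
    """Raised when a dataset, artifact, or tag does not exist."""


class InMemoryCatalog:
    """Toy stand-in for the DataCatalog service; not the real client."""

    def __init__(self) -> None:
        self._artifacts: Dict[DatasetKey, Dict[str, dict]] = {}
        self._tags: Dict[DatasetKey, Dict[str, str]] = {}  # tag name -> artifact_id

    def create_dataset(self, ds: DatasetKey) -> None:
        if ds in self._artifacts:  # Datasets are unique by DatasetID
            raise ValueError(f"dataset {ds} already exists")
        self._artifacts[ds] = {}
        self._tags[ds] = {}

    def create_artifact(self, ds: DatasetKey, artifact_id: str, data: dict) -> None:
        self._dataset(ds)[artifact_id] = data

    def add_tag(self, ds: DatasetKey, tag_name: str, artifact_id: str) -> None:
        if artifact_id not in self._dataset(ds):
            raise NotFound(artifact_id)
        tags = self._tags[ds]
        # Tags are unique within a dataset; rejecting reuse is an assumption.
        if tag_name in tags:
            raise ValueError(f"tag {tag_name!r} already exists")
        tags[tag_name] = artifact_id

    def get_artifact(self, ds: DatasetKey, artifact_id: Optional[str] = None,
                     tag_name: Optional[str] = None) -> dict:
        # The query handle is exactly one of artifact_id or tag_name.
        if (artifact_id is None) == (tag_name is None):
            raise ValueError("set exactly one of artifact_id or tag_name")
        artifacts = self._dataset(ds)
        if tag_name is not None:
            artifact_id = self._tags[ds].get(tag_name)
            if artifact_id is None:
                raise NotFound(tag_name)
        if artifact_id not in artifacts:
            raise NotFound(artifact_id)
        return artifacts[artifact_id]

    def _dataset(self, ds: DatasetKey) -> Dict[str, dict]:
        if ds not in self._artifacts:
            raise NotFound(str(ds))
        return self._artifacts[ds]
```

A typical flow is create_dataset, then create_artifact, then add_tag, after which the artifact can be fetched either by its ID or by the tag name.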

GetArtifactResponse#

Response message for retrieving an Artifact. The result returned will include the artifact data and metadata associated with the artifact.

GetArtifactResponse type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| artifact | Artifact | | |

GetDatasetRequest#

Request message for retrieving a Dataset. The Dataset is retrieved by its unique identifier, which is a combination of several fields.

GetDatasetRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset | DatasetID | | |

GetDatasetResponse#

Response message for retrieving a Dataset. The response will include the metadata for the Dataset.

GetDatasetResponse type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset | Dataset | | |

GetOrExtendReservationRequest#

Try to acquire or extend an artifact reservation. If an active reservation exists, retrieve that instance.

GetOrExtendReservationRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| reservation_id | ReservationID | | |
| owner_id | string | | |
| heartbeat_interval | Duration | | Requested reservation extension heartbeat interval |

GetOrExtendReservationResponse#

Response including either a newly minted reservation or the existing reservation

GetOrExtendReservationResponse type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| reservation | Reservation | | |

KeyValuePair#

KeyValuePair type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |

ListArtifactsRequest#

List the artifacts that belong to the Dataset, optionally filtered using a filter expression.

ListArtifactsRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset | DatasetID | | The DatasetID whose artifacts should be retrieved |
| filter | FilterExpression | | Apply the filter expression to this query |
| pagination | PaginationOptions | | Pagination options to get a page of artifacts |

ListArtifactsResponse#

Response to list artifacts

ListArtifactsResponse type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| artifacts | Artifact | repeated | The list of artifacts |
| next_token | string | | Token for requesting the next page; pass it in the next request's PaginationOptions |
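A client usually pages through ListArtifacts (or ListDatasets) results by passing each response's `next_token` back in the next request's PaginationOptions. The sketch below assumes that an empty `next_token` marks the last page, which this page does not state explicitly. `list_page` is a hypothetical callable standing in for the real RPC stub, and `make_fake_server` is a toy used only for illustration.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical signature: list_page(token, limit) -> (items, next_token)
ListPage = Callable[[str, int], Tuple[List[str], str]]


def iter_all(list_page: ListPage, limit: int = 100) -> Iterator[str]:
    """Yield every item across pages by following next_token."""
    token = ""  # an empty token requests the first page
    while True:
        items, token = list_page(token, limit)
        yield from items
        if not token:  # assumption: an empty next_token means no more pages
            return


def make_fake_server(all_items: List[str]) -> ListPage:
    """Toy paginated endpoint whose token is simply the next offset."""
    def list_page(token: str, limit: int) -> Tuple[List[str], str]:
        start = int(token) if token else 0
        page = all_items[start:start + limit]
        end = start + len(page)
        return page, (str(end) if end < len(all_items) else "")
    return list_page
```

With five items and `limit=2`, the loop issues three requests and yields all five items in order.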

ListDatasetsRequest#

List the datasets for the given query

ListDatasetsRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| filter | FilterExpression | | Apply the filter expression to this query |
| pagination | PaginationOptions | | Pagination options to get a page of datasets |

ListDatasetsResponse#

Response to list datasets, including a token for fetching the next page.

ListDatasetsResponse type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| datasets | Dataset | repeated | The list of datasets |
| next_token | string | | Token for requesting the next page; pass it in the next request's PaginationOptions |

Metadata#

Metadata representation for artifacts and datasets

Metadata type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| key_map | Metadata.KeyMapEntry | repeated | Key map is a dictionary of key/value strings that represent metadata |

Metadata.KeyMapEntry#

Metadata.KeyMapEntry type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |

PaginationOptions#

Pagination options for making list requests

PaginationOptions type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| limit | uint32 | | The maximum number of results to return |
| token | string | | The token to pass to fetch the next page |
| sortKey | PaginationOptions.SortKey | | The property to sort the results by |
| sortOrder | PaginationOptions.SortOrder | | The sort order of the results |
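Because proto3 enum fields default to their zero value, a request that leaves `sortKey` and `sortOrder` unset sorts by CREATION_TIME, DESCENDING (both are value 0 in the enums listed further down). The sketch below applies these options to in-memory records. Treating `token` as a stringified offset and falling back to a default limit of 50 when `limit` is 0 are assumptions made for illustration, not documented service behavior.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class SortKey(IntEnum):
    CREATION_TIME = 0


class SortOrder(IntEnum):
    DESCENDING = 0
    ASCENDING = 1


@dataclass
class PaginationOptions:
    limit: int = 0
    token: str = ""
    # Zero-valued enums, matching proto3 defaults for unset fields.
    sortKey: SortKey = SortKey.CREATION_TIME
    sortOrder: SortOrder = SortOrder.DESCENDING


@dataclass
class Record:
    name: str
    created_at: int  # seconds since the epoch, standing in for a Timestamp


def paginate(records: List[Record], opts: PaginationOptions,
             default_limit: int = 50) -> Tuple[List[Record], str]:
    """Return one page of records and the next token ("" when exhausted)."""
    # CREATION_TIME is the only sort key, so always sort on created_at.
    ordered = sorted(records, key=lambda r: r.created_at,
                     reverse=(opts.sortOrder == SortOrder.DESCENDING))
    offset = int(opts.token) if opts.token else 0  # assumption: offset token
    limit = opts.limit or default_limit            # assumption: 0 means default
    page = ordered[offset:offset + limit]
    end = offset + len(page)
    return page, (str(end) if end < len(ordered) else "")
```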

Partition#

An artifact can have multiple partitions, and each partition is an arbitrary string key/value pair.

Partition type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |

PartitionPropertyFilter#

Partition properties we can filter by

PartitionPropertyFilter type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| key_val | KeyValuePair | | |

ReleaseReservationRequest#

Request to release reservation

ReleaseReservationRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| reservation_id | ReservationID | | |
| owner_id | string | | |

ReleaseReservationResponse#

Response to release reservation

Reservation#

A reservation including owner, heartbeat interval, expiration timestamp, and various metadata.

Reservation type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| reservation_id | ReservationID | | |
| owner_id | string | | |
| heartbeat_interval | Duration | | Recommended heartbeat interval to extend reservation |
| expires_at | Timestamp | | Expiration timestamp of this reservation |
| metadata | Metadata | | |

ReservationID#

ReservationID message that is composed of several fields.

ReservationID type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset_id | DatasetID | | |
| tag_name | string | | |

SinglePropertyFilter#

A single property to filter on.

SinglePropertyFilter type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| tag_filter | TagPropertyFilter | | |
| partition_filter | PartitionPropertyFilter | | |
| artifact_filter | ArtifactPropertyFilter | | |
| dataset_filter | DatasetPropertyFilter | | |
| operator | SinglePropertyFilter.ComparisonOperator | | Comparison operator to apply; it uses field number 10 to leave room for filters on more entities |

Tag#

Tag message that is unique to a Dataset. It is associated with a single artifact and can be retrieved by name later.

Tag type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| name | string | | Name of the tag |
| artifact_id | string | | The tagged artifact |
| dataset | DatasetID | | The Dataset that this tag belongs to |

TagPropertyFilter#

Tag properties we can filter by

TagPropertyFilter type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| tag_name | string | | |

UpdateArtifactRequest#

Request message for updating an Artifact and overwriting its associated ArtifactData.

UpdateArtifactRequest type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| dataset | DatasetID | | ID of the dataset the artifact is associated with |
| artifact_id | string | | |
| tag_name | string | | |
| data | ArtifactData | repeated | List of data to overwrite the stored artifact data with. Must contain ALL data for the updated Artifact, as any missing ArtifactData entries will be removed from the underlying blob storage and database. |
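Because `data` must contain all ArtifactData for the updated Artifact, a caller that wants to change one entry should start from the currently stored list. The helper below is a hypothetical sketch of that read-modify-write step. It models ArtifactData as `(name, value)` pairs, with `value` standing in for a Literal, and it does not call the real service.

```python
from typing import Dict, List, Tuple

ArtifactData = Tuple[str, object]  # (name, value); value stands in for a Literal


def build_update_data(current: List[ArtifactData],
                      changes: Dict[str, object]) -> List[ArtifactData]:
    """Return the complete data list for an UpdateArtifactRequest.

    Entries missing from the request would be deleted by the service, so
    every existing entry is carried over unless it is being replaced.
    """
    merged = dict(current)   # keeps the existing names in their original order
    merged.update(changes)   # overwrite changed entries, append new ones
    return list(merged.items())
```

Sending only `changes` instead of the merged list would, per the field description above, remove every entry not named in it.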

UpdateArtifactResponse#

Response message for updating an Artifact.

UpdateArtifactResponse type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| artifact_id | string | | The unique ID of the artifact updated |

PaginationOptions.SortKey#

Enum PaginationOptions.SortKey values#

| Name | Number | Description |
|---|---|---|
| CREATION_TIME | 0 | |

PaginationOptions.SortOrder#

Enum PaginationOptions.SortOrder values#

| Name | Number | Description |
|---|---|---|
| DESCENDING | 0 | |
| ASCENDING | 1 | |

SinglePropertyFilter.ComparisonOperator#

As use cases come up, more operators can be added (e.g., gte, like, not eq).

Enum SinglePropertyFilter.ComparisonOperator values#

| Name | Number | Description |
|---|---|---|
| EQUALS | 0 | |

DataCatalog#

Data Catalog service definition. Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. Artifacts are associated with a Dataset and can be tagged for retrieval.

DataCatalog service methods#

| Method Name | Request Type | Response Type | Description |
|---|---|---|---|
| CreateDataset | CreateDatasetRequest | CreateDatasetResponse | Create a new Dataset. Datasets are unique based on the DatasetID. Datasets are logical groupings of artifacts. Each dataset can have one or more artifacts. |
| GetDataset | GetDatasetRequest | GetDatasetResponse | Get a Dataset by the DatasetID. This returns the Dataset with the associated metadata. |
| CreateArtifact | CreateArtifactRequest | CreateArtifactResponse | Create an artifact and the artifact data associated with it. An artifact can be a Hive partition or arbitrary files or data values. |
| GetArtifact | GetArtifactRequest | GetArtifactResponse | Retrieve an artifact by an identifying handle. This returns an artifact along with the artifact data. |
| AddTag | AddTagRequest | AddTagResponse | Associate a tag with an artifact. Tags are unique within a Dataset. |
| ListArtifacts | ListArtifactsRequest | ListArtifactsResponse | Return a paginated list of artifacts. |
| ListDatasets | ListDatasetsRequest | ListDatasetsResponse | Return a paginated list of datasets. |
| UpdateArtifact | UpdateArtifactRequest | UpdateArtifactResponse | Update an existing artifact, overwriting the stored artifact data in the underlying blob storage. |
| GetOrExtendReservation | GetOrExtendReservationRequest | GetOrExtendReservationResponse | Attempt to get or extend a reservation for the corresponding artifact. If one already exists (i.e., another entity owns the reservation), that reservation is retrieved. Once you acquire a reservation, you need to periodically extend it with an identical call. If the reservation is not extended before the defined expiration, it may be acquired by another task. Note: multiple concurrent tasks with the same signature and the same input may try to populate the same artifact at the same time; with a reservation, only one of them runs at a time, until the reservation expires. Note: if task A does not extend the reservation in time and the reservation expires, another task B may take over the reservation, resulting in A and B running in parallel. A third task C may then get the Artifact from A or B, whichever writes last. |
| ReleaseReservation | ReleaseReservationRequest | ReleaseReservationResponse | Release the reservation when the task holding the spot fails, so that other tasks can grab the spot. |
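The reservation semantics described for GetOrExtendReservation and ReleaseReservation can be modeled in memory. `ReservationTable` is a hypothetical stand-in for the service, keyed by `(dataset, tag_name)` to mirror ReservationID. The injected `now` clock and the expiry rule of `now + heartbeat_interval` are simplifications; the real service may apply a grace period.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Reservation:
    owner_id: str
    expires_at: float  # seconds; stands in for a Timestamp


class ReservationTable:
    """Hypothetical in-memory model of the reservation RPCs."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Reservation] = {}

    def get_or_extend(self, key: Tuple[str, str], owner_id: str,
                      heartbeat_interval: float, now: float) -> Reservation:
        current = self._rows.get(key)
        if current is None or current.expires_at <= now or current.owner_id == owner_id:
            # Free, expired, or already ours: (re)acquire and push out the expiry.
            current = Reservation(owner_id, now + heartbeat_interval)
            self._rows[key] = current
        # Otherwise another owner holds an active reservation; return it as-is.
        return current

    def release(self, key: Tuple[str, str], owner_id: str) -> None:
        # Only the current owner can release its reservation.
        if key in self._rows and self._rows[key].owner_id == owner_id:
            del self._rows[key]
```

Replaying the scenario from the table: A acquires at t=0, B sees A's reservation at t=5, A extends at t=8, then misses its heartbeat, so B takes over at t=20 and the two tasks may run in parallel.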

google/protobuf/timestamp.proto#

Timestamp#

A Timestamp represents a point in time independent of any time zone or local calendar, encoded as a count of seconds and fractions of seconds at nanosecond resolution. The count is relative to an epoch at UTC midnight on January 1, 1970, in the proleptic Gregorian calendar which extends the Gregorian calendar backwards to year one.

All minutes are 60 seconds long. Leap seconds are “smeared” so that no leap second table is needed for interpretation, using a [24-hour linear smear](https://developers.google.com/time/smear).

The range is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. By restricting to that range, we ensure that we can convert to and from [RFC 3339](https://www.ietf.org/rfc/rfc3339.txt) date strings.

# Examples

Example 1: Compute Timestamp from POSIX time().

```cpp
#include <ctime>
#include <google/protobuf/timestamp.pb.h>

google::protobuf::Timestamp timestamp;
timestamp.set_seconds(time(NULL));
timestamp.set_nanos(0);
```

Example 2: Compute Timestamp from POSIX gettimeofday().

```cpp
#include <sys/time.h>
#include <google/protobuf/timestamp.pb.h>

struct timeval tv;
gettimeofday(&tv, NULL);

google::protobuf::Timestamp timestamp;
timestamp.set_seconds(tv.tv_sec);
timestamp.set_nanos(tv.tv_usec * 1000);
```

Example 3: Compute Timestamp from Win32 GetSystemTimeAsFileTime().

```cpp
#include <windows.h>
#include <google/protobuf/timestamp.pb.h>

FILETIME ft;
GetSystemTimeAsFileTime(&ft);
UINT64 ticks = (((UINT64)ft.dwHighDateTime) << 32) | ft.dwLowDateTime;

// A Windows tick is 100 nanoseconds. Windows epoch 1601-01-01T00:00:00Z
// is 11644473600 seconds before Unix epoch 1970-01-01T00:00:00Z.
google::protobuf::Timestamp timestamp;
timestamp.set_seconds((INT64) ((ticks / 10000000) - 11644473600LL));
timestamp.set_nanos((INT32) ((ticks % 10000000) * 100));
```

Example 4: Compute Timestamp from Java System.currentTimeMillis().

```java
import com.google.protobuf.Timestamp;

long millis = System.currentTimeMillis();

Timestamp timestamp = Timestamp.newBuilder().setSeconds(millis / 1000)
    .setNanos((int) ((millis % 1000) * 1000000)).build();
```

Example 5: Compute Timestamp from Java Instant.now().

```java
import com.google.protobuf.Timestamp;
import java.time.Instant;

Instant now = Instant.now();

Timestamp timestamp =
    Timestamp.newBuilder().setSeconds(now.getEpochSecond())
        .setNanos(now.getNano()).build();
```

Example 6: Compute Timestamp from current time in Python.

```python
from google.protobuf.timestamp_pb2 import Timestamp

timestamp = Timestamp()
timestamp.GetCurrentTime()
```

# JSON Mapping

In JSON format, the Timestamp type is encoded as a string in the [RFC 3339](https://www.ietf.org/rfc/rfc3339.txt) format. That is, the format is “{year}-{month}-{day}T{hour}:{min}:{sec}[.{frac_sec}]Z” where {year} is always expressed using four digits while {month}, {day}, {hour}, {min}, and {sec} are zero-padded to two digits each. The fractional seconds, which can go up to 9 digits (i.e. up to 1 nanosecond resolution), are optional. The “Z” suffix indicates the timezone (“UTC”); the timezone is required. A proto3 JSON serializer should always use UTC (as indicated by “Z”) when printing the Timestamp type and a proto3 JSON parser should be able to accept both UTC and other timezones (as indicated by an offset).

For example, “2017-01-15T01:30:15.01Z” encodes 15.01 seconds past 01:30 UTC on January 15, 2017.

In JavaScript, one can convert a Date object to this format using the standard [toISOString()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/toISOString) method. In Python, a standard datetime.datetime object in UTC can be converted to this format using [strftime](https://docs.python.org/2/library/time.html#time.strftime) with the time format spec ‘%Y-%m-%dT%H:%M:%S.%fZ’. Likewise, in Java, one can use Joda-Time’s [ISODateTimeFormat.dateTime()](http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateTime%2D%2D) to obtain a formatter capable of generating timestamps in this format.
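The RFC 3339 string can also be built directly from the `seconds` and `nanos` fields with the Python standard library. This is a sketch, not the protobuf library's own serializer. It appends the fraction separately so that nanosecond precision survives `datetime`'s microsecond limit, and it chooses 0, 3, 6, or 9 fractional digits, which is one valid choice within the "up to 9 digits" rule above.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_rfc3339(seconds: int, nanos: int) -> str:
    """Format a Timestamp's fields as {year}-{month}-{day}T{hour}:{min}:{sec}[.frac]Z."""
    if not 0 <= nanos <= 999_999_999:
        raise ValueError("nanos must be in [0, 999999999]")
    dt = _EPOCH + timedelta(seconds=seconds)  # whole seconds only
    # Zero-pad explicitly so years below 1000 still print four digits.
    base = (f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}"
            f"T{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")
    if nanos == 0:
        frac = ""
    elif nanos % 1_000_000 == 0:
        frac = ".%03d" % (nanos // 1_000_000)
    elif nanos % 1_000 == 0:
        frac = ".%06d" % (nanos // 1_000)
    else:
        frac = ".%09d" % nanos
    return base + frac + "Z"
```

For the instant in the example above, this sketch prints "2017-01-15T01:30:15.010Z", an equivalent encoding of "2017-01-15T01:30:15.01Z".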

Timestamp type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| seconds | int64 | | Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive. |
| nanos | int32 | | Non-negative fractions of a second at nanosecond resolution. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be from 0 to 999,999,999 inclusive. |
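The numeric bounds for `seconds` can be derived from the two calendar limits rather than hard-coded. The sketch below also shows how to split a signed count of nanoseconds so that `nanos` stays non-negative: Python's `divmod` floors toward negative infinity, so an instant just before the epoch becomes `seconds = -1` with a positive `nanos` that counts forward in time. The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_ONE_SECOND = timedelta(seconds=1)

# 0001-01-01T00:00:00Z and 9999-12-31T23:59:59Z as seconds since the Unix epoch.
MIN_SECONDS = (datetime(1, 1, 1, tzinfo=timezone.utc) - _EPOCH) // _ONE_SECOND
MAX_SECONDS = (datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc) - _EPOCH) // _ONE_SECOND


def validate_timestamp(seconds: int, nanos: int) -> None:
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"seconds out of range: {seconds}")
    if not 0 <= nanos <= 999_999_999:
        raise ValueError(f"nanos out of range: {nanos}")


def from_unix_nanos(total_nanos: int) -> Tuple[int, int]:
    """Split signed nanoseconds since the epoch into (seconds, nanos)."""
    seconds, nanos = divmod(total_nanos, 1_000_000_000)  # nanos is always >= 0
    validate_timestamp(seconds, nanos)
    return seconds, nanos
```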

google/protobuf/duration.proto#

Duration#

A Duration represents a signed, fixed-length span of time represented as a count of seconds and fractions of seconds at nanosecond resolution. It is independent of any calendar and concepts like “day” or “month”. It is related to Timestamp in that the difference between two Timestamp values is a Duration, and a Duration can be added to or subtracted from a Timestamp. The range is approximately ±10,000 years.

# Examples

Example 1: Compute Duration from two Timestamps in pseudo code.

```
Timestamp start = …;
Timestamp end = …;
Duration duration = …;

duration.seconds = end.seconds - start.seconds;
duration.nanos = end.nanos - start.nanos;

if (duration.seconds < 0 && duration.nanos > 0) {
  duration.seconds += 1;
  duration.nanos -= 1000000000;
} else if (duration.seconds > 0 && duration.nanos < 0) {
  duration.seconds -= 1;
  duration.nanos += 1000000000;
}
```

Example 2: Compute Timestamp from Timestamp + Duration in pseudo code.

```
Timestamp start = …;
Duration duration = …;
Timestamp end = …;

end.seconds = start.seconds + duration.seconds;
end.nanos = start.nanos + duration.nanos;

if (end.nanos < 0) {
  end.seconds -= 1;
  end.nanos += 1000000000;
} else if (end.nanos >= 1000000000) {
  end.seconds += 1;
  end.nanos -= 1000000000;
}
```

Example 3: Compute Duration from datetime.timedelta in Python.

```python
import datetime
from google.protobuf.duration_pb2 import Duration

td = datetime.timedelta(days=3, minutes=10)
duration = Duration()
duration.FromTimedelta(td)
```
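The two pseudo-code examples translate directly into runnable Python. To stay dependency-free, this sketch works on plain `(seconds, nanos)` tuples rather than the protobuf message classes. The normalization branches are the same as in the pseudo code.

```python
from typing import Tuple

NANOS_PER_SECOND = 1_000_000_000
Pair = Tuple[int, int]  # (seconds, nanos)


def duration_between(start: Pair, end: Pair) -> Pair:
    """Example 1: end - start, normalized so seconds and nanos share a sign."""
    seconds = end[0] - start[0]
    nanos = end[1] - start[1]
    if seconds < 0 and nanos > 0:
        seconds += 1
        nanos -= NANOS_PER_SECOND
    elif seconds > 0 and nanos < 0:
        seconds -= 1
        nanos += NANOS_PER_SECOND
    return seconds, nanos


def add_duration(start: Pair, duration: Pair) -> Pair:
    """Example 2: start + duration, normalized so 0 <= nanos < 1e9."""
    seconds = start[0] + duration[0]
    nanos = start[1] + duration[1]
    if nanos < 0:
        seconds -= 1
        nanos += NANOS_PER_SECOND
    elif nanos >= NANOS_PER_SECOND:
        seconds += 1
        nanos -= NANOS_PER_SECOND
    return seconds, nanos
```

For instance, 12.1 s minus 10.8 s gives `(1, 300000000)`, and the reverse subtraction gives `(-1, -300000000)`, with both fields sharing the sign as the Duration field rules require.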

# JSON Mapping

In JSON format, the Duration type is encoded as a string rather than an object, where the string ends in the suffix “s” (indicating seconds) and is preceded by the number of seconds, with nanoseconds expressed as fractional seconds. For example, 3 seconds with 0 nanoseconds should be encoded in JSON format as “3s”, while 3 seconds and 1 nanosecond should be expressed in JSON format as “3.000000001s”, and 3 seconds and 1 microsecond should be expressed in JSON format as “3.000001s”.
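The string encoding above can be sketched as a small formatter over the `seconds` and `nanos` fields. Like the Timestamp sketch, it picks 0, 3, 6, or 9 fractional digits, which reproduces all three examples in the paragraph. Emitting a leading "-" for negative durations is this sketch's choice for the sign convention; the paragraph above does not cover negative values.

```python
def duration_to_json(seconds: int, nanos: int) -> str:
    """Encode Duration fields as a JSON string such as "3.000000001s"."""
    if seconds and nanos and (seconds < 0) != (nanos < 0):
        raise ValueError("seconds and nanos must have the same sign")
    negative = seconds < 0 or nanos < 0
    seconds, nanos = abs(seconds), abs(nanos)
    if nanos == 0:
        frac = ""
    elif nanos % 1_000_000 == 0:
        frac = ".%03d" % (nanos // 1_000_000)
    elif nanos % 1_000 == 0:
        frac = ".%06d" % (nanos // 1_000)
    else:
        frac = ".%09d" % nanos
    return ("-" if negative else "") + str(seconds) + frac + "s"
```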

Duration type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| seconds | int64 | | Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
| nanos | int32 | | Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive. |
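The constraints in this table fit in a short validator. The seconds bound is computed exactly as in the note (365.25 × 10000 is rewritten as 36525 × 100 to stay in integer arithmetic). `validate_duration` and `is_valid_duration` are illustrative names, not protobuf library functions.

```python
# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years,
# with 365.25 * 10000 == 36525 * 100 to keep the arithmetic in integers.
MAX_DURATION_SECONDS = 60 * 60 * 24 * 36525 * 100


def validate_duration(seconds: int, nanos: int) -> None:
    if not -MAX_DURATION_SECONDS <= seconds <= MAX_DURATION_SECONDS:
        raise ValueError("seconds out of range")
    if not -999_999_999 <= nanos <= 999_999_999:
        raise ValueError("nanos out of range")
    # With a non-zero seconds field, a non-zero nanos must share its sign.
    if seconds != 0 and nanos != 0 and (seconds > 0) != (nanos > 0):
        raise ValueError("seconds and nanos must have the same sign")


def is_valid_duration(seconds: int, nanos: int) -> bool:
    try:
        validate_duration(seconds, nanos)
        return True
    except ValueError:
        return False
```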

google/protobuf/struct.proto#

ListValue#

ListValue is a wrapper around a repeated field of values.

The JSON representation for ListValue is a JSON array.

ListValue type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| values | Value | repeated | Repeated field of dynamically typed values. |

Struct#

Struct represents a structured data value, consisting of fields which map to dynamically typed values. In some languages, Struct might be supported by a native representation. For example, in scripting languages like JS a struct is represented as an object. The details of that representation are described together with the proto support for the language.

The JSON representation for Struct is a JSON object.

Struct type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| fields | Struct.FieldsEntry | repeated | Unordered map of dynamically typed values. |

Struct.FieldsEntry#

Struct.FieldsEntry type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | Value | | |

Value#

Value represents a dynamically typed value which can be either null, a number, a string, a boolean, a recursive struct value, or a list of values. A producer of value is expected to set one of these variants. Absence of any variant indicates an error.

The JSON representation for Value is a JSON value.

Value type fields#

| Field | Type | Label | Description |
|---|---|---|---|
| null_value | NullValue | | Represents a null value. |
| number_value | double | | Represents a double value. |
| string_value | string | | Represents a string value. |
| bool_value | bool | | Represents a boolean value. |
| struct_value | Struct | | Represents a structured value. |
| list_value | ListValue | | Represents a repeated Value. |
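Value, Struct, and ListValue together mirror the JSON data model. The sketch below maps ordinary Python/JSON values onto the Value variants using a hypothetical `(variant_field_name, payload)` tuple instead of the generated classes. This makes the one-variant-per-Value rule explicit. Every number becomes a double, so very large integers lose precision, which is a consequence of `number_value` being a `double`.

```python
from typing import Any, Tuple


def to_value(obj: Any) -> Tuple[str, Any]:
    """Return (variant, payload) naming which Value field would be set."""
    if obj is None:
        return ("null_value", 0)  # NullValue.NULL_VALUE
    if isinstance(obj, bool):  # check bool first: bool is a subclass of int
        return ("bool_value", obj)
    if isinstance(obj, (int, float)):
        return ("number_value", float(obj))  # number_value is a double
    if isinstance(obj, str):
        return ("string_value", obj)
    if isinstance(obj, dict):
        if not all(isinstance(k, str) for k in obj):
            raise TypeError("Struct keys must be strings")
        return ("struct_value", {k: to_value(v) for k, v in obj.items()})
    if isinstance(obj, (list, tuple)):
        return ("list_value", [to_value(v) for v in obj])
    raise TypeError(f"no Value variant for {type(obj).__name__}")
```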

NullValue#

NullValue is a singleton enumeration to represent the null value for the Value type union.

The JSON representation for NullValue is JSON null.

Enum NullValue values#

| Name | Number | Description |
|---|---|---|
| NULL_VALUE | 0 | Null value. |