Protocol Documentation#
flyteidl/datacatalog/datacatalog.proto#
AddTagRequest#
Request message for tagging an Artifact.
AddTagResponse#
Response message for tagging an Artifact.
Artifact#
Artifact message. It is composed of several string fields.
| Field | Type | Label | Description |
|---|---|---|---|
| id | | | The unique ID of the artifact |
| dataset | | | The Dataset that the artifact belongs to |
| data | | repeated | A list of data that is associated with the artifact |
| metadata | | | Free-form metadata associated with the artifact |
| partitions | | repeated | |
| tags | | repeated | |
| created_at | | | creation timestamp of artifact, autogenerated by service |
ArtifactData#
ArtifactData that belongs to an artifact
ArtifactPropertyFilter#
Artifact properties we can filter by
CreateArtifactRequest#
Request message for creating an Artifact and its associated artifact Data.
CreateArtifactResponse#
Response message for creating an Artifact.
CreateDatasetRequest#
Request message for creating a Dataset.
CreateDatasetResponse#
Response message for creating a Dataset
Dataset#
Dataset message. It is uniquely identified by DatasetID.
DatasetID#
DatasetID message that is composed of several string fields.
DatasetPropertyFilter#
Dataset properties we can filter by
FilterExpression#
Filter expression that is composed of a combination of single filters
| Field | Type | Label | Description |
|---|---|---|---|
| filters | | repeated | |
GetArtifactRequest#
Request message for retrieving an Artifact. Retrieve an artifact based on a query handle that can be one of artifact_id or tag. The result returned will include the artifact data and metadata associated with the artifact.
GetArtifactResponse#
Response message for retrieving an Artifact. The result returned will include the artifact data and metadata associated with the artifact.
GetDatasetRequest#
Request message for retrieving a Dataset. The Dataset is retrieved by its unique identifier, which is a combination of several fields.
GetDatasetResponse#
Response message for retrieving a Dataset. The response will include the metadata for the Dataset.
GetOrExtendReservationRequest#
Try to acquire or extend an artifact reservation. If an active reservation exists, retrieve that instance.
| Field | Type | Label | Description |
|---|---|---|---|
| reservation_id | | | |
| owner_id | | | |
| heartbeat_interval | | | Requested reservation extension heartbeat interval |
GetOrExtendReservationResponse#
Response including either a newly minted reservation or the existing reservation
| Field | Type | Label | Description |
|---|---|---|---|
| reservation | | | |
KeyValuePair#
ListArtifactsRequest#
List the artifacts that belong to the Dataset, optionally filtered using a filter expression.
| Field | Type | Label | Description |
|---|---|---|---|
| dataset | | | Use a datasetID for which you want to retrieve the artifacts |
| filter | | | Apply the filter expression to this query |
| pagination | | | Pagination options to get a page of artifacts |
ListArtifactsResponse#
Response to list artifacts
ListDatasetsRequest#
List the datasets for the given query
| Field | Type | Label | Description |
|---|---|---|---|
| filter | | | Apply the filter expression to this query |
| pagination | | | Pagination options to get a page of datasets |
ListDatasetsResponse#
List the datasets response with token for next pagination
Metadata#
Metadata representation for artifacts and datasets
| Field | Type | Label | Description |
|---|---|---|---|
| key_map | | repeated | key map is a dictionary of key/val strings that represent metadata |
Metadata.KeyMapEntry#
PaginationOptions#
Pagination options for making list requests
| Field | Type | Label | Description |
|---|---|---|---|
| limit | | | the max number of results to return |
| token | | | the token to pass to fetch the next page |
| sortKey | | | the property that we want to sort the results by |
| sortOrder | | | the sort order of the results |
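The `limit` and `token` fields describe a token-based paging loop. The sketch below illustrates that loop against a toy in-memory server. The `PaginationOptions` dataclass, `list_page`, and `list_all` are hypothetical stand-ins, and the token format (a stringified offset) is an assumption made only for this illustration; the real service may encode tokens differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PaginationOptions:
    """Hypothetical stand-in mirroring the documented fields."""
    limit: int
    token: str = ""
    sort_key: str = "CREATION_TIME"
    sort_order: str = "DESCENDING"


def list_page(items: List[str], opts: PaginationOptions) -> Tuple[List[str], str]:
    """Toy server: returns one page and the token for the next page.

    Assumption: the token is a stringified offset; an empty token means
    "start from the beginning" on input and "no more pages" on output.
    """
    if opts.limit <= 0:
        raise ValueError("limit must be positive")
    start = int(opts.token) if opts.token else 0
    end = start + opts.limit
    next_token = str(end) if end < len(items) else ""
    return items[start:end], next_token


def list_all(items: List[str], limit: int) -> List[str]:
    """Client loop: keep passing the returned token until it comes back empty."""
    results: List[str] = []
    token = ""
    while True:
        page, token = list_page(items, PaginationOptions(limit=limit, token=token))
        results.extend(page)
        if not token:
            return results
```

A caller would typically treat the token as opaque and only check whether it is empty.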
Partition#
An artifact could have multiple partitions and each partition can have an arbitrary string key/value pair
PartitionPropertyFilter#
Partition properties we can filter by
| Field | Type | Label | Description |
|---|---|---|---|
| key_val | | | |
ReleaseReservationRequest#
Request to release reservation
| Field | Type | Label | Description |
|---|---|---|---|
| reservation_id | | | |
| owner_id | | | |
ReleaseReservationResponse#
Response to release reservation
Reservation#
A reservation including owner, heartbeat interval, expiration timestamp, and various metadata.
ReservationID#
ReservationID message that is composed of several string fields.
SinglePropertyFilter#
A single property to filter on.
| Field | Type | Label | Description |
|---|---|---|---|
| tag_filter | | | |
| partition_filter | | | |
| artifact_filter | | | |
| dataset_filter | | | |
| operator | | | field 10 in case we add more entities to query |
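To make the relationship between FilterExpression and SinglePropertyFilter concrete, here is a minimal sketch of how a list of single filters might be evaluated against one record. The `SingleFilter` class, the nested-dict record layout, and the choice to combine filters with a logical AND are all assumptions for illustration; this document only states that a filter expression is "a combination of single filters" and that `EQUALS` is the one defined operator.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical record layout: entity name -> property name -> value.
Record = Dict[str, Dict[str, str]]


@dataclass
class SingleFilter:
    """Hypothetical stand-in for one SinglePropertyFilter."""
    entity: str            # "tag", "partition", "artifact" or "dataset"
    key: str               # property to compare
    value: str             # expected value
    operator: str = "EQUALS"


def matches(record: Record, filters: List[SingleFilter]) -> bool:
    """Return True when every filter matches (AND semantics are assumed)."""
    for f in filters:
        if f.operator != "EQUALS":
            raise ValueError(f"unsupported operator: {f.operator}")
        if record.get(f.entity, {}).get(f.key) != f.value:
            return False
    return True
```

For example, a record `{"partition": {"region": "us"}}` matches `[SingleFilter("partition", "region", "us")]` and does not match a filter on `region == "eu"`.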
Tag#
Tag message that is unique to a Dataset. It is associated with a single artifact and can be retrieved by name later.
TagPropertyFilter#
Tag properties we can filter by
UpdateArtifactRequest#
Request message for updating an Artifact and overwriting its associated ArtifactData.
| Field | Type | Label | Description |
|---|---|---|---|
| dataset | | | ID of dataset the artifact is associated with |
| artifact_id | | | |
| tag_name | | | |
| data | | repeated | List of data to overwrite stored artifact data with. Must contain ALL data for updated Artifact as any missing ArtifactData entries will be removed from the underlying blob storage and database. |
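Because the `data` field replaces the stored entries wholesale, a partial update silently drops whatever is left out. The sketch below models that behavior with plain dictionaries keyed by a data name. Both helper functions and the dictionary representation are hypothetical and only illustrate the "send everything" rule described in the table.

```python
from typing import Dict


def apply_update(stored: Dict[str, str], payload: Dict[str, str]) -> Dict[str, str]:
    """Toy model of the documented overwrite: the payload becomes the new state,
    so any entry missing from the payload is removed."""
    return dict(payload)


def build_update_payload(stored: Dict[str, str], changes: Dict[str, str]) -> Dict[str, str]:
    """Merge the changes into the full set of stored entries before sending,
    so that unchanged entries survive the overwrite."""
    payload = dict(stored)
    payload.update(changes)
    return payload
```

With `stored = {"out1": "a", "out2": "b"}`, sending only `{"out1": "c"}` leaves a single entry, while sending `build_update_payload(stored, {"out1": "c"})` keeps `out2` intact.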
UpdateArtifactResponse#
Response message for updating an Artifact.
PaginationOptions.SortKey#
| Name | Number | Description |
|---|---|---|
| CREATION_TIME | 0 | |
PaginationOptions.SortOrder#
| Name | Number | Description |
|---|---|---|
| DESCENDING | 0 | |
| ASCENDING | 1 | |
SinglePropertyFilter.ComparisonOperator#
As use cases come up, more operators can be added, e.g. gte, like, not eq.
| Name | Number | Description |
|---|---|---|
| EQUALS | 0 | |
DataCatalog#
Data Catalog service definition. Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. Artifacts are associated with a Dataset, and can be tagged for retrieval.
| Method Name | Request Type | Response Type | Description |
|---|---|---|---|
| CreateDataset | | | Create a new Dataset. Datasets are unique based on the DatasetID. Datasets are logical groupings of artifacts. Each dataset can have one or more artifacts. |
| GetDataset | | | Get a Dataset by the DatasetID. This returns the Dataset with the associated metadata. |
| CreateArtifact | | | Create an artifact and the artifact data associated with it. An artifact can be a hive partition or arbitrary files or data values. |
| GetArtifact | | | Retrieve an artifact by an identifying handle. This returns an artifact along with the artifact data. |
| AddTag | | | Associate a tag with an artifact. Tags are unique within a Dataset. |
| ListArtifacts | | | Return a paginated list of artifacts. |
| ListDatasets | | | Return a paginated list of datasets. |
| UpdateArtifact | | | Updates an existing artifact, overwriting the stored artifact data in the underlying blob storage. |
| GetOrExtendReservation | | | Attempts to get or extend a reservation for the corresponding artifact. If one already exists (i.e., another entity owns the reservation), then that reservation is retrieved. Once you acquire a reservation, you need to periodically extend the reservation with an identical call. If the reservation is not extended before the defined expiration, it may be acquired by another task. Note: We may have multiple concurrent tasks with the same signature and the same input that try to populate the same artifact at the same time. With a reservation, only one task can run at a time, until the reservation expires. Note: If task A does not extend the reservation in time and the reservation expires, another task B may take over the reservation, resulting in two tasks A and B running in parallel. So a third task C may get the Artifact from A or B, whichever writes last. |
| ReleaseReservation | | | Release the reservation when the task holding the spot fails so that the other tasks can grab the spot. |
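The reservation behavior described for GetOrExtendReservation and ReleaseReservation can be sketched as a small in-memory model. Everything here is hypothetical: the `ReservationTable` class, the explicit `now` argument, and in particular the expiration rule (`now + heartbeat_interval`), which is a simplification since this document does not state how the service derives the expiration from the heartbeat interval.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Reservation:
    owner_id: str
    expires_at: float


class ReservationTable:
    """Toy in-memory model of artifact reservations (not the real service)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Reservation] = {}

    def get_or_extend(self, reservation_id: str, owner_id: str,
                      heartbeat_interval: float, now: float) -> Reservation:
        current = self._rows.get(reservation_id)
        # Acquire if free or expired; extend if the caller already owns it.
        if (current is None or current.expires_at <= now
                or current.owner_id == owner_id):
            # Assumed expiration rule, for illustration only.
            current = Reservation(owner_id, now + heartbeat_interval)
            self._rows[reservation_id] = current
        # Otherwise the active reservation held by another owner is returned.
        return current

    def release(self, reservation_id: str, owner_id: str) -> None:
        current = self._rows.get(reservation_id)
        if current is not None and current.owner_id == owner_id:
            del self._rows[reservation_id]
```

In this model a caller learns whether it holds the reservation by comparing the returned `owner_id` with its own. If task A stops extending and the reservation expires, task B's next call takes it over, which is the overlap scenario the note above warns about.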
google/protobuf/timestamp.proto#
Timestamp#
A Timestamp represents a point in time independent of any time zone or local calendar, encoded as a count of seconds and fractions of seconds at nanosecond resolution. The count is relative to an epoch at UTC midnight on January 1, 1970, in the proleptic Gregorian calendar which extends the Gregorian calendar backwards to year one.
All minutes are 60 seconds long. Leap seconds are “smeared” so that no leap second table is needed for interpretation, using a [24-hour linear smear](https://developers.google.com/time/smear).
The range is from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z. By restricting to that range, we ensure that we can convert to and from [RFC 3339](https://www.ietf.org/rfc/rfc3339.txt) date strings.
# Examples
Example 1: Compute Timestamp from POSIX time().
    Timestamp timestamp;
    timestamp.set_seconds(time(NULL));
    timestamp.set_nanos(0);
Example 2: Compute Timestamp from POSIX gettimeofday().
    struct timeval tv;
    gettimeofday(&tv, NULL);

    Timestamp timestamp;
    timestamp.set_seconds(tv.tv_sec);
    timestamp.set_nanos(tv.tv_usec * 1000);
Example 3: Compute Timestamp from Win32 GetSystemTimeAsFileTime().
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);
    UINT64 ticks = (((UINT64)ft.dwHighDateTime) << 32) | ft.dwLowDateTime;

    // A Windows tick is 100 nanoseconds. Windows epoch 1601-01-01T00:00:00Z
    // is 11644473600 seconds before Unix epoch 1970-01-01T00:00:00Z.
    Timestamp timestamp;
    timestamp.set_seconds((INT64) ((ticks / 10000000) - 11644473600LL));
    timestamp.set_nanos((INT32) ((ticks % 10000000) * 100));
Example 4: Compute Timestamp from Java System.currentTimeMillis().
    long millis = System.currentTimeMillis();

    Timestamp timestamp = Timestamp.newBuilder().setSeconds(millis / 1000)
        .setNanos((int) ((millis % 1000) * 1000000)).build();
Example 5: Compute Timestamp from Java Instant.now().
    Instant now = Instant.now();

    Timestamp timestamp =
        Timestamp.newBuilder().setSeconds(now.getEpochSecond())
            .setNanos(now.getNano()).build();
Example 6: Compute Timestamp from current time in Python.
    timestamp = Timestamp()
    timestamp.GetCurrentTime()
# JSON Mapping
In JSON format, the Timestamp type is encoded as a string in the [RFC 3339](https://www.ietf.org/rfc/rfc3339.txt) format. That is, the format is “{year}-{month}-{day}T{hour}:{min}:{sec}[.{frac_sec}]Z” where {year} is always expressed using four digits while {month}, {day}, {hour}, {min}, and {sec} are zero-padded to two digits each. The fractional seconds, which can go up to 9 digits (i.e. up to 1 nanosecond resolution), are optional. The “Z” suffix indicates the timezone (“UTC”); the timezone is required. A proto3 JSON serializer should always use UTC (as indicated by “Z”) when printing the Timestamp type and a proto3 JSON parser should be able to accept both UTC and other timezones (as indicated by an offset).
For example, “2017-01-15T01:30:15.01Z” encodes 15.01 seconds past 01:30 UTC on January 15, 2017.
In JavaScript, one can convert a Date object to this format using the standard [toISOString()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/toISOString) method. In Python, a standard datetime.datetime object can be converted to this format using [strftime](https://docs.python.org/2/library/time.html#time.strftime) with the time format spec ‘%Y-%m-%dT%H:%M:%S.%fZ’. Likewise, in Java, one can use the Joda Time’s [ISODateTimeFormat.dateTime()](http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateTime%2D%2D) to obtain a formatter capable of generating timestamps in this format.
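As an illustration of the JSON mapping, the following sketch formats a `(seconds, nanos)` pair as an RFC 3339 string in UTC using only the Python standard library. The function name `timestamp_to_json` is hypothetical, and trimming trailing zeros from the fraction is a choice made here to match the "15.01" example above; other serializers may emit a fixed number of fractional digits.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_json(seconds: int, nanos: int) -> str:
    """Format a (seconds, nanos) pair as an RFC 3339 UTC string."""
    if not 0 <= nanos <= 999_999_999:
        raise ValueError("nanos must be from 0 to 999,999,999 inclusive")
    dt = EPOCH + timedelta(seconds=seconds)
    # Build the date/time part by hand so the year is always four digits.
    base = (f"{dt.year:04d}-{dt.month:02d}-{dt.day:02d}"
            f"T{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")
    if nanos == 0:
        return base + "Z"
    fraction = f"{nanos:09d}".rstrip("0")  # optional fractional seconds
    return f"{base}.{fraction}Z"
```

Called with the epoch seconds for 2017-01-15T01:30:15 UTC and `nanos=10_000_000`, this produces the string from the example above.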
| Field | Type | Label | Description |
|---|---|---|---|
| seconds | | | Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be from 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z inclusive. |
| nanos | | | Non-negative fractions of a second at nanosecond resolution. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be from 0 to 999,999,999 inclusive. |
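The rule that `nanos` stays non-negative even before the epoch is easy to get wrong. A minimal sketch, assuming the input is a signed count of nanoseconds relative to the epoch (the function name `split_epoch_nanos` is hypothetical): Python's `divmod` floors toward negative infinity, which yields exactly the "count forward in time" representation.

```python
from typing import Tuple

NANOS_PER_SECOND = 1_000_000_000


def split_epoch_nanos(total_nanos: int) -> Tuple[int, int]:
    """Split signed nanoseconds since the epoch into (seconds, nanos).

    divmod floors the quotient, so the remainder is always in
    [0, 999_999_999], as required for the nanos field.
    """
    return divmod(total_nanos, NANOS_PER_SECOND)
```

For instance, 1.5 seconds before the epoch is represented as `seconds = -2` and `nanos = 500_000_000`, not as `-1` and `-500_000_000`.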
google/protobuf/duration.proto#
Duration#
A Duration represents a signed, fixed-length span of time represented as a count of seconds and fractions of seconds at nanosecond resolution. It is independent of any calendar and concepts like “day” or “month”. It is related to Timestamp in that the difference between two Timestamp values is a Duration and it can be added or subtracted from a Timestamp. Range is approximately +-10,000 years.
# Examples
Example 1: Compute Duration from two Timestamps in pseudo code.
    Timestamp start = …;
    Timestamp end = …;
    Duration duration = …;

    duration.seconds = end.seconds - start.seconds;
    duration.nanos = end.nanos - start.nanos;

    if (duration.seconds < 0 && duration.nanos > 0) {
      duration.seconds += 1;
      duration.nanos -= 1000000000;
    } else if (duration.seconds > 0 && duration.nanos < 0) {
      duration.seconds -= 1;
      duration.nanos += 1000000000;
    }
Example 2: Compute Timestamp from Timestamp + Duration in pseudo code.
    Timestamp start = …;
    Duration duration = …;
    Timestamp end = …;

    end.seconds = start.seconds + duration.seconds;
    end.nanos = start.nanos + duration.nanos;

    if (end.nanos < 0) {
      end.seconds -= 1;
      end.nanos += 1000000000;
    } else if (end.nanos >= 1000000000) {
      end.seconds += 1;
      end.nanos -= 1000000000;
    }
Example 3: Compute Duration from datetime.timedelta in Python.
    td = datetime.timedelta(days=3, minutes=10)
    duration = Duration()
    duration.FromTimedelta(td)
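The pseudo code in Examples 1 and 2 translates directly into runnable Python. This sketch represents Timestamps and Durations as plain `(seconds, nanos)` tuples rather than protobuf messages; the function names are hypothetical and chosen only for this illustration.

```python
from typing import Tuple

NANOS_PER_SECOND = 1_000_000_000
Pair = Tuple[int, int]  # (seconds, nanos)


def duration_between(start: Pair, end: Pair) -> Pair:
    """Example 1: Duration from two Timestamps, with sign normalization."""
    seconds = end[0] - start[0]
    nanos = end[1] - start[1]
    if seconds < 0 and nanos > 0:
        seconds += 1
        nanos -= NANOS_PER_SECOND
    elif seconds > 0 and nanos < 0:
        seconds -= 1
        nanos += NANOS_PER_SECOND
    return seconds, nanos


def add_duration(start: Pair, duration: Pair) -> Pair:
    """Example 2: Timestamp + Duration, keeping nanos in [0, 1e9)."""
    seconds = start[0] + duration[0]
    nanos = start[1] + duration[1]
    if nanos < 0:
        seconds -= 1
        nanos += NANOS_PER_SECOND
    elif nanos >= NANOS_PER_SECOND:
        seconds += 1
        nanos -= NANOS_PER_SECOND
    return seconds, nanos
```

The normalization step in `duration_between` ensures that `seconds` and `nanos` never carry opposite signs.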
# JSON Mapping
In JSON format, the Duration type is encoded as a string rather than an object, where the string ends in the suffix “s” (indicating seconds) and is preceded by the number of seconds, with nanoseconds expressed as fractional seconds. For example, 3 seconds with 0 nanoseconds should be encoded in JSON format as “3s”, while 3 seconds and 1 nanosecond should be expressed in JSON format as “3.000000001s”, and 3 seconds and 1 microsecond should be expressed in JSON format as “3.000001s”.
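A short sketch of this encoding, using only the standard library. The function name `duration_to_json` is hypothetical, and trimming trailing zeros from the fraction is a choice made here so that the output matches the three examples in the paragraph above; other serializers may pad the fraction to a fixed width.

```python
def duration_to_json(seconds: int, nanos: int) -> str:
    """Encode a (seconds, nanos) Duration as a JSON string such as "3.000001s"."""
    if (seconds > 0 and nanos < 0) or (seconds < 0 and nanos > 0):
        raise ValueError("seconds and nanos must not have opposite signs")
    if abs(nanos) > 999_999_999:
        raise ValueError("nanos out of range")
    sign = "-" if (seconds < 0 or nanos < 0) else ""
    whole, frac = abs(seconds), abs(nanos)
    if frac == 0:
        return f"{sign}{whole}s"
    digits = f"{frac:09d}".rstrip("0")  # nanoseconds as fractional seconds
    return f"{sign}{whole}.{digits}s"
```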
| Field | Type | Label | Description |
|---|---|---|---|
| seconds | | | Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
| nanos | | | Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive. |
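The bound and the sign rule in the table can be checked with a few lines of Python. This is a sketch only; `MAX_SECONDS` and `is_valid_duration` are hypothetical names, and the constant is computed with integer arithmetic (365.25 days written as 36525 / 100) to avoid floating point.

```python
# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_SECONDS = 60 * 60 * 24 * 36525 // 100 * 10000
MAX_NANOS = 999_999_999


def is_valid_duration(seconds: int, nanos: int) -> bool:
    """Check the documented range limits and the same-sign rule."""
    if abs(seconds) > MAX_SECONDS or abs(nanos) > MAX_NANOS:
        return False
    # A non-zero nanos must not have the opposite sign of a non-zero seconds.
    if (seconds > 0 and nanos < 0) or (seconds < 0 and nanos > 0):
        return False
    return True
```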
google/protobuf/struct.proto#
ListValue#
ListValue is a wrapper around a repeated field of values.
The JSON representation for ListValue is JSON array.
Struct#
Struct represents a structured data value, consisting of fields which map to dynamically typed values. In some languages, Struct might be supported by a native representation. For example, in scripting languages like JS a struct is represented as an object. The details of that representation are described together with the proto support for the language.
The JSON representation for Struct is JSON object.
| Field | Type | Label | Description |
|---|---|---|---|
| fields | | repeated | Unordered map of dynamically typed values. |
Struct.FieldsEntry#
Value#
Value represents a dynamically typed value which can be either null, a number, a string, a boolean, a recursive struct value, or a list of values. A producer of value is expected to set one of these variants. Absence of any variant indicates an error.
The JSON representation for Value is JSON value.
| Field | Type | Label | Description |
|---|---|---|---|
| null_value | | | Represents a null value. |
| number_value | | | Represents a double value. |
| string_value | | | Represents a string value. |
| bool_value | | | Represents a boolean value. |
| struct_value | | | Represents a structured value. |
| list_value | | | Represents a repeated Value. |
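To show how a dynamically typed value maps onto exactly one of these variants, the sketch below picks the variant name for a native Python value, following the same split that JSON uses. The function `value_variant` is hypothetical and does not touch the protobuf library; it only returns the field name that a producer would be expected to set.

```python
from typing import Any


def value_variant(v: Any) -> str:
    """Return the name of the Value field that would hold this Python value."""
    if v is None:
        return "null_value"
    # bool must be tested before numbers: in Python, bool is a subclass of int.
    if isinstance(v, bool):
        return "bool_value"
    if isinstance(v, (int, float)):
        return "number_value"  # stored as a double
    if isinstance(v, str):
        return "string_value"
    if isinstance(v, dict):
        return "struct_value"
    if isinstance(v, (list, tuple)):
        return "list_value"
    raise TypeError(f"no Value variant for {type(v).__name__}")
```

Because `number_value` is a double, large integers may lose precision when stored this way.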
NullValue#
NullValue is a singleton enumeration to represent the null value for the Value type union.
The JSON representation for NullValue is JSON null.
| Name | Number | Description |
|---|---|---|
| NULL_VALUE | 0 | Null value. |