flytekit.types.directory.FlyteDirectory

class flytekit.types.directory.FlyteDirectory(path, downloader=None, remote_directory=None)[source]

Warning

Do not use this class on very large datasets: merely listing the directory causes the entire dataset to be downloaded. Listing on S3 and other backend object stores is not consistent, and ideally listing should not require any data to be downloaded.

Please first read through the comments on the flytekit.types.file.FlyteFile class as the implementation here is similar.

One thing to note is that the os.PathLike type that comes with Python is used as a stand-in for FlyteFile. That is, if a task’s output signature is an os.PathLike, Flyte takes that to mean FlyteFile. There is no easy way to distinguish an os.PathLike where the user means a file from one where the user means a directory. As such, if you want to use a directory, you must declare all types as FlyteDirectory. You can still return a plain string instead of a full-fledged FlyteDirectory object, provided the string is the path of a directory.

Converting from a Flyte literal value to a Python instance of FlyteDirectory

Type of Flyte IDL Literal

FlyteDirectory

Multipart Blob

uri matches http(s)/s3/gs

FlyteDirectory object stores the original string path, but points to a local directory instead.

  • [fn] downloader: function that writes to path when open’ed.

  • [fn] download: will trigger download

  • path: randomly generated local path that will not exist until downloaded

  • remote_directory: None

  • remote_source: original http/s3/gs path

uri matches /local/path

FlyteDirectory object just wraps the string

  • [fn] downloader: noop function

  • [fn] download: raises exception

  • path: just the given path

  • remote_directory: None

  • remote_source: None
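The two cases above can be modelled with a small stand-in class. This is only a sketch of the described behaviour, not flytekit's implementation: LazyDirectory, from_uri, and the fetch callback are hypothetical names introduced for illustration, and the exception type raised for local paths is an assumption.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Callable, Optional

# URI schemes treated as remote, following the "uri matches http(s)/s3/gs" row.
REMOTE_PREFIXES = ("http://", "https://", "s3://", "gs://")


def _noop() -> None:
    """Downloader used for local paths: there is nothing to fetch."""


class LazyDirectory:
    """Hypothetical stand-in for FlyteDirectory; not flytekit code."""

    def __init__(
        self,
        path: str,
        downloader: Callable[[], None] = _noop,
        remote_source: Optional[str] = None,
    ):
        self.path = path
        self.remote_source = remote_source
        self.downloaded = False
        self._downloader = downloader

    def download(self) -> str:
        # Local directories have nothing to download (assumed exception type).
        if self.remote_source is None:
            raise ValueError("nothing to download for a local directory")
        # Fetch at most once, then reuse the local copy.
        if not self.downloaded:
            self._downloader()
            self.downloaded = True
        return self.path


def from_uri(uri: str, fetch: Callable[[str, str], None]) -> LazyDirectory:
    """Build a LazyDirectory from a literal's uri, following the two cases."""
    if uri.startswith(REMOTE_PREFIXES):
        # Random local path that does not exist until download() runs.
        local = str(Path(tempfile.gettempdir()) / uuid.uuid4().hex)
        return LazyDirectory(local, lambda: fetch(uri, local), remote_source=uri)
    # Local path: just wrap the string.
    return LazyDirectory(uri)
```

In this model the remote data is only touched when download() is called, which is the delay the downloader parameter exists to provide.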

Converting from a Python value (FlyteDirectory, str, or pathlib.Path) to a Flyte literal

Type of Python value

FlyteDirectory

str or pathlib.Path or FlyteDirectory

path matches http(s)/s3/gs

Blob object is returned with uri set to the given path. Nothing is uploaded.

path matches /local/path

The contents of the directory are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If remote_directory is given, it is used instead of the random path. A Blob object is returned with uri pointing to the blob store location.
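The decision described above can be sketched as a pure function. This is an illustrative model under stated assumptions, not flytekit's code: plan_upload is a hypothetical helper, the random path is assumed to be a UUID appended to the prefix, and remote_directory is named after the constructor parameter.

```python
import posixpath
import uuid
from typing import Optional, Tuple

# URI schemes treated as already remote, following "path matches http(s)/s3/gs".
REMOTE_PREFIXES = ("http://", "https://", "s3://", "gs://")


def plan_upload(
    path: str,
    raw_output_data_prefix: str,
    remote_directory: Optional[str] = None,
) -> Tuple[str, bool]:
    """Return (blob uri, needs_upload) for a directory path returned by a task."""
    if path.startswith(REMOTE_PREFIXES):
        # Already remote: the uri is stored as is and nothing is uploaded.
        return path, False
    if remote_directory is not None:
        # The user chose the upload location explicitly.
        return remote_directory, True
    # Otherwise pick a random location under the configured prefix (assumed scheme).
    return posixpath.join(raw_output_data_prefix, uuid.uuid4().hex), True
```

For example, plan_upload("s3://some/other/folder", "s3://bucket/raw") leaves the uri untouched and reports that no upload is needed, while a local path yields a location under the prefix.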

As inputs:

def t1(in1: FlyteDirectory):
    ...

def t1(in1: FlyteDirectory["svg"]):
    ...

As outputs:

The contents of this local directory will be uploaded to the Flyte store.

return FlyteDirectory("/path/to/dir/")

return FlyteDirectory["svg"]("/path/to/dir/", remote_directory="s3://special/output/location")

Similar to the FlyteFile example, if you give an already remote location, it will not be copied to Flyte’s durable store, the uri will just be stored as is.

return FlyteDirectory("s3://some/other/folder")

Note that if you return a path starting with http:// or https://, anything that later tries to read it (i.e. uses the literal as an input) will fail, because the HTTP proxy does not know how to download whole directories.

The format [] bit is still there because in Flyte, directories are stored as Blob types too, just like files, and the Blob type has a format field. The difference between a file and a directory is represented in the dimensionality field of the BlobType.
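As a rough illustration, the relationship between the format and dimensionality fields can be modelled with local types. The classes below are stand-ins defined here for the example, not imports from flytekit or flyteidl, and the two enum members reflect the single-file versus multipart distinction as an assumption about how the field is used.

```python
from dataclasses import dataclass
from enum import Enum


class BlobDimensionality(Enum):
    """Local model of the dimensionality field."""

    SINGLE = 0     # one file
    MULTIPART = 1  # a directory of files


@dataclass(frozen=True)
class BlobType:
    """Local model of a Blob type: a format string plus a dimensionality."""

    format: str
    dimensionality: BlobDimensionality


# A file and a directory annotated with the same "svg" format
# share the format field and differ only in dimensionality.
file_svg = BlobType(format="svg", dimensionality=BlobDimensionality.SINGLE)
dir_svg = BlobType(format="svg", dimensionality=BlobDimensionality.MULTIPART)
```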

Parameters
  • path – The source path that users are expected to call open() on

  • downloader – Optional function that is used to delay downloading of the actual files until a user actually calls open().

  • remote_directory – If the user wants to return something and also specify where it should be uploaded to.

__init__(path, downloader=None, remote_directory=None)[source]
Parameters
  • path (str) – The source path that users are expected to call open() on

  • downloader (Optional[Callable]) – Optional function that is used to delay downloading of the actual files until a user actually calls open().

  • remote_directory – If the user wants to return something and also specify where it should be uploaded to.

Methods

__init__(path[, downloader, remote_directory])

download()

extension()

Attributes

downloaded

path

remote_directory

remote_source

//something, flytekit will download the directory for the user.