flytekit.types.directory.FlyteDirectory

class flytekit.types.directory.FlyteDirectory(path: str, downloader: Optional[Callable] = None, remote_directory=None)

Warning

This class should not be used on very large datasets, as merely listing the dataset will cause the entire dataset to be downloaded. Listing on S3 and other backend object stores is not consistent, and ideally listing should not require the data to be downloaded at all.

Please first read through the comments on the flytekit.types.file.FlyteFile class as the implementation here is similar.

One thing to note is that the os.PathLike type that comes with Python is used as a stand-in for FlyteFile. That is, if a task returns an os.PathLike, Flyte takes that to mean FlyteFile. There is no easy way to distinguish an os.PathLike where the user means a file from one where the user means a directory. As such, if you want to use a directory, you must declare all types as FlyteDirectory. You can still return a plain str instead of a full-fledged FlyteDirectory object, provided the str is the path of a directory.
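The ambiguity described above can be seen with the standard library alone. The following is a minimal sketch, not flytekit code: the helper name describe is hypothetical. Both a file path and a directory path satisfy os.PathLike, so only a runtime filesystem check, not the type, tells them apart.

```python
import os
import tempfile
from pathlib import Path


def describe(p: os.PathLike) -> str:
    """Hypothetical helper: classify a path by inspecting the filesystem."""
    return "directory" if os.path.isdir(p) else "file"


with tempfile.TemporaryDirectory() as tmp:
    dir_path = Path(tmp)
    file_path = dir_path / "data.txt"
    file_path.write_text("hello")

    # Both objects are os.PathLike, so an os.PathLike annotation cannot
    # express "file" versus "directory" on its own.
    both_pathlike = isinstance(dir_path, os.PathLike) and isinstance(file_path, os.PathLike)

    # The distinction only shows up when the filesystem is consulted.
    kinds = (describe(dir_path), describe(file_path))
```

This is why an explicit FlyteDirectory annotation is needed: the declared type carries the intent that the path alone cannot.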

Use cases as inputs

def t1(in1: FlyteDirectory):
    ...

def t1(in1: FlyteDirectory["svg"]):
    ...

As outputs:

The contents of this local directory will be uploaded to the Flyte store.

return FlyteDirectory("/path/to/dir/")

return FlyteDirectory["svg"]("/path/to/dir/", remote_directory="s3://special/output/location")

Similar to the FlyteFile example, if you give an already remote location, it will not be copied to Flyte’s durable store; the URI is simply stored as is.

return FlyteDirectory("s3://some/other/folder")

Note that if you write a path starting with http/s and anything ever tries to read it (i.e. uses the literal as an input), it will fail because the http proxy doesn’t know how to download whole directories.
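The three output cases described above (local path, already-remote URI, http/s URI) can be summarized as a small decision function. This is only a sketch of the documented behavior under the assumption that the URI scheme is what separates the cases; classify_output_path and its return labels are hypothetical and do not reflect flytekit's internal logic.

```python
from urllib.parse import urlparse


def classify_output_path(path: str) -> str:
    """Hypothetical helper mirroring the prose; not flytekit's implementation."""
    scheme = urlparse(path).scheme
    if scheme in ("http", "https"):
        # Stored as is, but reading it back as a directory is expected to fail.
        return "keep_uri_unreadable"
    if scheme and scheme != "file":
        # Already remote (for example s3://...): the URI is stored unchanged.
        return "keep_uri"
    # Plain local path: the directory contents are uploaded to the Flyte store.
    return "upload"


local_case = classify_output_path("/path/to/dir/")
remote_case = classify_output_path("s3://some/other/folder")
http_case = classify_output_path("https://example.com/dir")
```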

The [] format specifier is still available because Flyte stores directories as Blob types, just like files, and the Blob type has a format field. What distinguishes a directory from a file is the dimensionality field of the BlobType.
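To make the relationship between format and dimensionality concrete, here is a small stand-in model written with dataclasses. It is a sketch only: BlobTypeSketch is a hypothetical name, and the enum members SINGLE and MULTIPART are assumed to mirror the dimensionality values in Flyte's type definitions rather than being imported from flytekit.

```python
from dataclasses import dataclass
from enum import Enum


class BlobDimensionality(Enum):
    # Assumed to mirror Flyte's dimensionality values.
    SINGLE = 0     # one blob, i.e. a file
    MULTIPART = 1  # many blobs under a prefix, i.e. a directory


@dataclass(frozen=True)
class BlobTypeSketch:
    """Hypothetical stand-in for the Blob type described above."""
    format: str
    dimensionality: BlobDimensionality


# A file and a directory annotated with the same "svg" format share the
# format field and differ only in dimensionality.
svg_file = BlobTypeSketch(format="svg", dimensionality=BlobDimensionality.SINGLE)
svg_dir = BlobTypeSketch(format="svg", dimensionality=BlobDimensionality.MULTIPART)
```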

__init__(path: str, downloader: Optional[Callable] = None, remote_directory=None)
Parameters
  • path – The source path that users are expected to call open() on

  • downloader – Optional function that can be passed to delay downloading of the actual files until a user actually calls open().

  • remote_directory – Set this if the user wants to return a directory and also specify the remote location it should be uploaded to.
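The deferred-download idea behind the downloader parameter can be sketched with a small os.PathLike class. This is a simplified stand-in for illustration, not flytekit's FlyteDirectory: the class name LazyDirectory is hypothetical, and triggering the download from __fspath__ is an assumption about how such laziness could be implemented.

```python
import os
from typing import Callable, List, Optional


class LazyDirectory(os.PathLike):
    """Simplified stand-in; defers the download until the path is used."""

    def __init__(self, path: str, downloader: Optional[Callable[[], None]] = None):
        self.path = path
        self._downloader = downloader
        self.downloaded = False

    def __fspath__(self) -> str:
        # Run the downloader at most once, the first time the path is needed.
        if self._downloader is not None and not self.downloaded:
            self._downloader()
            self.downloaded = True
        return self.path


calls: List[str] = []
d = LazyDirectory("/tmp/example", downloader=lambda: calls.append("download"))

calls_before = list(calls)  # nothing has been fetched at construction time
first = os.fspath(d)        # first use triggers the download
second = os.fspath(d)       # later uses do not download again
```

Because the object implements __fspath__, functions such as os.listdir or open accept it directly, which is what lets the download happen only when the data is actually touched.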

Methods

__init__(path[, downloader, remote_directory])

The source path that users are expected to call open() on.

extension()

Attributes

downloaded

path

remote_directory

remote_source

If the original path is a remote location such as s3://something, flytekit will download the directory for the user.