- class flytekit.types.directory.FlyteDirectory(path, downloader=None, remote_directory=None)
This class should not be used on very large datasets, as merely listing the dataset will cause the entire dataset to be downloaded. Listing on S3 and other backend object stores is not consistent and we should not need data to be downloaded to list.
Please first read through the comments on the flytekit.types.file.FlyteFile class, as the implementation here is similar.
One thing to note is that the os.PathLike type that comes with Python was used as a stand-in for FlyteFile. That is, if a task’s output signature is an os.PathLike, Flyte takes that to mean FlyteFile. There is no easy way to distinguish an os.PathLike where the user means a File and where the user means a Directory. As such, if you want to use a directory, you must declare all types as FlyteDirectory. You’ll still be able to return a string literal though instead of a full-fledged FlyteDirectory object, assuming the str is a directory.
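The ambiguity described above can be seen with the standard library alone. The following is a minimal sketch, not flytekit code: it only shows that a file path and a directory path both satisfy os.PathLike, so the annotation by itself carries no file-versus-directory information. The file name data.txt is an arbitrary placeholder.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    dir_path = pathlib.Path(tmp)
    file_path = dir_path / "data.txt"  # placeholder file name
    file_path.write_text("hello")

    # Both values satisfy os.PathLike, so the type alone cannot tell them apart.
    both_pathlike = isinstance(dir_path, os.PathLike) and isinstance(file_path, os.PathLike)

    # Only a runtime check against the filesystem reveals which is which.
    dir_is_dir = os.path.isdir(dir_path)
    file_is_dir = os.path.isdir(file_path)

print(both_pathlike, dir_is_dir, file_is_dir)
```

This is why a dedicated type is needed in task signatures: the distinction has to be stated in the annotation, since it cannot be inferred from os.PathLike.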
Converting from a Flyte literal value to a Python instance of FlyteDirectory
Type of Flyte IDL Literal
- uri matches http(s)/s3/gs: FlyteDirectory object stores the original string path, but points to a local file instead.
  - [fn] downloader: function that writes to path when open’ed.
  - [fn] download: will trigger download
  - path: randomly generated local path that will not exist until downloaded
  - remote_source: original http/s3/gs path
- uri matches /local/path: FlyteDirectory object just wraps the string
  - [fn] downloader: noop function
  - [fn] download: raises exception
  - path: just the given path
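The two cases above can be modelled with a small stand-in class. This is a hedged sketch using only the standard library, not the flytekit implementation: DirectoryHandle, from_literal_uri, the fetch callback, and the choice of ValueError are all hypothetical names and choices made for illustration.

```python
import os
import tempfile
import uuid
from dataclasses import dataclass
from typing import Callable, Optional

REMOTE_SCHEMES = ("http://", "https://", "s3://", "gs://")


def _noop() -> None:
    """Downloader used for local paths: there is nothing to fetch."""


@dataclass
class DirectoryHandle:
    """Hypothetical stand-in for the object built from a literal."""
    path: str
    downloader: Callable[[], None] = _noop
    remote_source: Optional[str] = None

    def download(self) -> str:
        # Local paths have no remote source, so downloading is an error.
        if self.remote_source is None:
            raise ValueError("nothing to download for a local path")
        self.downloader()
        return self.path


def from_literal_uri(uri: str, fetch: Callable[[str, str], None]) -> DirectoryHandle:
    """Build a handle from a literal's uri; fetch(src, dst) is an assumed callback."""
    if uri.startswith(REMOTE_SCHEMES):
        # Random local path; nothing exists there until the downloader runs.
        local = os.path.join(tempfile.gettempdir(), "dir_" + uuid.uuid4().hex)
        return DirectoryHandle(local, lambda: fetch(uri, local), uri)
    # Local uri: just wrap the string and keep the no-op downloader.
    return DirectoryHandle(uri)
```

Calling from_literal_uri on a remote uri performs no transfer by itself; the assumed fetch callback only runs once download() is called on the returned handle.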
Converting from a Python value (FlyteDirectory, str, or pathlib.Path) to a Flyte literal
Type of Python value
- str or pathlib.Path or FlyteDirectory
  - path matches http(s)/s3/gs: Blob object is returned with uri set to the given path. Nothing is uploaded.
  - path matches /local/path: Contents of the directory are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If remote_directory is given, then that is used instead of the random path. Blob object is returned with uri pointing to the blob store location.
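The decision described above can be sketched as a pure function. This is an illustrative model, not flytekit's code: plan_literal_uri is a hypothetical name, the override parameter is called remote_directory to follow the constructor signature, and the random suffix format is an assumption.

```python
import uuid
from typing import Optional, Tuple

REMOTE_SCHEMES = ("http://", "https://", "s3://", "gs://")


def plan_literal_uri(
    path: str,
    raw_output_data_prefix: str,
    remote_directory: Optional[str] = None,
) -> Tuple[str, bool]:
    """Return (uri to store in the Blob, whether an upload is needed)."""
    if path.startswith(REMOTE_SCHEMES):
        # Already remote: the uri is stored as is and nothing is uploaded.
        return path, False
    if remote_directory is not None:
        # Caller chose the destination explicitly.
        return remote_directory, True
    # Otherwise pick a random location under the configured prefix.
    return raw_output_data_prefix.rstrip("/") + "/" + uuid.uuid4().hex, True
```

Keeping the decision separate from the actual upload makes the three outcomes (pass through, explicit destination, random destination) easy to check in isolation.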
def t1(in1: FlyteDirectory):
    ...

def t1(in1: FlyteDirectory["svg"]):
    ...
The contents of this local directory will be uploaded to the Flyte store.
return FlyteDirectory("/path/to/dir/")

return FlyteDirectory["svg"]("/path/to/dir/", remote_directory="s3://special/output/location")
Similar to the FlyteFile example, if you give an already remote location, it will not be copied to Flyte’s durable store; the uri will just be stored as is.
Note that if you write a path starting with http/s and anything ever tries to read it (i.e. use the literal as an input), it’ll fail because the http proxy doesn’t know how to download whole directories.
The format bit is still there because in Flyte, directories are stored as Blob Types also, just like files, and the Blob type has the format field. The difference in the type field is represented in the dimensionality field of the Blob type.
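To make the format and dimensionality split concrete, here is a minimal stand-in for a Blob type. BlobTypeSketch is a hypothetical class written for this illustration; the enum member names SINGLE and MULTIPART are assumed to mirror the Flyte IDL and should be checked against it.

```python
from dataclasses import dataclass
from enum import Enum


class Dimensionality(Enum):
    SINGLE = 0     # a single file
    MULTIPART = 1  # a directory of files


@dataclass(frozen=True)
class BlobTypeSketch:
    """Hypothetical model: one format string plus a dimensionality flag."""
    format: str
    dimensionality: Dimensionality


# Same format, different dimensionality: a file type versus a directory type.
svg_file = BlobTypeSketch(format="svg", dimensionality=Dimensionality.SINGLE)
svg_dir = BlobTypeSketch(format="svg", dimensionality=Dimensionality.MULTIPART)
```

In this model, a type such as FlyteDirectory["svg"] would correspond to svg_dir: the format is shared with the file case and only the dimensionality differs.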
path – The source path that users are expected to call open() on
downloader – Optional function that can be passed to delay downloading of the actual file until a user actually calls open().
remote_directory – If the user wants to return something and also specify where it should be uploaded to.
- __init__(path, downloader=None, remote_directory=None)
path (str) – The source path that users are expected to call open() on
downloader (Optional[Callable]) – Optional function that can be passed to delay downloading of the actual file until a user actually calls open().
__init__(path[, downloader, remote_directory])
- param path
The source path that users are expected to call open() on
//something, flytekit will download the directory for the user.