flytekit.types.file.FlyteFile

class flytekit.types.file.FlyteFile(path, downloader=<function noop>, remote_path=None)[source]

Since there is no native Python implementation of files and directories for the Flyte Blob type, (like how int exists for Flyte’s Integer type) we need to create one so that users can express that their tasks take in or return a file. There is pathlib.Path of course, (which is usable in flytekit as a return value, though not a return type), but it made more sense to create a new type esp. since we can add on additional properties.

Files (and directories) differ from the primitive types like floats and string in that flytekit typically uploads the contents of the files to the blob store connected with your Flyte installation. That is, the Python native literal that represents a file is typically just the path to the file on the local filesystem. However in Flyte, an instance of a file is represented by a Blob literal, with the uri field set to the location in the Flyte blob store (AWS/GCS etc.).

We decided to not support pathlib.Path as an input/output type because if you wanted the automatic upload/download behavior, you should just use the FlyteFile type. If you do not, then a str works just as well.

The prefix for where uploads go is set by the raw output data prefix setting, which should be set at registration time. See the flytectl option for more information.

In short, if a task returns "/path/to/file" and the task’s signature is set to return FlyteFile, then the contents of /path/to/file are uploaded.

You can also make it so that the upload does not happen. There are a few different types you use for task/workflow signatures. Keep in mind that in the backend, in Admin and in the blob store, there is only one type that represents files, the Blob type.

Whether or not the uploading happens, and the behavior of the translation between Python native values and Flyte literal values depends on a few things:

  • The declared Python type in the signature. These can be * flytekit.FlyteFile * os.PathLike Note that os.PathLike is only a type in Python, you can’t instantiate it.

  • The type of the Python native value we’re returning. These can be * flytekit.FlyteFile * pathlib.Path * str

  • Whether the value being converted is a “remote” path or not. For instance if a task returns a value of “http://www.google.com” as a FlyteFile, obviously it doesn’t make sense for us to try to upload that to the Flyte blob store. So no remote paths are uploaded. flytekit considers a path remote if it starts with s3://, gs://, http(s)://, or even file://.

Converting from a Flyte literal value to a Python instance of FlyteFile

Expected Python type

Type of Flyte IDL Literal

FlyteFile

os.PathLike

Blob

uri matches http(s)/s3/gs

FlyteFile object stores the original string path, but points to a local file instead.

  • [fn] downloader: function that writes to path when open’ed.

  • [fn] download: will trigger download

  • path: randomly generated local path that will not exist until downloaded

  • remote_path: None

  • remote_source: original http/s3/gs path

Basically this signals Flyte should stay out of the way. You still get a FlyteFile object (which implements the os.PathLike interface)

  • [fn] downloader: noop function, even if it’s http/s3/gs

  • [fn] download: raises exception

  • path: just the given path

  • remote_path: None

  • remote_source: None

uri matches /local/path

FlyteFile object just wraps the string

  • [fn] downloader: noop function

  • [fn] download: raises exception

  • path: just the given path

  • remote_path: None

  • remote_source: None

Converting from a Python value (FlyteFile, str, or pathlib.Path) to a Flyte literal

Expected Python type

Type of Python value

FlyteFile

os.PathLike

str or pathlib.Path

path matches http(s)/s3/gs

Blob object is returned with uri set to the given path. No uploading happens.

path matches /local/path

Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. Blob object is returned with uri pointing to the blob store location.

No warning is logged since only a string is given (as opposed to a FlyteFile). Blob object is returned with uri set to just the given path. No uploading happens.

FlyteFile

path matches http(s)/s3/gs

Blob object is returned with uri set to the given path. Nothing is uploaded.

path matches /local/path

Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If remote_path is given, then that is used instead of the random path. Blob object is returned with uri pointing to the blob store location.

Warning is logged since you’re passing a more complex object (a FlyteFile) and expecting a simpler interface (os.PathLike). Blob object is returned with uri set to just the given path. No uploading happens.

Since Flyte file types have a string embedded in it as part of the type, you can add a format by specifying a string after the class like so.

def t2() -> flytekit_typing.FlyteFile["csv"]:
    return "/tmp/local_file.csv"
Parameters
  • path – The source path that users are expected to call open() on

  • downloader – Optional function that can be passed that used to delay downloading of the actual fil until a user actually calls open().

  • remote_path – If the user wants to return something and also specify where it should be uploaded to.

__init__(path, downloader=<function noop>, remote_path=None)[source]
Parameters
  • path (str) – The source path that users are expected to call open() on

  • downloader (Callable) – Optional function that can be passed that used to delay downloading of the actual fil until a user actually calls open().

  • remote_path – If the user wants to return something and also specify where it should be uploaded to.

Methods

__init__(path[, downloader, remote_path])

param path

The source path that users are expected to call open() on

download()

extension()

Attributes

downloaded

path

remote_path

remote_source

If this is an input to a task, and the original path is s3://something, flytekit will download the file for the user.