flytekit.types.file.FlyteFile

class flytekit.types.file.FlyteFile(path: typing.Union[str, os.PathLike], downloader: typing.Callable = noop, remote_path: typing.Optional[typing.Union[os.PathLike, bool]] = None)

Parameters:

  • path (str | os.PathLike)

  • downloader (Callable)

  • remote_path (os.PathLike | bool | None)

Methods

download()
Return type:

str

classmethod extension()
Return type:

str

classmethod from_dict(d, *, dialect=None)
classmethod from_json(data, decoder=<function loads>, **from_dict_kwargs)
Return type:

T

classmethod from_source(source)

Create a new FlyteFile object with the remote source set to the given input.

Parameters:

source (str | os.PathLike)

Return type:

FlyteFile

classmethod new_remote_file(name=None)

Create a new FlyteFile object with a remote path.

Parameters:

name (str | None)

Return type:

FlyteFile

open(mode, cache_type=None, cache_options=None)

Returns a streaming file handle. For example:

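from flytekit import task
from flytekit.types.file import FlyteFile

@task
def copy_file(ff: FlyteFile) -> FlyteFile:
    new_file = FlyteFile.new_remote_file(ff.name)
    # cache_options is the keyword in open()'s signature (not cache)
    with ff.open("rb", cache_type="readahead", cache_options={}) as r:
        with new_file.open("wb") as w:
            w.write(r.read())
    return new_file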

Alternatively, open the file through fsspec directly:

import fsspec

@task
def copy_file(ff: FlyteFile) -> FlyteFile:
    new_file = FlyteFile.new_remote_file(ff.name)
    # remote_source holds the original remote location of an input file
    with fsspec.open(f"readahead::{ff.remote_source}", "rb", readahead={}) as r:
        with new_file.open("wb") as w:
            w.write(r.read())
    return new_file

Parameters:
  • mode (str) – Open mode, such as ‘rb’, ‘rt’, or ‘wb’.

  • cache_type (str | None) – Optional. Specifies whether caching is used and which kind. Accepts any cache type supported by fsspec (see https://filesystem-spec.readthedocs.io/en/latest/api.html#readbuffering). Especially useful for large file reads.

  • cache_options (Dict[str, Any] | None) – Optional. Refer to the fsspec caching options. The valid options are strongly coupled to the chosen cache_type.

to_dict()
to_json(encoder=<function dumps>, **to_dict_kwargs)
Return type:

str | bytes | bytearray

Attributes

downloaded
path: str | PathLike = None

There is no native Python type for files and directories that corresponds to the Flyte Blob type (the way int corresponds to Flyte’s Integer type), so Flytekit defines one so that users can express that their tasks take in or return a file. There is pathlib.Path, of course (which is usable in Flytekit as a return value, though not as a return type), but it made more sense to create a new type, especially since it can carry additional properties.

Files (and directories) differ from primitive types like floats and strings in that Flytekit typically uploads the contents of the files to the blob store connected with your Flyte installation. That is, the Python native value that represents a file is typically just the path to the file on the local filesystem. In Flyte, however, an instance of a file is represented by a Blob literal, with the uri field set to the location in the Flyte blob store (S3, GCS, etc.). Take a look at the data handling doc for a deeper discussion.

We decided not to support pathlib.Path as an input/output type: if you want the automatic upload/download behavior, use the FlyteFile type. If you do not, a str works just as well.

The prefix for where uploads go is set by the raw output data prefix setting, which should be set at registration time in the launch plan. See the option listed under flytectl register examples --help for more information. If not set in the launch plan, then your Flyte backend will specify a default. This default is itself configurable as well. Contact your Flyte platform administrators to change or ascertain the value.

In short, if a task returns "/path/to/file" and the task’s signature is set to return FlyteFile, then the contents of /path/to/file are uploaded.

You can also prevent the upload from happening; which behavior you get depends on the task/workflow signature. Keep in mind that in the backend, in Admin and in the blob store, there is only one type that represents files, the Blob type.

Whether or not an upload happens, the translation between Python native values and Flyte literal values depends on three things:

  • The declared Python type in the signature: flytekit.FlyteFile or os.PathLike. Note that os.PathLike is only a type in Python; you can’t instantiate it.

  • The type of the Python native value being returned: flytekit.FlyteFile, pathlib.Path, or str.

  • Whether the value being converted is a “remote” path or not. For instance, if a task returns a value of “http://www.google.com” as a FlyteFile, obviously it doesn’t make sense for us to try to upload that to the Flyte blob store. So no remote paths are uploaded. Flytekit considers a path remote if it starts with s3://, gs://, http(s)://, or even file://.
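The remote-path rule in the last bullet can be sketched as a simple prefix check. This is only an illustration of the rule as stated above, not Flytekit's actual implementation; the helper name looks_remote and the constant REMOTE_PREFIXES are hypothetical.

```python
# Hypothetical helper: the prefixes mirror the schemes named in the text above.
REMOTE_PREFIXES = ("s3://", "gs://", "http://", "https://", "file://")


def looks_remote(path: str) -> bool:
    """Return True if the path starts with one of the remote-style prefixes."""
    return path.startswith(REMOTE_PREFIXES)


for candidate in ("s3://bucket/key.csv", "http://www.google.com", "/tmp/local_file.csv"):
    print(candidate, looks_remote(candidate))
```

Under this rule, a value such as "/tmp/local_file.csv" is a candidate for upload, while anything that looks remote is passed through untouched.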

Converting from a Flyte literal value to a Python instance of FlyteFile

The Flyte IDL literal is always a Blob. The result depends on the expected Python type and on the Blob’s uri.

Expected Python type FlyteFile, uri matches http(s)/s3/gs

FlyteFile object stores the original string path, but points to a local file instead.

  • [fn] downloader: function that writes to path when opened

  • [fn] download: will trigger download

  • path: randomly generated local path that will not exist until downloaded

  • remote_path: None

  • remote_source: original http/s3/gs path

Expected Python type FlyteFile, uri matches /local/path

FlyteFile object just wraps the string.

  • [fn] downloader: noop function

  • [fn] download: raises exception

  • path: just the given path

  • remote_path: None

  • remote_source: None

Expected Python type os.PathLike, uri matches http(s)/s3/gs or /local/path

Basically this signals Flyte should stay out of the way. You still get a FlyteFile object (which implements the os.PathLike interface).

  • [fn] downloader: noop function, even if it’s http/s3/gs

  • [fn] download: raises exception

  • path: just the given path

  • remote_path: None

  • remote_source: None
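The cases above can be summarized with a small toy model. This is a sketch under stated assumptions, not Flytekit code: ToyFile, toy_from_literal, and noop are hypothetical names, the expected type is passed as a plain string, and the "download" merely creates an empty placeholder file.

```python
import os
import tempfile
import uuid
from dataclasses import dataclass
from typing import Callable, Optional

REMOTE_PREFIXES = ("http://", "https://", "s3://", "gs://")


def noop() -> None:
    """Downloader used when there is nothing to fetch."""


@dataclass
class ToyFile:
    """Hypothetical stand-in that carries the attributes listed above."""
    path: str
    downloader: Callable[[], None] = noop
    remote_path: Optional[str] = None
    remote_source: Optional[str] = None

    def download(self) -> str:
        if self.downloader is noop:
            raise ValueError("nothing to download for this file")
        self.downloader()
        return self.path


def toy_from_literal(uri: str, expected_type: str) -> ToyFile:
    """expected_type is "FlyteFile" or "os.PathLike" (plain strings in this toy)."""
    if expected_type == "FlyteFile" and uri.startswith(REMOTE_PREFIXES):
        # Random local path; nothing exists there until download() runs.
        local = os.path.join(tempfile.gettempdir(), uuid.uuid4().hex, os.path.basename(uri))

        def fetch() -> None:
            # Stand-in for a real transfer: just create an empty file.
            os.makedirs(os.path.dirname(local), exist_ok=True)
            open(local, "wb").close()

        return ToyFile(path=local, downloader=fetch, remote_source=uri)
    # os.PathLike, or a local uri: wrap the string and stay out of the way.
    return ToyFile(path=uri)


remote = toy_from_literal("s3://bucket/data.csv", "FlyteFile")
print(remote.remote_source, os.path.exists(remote.path))
```

Calling remote.download() in this toy creates the placeholder file, whereas toy_from_literal("s3://bucket/data.csv", "os.PathLike").download() raises, mirroring the "raises exception" entries above.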

Converting from a Python value (FlyteFile, str, or pathlib.Path) to a Flyte literal

The result depends on the type of the Python value, on whether its path is remote, and on the expected Python type.

Value is a str or pathlib.Path, path matches http(s)/s3/gs

  • Expected type FlyteFile or os.PathLike: Blob object is returned with uri set to the given path. No uploading happens.

Value is a str or pathlib.Path, path matches /local/path

  • Expected type FlyteFile: Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. Blob object is returned with uri pointing to the blob store location.

  • Expected type os.PathLike: No warning is logged since only a string is given (as opposed to a FlyteFile). Blob object is returned with uri set to just the given path. No uploading happens.

Value is a FlyteFile, path matches http(s)/s3/gs

  • Expected type FlyteFile or os.PathLike: Blob object is returned with uri set to the given path. Nothing is uploaded.

Value is a FlyteFile, path matches /local/path

  • Expected type FlyteFile: Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If remote_path is given, then that is used instead of the random path. Blob object is returned with uri pointing to the blob store location.

  • Expected type os.PathLike: Warning is logged since you’re passing a more complex object (a FlyteFile) and expecting a simpler interface (os.PathLike). Blob object is returned with uri set to just the given path. No uploading happens.
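The upload and warning outcomes described above reduce to a short decision function. The following is a minimal sketch of that decision only; Outcome and to_literal_outcome are hypothetical names, and types are passed as plain strings rather than real classes.

```python
from typing import NamedTuple

REMOTE_PREFIXES = ("http://", "https://", "s3://", "gs://")


class Outcome(NamedTuple):
    uploads: bool  # contents are copied to the blob store
    warns: bool    # a warning is logged


def to_literal_outcome(expected_type: str, value_type: str, path: str) -> Outcome:
    """expected_type: "FlyteFile" or "os.PathLike".
    value_type: "FlyteFile", "str", or "pathlib.Path"."""
    if path.startswith(REMOTE_PREFIXES):
        # Remote paths are never uploaded; the uri is the given path.
        return Outcome(uploads=False, warns=False)
    if expected_type == "FlyteFile":
        # Local path with a FlyteFile signature: contents are uploaded.
        return Outcome(uploads=True, warns=False)
    # os.PathLike signature with a local path: no upload, and a warning
    # only when a FlyteFile object was passed.
    return Outcome(uploads=False, warns=(value_type == "FlyteFile"))


print(to_literal_outcome("FlyteFile", "str", "/tmp/local_file.csv"))
print(to_literal_outcome("os.PathLike", "FlyteFile", "/tmp/local_file.csv"))
```

In every case the result is still a Blob literal; the function only models whether an upload or a warning accompanies it.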

Since Flyte file types have a string embedded in them as part of the type, you can add a format by specifying a string after the class, like so:

from flytekit.types.file import FlyteFile

def t2() -> FlyteFile["csv"]:
    return "/tmp/local_file.csv"
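
To illustrate how a class can carry a format string through subscripting, here is a minimal pure-Python sketch. It is not Flytekit's implementation: FormatTaggedFile and its _format attribute are hypothetical, and the assumption that extension() reports the subscripted string is made only for this example.

```python
class FormatTaggedFile:
    """Hypothetical class whose subscripted variants remember a format string."""

    _format = ""  # the unsubscripted class carries no format

    @classmethod
    def extension(cls) -> str:
        return cls._format

    def __class_getitem__(cls, fmt: str):
        # Build a subclass that records the format given in the brackets.
        return type(f"{cls.__name__}[{fmt}]", (cls,), {"_format": fmt})


CsvFile = FormatTaggedFile["csv"]
print(CsvFile.extension())
print(issubclass(CsvFile, FormatTaggedFile))
```

Because the subscripted class is a subclass of the original, anything written against the base type accepts the tagged variant as well.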
remote_path
remote_source

If this is an input to a task, and the original path is an s3 bucket, Flytekit downloads the file for the user. In case the user wants access to the original path, it will be here.