Spark API reference

Tags: Spark, Integration, DistributedComputing

This package provides the Spark integration for Flytekit: a task configuration class, a helper for creating Spark sessions, and type handlers for moving Spark DataFrames in and out of Flyte's schema and structured dataset types.

new_spark_session(name[, conf])

Optionally creates a new Spark session and returns it.
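
For illustration, a minimal sketch of using this helper to spin up a local session, assuming the function is importable from flytekitplugins.spark and that conf accepts a mapping of Spark settings (both are assumptions; the conf values are placeholders):

    # Sketch: create a local SparkSession for debugging or unit tests.
    # Assumption: `conf` takes a dict of Spark settings; the key below is a placeholder.
    from flytekitplugins.spark import new_spark_session

    sess = new_spark_session(
        name="local-debug",
        conf={"spark.driver.memory": "1g"},
    )
    df = sess.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    print(df.count())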

ParquetToSparkDecodingHandler()

Decodes structured datasets stored as Parquet into Spark DataFrames.
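
A sketch of where this decoder comes into play: opening a Parquet-backed StructuredDataset as a Spark DataFrame inside a Spark task. The task and input dataset here are hypothetical:

    # Sketch: reading a StructuredDataset input as a Spark DataFrame.
    import pyspark.sql
    from flytekit import task
    from flytekit.types.structured import StructuredDataset
    from flytekitplugins.spark import Spark

    @task(task_config=Spark())
    def count_parquet_rows(sd: StructuredDataset) -> int:
        # Opening a Parquet-backed StructuredDataset as a Spark DataFrame
        # routes through this decoding handler.
        df = sd.open(pyspark.sql.DataFrame).all()
        return df.count()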

Spark([spark_conf, hadoop_conf, ...])

Use this to configure a SparkContext for your task.
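
A minimal sketch of attaching this configuration to a task; the spark_conf values are placeholders, and the session is pulled from the task's execution context:

    import flytekit
    from flytekit import task
    from flytekitplugins.spark import Spark

    # Sketch: the spark_conf values below are placeholders, not recommendations.
    @task(
        task_config=Spark(
            spark_conf={
                "spark.driver.memory": "1g",
                "spark.executor.instances": "2",
            }
        )
    )
    def count_rows(n: int) -> int:
        sess = flytekit.current_context().spark_session
        return sess.range(n).count()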

SparkDataFrameSchemaReader(from_path, cols, fmt)

Implements how a Spark DataFrame should be read using the open method of FlyteSchema (a combined sketch of the read and write paths follows the writer entry below).

SparkDataFrameSchemaWriter(to_path, cols, fmt)

Implements how a Spark DataFrame should be written using the open method of FlyteSchema.
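
A combined sketch of both paths, assuming FlyteSchema's usual open/write/all interface; the column names and data are illustrative only:

    import flytekit
    import pyspark.sql
    from flytekit import task
    from flytekit.types.schema import FlyteSchema
    from flytekitplugins.spark import Spark

    @task(task_config=Spark())
    def write_schema() -> FlyteSchema:
        sess = flytekit.current_context().spark_session
        df = sess.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
        out = FlyteSchema()
        # Writer path: the Spark DataFrame schema writer backs this call.
        out.open(pyspark.sql.DataFrame).write(df)
        return out

    @task(task_config=Spark())
    def read_schema(s: FlyteSchema) -> int:
        # Reader path: the Spark DataFrame schema reader backs this call.
        df = s.open(pyspark.sql.DataFrame).all()
        return df.count()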

SparkDataFrameTransformer()

Transforms Spark DataFrames to and from a Schema (typed/untyped).
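
With this transformer registered, tasks can exchange pyspark.sql.DataFrame values directly as inputs and outputs. A sketch with placeholder data:

    import flytekit
    import pyspark.sql
    from flytekit import task
    from flytekitplugins.spark import Spark

    @task(task_config=Spark())
    def produce() -> pyspark.sql.DataFrame:
        sess = flytekit.current_context().spark_session
        # Placeholder data; the transformer serializes the returned DataFrame.
        return sess.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

    @task(task_config=Spark())
    def consume(df: pyspark.sql.DataFrame) -> int:
        return df.count()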