Spark API reference#

Tags: Spark, Integration, DistributedComputing

This package contains the Spark plugin for Flytekit: task configuration, a session helper, and type transformers for Spark DataFrames.

new_spark_session(name[, conf])

Optionally creates a new Spark session and returns it.
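A minimal sketch of calling `new_spark_session` directly, assuming `flytekitplugins-spark` plus a working pyspark/Java setup; the call is guarded because neither is guaranteed, and the session name `"local-demo"` is illustrative.

```python
# Hedged sketch: new_spark_session outside a Flyte cluster. The import
# is guarded since flytekitplugins-spark and pyspark/Java may be absent.
try:
    from flytekitplugins.spark import new_spark_session

    sess = new_spark_session("local-demo")
    status = "created" if sess is not None else "skipped"
except Exception:
    status = "unavailable"  # pyspark or Java missing in this environment
print(status)
```

Inside a Flyte Spark task the plugin manages the session for you, so direct calls like this are mainly useful for local experimentation.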

ParquetToSparkDecodingHandler()

Extend this abstract class, implement the decode function, and register your concrete class with the StructuredDatasetTransformerEngine so that the core flytekit type engine can handle your dataframe library.

Spark([spark_conf, hadoop_conf, ...])

Use this to configure a SparkContext for your task.
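A minimal sketch of the plain string-to-string dicts that `Spark(spark_conf=..., hadoop_conf=...)` accepts. The property names below are standard Spark and Hadoop settings chosen for illustration; the specific keys and values are not required by flytekit.

```python
# Hedged sketch: configuration dicts for the Spark task config.
# All keys/values are illustrative Spark/Hadoop properties.
spark_conf = {
    "spark.driver.memory": "1g",
    "spark.executor.memory": "1g",
    "spark.executor.instances": "2",
}
hadoop_conf = {
    "fs.s3a.connection.maximum": "100",
}

# In a Flyte workflow these would typically be attached to a task:
#   from flytekit import task
#   from flytekitplugins.spark import Spark
#
#   @task(task_config=Spark(spark_conf=spark_conf, hadoop_conf=hadoop_conf))
#   def my_spark_task(n: int) -> float: ...
print(spark_conf["spark.executor.instances"])
```

Because the values are plain strings, any property understood by your Spark/Hadoop versions can be passed through unchanged.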

SparkDataFrameSchemaReader(from_path, cols, fmt)

Implements how a SparkDataFrame should be read using the open method of FlyteSchema.

SparkDataFrameSchemaWriter(to_path, cols, fmt)

Implements how a SparkDataFrame should be written using the open method of FlyteSchema.

SparkDataFrameTransformer()

Transforms Spark DataFrames to and from a Schema (typed/untyped).