flytekitplugins.awssagemaker.DistributedTrainingContext

class flytekitplugins.awssagemaker.DistributedTrainingContext(current_host: str, hosts: typing.List[str], network_interface_name: str)

Methods

Parameters
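  • current_host (str) – name of the container this code is running in, as reported by SageMaker

  • hosts (List[str]) – names of all containers taking part in the job

  • network_interface_name (str) – name of the network interface exposed to the container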

Return type

None

classmethod from_env()

The SageMaker documentation notes: “Hostname information might not be immediately available to the processing container. We recommend adding a retry policy on hostname resolution operations as nodes become available in the cluster.” (https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html#byoc-config). This is why we have an automatic retry policy.

Return type

flytekitplugins.awssagemaker.distributed_training.DistributedTrainingContext
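
The retry recommendation quoted above can be illustrated with a minimal sketch. This is not the plugin's own implementation: the helper name `resolve_with_retry`, its parameters, and the fixed delay between attempts are all assumptions made for illustration, using only the standard library.

```python
import socket
import time
from typing import Callable, Optional


def resolve_with_retry(
    hostname: str,
    attempts: int = 5,
    delay_seconds: float = 1.0,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve `hostname`, retrying while the name cannot be resolved yet.

    Hypothetical helper: the attempt count and fixed delay are illustrative
    choices, not values taken from flytekit or SageMaker.
    """
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return resolver(hostname)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            last_error = err
            # Sleep between attempts, but not after the final failure.
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise RuntimeError(
        f"could not resolve {hostname!r} after {attempts} attempts"
    ) from last_error
```

The `resolver` argument exists only so the retry loop can be exercised without real DNS; in a cluster it would be left at its default.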

classmethod from_sagemaker_context_file()

Return type

flytekitplugins.awssagemaker.distributed_training.DistributedTrainingContext
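
This method carries no description, so the sketch below only illustrates what building such a context from a file might look like. SageMaker training containers receive a resource configuration file (`/opt/ml/input/config/resourceconfig.json`) whose JSON keys match the three attributes of this class. Whether `from_sagemaker_context_file` reads exactly that file is an assumption here, and `TrainingContextSketch` and `context_from_json` are hypothetical stand-ins, not part of the plugin.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingContextSketch:
    """Hypothetical stand-in mirroring the three documented attributes."""

    current_host: str
    hosts: List[str]
    network_interface_name: str


def context_from_json(text: str) -> TrainingContextSketch:
    """Build a context from JSON shaped like SageMaker's resourceconfig.json."""
    data = json.loads(text)
    return TrainingContextSketch(
        current_host=data["current_host"],
        hosts=list(data["hosts"]),
        network_interface_name=data["network_interface_name"],
    )


# Sample payload with made-up values, in place of reading the real file.
sample = (
    '{"current_host": "algo-1", '
    '"hosts": ["algo-1", "algo-2"], '
    '"network_interface_name": "eth0"}'
)
ctx = context_from_json(sample)
```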

classmethod local_execute()

Creates a dummy local execution context for distributed execution. TODO: revisit whether this is a good idea.

Return type

flytekitplugins.awssagemaker.distributed_training.DistributedTrainingContext

Attributes

current_host: str
hosts: List[str]
network_interface_name: str
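
One common use of `current_host` and `hosts` is to derive a worker rank and world size for a distributed training framework. The sketch below assumes that every node sees the same set of host names, so that sorting them gives a consistent ordering; `rank_and_world_size` is a hypothetical helper, not something the plugin provides.

```python
from typing import List, Tuple


def rank_and_world_size(current_host: str, hosts: List[str]) -> Tuple[int, int]:
    """Return (rank, world_size) for `current_host` among `hosts`.

    Sorting makes the ordering independent of how each node received the list.
    Raises ValueError if `current_host` is not in `hosts`.
    """
    ordered = sorted(hosts)
    return ordered.index(current_host), len(ordered)


# Made-up host names for illustration.
rank, world_size = rank_and_world_size("algo-2", ["algo-2", "algo-1", "algo-3"])
```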