NIM

Tags: Inference, NVIDIA

Serve optimized model containers with NIM in a Flyte task.

NVIDIA NIM, part of NVIDIA AI Enterprise, provides a streamlined path for developing AI-powered enterprise applications and deploying AI models in production. It includes an out-of-the-box optimization suite that enables AI model deployment across any cloud, data center, or workstation. Because NIM can be self-hosted, you retain greater control over cost and data privacy, along with more visibility into behind-the-scenes operations.

With NIM, you can invoke the model’s endpoint as if it were hosted locally, minimizing network overhead.
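For instance, once the NIM container is running as a sidecar in the task’s pod, the task can talk to it over localhost. A minimal sketch, assuming the sidecar exposes NIM’s OpenAI-compatible API on its default port 8000 (the model name is illustrative):

from openai import OpenAI

# The NIM sidecar runs in the same pod as the task, so the endpoint is local.
# The API key is a placeholder; the local server does not validate it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="nim")

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(completion.choices[0].message.content)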

Installation

To use the NIM plugin, run the following command:

pip install flytekitplugins-inference

Example usage

For a usage example, see NIM example usage.
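The linked example covers the full workflow; the sketch below shows the general shape, assuming the plugin’s documented NIM and NIMSecrets interfaces (attribute names may differ across plugin versions) and illustrative secret, image, and model names:

from flytekit import Secret, task
from flytekitplugins.inference import NIM, NIMSecrets
from openai import OpenAI

# Declare the NIM sidecar: the serving image comes from NGC, and the secrets
# tell the plugin how to pull the image and authenticate at runtime.
nim_instance = NIM(
    image="nvcr.io/nim/meta/llama3-8b-instruct:1.0.0",
    secrets=NIMSecrets(
        ngc_image_secret="nvcrio-cred",  # Kubernetes imagePullSecret for nvcr.io
        ngc_secret_key="ngc-api-key",    # key of the secret holding the NGC API key
        secrets_prefix="_FSEC_",         # prefix flytekit prepends to injected secrets
    ),
)

@task(
    pod_template=nim_instance.pod_template,  # injects the NIM sidecar into the task pod
    secret_requests=[Secret(key="ngc-api-key")],
)
def generate() -> str:
    # The sidecar serves an OpenAI-compatible API local to the pod.
    client = OpenAI(base_url=f"{nim_instance.base_url}/v1", api_key="nim")
    completion = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    )
    return completion.choices[0].message.content

In practice you would also configure GPU resources for the sidecar; see the linked example for the full setup.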

Note

NIM can only be run in a Flyte cluster, not locally, because it must be deployed as a sidecar service in a Kubernetes pod.