Metaflow TensorFlow decorator

The tf.distribute.Strategy allows TensorFlow developers to distribute model training to multiple GPUs/TPUs and machines. This repository implements the Metaflow @tensorflow decorator, which sets up a multi-node Metaflow step to use this functionality.

Features

Installation

Install this experimental module:

pip install metaflow-tensorflow

Getting Started

This package adds a Metaflow extension to your existing Metaflow installation, so you can use the @tensorflow decorator.

from metaflow import FlowSpec, step, tensorflow, ...

The rest of this README.md file describes how you can use TensorFlow with Metaflow in the single-node case and in the multi-node case, which requires @tensorflow.

TensorFlow Distributed on Metaflow guide

The examples in this repository are based on the original TensorFlow Examples.

Examples and guides

| Directory | TensorFlow script description |
| :--- | :--- |
| MirroredStrategy | Synchronous distributed training on multiple GPUs on one machine. |
| MultiWorkerMirroredStrategy | Synchronous distributed training across multiple workers, each with potentially multiple GPUs. |
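
For context on the multi-worker case: TensorFlow's MultiWorkerMirroredStrategy finds its peers through the `TF_CONFIG` environment variable, a JSON document with a `cluster` section and a `task` section. The sketch below only illustrates the shape of that variable for a two-worker cluster. It makes no claim about how @tensorflow builds the value internally; the helper `build_tf_config`, the hostnames, and the port are hypothetical placeholders.

```python
import json
import os


def build_tf_config(worker_hosts, task_index):
    """Return a TF_CONFIG JSON string for one worker in a worker-only cluster."""
    if not 0 <= task_index < len(worker_hosts):
        raise ValueError("task_index must refer to an entry in worker_hosts")
    config = {
        # Every worker sees the same cluster description ...
        "cluster": {"worker": list(worker_hosts)},
        # ... and identifies itself by its position in that list.
        "task": {"type": "worker", "index": task_index},
    }
    return json.dumps(config)


# Hypothetical two-node cluster; hostnames and port are placeholders.
hosts = ["node-0.example.internal:12345", "node-1.example.internal:12345"]

# TensorFlow reads TF_CONFIG when the strategy is created,
# so it has to be set before that point.
os.environ["TF_CONFIG"] = build_tf_config(hosts, task_index=0)
```

Each node would set the same `cluster` section and a different `task.index`. Configuring this by hand on every node is the kind of setup a multi-node decorator is meant to spare you.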

Parameter Server

Not yet tested. Please reach out to the Outerbounds team if you need help.

Installing TensorFlow for GPU usage in Metaflow

From TensorFlow documentation: Do not install TensorFlow with conda. It may not have the latest stable version. pip is recommended since TensorFlow is only officially released to PyPI.

We have found the easiest way to install TensorFlow for GPU is to use the pre-made Docker image tensorflow/tensorflow:latest-gpu.
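
Whichever way you install TensorFlow, it can help to confirm from inside the step that it actually sees the GPUs. The following is a minimal sketch of such a check; `count_visible_gpus` is a hypothetical helper name, and the import is guarded so the snippet also runs in an environment without TensorFlow.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Lists the physical GPU devices visible to the TensorFlow runtime.
    return len(tf.config.list_physical_devices("GPU"))


gpus = count_visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
else:
    print(f"TensorFlow sees {gpus} GPU(s).")
```

A count of zero inside a GPU image usually points to the container not being given access to the host's GPUs rather than to the TensorFlow install itself.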

Fault Tolerance

See the TensorFlow documentation on this matter. The TL;DR is to use a flavor of tf.distribute.Strategy, which implements mechanisms to handle worker failures gracefully.

License

metaflow-tensorflow is distributed under the Apache License.