Installation and Setup
pip install 'horovod[tensorflow,pytorch,mxnet]'
Install Horovod with framework support (quote the extras so the shell does not glob the brackets)
horovodrun --check-build
Verify Horovod installation and supported frameworks
Basic Usage
horovodrun -np 4 -H localhost:4 python script.py
Run script with 4 processes on local machine
horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python script.py
Run script on 4 servers with 4 processes each
Framework Integration
import horovod.tensorflow as hvd
Import Horovod for TensorFlow
import horovod.torch as hvd
Import Horovod for PyTorch
import horovod.mxnet as hvd
Import Horovod for MXNet
hvd.init()
Initialize Horovod
hvd.size()
Get number of processes
hvd.rank()
Get rank of current process
hvd.local_rank()
Get local rank within node
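The calls above combine into a standard startup sequence. A minimal sketch using the PyTorch binding; it falls back to single-process values so it also runs where Horovod is not installed:

```python
# Typical Horovod startup: init, then query size/rank/local_rank.
try:
    import horovod.torch as hvd
    hvd.init()                      # must run before any other hvd call
    size = hvd.size()               # total number of processes
    rank = hvd.rank()               # global rank of this process
    local_rank = hvd.local_rank()   # rank within this node
except ImportError:
    size, rank, local_rank = 1, 0, 0  # single-process fallback

# Common pattern: pin each process to one GPU by local rank
# (requires a CUDA-enabled torch build, so left commented here):
# torch.cuda.set_device(local_rank)

print(f"process {rank}/{size} (local rank {local_rank})")
```

Launched with `horovodrun -np 4 python script.py`, each of the four processes prints its own rank.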
Distributed Operations
hvd.allreduce(tensor, name='allreduce')
Reduce tensor across all processes (averages by default)
hvd.allgather(tensor, name='allgather')
Gather tensors from all processes
hvd.broadcast(tensor, root_rank=0, name='broadcast')
Broadcast tensor from root rank to all processes
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
Broadcast model parameters (PyTorch)
hvd.broadcast_variables(tf_variables, root_rank=0)
Broadcast variables (TensorFlow)
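A sketch of the collectives above in the PyTorch binding, assuming horovod.torch and PyTorch are available (with a plain-Python fallback otherwise); the tensor values are illustrative:

```python
# allreduce averages a tensor element-wise across ranks;
# broadcast_parameters makes every rank start from rank 0's weights.
try:
    import torch
    import horovod.torch as hvd

    hvd.init()
    t = torch.tensor([1.0, 2.0, 3.0])
    avg = hvd.allreduce(t, name='example')  # element-wise average across ranks

    model = torch.nn.Linear(3, 1)
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
except ImportError:
    avg = [1.0, 2.0, 3.0]  # fallback when Horovod/torch are absent
```

With a single process, the allreduce average is just the input tensor; the pattern only changes results when run under `horovodrun` with multiple ranks.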
Optimizer Wrapping
hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
Wrap optimizer so gradients are averaged across processes (PyTorch; named_parameters is recommended for correct tensor naming)
opt = hvd.DistributedOptimizer(opt, backward_passes_per_step=1)
Accumulate gradients locally over N backward passes before each allreduce
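The pieces above fit together into the usual Horovod/PyTorch training recipe. A hedged sketch, assuming horovod.torch and PyTorch are installed; the model, data, and learning rate are placeholders:

```python
# Standard recipe: scale LR by world size, wrap the optimizer,
# broadcast initial state, then train as usual.
try:
    import torch
    import horovod.torch as hvd

    hvd.init()
    model = torch.nn.Linear(10, 1)
    # common practice: scale the learning rate by the number of workers
    opt = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
    # average gradients across ranks on each step;
    # backward_passes_per_step > 1 accumulates locally first
    opt = hvd.DistributedOptimizer(
        opt,
        named_parameters=model.named_parameters(),
        backward_passes_per_step=1,
    )
    # start every rank from identical model and optimizer state
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(opt, root_rank=0)

    x, y = torch.randn(4, 10), torch.randn(4, 1)
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()   # the wrapped optimizer allreduces gradients via hooks
    opt.step()
    trained = True
except ImportError:
    trained = False   # Horovod/torch not installed; skeleton only
```

The same skeleton runs unchanged under `horovodrun -np 1` and multi-node launches; only the number of ranks differs.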
Advanced Options
horovodrun -np 4 --timeline-filename timeline.json python script.py
Record a timeline (viewable in chrome://tracing) for performance analysis
horovodrun --verbose
Enable verbose logging
horovodrun --gloo
Force using Gloo as communication backend
horovodrun --mpi
Force using MPI as communication backend
Note: horovodrun has no --nccl flag; NCCL is selected with the HOROVOD_GPU_OPERATIONS=NCCL environment variable (see Environment Variables)
Environment Variables
export HOROVOD_GPU_OPERATIONS=NCCL
Set GPU operations backend
export HOROVOD_CPU_OPERATIONS=MPI
Set CPU operations backend
export HOROVOD_TIMELINE=timeline.json
Enable timeline recording
export HOROVOD_FUSION_THRESHOLD=67108864
Set tensor fusion buffer threshold in bytes (64 MB shown, which is the default)
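The variables above can be combined into a launch environment. An illustrative sketch for an NCCL-backed GPU run with timeline capture; the values and script name are placeholders:

```shell
# Illustrative launch environment; adjust values for your cluster.
export HOROVOD_GPU_OPERATIONS=NCCL        # use NCCL for GPU collectives
export HOROVOD_TIMELINE=timeline.json     # record a timeline for chrome://tracing
export HOROVOD_FUSION_THRESHOLD=67108864  # fuse small tensors up to 64 MB
# horovodrun -np 4 python script.py       # launch (commented: needs Horovod installed)
```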