Commands Cheat Sheet


Installation

pip install deepspeed
Install DeepSpeed library

pip install deepspeed-mii
Install DeepSpeed Model Implementations for Inference (MII)

Configuration

deepspeed --help
Display DeepSpeed CLI help

deepspeed train.py --deepspeed_config ds_config.json
Launch training with a DeepSpeed configuration file (the flag is parsed by the training script, so it goes after the script name)
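A minimal DeepSpeed configuration, sketched here as a Python dict (`deepspeed.initialize` accepts either a `ds_config.json` path or a dict via its `config` parameter). The specific values are illustrative, not recommendations:

```python
# Minimal DeepSpeed configuration, equivalent to a ds_config.json file.
# Values are illustrative placeholders, not tuned recommendations.
ds_config = {
    "train_batch_size": 32,             # global batch size across all GPUs
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},          # mixed-precision training
    "zero_optimization": {"stage": 2},  # partition optimizer state + gradients
}

# Passed to DeepSpeed at startup, e.g.:
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=params, config=ds_config)
```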

Training

deepspeed --num_gpus=4 train.py
Launch training script with 4 GPUs

deepspeed --num_nodes=2 --num_gpus=8 train.py
Distributed training across 2 nodes with 8 GPUs each

model_engine, optimizer, _, _ = deepspeed.initialize(args=args, model=model, model_parameters=params)
Initialize DeepSpeed engine in code
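The engine returned by `deepspeed.initialize` takes over the usual optimizer calls. A sketch of one training epoch, assuming `model_engine` comes from the call above and `dataloader` yields `(inputs, labels)` batches already on the right device (`loss_fn` is whatever criterion your script uses):

```python
def train_one_epoch(model_engine, dataloader, loss_fn):
    """One epoch with a DeepSpeed engine: forward pass, engine-managed
    backward, and engine-managed optimizer step."""
    for inputs, labels in dataloader:
        outputs = model_engine(inputs)  # forward through the wrapped model
        loss = loss_fn(outputs, labels)
        model_engine.backward(loss)     # replaces loss.backward()
        model_engine.step()             # replaces optimizer.step() + zero_grad()
    return loss                         # last batch's loss, for logging
```

Gradient accumulation and loss scaling are handled inside `backward()`/`step()` according to the config, which is why the loop has no explicit `zero_grad()`.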

Monitoring

"tensorboard": {"enabled": true, "output_path": "./logs"}
Enable TensorBoard logging (set in ds_config.json; there is no launcher flag for this)

"wandb": {"enabled": true}
Enable Weights & Biases integration (set in ds_config.json)

Checkpointing

model_engine.save_checkpoint(save_dir)
Save model checkpoint

_, client_state = model_engine.load_checkpoint(load_dir)
Load model checkpoint

model_engine.save_checkpoint(save_dir, tag=f"epoch_{epoch}")
Save a tagged checkpoint each epoch; under ZeRO-3 every rank must call this so its parameter partition is saved
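`save_checkpoint`/`load_checkpoint` also round-trip arbitrary bookkeeping through `client_state`. A sketch, assuming `model_engine` is a DeepSpeed engine (the helper names here are our own, not DeepSpeed API):

```python
def save_training_state(model_engine, save_dir, epoch, step):
    # client_state is an arbitrary dict DeepSpeed stores alongside the weights
    model_engine.save_checkpoint(
        save_dir, client_state={"epoch": epoch, "step": step})

def resume_training_state(model_engine, load_dir):
    # load_checkpoint returns (load_path, client_state);
    # client_state is None when no checkpoint was found
    _, client_state = model_engine.load_checkpoint(load_dir)
    if client_state is None:
        return 0, 0  # start fresh
    return client_state["epoch"], client_state["step"]
```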

Inference

model = deepspeed.init_inference(model, mp_size=2, dtype=torch.half)
Initialize model for inference with tensor parallelism

import mii
Import DeepSpeed-MII (the package is `mii`, installed by `pip install deepspeed-mii`)

client = mii.serve("gpt2")
Start a persistent MII inference server for a Hugging Face model and get a client for queries (MII v0.1+ API)

Profiling

deepspeed --autotuning=tune train.py
Enable autotuning for optimal performance

"flops_profiler": {"enabled": true}
Enable the FLOPS profiler (set in ds_config.json; there is no launcher flag for this)

model_engine.flops_profiler.start_profile()
Start FLOPS profiling

model_engine.flops_profiler.stop_profile()
Stop FLOPS profiling

model_engine.flops_profiler.print_model_profile()
Print the collected per-module FLOPS results
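With the profiler enabled in ds_config.json, one profiled step might look like the sketch below. The helper name is ours; it assumes `model_engine` exposes `flops_profiler` (which DeepSpeed attaches when the config enables it) and a `loss_fn` like the one in your training loop:

```python
def profile_one_step(model_engine, batch, labels, loss_fn):
    """Run a single forward/backward step under the FLOPS profiler."""
    prof = model_engine.flops_profiler
    prof.start_profile()          # begin counting FLOPS and parameters
    loss = loss_fn(model_engine(batch), labels)
    model_engine.backward(loss)
    model_engine.step()
    prof.stop_profile()           # stop counting
    prof.print_model_profile()    # dump the per-module FLOPS breakdown
    prof.end_profile()            # remove profiler hooks
    return loss
```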

ZeRO Optimization

"zero_optimization": {"stage": 1}
Enable ZeRO stage 1 (optimizer state partitioning) in ds_config.json; ZeRO is configured in the JSON file, not via launcher flags

"zero_optimization": {"stage": 2}
Enable ZeRO stage 2 (optimizer + gradient partitioning)

"zero_optimization": {"stage": 3}
Enable ZeRO stage 3 (optimizer + gradient + parameter partitioning)

"zero_optimization": {"stage": 2, "offload_optimizer": {"device": "cpu"}}
Offload optimizer state to CPU with ZeRO
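The stages above can be composed into the `zero_optimization` section programmatically. A small helper sketch (the function is hypothetical, our own convenience wrapper; DeepSpeed itself just reads the resulting dict or JSON):

```python
def zero_config(stage, cpu_offload=False):
    """Build the zero_optimization section of a DeepSpeed config.
    Hypothetical helper for illustration only."""
    section = {"stage": stage}
    if cpu_offload:
        # Offload optimizer state to CPU; stage 3 can also offload parameters
        section["offload_optimizer"] = {"device": "cpu"}
        if stage == 3:
            section["offload_param"] = {"device": "cpu"}
    return {"zero_optimization": section}
```

For example, `zero_config(3, cpu_offload=True)` yields a stage-3 config that offloads both optimizer state and parameters to CPU, trading GPU memory for host-to-device traffic.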