Working Notes: a commonplace notebook for recording & exploring ideas.
Parallelism

After repeatedly hearing terms like FSDP, Tensor Parallelism, Model Parallelism, and Pipeline Parallelism, I wanted to write them out in my own words.

Looking around, there's also work trying to use more heterogeneous systems.

Collectives

Types of Parallelism

Data Parallel

Model Parallel

Tensor Parallel

Pipeline Parallel

Fully Sharded Data Parallel, FSDP

Shard model parameters, gradients, and optimizer states across GPUs, so each GPU stores only its own slice instead of a full replica. This saves a lot of memory and can be very convenient.
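
To make the sharding idea concrete for myself, here is a toy, single-process sketch in plain Python. It is not PyTorch's FSDP API: the helper names (`shard`, `all_gather`, `reduce_scatter`, `fsdp_step`) are made up for illustration, the "ranks" are just list entries, and the loss `0.5 * (w . x)^2` is an arbitrary choice so that the gradient depends on the full parameters. The assumed flow is: gather the full parameters, compute gradients on each rank's local batch, reduce-scatter the gradients, and update only the local shard.

```python
# Toy simulation of FSDP-style sharding. No GPUs, no torch; each "rank" is a list.

def shard(flat, world_size):
    """Pad a flat parameter list to a multiple of world_size and split it evenly."""
    pad = (-len(flat)) % world_size
    padded = flat + [0.0] * pad
    size = len(padded) // world_size
    return [padded[r * size:(r + 1) * size] for r in range(world_size)]

def all_gather(shards):
    """Every rank receives the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

def reduce_scatter(per_rank_grads):
    """Sum gradients elementwise across ranks; rank r keeps only slice r of the sum."""
    world_size = len(per_rank_grads)
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    size = len(summed) // world_size
    return [summed[r * size:(r + 1) * size] for r in range(world_size)]

def fsdp_step(param_shards, batches, n_params, lr=0.1):
    """One simulated step: gather params, compute local grads, scatter, update shards."""
    world_size = len(param_shards)
    # 1. Each rank temporarily materializes the full (padded) parameter vector.
    full_params = all_gather(param_shards)
    # 2. Each rank computes the gradient of 0.5 * (w . x)^2 on its own batch.
    grads = []
    for rank in range(world_size):
        w = full_params[rank]
        x = batches[rank] + [0.0] * (len(w) - n_params)  # pad input like the params
        y = sum(wi * xi for wi, xi in zip(w, x))
        grads.append([y * xi for xi in x])
    # 3. Gradients are summed across ranks and split so each rank gets its slice.
    grad_shards = reduce_scatter(grads)
    # 4. Each rank updates only the shard it owns, using the averaged gradient.
    return [
        [w - lr * g / world_size for w, g in zip(ws, gs)]
        for ws, gs in zip(param_shards, grad_shards)
    ]

params = [1.0, 2.0, 3.0, 4.0, 5.0]
param_shards = shard(params, world_size=2)
batches = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]
param_shards = fsdp_step(param_shards, batches, n_params=len(params))
print(param_shards)
```

In this sketch each rank persistently stores only about `len(params) / world_size` values; the full vector exists only briefly inside `fsdp_step`. Gradients and optimizer state would be sliced the same way, which is where the memory saving comes from.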

References

Kunal