Working Notes: a commonplace notebook for recording & exploring ideas.

2024-03-31

Spent the week reading more about LLMs and the different types of model parallelism (again). I keep re-reading about the different types of parallelism and then forgetting them; I suspect I'll only be able to reason about them properly once I've implemented a model by hand myself.

Parallelism

Anyway, HuggingFace has some excellent documentation on this. Writing out what I understand by hand again, just to try and remember it a little longer:

Data parallelism: duplicate the model across GPUs, and split each batch across the data-parallel groups. There are different types of data parallelism depending on how the GPUs synchronize. (Distributed data parallelism: run forward/backward on each GPU with different data, then average the gradients across GPUs before the optimizer step.)
Naive model parallelism: split the model by layers across GPUs. Activations flow forward from one GPU to the next, and gradients flow back through them in reverse during the backward pass.
Pipeline parallelism: naive model parallelism, but each batch is split into micro-batches so the calculations are pipelined and GPUs spend less time sitting idle.
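Tensor parallelism: split individual tensors (e.g. a layer's weight matrix) across GPUs instead of splitting by layer. Each GPU computes its slice of the operation and the partial results are combined, so tensor parallelism can be transparent to the rest of the modules. The main requirement is a really fast GPU interconnect, since results have to be exchanged within every layer.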
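The data and tensor parallelism items above can be checked with a toy, single-process sketch. This is my own illustration, not code from the HuggingFace docs. The simulated "GPUs" are just list entries, the model is a single linear layer y = x @ w with a mean-squared-error loss, and all function names are hypothetical. The sketch checks two claims:

- Averaging per-shard gradients reproduces the full-batch gradient. This is data parallelism, assuming equal-sized shards.
- Splitting the weight matrix by columns and concatenating the outputs reproduces the full matmul. This is one form of tensor parallelism.

Fractions are used so the comparisons are exact rather than approximate.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def mse_grad(x, w, target):
    """Gradient of mean((x @ w - target) ** 2) with respect to w."""
    pred = matmul(x, w)
    n = Fraction(len(pred) * len(pred[0]))  # number of output elements
    err = [[2 * (p - t) / n for p, t in zip(prow, trow)]
           for prow, trow in zip(pred, target)]
    return matmul(transpose(x), err)

def data_parallel_grad(x, w, target, num_gpus):
    """Give each simulated GPU an equal shard of the batch and a full copy of w,
    compute per-shard gradients, then average them (the all-reduce step)."""
    shard = len(x) // num_gpus
    grads = [mse_grad(x[g * shard:(g + 1) * shard], w,
                      target[g * shard:(g + 1) * shard])
             for g in range(num_gpus)]
    return [[sum(g[i][j] for g in grads) / num_gpus for j in range(len(w[0]))]
            for i in range(len(w))]

def tensor_parallel_forward(x, w, num_gpus):
    """Split w column-wise across simulated GPUs; each computes x @ w_slice,
    then the output slices are concatenated (the all-gather step)."""
    cols = len(w[0]) // num_gpus
    partials = [matmul(x, [row[g * cols:(g + 1) * cols] for row in w])
                for g in range(num_gpus)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # batch of 4 examples, 2 features
w = [[1, 0, 2, -1], [0, 1, 1, 3]]      # 2 -> 4 linear layer
target = [[1, 1, 1, 1]] * 4
print(data_parallel_grad(x, w, target, num_gpus=2) == mse_grad(x, w, target))  # True
print(tensor_parallel_forward(x, w, num_gpus=2) == matmul(x, w))               # True
```

The equivalence only holds because the shards are the same size. With uneven shards, a plain average would weight the smaller shard's examples more heavily.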

Of course, I still need to spend some time figuring out concrete numbers: how fast "fast" actually is, how large models get, and how much each type of operation costs.
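As a starting point for those numbers, here is a back-of-envelope sketch built on two standard textbook estimates. The function names are hypothetical, and the model size and bandwidths in the usage lines are placeholder assumptions, not measurements.

- Pipeline idle time: in a GPipe-style schedule with p stages and m micro-batches, each GPU sits idle for roughly (p - 1) / (m + p - 1) of the time, assuming every stage takes the same time per micro-batch.
- Data-parallel gradient sync: a ring all-reduce of S bytes across N GPUs sends about 2(N - 1)/N * S bytes per GPU. This sketch ignores latency and any overlap with compute.

```python
def pipeline_bubble_fraction(stages, micro_batches):
    """Fraction of time each GPU is idle in a GPipe-style schedule,
    assuming every stage takes the same time per micro-batch."""
    return (stages - 1) / (micro_batches + stages - 1)

def ring_allreduce_bytes_per_gpu(tensor_bytes, num_gpus):
    """Bytes each GPU sends in a ring all-reduce: a reduce-scatter plus an
    all-gather, each moving (N - 1) / N of the tensor."""
    return 2 * (num_gpus - 1) / num_gpus * tensor_bytes

def allreduce_seconds(tensor_bytes, num_gpus, bandwidth_bytes_per_s):
    """Bandwidth-only estimate; ignores latency and compute overlap."""
    return ring_allreduce_bytes_per_gpu(tensor_bytes, num_gpus) / bandwidth_bytes_per_s

params = 7e9              # hypothetical 7B-parameter model
grad_bytes = params * 2   # assuming 2-byte (bf16) gradients
for label, bandwidth in [("assumed fast interconnect", 300e9),
                         ("assumed slow link", 10e9)]:
    print(label, round(allreduce_seconds(grad_bytes, 8, bandwidth), 2), "s per step")

# More micro-batches shrink the bubble: compare 4 vs 32 micro-batches on 4 stages.
print(pipeline_bubble_fraction(4, 4), pipeline_bubble_fraction(4, 32))
```

The gap between the two bandwidth rows is the whole argument for a really fast interconnect. Tensor parallelism pays a similar communication cost inside every layer, not just once per step.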

Model Visualization

Wished into the ether for easier model visualization again.

There are several attempts at this, most of which render with graphviz and aren't particularly interactive or useful. Sometimes I wonder if I should hire someone with excellent JavaScript / three.js / canvas skills and just get something built.

Tailscale

Set up Tailscale on my personal laptops recently. After fixing my personal laptop so that it stays reachable once it's connected to the internet, I'm now updating this week's letter remotely.

The SSH web client is pretty amazing: I can use all the keyboard shortcuts I'd hoped for without Chrome intercepting them. My only complaint is that my color scheme seems a bit messed up.

With easy access to my personal laptop through a web CLI (and the option of exposing more services through it), I'll probably end up spending a lot more time using and building CLI-based applications.

— Kunal