Working Notes: a commonplace notebook for recording & exploring ideas.

2024-04-07

All over the place this week.

FZF for Rapid Application Development

A colleague (JL) built a very useful CLI tool that relied on stacking fzf invocations. I'm slowly realizing just how powerful fzf can be for building CLI applications quickly and painlessly, particularly applications that compose well.

This blog shows off a little of what's possible with fzf. The ability to build a UI simply by specifying commands to run is fairly amazing.
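The stacking pattern is simple enough to sketch in a few lines. This is my own minimal illustration, not JL's tool: `pick` pipes choices into fzf and reads the selection back, and `main` chains two invocations so each answer narrows the next menu. The host and action names are made up, and the fallback-to-first-choice branch is just there so the sketch degrades gracefully when fzf isn't installed.

```python
import shutil
import subprocess


def pick(choices, prompt="> "):
    """Ask the user to pick one of `choices` interactively via fzf.

    Falls back to the first choice when fzf isn't on PATH
    (e.g. in a non-interactive environment).
    """
    if shutil.which("fzf") is None:
        return choices[0]
    # fzf draws its UI on the terminal; the selection comes back on stdout.
    proc = subprocess.run(
        ["fzf", "--prompt", prompt],
        input="\n".join(choices),
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip()


def main():
    # Stack two fzf invocations: the first selection shapes the second menu.
    host = pick(["devbox1", "devbox2", "trainer17"], prompt="host> ")
    action = pick(["tail logs", "open shell", "restart job"], prompt=f"{host}> ")
    return host, action
```

Because each menu is just lines on stdin and a line on stdout, these pickers compose with ordinary shell pipelines too.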

I'm also wondering if it's possible to use fzf to build a command creator, particularly for dense launch commands like torchx and srun (from slurm). fzf can show contextual autocomplete derived from the location in the command, and that's something that could potentially be generalized by reading the --help and man outputs of different commands.

This may also be an interesting application of LLMs, to convert the --help output (or man page) into something that can be used for terminal autocomplete easily.
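A first cut of the help-to-completions step doesn't even need an LLM; long flags are easy to scrape with a regex. A minimal sketch, where `SAMPLE_HELP` is a made-up blob standing in for real `torchx --help` or `srun --help` output:

```python
import re

# Hypothetical --help output; real tools would supply this via
# `subprocess.run([tool, "--help"], ...)`.
SAMPLE_HELP = """\
usage: launch [-h] [--nodes NODES] [--gpus-per-node N] [--partition NAME]

optional arguments:
  -h, --help            show this help message and exit
  --nodes NODES         number of nodes to allocate
  --gpus-per-node N     GPUs per node
  --partition NAME      slurm partition to submit to
"""


def flags_from_help(help_text):
    """Extract long options (--foo) from --help output, preserving
    first-seen order and dropping duplicates."""
    seen = []
    for flag in re.findall(r"--[A-Za-z0-9][A-Za-z0-9-]*", help_text):
        if flag not in seen:
            seen.append(flag)
    return seen


# → ['--nodes', '--gpus-per-node', '--partition', '--help']
print(flags_from_help(SAMPLE_HELP))
```

The resulting list is exactly the kind of thing to pipe into fzf for the contextual autocomplete above; the LLM angle would earn its keep on the harder part, inferring each flag's argument type and valid values.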

I only wish zsh's autocomplete were a little easier to hook into; I almost find myself wishing for a shell LSP instead.

NUMA: Non-Uniform Memory Access

Ran into something fairly confusing: there seemed to be plenty of free memory, but the host started swapping instead of allocating it. A different helpful & knowledgeable colleague talked through how the kernel chooses which memory pages to reclaim.

The part that was completely new to me was NUMA: on servers with multiple sockets, memory is attached to a particular socket, so depending on which core is running, some memory is closer (and faster to access) and some is farther (and slower). Intel's docs talk about this some. This was one of the things that nerd-sniped me this week.
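This also explains the swap mystery: the host can look memory-rich in aggregate while one NUMA node is nearly exhausted. On Linux the per-node numbers live in /sys/devices/system/node/node*/meminfo; a minimal sketch of checking the balance, with `SAMPLE_MEMINFO` as made-up illustrative data in that file format:

```python
import re

# Hypothetical contents concatenated from /sys/devices/system/node/node*/meminfo:
# node 0 has plenty free, node 1 is almost out.
SAMPLE_MEMINFO = """\
Node 0 MemTotal:       528280900 kB
Node 0 MemFree:        201543216 kB
Node 1 MemTotal:       528482304 kB
Node 1 MemFree:          1204480 kB
"""


def free_by_node(meminfo_text):
    """Return {node_id: MemFree in kB} parsed from node meminfo lines."""
    free = {}
    for node, kb in re.findall(r"Node (\d+) MemFree:\s+(\d+) kB", meminfo_text):
        free[int(node)] = int(kb)
    return free


def most_pressured_node(meminfo_text):
    """The node closest to exhaustion, even if the host total looks fine."""
    free = free_by_node(meminfo_text)
    return min(free, key=free.get)
```

An allocation pinned (by policy or by the running core) to the pressured node can trigger reclaim or swap there while the other node sits on hundreds of gigabytes.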

Making graphs from PyTorch

ezyang prototyped a debugger for PyTorch this week, using torch dispatch to intercept ops and Python's ridiculous frame introspection to map them back to the code actually being executed.

What I'm really looking forward to is being able to outline the tree of modules in PyTorch (as nested objects) and map them to the actual operations and CUDA kernels. After spending some time exploring torch dispatch (and even connecting it to my unpublished implementation of Intermediate Logging), I'm now exploring Kineto traces and meta tensors to see what's possible. I could potentially use torch dispatch to track how tensor values depend on each other, making different variations of the execution graph much more visible.
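The frame-introspection half of that trick is plain stdlib Python. This is not ezyang's implementation, just a minimal sketch of the idea: `fake_op` stands in for an op intercepted via `__torch_dispatch__`, and walking the stack records which user function issued it.

```python
import inspect

trace = []  # (op_name, calling_function) pairs


def caller_site(depth=1):
    """Walk up the stack to find who called the intercepted op.

    stack()[0] is this function, [1] is the interceptor,
    [depth + 1] is the user code we want to attribute the op to.
    """
    frame = inspect.stack()[depth + 1]
    return frame.function, frame.lineno


def fake_op(name):
    """Stand-in for an intercepted tensor op: record op name + call site."""
    func, _lineno = caller_site()
    trace.append((name, func))


def forward():
    # Hypothetical model code; each "op" gets attributed back to `forward`.
    fake_op("aten.mm")
    fake_op("aten.relu")


forward()
# trace now maps each dispatched op to the function that issued it
```

In a real `__torch_dispatch__` mode the same stack walk runs inside the dispatch hook, which is what lets the op stream line up with module code.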

Overlay network activity on top of that, along with sizes, bandwidths used, and memory and FLOPs consumed, and I'd probably have a replacement for trying to write an article on Mechanical Sympathy: the model would simply be observable.

Chainsaw: Minhash and friends

The other idea that's been stuck in my head for far too long: in most HPC workloads you have a lot of identical things going on across several machines (and sometimes several times within the same machine) at the same time. When things go wrong, it's generally one machine or rank that's misbehaving, and finding that one tends to get tricky quickly.

Identifying the outlier from logs and other sources is something I've become quite interested in after seeing it applied successfully several times. After looking through several algorithms I finally came across MinHash. I need to test this on real logs to see whether the approach holds any promise; datasketch looks very promising for prototyping quickly.
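To make the idea concrete, here's a from-scratch sketch (datasketch's MinHash class would replace the hand-rolled signature code in practice): shingle each rank's log, build a MinHash signature per rank, and flag the rank whose signature is least similar to everyone else's. The log lines and rank numbers are invented for illustration.

```python
import hashlib


def shingles(text, k=4):
    """Break text into overlapping k-character shingles."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(shingle_set, num_hashes=64):
    """MinHash signature: for each of num_hashes salted hash functions,
    keep the minimum hash value over the set. Similar sets produce
    signatures that agree in many slots."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]


def similarity(sig_a, sig_b):
    """Fraction of matching slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def find_outlier(logs_by_rank):
    """The rank whose log is least similar, on average, to all others."""
    sigs = {r: minhash(shingles(t)) for r, t in logs_by_rank.items()}

    def avg_sim(rank):
        others = [similarity(sigs[rank], sigs[o]) for o in sigs if o != rank]
        return sum(others) / len(others)

    return min(sigs, key=avg_sim)
```

The appeal for HPC debugging is that signatures are tiny and composable: each rank can sketch its own log locally, and only the signatures need to be gathered and compared.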

Unfortunately, this also adds another book to my overflowing queue: MMDS, or Mining of Massive Datasets, by Jeff Ullman et al.

Transformers Lectures (CS25)

Learned from Luokai on Threads that Stanford's CS25 course will be publicly available, streamed on Thursdays from 4:30 to 5:50pm PDT.

I've blocked off the time on my calendar, and hope to watch all of the lectures.

Enjoying the work

I went to a concert I'd always hoped to see: Satch/Vai at the Beacon Theater. Joe ended the show by talking about how he and Steve had decided they wanted to play the electric guitar for as long as they could, and then stuck to the plan.

I enjoy programming, and hope to keep going as long as I can too.

Kunal