Working Notes: a commonplace notebook for recording & exploring ideas.
Mark Saroufim learns out of spite
Programming Massively Parallel Processors
I need to complement this with the software dynamics book
torch.utils.cpp_extension.load_inline
to build and use an extension rapidly
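A minimal sketch of what this might look like, assuming PyTorch, ninja, and a working C++ compiler are installed. The extension name `square_ext` and the `square` function are made up for illustration and are not from the talk. The torch imports sit inside the functions so that nothing gets compiled until `demo()` is actually called.

```python
# Hypothetical C++ source. load_inline generates the Python bindings
# for every name listed in `functions`.
CPP_SOURCE = """
#include <torch/extension.h>

torch::Tensor square(torch::Tensor x) {
    return x * x;
}
"""


def build_square_extension():
    """Compile CPP_SOURCE on the fly and return the loaded module."""
    # Imported lazily: the build only starts when this function is called.
    from torch.utils.cpp_extension import load_inline

    return load_inline(
        name="square_ext",       # illustrative name, not from the talk
        cpp_sources=CPP_SOURCE,
        functions=["square"],    # functions to expose to Python
        verbose=True,            # show the compiler invocation
    )


def demo():
    """Build the extension and call it on a small tensor."""
    import torch

    ext = build_square_extension()
    x = torch.arange(4, dtype=torch.float32)
    return ext.square(x)         # elementwise x * x
```

Calling `demo()` triggers the build the first time; later calls should reuse the cached build.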
triton.jit(interpret=True) -- allows setting debug breakpoints in Python
also available as an environment variable (TRITON_INTERPRET=1)
inspect PTX to see register usage, etc.
TORCH_LOGS=output_code
to print the generated Triton code
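A rough sketch of the same thing from inside Python, assuming PyTorch 2.x. It uses `torch._logging.set_logs(output_code=True)`, which should be the programmatic equivalent of setting the environment variable. The function `fused` is a made-up example. On a CUDA device the logged code should be Triton; on CPU, Inductor emits C++ instead.

```python
def show_generated_code():
    """Compile a tiny function and log the code Inductor generates for it."""
    # Imported lazily so nothing is compiled until this is called.
    import torch
    import torch._logging

    # Same effect as launching the script with TORCH_LOGS=output_code.
    torch._logging.set_logs(output_code=True)

    @torch.compile
    def fused(x):
        # Two pointwise ops that Inductor can fuse into one kernel.
        return torch.sin(x) + torch.cos(x)

    # Triton output needs a GPU; fall back to CPU (C++ output) otherwise.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return fused(torch.randn(1024, device=device))
```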
ncu python train.py
helps get rooflines
very powerful
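For reference, a back-of-the-envelope version of the roofline idea: attainable throughput is capped by either peak compute or memory bandwidth times arithmetic intensity. The hardware numbers below are hypothetical placeholders, not measurements from ncu or from any real GPU.

```python
def attainable_flops(peak_flops, peak_bandwidth, flops, bytes_moved):
    """Roofline bound in FLOP/s for a kernel with the given work and traffic."""
    intensity = flops / bytes_moved  # arithmetic intensity, FLOP per byte
    return min(peak_flops, peak_bandwidth * intensity)


# Hypothetical device: 10 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BANDWIDTH = 1e12

# Elementwise fp32 add over n elements: one FLOP per element,
# two 4-byte reads plus one 4-byte write per element.
n = 1 << 20
add_flops = n
add_bytes = 3 * 4 * n

bound = attainable_flops(PEAK_FLOPS, PEAK_BANDWIDTH, add_flops, add_bytes)
is_memory_bound = bound < PEAK_FLOPS

print(f"attainable: {bound / 1e9:.1f} GFLOP/s, memory bound: {is_memory_bound}")
```

With these made-up numbers the add kernel sits far below the compute roof, which is the kind of conclusion the ncu roofline chart makes visible for real kernels.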
torch profiler, autograd profiler, then ncu
a live demonstration of Brendan Gregg's technique I've adopted.
Follow ups
— Kunal