Working Notes: my commonplace notebook for recording & exploring ideas.

Transformers

Having spent a lot of time over the past few years on building infrastructure for Transformer models, I’m still not crystal clear on the actual calculations that happen within them. This work log is for experimenting with and building my own transformers and looking at the values inside them.

May 2025

I’ve been busy hacking on my own programming language that transpiles to C (and CUDA) and should make it more convenient to play with transformers. This is probably the most I’ve ever procrastinated on actually solving the problem I set out to solve, but I expect it to pay off over a lifetime of programming.

Some interesting new websites I need to go devour:

March 2025

05

Kicking off building a transformer in C from the basics by following Sebastian Raschka’s book; I’ll keep documenting my progress and observations here. Things I hope to achieve with this project:

I’ll revisit this when I’m done with this project.

February 2025

Curriculum

My day job has been keeping me almost entirely occupied, and I haven’t had much time to do the experiments or programming I would like to. There are a couple of projects I’d like to complete before I feel confident about transformers:

and at the same time actually build and apply a small fine-tuned LLM to daily tasks.

with tools that help along the way

January 2025

Building minimal transformers

As a first attempt, I’m trying to build simple transformers. I have vague memories of doing something similar while working through the videos by Andrej Karpathy, but this time around I’ll poke around a little myself. Reading about circuits was also helpful in getting ready for this.

Things I’d generally like to work on here:

0 layers 2025-01-20

Based on what I understand from the circuits videos and paper, and what I vaguely remember from the Zero-to-Hero series, a zero-layer transformer should end up with weights that are simply bigram statistics: with no attention layers, each prediction can only depend on the current token. Surprisingly, I’m finding myself struggling a little to structure the code in a way that’s flexible and satisfying; I’ve read too much code.
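A minimal sketch of the bigram claim in plain Python, under some loud assumptions: the vocabulary is characters, and the embed and unembed matrices are collapsed into a single vocab x vocab logit table (a real zero-layer transformer factors this table through a smaller model dimension, so it would only approximate it). The names `bigram_probs` and `train_zero_layer` are made up for this illustration. It counts bigrams directly, then fits the logit table by gradient descent on cross-entropy, so the two can be compared.

```python
import math
from collections import Counter


def bigram_probs(text):
    """Empirical P(next | current) from adjacent character pairs."""
    vocab = sorted(set(text))
    pair_counts = Counter(zip(text, text[1:]))
    probs = {}
    for a in vocab:
        total = sum(pair_counts[(a, b)] for b in vocab)
        if total:  # skip characters that never have a successor
            probs[a] = {b: pair_counts[(a, b)] / total for b in vocab}
    return vocab, probs


def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def train_zero_layer(text, steps=2000, lr=1.0):
    """Fit logits[current][next] by gradient descent on cross-entropy.

    Assumption: with no attention or MLP layers the model is just
    unembed(embed(token)), modelled here as one vocab x vocab table.
    """
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    n = len(vocab)
    pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]
    logits = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        for cur, nxt in pairs:
            p = softmax(logits[cur])
            for j in range(n):
                # d(cross-entropy)/d(logit) = predicted - one_hot(target)
                grad[cur][j] += p[j] - (1.0 if j == nxt else 0.0)
        for i in range(n):
            for j in range(n):
                logits[i][j] -= lr * grad[i][j] / len(pairs)
    # Return the learned next-character distribution for each row.
    return vocab, [softmax(row) for row in logits]


if __name__ == "__main__":
    text = "abracadabra"
    vocab, empirical = bigram_probs(text)
    _, learned = train_zero_layer(text)
    for i, a in enumerate(vocab):
        for j, b in enumerate(vocab):
            print(a, b, round(empirical[a][b], 3), round(learned[i][j], 3))
```

On a toy string like "abracadabra" the learned rows should drift toward the counted bigram probabilities. Entries whose true probability is zero only get there asymptotically, since their logits have to keep shrinking.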

Kunal