Working Notes: my commonplace notebook for recording & exploring ideas.
Having spent a lot of time over the past few years on building infrastructure for Transformer models, I’m still not crystal clear on the actual calculations that happen within them. This work log is for experimenting with and building my own transformers and looking at the values inside them.
As a first attempt, I’m trying to build simple transformers. I have vague memories of doing something similar while working through Andrej Karpathy’s videos, but this time around I’ll poke at things a little more myself. Reading about circuits was also helpful in getting ready for this.
Things I’d generally like to work on here:
Based on what I understand from the circuits videos and paper, and what I vaguely remember from the Zero-to-Hero series, a simple transformer should end up with weights that are simply bigram statistics: for each token, how likely each other token is to come next. Surprisingly, I’m finding myself struggling a little to structure the code in a way that’s flexible and satisfying; I’ve read too much code.
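To have something concrete to compare trained weights against, here is a minimal sketch of the bigram statistics themselves. It is not the transformer code from this log: it assumes character-level tokens and add-one smoothing, and the function name `bigram_log_probs` and the sample text are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def bigram_log_probs(text, smoothing=1.0):
    """Return (vocab, table) with table[a][b] = log P(next == b | current == a).

    Hypothetical helper: character-level tokens and additive smoothing are
    assumptions made for this sketch.
    """
    vocab = sorted(set(text))

    # Count how often each character is followed by each other character.
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1

    # Turn each row of counts into smoothed log-probabilities.
    table = {}
    for a in vocab:
        total = sum(counts[a].values()) + smoothing * len(vocab)
        table[a] = {
            b: math.log((counts[a][b] + smoothing) / total) for b in vocab
        }
    return vocab, table


vocab, table = bigram_log_probs("hello world, hello there")

# Most likely character to follow "h" under the bigram model.
best_after_h = max(table["h"], key=table["h"].get)
```

Each row of `table` is a log-probability distribution over the next character, which is the kind of quantity I would expect the simplest model's output logits to approximate, up to a per-row constant that softmax ignores.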
— Kunal