Working Notes: a commonplace notebook for recording & exploring ideas.

2023-12-10

|                | First attempt | Second attempt |
| -------------- | ------------- | -------------- |
| Time to first  | 13m21s        | 3m11s          |
| Time to second | 20m10s        | 6m12s          |
| Bugs           |               | 1m19s Didn't split line |
|                |               | 2m57s Didn't include original number |
|                |               | 4m35s Didn't toggle the flag |

With these times I would have barely made it into the leaderboard for part 1, and not at all for part 2.

Potential follow-ups:

Somewhat disappointingly, redoing the problem still led to bugs: in the past, I've run into this while trying to go too fast. Rather than pushing for speed, I think I need to focus more on executing the code in my head as I type and making sure I avoid mistakes in the first place.

Knowing the problems I'd run into the first time around helped speed up the 3rd attempt to something around 4 minutes, but that's too far removed from a real attempt to mean much.

Transformers

Slowly working through the attention block and cross-checking every step; I hadn't internalized that the calculations for K, Q, and V are exactly the same before the masking / summing over values.
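To make that concrete, here's a minimal sketch (PyTorch, with TransformerLens-style shapes as an assumption and made-up dimensions): Q, K, and V are each just per-head linear projections of the same residual stream, and only take on different roles once the scores are masked, softmaxed, and used to weight V.

```python
import torch

# Made-up dimensions for the sketch.
batch, seq, d_model, n_heads, d_head = 2, 5, 64, 4, 16
resid = torch.randn(batch, seq, d_model)  # residual stream entering the attention block

# One weight matrix and bias per projection; identical shapes for Q, K, and V.
W_Q, W_K, W_V = (torch.randn(n_heads, d_model, d_head) for _ in range(3))
b_Q, b_K, b_V = (torch.randn(n_heads, d_head) for _ in range(3))

def project(x, W, b):
    # The exact same computation produces q, k, and v: [batch, seq, n_heads, d_head].
    return torch.einsum("bsm,hmd->bshd", x, W) + b

q = project(resid, W_Q, b_Q)
k = project(resid, W_K, b_K)
v = project(resid, W_V, b_V)

# Only from here do the three play different roles.
scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / d_head**0.5              # q against k
mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
pattern = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)  # causal mask
z = torch.einsum("bhqk,bkhd->bqhd", pattern, v)                           # weighted sum over v
print(q.shape, k.shape, v.shape, z.shape)
```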

From Hacker News: LLM Visualization is astonishingly well put together. I wish I were comfortable enough with 3D programming to generate visualizations like that.

I've finally succeeded in making my attention block match the one from the test, mostly through brute force -- comparing tensor by tensor, step by step. My mistake was at the last step: instead of summing over the heads explicitly and then adding the biases, I was using einsum to sum over the heads and add the biases at the same time. Separately, the test used a buffer IGNORE that I had just set as a constant -- but for evaluation the buffer would be overridden, making my values diverge further.
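As a minimal sketch of how that kind of combined step can go wrong (not the actual block from the exercise; the [n_heads, d_head, d_model] shape for W_O and all the numbers are assumptions): folding the output bias in per head means it gets added n_heads times instead of once.

```python
import torch

batch, seq, n_heads, d_head, d_model = 2, 4, 8, 16, 128
z = torch.randn(batch, seq, n_heads, d_head)   # per-head attention outputs
W_O = torch.randn(n_heads, d_head, d_model)    # output projection, one slice per head
b_O = torch.randn(d_model)                     # a single output bias for the whole layer

# Correct: project and sum over the heads first, then add the bias once.
out_correct = torch.einsum("bshd,hdm->bsm", z, W_O) + b_O

# Buggy: adding the bias per head before summing counts it n_heads times.
per_head = torch.einsum("bshd,hdm->bshm", z, W_O) + b_O
out_buggy = per_head.sum(dim=2)

print(torch.allclose(out_correct, out_buggy))  # False
# The two differ by exactly (n_heads - 1) copies of the bias, up to float error.
print(torch.allclose(out_correct + (n_heads - 1) * b_O, out_buggy, atol=1e-4))  # True
```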

I'm evidently still fuzzy on the ordering of the operations and why it matters; I suspect the model would train the same either way (but that is probably just my naivete).

I finally managed to make my Transformer closely match the reference GPT2 implementation provided by the TransformerLens library. Of course, as soon as I build the most basic familiarity with transformers, a new paper is released that seems very promising: Mamba. I'll spend some time playing with that as well, while I build more mechanical sympathy for these models.

I've been spending some time thinking about what to tackle next, and working through and implementing llama.c to reproduce Llama2 and actually use it for inference sounds like a good plan -- and of course, I'll be doing all of that in Hy.

Fauvism

While playing with a terminal version of TensorBoard I started wondering if I could render Cubism-style horizon charts instead, to get even higher resolution than the sparklines offered by Textual. I also wanted something that didn't try to fit the data into the available width, and would instead let me scroll.

That led to a small project I plan to use in a couple of places, including the open source release of Intermediate Logging. Fauvism tries to render both positive and negative values, and I'm building it in layers (a Rich renderable -> a Textual component), along with a reasonably useful CLI app.

The most interesting bit is figuring out how to show negative values: I'm experimenting with flipping the bg/fg colors and using unicode blocks to simulate bars that start from the top instead. I still need to work through the edge cases, though I've only spent around two hours on this so far.
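A minimal sketch of the flipping trick (not Fauvism's actual code; the function, colors, and data below are made up): Unicode's lower-block characters only fill from the bottom, so a bar hanging from the top is drawn as the complementary block with foreground and background swapped.

```python
from rich.console import Console
from rich.text import Text

BLOCKS = " ▁▂▃▄▅▆▇█"  # 0..8 eighths of a cell, filling up from the bottom

def bar_cell(fraction: float, color: str = "cyan", background: str = "black") -> Text:
    """One chart cell; negative fractions render as a bar hanging from the top."""
    eighths = min(8, round(abs(fraction) * 8))
    if fraction >= 0:
        # Positive: an ordinary block growing up from the bottom.
        return Text(BLOCKS[eighths], style=f"{color} on {background}")
    # Negative: draw the complementary block with fg/bg flipped, so the background
    # colour shows through as a bar growing down from the top of the cell.
    return Text(BLOCKS[8 - eighths], style=f"{background} on {color}")

line = Text()
for value in [0.2, 0.8, -0.4, -1.0, 0.5, -0.6]:
    line.append_text(bar_cell(value))
Console().print(line)
```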

This is in Hy too, but that's transparent to library users (__init__.py imports hy, which installs the import hooks so the rest just works).
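For reference, the shape of that trick (a hypothetical layout; `core` and `render` are made-up names, not Fauvism's actual files):

```python
# fauvism/__init__.py -- plain Python; the rest of the package is .hy files.
import hy  # noqa: F401  (installs the import hooks that let Python load .hy modules)

from .core import render  # core.hy now resolves like any other module
```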

Meta

On 2023-12-05 I completed 12 years working at Meta: 3.5 years at Facebook California, and 8.5 years in New York. It's been a while.

Kunal