Working Notes: a commonplace notebook for recording & exploring ideas.

2023-12-03

Satisfying another static-site generation itch this week with a simple table of contents generated with the strategic application of fragile regular expressions.
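
The generator's code isn't shown here, so this is a hypothetical Python sketch of the idea. It assumes headings are rendered as <h2>/<h3> tags that already carry an id attribute; the function name and the toc-h2/toc-h3 class names are made up for illustration.

```python
import re

# Assumption: headings look exactly like <h2 id="slug">Title</h2> (or <h3>).
# Deliberately fragile: other attribute orders, extra attributes, or headings
# split across lines are silently skipped.
HEADING = re.compile(r'<h([23]) id="([^"]+)">(.*?)</h\1>')


def table_of_contents(html: str) -> str:
    """Return a <ul> of links to every h2/h3 heading found in `html`."""
    items = [
        # tag each entry with its heading level so CSS can indent subsections
        f'<li class="toc-h{level}"><a href="#{anchor}">{title}</a></li>'
        for level, anchor, title in HEADING.findall(html)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


page = '<h2 id="transformers">Transformers</h2><p>...</p><h3 id="day-01">Day 01</h3>'
print(table_of_contents(page))
```

The backreference \1 makes the closing tag match the opening level, which is about as much structure as a regex can reasonably enforce here.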

Transformers

Continuing to work on a transformer implementation, I'm finding Arena very helpful because I can write the transformer code layer by layer and sanity-check it against GPT-2. Once I have this working, I'd like to try implementing my own completely from scratch or from a paper, but at least this gives me something bite-sized to tackle first.

Visualizing and thinking through multidimensional matrix multiplication is giving me a massive headache, if I'm honest. Peeking at solutions that just use einsum was bittersweet: I'm happy it's possible to express it so cleanly, but sad I hadn't known about it earlier. I definitely don't enjoy having to deal with batches as an additional dimension; I almost want batching to be something that could be plugged into the model after the fact.
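
As a small illustration in NumPy (shapes and names are made up), einsum's "..." ellipsis gets part of the way toward batch-agnostic code: the same subscript string handles a batched and an unbatched input. torch.einsum accepts the same notation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq, d_model, d_head = 2, 4, 8, 3  # illustrative sizes

x = rng.normal(size=(batch, seq, d_model))  # a batch of residual-stream vectors
W_Q = rng.normal(size=(d_model, d_head))    # one head's query projection


def project(x, W):
    # Letters name axes; "d" appears in both inputs but not in the output,
    # so it is summed over. "..." stands for any number of leading axes,
    # so the same expression works with or without a batch dimension.
    return np.einsum("...sd,dh->...sh", x, W)


q_batched = project(x, W_Q)    # shape (batch, seq, d_head)
q_single = project(x[0], W_Q)  # shape (seq, d_head): no batch axis at all
assert np.allclose(q_batched[0], q_single)
```

Reading the subscripts as "which axes survive" is often easier than mentally tracking transposes and reshapes.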

Debugging my Attention block, even with a reference implementation open right in front of me, makes it very concrete that ML debugging tooling is primitive at many levels; building a transformer is driving that home viscerally. On the plus side, this gives me good ideas for projects and visualizations to build.
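
One cheap way to gain confidence while debugging is to compare a vectorized implementation against a slow, obviously correct one. A minimal sketch, in NumPy rather than PyTorch and not Arena's reference code: a single causal attention head written with einsum, checked against an explicit-loop version. Function names and shapes are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_einsum(q, k, v):
    """Causal single-head attention; q, k, v have shape (..., seq, d_head)."""
    seq, d_head = q.shape[-2:]
    scores = np.einsum("...qh,...kh->...qk", q, k) / np.sqrt(d_head)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # True where the key comes after the query
    scores = np.where(future, -np.inf, scores)              # masked keys end up with zero weight
    return np.einsum("...qk,...kh->...qh", softmax(scores), v)


def attention_loops(q, k, v):
    """Slow reference for one unbatched head, written with explicit loops."""
    seq, d_head = q.shape
    out = np.zeros_like(v)
    for i in range(seq):
        # query i may only attend to keys 0..i
        scores = np.array([q[i] @ k[j] / np.sqrt(d_head) for j in range(i + 1)])
        weights = softmax(scores)
        out[i] = sum(w * v[j] for j, w in enumerate(weights))
    return out


rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 5, 4))  # three (seq=5, d_head=4) arrays
assert np.allclose(attention_einsum(q, k, v), attention_loops(q, k, v))

qb, kb, vb = rng.normal(size=(3, 2, 5, 4))  # the same check with a batch axis of 2
assert np.allclose(attention_einsum(qb, kb, vb)[1], attention_loops(qb[1], kb[1], vb[1]))
```

When the two disagree, comparing intermediate scores row by row usually narrows the bug down to the scaling, the mask, or the softmax axis.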

Simplifying Transformer Blocks

On the same note, I'm pretty excited about reading this paper on Simplifying Transformer Blocks (via @arunsees).

I should read more about Signal Propagation Theory, but I'll finish my own transformer implementation first instead of distracting myself further.

Type Macros

Still iterating on the attention block, I'm now rewriting it the way I would normally write code instead of relying heavily on existing Module patterns. I started with a little procrastination: implementing my own typing macros in Hy, and I can see a whole new world of DSLs opening up in front of me. I tried both a reader macro and a plain old macro to replace jaxtyping's Float[torch.Tensor, "dim1 dim2 dim3"]:

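(defreader T
  ;; Skip whitespace after #T, then read the next form, e.g. dim1,dim2,dim3:Float
  (.slurp-space &reader)
  (setv out (.parse-one-form &reader))
  ;; Split "dims:Type" into the comma-separated dims and the type name
  (setv #(dims tensortype) (.split out ":"))
  ;; Expand to e.g. (get Float #(torch.Tensor "dim1 dim2 dim3"))
  `(get ~(hy.models.Symbol tensortype) #(torch.Tensor ~(dims.replace "," " "))))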

In use, that looks like #T dim1,dim2,dim3:Float. But combined with an annotation (#^ #T ...) it becomes too unwieldy for me, and I couldn't see a quick way around that.

So I've settled on a simpler defmacro instead, which looks like

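(defmacro T [dims tensortype]
  ;; dims arrives unevaluated as a symbol like dim1,dim2,dim3;
  ;; expand to e.g. (get Float #(torch.Tensor "dim1 dim2 dim3"))
  `(get ~(hy.models.Symbol tensortype)
        #(torch.Tensor ~(.replace (str dims) "," " "))))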

and I can use it as (T dim1,dim2,dim3 Float). I'm not particularly happy with this either, but it's better than (get Float #(torch.Tensor "dim1 dim2 dim3")), which is what I've been living with so far. I'll have to think about this a bit more and figure out how to write assertions on these shapes, perhaps with a custom defn wrapper macro instead.
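
For the assertions, jaxtyping itself can check annotations at runtime when paired with a type checker (its docs describe a jaxtyped decorator for this). As a rough, hypothetical Python sketch of what a defn wrapper macro might expand to, here is a decorator that checks named axes with plain assertions. check_shapes and its space-separated spec strings are my own illustration, not jaxtyping's API.

```python
import functools
import inspect

import numpy as np


def check_shapes(**specs):
    """Hypothetical decorator: check_shapes(x="seq d_model") asserts that
    argument x has exactly those axes, and that each axis name maps to the
    same size across every checked argument within a single call."""

    def decorate(fn):
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            sizes = {}  # axis name -> size seen so far in this call
            for name, spec in specs.items():
                dims = spec.split()
                shape = np.shape(bound.arguments[name])
                if len(shape) != len(dims):
                    raise AssertionError(f"{name}: expected axes ({spec}), got shape {shape}")
                for dim, size in zip(dims, shape):
                    expected = sizes.setdefault(dim, size)
                    if expected != size:
                        raise AssertionError(f"{name}: axis {dim!r} is {size}, expected {expected}")
            return fn(*args, **kwargs)

        return wrapper

    return decorate


@check_shapes(x="seq d_model", W="d_model d_head")
def project(x, W):
    return x @ W


project(np.ones((4, 8)), np.ones((8, 3)))     # passes: d_model is 8 in both
# project(np.ones((4, 8)), np.ones((7, 3)))   # raises AssertionError (d_model 8 vs 7)
```

Raising AssertionError explicitly, rather than using assert statements, keeps the checks active even when Python runs with -O.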

DevTools

YouTube's ranking algorithm has been getting better: it recommended a DevTools podcast with Mitchell Hashimoto showing off his workflow, which was a lot of fun.

Personally, I switched from Mac to Linux a couple of years ago (I'm even typing this post on a decade-old MacBook Air running ChromeOS Flex) because, while I really like Mac hardware, I was tired of fighting the software.

Documentation

After writing a lot of documentation recently, I've found myself surprisingly drawn to presentations as a quick documentation mechanism: I can easily annotate diagrams and text and point people to a very specific slide. Most importantly, presentations don't seem as overwhelming even when they hold large amounts of content, simply because they tend to be so skimmable.

A significantly more formal (and cleaner) approach to documentation is Diataxis, which I plan to learn from and adopt.

Advent of Code

I must admit to being very excited about tackling Advent of Code again with Hy this year (though I'm also tempted to try Zig or Go, just to get another systems programming language under my belt). I'm hoping to use it as motivation to improve some of the tools around Hy & Emacs -- potentially just updating some of the existing tools that have broken with language changes.

Blitzen, my Rust helper, failed to run; at this point it's probably too old to compile, so I quickly rewrote it in Hy. With Requests and BeautifulSoup, the new version turned out to be very small and surprisingly smooth to write, reinforcing why I enjoy writing Hy / Python so much. SPOILERS below, of course.
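
Neither Blitzen nor the Hy rewrite is shown here, so this is a hypothetical sketch of the two requests such a helper makes, in Python with the standard library's urllib in place of Requests. The URLs follow the Advent of Code site's input and answer endpoints, and the session cookie is the one a logged-in browser holds; the function names are made up.

```python
import urllib.parse
import urllib.request

BASE = "https://adventofcode.com"


def input_request(year: int, day: int, session: str) -> urllib.request.Request:
    """Build a GET for a day's puzzle input, authenticated by the session cookie."""
    return urllib.request.Request(
        f"{BASE}/{year}/day/{day}/input",
        headers={"Cookie": f"session={session}"},
    )


def answer_request(year: int, day: int, level: int, answer, session: str) -> urllib.request.Request:
    """Build a POST submitting `answer` for part `level` (1 or 2)."""
    body = urllib.parse.urlencode({"level": level, "answer": answer}).encode()
    return urllib.request.Request(
        f"{BASE}/{year}/day/{day}/answer",
        data=body,  # a request with a body is sent as POST
        headers={"Cookie": f"session={session}"},
    )


def send(request: urllib.request.Request) -> str:
    """Perform the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


# Usage (needs network access and a real session cookie):
# puzzle_input = send(input_request(2023, 1, session="..."))
# verdict_html = send(answer_request(2023, 1, level=1, answer=12345, session="..."))
```

The answer endpoint replies with an HTML page, so a real helper still has to pull the verdict text out of it, which is presumably where BeautifulSoup comes in.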

Day 01

I can't say I started off particularly smoothly. The first part was reasonably quick to pull off, even though I stumbled a couple of times, and submitting my solution with bzn went smoothly too.

The second part was painful for me, though, mainly because I misunderstood how the second number should be picked up when spelled-out digits overlap (as in eightwo). Regexing backward on a reversed string was the best solution I could come up with in a pinch.
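
A Python sketch of the reversed-regex idea (a reconstruction, not the actual Hy solution): search forward for the first digit or spelled-out word, then run a pattern built from reversed words over the reversed line, so its first match is the original line's last digit.

```python
import re

WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
VALUE = {word: i for i, word in enumerate(WORDS, start=1)}
VALUE.update({str(d): d for d in range(1, 10)})  # plain digits map to themselves

FORWARD = re.compile("|".join(WORDS) + "|[1-9]")
# Reverse every alternative so the pattern can scan the reversed line:
# the first match in reversed text is the last digit in the original.
BACKWARD = re.compile("|".join(word[::-1] for word in WORDS) + "|[1-9]")


def calibration_value(line: str) -> int:
    first = FORWARD.search(line).group()
    last = BACKWARD.search(line[::-1]).group()[::-1]  # un-reverse the match
    return 10 * VALUE[first] + VALUE[last]


print(calibration_value("eightwo"))  # 82: "two" overlaps "eight" and is still found
```

A plain forward findall would consume "eight" and never see the overlapping "two", which is exactly the trap the reversal avoids.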

At some point, I may try doing this with a trie-based state machine instead, just to make the parsing more efficient; writing that in Hy will also be interesting, since all the indexing gets painful quickly.
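
A rough Python sketch of the trie direction: build a trie of the digit tokens and walk it from every start position. This still rescans characters; a true single-pass state machine would add failure links (as in Aho-Corasick), which this sketch doesn't attempt. All names here are illustrative.

```python
WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TOKENS = {word: i for i, word in enumerate(WORDS, start=1)}
TOKENS.update({str(d): d for d in range(1, 10)})
END = None  # key marking "a token ends at this node"; never equal to a character


def build_trie(tokens: dict) -> dict:
    """Nested dicts keyed by character; the END key holds the token's value."""
    root = {}
    for token, value in tokens.items():
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[END] = value
    return root


TRIE = build_trie(TOKENS)


def digits(line: str):
    """Yield the digit starting at each index, so overlapping words all count."""
    for start in range(len(line)):
        node = TRIE
        for i in range(start, len(line)):
            node = node.get(line[i])
            if node is None:
                break  # no token continues with this character
            if END in node:
                yield node[END]
                break  # no token is a prefix of another, so this is the only hit


def calibration_value(line: str) -> int:
    found = list(digits(line))
    return 10 * found[0] + found[-1]
```

Because no token is a prefix of another, each start position yields at most one digit, which keeps the walk simple.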

My original solution was significantly more verbose and painful, but it lent itself to refactoring remarkably well, so I'm happy to have a small solution up and running at the end. Of course, looking at Reddit shows me that I could have handled overlapping strings much more smoothly in Python by retaining the original letters when substituting digits (e.g. replacing one with one1one); given the puzzle's constraints, this just works. I could also have been fancier and used lookahead regular expressions: (?=(...)).
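
Both alternatives, sketched in Python as hedged reconstructions rather than anyone's actual code: a zero-width lookahead, and the substitution trick that keeps a word's letters around its digit (the exact replacement string varies between solutions; one1one is just one choice).

```python
import re

WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
VALUE = {word: i for i, word in enumerate(WORDS, start=1)}
VALUE.update({str(d): d for d in range(1, 10)})

# The lookahead is zero-width, so the regex engine tries a match at every
# position without consuming characters; the capture group still records
# the word, so overlapping words such as "eightwo" both show up.
OVERLAPPING = re.compile(r"(?=(" + "|".join(WORDS) + r"|[1-9]))")


def via_lookahead(line: str) -> int:
    found = OVERLAPPING.findall(line)
    return 10 * VALUE[found[0]] + VALUE[found[-1]]


def via_substitution(line: str) -> int:
    # Keep each word's letters around its digit, so a neighbouring word that
    # shared a letter (like "two" in "eightwo") is still intact afterwards.
    for word in WORDS:
        line = line.replace(word, f"{word}{VALUE[word]}{word}")
    found = [c for c in line if c.isdigit()]
    return int(found[0] + found[-1])
```

The substitution version is safe because the inserted digit sits between two copies of the same word, so no new word can form across it.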

Day 02

Short and sweet; I even managed to just sneak in under rank 1000 for part 1, and landed just above 1000 for part 2. Every time I do Advent of Code, I remind myself that it would be a good idea to have a simple, quick way to parse input strings, and that perhaps I should keep a small library of parser combinators handy.
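
In that spirit, here is a tiny, hypothetical parser-combinator kit in Python (token, seq, sep_by, fmap and parse_all are my own names, not a library's), applied to a Day 02 style line. It's a sketch of the approach rather than a full library: a failed parse just returns None, with no error reporting beyond parse_all's check.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str, convert: Callable[[str], Any] = str) -> Parser:
    """Match a regex at the current position and convert the matched text."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (convert(m.group()), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another, collecting their values into a list."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def sep_by(item: Parser, sep: Parser) -> Parser:
    """Zero or more `item`s separated by `sep`; returns a list of item values."""
    def parse(text, pos):
        result = item(text, pos)
        if result is None:
            return [], pos
        value, pos = result
        values = [value]
        # only advance past a separator if another item follows it
        while (s := sep(text, pos)) and (nxt := item(text, s[1])):
            value, pos = nxt
            values.append(value)
        return values, pos

    return parse


def fmap(parser: Parser, fn: Callable[[Any], Any]) -> Parser:
    """Transform a successful parser's value with fn."""
    def parse(text, pos):
        result = parser(text, pos)
        return None if result is None else (fn(result[0]), result[1])

    return parse


def parse_all(parser: Parser, text: str):
    """Parse the whole string or raise, so trailing junk isn't silently ignored."""
    result = parser(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError(f"could not parse {text!r}")
    return result[0]


# Example grammar for lines like "Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green"
number = token(r"\d+", int)
cube = fmap(seq(number, token(" "), token(r"[a-z]+")), lambda v: (v[2], v[0]))  # ("blue", 3)
draw = fmap(sep_by(cube, token(", ")), dict)                                    # {"blue": 3, "red": 4}
game = fmap(seq(token("Game "), number, token(": "), sep_by(draw, token("; "))),
            lambda v: (v[1], v[3]))                                           # (1, [draw, ...])

print(parse_all(game, "Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green"))
# (1, [{'blue': 3, 'red': 4}, {'red': 1, 'green': 2, 'blue': 6}, {'green': 2}])
```

Each grammar line reads close to the input format itself, which is most of the appeal for quick puzzle parsing.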

Day 03

I took a mostly mechanical approach today and could generally write working code fairly quickly, though I was somewhat betrayed by my laptop, which froze and had to be restarted. Given it was just past midnight, I have to wonder whether an update was being applied in the background and broke things.

Happily, I could still submit my solution to part 1 through Emacs even though Chrome and my terminal had stopped functioning; bzn.submit-answer pulled through.

Kunal