Working Notes: a commonplace notebook for recording & exploring ideas.
Satisfying another static-site generation itch this week with a simple table of contents generated with the strategic application of fragile regular expressions.
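For the curious, here is roughly what a fragile regex-driven table of contents can look like. This is a minimal Python sketch, not the code behind this site. It assumes the generated pages are HTML whose h2 to h4 headings each sit on one line and carry an id attribute, and the function name table_of_contents is made up for illustration.

```python
import html
import re

# Deliberately simple (and fragile): assumes each heading sits on one line,
# carries an id attribute, and contains no nested heading tags.
HEADING_RE = re.compile(r'<h([2-4])[^>]*\bid="([^"]+)"[^>]*>(.*?)</h\1>', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def table_of_contents(page: str) -> str:
    """Return a flat HTML list linking to every h2-h4 heading that has an id."""
    lines = ['<ul class="toc">']
    for level, anchor, inner in HEADING_RE.findall(page):
        # Strip inline tags such as <code> so only the heading text remains.
        text = html.unescape(TAG_RE.sub("", inner)).strip()
        # A per-level CSS class lets the stylesheet handle indentation.
        lines.append(f'<li class="toc-h{level}"><a href="#{anchor}">{html.escape(text)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

The flat list with a per-level class keeps the regex work trivial; nesting proper ul elements would need a small stack, which is exactly where this approach starts to creak.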
Continuing to work on a transformer implementation, I'm finding Arena very helpful because I can write the transformer code layer by layer and sanity-check it against GPT-2. Once I have this, I'd like to try to implement one completely from scratch or from a paper, but this at least gives me something bite-sized to tackle first.
Visualizing and thinking through multidimensional matrix multiplication is giving me a massive headache, if I'm honest. Peeking at solutions that just use einsum was bittersweet -- I'm happy it's possible to express it so cleanly, and sad I hadn't known about it earlier. I definitely don't enjoy having to deal with batches as an additional dimension -- I almost want batching to be something that can be plugged into the model post hoc, as a post-processing step.
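A small NumPy sketch of both ideas, with made-up shapes and names: einsum computes per-head attention scores in one line, and np.vectorize with a core signature lifts a single-example function over a leading batch axis. np.vectorize is only a Python loop; the efficient versions of "batching post hoc" are torch.func.vmap in PyTorch and jax.vmap in JAX.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq, heads, d_head = 2, 5, 3, 4
q = rng.standard_normal((batch, seq, heads, d_head))
k = rng.standard_normal((batch, seq, heads, d_head))

# Attention scores: for every batch b and head h, dot each query position q
# with each key position k over d_head (the index einsum sums away).
scores = np.einsum("bqhd,bkhd->bhqk", q, k)

# The same computation with explicit transposes and a batched matmul.
scores_matmul = q.transpose(0, 2, 1, 3) @ k.transpose(0, 2, 3, 1)
assert np.allclose(scores, scores_matmul)


# "Batching post hoc": write the function for a single example (no batch axis)...
def single_scores(q1, k1):
    return np.einsum("qhd,khd->hqk", q1, k1)


# ...and let np.vectorize loop it over any extra leading axes via a core signature.
batched_scores = np.vectorize(single_scores, signature="(s,h,d),(t,h,d)->(h,s,t)")
assert np.allclose(batched_scores(q, k), scores)
```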
Debugging my Attention block, even with a reference implementation open right in front of me, is a good exercise in realizing very concretely that ML debugging tooling is primitive at many levels; trying to build a transformer is driving that home viscerally. On the plus side, this gives me good ideas for projects and visualizations to build.
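One low-tech way to make this kind of debugging less painful is to keep a slow, obviously-correct version next to the fast one and report where they diverge, rather than only whether they match. A minimal NumPy sketch: single head, no projections, and not the Arena or GPT-2 code; causal_attention, reference_attention, and compare are hypothetical names.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_attention(q, k, v):
    """Vectorized single-head causal attention. q, k: (seq, d_head); v: (seq, d_v)."""
    seq, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)                   # (seq, seq)
    future = np.triu(np.ones((seq, seq), dtype=bool), 1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)           # a position cannot attend to later ones
    return softmax(scores) @ v


def reference_attention(q, k, v):
    """Slow but obviously correct: one query position at a time."""
    out = np.zeros_like(v)
    for i in range(q.shape[0]):
        s = k[: i + 1] @ q[i] / np.sqrt(q.shape[1])  # scores against keys 0..i only
        w = np.exp(s - s.max())
        out[i] = (w / w.sum()) @ v[: i + 1]
    return out


def compare(name, mine, ref, atol=1e-6):
    """Print the largest difference and where it occurs, then return whether it is within atol."""
    diff = np.abs(mine - ref)
    worst = np.unravel_index(diff.argmax(), diff.shape)
    print(f"{name}: max |diff| = {diff.max():.2e} at {worst}")
    return diff.max() <= atol


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 6, 4))
assert compare("attention", causal_attention(q, k, v), reference_attention(q, k, v))
```

The same compare call can be dropped in after every intermediate (scores, pattern, output) to narrow down which step goes wrong first.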
On the same note, I'm pretty excited about reading this paper on Simplifying Transformer Blocks (via @arunsees).
I should read more about Signal Propagation Theory, but I think I'll finish my own transformer implementation first before I distract myself further.
Still iterating on the attention block, I'm now rewriting it the way I normally would instead of relying heavily on existing Module patterns. I started with a little procrastination, implementing my own typing macros in Hy -- I can see a whole new world of DSLs opening up in front of me. I tried a plain old macro and a reader macro to replace jaxtyping's Float[torch.Tensor, "dim1 dim2 dim3"]:
(defreader T
(.slurp-space &reader)
(setv out (.parse-one-form &reader))
(setv #(dims tensortype) (.split out ":"))
`(get ~(hy.models.Symbol tensortype) #(torch.Tensor ~(dims.replace "," " "))))
which looks like #T dim1,dim2,dim3:Float. But when used with an annotation it becomes a bit too unwieldy for me (#^ #T ...), and I couldn't see a quick way to bypass that.
So I've settled on a simpler defmacro instead, which looks like
(defmacro T [dims tensortype]
`(get ~(hy.models.Symbol tensortype)
#(torch.Tensor ~(.replace (str dims) "," " "))))
and which I can use as (T dim1,dim2,dim3 Float). I'm not particularly happy with this either, but it's better than (get Float #(torch.Tensor "dim1 dim2 dim3")), which is what I've been living with so far. I'll have to think a little more about this, and figure out how to write assertions on these -- perhaps with a custom defn wrapper macro instead.
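For the assertions, jaxtyping can already check annotations at runtime when paired with a runtime type checker such as beartype. As a rough idea of what a defn wrapper macro might expand into, here is a plain-Python sketch of a hypothetical check_dims decorator: it binds each dimension name to a size on first sight within a call and rejects any argument that disagrees.

```python
import functools
import inspect

import numpy as np


def check_dims(**specs):
    """Hypothetical decorator, e.g. check_dims(q="seq d", k="seq d").

    Each spec is a space-separated list of dimension names. Within one call,
    a name must bind to the same size everywhere it appears.
    """
    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            sizes = {}  # dimension name -> size it was first bound to
            for arg_name, spec in specs.items():
                shape = bound.arguments[arg_name].shape
                dims = spec.split()
                if len(shape) != len(dims):
                    raise TypeError(f"{arg_name}: expected {len(dims)} dims ({spec}), got {shape}")
                for dim, size in zip(dims, shape):
                    if sizes.setdefault(dim, size) != size:
                        raise TypeError(f"{arg_name}: '{dim}' is {size}, expected {sizes[dim]}")
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@check_dims(q="seq d", k="seq d")
def scores(q, k):
    return q @ k.T
```

In Hy, a wrapper macro could generate the same decorator call from the (T ...) annotations, so the dimension strings only need to be written once.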
YouTube's ranking algorithm has been getting better: it recommended a DevTools podcast episode with Mitchell Hashimoto showing off his workflow, which was a lot of fun.
Personally, I switched from Mac to Linux a couple of years ago because, while I really like Mac hardware, I was tired of fighting the software (I'm even typing this post on a decade-old MacBook Air running ChromeOS Flex).
After writing a lot of documentation recently, I've been finding myself surprisingly attracted to using presentations as a quick documentation mechanism: I can easily annotate diagrams and text, point people to a very specific slide, and, most importantly, they don't feel as overwhelming even when there's a lot of content, simply because they tend to be so skimmable.
A significantly more formal -- and cleaner -- approach to documentation is Diataxis, which I plan to learn from and adopt.
I must admit to being very excited about tackling Advent of Code again with Hy this year (though I'm also tempted to try Zig or Go, just to get another systems programming language under my belt). I'm hoping to use it as motivation to improve some of the tools around Hy and Emacs, possibly just by updating existing tools that have broken with language changes.
Trying to run Blitzen -- my Rust helper -- failed; at this point it's probably too old to compile, so I quickly made a version in Hy. With Requests and BeautifulSoup this turned out to be very small and surprisingly smooth to write, reinforcing why I enjoy writing Hy / Python so much. SPOILERS below, of course.
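A rough Python equivalent of such a helper, under stated assumptions: puzzle input is fetched from /YEAR/day/DAY/input, answers are POSTed as level and answer form fields to /YEAR/day/DAY/answer, both requests are authenticated with the session cookie, and the verdict is assumed to live in the response page's article element. All names here are hypothetical; this is not the bzn code.

```python
AOC = "https://adventofcode.com"
# Placeholder: the site asks automated tools to identify themselves.
USER_AGENT = "aoc-helper-sketch (contact: you@example.com)"


def input_url(year: int, day: int) -> str:
    return f"{AOC}/{year}/day/{day}/input"


def answer_url(year: int, day: int) -> str:
    return f"{AOC}/{year}/day/{day}/answer"


def fetch_input(year: int, day: int, session: str) -> str:
    import requests  # imported lazily so the URL helpers work without it installed

    resp = requests.get(
        input_url(year, day),
        cookies={"session": session},
        headers={"User-Agent": USER_AGENT},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text


def submit_answer(year: int, day: int, level: int, answer, session: str) -> str:
    """POST an answer for part `level` (1 or 2) and return the site's reply as plain text."""
    import requests
    from bs4 import BeautifulSoup

    resp = requests.post(
        answer_url(year, day),
        data={"level": str(level), "answer": str(answer)},
        cookies={"session": session},
        headers={"User-Agent": USER_AGENT},
        timeout=10,
    )
    resp.raise_for_status()
    # Assumption: the verdict text is inside the page's <article> element.
    article = BeautifulSoup(resp.text, "html.parser").find("article")
    return article.get_text(" ", strip=True) if article else resp.text
```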
I can't say I started off particularly smoothly. The first part was reasonably quick to pull off, even though I stumbled a couple of times, and submitting my solution with bzn went smoothly too.
The second part was painful for me, though, mainly because I misunderstood the order in which the second number could be picked up. Running a regex backward over a reversed string was the best solution I could come up with in a pinch.
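To make the reversed-string trick concrete, here is a Python sketch. It assumes the task is to find the first and last digit on each line, where digits may also be spelled out as words (one through nine) and can overlap, as in "eightwo"; the function names are hypothetical.

```python
import re

WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
VALUE = {w: i for i, w in enumerate(WORDS, start=1)}
VALUE.update({w[::-1]: i for i, w in enumerate(WORDS, start=1)})  # reversed spellings too

FORWARD = re.compile(r"\d|" + "|".join(WORDS))
BACKWARD = re.compile(r"\d|" + "|".join(w[::-1] for w in WORDS))


def digit(token: str) -> int:
    return int(token) if token.isdigit() else VALUE[token]


def calibration_value(line: str) -> int:
    first = FORWARD.search(line).group()
    # Searching the reversed line for reversed words finds the *last* digit,
    # even when it overlaps the one before it (e.g. "twone" ends in "one").
    last = BACKWARD.search(line[::-1]).group()
    return 10 * digit(first) + digit(last)


sample = ["two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
          "4nineeightseven2", "zoneight234", "7pqrstsixteen"]
print(sum(calibration_value(line) for line in sample))  # 281
```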
At some point, I may try doing this with a state machine built on a trie instead, just to make the parsing more efficient; writing it out in Hy will also be interesting, since all the indexing gets painful quickly.
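A sketch of the trie idea in Python, under the same assumptions about the puzzle. This is a simple trie walk from every start position, not a full Aho-Corasick automaton, and the names are hypothetical. Because no token is a prefix of another, each walk can stop at the first complete token.

```python
WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TOKENS = {w: i for i, w in enumerate(WORDS, start=1)} | {str(i): i for i in range(1, 10)}


def build_trie(tokens):
    """Nested dicts keyed by character; the "$" key marks a complete token and holds its value."""
    root = {}
    for token, value in tokens.items():
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node["$"] = value
    return root


TRIE = build_trie(TOKENS)


def digits(line: str) -> list:
    """All digits in order of their start position, including overlapping spelled-out ones."""
    found = []
    for start in range(len(line)):
        node = TRIE
        for ch in line[start:]:
            node = node.get(ch)
            if node is None:
                break  # no token continues with this character
            if "$" in node:
                found.append(node["$"])
                break  # no token is a prefix of another, so stop at the first hit
    return found


ds = digits("eightwothree")
print(10 * ds[0] + ds[-1])  # 83
```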
My original solution was significantly more verbose and painful, but lent itself to refactoring remarkably well, so I'm happy to have a small solution up and running at the end. Of course, looking at Reddit shows me that I could have handled overlapping strings much more smoothly in Python by retaining the original letters; given the constraints we were facing, this would just work. I could also have been fancier and used lookahead regular expressions (?=(...)).
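Both alternatives in Python, for reference, with hypothetical names. The lookahead version finds overlapping matches because the lookahead itself consumes no characters. The replacement version is one common way of retaining the original letters (first letter, digit, last letter), which works because the spelled-out digits overlap by at most one letter.

```python
import re

WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

# The lookahead matches zero characters, so the scan advances one position at a
# time and overlapping words are all found; the capture group holds each word.
OVERLAPPING = re.compile(r"(?=(\d|" + "|".join(WORDS) + r"))")


def tokens_lookahead(line: str) -> list:
    return OVERLAPPING.findall(line)


def replace_keeping_letters(line: str) -> str:
    """Replace each word with first letter + digit + last letter, e.g. "two" -> "t2o",
    so a letter shared with a neighbouring word survives the replacement."""
    for value, word in enumerate(WORDS, start=1):
        line = line.replace(word, f"{word[0]}{value}{word[-1]}")
    return line


print(tokens_lookahead("eightwothree"))         # ['eight', 'two', 'three']
print(replace_keeping_letters("eightwothree"))  # e8t2ot3e
```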
Short and sweet; I even managed to just sneak under rank 1000 for part 1, and landed just above 1000 for part 2. Every time I do Advent of Code, I remind myself that it would be a good idea to have a simple, quick way to read strings, and perhaps I should keep a library of parser combinators handy.
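A tiny parser-combinator kit is small enough to keep around. This is a minimal Python sketch with hypothetical names, not any particular library: each parser takes (text, pos) and returns (value, new_pos) on success or None on failure, and combinators build bigger parsers from smaller ones.

```python
import re
from typing import Any, Callable, Optional, Tuple

Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def regex(pattern: str, convert=lambda s: s) -> Parser:
    """Match `pattern` at the current position, skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + ")")

    def parse(text, pos):
        m = rx.match(text, pos)
        return (convert(m.group(1)), m.end()) if m else None

    return parse


def lit(s: str) -> Parser:
    return regex(re.escape(s))


NUMBER = regex(r"-?\d+", int)
WORD = regex(r"[A-Za-z]+")


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; succeed with the list of their results."""
    def parse(text, pos):
        out = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            out.append(value)
        return out, pos

    return parse


def alt(*parsers: Parser) -> Parser:
    """The first parser that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None

    return parse


def sep_by(item: Parser, sep: Parser) -> Parser:
    """Zero or more items separated by `sep`; succeeds with the list of items."""
    def parse(text, pos):
        first = item(text, pos)
        if first is None:
            return [], pos
        value, pos = first
        out = [value]
        # Only consume a separator if an item follows it.
        while (s := sep(text, pos)) is not None and (r := item(text, s[1])) is not None:
            value, pos = r
            out.append(value)
        return out, pos

    return parse


def fmap(f, p: Parser) -> Parser:
    """Transform a parser's result with f."""
    def parse(text, pos):
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])

    return parse


def parse_all(p: Parser, text: str):
    """Run p and insist that only whitespace is left over."""
    r = p(text, 0)
    if r is None or text[r[1]:].strip():
        raise ValueError(f"could not parse {text!r}")
    return r[0]


# Example: "12 red, 3 blue" -> {"red": 12, "blue": 3}
pair = fmap(lambda xs: (xs[1], xs[0]), seq(NUMBER, WORD))
pairs = fmap(dict, sep_by(pair, lit(",")))
print(parse_all(pairs, "12 red, 3 blue"))
```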
I took a mostly mechanical approach today and could generally write functional code fairly quickly, though I was somewhat betrayed by my laptop, which froze and had to be restarted. Given it was just past midnight, I have to wonder whether an update being applied transparently in the background broke things.
Happily, I could still submit my solution to part 1 through Emacs even though Chrome and my terminal had stopped functioning; bzn.submit-answer pulled through.
— Kunal