Working Notes: a commonplace notebook for recording & exploring ideas.
Another surprisingly busy week; weeks where I don't have enough time to learn something new feel disappointing and pale -- I should make a stronger effort to take time out to study, explore, and build.
I picked up one of my favorite books -- The Art of Doing Science and Engineering by Richard Hamming; apart from Small Gods, it's one of the books that has been most influential in how I think about my life. I'm looking forward to re-reading it with a new lens, and hopefully getting something new out of it.
I find myself slowly becoming more proficient with Bash; enough to be able to quickly put things together without having to google too much. Quoting and arrays are still a nightmare, of course, but there are places where they just work.
Thinking about Bash, Python, and the desire to write systems programming code, I found myself disappointed: a Lisp-like macro system and homoiconicity seem perfect for writing efficient code, but no Lisp seemed to satisfy these requirements. I find myself tempted to write my own. This is in stark contrast to last week's dreams of building an automatic profiler, but it's in the same neighborhood.
I find myself tempted to work through Crafting Interpreters with Hy, using the effort to improve Hy itself, think about building my own language and levelling up a little bit. At the same time, I'm curious about which programming languages would be easy for a Transformer to write programs with and get feedback; would assembly be simpler?
Of course, ChatGPT said that Python is the easiest language to write because of the sheer amount of existing code. That said, I'm a little surprised and suspicious.
At the same time, I'm also surprised at the lack of more specialized programming tools: Copilot and ChatGPT should be able to do significantly more analysis of the programs being written, to design real systems well and quickly.
As a project, I expect I'll go back to numpy or PyTorch -- I haven't enjoyed using JAX much, and with PyTorch I should be able to write code quickly.
I spent time standing in lines and sitting around in a cafe re-reading How Transformers Work along with several links within the post -- and that helped make things click much more clearly than they have recently, particularly when reading after watching Karpathy's videos a few weeks ago.
The thing I'm still struggling with is that transformers -- and perhaps a lot of these architectures -- are evolved and empirically determined rather than designed. Why does the value of attention heads fall off after adding 6? That's probably some function of the input data and information theory, and may be tied to the tokenizer.
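To keep the mechanics concrete while re-reading these posts, here's a minimal sketch of single-head scaled dot-product attention in numpy -- the core operation the heads above repeat in parallel. The shapes and names (Q, K, V) are the standard ones from the literature, not anything specific to that blog post:

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out, w = attention(Q, K, V)
```

A multi-head layer just runs several of these with separate learned projections and concatenates the outputs, which is part of why the "why exactly this many heads?" question feels so empirical.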
I really appreciated that this blog post also went into the details of tokenization, which have been somewhat obscure to me -- just because I haven't gotten around to paying attention. There is something here to play with, and I really enjoy Anthropic's approach to this with the mechanistic interpretability of models in Transformer Circuits.
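As a toy to play with, the core BPE-style merge step behind most subword tokenizers fits in a few lines: repeatedly find the most frequent adjacent pair of tokens and fuse it into one. This is a simplified sketch for intuition, not any particular tokenizer's actual implementation:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Most common adjacent pair of tokens (ties broken by first appearance)."""
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with a fused token."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")   # start from individual characters
for _ in range(2):                  # two merges: 'l'+'o', then 'lo'+'w'
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
```

Real tokenizers learn the merge order from a large corpus and then apply it deterministically, which is exactly where the interaction with model behavior gets interesting.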
On a completely different note, I also spent time building a TUI using textual and Hy to let off steam (and I suspect I'll be treating this project as my personal video game for the coming few weeks).
I've been having a terrible time getting used to all the APIs and mechanisms available in Textual for writing apps -- if I had one suggestion, it would be to make it much simpler. Right now the API and the components offer too many things (worker threads, magically async methods depending on how you define them, way too many magic instance members that change behavior); I'd rather see it simplified to something that just maintains views.
That's how I'm planning to use it anyway, with all the business and data-fetching logic extracted (something like MVVM potentially? or MVC?) in a way that feels comfortable. Hy is beginning to feel familiar, though I still stumble often (why does for look like (for [x xs] (print x)) but lfor skips the []: (lfor x xs (* x x))?). Potentially an implementation issue, but it was surprising when I ran into it. The language is also significantly more ergonomic than I had realized, with support for setx, which sets and returns values, as an alternative to setv.
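The extraction I have in mind looks roughly like this MVVM-style split, sketched in plain Python (the class names and note-taking domain here are hypothetical, just to show the shape -- the Textual view layer would only render what the view-model hands it):

```python
from dataclasses import dataclass, field

@dataclass
class NotesModel:
    """Owns the data and any fetching logic; no UI concerns here."""
    _notes: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self._notes.append(note)

    def fetch(self) -> list[str]:
        return list(self._notes)

class NotesViewModel:
    """Shapes model state into display-ready strings; no widget code here."""
    def __init__(self, model: NotesModel) -> None:
        self.model = model

    def rows(self) -> list[str]:
        return [f"{i}. {note}" for i, note in enumerate(self.model.fetch(), 1)]

model = NotesModel()
model.add("re-read Hamming")
rows = NotesViewModel(model).rows()
```

The appeal is that everything above is testable without instantiating a single widget; the TUI becomes a thin shell that calls rows() and paints.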
Hopefully this Thanksgiving weekend I'll have a chance to take significantly more detailed notes on The Art of Doing Science and Engineering, and potentially write about applying it to the world today.
I'd also like to refurbish my online presence, reset and simplify my dotfiles and simply clean up this site and my slipbox significantly. I'll also be taking a stab at writing out the implementation of intermediate logging for open sourcing.
— Kunal