Working notes: reflecting on — and collecting — what I’ve been learning. Curated writing at expLog.



I spent most of this week hacking on termdex and learning new things from the process: I hope to combine all of that into a new setup to publish this website (and bring over my previous slipboxes) within the next few weeks.

I’ve started using it to collect notes, tasks, bookmarks, and anything else that catches my fancy: and to write this week’s letter, I have it open in a split window to see things I explored (trying out remote-pdb, python-manhole, etc.).

In time I’m excited to throw an LLM at this, and also to have it automatically figure out when I last wrote the same note or explored the same resource. There’s also a lot of functionality to build out to make it easier to create notes with values prefilled: one of the features I really liked in [[FocalBoard]], which implemented it almost perfectly but still had some issues.

Some of the new tricks I learned this week include:

  WITH RECURSIVE split_path(path, remainder, moved) AS (
    SELECT '' AS path, path AS remainder, 0 AS moved
      FROM markdown_files WHERE basename LIKE '' LIMIT 2
    UNION ALL
    SELECT
      path || substr(remainder, 1, instr(substr(remainder, 2), '/')) AS path,
      substr(remainder, instr(substr(remainder, 2), '/') + 1) AS remainder,
      moved + 1 AS moved
    FROM split_path
    WHERE instr(substr(remainder, 2), '/') != 0
  )
  SELECT substr('............', 1, 2*moved) || remainder FROM split_path LIMIT 20;
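A quick way to exercise a query like this without the extension is an in-memory SQLite database with a stand-in markdown_files table (the table contents and paths here are hypothetical; the real table comes from the extension):

```python
# Sketch: run the recursive path-splitting CTE against a stand-in table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE markdown_files (path TEXT, basename TEXT)")
conn.execute("INSERT INTO markdown_files VALUES ('/notes/2024/termdex.md', 'termdex.md')")

rows = conn.execute("""
WITH RECURSIVE split_path(path, remainder, moved) AS (
  SELECT '', path, 0 FROM markdown_files
  UNION ALL
  SELECT path || substr(remainder, 1, instr(substr(remainder, 2), '/')),
         substr(remainder, instr(substr(remainder, 2), '/') + 1),
         moved + 1
  FROM split_path
  WHERE instr(substr(remainder, 2), '/') != 0
)
SELECT substr('............', 1, 2 * moved) || remainder FROM split_path
""").fetchall()

for (line,) in rows:
    print(line)
```

Each recursion peels one leading path component off `remainder`, so `moved` doubles as the indentation depth when rendered with the dotted prefix.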


The other interesting project this week was trying to listen for CUDA context initialization using LD_PRELOAD and subscribing to CUPTI callbacks. I have a tiny implementation working, but need to see if I can get good stack traces across programming languages.

Stream Dye

While playing with PTYs to get unbuffered output (mainly for improving torchrun), I tried implementing a quick program to tag stdout and stderr separately by running a subprocess with two PTY pipes attached. This worked surprisingly well, though I think it interferes with terminal size estimation – at least when done through basic Python.
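A minimal sketch of the idea, assuming POSIX PTYs and Python’s standard library (the `run_dyed` name and `[out]`/`[err]` tags are mine, not from the original program):

```python
# Sketch of "stream dyeing": run a subprocess with a separate PTY for stdout
# and stderr, so each stream stays line-buffered and can be tagged on arrival.
import os
import pty
import select
import subprocess

def run_dyed(argv):
    out_master, out_slave = pty.openpty()
    err_master, err_slave = pty.openpty()
    proc = subprocess.Popen(argv, stdout=out_slave, stderr=err_slave, close_fds=True)
    # The child owns the slave ends now; close our copies so reads can see EOF.
    os.close(out_slave)
    os.close(err_slave)
    tags = {out_master: "[out] ", err_master: "[err] "}
    open_fds = set(tags)
    while open_fds:
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            try:
                data = os.read(fd, 4096)
            except OSError:  # Linux PTYs raise EIO at EOF
                data = b""
            if not data:
                open_fds.discard(fd)
                os.close(fd)
                continue
            for line in data.decode(errors="replace").splitlines():
                print(tags[fd] + line)
    return proc.wait()

run_dyed(["sh", "-c", "echo hello; echo oops 1>&2"])
```

Because both streams come through PTYs, the child believes it’s talking to a terminal and stays line-buffered; the cost is exactly the terminal-size wrinkle mentioned above.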

Having something that explicitly displays stderr / stdout separately is pretty valuable, so I plan to make this a tiny zig utility in the near future.



I ended up doing a small Zig marathon and worked through all the Ziglings. This weekend I hope to put some of that into practice and continue improving the SQLite extension – making it much more generic and composable.


While I continue hacking on termdex, I’ve been coming across additional useful resources: today I stumbled across marksman, which looks exactly like the LSP I was planning to build after getting SQLite working.

I’ve started regularly using TermDex at work for meeting notes and quick queries; all based off the existing Go implementations to create a new file, quickly visualize active files using fzf, and query them with the Go extension. I also have a small hack to show related files quickly when I open one in $EDITOR (nvim) by using tail -n +1 on multiple files. (That’s a trick I use to print multiple files with the file names included, unlike cat.)
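For reference, the trick works because tail prints a `==> name <==` header whenever it is given more than one file; a quick check (the temporary files are hypothetical):

```python
# Demonstrate the `tail -n +1` trick: with multiple files, tail prints a
# "==> name <==" header before each one, unlike cat.
import os
import subprocess
import tempfile

d = tempfile.mkdtemp()
paths = []
for name, body in [("a.md", "alpha\n"), ("b.md", "beta\n")]:
    p = os.path.join(d, name)
    with open(p, "w") as f:
        f.write(body)
    paths.append(p)

out = subprocess.run(["tail", "-n", "+1", *paths],
                     capture_output=True, text=True).stdout
print(out)
```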

Markdown in Emacs

I spent a little bit of time configuring emacs to write markdown more easily with my wiki setup:


Reading Marksman’s documentation also took me to Emanote: another interesting project that does what I’ve now implemented around five times or more – a fairly smoothly generated markdown site representing the documents in a folder. I expect I’ll keep hacking on termdex instead, just to make sure it satisfies my needs precisely (including ease of hackability and composition).

Of course, thinking through this took me down another rabbit hole of comparing The Unix Philosophy against Emacs, along with some interesting follow-up articles.

Performance Evaluations

It’s midpoint performance review season at Meta again; after more than a decade of performance reviews I generally find myself ignoring them. I have found it valuable to use the cadence of reviews as a prompt to introspect outside the formal evaluation process: to think through and write down how things are going, what I hope to achieve, and to reflect on goals.

Windows Surface Pro

I really enjoy the fact that I can open my terminal / editor to full screen and basically have a magically powerful, thin terminal with extremely long battery life anywhere I go these days. With a Bluetooth keyboard, the Surface Pro is slowly becoming my favorite machine.

The only shortcoming I’d like to figure out is how to use it easily on my lap when I don’t have a table handy.


A brief hiatus from writing, and some new plans: I ended up skipping a couple of weeks because I’ve been busy hacking on TermDex, and really wanted to rearchitect these notes into a much more useful digital garden. I also have some new hardware that I’m enjoying and exploring.


I ended up building a real extension with Go directly talking to SQLite without any third-party Go libraries; CGo made it possible to define the Go functions fairly simply and directly, and I left the template C code as-is while implementing just the pieces I needed in Go.

This was fairly satisfying, but there were some initial tradeoffs I didn’t quite like, so I’ll keep hacking on this: having to define the virtual table’s schema up front means I end up scraping all the files on disk and generating the whole table before running any queries.

Instead, I’d like to generate a vtable of JSONB blobs: that gives a lot more flexibility, lets me keep the table live without needing to re-create it, and leaves the option open to generate files on demand later. Reading the JSONB spec reminded me a little bit of MessagePack.

Microsoft Surface Pro

For the past several months I’ve been looking for a powerful Chromebook that can run a Linux virtual machine, and I’ve generally been disappointed. Seeing the new Windows Copilot laptops was the impetus to give WSL2 and Windows a shot – with a NuPhy wireless keyboard, I get a beautiful terminal with an excellent screen and a snappy computer, plus the ability to sketch and use Adobe Lightroom when I want.

I’m pretty happy with my decision so far, and Windows has been better than I expected. I’m having a little bit of trouble adjusting to the keyboard shortcuts from Windows, but they do work surprisingly well.


A rabbit hole I fell down recently was trying to understand why torchrun wasn’t printing the logs I expected when the underlying program was segfaulting.

The basic reason was process output buffering: if I let the subprocesses print to stdout, they flush instantly and I see the segfault output; but if the output is tee’d, it ends up being buffered by default (C stdlib behavior) and we lose logs. (There’s also space for much better / simpler implementations of log redirection in torchrun.)
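The buffering difference is easy to reproduce; a small sketch under these assumptions (the child program is a stand-in, not torchrun):

```python
# When C stdio's stdout is a pipe it becomes fully buffered, so output that
# hasn't been flushed is lost if the process dies abruptly. The child below
# printf()s through libc and then abort()s without flushing.
import subprocess
import sys

child = """
import ctypes
libc = ctypes.CDLL(None)
libc.printf(b"about to crash\\n")  # buffered: stdout is a pipe here
libc.abort()                       # dies before the stdio buffer is flushed
"""
r = subprocess.run([sys.executable, "-c", child], capture_output=True)
print(r.returncode, r.stdout)
```

Run the same child on a terminal (or a PTY) and the line appears, because stdio switches to line buffering.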

The one way I can think of around this is to use a PTY to spawn subprocesses instead, which fools them into unbuffered output irrespective of the underlying program. The way I got here was looking at Expect and Pexpect – and then finding that Pexpect depends on ptyprocess. Python itself also has a much more minimal pty module that isn’t as flexible.

The part I haven’t figured out yet is how to distinguish stderr and stdout coming from the subprocess – with that, I may send out some PRs to torchrun.

Zig, Go

Finally, I’m planning to play with Zig this weekend to experiment and explore: Go has been pretty great, and I’ll still be using it at work, but I’d love to have something a little less verbose with more explicit control over how things run and work.

Someday I hope I can write my own systems-programming lisp with minimal syntax, great compatibility, and ease of use / iteration.

Updating configurations

As I set up the new laptop, I’m also setting up configuration files in dot/2024 as I update each file. Claude has been amazing to achieve this: I had it rewrite my org mode file into a pure elisp .el file to start simplifying and flattening out my configurations.


Spent the week learning about dataloaders, played with Go extensions and generally thinking about problems. I’d write more but I have a lot of work to catch up on.


After publishing the letter last week I fell into a rabbit hole of implementing a SQLite extension that can help understand / organize / parse markdown documents – particularly using Claude and ChatGPT-4o. Unfortunately the LLMs became very confused very quickly, constantly generating broken code and moving back and forth between the same issues. After a while I gave up and spent time learning to build this from scratch without LLMs.


CGo is surprisingly nice, and involves writing C code in comments that are simply parsed out and used while generating the extension.

Unix Philosophy

Reading the unix philosophy was extremely satisfying: a lot of the ideas and patterns described in the book have survived 30 years, and resonated deeply with me.

I saved some of my favourites in threads, but reproducing the quotes with some impressions:

the UNIX philosophy is an approach to developing operating systems and software that constantly looks to the future. It assumes that the world is ever changing. We’re not saying that we can predict the future. We can only acknowledge that, even if we know everything about the present, our knowledge is still incomplete.

I wish organizations in general were significantly more open about this, and understood their limitations in predicting the future, optimizing for the ability to move fast instead of perfect planning. This resonates all the way back to Boyd’s description of the OODA loop.

When an application must be written and (a) it must be done to meet a practical need, (b) there aren’t any “experts” around who would know how to write it, and (c) there is no time to do it “right,” the odds are very good that an outstanding piece of software will be written.

I described this a little bit while writing about developer tools: but the scripts engineers will put together to unblock themselves tend to accomplish 80-90% of what a tooling team will build for them with perhaps .1% of the effort and time involved. Realizing this has changed how I think about building tools, and I would much rather leverage the fact that when my customers are engineers I can truly force multiply them instead of limiting their opportunities.

it is better to let the user shoot himself in the foot than never let him run at all. This can be painful to implement in practice, and I often have to push back on people locking down software for “safety” in ways that throw out the baby with the bathwater. But if the organization is functional then there should be enough trust to let people run with scissors when they need to.

The software that exists in the world today represents a great store of wealth. Even more true today with the sheer amount of open source software powering most of the technology in the world; open-weight and open-source LLMs just add to that store, and having them available is really valuable.

LLMs for the CLI

I’m still surprised at how little I see LLMs applied to CLIs and sys-adminy work at the moment: it’s all text, and the problems feel like they would naturally lend themselves to being evaluated by LLMs. Most recently, I want to play with something that can construct a command for me by reading the help text, man page, and sample commands.

I think I’ve written about this before, but I’m also a little surprised at all the products replacing their UIs with custom LLM-driven UIs – I should just be able to ask the LLM to do things for me without needing to navigate yet another interface. I can imagine a huge market for tiny application-specific LLMs that super-charge “Help”: instead of learning to navigate the application, ask the LLM to do something for you, and it does it while showing you how. I could see this being very valuable for everything from Excel to Photoshop.

Then layer on magical capabilities (“edit my photo in the style of Salgado”) or domain specific knowledge.


Back in New York again, at my favourite weekend cafe.

TermDex, Go and Claude

I’ve been slowly iterating on TermDex, particularly to try and get all my ToDos and investigations down on paper in a way I can easily query. Every time I make progress I think of more things I could be doing with it instead – the trap with all my ToDo applications so far – and get somewhat derailed. Go’s ergonomics don’t quite gel with me yet, but I have to admit just how practical and applicable it is: an easy language to set up and get running with quickly, with a very rich ecosystem.

I’ve also not found myself particularly productive with Go recently – there are some reminders of Java that simply make me uneasy. After thinking about it for a bit, I started using Claude to generate Go code for me and it helped walk past a lot of minutiae I didn’t particularly care about and gives me some encouragement that I can make things work.

Today I’ll try and build out an old idea quickly: make a sqlite extension that can index markdown files based on frontmatter (yaml, toml or json – inspired by Hugo) and then return the results quickly; making it very easy to generate appropriate queries and views for easy visualization.

This can be quickly combined with fzf and other mechanisms for editing. Writing this out made me realize I wanted to spend more time understanding the Unix Philosophy and seeing how well it’s aged over the years – it’s a system of design that leverages the operating system it’s surrounded by and seems to work particularly well for extensible design.

Applying Transformers to knowledge work

Another new experiment I would like to start is applying LLaMa and other models to all my emails, chat, and other daily minutiae so I can stay on top of things more easily. A large part of my day job involves remembering and contextualizing what’s going on, and then intervening where I can best help out: the amount of context to maintain day to day is getting a little overwhelming, and interferes when I need to go deep into a specific problem.

If I can figure out ways to extract all my data into easily indexed text, and then apply a reasonably dumb model to query and aggregate it for context, I suspect I’ll be significantly more effective. Over time, if tools like this were built into work communications, things like status reports and updates should become trivially cheap to accomplish without a lot of coordination or busy work (if applied at larger scales).


In spite of being so close to LLMs and helping build them, I’m nowhere close to internalizing all that’s possible with their applications yet: I realized this yet again while working on TermDex with Claude. I can afford to be significantly more ambitious as these tools open up, and work through a lot of projects that seemed too big to tackle before this. A small part of me is scared about the quality of the outputs – as the projects get too big for me to reason about personally, at some point I’ll have to trust the AI to do things right; there are some uncomfortable parallels to managing other people that start developing as we extend these tools.

The downstream economic and technological consequences are pretty hard to predict; with just the technology that exists today there’s a clear sweet spot of productivity that I don’t see being exploited particularly well at the moment, which is surprising. Perhaps the unix philosophy lends itself particularly well to this, because we can pull out small pieces of the problem that the AI can then build for us – good design generalizes really well into the future.

Thinking about the letter

While I’ve been having fun writing this letter every week, having to self-censor based on what I can talk about publicly and what I can’t has been getting annoying. The practice of reflecting on everything I’ve learned recently is clearly extremely valuable, but I’d like to revisit the mechanics a little bit to make this more useful and thorough. I’m hoping I can use TermDex to achieve this more easily.


An extra unexpected week in San Francisco. I accidentally walked past the tail end of Bay-to-Breakers on the way to getting coffee (right before writing this week’s letter) and was thoroughly confused and amused. Seeing several costumed people running past made up for missing out on the Dance Parade in NYC on Saturday.


I’ve been using this a lot, and see a lot of potential, but haven’t quite mustered the time/energy to actually implement the features I’m really looking forward to. Having extremely simple query functionality implemented using a mix of bash scripts and fzf has taken me surprisingly far. As much as I appreciate POSIX, I guess I never really internalized the extremely minimal API it exposes for different programs to connect – and I still find myself surprised just how much is accomplished through that minimal API.

I’m also working on configuring nvim to be a good markdown editor, and building a surprisingly pleasant/effective editing experience where I can navigate through text/ideas/notes quickly. This finally feels like a flexible enough alternative that lets me keep flat files, easily swap between tools and still get all the benefits of Luhmann’s methods & Notion & index cards and all the other tools I’ve used to try and keep my head in order.

With LLMs able to easily consume text, fancier CLIs where things just work should be much more common than they seem to be today – something I expect to start noodling around with soon.

Hierarchies, nested notes

Something I haven’t figured out the ergonomics for is easily having a hierarchy of notes: the best cheap alternative I have is a custom spreadsheet where I mix in indentation by converting the first several columns into thin indents. I can show a nested hierarchy by simply starting from a different column, relying on how the UI overflows cell text, and still easily move rows around.

I’d like to actually build this out as a UI for easy modification and managing relationships between notes.

Information Theory / Chainsaw / TF-IDF

Feeling a little lost while playing with TF-IDF – and realizing that I was getting extremely broken results because of bugs in my implementation, but still something that seemed valuable – I wanted to start levelling up in math a little bit. Based on an answer from ChatGPT-4o I’ve started reading Elements of Information Theory and am generally enjoying the book.

Even revisiting the minimal definitions of entropy (−Σ p log p) and relative entropy / KL divergence between distributions (Σ p log (p/q)), I think I can take another stab at finding outlier logs by looking for logfiles whose distribution is most different from the distribution over the norm. I don’t need true outliers, I just need the ones with the most distance.

Tokenization is of course critical: I’ve worked around it for now by simply normalizing a lot of strings – smashing numbers down to a single 0, smashing punctuation into a single _, etc. This obviously reduces a lot of nuance in the logs but does let me find logs with different stack traces much faster. I think I’ll take the relative-entropy approach to find logs that I should look at, and then use cosine similarity & clustering (something else I spent time learning) to figure out batches of logs and hosts to work with.
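A sketch of that pipeline under these assumptions (the log snippets and function names are hypothetical; add-one smoothing keeps the KL terms finite):

```python
# Rank log files by KL divergence of their token distribution from the
# pooled ("normal") distribution, after the normalization described above
# (digits -> "0", punctuation runs -> "_").
import math
import re
from collections import Counter

def normalize(text):
    text = re.sub(r"\d+", "0", text)
    text = re.sub(r"[^\w\s]+", "_", text)
    return text.lower().split()

def kl_outliers(logs):
    counts = {name: Counter(normalize(text)) for name, text in logs.items()}
    pooled = Counter()
    for c in counts.values():
        pooled.update(c)
    vocab = list(pooled)
    total = sum(pooled.values())
    # Smoothed pooled distribution q
    q = {w: (pooled[w] + 1) / (total + len(vocab)) for w in vocab}
    scores = {}
    for name, c in counts.items():
        n = sum(c.values())
        # Smoothed per-file distribution p; KL(p || q) = sum p log(p/q)
        scores[name] = sum(
            ((c[w] + 1) / (n + len(vocab)))
            * math.log(((c[w] + 1) / (n + len(vocab))) / q[w])
            for w in vocab
        )
    return sorted(scores, key=scores.get, reverse=True)

logs = {
    "host1": "job started ok; step 100 done",
    "host2": "job started ok; step 101 done",
    "host3": "segfault at 0xdead traceback frame frame frame",
}
print(kl_outliers(logs))
```

Since host1 and host2 normalize to identical token streams, the host with the stack trace ends up with the largest divergence from the pooled distribution.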


Writing Go continues to be pleasant: I definitely miss the rich Python ecosystem (and have now started wondering if I could just reuse Python libraries in Go) – there are so many good libraries apparently made for data mining.


As a reminder: none of the notes here represent those of my employer, nor do they include confidential information (not that I expect anyone to ever read this site either). With that out of the way, I’ve tried to think through what happens if I extrapolate into the future, considering what transformers can accomplish without needing a tremendous jump in capability (sitting in a cafe in SF seems like the right time to explore these ideas):

Most of these seem feasible with current technology plus a lot of engineering to make things cheaper to deploy, faster, and better integrated – I’d expect a lot of these kinds of applications to pop up within a decade if not sooner (a decade seems extremely conservative, if I’m honest).


Partially in California this week, and happened to be in town for the LLaMa hackathon. I’m very interested in seeing what people make, and writing this from the event.

LLaMa3 Hackathon

There were several interesting projects, and it was fun to see so many people use something I’d helped out with. The project that fascinated me most was one that basically undid LLaMa’s finetuning by adjusting weights: I have to wonder what else we can achieve with this mechanism, and what the actual process was (waiting to see the repository).

At the same time, there wasn’t anything that did something completely out of left field: a lot of practical applications of LLMs today, beyond an omniscient and occasionally loopy chat bot, seem to be a bit farther from the maturity level I would hope for. That doesn’t mean we can’t use them; we’ll just need to be significantly more creative in how and where we apply them.

The finetuned LLM I’d love to build is a CLI helper: everything is text anyways and it has amazing amounts of context available; I’d like it to quickly complete my commands or let me simply ask for things and to translate them into explicit machine instructions.



Started playing with implementing TF-IDF to identify outlier logs in HPC jobs. The idea seems so obvious that someone has to have implemented it already – but perhaps I’m just missing something.

Asking Claude for recommendations on papers and approaches generally led to hallucinations, but I did find that one of the papers was real (just from a different year and with different authors): RNN Attention Mechanisms for System Log Anomaly Detection, which also has a healthy number of downstream references, including literature surveys.

Go is being surprisingly pleasant to work with, though I still don’t quite know how to write idiomatic Go. The performance is a boon after getting used to Python.


I’ve also been making progress on my index card application, leaning into fzf and bash scripting to fill in gaps that I’ll actually implement with Go later. It’s interesting just how much flexibility and speed is available through bash scripting – until, at some point, the complexity goes up enough to make the script unmaintainable. Bash’s inherent global state matches my intuitions around debuggability (which I’ve tweeted about in the past), but I still don’t quite know how to articulate this feeling better.

Another idea that’s started wandering through my head is the implementation of a modern shell scripting language: something POSIXy but not quite; perhaps an extension of Starlark that is just as good at handling stderr/stdout/nesting commands but has significantly cleaner language semantics and more modern lexical scoping and language constructs? I’m surprised this doesn’t seem to exist yet.

Python’s Memory View

Through bitter experience I also learned about the cost of constantly appending to a bytestring in Python, when I profiled my code and realized it was spending most of its time allocating memory (py-spy is a gift!). That led me down a rabbit hole of avoiding byte copies on slicing, and of easier manipulation: I ended up testing bytearray and memoryview, which are useful tools to have – they helped me turn a 30-minute-long script into something that ran in seconds.
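The core difference is easy to see: slicing bytes allocates a copy, while slicing a memoryview just re-points at the same buffer:

```python
# bytes slices copy; memoryview slices are zero-copy windows onto the
# original buffer.
data = bytes(10_000_000)
view = memoryview(data)

chunk = data[1000:2000]    # allocates a new 1000-byte object
window = view[1000:2000]   # no copy of the underlying bytes

assert chunk == bytes(window)
assert window.obj is data  # still backed by the original buffer
print(len(window))
```

For repeated appends, the analogous fix is building into a mutable bytearray instead of reassembling an immutable bytes object on every `+=`.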


Tempus Fugit.

I don’t really remember where April went; I still think it was March just yesterday.

Multiprocessing Queues

Spent most of Saturday debugging with several team-mates to realize that multiprocessing queues create their own private feeder thread, which copies values into a buffer, pickles them, and sends them over a pipe to the other process. The part that can bite you: if you mutate the object you put before it gets pickled, the other side sees the mutated value.

I wrote up a small Thread as a teaser and also put together a demo gist to show the issue. The one reply I did get on Threads misattributed the problem to parallelism, with the subprocess getting a chance to run: to make it more explicit I’ve updated the gist to only run the subprocess after the first one is finished. The race is between the main thread in the parent process sending values and mutating them, and the Queue’s inner thread.

Another coworker asked why my initial repros only had 1 or 11 in the values (i.e. no mutations or all mutations): I can only attribute this to the points at which Python lets threads interleave; adding a sleep(0) to the mutation lets me see a wider range of scenarios.
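A minimal repro of the shape described above (names are mine; which value comes out depends on whether the feeder thread pickles before or after the mutation):

```python
# Queue.put() returns before the queue's feeder thread pickles the object,
# so a mutation racing with the pickle decides which value the reader sees.
import multiprocessing as mp
import time

def race_demo():
    q = mp.Queue()
    item = {"value": 1}
    q.put(item)          # enqueued for the feeder thread; not yet pickled
    item["value"] = 11   # races with the pickle in the feeder thread
    time.sleep(0.5)      # give the feeder thread time to run
    received = q.get()
    q.close()
    q.join_thread()
    return received

print(race_demo())
```

Adding a time.sleep(0) before the mutation shifts where the interpreter lets threads interleave, which is consistent with the wider range of results mentioned above.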

Working through this also reminded me of the value of debugging by understanding instead of debugging by trial and error; even if the cost of understanding seems significantly higher, over time debugging by trial and error ends up getting nowhere.

A refreshed .tmuxrc

I started refreshing my dotfiles with my tmux configuration; I’m doing them piecewise (because the software to edit these configurations is also affected by them) and making sure they work well for me. There are a couple of new things I’m applying:

Autocomplete anything on screen

Another thread I posted earlier in the week involved a new zsh + tmux + fzf trick I finally managed to put together (again, leaning on Claude to parse man pages for me). I put out a thread and gist about it and recorded a video for co-workers; here’s an annotated version of the script:

# Grabs the contents of all the panes in the current window for easy processing.
function _print_all_panes() {
  # List all visible panes, changing the output format to only show the pane-id (used in the next set of commands)
  for pane_id in $(tmux list-panes -F '#{pane_id}'); do
    # tmux capture-pane: starting from the first visible line (`-S 0`) to the end (`-E -`). `-t` identifies which
    # pane to capture, while `-p` redirects output to stdout and `-J` makes sure wrapped lines show up as joined.
    # This is piped to `tr` to replace spaces with new lines -- giving me one word per line. The sort & grep get
    # rid of pure collections of symbols, only giving me words and numbers to complete on.
    # TODO: Explore additional tokenization strategies, to allow breaking up paths/into/components.
    # TODO: Remove duplicated output across panes
    tmux capture-pane -p -J -S 0 -E - -t "$pane_id" | tr ' ' '\n' | sort -u | rg '[a-zA-Z0-9]+'
  done
}

# The actual auto-complete function
_tmux_pane_words() {
  # `LBUFFER`, `RBUFFER` and `CURSOR` are magical variables from `zle` with the contents of the entered text
  # left and right of the cursor, with CURSOR marking the actual position.
  # Grab any half completed word in the LBUFFER (removing a greedy match that ends with a space)
  local current_word="${LBUFFER##* }"
  # Get rid of the half completed word in the RBUFFER if any, greedily removing non-space characters
  # I had to spend non-trivial amounts of time reading zsh pattern matching to get the behavior I expected.
  local new_rbuffer="${RBUFFER/#[^ ]##/}"
  # Build the prompt for fzf, using ␣ to mark the insertion point for the completion
  local prompt="${LBUFFER% *} ␣ $new_rbuffer "

  # Tokenize and print the pane contents and generate an fzf window with the half-completed word from the LBUFFER as the query
  # `--layout=reverse` because I don't like needing my eyes to jump to the new cursor position when fzf pops up
  # `--no-sort` because we already did it, with the caveat of needing to de-dupe across panes
  # `--print-query` for the case when we can't find a good match; this prints the query first and any selections after
  # If the user doesn't select anything, rely on the fact that the query was filled in to choose the completion; that's why the `tail -n1`
  local selected_word=$(_print_all_panes | fzf --query="$current_word" --prompt="$prompt" --height=20 --layout=reverse --no-sort --print-query | tail -n1)

  # Build the new lbuffer with the completion; the inverse of the original split
  local new_lbuffer="${LBUFFER% *} $selected_word"
  LBUFFER="$new_lbuffer"
  RBUFFER="$new_rbuffer"
  # Reposition the cursor to the end of the completion
  CURSOR=${#new_lbuffer}

  # Ask the zsh line editor to redraw the line with the new contents
  zle redisplay
}

# Register the completion widget; I went with `Ctrl-U`.
zle -N _tmux_pane_words
bindkey '^U' _tmux_pane_words

Stanford Lectures

This week’s lecture was a little more abstract but had some interesting ramifications and applications for being able to build small and focused LLMs. The main paper. There’s also emphasis on the importance of finding the right starting values.

More go hacking

I’ve started working on building some CLI programs with Go (yet another time/notes/calendar/Notion-equivalent management app); with the excellent tcell library and the surprisingly powerful terminals available these days I’m much more bullish about good CLIs. The big hidden bonus is that I can shell out to Vim or Emacs for actually editing notes, while leaving the management to the app itself – partially inspired by how easy fzf makes that. I’ve picked up The Power of Go: Tools to help me write idiomatic Go with the right approaches faster.

fzf is also the reason why I’ve been so impressed with Go recently: I’ve begun to realize that languages end up giving programs a certain taste, for lack of a better word, and some characteristics stand out. Python programs have very distinctive CLIs and slightly noticeable sluggishness; JavaScript tends to be a bit faster and the CLIs tend to be very colorful; Rust is colorful too, but generally characterized by being very fast. The most-used CLIs tend to be in C or similar languages. And finally some Go programs tend to be surprisingly useful: fzf, gotty, etc. Of course it’s not a perfect heuristic (until a few seconds ago I thought jq was also written in Go).

The prevalence of closures and function objects in Go has been the most surprising (and pleasant) departure from my previous assumptions about the language so far; they make programming significantly more ergonomic – though there are also some factory patterns I don’t think I’m going to enjoy (such as using functions that manipulate structs to set default arguments).

Anyways, I’m calling this new project termdex for terminal index cards. More updates next week!


This was a long week with a lot of overlapping oncalls; I’ll be glad to take a break sometime next week.

At the same time, I was able to learn some new things.

Stanford Lecture

The lecture on MoE this week was fascinating, well delivered and cleared up a bunch of misconceptions I had about what MoE meant and how they functioned. Some of the talks’ slides have been uploaded at the CS25 website, but there are still several to go.

Things I remember


Talked to an old friend after a very long time: he’s clearly been doing much more advanced work than I have, and pointed me to several interesting ideas to explore

I have a lot of math and infrastructure to learn. I’m thinking of playing with a simple transformer and seeing if I can get it to encode/decode some patterns like look & say, and if I can use that to build some intuition about QKV. It should be an interesting exercise.


I finally wrote a small program in Go, and so far have been finding the language surprisingly ergonomic and friendly; particularly with Go routines. I’m planning to build several log parsing tools with Go (and possibly TCell).

I’ll need to find a good modern book on Go before I shoot myself in the foot with assumptions about the behavior of the language though.


I also spent a lot of time learning ZooKeeper semantics: the original paper was excellent and finally made things click. I could solve the problems I wanted to by simply relying on watches, which Kazoo makes even easier.

Dijkstra’s notes

Partially read notes that floated by on Hacker News; this entry is a reminder to go back and read the rest.



I’ve been helping out with infrastructure and tools for training LLaMa3 at Meta: I’m very happy to be able to help because I think having something of LLaMa’s quality easily available for hacking is one of the things that will shape how LLMs are generally applied and used, and contributing to that is very satisfying. I’m even in the model card, along with some very well known people.

At the same time, I could use ellama, ollama and LLaMa 8b to have my very own local LLM – which has been fairly helpful. Ellama’s prompts around reviewing highlighted code, implementing functions, etc. is exactly what I’d dreamt of a long time ago and hadn’t expected to be true so soon. The UX is still a bit rough and generating tokens on my laptop CPU is slow, but I expect that to constantly, inexorably improve the way things have been going.

I’m now thinking about finetuning / distilling a LLaMa model down to something that can translate CLI commands on my behalf; e.g. “extract this tarfile”. I think it should be very doable – and maybe a good excuse to learn torchtune – but I need more time and energy.

Python & Emacs

As part of consolidating my .emacs I’ve been cleaning up my Python setup as well. I rebuilt and moved to the latest on Emacs’s master branch – the fact that I can smoothly run on Emacs master always amazes me – and set up Jedi and Ruff (using LSP), while relying on some existing snippets and Ellama integration.

All of this means I get some very cool auto completion, almost instant error & syntax checking and warnings with minimal setup or dependency on the repository I’m editing.

I still have some trouble with both Auto-complete mode and Company mode turning on and both trying to complete what I’m typing; I’ll dig in some more and start publishing my configurations:


JAX released some very interesting tools, including a visualization tool that is almost exactly what I was hoping to see in PyTorch. This also makes models much easier to explain – though I think I’d probably go with a little more whitespace in the UI if I were designing it – and seems pretty powerful.

I need to find the time to hack on this and actually make an interactive UI or CLI around it. And a top- or below-style interface to TensorBoard.

Wax, and languages

Continuing the theme of looking for lisp-like homoiconic languages that compiled down to C, I ran into some reddit posts and links – particularly this list of lisp-like languages. There are several interesting ideas in there, but some day I’d like to implement my own, potentially working backwards from the C grammar to make sure everything can easily and cleanly be expressed, and then layering on language sugar on top of that.

As mechanisms for procrastination go, inventing the language to program in before actually getting around to programming seems unfortunately too far up my alley. I’ll save this particular project for some slow months.

Stanford Lectures on Transformers

More rough notes from the lectures.

Nathan Lambert, Allen Institute for AI



A somewhat busy week.

Cleaning up dotfiles

I’m finally getting started cleaning up my dotfiles: there are a couple of decisions I’ve made and will be implementing as I go:

Zig, Rust, systems programming & Lisps

Refreshing my Rust knowledge by skimming Rust in Action: I seem to remember a lot, but I’ll find out when I actually build the thing I’m reading this book for.

I also started playing with Zig: what I really want is a small language with minimal syntax that is still very expressive. Zig seems nice and compact and enjoyable, so I’ll spend some time hacking with it.

I’m not sure why all (most?) lisps I’ve seen follow the Scheme / Common Lisp standards: the homoiconic syntax is decoupled from the other semantics. A small part of me wants to implement my own variant of C (at the very least, compatible with C) that has the syntax of a lisp. Probably emit LLVM IR and have hygienic macros. Then attach batteries by making it trivial to interop with C and Python and other languages.

I frequently find myself hitting the wiki and reading through different discussions. The style, speed of loading, minimalism, and depth of the discussion all make this particularly enjoyable.

Stanford Transformers Lecture

(Very rough notes from stanford lectures)

Why do LLMS work so well?

Future of AI


All over the place this week.

FZF for Rapid Application Development

A colleague (JL) built a very useful CLI tool that relied on stacking fzf invocations: I’m slowly realizing just how powerful fzf can be for building CLI applications quickly and painlessly; particularly CLI applications that compose well.

This blog shows off a little bit of what’s possible with fzf. The ability to build a UI simply specifying commands that can be run is fairly amazing.

I’m also wondering if it’s possible to use fzf to build a command creator, particularly for dense launch commands like torchx and srun (from Slurm). fzf can show contextual autocomplete derived from the location in the command, and that’s something that could potentially be generalized by reading the --help and man outputs of different commands.

This may also be an interesting application of LLMs, to convert the --help output (or man page) into something that can be used for terminal autocomplete easily.

I only wish that zsh autocomplete was a little bit easier to hook into; I almost find myself wishing for a shell LSP that was easier to hook into.

NUMA: non uniform memory access

Ran into something fairly confusing: there seemed to be a lot of memory available, but the host started allocating swap instead. A different helpful & knowledgeable colleague talked me through how the kernel chooses which memory pages to reclaim.

The part that was completely new to me was NUMA: servers can have enough memory that, depending on the core, some memory may be closer or farther away. Intel’s docs talk about this a bit. This was one of the things that nerd-sniped me this week.

Making graphs from PyTorch

ezyang prototyped a debugger for PyTorch this week, using torch dispatch to intercept ops and then Python’s ridiculous frame introspection to map them to actual code execution.

What I’m really looking forward to is being able to outline the tree of modules in PyTorch (as nested objects) and map them to the actual operations and Cuda kernels. After spending some time exploring torch dispatch – and even connecting it to my unpublished implementation of Intermediate Logging – I’m now exploring Kineto Traces and meta tensors to see what’s possible. I could potentially use torch dispatch to track how tensor values depend on each other, and make different variations of the execution graph much more visible.

Overlay network activity on top of that, along with sizes / bandwidths used / memory and flops consumed, and I probably have a replacement for trying to write an article on Mechanical Sympathy: the model will just be observable.

Chainsaw: Minhash and friends

The other idea that’s been stuck in my head for far too long is that in most HPC you have a lot of identical things going on across several machines (and sometimes several times within the same machine) at the same time. When things go wrong, it’s generally one machine/rank that’s misbehaving: and finding that one tends to become tricky quickly.

Identifying the outlier from logs and other sources is something I’ve become pretty interested in after seeing it applied consistently several times – after looking through several algorithms I finally came across minhash. I need to actually test this on real logs to see if there’s any promise in this approach: datasketch looks very promising for prototyping quickly, at the least.
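To sketch the idea – this is a toy, dependency-free stand-in for what datasketch implements properly – a MinHash signature keeps the minimum of several seeded hashes over a token set, and the fraction of matching signature positions estimates the Jaccard similarity between two logs:

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    """One minimum per seeded hash function over the token set."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # per-"hash function" seed
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode(), digest_size=8, salt=salt).digest(),
                "big")
            for t in tokens))
    return signature

def similarity(a, b):
    """Fraction of matching positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

The misbehaving rank would then be the log whose average similarity to all the others is lowest.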

Unfortunately, this also introduces another book into my overflowing queue: MMDS or Mining of Massive Datasets by Jeff Ullman et al.

Transformers Lectures (CS25)

Learned from Luokai on Threads that Stanford’s CS25 course will be publicly available, streamed on Thursdays from 4.30 - 5.50pm PDT.

I’ve blocked off the time on my calendar, and hope to watch all of these courses.

Enjoying the work

I went to a concert I’d always hoped to see: Satch/Vai at the Beacon Theater. Joe ended the show by talking about how he and Steve had decided they wanted to play the electric guitar for as long as they could: and then stuck to the plan.

I enjoy programming, and hope to keep going as long as I can too.


Spent the week reading more about LLMs and the different types of model parallelism (again). I keep re-reading, and forgetting, what the different types of model parallelism are; I suspect I’ll only be able to reason about them properly once I’ve manually implemented a model myself.


Anyways, HuggingFace has some excellent documentation. Writing out what I understand by hand again just to try and remember it a little bit longer:

Of course, I need to spend some time figuring out actual values: how fast each device is, how large the models are, and how much each type of operation costs.
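As a starting point for that napkin math – these byte counts are the usual rule of thumb for mixed-precision Adam training (fp16 weights and grads, an fp32 master copy, and two fp32 moments), not measured values:

```python
def training_memory_gb(n_params, weight_bytes=2, grad_bytes=2,
                       master_bytes=4, adam_m_bytes=4, adam_v_bytes=4):
    """Rough memory for weights + grads + optimizer state alone: the
    defaults add up to the classic ~16 bytes/param for mixed-precision
    Adam, before counting activations and buffers."""
    per_param = (weight_bytes + grad_bytes + master_bytes
                 + adam_m_bytes + adam_v_bytes)
    return n_params * per_param / 1e9

# An 8B-parameter model already needs ~128 GB for this alone -- far more
# than a single GPU holds, which is what forces the parallelism above.
```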

Model Visualization

Wished into the ether for easier model visualization again.

There are several attempts at this: most of them render with graphviz and are not particularly interactive or useful. Sometimes I wonder if I should hire someone with excellent JavaScript / three.js / canvas skills and just get something built.


Set up Tailscale on my personal laptops recently: after fixing my personal laptop to stay connected once it’s online, I’m updating this week’s letter remotely.

The SSH web client is pretty amazing: I find myself able to use all the keyboard shortcuts I could have hoped for without Chrome intercepting them; if I have any complaints, it’s only that my color scheme seems a bit messed up.

Given easy access to my personal laptop through a web CLI (and potentially opening up more services through it), I’ll probably end up spending a lot more time using and building CLI-based applications.


A long, somewhat jet-lagged week; I did wrap up some loose ends happily.

Tidy First?

Tidy First? is a short-yet-deep book by Kent Beck on choosing the right time to refactor software. I’ve found Kent’s books extremely valuable for building taste, and for thinking through the second- and third-order consequences of decisions.

The most fascinating part of the book was the discussion of optionality when thinking about the tradeoffs of cleaning up now vs. later, and of shipping features now vs. later. I use a variant of this argument to encourage developer-experience engineers to ship now instead of later; the knock-on effects of force-multiplying so many others almost always make the tradeoff valuable.

The bit that was surprising – and yet resonates perfectly – was Constantine’s Equivalence: cost(software) ~= cost(change) ~= coupling, over the lifetime of the project. The cost to get started is almost negligible in the grand scheme of things, and the cost of change is dominated by the design: design for flexibility and the project hums along smoothly, adapting to the world.

Sometimes I have invested in extremely strongly coupled software with the intention of keeping it short lived (prototypes, particularly) – which does satisfy the math modeled here as well.

I re-started reading Kill it With Fire and started How to Make Things Faster, both of which have a host of strategies to drive valuable technical change in a business environment.

Brendan Gregg’s Linux Crisis Tools

The list of tools includes several I haven’t yet used, and should plan to spend some time with so I can use them in anger when I need to.

Reading the HN discussion pointed me to a writeup on strace (which I love using) and the cost of running it (something I hadn’t internalized) – discussed here. There are several important details about strace in the article, not least:

On the bottom of San Francisco bay are several thousand unused straces, which were intended for Y2K issues that never arose, and so were scuttled.


I’ve been trying to find the right way to set up for a writing project related to Mechanical Sympathy; after playing with different tools I found Zettlr to be remarkably pleasant for the project itself.

Highlights included a minimal interface, Zettelkasten-style links, and VIM bindings, making it the perfect editor. The interface was also fairly snappy, and the Bordeaux theme is beautifully elegant.


I had a week long vacation, and spent time exploring. These letters were originally published separately, but I ultimately decided to club them together into a single page just to maintain consistency.


I spent a large part of today debugging my laptop setup, and learning more about Flatpak and xdg-desktop-portal than I would have liked to. The short of it was that I couldn’t get file open dialogs to work in Chrome – and I couldn’t get configuration based fallbacks to work correctly by updating .config/xdg-desktop-portal/portals.conf. The solution ended up being directly modifying the configurations at /usr/share/xdg-desktop-portal/portals/gtk.portal and adding sway to it.

Working through this always makes me wonder if it’s worth the time to use a Linux laptop, but in the end I’d rather deepen my knowledge of Linux instead of wrestling MacOS or Windows. I wouldn’t mind more powerful ChromeOS laptops though.

[Edit: 2024-03-24] This has still been plaguing me; there seems to be some bug that doesn’t manifest immediately after a restart, but probably after putting the laptop to sleep and restarting.

Revisiting Cuda

Spending some time reading about Cuda and multiprocessing today; I inevitably forget what an SM or a Warp is, just because I don’t get enough of a chance to use them daily. So, some definitions:

Compute Optimal Language Models

Notes from DeepMind’s Paper:

Visualizing a model

I really want to be able to easily look at a full model’s definition without needing to read and hand-annotate code: the modules, the dimensions passed in and handled, etc. Intermediate Logging got close to it with the way it worked, but I still want to play with more sophisticated visualizations.

Linker Visibility

I stumbled across this detailed answer on Stack Overflow.

I’m exploring this because I’m curious if I can combine native python extensions that use different versions of the same library in the same process.


ChatGPT pointed me to man dlopen to read more about library linking.

namespaces: within a namespace, dependent shared objects are implicitly loaded. These allow more flexibility than RTLD_LOCAL, with up to 16 namespaces allowed.



Open AI’s Transformer Debugger

First pass at playing with the transformer debugger / reading through the repo

Book: Linkers & Loaders

Reading through a description of file paging and different formats for shared libraries and binaries, including a.out, elf, etc. I’m not quite sure how to make the most of this book – at some point I’ll probably want to try and implement my own linker for a single format/architecture as an exercise.

Book: The Coming Wave

Received the book as a gift today, and started skimming through it (I still need to re-read it slowly). The most interesting chapters came towards the end, recommending a very careful path into the future that balances several different approaches to controlling the effects of AI.

I’m not quite sure where I stand with the book, but I’m looking forward to going through it again to see how AI is expected to affect the future; all the changes so far have been good but not that large.

Book: Stripe’s Letter

Stripe has always been an interesting company, and they talked a little bit about reliability.


Playing with Cuda & NCCL on Pi Day; I’m trying out a programming experiment to estimate π using GPUs. The only way I knew of to estimate π was to use random points and measure the fraction that fall inside the circle – and as ChatGPT reminded me, that’s extremely parallelizable. As a trivial first attempt:

#include <cuda_runtime.h>

#include <math.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { \
    cudaError_t error = call; \
    if (error != cudaSuccess) { \
        fprintf(stderr, "CUDA Error at %s:%d - %s\n", __FILE__, __LINE__, cudaGetErrorString(error)); \
        exit(EXIT_FAILURE); \
    } \
} while(0)

__global__ void count(int N, bool *out) {
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  int j = threadIdx.y + blockDim.y * blockIdx.y;
  /* Flattened index; only correct because blockDim.x * gridDim.x == N
     for this particular launch configuration. */
  int p = i * blockDim.x * gridDim.x + j;

  float x = (i + .5) / N;
  float y = (j + .5) / N;
  out[p] = (x * x + y * y) <= 1;
}

int main(void) {

  int devCount;
  CHECK(cudaGetDeviceCount(&devCount));

  cudaDeviceProp props;
  for (int i = 0; i < devCount; i++) {
    CHECK(cudaGetDeviceProperties(&props, i));
    printf("Device %d | Max threads: %d\n", i, props.maxThreadsPerBlock);
  }

  int N = 64; // Size of grid
  bool *out;
  CHECK(cudaMallocManaged(&out, N * N * sizeof(bool)));
  count<<<dim3(2, 2), dim3(N / 2, N / 2)>>>(N, out);
  CHECK(cudaDeviceSynchronize());

  unsigned int count = 0;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      if (out[i * N + j]) {
        count++;
        // printf(".");
      }
    }
    // printf("\n");
  }

  printf("π = %f\n", 4 * (float)count / (N * N));
  CHECK(cudaFree(out));
  return 0;
}

Since writing the program, I’ve been experimenting with moving the reduction to another kernel, and benchmarking it aggressively.
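For contrast, the random-sampling version described at the top of this entry – the one ChatGPT reminded me of – is only a few lines of plain Python, which makes a handy CPU-side sanity check for the GPU numbers:

```python
import random

def estimate_pi(samples=1_000_000, seed=0):
    """Monte Carlo estimate: the fraction of uniform points in the unit
    square that land inside the quarter circle approaches pi / 4."""
    rng = random.Random(seed)
    inside = sum(
        rng.random() ** 2 + rng.random() ** 2 <= 1.0
        for _ in range(samples)
    )
    return 4 * inside / samples
```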


For some reason this is hard to find, but ncu is NVIDIA’s Nsight Compute CLI. So far I’ve used it directly with ncu <binary> -o profile.


Building large models

A little guide to building large language models is a treasure chest of useful links and data.

- Mamba the hard way


I’m taking a week off from work, and planning to use that time to read interesting papers, dig into the LLaMa models, and catch up on learning and exploring things I generally don’t get time to. I’ll keep a daily entry for summarizing the day’s explorations as I go.

If things go well, by the end of the week I’ll have played a little bit with cuda, inference, understanding some model dimensions, fine tuning, etc.

Emacs, Tramp & SSH

Emacs surpassed my expectations yet again by supporting ssh’ing through multiple hops transparently. The trick to setting this up is to use a file path like /sshx:dev1|sshx:dev2:~/ and it Just Works. I could even use a shell over this smoothly.

For using Tramp comfortably (as it spawns multiple sessions) I find it extremely valuable to use ControlMaster to share SSH connections and skip authenticating repeatedly. The .ssh/config additions to enable this are:

  ControlMaster auto
  ControlPersist yes
  ControlPath /home/kunalb/.ssh/multiplex/%C

A quick Google search and a couple of articles later, this one from CyberCiti covers the bits I use, and several bits I don’t.

Abstraction vs Illusion

From a video that floated across my YouTube recommendations: abstractions remove/generalize details to focus attention on the important ones instead; illusions accidentally remove important details, confusing end users. This is clearly a goldilocks zone, and deciding what counts as important is a matter of taste and experience.

The speaker also calls out the risk of the uncanny valley where an abstraction is almost like another platform you’re used to – it becomes much harder to use, because you’re not sure which bits are missing.


Spent a little bit of time exploring linking and loading, and taking a break this week.

How to Write Shared Libraries – Ulrich Drepper

I’ve been spending time reading Linkers and Loaders, but decided to take a detour and ended up reading a small book by Ulrich Drepper that went into a lot of detail on how shared objects are located and linked, along with the differences between RUNPATH and RPATH.


Becoming a bit more intentional about spending time learning again.

GPT Tokenizer

Of course I had to watch Andrej’s latest video.


Binged on a talk by Jeff Dean.

Things that stood out:


A mixed week as I adjust to working remotely with a large time zone difference.

Mechanical Sympathy

I’ve been reading – and trying to compile and run at the same time – Understanding Software Dynamics. While I could get sample code running to play with the cost of CPU utilization and arithmetic operators, I haven’t been able to just copy-paste and run the code for memory or disk utilization. The book is both fascinating and a little overwhelming given the sheer amount of complexity that can affect the actual runtime of a program – and it doesn’t even cover GPUs.

My plan so far had been to carefully work through and play with each component of the book, potentially writing some reusable benchmarking scripts that I could compile and use everywhere. At this point, I’m planning to just make it through the book while taking notes, and then following up with some code to make sure I have some understanding – and then extending the same principles to GPUs.

Reading through the chapters on memory and disk access, there are so many potential sources of noise and artifacts from the shape of memory that I’m surprised the author still manages to construct experiments and reason about them; trying to isolate memory access patterns and checking whether the results match the actual hardware bandwidth is fascinating. It’s basically about constructing and validating experiments.

The trick used to measure disk performance is to read/write a large block of memory: for reading, check when disk-block-sized offsets in the memory get updated (indicating that block was read in); for writing, constantly update the block being written so the moment of writing can be timestamped. This only works because of the order-of-magnitude time difference between updating memory and disk, but it’s very clever. I’m also not quite sure how I’d actually implement it, and will continue reading the book / playing to find out more.

Portable Conda Environments

The other long-standing bugbear I’ve been thinking about is how to have easily movable conda environments; PyTorch applies rpath patches to use paths relative to the conda environment’s libraries. Conda-pack works well, but it needs the libraries to be mutated, and once unpacked they can’t be moved again because the update is somewhat destructive.

I’ve been wondering if I can update conda-pack to do non-destructive updates – allowing the same environment to be moved repeatedly without thought, and then also having it fix itself every time it’s activated through the activate scripts.

Looking through the issues on conda-pack I also stumbled across constructor, which seems to be a slightly fancier conda env for declaratively creating and installing conda environments. This is still not as flexible as I would like, because some packages like NVidia’s Apex must be installed from source and cannot be installed from PyPI – installing from PyPI actually drops in some other random package.

Exploring Observable

I’ve been meaning to write about a complex system with rich visualizations for several months now, and observable framework seems like the perfect tool. After spending some time exploring, I’ve generally come to appreciate the default design choices and generally smooth experience.

Funnily enough, I’ve recently been generating plotly graphs from relatively expensive data sources and caching them – so Observable’s data loader approach, which statically generates the data once and then loads it, resonates particularly well. I need to check whether it’s smart enough to only partially load data from parquet files, because that is still one of the shortcomings of this approach – if you have too much data to load up front, things become extremely slow and blow up on the client anyway.


Spent a lot of time traveling this week, and didn’t get around to exploring as much as I would have liked.


While on a flight, the person sitting next to me worked in hardware: something I know very little about. She worked on silicon chip design – her husband is an engineer at Lockheed Martin working on helicopters – and briefly described how different companies build chips. The supply chain was significantly more convoluted than I’d expected, and significantly more concentrated.

I asked for a 101 series to build a basic understanding of the hardware manufacturing process, and was pointed to Chris Mack’s YouTube Channel, which seems fascinating. I’m starting with his career retrospective.

Whatever path you find yourself on, find something to be passionate about.


Unfortunately, I’ve been struggling with PyTorch’s CMake scripts somewhat over the past few weeks. CMake seems reasonable, but I still don’t have much of an intuition for it. I tried to see if any books I used covered it, but didn’t turn up much.

I have a lot of study / exploration stacked up to catch up on going forward. There’s even a BlueSky paper I’m curious about.


Trying out something new this week; today’s letter is written in VSCode web and I have a GitHub Action to publish. Things mostly seem to work; to mimic auto-fill-mode in emacs I’m using gq from Vim emulation in VSCode.

Revisiting Building Developer Tools

This is something I’ve been thinking about for some time: I wrote Building Developer Tools a lifetime ago, and have since updated some of my beliefs and approaches; most of my opinions are unchanged – some I’m doubling down on – and some I’ve added since.

Better resources

Doubling down on


Mechanical Sympathy

While I haven’t been able to make as much progress as I would have liked, I wanted to explicitly list out all the books, libraries to explore, test programs to write, and resources to use – to have a slightly more formal sense of what I’d need than last week’s entry. I’ll tentatively keep coming back to update this specific entry as I find new resources.

Books to work through:

Libraries to use

Missing topics to add: something around a better understanding of network limitations, bandwidth, communication mechanisms and HPC. A general understanding of data centers, disks, and file systems.

And of course, getting better at Transformer architectures and the models – including the linear algebra. I still can’t really visualize K/Q/V etc. in my head, and must generally work things out slowly and manually.
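Since the only way this ever sticks for me is working it out manually, here’s a dependency-free toy of single-head attention – just to make the Q/K/V shapes and the softmax mixing concrete (illustrative, not an optimized implementation):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head attention over lists-of-lists: Q is (n_queries, d_k),
    K is (n_keys, d_k), V is (n_keys, d_v). Scores are Q.K^T / sqrt(d_k),
    softmaxed per query row, then used to mix the value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```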


A busy week with a lot going on and some great memories.

Named Pipes

I spent a lot of time looking at named pipes; I really wanted to be able to make a new API where a folder full of named pipes would act as a way to write to different paths.

Unfortunately, I couldn’t find a good way to safely allow multiple processes to write to these paths: as soon as writes cross 1MB, things can unintentionally interleave.
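The folder-of-fifos part of the idea is only a few lines – a hypothetical sketch of the setup (the hard part is the interleaving, not the creation):

```python
import os
import stat
import tempfile

def make_pipe_dir(names, root=None):
    """Create a directory with one named pipe (FIFO) per name, so each
    path can be handed out as a write target."""
    root = root or tempfile.mkdtemp(prefix="pipedir-")
    for name in names:
        os.mkfifo(os.path.join(root, name))
    return root
```

The interleaving limit is real: POSIX only guarantees that writes of up to PIPE_BUF bytes are atomic, so larger concurrent writes to the same pipe may be split and mixed.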


Plotly has been surprisingly flexible: at this point I’ve used it to generate javascript, html, and most recently even JSON. I’m very impressed by the flexibility it offers.

Mechanical Sympathy

Understanding Software Dynamics

Following another old desire I’m picking up and working with C again, just to have more systems programming under my belt. I spent some time working through the chapter on estimating CPU utilization and fell into all the traps the book mentioned.

I’m tempted to write a general purpose benchmarking script that generates html and run it on every device I can get my hands on.

A shell based notebook

Another idea I’ve been thinking about: interpreters, bash shells, consoles in general have very explicit and carefully crafted input/output semantics, so it should be possible to make a Notebook interface on top of them with a very generic implementation: ideally I’d like something extremely lightweight built with HTMX with minimal assumptions on the contents of the shell.

I think this could work very well, particularly as a replacement for script and even for simpler notebooks. The fancier shell escape codes could allow for rendering images, with potentially some extensions to do so more naturally than pure shells.

Instead of implementing anything concrete, I spent most of my time thinking of a good name for this project instead: for now I’m calling it TextBook because I couldn’t think of anything better.


As I was writing this a really interesting Tweet floated past and now I’ll have to spend some time reading about the new model, and the architecture. The demos were pretty excellent, and if it’s as lightweight as claimed I should be able to easily run it locally.

LLMs are a fascinating space.


Like the set on HyGPT, starting a series of notes on software dynamics, queueing and hardware behavior that I hope to learn this year: I’ll be categorizing these as Mechanical Sympathy.

Mechanical Sympathy

Kicking off with notes from Understanding Software Dynamics by Richard Sites.

Understanding Software Dynamics

The book starts off with some classic advice: particularly around making Fermi estimates and understanding hardware, linking to Jeff Dean’s talk on numbers everyone must know.

The second chapter talks through estimating cpu utilization – in a way that can be reasoned about. There are so many potential ways that the compiler, pipelining on the CPU, or other systems can interfere – it’s extremely valuable to be very empirical, and to check all assumptions. The best recommended way to benchmark is to run the benchmark with multiple iterations over the same piece of code (after confirming that the compiler isn’t eliding all the synthetic work): once with n, and once with 2n. Then the actual cost of n iterations can be determined by subtracting them.

The best mechanism to deal with variations in runtime across runs is to choose the minimum to minimize noise – this may not be immediately intuitive, but is something I’d also learned from watching Android Benchmarking talks. The minimum time is almost certainly the one where nothing else interfered while benchmarking, and should truly represent the workload.
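The two tricks together – subtract an n-iteration run from a 2n-iteration run to cancel fixed overhead, and keep the minimum across repeats – look roughly like this, sketched in Python rather than the book’s C (so the absolute numbers will be much noisier):

```python
import time

def bench_ns(fn, iters, repeats=5):
    """Time `iters` calls of fn, keeping the minimum over `repeats` runs:
    the run least disturbed by everything else on the machine."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(iters):
            fn()
        best = min(best, time.perf_counter_ns() - start)
    return best

def cost_per_iter_ns(fn, n=100_000):
    """Run with n and 2n iterations and subtract: the timer calls and
    loop setup cancel out, leaving ~n times the per-iteration cost."""
    return (bench_ns(fn, 2 * n) - bench_ns(fn, n)) / n
```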

Reading Queue

Books I’d like to cover include The Art of Computer Systems Performance Analysis, The Nature of Mathematical Modeling, any math books that help me with understanding these parts, Linkers & Loaders, etc. I expect to increase this list over time, particularly to incorporate books that help me understand GPU behavior, LLMs, etc.

At the same time, Queueing theory also applies to organizations and systems, so I’d also like to complement this series with The Principles of Product Development Flow, which also goes into the math.

Simple Bookmarking

As an attempt at capturing interesting links – and recording where I find them – I’ve set up a Google Form for myself that publishes to a spreadsheet. I use it to upload papers straight to my drive, capture links from HN & Twitter, etc. I’m excited to see which of my technical news sources I should maintain and which ones I should cut down on.

A colleague – CT – pointed me to a very useful utility that can be LD_PRELOADed. Spending some time googling made me realize I’d been missing out on some very useful infrastructure.

There are some great blogposts on this, and it also seems like a really fun pattern I now want to apply to other pieces of native code. LibSegFault itself adds a SEGFAULT handler that prints a lot of useful information to help isolate where the segfault happened, without needing to recompile the code at all.


Mark announced that we’re building LLaMa3 – and given that’s public, I can also say that I’m helping out with that by building and maintaining tools for the team. This is one of the projects I’ve been most excited about in my career so far.


A fairly long week; I haven’t been able to spend as much time learning new things as I would like; and I really need to start structuring these notes better.

Tweaked the CSS to enable text-size-adjust, which makes this website significantly easier to read on mobile – I’d been wondering what I was missing compared to expLog.

/proc tricks

The Linux /proc filesystem is one of my favorite things: it’s a ridiculously convenient way to explore, offers a lot of affordances, and is something you can play with live.

/proc/$pid/environ can be very useful at times, and it contains a \0 separated list of the environment the process was started with.

I’ve actually used it twice recently, and I thought I’d record the tricks:

  1. Sometimes programs will unset environment variables they were started with before handing over control, eg. LD_LIBRARY_PATH. I parsed /proc/$pid/environ to recreate the original environment (split by \0 and put it into a map), and then exec’d into the process I needed with that environment. It works surprisingly smoothly.

  2. I was writing bash scripts I needed to be configurable, so I had three layers of configuration:

    • A set of default environment variables
    • Config files that could be sourced and used to override these
    • The ability to set the variables at the CLI while running to override both of these.

    Now this becomes tricky because variables set in the parent environment are generally the first to be overridden. /proc/$pid/environ to the rescue again: after sourcing the passed in config file, I would read the environ file and pull out the config variables I cared about and explicitly source them.

    Part of this was inspired from how Bash Idioms deals with configurations (including self-documenting code).
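A minimal sketch of the parsing behind trick 1 (function names are mine):

```python
def parse_environ(raw: bytes) -> dict:
    # /proc/$pid/environ is a NUL-separated list of KEY=VALUE entries,
    # frozen at the time the process was exec'd.
    env = {}
    for entry in raw.split(b"\0"):
        if entry:
            key, _, value = entry.partition(b"=")
            env[key.decode()] = value.decode()
    return env

def original_environ(pid: int) -> dict:
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read())

# Recreating a process's startup environment and exec'ing with it would
# then look something like:
#   os.execvpe("some-binary", ["some-binary"], original_environ(pid))
```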


Marimo is a beautiful new Python notebook that I wish I’d built. Someday I’m sure I’ll try and build my own – almost certainly with HTMX and even more minimalism.

Implementation details

Features I really liked

Data format

The generated notebook is an executable script that relies heavily on decorators (app.cell) – a significantly more elegant approach than JSON. Of course, that also means that no notebook outputs or state is ever persisted to disk.
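Not marimo’s actual API – just a toy sketch of why a decorator-based file format is pleasant: the notebook stays a plain, importable script, and cells are ordinary functions registered with an app object.

```python
class App:
    """Toy cell registry; the real thing also does dependency
    tracking, reactive re-execution, and rendering."""

    def __init__(self):
        self.cells = []

    def cell(self, fn):
        # Registering via a decorator keeps the source file valid Python.
        self.cells.append(fn)
        return fn

app = App()

@app.cell
def load():
    return list(range(3))

@app.cell
def doubled():
    return [x * 2 for x in load()]

# Running the "notebook" is just calling the registered functions.
results = [cell() for cell in app.cells]
```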

Threading behavior

I’m always curious about how notebook designers implement something like async tasks; asyncio itself seems to be slightly hard to use in these notebooks – there’s no default event loop, and I couldn’t quite get it to make one for me to run background tasks.

Then I tried explicitly creating a thread: its output doesn’t actually show up in the notebook and instead goes to the terminal’s stdout. I expect this is something that’ll get fixed in the future.


While I think this is a very nice way to build applications, I’m not yet sure if I’d like to use this approach to build “notebooks” for exploration.

I’m also going to be spending some time exploring more industrial strength visualization tools to handle exploring and visualizing much more complex data.

I also decided to try maintaining notes on LLMs – as a complement to these letters. Unfortunately I’m probably too used to being able to customize my workflows to stick with it.


While reading tremendous amounts of fiction, I stumbled onto Master of Change by Brad Stulberg – which has been surprisingly soothing and engrossing. There are also some extraordinary quotes in there:

“It seems that all true things must change and only that which changes remains true.”

  – Carl Jung

Which is quoted in How to Skate.


Preparing for a brand new year! Spending some time thinking through what I’d like to learn, do, and build – ideally with a why attached – followed by how I should go about it.

To Learn

Large Language Models

Transformers – and large models in general – have been changing software; I’d like to be able to regularly and comfortably keep up with the state of the art, and train my own models for different purposes. Whether that means training a small model from scratch, fine tuning or applying open source models, or shrinking models to be tiny.

At the same time, building mechanical sympathy for these models seems really important: an intuitive sense of the hardware capacity required, the number of operations running a model takes. I expect I’ll be writing about this and spending a lot of time here in 2024.

Of course, this needs to be complemented with actual working code, notes and projects. I expect to level up in Systems Programming, Networking (one of my biggest weaknesses), Cuda and HPC as part of this.

Maths & Physics Fundamentals

I frequently run into limitations because I don’t have a good command of undergraduate- to graduate-level Maths & Physics. So I’ll try and hit some good old textbooks, including Feynman (both on Computation and Physics), and classics on Math.

Information theory, Queuing theory and more advanced simulations are topics I’d like to dive particularly deeply into.

To Build

Hy Tools

Hy has been making programming extremely fun again because I can bring together a lot of things I enjoy: Lisps, ML, iterating quickly and interactively. At the same time, the available tools have bit-rotted, are still in progress, or just don’t exist yet – nREPL, tree-sitter, the emacs mode, jedhy, etc.

Once I have all of that in place, I’d like to play with some Python/Notebook experiments that have been floating around in my head for a really long time, where I mimic SmallTalk while still supporting a production Python environment. With enough elbow grease we can write code like they did in the 80s!


I’m leaning into visualizations, composable tools and also thinking about UIs at the moment: doubling down there and making it easily usable and customizable.

Notes box

While writing this letter has been valuable to reflect on my week and things I’ve been learning, I need a better mechanism to easily add and iterate on my notes, particularly with the ability to have an LLM read them for me (though I suspect that’s not going to be as valuable as having an LLM read things that I haven’t read).

While I’ve generally kept my slipboxes public, I’ll experiment with making a private one this time.

The other thing I learned was to try and use an LSP for organizing and cross linking between the notes easily.

To Write

An open source implementation of intermediate logging

I promised to do this during my talk at PyTorch Con '23, and I’d like to follow through on this. I’ve structured the code in my head several times over at this point, so it’s just a matter of getting it written in a way that works.

Maintaining these letters

As a summary of things I’ve been learning and paying attention to over the week, ideally cross-indexed with the notebox. At some point I may also tell people I’ve been writing them once I have a clearer purpose and structure – including some form of pagination.


Last one for 2023!


A short burst of hacking over Christmas Eve let me release a preview of orphism on PyPI and GitHub.

While I’d been calling this project Fauvism, I realized that name was already taken on PyPI. Orphism was a quick search and replace away – thematically it seems to fit even better, as a derivation of Cubism.

Implementation has been fun: I decoupled the code into “bucketing” and “rendering”, which helped smooth it out a lot, but there are still a lot of edge cases and off-by-ones to reason about that I’m not particularly happy about. And while I can render sin(x) and cos(x) pretty well, tan(x) looks ridiculous.

The data-ink ratio is also remarkable given the amount of data that just fits into a single line.

I’m hoping to use this for rendering model weights live, so I’ll add support for NaN, Inf, numpy and torch Tensors. Then I’ll figure out how to make it fast.

Unfortunately this has been fascinating enough to draw me away from Advent of Code, but I suspect I’ll find my way back there soon enough anyways.


Ended up relaxing a bit last week. A new week, another attempt at learning to program!

Advent of Code

After getting really tired of thinking through the problems, I realized that I generally have the most trouble when I pattern match too aggressively on past problems. Instead of treating the problem as a brand new problem to evaluate on its own merits, I’ve been forcefully fitting it into a pattern and struggling when it didn’t match.

I need to adopt a little bit more of a beginner’s mind here and have more fun with the problems.

Day 12

I spent a lot of time thinking about this one: particularly part 2. My first several attempts were around recursing, caching, and reducing the problem.

The other – in hindsight, questionably useful – insight was that there were 2 ways to count the number of possibilities: one by trying out two options for each “?”, and one by generating all possible strings within the constraints. I also derived a formula to figure out when it would be better to keep recursing instead of trying to enumerate all possibilities. This worked well enough to solve the sample input, but even letting it run with PyPy for over an hour didn’t actually solve the real one.
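For reference, the standard recurse-and-cache formulation that does run quickly – not my original code, and the state I was caching was evidently different:

```python
from functools import cache

@cache
def arrangements(springs: str, groups: tuple) -> int:
    # Count ways to place the remaining damaged-spring groups into the
    # remaining pattern of '.', '#' and '?'.
    if not groups:
        return 0 if "#" in springs else 1
    if len(springs) < sum(groups) + len(groups) - 1:
        return 0  # not enough room left for the groups plus separators
    total = 0
    if springs[0] in ".?":
        # Treat the first cell as operational and move on.
        total += arrangements(springs[1:], groups)
    g = groups[0]
    if "." not in springs[:g] and (len(springs) == g or springs[g] != "#"):
        # Place a run of g damaged springs here, then skip the separator.
        total += arrangements(springs[g + 1:], groups[1:])
    return total
```

Because the state is just (remaining pattern, remaining groups), the cache collapses the part-2 blowup that pure enumeration can’t handle.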

A few days too late I pattern matched this to edit distance, and after thinking through the problem over a couple of days finally figured out a solution:

/ . ? # .
. 1 1 0 0
# 0 1 1 0
. 0 1 0 1

Google Slides

I constantly wish for an index card application that’s minimalistic, easy to use, and can actually replace using paper index cards for brainstorming. While thinking about the ways I could leverage existing, widely available tools I’ve decided to experiment with using slides.

Using a minimal theme, I’ve been using slides as a way to take quick notes, have bullet points I can rearrange and an easy way to build quick and dirty diagrams.

Here’s an example from building Fauvism: Bucketing strategy


Picked up 33 miniatures in linear algebra after a Tweet about it floated across my feed.

I’m worried this will be yet another aspirational book – one I’d like to be able to read easily, but find too hard at the moment – added to my tsundoku, but the first few chapters have been manageable.


A new week begins – another 7 Advent of Code Problems, hopefully a small open source project and some exploration with llama.c.

Advent of Code

Day 11

A surprisingly comfortable day after day 10: I almost started implementing the Floyd-Warshall algorithm before realizing this was simply Manhattan distance and there was nothing particularly fancy going on here. Part 2 was fairly reasonable and direct as well.

My chromebook (Macbook Air 2012 with ChromeOS Flex) still keeps troubling me though: there’s some update at around midnight that seems to cause it to freeze: copy-paste fails, Chrome stops connecting to the internet, and all terminals stop working. I still can’t quite figure out what’s going on – but it definitely plays havoc with my rankings.

… and beyond

Surprisingly, I’m finding myself a little bit bored at this point. I’m not enjoying staying up waiting for the problem to drop any more, and have started finding the problems a little bit of a chore, so I’m taking a break from AoC. I’ll catch up when / if I feel like it, I guess.


So far I have a small video and less than 100 lines of code:


Somewhat annoyed at how long it took me to implement a working transformer, but I’m blaming Advent of Code for distracting me. (That’s not the truth though, I’m just not practiced enough at reasoning through matrices and tensors just yet. Soon.)

Just as a warning if you’re reading this: I keep editing the letter for the current week live, moving on after midnight on the Sunday this is published (so this particular letter stretches from 2023-12-04 00:00 to 2023-12-11 00:00, and will have the heading 2023-12-10).

Advent of Code

Day 04

Another fairly mechanical problem, I ended up spending most of my time re-reading the second part. Given the straightforward nature of the input format this time around, I decided to take a second stab at the problem with parsy: an excellent parser-combinator library that I’d been eyeing after spending more and more time writing regular expressions for AoC.

In retrospect, nested regexes would have worked just as well, but that’s not as much fun. My favorite interface for writing regular expressions is in emacs, because I can see the behavior live – I’ve seen a couple of websites that do the same but I’ve yet to see something as smooth as emacs is about it.

Day 05

A classic AoC puzzle: intersect segments with some custom code. There were some interesting solutions on Reddit that worked backwards, several that brute forced it – but I didn’t see a Tree implementation yet. At some point I’d like to implement this with an interval tree.

Day 06

Somewhat relaxing puzzle; I spent a bit too much time writing things out on paper first – and stumbled a little bit when the roots of the equation were integers. Happily the test cases included exactly that scenario, making it much faster to iterate.

Learned about math.nextafter today: I was looking for the quickest way to get the next integer from the current value, irrespective of whether it’s an int or not.
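The trick, roughly (helper names are mine):

```python
import math

def strictly_above(x: float) -> int:
    # math.ceil(2.0) is 2, but "first integer strictly greater than x"
    # should be 3; nudging x to the next representable float first makes
    # exact integers map past themselves.
    return math.ceil(math.nextafter(x, math.inf))

def strictly_below(x: float) -> int:
    # Symmetric case: first integer strictly less than x.
    return math.floor(math.nextafter(x, -math.inf))
```

This is exactly the shape of the day 6 problem: when a quadratic root lands on an integer, you need the next integer over, not the root itself.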

Day 07

Feeling somewhat lethargic today, but I didn’t do too badly in time. Today’s problem was slightly fiddly and easy to mess up; I also wasted some time because I didn’t read the instructions carefully enough and missed a case.

I realized I didn’t remember the APIs for Counter and had to look them up; also simply reversing a string, etc. which is a bit annoying. Oh well. As a test, I also recorded my attempt so I can go back and review later.

The most elegant solution I’ve seen so far was on Reddit by gemdude46. Looking at simple and well structured solutions for problems I’ve tried myself is generally one of my favorite parts of AoC.

Someone shared a video that listed all AoC solutions, so I couldn’t resist writing my own helper program to list solutions out. For now, all my solutions run in around ~half a second; I hope I can maintain this trend.

All the solutions

Day 08

My Chromebook gave out again while I was working through part 1, but I managed to catch up significantly in part 2 once I figured out what was going on. I’d started by recording myself working, and I’ve realized a lot of things that I should be doing to program faster that I’m planning to do from tomorrow:

The problem itself was fine; I gave up and went with a quick regex to unblock myself: if nothing else AoC helps stress test my regex skills.

I’ve been enjoying watching Jonathan Paulson’s solution videos, which also gave me a lot of hints on how to speed up my solutions.

On another note: I spent a bunch of time today trying to run my solutions through GraalVM but failed miserably; it can’t install Hy, and generally trying to install a Python package started heating up my laptop. Hopefully some other time. Seeing Jonathan’s solutions with PyPy I decided to give that a shot too, but didn’t get very far – particularly because it’s not meant for short-running processes.

Finally I decided to play with an idea I’ve described earlier, and compiled the code with Cython: and it runs beautifully. I’m going to have to spend a little bit more time with this, and I suspect I can get an experience with all the power of Common Lisp and Python in a language I really enjoy.

Day 09

I tried to be much more incremental this time around but still ended up being pretty slow; I’d like to say that I was too sleepy and tired to focus properly, but looking at the leaderboards does make me wonder at just how much faster I could be if I keep working at this.

The problem itself was interesting – it took longer to understand than anything else – and I still wonder if there’s a trick to finding a mathematical solution. It feels like differentiating a series till it turns to zero, and then figuring out the deltas / equations.

Day 10

Wow, this was tricky. I solved part one with a BFS traversing the nodes and then taking a maximum. For part 2, I tried to maintain the parity of crossings for all my answers. It’s always surprising just how much clarity a quick visualization can get me when compared to the raw data – this is something I should apply more widely.

Looking at discussions and other people’s solutions, I learned about the Shapely library, the shoelace formula and Pick’s theorem – though not well enough to apply them. Several people also implemented a floodfill after adjusting the grid, which was not an approach I’d really thought about – but it is pretty clean.
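Since I only learned them after the fact, the two results sketch out like this:

```python
def shoelace_area(vertices):
    # Shoelace formula: half the absolute sum of cross products of
    # consecutive vertex pairs gives the polygon's area.
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

def interior_lattice_points(area, boundary_points):
    # Pick's theorem: A = I + B/2 - 1, so I = A - B/2 + 1.
    return int(area - boundary_points / 2 + 1)

# A 2x2 lattice square: area 4, 8 boundary points, 1 interior point.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
```

For the day 10 loop, the boundary points are the pipe cells themselves, so Pick’s theorem hands you the enclosed-tile count directly.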

Speeding up

As an experiment, I decided to record myself solving Day 09 while fresh in the morning – already knowing the solution and just trying to get a baseline of how long it would take me to solve the problem.

Analyzing my videos (original attempt, second attempt in the morning):

Notes            Actual attempt   “Ideal” attempt
Time to first    13m21s           3m11s
Time to second   20m10s           6m12s
Bugs             1m19s   Didn’t split line
                 2m57s   Didn’t include original number
                 4m35s   Didn’t toggle the flag

With these times I would have barely made it into the leaderboard for part 1, and not at all for part 2.

Potential follow ups:

Somewhat disappointingly, trying to redo the problem still led to bugs: in the past, I’ve run into this while trying to go too fast. Instead of trying to push for speed, I think I need to focus more on executing the code constantly in my head as I type and making sure I avoid making mistakes instead.

Knowing the problems I’d run into the first time around helped speed up the 3rd attempt to something around 4 minutes, but that’s so far from a real attempt.


Slowly working through the attention block and cross checking every step; I hadn’t internalized that the calculations for K/Q/V are exactly the same before masking / summing over values.

From Hacker News: LLM Visualization is astonishingly well put together. I wish I was comfortable enough with 3-d programming to generate visualizations like that.

I’ve finally succeeded in making my attention block match the one from the test, through plain brute force – comparing tensor by tensor, step by step. My mistake was at the last step: instead of summing over the heads explicitly and then adding biases, I was using einsum to both add the biases and sum over the heads at the same time. On top of that, the test used a buffer IGNORE that I had just set as a constant – but for evaluation the buffer would be overridden, making my values diverge further.

Evidently I’m still not entirely clear on the ordering of the operations and why they matter; I suspect the model would train the same either way (but that is probably just my naivete).

I finally managed to make my Transformer work closely against the reference GPT2 implementation provided by the TransformerLens library. Of course, as soon as I build the most basic familiarity with transformers, a new paper is released that seems very promising: Mamba. I’ll spend some time playing with this as well, while I build more mechanical sympathy for these models.

I’ve been spending some time thinking about what to tackle next, and working through and implementing llama.c to reproduce Llama2 and actually use it for inference sounds like a good plan – and of course, I’ll be doing all of that in Hy.


While playing with a terminal version of TensorBoard I started wondering if I could render cubism-style charts instead to gain even higher resolution than the sparklines offered by Textual. I also wanted something that didn’t try to fit the data into the available width, and would instead let me scroll.

Which led to a small project I plan to use in a couple of places, including the open source release of Intermediate Logging. Fauvism tries to render both positive and negative values, and I’m building it in layers (Rich Renderable -> Textual Component) and a reasonably useful CLI app as well.

The most interesting bit is figuring out how to show negative values: I’m experimenting with flipping the bg/fg colors and simulating unicode blocks that start from the top instead. I still need to work through the edge cases though – but I’ve only spent around two hours on this so far.
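Setting the negative-value question aside, the positive half of the bucketing is straightforward – this is just a sketch of the idea, not orphism’s actual code:

```python
BLOCKS = " ▁▂▃▄▅▆▇█"  # nine levels: empty plus the eight eighth-blocks

def bucket(values):
    # Scale each value to one of 9 levels relative to the min/max of
    # the series, then map levels to unicode block characters.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on flat series
    return "".join(BLOCKS[round((v - lo) / span * 8)] for v in values)
```

The negative-value experiment then amounts to picking a mirrored character set (or swapping fg/bg colors) for cells below zero, with the same bucketing applied to the absolute value.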

This is in Hy too, but that’s transparent to library users (importing hy first makes loading the rest trivial).


On 2023-12-05 I completed 12 years working at Meta: 3.5 years at Facebook California, and 8.5 years in New York. It’s been a while.


Satisfying another static-site generation itch this week with a simple table of contents generated with the strategic application of fragile regular expressions.
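The kind of fragile regex I mean, roughly – this assumes ATX-style `#` headings and isn’t the site’s actual code:

```python
import re

def table_of_contents(text: str) -> str:
    # Pull out markdown headings and emit an indented list of links.
    # Happily breaks on headings inside code fences -- hence "fragile".
    lines = []
    for match in re.finditer(r"^(#{1,6})\s+(.+)$", text, re.MULTILINE):
        level, title = len(match.group(1)), match.group(2).strip()
        slug = re.sub(r"[^a-z0-9 -]", "", title.lower()).replace(" ", "-")
        lines.append("  " * (level - 1) + f"- [{title}](#{slug})")
    return "\n".join(lines)
```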


Continuing to work on a transformer implementation, I’m finding Arena very helpful, because I can write the transformer code layer by layer and sanity check it against GPT2. Once I have this, I’d like to try and implement my own completely from scratch or from a paper, but at least this gives me something bite sized to tackle first.

Visualizing / thinking through multidimensional matrix multiplication is giving me a massive headache, if I’m honest. Peeking at solutions that just use einsum was bittersweet – I’m happy it’s possible to express it so cleanly, and I was sad I hadn’t known about it earlier. I definitely don’t enjoy having to deal with batches as an additional dimension – I almost want batches to be something that can be plugged into the model post hoc as a post-processing step.
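einsum really does dissolve the batch-and-head bookkeeping; a small illustration with numpy (shapes are mine, not Arena’s):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq, d_model, n_heads, d_head = 2, 4, 8, 3, 5
x = rng.standard_normal((batch, seq, d_model))
w_q = rng.standard_normal((n_heads, d_model, d_head))

# "bsd,hde->bhse": batch and head dimensions ride along untouched while
# d_model is contracted away -- no reshapes, no transposes.
q = np.einsum("bsd,hde->bhse", x, w_q)

# The same computation with an explicit loop over heads, for comparison:
q_loop = np.stack([x @ w_q[h] for h in range(n_heads)], axis=1)
```

The batch dimension is literally just another letter that passes through, which is as close as it gets to plugging batching in post hoc.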

Trying to debug my Attention block – given I have a reference implementation open right in front of me – is a good exercise in realizing very concretely that ML Debugging tooling is very primitive at a lot of levels; trying to build a transformer is just driving that home very viscerally. On the plus side, this gives me good ideas for projects and visualizations to build.

Simplifying Transformer Blocks

On the same note, I’m pretty excited about reading this paper on Simplifying Transformer Blocks (via @arunsees).

I should read more about Signal Propagation Theory. But I think I’ll stick to having my own transformer implementation first before I keep distracting myself.

Type Macros

Still iterating on the attention block, I’m now rewriting it into something like I would normally write instead of relying heavily on existing Module patterns. I started with a little bit of procrastination to implement my own typing macros in Hy – I can see a whole new world of DSLs opening up in front of me. I tried a plain old macro and a reader macro to replace jaxtyping’s Float[torch.Tensor, "dim1 dim2 dim3"]:

(defreader T
  (.slurp-space &reader)
  (setv out (.parse-one-form &reader))
  (setv #(dims tensortype) (.split out ":"))
  `(get ~(hy.models.Symbol tensortype) #(torch.Tensor ~(dims.replace "," " "))))

which looks like #T dim1,dim2,dim3:Float. But when used with an annotation it becomes a bit too unwieldy for me (#^ #T ...) and I couldn’t see a quick way to bypass that.

So I’ve settled on a simpler defmacro instead, which looks like

(defmacro T [dims tensortype]
  `(get ~(hy.models.Symbol tensortype)
        #(torch.Tensor ~(.replace (str dims) "," " "))))

and I can use as (T dim1,dim2,dim3 Float). I’m not particularly happy with this either, but it’s better than (get Float #(torch.Tensor "dim1 dim2 dim3")) which is what I’ve been living with so far. I’ll have to think a little bit more about this, and figure out how to write assertions on these – perhaps with a custom defn wrapper macro instead.


YouTube’s ranking algorithm has been getting better: it recommended a DevTools podcast with Mitchell Hashimoto showing off his workflow which was a lot of fun.

Personally, I switched from Mac to Linux a couple of years ago (I’m even typing this post on a decade old Macbook Air running ChromeOS Flex) because while I really like Mac’s hardware, I was tired of fighting the software.


After writing a lot of documentation recently, I’ve been finding myself surprisingly attracted to using presentations as a quick documentation mechanism: I can easily annotate diagrams and text, point people to a very specific slide – and most importantly – they don’t seem to be as overwhelming even if there are large amounts of content, just because of how skimmable they tend to be.

A significantly more formal – and cleaner – approach to documentation is at Diataxis which I plan to adopt and learn from.

Advent of Code

I must admit to being very excited about tackling Advent of Code again with Hy this year (though I’m also tempted to try Zig or Go, just to get another systems programming language under my belt). I’m hoping to use it as motivation to improve some of the tools around Hy & Emacs – potentially just updating some of the existing tools that have broken with language changes.

Trying to run Blitzen – my Rust helper – failed; at this point it’s probably too old to compile, so I quickly made a version in Hy; with Requests and BeautifulSoup this turned out to be very small and surprisingly smooth to write, reinforcing why I enjoy writing Hy / Python so much. SPOILERS below, of course.

Day 01

I can’t say I started off particularly smoothly: the first part was reasonably quick to pull off, even though I stumbled a couple of times – submitting my solution with bzn also went off without a hitch.

The second level was painful for me though, mainly because I misunderstood the order in which the second number could be picked up. Regexing backward on a reversed string was the best solution I could come up with in a pinch.

At some point, I may try to do this with a state machine using a Trie instead, just to make the parsing more efficient; writing it out in Hy will also be interesting – all the indexing gets pretty painful quickly.

My original solution was significantly more verbose and painful, but lent itself to refactoring remarkably well; so I’m happy I have a small solution up and running at the end. Of course, looking at Reddit shows me that I could have handled overlapping strings much more smoothly in Python by retaining the original letters; given the constraints we were facing this would just work. I could also have been fancier and used lookahead regular expressions (?=(...)).
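The lookahead version, for the record: a capture group inside `(?=...)` matches without consuming characters, so overlapping words like “eightwo” yield both digits.

```python
import re

WORDS = "one two three four five six seven eight nine".split()
VALUE = {w: str(i) for i, w in enumerate(WORDS, start=1)}
# The lookahead asserts a match at each position without consuming it,
# so overlapping spelled-out digits are all captured.
PATTERN = re.compile(r"(?=(\d|" + "|".join(WORDS) + r"))")

def calibration(line: str) -> int:
    digits = [VALUE.get(m, m) for m in PATTERN.findall(line)]
    return int(digits[0] + digits[-1])
```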

Day 02

Short and sweet; I even managed to just sneak into an under 1000 rank for part 1, and just above 1000 for part 2. Every time I do Advent of Code I remind myself that it would be a good idea to have some way to very simply and quickly read strings, and perhaps I should keep some library of parser combinators handy.

Day 03

I took a mostly mechanical approach today and could generally write functional code fairly quickly: though I was somewhat betrayed by my laptop which ended up freezing and had to be restarted. Given it was just past midnight, I have to wonder if there was an update transparently being applied that broke things.

Happily, I could still submit my solution to part 1 through emacs even though Chrome and my terminal stopped functioning; bzn.submit-answer happily pulled through.


This week’s letter starts with a face lift, combining some of the pieces I’ve been writing about: a color palette inspired by Dieter Rams and a simple site builder written in Hy. Almost everything is typeset in Inter; though at some point I’ll swap in Berkeley Mono for monospace fonts and maybe the headings.

(tl;dr; for the rest of this week: I’ve been playing with Hy.)


There is a remarkable amount of excellent notes and explorations on Transformers out there: this week I stumbled upon Neel Nanda’s blog via a Zulip discussion from RC. With sufficient Googling I’m beginning to realize that Transformers may be this decade’s Monad – and I probably shouldn’t write yet another post on how to implement my own. Though I almost certainly will do one for a Transformer in Hy.

Neel’s blog took me to Arena which hand-holds you through building a transformer. I’m taking a slightly different approach from the recommended set up and working in Hy, but maintaining everything else. Instead of copy-pasting set up code directly, I’m manually transforming it to Hy and doing my own set up, but so far that approach seems to be working. Learning about einops and jaxtyping has also been interesting – working with Hy I can see myself writing a couple of macros to make all of this significantly more ergonomic.

Hy, Textual & UIs

I finally ended up making a working application with Textual, using Hy as my actual language: a minimalistic version of TensorBoard that renders scalars as sparklines. It was surprisingly satisfying, particularly after I could enable sparklines.

Message Queues are probably my favorite mechanism to make sure the main thread in a program is unblocked; and once I wrapped my head around Textual’s APIs (particularly that every widget has access to the message queue – and the thread safety of certain functions) things became much easier. Enough to be able to set up auto-refresh and manual refresh, with Toasts popping up as data quietly updates. Textual also seems to be handling mounting new elements much more simply.
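Stripped of the Textual specifics, the pattern is just a thread-safe queue that the UI thread drains on its own schedule – a generic sketch, not Textual’s API:

```python
import queue
import threading

updates: "queue.Queue[tuple]" = queue.Queue()

def fetch_data():
    # Worker thread: does the slow work, never touches UI state,
    # just posts messages.
    updates.put(("scalars", [0.1, 0.5, 0.9]))

def refresh_ui():
    # Main/UI thread: drains whatever has arrived and applies it,
    # staying unblocked the whole time.
    applied = []
    while True:
        try:
            applied.append(updates.get_nowait())
        except queue.Empty:
            return applied

worker = threading.Thread(target=fetch_data)
worker.start()
worker.join()
applied = refresh_ui()
```

Textual’s per-widget message queues are this same shape, with the framework owning the drain loop.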

Hy, Common Lisp and Compilation

Hy keeps growing on me: while looking at old Hacker News posts on Hy, I also came across a post on using hy that resonates a bit too much.

I slowly find myself thinking about Common Lisp’s approach to compilation – and how Python is also reaching there: Triton compiles code live and creates Cuda kernels; torch.compile is the same idea. With a good REPL I can easily see myself having a similar workflow with Hy.

There are some tools that I miss at the moment: JedHy is a bit out of date, the highlighting is slightly broken and autocomplete is accordingly somewhat minimalistic. On the other hand, the language itself simplifies code enough to make these trivial annoyances.

My favorite recent snippet is for generating HTML:

(defn tag [name #* children #** attrs]
  (let [child-tags (.join "" children)
        attr-str (.join " " (gfor #(key val) (attrs.items)
                                  f"{key}=\"{val}\""))]
    f"<{name} {attr-str}>{child-tags}</{name}>"))

(tag "html"
  (tag "body"
    (tag "h1" :style "color: red"
         "Hello World!")))

It’s delightfully minimal, handles the HTML as strings but still gives me a remarkable amount of flexibility. In Python, I would probably end up using with, but with the right notation I can just write it out directly in Hy.

I’ll rewrite it to be slightly friendlier with a macro, potentially allowing for #< instead of tag.

Black Friday

My biggest set of expenses for Black Friday has been books, of course. I’ve picked up 3 books so far (that I will almost definitely read):


Another surprisingly busy week; weeks where I don’t have enough time to learn something new seem disappointing and pale – I should make a stronger effort to take time out to study and explore and build.

I picked up one of my favorite books – The Art of Doing Science and Engineering by Richard Hamming; apart from Small Gods, it’s the book that has been most influential on how I think about my life. I’m looking forward to re-reading it with a new lens, and hopefully getting something new out of the book.


I find myself slowly becoming more proficient with Bash; enough to be able to quickly put things together without having to google too much. Quoting and arrays are still a nightmare, of course, but there are places where they just work.


Thinking about Bash, Python, and the desire to write systems programming code, I found myself disappointed: a Lisp-like macro system and homoiconicity seems perfect for writing efficient code, but there was no Lisp that seemed to satisfy these requirements. I find myself tempted to write my own. This is in stark contrast to last week’s dreams on building an automatic profiler, but is somewhere close by.

I find myself tempted to work through Crafting Interpreters with Hy, using the effort to improve Hy itself, think about building my own language and levelling up a little bit. At the same time, I’m curious about which programming languages would be easy for a Transformer to write programs with and get feedback; would assembly be simpler?

Of course, ChatGPT said that Python is the easiest language to write because of the sheer amount of existing code. That said, I’m a little surprised and suspicious.

At the same time, I’m also surprised at the lack of specific programming tools: Copilot and ChatGPT should be able to do significantly more analysis on the programs being written to design real systems well and quickly.


As a project, I expect I’ll go back to numpy or PyTorch – I haven’t enjoyed using JAX much, and with PyTorch I should be able to write code quickly.

I spent time standing in lines and sitting around in a cafe re-reading How Transformers Work along with several links within the post – and that helped make things click much more clearly than they have recently, particularly when reading after watching Karpathy’s videos a few weeks ago.

The thing I’m still struggling with is that the transformers – and perhaps a lot of the architectures – are much more evolved and empirically determined than designed. Why does the value of attention heads fall off after adding 6? That’s probably some function of the input data and information theory, and may be aligned with the tokenizer.

I really appreciated that this blog post also went into the details of tokenization, which have been somewhat obscure for me – just because I haven’t gotten around to paying attention. There is something here to play with, and I really enjoy Anthropic’s approach to this with the mechanistic interpretability work in Transformer Circuits.


On a completely different note, I also spent time building a TUI using textual and Hy to let off steam (and I suspect I’ll be treating this project as my personal video game for the coming few weeks).

I’ve been having a terrible time getting used to all the APIs and mechanisms available in Textual for writing apps – if I had one suggestion, it would be to make it much simpler. Right now the API and the components offer too many things (worker threads, magic async functions depending on how you define them, way too many magic instance members that change behavior); I’d pare it down to something that just maintains views.

That’s how I’m planning to use it anyways, with all the business and data fetching logic extracted (something like MVVM potentially? or MVC?) in a way that feels comfortable. Hy is beginning to feel familiar, though I still stumble often (why does for look like (for [x xs] (print x)) while lfor skips the []: (lfor x xs (* x x))?). Potentially an implementation issue, but it was surprising when I ran into it. The language is also significantly more ergonomic than I had realized, with support for setx – which sets and returns values – as an alternative to setv.


Hopefully this Thanksgiving weekend I’ll have a chance to take significantly more detailed notes from The Art of Doing Science and Engineering, and potentially talk about applying it to the world today.

I’d also like to refurbish my online presence, reset and simplify my dotfiles and simply clean up this site and my slipbox significantly. I’ll also be taking a stab at writing out the implementation of intermediate logging for open sourcing.


A busy week spent mostly traveling, and occasionally reading code and how certain systems work.

Bash signals

I’ve been spending a remarkable amount of time trying to reason about and understand how signals interact with bash scripts this week.

There’s one very important rule that may not be obvious: while a shell script is running a foreground command, bash defers handling signals until that command completes – traps simply won’t fire in the meantime. The way to keep signals responsive is to run the command in the background (using &), trap the signals you care about, and wait on the child.

The other generally simpler alternative is to simply exec into the script you want to forward signals to.
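A minimal sketch of the background-plus-trap pattern – the sleep is a stand-in for the real long-running command, and the echo at the end is mine:

```shell
#!/usr/bin/env bash
# Run the workload in the background so bash stays free to handle signals;
# `sleep 2` stands in for the real long-running command.
sleep 2 &
child=$!

# Forward TERM and INT to the child instead of having bash defer them.
trap 'kill -TERM "$child" 2>/dev/null' TERM INT

# wait returns early (status > 128) if a trapped signal arrived;
# wait again in that case to actually reap the child.
wait "$child"; status=$?
(( status > 128 )) && { wait "$child"; status=$?; }
echo "child exited with status $status"
```

The double wait is the subtle part: the first wait is interrupted by the trapped signal, and only the second one picks up the child’s real exit status.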

Automatic profiling

After spending a lot of time seeing people profile and improve distributed systems, I started to wonder whether it would be worth investigating simulating a system (hardware, network, performance characteristics, even software, etc.) and using that to optimize the system.

ChatGPT has pointed me to a significant number of resources, and clearly this is something that has been deeply researched; I have a lot of reading and exploration ahead to understand this.

User Interfaces

A recurring belief I have that slowly keeps strengthening is that the amount of effort that goes into building working user interfaces is completely disproportionate to the value created by them. As someone who strongly prefers minimalistic designs and generally appreciates function over form, I’m even more biased against spending a lot of time polishing UIs.

In past lives I’ve spent weeks to months aligning pixels, and even rewriting a website from scratch for being off by 1 pixel. In certain contexts, that makes a lot of sense; but for a lot of other jobs to be done the UI is just not important. And spending engineering years implementing drop shadows, animations (which also consume a surprising amount of compute & battery) has come to feel like a bit of a waste.

I do live this as well: wherever I have the opportunity (i.e. a Linux install) I’ll end up installing i3 or sway and working through them instead of dealing with other window managers. Text gets me most of the way; simple drop-downs and affordances do the rest.

Part of this feels like a function of the ecosystem developed around UIs, particularly cross-platform UIs. HTML, CSS and Javascript get us some part of the way – but the last mile seems to be so much more costly than I would have expected.

Reflecting on these letters

The point of this newsletter was to reflect on the past week and write down things I found interesting; now that I have almost a month of letters to look back at, there are some clear patterns:


Continuing the theme of not having that much progress, I still have even more books coming in.


This week I picked up Linkers and Loaders on Olivier’s recommendation. The book is from '99, but so far it’s been extremely helpful in filling in gaps in my understanding of just what LD_PRELOAD, LD_LIBRARY_PATH, ldconfig, etc. do with significantly more concrete examples. Understanding “runtimes” has also been helpful.


Stumbled onto a new video by Antirez on building a minimal IRC server: Smallchat. I really enjoy these kinds of projects as a way to teach and learn different types of programs. Every time I see one of these, I want to reimplement it in Python/Hy to explore and learn.

Smallchat is no different, and it sets up a server that can be connected to over telnet.


Continuing on last week’s theme, I’ve been exploring how conda sets paths in the installed environments and I have to admit to being a bit surprised at how much is patched into binaries. conda-pack does a good job figuring things out, but it seems remarkably painful, and I have to wonder why path resolution needs to be so complicated.

I’m also playing with Textual on the side, again using Hy, because I often want to create a GUI to display results but generally can’t be bothered to build one. Building command-driven GUIs seems like the best way for me to stay sane.


I’ve been thinking about the value of developer experience, and even the choice of programming languages. Over time, I’m becoming more cynical about the value added by the last mile of improving tools or languages.

There are things that are essential: access to a devserver, how you simulate a test vs production environment, something to run tests – but once the big things are taken care of, the rest seems to be … bikeshedding. unittest vs pytest or vim vs emacs – perhaps this is a function of reading [[Kill it With Fire]], but familiarity seems to overwhelm the rest of the choices. Whichever toolset resonates with you is fine; the cost of switching outweighs the value provided by an alternative tool.

The clear winner from that approach is to lean harder into the unix approach of building composable tools; having files as an abstraction works a bit too well for the most part. I wish there was something similar for building user interfaces that composed just as well – perhaps that’s why HTML/JS/CSS became the default UI standard, since websites have some support for composition.

Open source also trumps custom tools in the same way; though open source tools may not be particularly coherent with each other, which can be tricky.


In some ways, this week’s letter is a little bit sparser: I’ve been heads down at work, and haven’t really been able to focus as much on learning new things as I would have liked. Looking back at last week’s letter, I see that I ended up mostly abandoning the books I was actively reading in return for focusing on things I’m working on, and looking for small pieces of novelty.


Both as a replacement for and an addition to Netflix, I continued catching up on Strange Loop videos:

  1. Building Distributed Systems was a great talk on different, new languages that can make it significantly easier to build and reason about consistency and distributed systems. What really stood out to me was a quote: “The speed of light is roughly 4 inches per clock cycle”, building much better intuition on cpu speeds than I’d had before.
  2. Metrics for preventing security incidents is an interesting take on building metrics for something I would generally consider impossible to measure. Personally, I’ve been very skeptical of building false precision by introducing numbers where none exist, but I’m coming to understand the value of having something like this.


I treated myself to a physical book as well, thinking about adding a design to this site: a book by Dieter Rams on design. The 10 commandments are fascinating, and resonate strongly – particularly Good design is as little design as possible. Sadly enough, the book’s format is not as friendly to read as I would like – there are thin columns of text with German and English translations next to each other, making it somewhat unwieldy to read.


Python’s wheel building has been wonderfully inconsistent, and I’ve slowly been learning more – sometimes with a growing sense of terror as I consider the sheer number of options. PyPA has some excellent documentation, with a small page on editable installs. The Setuptools page on Development Mode is significantly more detailed and useful, with the corresponding PEP: PEP 660. There’s also a library to build editable wheels. In practice, I often see libraries installed with a custom .pth file in site-packages, with a pointer to a custom loader.
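The .pth mechanism is easy to poke at by hand – the directory names below are made up, and site.addsitedir stands in for the processing the interpreter does on the real site-packages at startup:

```shell
# Sketch of the .pth mechanism: each existing path listed in a *.pth file
# inside a site directory gets appended to sys.path.
tmp=$(mktemp -d)
mkdir -p "$tmp/site" "$tmp/src/myproject"
echo "$tmp/src/myproject" > "$tmp/site/myproject.pth"

# site.addsitedir processes .pth files the same way site-packages is
# processed at interpreter startup.
python3 -c "
import site, sys
site.addsitedir('$tmp/site')
print('$tmp/src/myproject' in sys.path)
"
```

This prints True: the path from the .pth file ends up on sys.path, which is exactly how the simplest editable installs point back at a source checkout.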


I’ve been wanting to have a ZSH prompt that split the terminal, to make it very obvious when the screen is moving from one command to the next. All of the mechanisms I’ve been seeing involved using live calculations, but a simpler trick has been relying on prompt truncation; documented in the ZSH docs on Prompt Expansion.

Simply using a very long line of unicode box characters and having them truncated by using %<< has worked very well. The documentation is a little bit hard to parse, but as a simplified, real example:

export PROMPT="# %~ %-2< #<───<... snipped ...>───"

will end up generating a line that ends with the characters " #" (which I appreciate for symmetry).

# ~ ───────────────────────────────────────────── #

The prompt also explicitly starts with a # character so if I copy paste my terminal into a script, it simply turns into a comment without breaking anything. I remember seeing someone use ; which may be an even more elegant way to achieve this.

Building my own transformer

Inspired by Andrej’s videos, I’ve been slowly iterating on a custom implementation in Hy & Jax. I expect to fill that out with time.


A – tentatively – weekly catalog of things I’ve been finding interesting as a programmer. There’s always something interesting going on, and I wanted to have some record of what’s been catching my attention spread across time.

Writing things out – well, or poorly – has generally paid off well in clarifying what I’m thinking about, showing the gaps in what I’m thinking, and helping me navigate the world in general.

I hope these letters help me start – and maintain – this practice again. And that they can capture some of the joy, curiosity, frustration and sense of excitement I find in programming; and mail themselves back to me on days I find myself jaded.

If you happen to come across these, you should expect a lot of links across several domains: programming languages, systems programming, ML, design, systems and organizational dynamics and whatever happens to catch my fancy. This very first edition is likely to be significantly longer than the rest, just because I have so much to say that it forced me to start writing.

Reading material


Newsletters I find myself inspired by: Factorio’s Friday Facts for the detail, craft and readability; Craig Mod’s pop-up newsletters, taking them into an entirely different art form; John Cutler’s The Beautiful Mess for incisive descriptions of patterns and anti-patterns in organizations; and of course, Kent Beck’s Tidy First – musings on design, a collection of new ideas, and the fearlessness to constantly experiment.


When it comes to books, my eyes are much, much, much bigger than my stomach. I have far too many I’m trying to read at the same time; some of the books I’ve read over the past week include: Kill It With Fire, a fascinating book by Marianne Bellotti which I ran into while catching up with Strange Loop talks I wasn’t able to attend. There are several lessons here: the incredible value of familiarity with existing systems, why cp and ls were named the way they were, and more.

At the same time, I’d like to have a significantly better handle on programming GPUs: Programming Massively Parallel Processors has been a pleasure in both learning about CUDA and being up to date in a very fast moving world.

On the same note, Understanding Software Dynamics brings significantly more rigor to my understanding of performance; embarrassingly enough, this book disappeared into one of my collections and I forgot all about it till I stumbled back into it recently.

Bash Idioms, the Google Shell Style Guide and ShellCheck have been helping me write some production-worthy shell scripts (with several questions to ChatGPT along the way). Misunderstanding parameter expansion led to me committing broken code repeatedly, to the point of printing out a cheat sheet and making a solemn promise to only ever use [[ -z ${1-} ]] and [[ -n ${1-} ]] when testing for an argument with “strict” mode (-u) enabled.
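The cheat-sheet rule is small enough to sketch in full – the echo messages here are mine:

```shell
#!/usr/bin/env bash
set -euo pipefail   # "strict" mode: -u makes expanding an unset variable fatal

# ${1-} substitutes the empty string when $1 is unset, so the test is
# safe under -u; a bare "$1" here would abort with "unbound variable".
if [[ -n ${1-} ]]; then
  echo "got argument: $1"
else
  echo "no argument"
fi
```

The trailing - in ${1-} is what makes the difference: it asks for a default (here, nothing) instead of tripping the -u check.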


Given I’m working on tools used by people building transformers, and that I spend most of my day bothering ChatGPT with questions on documentation I can’t be bothered to read, it seemed like a good idea to implement Transformers under my own steam. I spent most of a 6 hour flight watching and re-watching Andrej Karpathy’s video on NanoGPT while also trying to implement pieces in HyLang and Jax – as a way to make sure I actually understand the material. I’ve been making slow progress on the bigram model.

Hy Language

I enjoy using Lisp, and I enjoy writing Python. Re-finding a remarkably functional implementation of a Lisp that runs on Python has been surprisingly cathartic and enjoyable; I expect to use this combo for most of my personal programs in the near future.

Hy is very usable, and I have a lot of stuck projects: Transformers, a site generator for this website, migrating my slipbox, working through PAIP, Let Over Lambda and similar books that get unblocked as I play with this language. There are rough edges to work through, but for the most part I find myself delighted.

Intermediate Logging

Last week, I was finally able to talk publicly about some work I did in 2022; building support for logging intermediate values in PyTorch – ignoring any transforms that may be applied to the model. It’s some of the sneakiest code I’ve ever written, with significant amounts of metaprogramming through code generation. I plan to refactor and release the code soon; I have some ideas on how to write it in a way that makes it both easy to understand and to use. The slides for the talk are available online, and the video should be up soon.


Kunal. (Last updated: Sunday, 2024-07-07 21:50)