Working notes: reflecting on — and collecting — what I’ve been learning. Curated writing at expLog.
2024-09-01
A long-delayed update: I’ve been trying to improve the site’s generation and structure for some time, but have been stuck in the unhappy zone of trying to make progress across several projects while not reflecting on what I’ve been learning and thinking about every week.
Zig
I’m beginning to both like and dislike Zig: I enjoy working in it and having access to low-level primitives, but the lack of batteries sometimes annoys me – particularly because I still have significantly more muscle memory in most of the other languages I could be using. I’ve also wasted unfortunate amounts of time trying to write code that feels like good Zig without actually making anything functional, which is the root of my frustration.
Interfaces
This is one of the places where Zig feels somewhat strange: the default recommendation is to accept an object that then becomes the interface, relying on aggressively casting pointers – and I haven’t felt particularly comfortable with it yet.
Instead, I’ve been passing around structs and relying on comptime to figure out what each struct can do. To make error handling a little more explicit, I’ve also been using a comptime function to assert that the fields of the structs I pass in match a template interface object, but I don’t have it quite right yet.
After spending a lot of time refactoring I still don’t have a particularly good or reusable interface for SQLite modules, though I’ve come closer over time.
Webservers
For TermDex, I’m looking for a way to quickly render different snapshots of my notes, tasks, and changes over time – both for review and for quickly exploring my notes around a topic. Implementing a webserver in Zig has been reasonably minimal, but I’ve been struggling to find ways to have it exit cleanly, etc. I think I’ll need to spend a lot more time reading different guides and incorporating C libraries to get better here – particularly around using event loops appropriately.
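To pin down the behavior I’m after, here’s a minimal sketch of the clean-exit pattern – in Python rather than Zig, with the port and response as placeholders: a signal handler flips a flag the accept loop re-checks, so the server drains and closes instead of dying mid-request.

import selectors
import signal
import socket

sel = selectors.DefaultSelector()
running = True

def stop(signum, frame):
    global running
    running = False

signal.signal(signal.SIGINT, stop)
signal.signal(signal.SIGTERM, stop)

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 8080))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

while running:
    # The timeout bounds how long we can ignore the flag after a signal.
    for key, _ in sel.select(timeout=0.5):
        conn, _ = key.fileobj.accept()
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

sel.close()
server.close()  # reached on SIGINT/SIGTERM: a clean exit instead of a kill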
Compiling
Given the dependencies I’m using for TermDex, I also need to figure out how to statically compile everything and check the results. My current plan is to add the libraries I need as git submodules, then build and statically link them as needed.
Simulations
The other thing that’s been stuck in my head is using simulations for multiple purposes:
mapping out my finances over the next several decades, at least to build a sense of where I’m going
simulating hardware appropriately to model LLM training for work
For the second, I’ve been reading SimGrid documentation but the setup is fairly involved and I’m not sure how far I’ll be able to get; on the other hand, I expect to learn a lot trying to make effective simulations that I can actually use for distributed systems. I’m a bit surprised it isn’t a more common approach to explore architecture, particularly for backends like AWS and Azure.
llm.c
Finally, I’ve been wanting to feel more comfortable with writing and observing transformers, so I’ve been picking up llm.c and trying to rewrite it in zig. There’s a lot here, but dealing with all the small parts by hand should make me much more comfortable understanding and working with LLMs.
2024-07-14
Long week with a lot of interesting programming and reading.
TermDex
I’ve started relying pretty heavily on TermDex for my day-to-day work, tracking ToDos and notes as I debug and work on different projects: having a quick way to look through my notes and figure out my priorities for the day – customizable by quickly editing a bash script – has been working out pretty well.
At this point in time, I’m basically relying on the sqlite extension to do the heavy lifting of querying data in the ways I care about, with a single ~400 line bash script doing everything else. I even added a quick pomodoro extension that shows me past pomodoros, asks for the context & collection for this one, and then ticks down the time, relying on notify-send to create alerts; I still need to find an alternative that works in WSL.
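A toy Python sketch of that pomodoro flow (the real one lives in the bash script; the prompts and durations here are made up):

import subprocess
import time

# Hypothetical stand-in for the bash extension: ask for this pomodoro's
# context/collection, tick down, then alert via notify-send (Linux only).
context = input("context/collection: ")
minutes = 25
for remaining in range(minutes, 0, -1):
    print(f"\r{context}: {remaining} minutes left ", end="", flush=True)
    time.sleep(60)
print()
subprocess.run(["notify-send", "Pomodoro finished", context])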
While I still have work to do to make the extension parse markdown, I’m also thinking about better ways of organizing collections against time. In practice – and even in some productivity classes I’ve taken – there’s a strong need to abstract work over time and complexity: allocate large projects to large areas of time (such as 6 months), and then break both down into something more manageable.
I plan to start generating a large part of this newsletter automatically as well: my weekly bookmarks, books, and anything else I’m reading and ready to share. Tentatively, I may do a Meta-only version of this blog too, to write about all the things I can’t discuss here.
Zed
I tried out Zed on my Linux laptop this week and was delighted by the speed: to the point that I tried putting TermDex inside Zed, calling back to the editor whenever I wanted to touch a file. Unfortunately it doesn’t work on WSL yet, and the keyboard bindings for navigating windows were uncomfortable after being so used to Tmux and Emacs, so I’ve gone back to my standard setup.
I’m happy to see a serious contender to Vim and Emacs for the first time: the keybindings worked smoothly, the editor was extremely snappy, and LSP and newer paradigms just work – I hope this continues.
Bookmarks
Using a work journal: this showed up on HN, and there were several interesting comments on different work journals people were using. All the hackers were building their own versions and there were some impressive ones out there. I particularly liked journals that “stacked” contexts, allowing users to go in and out.
Designing for expert users: a podcast where a UX designer explores what it takes to serve experts used to navigating CLIs quickly. Yaron Minsky also called out something I find confusing: the lack of composability in UIs (I wish we had more than copy/paste as a paradigm for sharing between UIs).
2024-07-07
TermDex
I spent most of this week hacking on termdex and learning new things from the process: I hope to combine all of that into a new setup to publish this website (and bring over my previous slipboxes) within the next few weeks.
I’ve started using it to collect notes, tasks, bookmarks, and anything else that catches my fancy: and to write this week’s letter, I have it open in a split window to see things I explored (trying out remote-pdb, python-manhole, etc.).
In time I’m excited to throw an LLM at this, and to have it automatically figure out when I last wrote the same note / explored the same resource. There’s also a lot of functionality to build out to make it easier to create notes with values prefilled: a piece of functionality I really liked in FocalBoard – almost perfectly implemented, but still with some issues.
Some of the new tricks I learned this week include:
Implementing a recursive sqlite query to show nested paths:
WITH RECURSIVE
  split_path(path, remainder, moved) AS (
    SELECT * FROM (
      SELECT '' AS path, path AS remainder, 0
      FROM markdown_files WHERE basename LIKE '%.md' LIMIT 2
    )
    UNION
    SELECT
      path || substr(remainder, 1, instr(substr(remainder, 2), '/')) AS path,
      substr(remainder, instr(substr(remainder, 2), '/') + 1) AS remainder,
      moved + 1 AS moved
    FROM split_path
    WHERE instr(substr(remainder, 2), '/') != 0
  )
SELECT substr('............', 1, 2 * moved) || remainder FROM split_path LIMIT 20;
Tmux can show popup windows (display-popup), which is very convenient for a surprisingly fancy fzf setup.
FZF has a reload action that refreshes the candidate list without restarting fzf; but using it for the initial list means setting FZF_DEFAULT_COMMAND – and it quickly becomes incredibly finicky. I find it much better to bind a key to become( ... original command ... ) instead.
I even ended up writing a Pomodoro implementation to track my time; I can probably add something to visualize past pomodoros when I start the command.
Having an easy query interface over flat files is astonishingly valuable in building features quickly; I just wish there were more tools to construct queries live.
I plan to explore sqlite views to define custom views of tables that are local to the folder the extension is used in. This makes sense as a way to layer abstractions: termdex the application doesn’t know what front matter is specific to a given markdown folder, so I can define it in the folder itself and use it easily.
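Roughly what I have in mind, sketched with Python’s sqlite3 against a stand-in table (the virtual table and its schema are hypothetical, and json_extract assumes a SQLite build with JSON support):

import sqlite3

con = sqlite3.connect(":memory:")
# Stand-in for the extension's virtual table; this schema is hypothetical.
con.execute("CREATE TABLE markdown_files (path TEXT, frontmatter TEXT)")
con.execute("""INSERT INTO markdown_files VALUES ('todo.md', '{"status": "open"}')""")

# The folder-local layer: termdex stays schema-agnostic, while this view
# encodes which front-matter keys matter in this particular folder.
con.execute("""
    CREATE VIEW open_tasks AS
    SELECT path, json_extract(frontmatter, '$.status') AS status
    FROM markdown_files
    WHERE json_extract(frontmatter, '$.status') = 'open'
""")
print(con.execute("SELECT * FROM open_tasks").fetchall())  # [('todo.md', 'open')]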
YAML parsing in C is not as hard as I’d expected; the parser emits tokens / events, and that lends itself particularly well to handling the inputs I expect.
Hacking on TermDex is somewhat addictive; I find myself as hooked on this as on Factorio or similar games, and it’s a little more satisfying. (Trigger warning: blasphemy – it’s ever so slightly more useful than constantly growing a virtual factory.)
CUPTI and LD_PRELOAD
The other interesting project this week was trying to listen for CUDA context initialization using LD_PRELOAD and subscribing to CUPTI callbacks. I have a tiny implementation working, but need to see if I can get good cross-language stack traces.
Stream Dye
While playing with PTYs to get unbuffered output (mainly for improving torchrun), I tried implementing a quick program to tag stdout and stderr separately by running a subprocess with 2 pty pipes attached. This worked surprisingly well, though I think it interferes with terminal size estimation – at least when done through basic Python.
Having something that explicitly displays stderr / stdout separately is pretty valuable, so I plan to make this a tiny zig utility in the near future.
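What I have so far is roughly this shape, sketched in Python (assuming Linux ptys; the Zig version would follow the same structure):

import os
import pty
import select
import subprocess
import sys

# Two ptys: the child sees terminals on both fds (so nothing block-buffers),
# and we still know which stream each byte came from.
out_master, out_slave = pty.openpty()
err_master, err_slave = pty.openpty()
child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; print('hello'); print('oops', file=sys.stderr)"],
    stdout=out_slave, stderr=err_slave, close_fds=True)
os.close(out_slave)
os.close(err_slave)

tags = {out_master: "OUT|", err_master: "ERR|"}
while tags:
    ready, _, _ = select.select(list(tags), [], [])
    for fd in ready:
        try:
            chunk = os.read(fd, 4096)
        except OSError:  # Linux raises EIO once the child side is closed
            chunk = b""
        if not chunk:
            os.close(fd)
            del tags[fd]
            continue
        for line in chunk.decode(errors="replace").splitlines():
            print(tags[fd], line)
child.wait()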
2024-06-30
Zig
I ended up doing a small Zig marathon and worked through all the Ziglings. This weekend I hope to put some of that into practice and continue improving the SQLite extension – making it much more generic and composable.
TermDex
While I continue hacking on termdex, I’ve been coming across additional useful resources: today I stumbled across marksman which looks exactly like the LSP I was planning to build after getting SQLite working.
I’ve started using TermDex regularly at work, for meeting notes and quick queries; all based off the existing Go implementation: create a new file, quickly visualize active files using fzf, and query them with the Go extension. I also have a small hack to show related files quickly when I open it in $EDITOR (nvim), by using tail -n +1 on multiple files. (That’s a trick I use to print multiple files with the file names included, unlike cat.)
Markdown in Emacs
I spent a little bit of time configuring emacs to write markdown more easily with my wiki setup:
markdown-mode, an old favourite. Though I don’t use most of the features.
visual-line-mode to make sure text wrapping is smooth and convenient.
Enabling vcs backends to make sure project detection works smoothly
yasnippet for snippets; I’ve never really used it regularly before, but the markdown completions are fairly powerful and interesting
examples include: img<TAB> to generate a link to an image; similar support for link, h1…hn, lists, etc.
yas-insert-snippet is a very satisfying way to discover the existing snippets defined by yasnippet.
Making sure company-mode is auto enabled every time eglot is.
Setting up eglot to use marksman. I’m still trying to figure out how to use this to generate an overview of all the documents available.
Using variable-pitch-mode and my old themes: poet-mode again. I’d forgotten just how nice they could look; that said, I think the ef- themes have a better implementation at the moment.
Emanote
Reading Marksman’s documentation also took me to Emanote: another interesting project that does what I’ve now implemented around 5 times or more; a fairly smoothly generated markdown site representing the documents in a folder. I expect I’ll keep hacking on termdex instead just to make sure it satisfies my needs precisely (including ease of hackability and composition).
Of course, thinking through this took me down another rabbit hole of comparing The Unix Philosophy against emacs – and some interesting follow-up articles.
Performance Evaluations
It’s midpoint performance review season at Meta again; after more than a decade of performance reviews I generally find myself ignoring the formal process. I have found it valuable to use the cadence of the reviews as a prompt to introspect outside the formal evaluation: to think through and write down how things are going and what I hope to achieve, and to reflect on goals.
Windows Surface Pro
I really enjoy the fact that I can open my terminal / editor to full screen and basically have a magically powerful, thin terminal with an extremely long battery life anywhere I go these days. With a Bluetooth keyboard, the surface pro is slowly becoming my favorite machine.
The only shortcoming I’d like to figure out is how to use it easily on my lap when I don’t have a table handy.
2024-06-23
A brief hiatus from writing, and some new plans: I ended up skipping a couple of weeks because I’ve been busy hacking on TermDex, and really wanted to rearchitect these notes into a much more useful digital garden. I also have some new hardware that I’m enjoying and exploring.
TermDex
I ended up building a real extension with Go talking directly to sqlite, without any third-party Go libraries; CGo made it possible to define the Go functions fairly simply and directly, and I left the template C code as-is while implementing just the pieces I needed in Go.
This was fairly satisfying, but there were some initial tradeoffs I didn’t quite like, so I’ll keep hacking on it: having to define the schema of the virtual table up front means I end up scraping all the files on disk up front and generating the table.
Instead, I’d like to generate a vtable with JSONB blobs: that gives a lot more flexibility, lets me keep the table live without needing to re-create it, and leaves the option open to generate files on demand later. Reading the JSONB spec reminded me a little of MessagePack.
Microsoft Surface Pro
For the past several months I’ve been looking for a powerful ChromeBook that can run a Linux virtual machine, and I’ve generally been disappointed. Seeing the new Windows Copilot laptops was the impetus to give WSL2 and Windows a shot – with a Nuphy wireless keyboard, I get a beautiful terminal with an excellent screen and a snappy computer, plus the ability to sketch and use Adobe Lightroom when I want.
I’m pretty happy with my decision so far, and Windows has been better than I expected. I’m having a little bit of trouble adjusting to Windows keyboard shortcuts, but they do work surprisingly well.
Ptys
A rabbit hole I fell down recently was trying to understand why torchrun wasn’t printing the logs I expected when the underlying program was segfaulting.
The basic reason was process output buffering: if I let the subprocesses print to stdout, they flush instantly and I see the segfault output; but if the output is tee’d, it ends up block-buffered by default (C stdlib behavior) and we lose logs. (There’s also room for much better / simpler implementations of log redirection in torchrun.)
The one way I can think of around this is to use a pty to spawn subprocesses instead, which fools them into unbuffered output irrespective of the underlying program. The way I got here was looking at Expect and PExpect – and then finding that PExpect depends on ptyprocess. Python itself also has a much more minimal pty module that isn’t as flexible.
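A minimal sketch of the failure and the pty workaround (ctypes.string_at(0) just stands in for a hard crash):

import os
import pty
import subprocess
import sys

# The child prints and then crashes hard before stdio can flush.
child = ("import ctypes\n"
         "print('last words before the crash')\n"
         "ctypes.string_at(0)\n")  # dereference NULL: no cleanup, no flush

# 1. Piped: stdout is block-buffered, so the message usually dies in the buffer.
piped = subprocess.run([sys.executable, "-c", child], capture_output=True)
print("pipe captured:", piped.stdout)  # typically b''

# 2. Pty: stdout looks like a terminal, so it is line-buffered and survives.
master, slave = pty.openpty()
proc = subprocess.Popen([sys.executable, "-c", child], stdout=slave)
os.close(slave)
proc.wait()
print("pty captured:", os.read(master, 1024))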
The part I haven’t figured out yet is how to distinguish stderr and stdout coming from the subprocess – with that, I may send out some PRs to torchrun.
Zig, Go
Finally, I’m planning to play with Zig this weekend to experiment and explore: Go has been pretty great, and I’ll still be using it at work, but I’d love to have something a little less verbose with more explicit control over how things run and work.
Someday I hope I can write my own systems-programming lisp with minimal syntax, great compatibility, and ease of use / iteration.
Updating configurations
As I set up the new laptop, I’m also setting up configuration files in dot/2024 as I update each file. Claude has been amazing for this: I had it rewrite my org-mode file into a pure elisp .el file to start simplifying and flattening out my configurations.
2024-06-02
Spent the week learning about dataloaders, playing with Go extensions, and generally thinking about problems. I’d write more, but I have a lot of work to catch up on.
TermDex
After publishing the letter last week I fell into a rabbit hole of implementing a sqlite extension that can help understand / organize / parse markdown documents – particularly using Claude and GPT-4o. Unfortunately the LLMs became very confused very quickly, constantly generating broken code and cycling back and forth between the same issues. After a while I gave up and spent time learning to build this from scratch, without LLMs.
CGo
CGo is surprisingly nice: you write C code in comments, which is parsed out and used while generating the extension.
Unix Philosophy
Reading The Unix Philosophy was extremely satisfying: a lot of the ideas and patterns described in the book have survived 30 years, and resonated deeply with me.
I saved some of my favourites in Threads; reproducing the quotes here with some impressions:
the UNIX philosophy is an approach to developing operating systems and software that constantly looks to the future. It assumes that the world is ever changing. We’re not saying that we can predict the future. We can only acknowledge that, even if we know everything about the present, our knowledge is still incomplete.
I wish organizations in general were significantly more open about this, and understood their limitations in predicting the future, optimizing for the ability to move fast instead of perfect planning. This resonates all the way back to Boyd’s description of the OODA loop.
I described this a little bit while writing about developer tools: the scripts engineers put together to unblock themselves tend to accomplish 80-90% of what a tooling team would build for them, with perhaps 0.1% of the effort and time involved. Realizing this has changed how I think about building tools: I would much rather leverage the fact that when my customers are engineers I can truly force-multiply them, instead of limiting their opportunities.
it is better to let the user shoot himself in the foot than never let him run at all.
This can be painful to implement in practice, and I often have to push back on people locking down software for “safety” and throwing out the baby with the bathwater. If the organization is functional, there should be enough trust to let people run with scissors when they need to.
The software that exists in the world today represents a great store of wealth.
Even more true today, with the sheer amount of open source software powering most of the technology in the world; open-weight and open-source LLMs only add to that store of wealth, and having them available is really valuable.
LLMs for the CLI
I’m still surprised at how little I see LLMs applied to CLIs and sys-adminy work at the moment: it’s all text, and the problems feel like they would naturally lend themselves to being evaluated by LLMs. Most recently, I want to play with something that can construct a command for me by reading the help text, man page, and sample commands.
I think I’ve written about this before, but I’m also a little surprised at all the products wrapping LLMs in yet more custom UIs – I should just be able to ask the LLM to do things for me without needing to navigate yet another interface. I can imagine a huge market for tiny application-specific LLMs that super-charge the “Help”: instead of learning to navigate the application, ask the LLM to do something for you, and have it show you how. I could see this being very valuable for everything from Excel to Photoshop.
Then layer on magical capabilities (“edit my photo in the style of Salgado”) or domain specific knowledge.
2024-05-26
Back in New York again, at my favourite weekend cafe.
TermDex, Go and Claude
I’ve been slowly iterating on TermDex, particularly to try and get all my ToDos and investigations down on paper in a way I can easily query. Every time I make progress I think of more things I could be doing with it instead – the trap with all my ToDo applications so far – and get somewhat derailed. Go’s ergonomics don’t quite gel with me yet, but I have to admit just how practical and applicable it is: an easy language to set up and get running with quickly, with a very rich ecosystem.
I’ve also not found myself particularly productive with Go recently – there are some reminders of Java that simply make me uneasy. After thinking about it for a bit, I started using Claude to generate Go code for me: it helped me walk past a lot of minutiae I didn’t particularly care about, and gives me some encouragement that I can make things work.
Today I’ll try and build out an old idea quickly: a sqlite extension that can index markdown files based on front matter (yaml, toml or json – inspired by Hugo) and return results quickly, making it very easy to generate appropriate queries and views for easy visualization.
This can be quickly combined with fzf and other mechanisms for editing; a toy sketch of the indexing idea follows below. Writing this out made me realize I want to spend more time understanding the Unix Philosophy and seeing how well it’s aged over the years – it’s a system of design that leverages the operating system around it, and seems to work particularly well for extensible design.
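A deliberately naive Python sketch of that indexing idea (the “YAML” handling only covers key: value lines, and the table schema is made up):

import pathlib
import re
import sqlite3

# Naive front matter: only "key: value" lines; real YAML needs a real parser.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.S)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cards (path TEXT, key TEXT, value TEXT)")
for md in pathlib.Path(".").rglob("*.md"):
    match = FRONT_MATTER.match(md.read_text(errors="replace"))
    if not match:
        continue
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            con.execute("INSERT INTO cards VALUES (?, ?, ?)",
                        (str(md), key.strip(), value.strip()))

print(con.execute(
    "SELECT path FROM cards WHERE key = 'status' AND value = 'open'").fetchall())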
Applying Transformers to knowledge work
Another new experiment I’d like to start: applying LLaMa and other models to all my emails, chat, and other daily minutiae so I can stay on top of things more easily. A large part of my day job involves remembering and contextualizing what’s going on, and then intervening where I can best help out – and the amount of context to maintain day to day is getting a little overwhelming, interfering when I need to go deep into a specific problem.
If I can figure out ways to extract all my data into easily indexed text, and then apply a reasonably dumb model to query and aggregate it for context, I suspect I’ll be significantly more effective. Over time, if tools like this were built into work communications, things like status reports and updates should become trivially cheap to produce without a lot of coordination or busy work (if applied at larger scales).
Speculation
In spite of being so close to LLMs and helping to build them, I’m nowhere close to internalizing all that’s possible with their applications yet: I realized this yet again while working on TermDex with Claude. I can afford to be significantly more ambitious as these tools open up, and work through a lot of projects that seemed too big to tackle before. A small part of me is scared at the quality of the outputs – as the projects get too big for me to reason about personally, at some point I’ll have to trust the AI to do things right; there are some uncomfortable parallels to managing other people that start developing as we extend these tools.
The downstream economic and technological consequences are pretty hard to predict; with just the technology that exists today there’s a clear sweet spot of productivity possible that I don’t see being applied particularly well at the moment which is surprising. Perhaps the unix philosophy lends itself particularly well to this because we can pull out small pieces of the problem that the AI can then build for us – good design generalizes really well into the future.
Thinking about the letter
While I’ve been having fun writing this letter every week, having to self-censor based on what I can talk about publicly and what I can’t has been getting annoying. The practice of reflecting on everything I’ve learned recently is clearly extremely valuable, but I’d like to revisit the mechanics a little bit to make this more useful and thorough. I’m hoping I can use TermDex to achieve this more easily.
2024-05-19
An extra unexpected week in San Francisco. I accidentally walked past the tail end of Bay-to-Breakers on the way to getting coffee (right before writing this week’s letter) and was thoroughly confused and amused. Seeing several costumed people running past made up for missing out on the Dance Parade in NYC on Saturday.
TermDex
I’ve been using this a lot, and see a lot of potential, but haven’t quite found the time/energy to implement the features I’m really looking forward to. Having extremely simple query functionality implemented with a mix of bash scripts and fzf has taken me surprisingly far. As much as I appreciate POSIX, I guess I never really internalized how minimal the API it exposes for connecting programs is – and I still find myself surprised by just how much gets accomplished through it.
I’m also working to configure nvim to be a good markdown editor, and building a surprisingly pleasant/effective editing experience where I can navigate through text/ideas/notes quickly. This finally feels like a flexible enough alternative that lets me keep flat files, easily swap between tools, and still get all the benefits of Luhmann’s methods & Notion & index cards and all the other tools I’ve used to try and keep my head in order.
With LLMs being able to easily consume text, fancier CLIs where things just work should be much more common than they seem to be today – something I expect to start noodling around with soon.
Hierarchies, nested notes
Something I haven’t figured out the ergonomics for is having a hierarchy of notes easily: the best cheap alternative I tend to have is to use a custom sheet where I mix in indentation by basically converting the first several columns into thinner indents. So I can show a nested hierarchy by simply starting from a different column and rely on how the UI simply overflows cell text to get nesting in a way that I can easily move rows around.
I’d like to actually build this out as a UI for easy modification and managing relationships between notes.
Information Theory / Chainsaw / TF-IDF
Feeling a little lost while playing with TF-IDF – I was getting extremely broken results because of bugs in my implementation, though the approach still seemed valuable – I wanted to start levelling up in math a little. Based on an answer from ChatGPT-4o I’ve started reading Elements of Information Theory, and am generally enjoying the book.
Even revisiting the minimal definitions of entropy (H(p) = −Σ p·log p) and relative entropy, or KL divergence, between distributions (D(p‖q) = Σ p·log(p/q)), I think I can take another stab at finding outlier logs by looking for logfiles whose distributions diverge most from the distribution over the norm. I don’t need true outliers, I just need the ones with the most distance.
Tokenization is of course critical: I’ve worked around it for now by simply normalizing strings – smashing numbers down to a single 0, smashing punctuation into a single _, etc. This obviously loses a lot of nuance in the logs, but does let me find logs with different stack traces much faster. I think I’ll take the relative-entropy approach to find logs that I should look at, and then use cosine similarity & clustering (something else I spent time learning) to figure out batches of logs and hosts to work with.
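A toy sketch of the ranking step (the hosts and token streams are made up; the eps smoothing stands in for proper handling of unseen tokens):

import math
from collections import Counter

def distribution(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl(p, q, eps=1e-9):
    # D(p || q) = sum over tokens of p(t) * log(p(t) / q(t)).
    # eps smooths tokens the reference distribution has never seen.
    return sum(pt * math.log(pt / q.get(t, eps)) for t, pt in p.items())

# Hypothetical per-host token streams after normalization.
logs = {
    "host0": ["start", "step", "step", "done"],
    "host1": ["start", "step", "step", "done"],
    "host2": ["start", "trace", "fault", "abort"],
}
norm = distribution([t for toks in logs.values() for t in toks])
for host in sorted(logs, key=lambda h: -kl(distribution(logs[h]), norm)):
    print(host, round(kl(distribution(logs[host]), norm), 3))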
Go
Writing go continues to be pleasant: I definitely miss the rich python ecosystem (and have now started wondering if I could just reuse Python libraries in Go) – there are so many good libraries apparently made for data mining.
Speculation
As a reminder: none of the notes here represent those of my employer, nor do they include confidential information (not that I expect anyone to ever read this site). With that out of the way, I’ve tried to extrapolate into the future, thinking through what transformers could accomplish without a tremendous jump in capability (sitting in a cafe in SF seems like the right time to explore these ideas):
Can my phone simply be a text/voice interface that reconfigures itself according to what I ask of it? Do we need servers to have any apis any more – or if we give a Transformer a structured tool for reading data, and ways to be efficient, can it simply have the behaviors we expect?
To make this a bit more concrete: instead of reaching for text/email/phone or another mechanism, can I contact someone through an AI?
Do the AIs become large behemoths served remotely, with access to everything, or do we end up in a world with a lot of small AIs at an individual level coordinating with each other? What protocols do they apply when they talk to each other?
How much of media, art, knowledge can be customized to the consumer instead of the producer? In the near future, just thinking of smarter games and roguelikes that are much richer because all the NPCs remember context and can react to your behaviour much more explicitly. I have to imagine the creators of Dwarf Fortress are already thinking about the consequences of this.
Most of these seem feasible with current technology plus a lot of engineering to make things cheaper to deploy, faster, and better integrated – I’d expect a lot of these kinds of applications to pop up within a decade if not sooner (a decade seems extremely conservative, if I’m honest).
2024-05-12
Partially in California this week, and happened to be in town for the LLaMa hackathon. I’m very interested in seeing what people make, and writing this from the event.
LLaMa3 Hackathon
There were several interesting projects, and it was fun to see so many people use something I’d helped out with. The project that fascinated me most was one that basically undid LLaMa’s fine-tuning by adjusting weights: I have to wonder what else can be achieved with that mechanism, and what the actual process was (waiting to see the repository).
At the same time, there wasn’t anything completely out of left field: a lot of the practical applications of LLMs today – beyond an omniscient and occasionally loopy chat bot – seem to be a bit farther from the maturity level I would hope for. That doesn’t mean we can’t use them, but we’ll need to be significantly more creative in how and where we apply them.
The finetuned LLM I’d love to build is a CLI helper: everything is text anyway, and it has amazing amounts of context available. I’d like it to quickly complete my commands, or let me simply ask for things and have them translated into explicit machine instructions.
ChainSaw
Started playing with implementing TF-IDF to identify outlier logs in HPC jobs. The idea seems obvious enough that someone has to have implemented it already, but perhaps I’m just missing something.
Asking Claude for recommendations on papers and approaches generally led to hallucinations, but I did find that one of the papers was real (just from a different year and with different authors): RNN Attention Mechanisms for System Log Anomaly Detection, which also has a healthy number of downstream references, including literature surveys.
Go
Go is being surprisingly pleasant to work with, though I still don’t quite know how to write idiomatic Go. The performance is a boon after getting used to Python.
TermDex
I’ve also been making progress on my index card application, leaning into FZF and bash scripting to fill in gaps that I’ll actually implement with Go later. It’s interesting just how much flexibility and speed is available through bash scripting – and at some point, the complexity goes up enough to make the script unmaintainable. Bash’s inherent global state matches my intuitions around debuggability (something I’ve tweeted about in the past), but I still don’t quite know how to articulate the feeling better.
Another idea that’s started wandering through my head is a modern shell scripting language: something POSIX-y but not quite; perhaps an extension of SkyLark that is just as good at handling stderr/stdout and nesting commands, but with significantly cleaner semantics, modern lexical scoping, and more modern language constructs. I’m surprised this doesn’t seem to exist yet.
Python’s Memory View
Through bitter experience I also learned the cost of constantly appending to a bytestring in Python, when I profiled my code and realized it was spending all its time allocating memory (py-spy is a gift!). That led me down a rabbit hole of avoiding byte copies on slicing and finding easier ways to manipulate buffers: I ended up testing out bytearray and memoryview, both useful tools to have – they helped me turn a 30-minute script into something that ran in seconds.
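A small sketch of the difference, walking a buffer by slicing bytes versus slicing a memoryview:

import time

data = bytes(8_000_000)

# bytes slices copy: walking a buffer this way is quadratic.
start = time.perf_counter()
view = data
while len(view) > 1_000_000:
    view = view[4096:]  # each slice allocates a fresh bytes object
print(f"bytes      {time.perf_counter() - start:.3f}s")

# memoryview slices share the underlying buffer: no copies at all.
start = time.perf_counter()
view = memoryview(data)
while len(view) > 1_000_000:
    view = view[4096:]  # constant-time window over the same memory
print(f"memoryview {time.perf_counter() - start:.3f}s")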
2024-05-05
Tempus Fugit.
I don’t really remember where April went; I still think it was March just yesterday.
Multiprocessing Queues
Spent most of Saturday debugging with several team-mates, to realize that multiprocessing queues create their own private thread that copies values into a buffer, pickles them, and sends them over a pipe to the other process. The part that can bite you: mutating an object after putting it on the queue but before it gets pickled.
I wrote up a small Thread as a teaser and also put together a demo gist to show the issue. The one reply I did get on Threads misattributed the problem to parallelism with the subprocess getting a chance to run: to make it more explicit, I’ve updated the gist to only run the subprocess after the first one is finished. The race is between the main thread in the parent process – which sends values and then mutates them – and the Queue’s inner thread.
Another coworker asked why my initial repros only had 1 or 11 in the values (i.e., no mutations or all mutations): I can only attribute this to the points at which Python lets threads interleave; adding a sleep(0) to the mutation lets me see a wider range of scenarios.
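A condensed sketch of the race – not the original gist, and the counts/sleeps are arbitrary:

import time
from multiprocessing import Process, Queue

def drain(q, n):
    print(sorted(q.get()[0] for _ in range(n)))

if __name__ == "__main__":
    q = Queue()
    n = 200
    for _ in range(n):
        value = [0]
        q.put(value)   # handed to the feeder thread, pickled *later*
        time.sleep(0)  # let the feeder thread interleave
        value[0] += 1  # races with the pickling
    consumer = Process(target=drain, args=(q, n))
    consumer.start()
    consumer.join()    # typically prints a mix of 0s and 1s, not all of one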
Working through this also reminded me of the value of debugging by understanding instead of debugging by hit-and-trial; even if the cost of understanding seems higher up front, debugging by hit-and-trial ends up getting nowhere over time.
A refreshed .tmuxrc
I started refreshing my dotfiles with my tmux configuration; I’m doing them piecewise (because the software used to edit these configurations is also affected by them) and making sure they work well for me. A couple of new things I’m applying:
conditional code execution: I rely on two tmux instances, one running locally on my laptop with the prefix C-b and one running remotely on any devserver with the prefix M-b. That keeps my keybindings more or less symmetrical, and I can easily leverage all of tmux’s fancy features without having to think about it. I used to achieve this with subtly different configurations on both ends, but found out that with if-shell I can apply configuration conditionally – and the variable $SSH_CLIENT is only set in an SSH session (thanks to Claude).
I explicitly asked Claude to review my tmux rc and suggest improvements: while nothing stuck permanently, I did find out that tmux supports synchronized panes (enable with :setw synchronize-panes). This let me easily manipulate a live HPC job across 2 hosts – commands were automatically mirrored to both.
I did find myself reaching for tricks like vi $(ls -t | head -n 1) to edit the latest file on each host, because the filenames would often differ.
Autocomplete anything on screen
Another thread I posted earlier in the week involved a new zsh, tmux + fzf trick I finally managed to put together (again, leaning on Claude to parse man pages for me). I put out a thread and gist about it and recorded a video for co-workers; here’s an annotated version of the script:
# Grabs the contents of all the panes in the current window for easy processing.
function _print_all_panes() {
# List all visible panes, changing the output format to only show the pane-id (used in the next set of commands)
for pane_id in $(tmux list-panes -F '#{pane_id}'); do
# tmux capture-pane: starting from the first visible line (`-S 0`) to the end (`-E`). `-t` identifies which
# pane to capture, while `-p` redirects output to stdout and `-J` makes sure wrapped lines show up as joined.
# This is piped to `tr` to replace spaces with new lines -- giving me one word per line. The sort de-dupes
# within each pane, and the rg filter gets rid of pure collections of symbols, only giving me words and numbers to complete on.
#
# TODO: Explore additional tokenization strategies, to allow breaking up paths/into/components.
# TODO: Remove duplicated output across panes
tmux capture-pane -p -J -S 0 -E - -t "$pane_id" | tr ' ' '\n' | sort -u | rg '[a-zA-Z0-9]+'
done
}
# The actual auto-complete function
_tmux_pane_words() {
# `LBUFFER`, `RBUFFER` and `CURSOR` are magical zle parameters with the contents of the entered text
# left and right of the cursor, with cursor marking the actual position.
# Grab any half completed word in the LBUFFER (removing a greedy match that ends with a space)
local current_word="${LBUFFER##* }"
# Get rid of the half-completed word in the rbuffer if any, greedily removing non-space characters.
# I had to spend non-trivial amounts of time reading zsh pattern matching to get the behavior I expected.
local new_rbuffer="${RBUFFER/#[^ ]##/}"
# Build the prompt for fzf, using the ␣ as a way to mark insertion point for the completion
local prompt="${LBUFFER% *} ␣ $new_rbuffer "
# Tokenize and print the pane contents and generate an fzf window with the half-completed word from the LBUFFER as the content
# `--layout=reverse` because I don't like needing my eyes to jump to the new cursor position when fzf pops up
# `--no-sort` because we already did it, with the caveat of needing to de-dupe across panes
# `--print-query` for the case when we can't find a good match; this prints the query first and any selections after
# If the user doesn't select anything, rely on the fact that the query was filled in to choose the completion; that's why the `tail -n1`
local selected_word=$(_print_all_panes | fzf --query="$current_word" --prompt="$prompt" --height=20 --layout=reverse --no-sort --print-query | tail -n1)
# Build the new lbuffer with the completion, mirroring the trim of the half-completed word above
local new_lbuffer="${LBUFFER% *} $selected_word"
BUFFER="$new_lbuffer$new_rbuffer"
# Reposition the cursor to the end of the completion
CURSOR="${#${new_lbuffer}}"
# Ask the zsh line editor to redraw the line with the new contents
zle redisplay
}
# Register the completion mechanism, I went with `Ctrl-U`.
zle -N _tmux_pane_words
bindkey '^U' _tmux_pane_words
Stanford Lectures
This week’s lecture was a little more abstract but had some interesting ramifications and applications for being able to build small and focused LLMs. The main paper. There’s also emphasis on the importance of finding the right starting values.
More go hacking
I’ve started building some CLI programs with Go (yet another time/notes/calendar/Notion-equivalent management app); with the excellent TCell library and the surprisingly powerful terminals available these days, I’m much more bullish about good CLIs. The big hidden bonus is that I can shell out to Vim or Emacs for actually editing notes while leaving the management to the app itself – partially inspired by how easy FZF makes that. I’ve picked up The Power of Go: Tools to help me write idiomatic Go with the right approaches faster.
FZF is also the reason why I’ve been so impressed with Go recently: I’ve begun to realize that languages end up giving programs a certain taste, for lack of a better word. Some characteristics stand out: Python programs have very distinctive CLIs and a slightly noticeable sluggishness; Javascript tends to be a bit faster, and the CLIs tend to be very colorful; Rust is colorful too, but generally characterized by being very fast; the most used CLIs tend to be written in C or similar languages. And finally, some Go programs tend to be surprisingly useful: fzf, gotty, etc. Of course the heuristic isn’t perfect (until a few seconds ago I thought jq was also written in Go).
The prevalence of closures and function objects in Go has been the most surprising (and pleasant) departure from my previous assumptions about Go so far; they make programming significantly more ergonomic – though there are also some factory patterns I don’t think I’m going to enjoy (such as using a function to manipulate structures to set default arguments).
Anyways, I’m calling this new project termdex for terminal index cards. More updates next week!
2024-04-28
This was a long week with a lot of overlapping oncalls; I’ll be glad to take a
break sometime next week.
At the same time, I was able to learn some new things.
Stanford Lecture
The lecture on MoE this week was fascinating, well delivered and cleared up a
bunch of misconceptions I had about what MoE meant and how they functioned. Some
of the talks’ slides have been uploaded at the
CS25 website, but there are still
several to go.
Things I remember
Experts are not trained on specific topics
As an experiment on mixtral, someone zero’d out each expert one at a time.
Expert 3 had the most effect, and that’s not been explained yet.
As a code convention, he suffixed all variables with their dimensions, a practice
I’d like to adopt as well.
Transformers
Talked to an old friend after a very long time: he’s clearly been doing much more
advanced work than I have, and pointed me to several interesting ideas to explore.
I have a lot of math and infrastructure to learn. I’m thinking of playing with a simple
transformer and seeing if I can get it to encode/decode some patterns like look & say,
and if I can use that to build some intuition about QKV. It should be an interesting
exercise.
Go
I finally wrote a small program in Go, and so far have been finding the language surprisingly
ergonomic and friendly; particularly with Go routines. I’m planning to build several
log parsing tools with Go (and possibly TCell).
I’ll need to find a good modern book on Go before I shoot myself in the foot with assumptions
about the behavior of the language though.
Zookeeper
I also spent a lot of time learning zookeeper semantics: the
original paper
was excellent and finally made things click. I could solve the problems I wanted by simply
relying on watches, which Kazoo
makes even easier.
Dijkstra’s notes
Partially read notes
that floated by on Hacker News; this entry is a reminder to go back and read the rest.
LLaMa3
I’ve been helping out with infrastructure and tools for training LLaMa3 at Meta:
I’m very happy to be able to help because I think having something of LLaMa’s
quality easily available for hacking is one of the things that will shape how
LLMs are generally applied and used, and contributing to that is very satisfying.
I’m even in the model card,
along with some very well known people.
At the same time, I could use ellama, ollama and LLaMa 8b to have my very own
local LLM – which has been fairly helpful. Ellama’s prompts around reviewing
highlighted code, implementing functions, etc. are exactly what I’d dreamt of a
long time ago and hadn’t expected to be true so soon. The UX is still a bit rough
and generating tokens on my laptop CPU is slow, but I expect that to constantly,
inexorably improve the way things have been going.
I’m now thinking about finetuning / distilling a LLaMa model down to something that
can translate CLI commands on my behalf; eg. “extract this tarfile”. I think it should
be very doable – and maybe a good excuse to learn torchtune – but I need more
time and energy.
Python & Emacs
As part of consolidating my .emacs I’ve been cleaning up my Python setup as well.
I rebuilt and moved to the latest on Emacs’s master branch – the fact that I can
smoothly run on Emacs master always amazes me – and set up Jedi, Ruff (using LSPs),
while relying on some existing snippets for devdocs.io and Ellama integration.
All of this means I get some very cool auto completion, almost instant error & syntax
checking and warnings with minimal setup or dependency on the repository I’m editing.
I still have some trouble with both Auto-complete mode and Company mode turning on
and both trying to complete what I’m typing; I’ll dig in some more and start publishing
my configurations.
Penzai
JAX released some very interesting tools, including a visualization tool that
is almost exactly like what I was hoping to see with PyTorch. This also makes it
much easier to explain – though I think I’d probably go with a little bit more
whitespace in the UI if I was designing it – and seems pretty powerful.
I need to find the time to hack on this and actually make an interactive UI or CLI
around it – and a top- or below-style interface to TensorBoard.
Wax, and languages
Continuing the theme of looking for lisp-like homoiconic languages that compiled
down to C, I ran into some reddit posts and links – particularly this
list of lisp-like languages.
There are several interesting ideas in there, but some day I’d like to implement
my own, potentially working backwards from the C grammar to make sure everything
can easily and cleanly be expressed, and then layering on language sugar on top of
that.
As mechanisms for procrastination go, inventing the language to program in before
actually getting around to programming seems unfortunately too far up my alley. I’ll
save this particular project for some slow months.
Stanford Lectures on Transformers
More rough notes from the lectures.
Nathan Lambert, Allen Institute for AI
History
1948 Claude Shannon models English
Auto regressive loss function
2017 Attention
2018 Elmo, bert, gpt1
2019 GPT2, scaling laws, safety discussions
2020 GPT 3
2021 Stochastic Parrot
2022
RLHF is necessary but not sufficient for ChatGPT
hf.co/collections/natolambert
aligning open models
models trained with some preference learning technique
alignment of models
IFT instruction fine tuning – follow instructions via autoregressive lm loss
SFT supervised fine tuning – task specific capabilities
Alignment – train a model to mirror user desires, any loss
Reinforcement learning from human feedback – train models from human data
preference fine tuning* – labelled preference data to fine tune an lm
PPO
DPO
Chapter 0:
craziness till llama dropped
Alpaca
first open instruction tuned models
self instruct, synthetic data
use a model to generate new instruction data to fine-tune a model
Vicuna after alpaca
llm as a judge
Koala (2023) berkeley
Dolly
Lora methods don’t work with RL
had limitations in applications
Uncensored models
no filtering
never censored to begin with
lots of transition models: didn’t change the narrative
chatbot arena: defining strategy
evals
chatbot arena
RLHF
optimize reward inspired by human preferences
penalty policy
increase reward, but constrain model to not go too far
Starling models
Modern ecosystem
PPO vs DPO
AllenAI also sees PPO to be stronger
Meta added today
Way more types of models
genstruct: rephrasing text into instructions
AI2: reproducible open source models
Llama3 scaling than alignment
Current directions
model merging is super accessible
Personalized lms
local llm
Don’t bet against progress continuing
2024-04-14
A somewhat busy week.
Cleaning up dotfiles
I’m finally getting started cleaning up my dotfiles: there are a couple of
decisions I’ve made and will be implementing as I go:
Consolidating on $XDG_CONFIG_HOME, with all the configs arranged in .config
instead of being spread out throughout my home directory. I need to check
how well that works in Emacs.
Moving away from an org-mode based emacs config: I find the cost of having to
Ctrl-' into elisp much more annoying than a cleaner org document is worth.
If I’m particularly honest with myself, my emacs configuration is nowhere near
dense enough to need literate programming.
Use a git workspace to organize the dotfiles instead of symlinking them; the
Arch Wiki delivers on making this work really well. I wasn’t excited by the
idea of using stow or more scripts to pull this off.
Lean into fzf and make it generally available. By the same token, make it very
easy to bootstrap the configuration on a new machine. I find myself hopping
between devservers very frequently these days, and spending an hour or two
configuring and installing is not something I really enjoy anymore.
I mitigate this a lot by relying on my local emacs sshing into a machine for
most things, but the emacs terminal isn’t that good.
Recording my .zsh_history more intentionally: so far I have a 13-million-line
zsh history (with duplicates) that I don’t use these days, and a more reasonable
63k-line history that I do use at work. My personal history is only 4k lines. JM
strongly recommends using version control: I generally like that approach, but
don’t want to spend a lot of time maintaining a repo.
zsh-histdb seems pretty
promising, with the added bonus that it’s mainly written in shell too. The
problem is that I’m pretty paranoid about this, and want something even more
minimal and explicit. Potentially my first real Zig project?
My other pet peeve is that my shell sessions don’t share the command history
live, and I’m not quite sure what the safest way to achieve this is.
Zig, Rust, systems programming & Lisps
Refreshing my Rust knowledge by skimming Rust in Action:
I actually seem to remember a lot, but I’ll find out when I actually build the
thing I’m reading this book for.
I also started playing with Zig: what I really want is a small language with
minimal syntax that is still very expressive. Zig seems nice and compact and
enjoyable, so I’ll spend some time hacking with it.
I’m not sure why all (most?) lisps I’ve seen follow the Scheme / Common Lisp
standards: the homoiconic syntax is decoupled from the other semantics. A small
part of me wants to implement my own variant of C (at the very least, compatible
with C) that has the syntax of a lisp. Probably emit LLVM IR and have hygienic
macros. Then attach batteries by making it trivial to interop with C and Python
and other languages.
wiki.c2.com
I frequently find myself hitting the wiki and reading through different
discussions. The style, speed of loading, minimalism, and depth of the
discussion make it particularly enjoyable.
Stanford Transformers Lecture
(Very rough notes from stanford lectures)
Why do LLMs work so well?
Open API
Manually inspect data
Predict probabilities of the next word
Basically massively multi task learning
small vs large model – what they learn (?)
while overall loss improves
individual tasks can improve suddenly while others are saturated
some tasks may move correctly, some may show emergent behavior
can’t predict what a large model will do from smaller models
Inverse scaling/U-shaped scaling
Repeat a string
All can do
Fix a quote repeating the same task
xs – 0
s – y
l – y
Plot scaling curves
Future of AI
10x compute every 5 years for same dollars
Scaling is what contributes to sota research
Bitter lesson: developing progressively more general methods & scale up
dominant driving force
it’s easier to get into AI
encoder - decoder, encoder only, decoder only (least structure)
2024-04-07
All over the place this week.
FZF for Rapid Application Development
A colleague (JL) built a very useful CLI tool that relied on stacking fzf
invocations: I’m slowly realizing just how powerful fzf can be for building CLI
applications quickly and painlessly; particularly CLI applications that can
compose quickly.
This blog shows off a little bit
of what’s possible with fzf. The ability to build a UI simply specifying commands
that can be run is fairly amazing.
I’m also wondering if it’s possible to use fzf to build a command creator, particularly
for dense launch commands like torchx and srun (from slurm). fzf can show
contextual autocomplete derived from the location in the command, and that’s something
that could potentially be generalized by reading the --help and man outputs of
different commands.
This may also be an interesting application of LLMs, to convert the --help output (or
man page) into something that can be used for terminal autocomplete easily.
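A toy sketch of the non-LLM half of this, assuming fzf is installed and using
rsync purely as an example; fzf draws its UI on /dev/tty, so run it from a
real terminal:

import re
import subprocess

# Scrape long flags out of `--help` and let fzf do the fuzzy picking.
help_text = subprocess.run(["rsync", "--help"],
                           capture_output=True, text=True).stdout
flags = sorted(set(re.findall(r"--[a-z][a-z0-9-]+", help_text)))
picked = subprocess.run(["fzf", "--multi"], input="\n".join(flags),
                        capture_output=True, text=True).stdout.split()
print("rsync", " ".join(picked))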
I only wish that zsh autocomplete was a little bit easier to hook into; I almost
find myself wishing for a shell LSP that was easier to hook into.
NUMA: non uniform memory access
Ran into something fairly confusing: there seemed to be a lot of memory available but
the host started allocating swap instead. A different helpful & knowledgeable colleague
talked through how the kernel chooses which memory pages to reclaim.
The part I had been completely new to was NUMA: sometimes servers can have enough memory
that depending on the core, some memory may be closer or farther.
Intel’s docs
talk about this some. This was one of the things that nerd sniped me this week.
What I’m really looking forward to is being able to outline the tree of modules in
PyTorch (as nested objects), and map them to the actual operations and CUDA kernels.
After spending some time exploring torch dispatch and even connecting it to my
unpublished implementation of Intermediate Logging I’m now exploring
Kineto Traces and meta tensors
to see what’s possible. I could potentially use torch dispatch to track how tensor
values depend on each other, and have different variations of the execution
graph much more visible.
Overlaying network activity on top of that, along with sizes / bandwidths used / memory
and flops consumed and I probably have a replacement for trying to write an article
on Mechanical Sympathy: because the model will just be observable.
Chainsaw: Minhash and friends
The other idea that’s been stuck in my head for far too long is that in most HPC you
have a lot of identical things going on on several machines (and sometimes several times
within the same machine) at the same time. When things go wrong, it’s generally one
machine/rank that’s misbehaving: and finding that one tends to become tricky quickly.
Identifying the outlier from logs and other sources is the thing I’ve become pretty
interested in after seeing it applied several times consistently – after looking for
several algorithms I finally came across minhash. I need to actually test this on
real logs to see if there’s any promise in this approach: datasketch
looks very promising to at least prototype quickly.
Unfortunately, this also introduces another book into my overflowing queue: MMDS
or Mining of Massive Datasets by Jeff Ullman et al.
Transformers Lectures (CS25)
Learned from Threads: Luokai that
Stanford’s CS25 course will be publicly available,
streamed on Thursdays from 4.30 - 5.50pm PDT.
I’ve blocked off the time on my calendar, and hope to watch all of these courses.
Enjoying the work
I went to a concert I’d always hoped to see: Satch/Vai at the Beacon Theater. Joe ended
the show by talking about how he and Steve had decided they wanted to play the electric
guitar for as long as they could: and then stuck to the plan.
I enjoy programming, and hope to keep going as long as I can too.
2024-03-31
Spent the week reading more about LLMs and different types of model
parallelism (again). I re-read, and then forget, what the different types of
model parallelism are; I suspect I’ll only be able to reason about
these properly once I’ve manually implemented a model myself.
Parallelism
Anyways,
HuggingFace
has some excellent documentation. Writing out what I understand by
hand again just to try and remember it a little bit longer:
Data parallelism: duplicate the model across GPUs, and split each
batch across the data-parallel groups. There are different types of
data parallelism depending on how the GPUs synchronize. (Distributed
data parallelism: run forward/backward on each GPU with different
data, and then average the gradients.)
Naive model parallelism: split the model by layers, where each layer
forwards results onward and collects gradients backwards as values
propagate.
Pipeline parallelism: naive model parallelism, but the calculations
are pipelined to avoid GPUs sitting idle.
Tensor parallelism: split the tensors within the layers instead;
tensor parallelism can be transparent to the rest of the modules,
with GPUs offloading work to others. The main requirement is that
the GPU interconnect be really fast.
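To keep the definitions straight for myself, here’s a toy sketch of
just the data-parallel step in plain Python – a stand-in for
intuition, not how DDP is actually implemented:

# Shard the batch, compute local gradients per "GPU", average them
# (the all-reduce), and apply the same update everywhere.
batch = [(x, 2.0 * x) for x in range(1, 9)]  # fit y = w * x, true w = 2
shards = [batch[0::2], batch[1::2]]          # one shard per worker

w = 0.0
for _ in range(50):
    # Each worker: gradient of mean squared error on its own shard.
    local_grads = [
        sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        for shard in shards
    ]
    grad = sum(local_grads) / len(local_grads)  # the all-reduce / average
    w -= 0.01 * grad                            # identical update on every worker
print(round(w, 3))  # converges toward 2.0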
Of course, I need to spend some time to figure out actual values for
fast, model size, and how much each type of operation costs.
Model Visualization
Wished into the
ether for
easier model visualization again.
There are several attempts at this, most of which render with graphviz
and are not particularly interactive or useful. Sometimes I wonder if
I should hire someone with excellent JavaScript / three.js / canvas skills
and just get something built.
Tailscale
Set up Tailscale on my personal laptops recently: after fixing one up
to stay connected whenever it’s online, I’m
updating this week’s letter remotely.
The SSH web client is pretty amazing: I can use
all the keyboard shortcuts I could have hoped for without Chrome
intercepting them; if I have any complaint, it’s only that my color
scheme seems a bit messed up.
Given easy access to my personal laptop through a web CLI (and
potentially opening up more services through it) I’ll end up spending
a lot more time using and building CLI-based applications.
2024-03-24
A long, somewhat jet-lagged week; I did wrap up some loose ends happily.
Tidy First? is a short-yet-deep book by Kent Beck on choosing the right time to
refactor software. I’ve found Kent’s books extremely valuable for building
taste, and for thinking through the second- and third-order consequences of
decisions.
The most fascinating part of the book was the discussion of the value of
optionality when thinking about the tradeoffs of cleaning up now vs.
later, and of shipping features now vs. later. I use a variant of this argument
to encourage developer-experience engineers to ship now instead of later; the
knock-on effects of force-multiplying so many others almost always make the
tradeoff worthwhile.
The bit that was surprising – and yet resonates perfectly – was
Constantine’s Equivalence: cost(software) ~= cost(change) ~= coupling over
the lifetime of the project. The cost to get started is almost negligible in the
grand scheme of things, and the cost of change is driven by the design:
design for flexibility and the project hums along smoothly, adapting to the
world.
Sometimes I have invested in extremely strongly coupled software with the
intention of keeping it short-lived (prototypes, particularly) – which also
satisfies the math modeled here.
I re-started reading Kill it With Fire and started How to Make Things
Faster, both of which have a host of strategies to drive valuable technical
change in a business environment.
The list of tools includes several I haven’t yet used; I should plan to spend
some time with them so I can use them in anger when I need to.
The HN discussion pointed to a thread on strace (which I love to
use) and the cost of running it (something I hadn’t internalized) –
discussed here.
There are several important details on strace in the article, not least of which
include:
On the bottom of San Francisco bay are several thousand unused straces,
which were intended for Y2K issues that never arose, and so were scuttled.
Zettlr
I’ve been trying to find the right way to set up for a writing project related
to Mechanical Sympathy; after playing with different tools I found Zettlr to be
remarkably pleasant for the project itself.
Highlights included a minimal interface, Zettelkasten-style links and Vim bindings,
making it the perfect editor. The interface was also fairly snappy, and the Bordeaux
theme is beautifully elegant.
2024-03-17
I had a week long vacation, and spent time exploring. These letters were originally published
separately, but I ultimately decided to club them together into a single page just to maintain consistency.
XDG-Desktop-Portal
I spent a large part of today debugging my laptop setup, and learning more about
Flatpak and
xdg-desktop-portal
than I would have liked to. The short of it was that I couldn’t get file open dialogs
to work in Chrome – and I couldn’t get configuration-based fallbacks to work correctly
by updating .config/xdg-desktop-portal/portals.conf. The solution ended up being
to directly modify the configuration at /usr/share/xdg-desktop-portal/portals/gtk.portal
and add sway to it.
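For reference, the change was along these lines – adding sway to the UseIn key in the portal definition (the exact contents of gtk.portal vary by distribution, so treat this as a sketch):

[portal]
DBusName=org.freedesktop.impl.portal.desktop.gtk
UseIn=gnome;sway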
Working through this always makes me wonder if it’s worth the time to use a Linux laptop,
but in the end I’d rather deepen my knowledge of Linux instead of wrestling MacOS or
Windows. I wouldn’t mind more powerful ChromeOS laptops though.
[Edit: 2024-03-24] This has still been plaguing me; there seems to be some bug that doesn’t
manifest immediately after a restart, but probably after putting the laptop to sleep and
restarting.
Revisiting Cuda
Spending some time reading about Cuda and multiprocessing today; I inevitably forget what
an SM or a Warp is, just because I don’t get enough of a chance to use them daily. So,
some definitions:
SM = Streaming Multiprocessor; a collection of cores in the GPU that executes thread blocks.
Warp = a group of threads (32 on NVIDIA hardware) executing simultaneously in lockstep on an SM.
MoE models don’t scale the same way, and must account for the number of components.
The appendix is pretty interesting, and includes the data mix.
Visualizing a model
I really want to be able to easily look at a full model’s definition without needing
to read and hand-annotate code: the modules, the dimensions passed in and handled, etc.
Intermediate Logging got close to it with the way it worked, but I still want to
play with more sophisticated visualizations.
#pragma GCC visibility push(hidden) / #pragma GCC visibility pop to mark sections of code with a given visibility
-fvisibility=[default|internal|hidden|protected] at compile time
__attribute__ ((visibility("default"))) per symbol in code
I’m exploring this because I’m curious whether I can combine native Python extensions that use different versions of the same library in the same process.
Scoping
ChatGPT pointed me to man dlopen to read more about library linking.
RTLD_GLOBAL, RTLD_LOCAL set up how symbols are resolved going forward.
RTLD_DEEPBIND places a library’s own local scope ahead of the global scope.
Namespaces (via dlmopen): within a namespace, dependent shared objects are implicitly loaded. These allow more flexibility than RTLD_LOCAL, with up to 16 namespaces allowed.
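A small sketch of the experiment I have in mind, from Python (Linux/glibc only – RTLD_DEEPBIND is a glibc extension; the extension paths are hypothetical):

import ctypes
import os

# Two builds of the "same" extension, each linked against a different
# version of a shared dependency.
flags = os.RTLD_LOCAL | os.RTLD_DEEPBIND
lib_a = ctypes.CDLL("./build-v1/ext.so", mode=flags)
lib_b = ctypes.CDLL("./build-v2/ext.so", mode=flags)
# With RTLD_DEEPBIND, each object resolves symbols against its own
# dependencies before falling back to the global scope.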
First pass at playing with the transformer debugger / reading through the repo.
It collects details on activations at inference time, and then provides useful visualizations and analysis of the results; it also includes models summarizing the debugging data.
Reading through a description of file paging and different formats for shared libraries and binaries, including a.out, ELF, etc. I’m not quite sure how to make the most of this book – at some point I’ll probably want to try implementing my own linker for a single format/architecture as an exercise.
Received the book as a gift today, and I started skimming through it (I still need to re-read it slowly). The most interesting chapters came towards the end, recommending a very careful path to the future that balances several different approaches to controlling the effects of AI.
I’m not quite sure where I stand with the book, but I’m looking forward to going through it again to see how AI is expected to affect the future; all the changes so far have been good but not that large.
Stripe has always been an interesting company, and they talked a little bit about reliability.
Releases (the numbers are interesting, but not very satisfying):
400 releases a day, or a release roughly every 4 minutes.
6 billion test runs a day, using 500,000 CPUs, which block the release.
That suggests around 15 million tests per release, every ~4 minutes? I’m not sure good integration tests run that fast.
Tested against a mock production environment to validate/canary, and then incrementally rolled out from 1 machine to 20%.
Tested against 55,000 metrics for anomalies.
I suspect there are actually several different releases going on and these are the summed-up numbers across releases, so I shouldn’t assume it’s a single service deploying.
Cuda
Playing with Cuda & NCCL on Pi Day; I’m trying out a programming experiment to estimate π using GPUs. The only way I knew of to estimate π was to use random points to estimate the ratio of points that fall inside/outside the circle – and as ChatGPT reminded me, that’s extremely parallelizable.
As a trivial first attempt:
#include <cuda_runtime.h>
#include <math.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { \
    cudaError_t error = call; \
    if (error != cudaSuccess) { \
        fprintf(stderr, "CUDA Error at %s:%d - %s\n", __FILE__, __LINE__, cudaGetErrorString(error)); \
        exit(EXIT_FAILURE); \
    } \
} while(0)

// Mark each point of an N x N grid as inside/outside the unit circle.
__global__
void count(int N, bool *out) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    int j = threadIdx.y + blockDim.y * blockIdx.y;
    /* Edit: this is almost certainly incorrect */
    int p = i * blockDim.x * gridDim.x + j;
    float x = (i + .5) / N;
    float y = (j + .5) / N;
    out[p] = (x * x + y * y) <= 1;
}

int main(void) {
    int devCount;
    CHECK(cudaGetDeviceCount(&devCount));
    cudaDeviceProp props;
    for (int i = 0; i < devCount; i++) {
        CHECK(cudaGetDeviceProperties(&props, i));
        printf("Device %d | Max threads: %d\n", i, props.maxThreadsPerBlock);
    }

    int N = 64; // Size of grid
    bool *out;
    CHECK(cudaMallocManaged(&out, N * N * sizeof(bool)));
    // 2x2 blocks of 32x32 threads covers all N x N = 64 x 64 points.
    count<<<dim3(2, 2), dim3(N/2, N/2)>>>(N, out);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Count the samples that landed inside the quarter circle.
    unsigned int inside = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            if (out[i * N + j]) {
                inside++;
            }
        }
    }
    printf("π = %f\n", 4 * (float)inside / (N * N));
    CHECK(cudaFree(out));
}
Since writing the program, I’ve been experimenting with moving the reduction to another kernel, and benchmarking it aggressively.
ncu
For some reason this is hard to find, but ncu is NVIDIA’s Nsight Compute CLI. So far I’ve used it directly with ncu <binary> -o profile.
References
Cuda Mode: a collection of excellent resources & lectures, involving several Meta-mates
I’m taking a week off from work, and planning to use that time to read interesting
papers, dig into the LLaMa models, and catch up on learning and exploring things
I generally don’t get time to. I’ll keep a daily entry for summarizing the day’s
explorations as I go.
If things go well, by the end of the week I’ll have played a little bit with cuda,
inference, understanding some model dimensions, fine tuning, etc.
Emacs, Tramp & SSH
Emacs surpassed my expectations yet again by supporting ssh’ing with multiple
hops transparently. The trick to setting this up is to use a file path like
sshx:dev1|sshx:dev2:~/ and it Just Works. I could even use a shell over this
smoothly.
For using Tramp comfortably (as it spawns multiple sessions) I find it extremely
valuable to use ControlMaster to share SSH connections and skip authenticating
repeatedly. The .ssh/config additions to enable this are:
ControlMaster auto
ControlPersist yes
ControlPath /home/kunalb/.ssh/multiplex/%C
A quick Google search and reading a couple of articles shows this one from
CyberCiti
which covers the bits I use, and several bits I don’t.
From a video that floated across my YouTube recommendations: abstractions remove/generalize
details to focus attention on the important details instead. Illusions accidentally remove
important details, confusing end users. This is clearly a goldilocks zone, and deciding
what counts as important is a matter of taste and experience.
The speaker also calls out the risk of the uncanny valley, where an abstraction is
almost like another platform you’re used to – it becomes much harder to use, because
you’re not sure which bits are missing.
2024-03-03
Spent a little bit of time exploring linking and loading, and taking a break
this week.
How to Write Shared Libraries – Ulrich Drepper
I’ve been spending time reading Linkers and Loaders, but decided to take a
detour and ended up reading a small book by Ulrich Drepper
that goes into a lot of detail on how shared objects are located and linked,
along with the differences between RUNPATH and RPATH.
LD_LIBRARY_PATH also honors $ORIGIN, to be able to choose paths relative to the location of the object.
There are a lot of details on how the actual linking works, with offsets,
tables and jump instructions that I need to re-read.
Per symbol visibility in C can be added by using __attribute__ ((visibility ("hidden")))
LD_PROFILE=<soname> will profile interfaces that go through a PLT call.
Several issues with LLMs trace back to Tokenization (eg. spelling, string processing)
Stick with utf-8
“Byte Pair Encoding” to reduce stream of bytes to something more reasonable
Repeatedly take the most common pair and add it to the vocabulary, reducing the size of the stream
Can avoid tokenization with a hierarchical structuring that can take bytes
BPE is just this process repeated multiple times, compressing the length of the sequence a bit.
This interacts with the context length of the model, and becomes important there.
Decoding is fairly straightforward.
Looking at the encoding implementation, I feel like I’d much rather make a state machine to do this.
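The merge loop itself is small enough to sketch from the description above (a toy version of my own, not the lecture’s code):

from collections import Counter

def most_common_pair(ids):
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)  # replace the pair with the new token
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"
ids = list(text.encode("utf-8"))
merges = {}
for step in range(3):  # 3 merges; the vocabulary grows by 3 tokens
    pair = most_common_pair(ids)
    new_id = 256 + step
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
print(ids, merges)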
GPT tokenization
Uses regexes to split the text into elements
Processes the split results separately; this forces merges to happen within words.
The regexes enforce a lot of behavior: maximum token sizes, etc.
Saving the merges and the vocab is enough to describe the tokenizer.
Special tokens:
<|endoftext|> – delimits documents in the training set. Special cased in the code.
im_start / im_end / etc.
SentencePiece
much more configurable / older
lots of options
falls back to bytes for tokens not seen in training
More tokens increase computational complexity.
Tokenization also makes auto-complete much harder, because completion could involve introducing a new token that replaces the existing tokens being completed.
A mixed week as I adjust to working remotely with a large time zone difference.
Mechanical Sympathy
I’ve been reading – and trying to compile and run at the same time –
Understanding Software Dynamics. While I could get sample code running for
playing with the cost of CPU utilization and arithmetic operators, I haven’t
been able to just copy-paste and run the code for memory or disk utilization. The book
is both fascinating and a little bit overwhelming given the sheer amount of
complexity that can affect the actual runtime of a program – and it doesn’t
even cover GPUs.
My plan so far had been to carefully work through and play with each component
of the book, potentially writing some reusable benchmarking scripts that I could
compile and use everywhere. At this point, I’m planning to just make it through
the book while taking notes, and then following up with some code to make sure I
have some understanding – and then extending the same principles to GPUs.
Reading through the chapters on memory and disk access, there are so many
potential sources of noise – and subtle artifacts of the shape of memory – that
I’m surprised the author still manages to construct experiments and reason
about them; trying to isolate memory access patterns and figuring out whether the
results match the actual hardware bandwidth is fascinating. It’s fundamentally
about constructing and validating experiments.
The trick used to measure disk performance is to read/write a large block of
memory: for reading, check when disk-block-sized offsets in the memory get updated
(indicating that block was read in); for writing, constantly update the
block being written so the moment of writing can be timestamped. This only works
because of the order-of-magnitude time difference between updating memory and disk, but
it’s very clever. I’m also not quite sure how I’d actually implement it, and will
continue reading the book / playing to find out more.
Portable Conda Environments
The other long-standing bugbear I’ve been thinking about is how to have easily
movable conda environments; PyTorch patches rpaths to use a path relative
to the conda environment’s library directory. Conda-pack works well, but it needs the
libraries to be mutated, and once unpacked the environment can’t be moved again because
the update is destructive.
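For reference, the usual conda-pack flow looks something like this (env names and paths are made up) – the conda-unpack step at the end is the destructive one:

conda pack -n my_env -o my_env.tar.gz
mkdir -p /tmp/my_env && tar -xzf my_env.tar.gz -C /tmp/my_env
source /tmp/my_env/bin/activate
conda-unpack   # rewrites prefixes in place; the env can't be moved again afterwards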
I’ve been wondering if I can update conda-pack to make non-destructive updates –
allowing the same environment to be moved repeatedly without thought – and then
have it fix itself every time it’s activated, through the activate scripts.
Looking through the issues on conda-pack I also stumbled across constructor,
which seems to be a slightly fancier tool to declaratively create and install
conda environments. This is still not as flexible as I would like, because some
packages like NVidia’s Apex must be
installed from source and cannot be installed from PyPI – installing from PyPI
actually drops in some other, unrelated package.
Exploring Observable
I’ve been meaning to write about a complex system with rich visualizations for
several months now, and Observable Framework seems like the perfect tool. After spending
some time exploring, I’ve generally come to appreciate the default design choices
and the generally smooth experience.
Funnily enough, I’ve recently been generating plotly graphs from relatively expensive data
sources and caching them – so Observable’s dataloader approach, which statically
generates the data once and then loads it, resonates particularly well. I need to check
whether it’s smart enough to only partially load data from parquet files, because that
is still one of the shortcomings of this approach – if you have too much data to load
up front, things become extremely slow and blow up on the client anyway.
2024-02-11
Spent a lot of time traveling this week, and didn’t get around to exploring as
much as I would have liked.
Hardware
While on a flight, the person sitting next to me worked in hardware: something I
know very little about. She worked on silicon chip design, her husband was
an engineer at Lockheed Martin working on helicopters, and she briefly described
how different companies build chips – the supply chain was significantly
more convoluted than I’d expected, and significantly more concentrated.
I asked for a 101 series to build a basic understanding of the hardware
manufacturing process, and was pointed to Chris Mack’s YouTube Channel,
which seems fascinating. I’m starting with his career retrospective.
Whatever path you find yourself on, find something to be passionate about.
CMake
Unfortunately, I’ve been struggling with PyTorch’s CMake scripts somewhat over
the past few weeks. CMake seems reasonable, but I still don’t have much of an
intuition for it. I tried to see if any books I used covered it, but didn’t turn
up much.
I have a lot of study and exploration stacked up to catch up on going forward.
There’s even a BlueSky paper I’m curious about.
2024-02-04
Trying out something new this week; today’s letter is written in VSCode web and
I have a GitHub Action to publish. Things mostly seem to work; to mimic
auto-fill-mode in emacs I’m using gq from Vim emulation in VSCode.
Revisiting Building Developer Tools
This is something I’ve been thinking about for some time: I wrote Building
Developer Tools a lifetime ago,
and have since updated some of my beliefs and approaches; most of my opinions
are unchanged – some I’m doubling down on – and some I’ve added since.
Better resources
The Mom Test: This is an excellent book to read
to think about how to test out products, and figuring out the right product to
build.
Developer Productivity Metrics have been in the news a lot; I generally have a
similar reaction as Kent & Gergely on Measuring Developer Productivity.
Doubling down on
Composability: I’ve been seeing the compounding effects of building easily
composable tools repeatedly over the past year. The Unix Way really works, and any
opportunity to extend it should be grabbed with both hands. I really wish UIs were
more composable and easier to integrate.
Shipping value immediately: I’ve been able to get outsized results just because
of a dramatic focus on shipping things quickly. Unblocking other engineers sooner
has a ripple effect that isn’t linear.
Adding
The value of familiarity: I’ve generally underestimated how fungible skills are,
and just how much engineers value being able to use tools they’re
already familiar with. The muscle memory users have built up learning their tools
matters – and learning a new tool is completely tangential to the real work
they need to do. Any new tool breaking into an existing workflow must provide an
outsized amount of value to be adopted.
Build vs buy vs open source: This is an important one I never quite paid enough
attention to just because of where and how I work. Being able to quickly spin up something
that just works, and doesn’t reduce options can be invaluable – and there’s a certain
stability to good open source software.
Mechanical Sympathy
While I haven’t been able to make as much progress as I would have liked, I
wanted to explicitly list out all the books, libraries to explore, test programs
to write, and resources to use. I’ll tentatively keep coming back and updating
this specific entry as I find new resources, but to just have a sense of what
I’d need a little more formally than last week’s entry.
Books to work through:
Hacker’s Delight: bit-twiddling and other pieces of programming I haven’t indulged in, yet
Deep C Secrets: Re-reading if I can make the time, this was a fun read
Linkers & Loaders: I have a lot of holes in my understanding of linkers at the moment
Understanding Software Dynamics: Building real mechanical sympathy for most pieces of hardware
Programming Massively Parallel Processors: Getting a better sense for CUDA & GPUs
Advanced Programming in the UNIX Environment: Getting better at systems programming
Libraries to use
NCCL
CUTLASS
Transformer Engine
Triton
Missing topics to add: something around better understanding network
limitations, bandwidth, communication mechanisms and HPC; general understanding
of data centers, disks, and file systems.
And of course, getting better at Transformer architectures and the models –
including the linear algebra. I still can’t really visualize K/Q/V etc. straight
in my head, and must generally work things out slowly and manually.
2024-01-28
A busy week with a lot going on and some great memories.
Named Pipes
I spent a lot of time looking at named pipes; I really wanted to
make a new API where a folder full of named pipes would act
as a way to write to different paths.
Unfortunately, I couldn’t find a good way to safely allow
multiple processes to write to these paths: as soon as writes cross 1MB,
they can unintentionally interleave.
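A minimal sketch of the idea (POSIX only; note that only writes up to PIPE_BUF bytes are guaranteed atomic, which is exactly the interleaving problem above):

import os

os.makedirs("pipes", exist_ok=True)
path = os.path.join("pipes", "tasks")
if not os.path.exists(path):
    os.mkfifo(path)  # one named pipe per logical path

# A writer blocks here until a reader opens the other end of the fifo.
with open(path, "w") as f:
    f.write("hello through a fifo\n")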
Plotly
Plotly has been surprisingly flexible: at this point I’ve used it to
generate javascript, html, and most recently even JSON. I’m very
impressed by the flexibility it offers.
Mechanical Sympathy
Understanding Software Dynamics
Following another old desire, I’m picking up and working with C again,
just to get more systems programming under my belt. I spent some time
working through the chapter on estimating CPU utilization and fell
into all the traps the book mentioned.
I’m tempted to write a general purpose benchmarking script that
generates html and run it on every device I can get my hands on.
A shell based notebook
Another idea I’ve been thinking about: interpreters, bash shells,
consoles in general have very explicit and carefully crafted
input/output semantics, so it should be possible to make a Notebook
interface on top of them with a very generic implementation: ideally
I’d like something extremely lightweight built with HTMX with minimal
assumptions on the contents of the shell.
I think this could work very well, particularly as a replacement for
script and even for simpler notebooks. The fancier shell escape
codes could allow for rendering images, with potentially some
extensions to do so more naturally than pure shells.
Instead of implementing anything concrete, I spent most of my time
thinking of a good name for this project instead: for now I’m calling
it TextBook because I couldn’t think of anything better.
RWKV
As I was writing this a really interesting Tweet floated
past and now
I’ll have to spend some time reading about the new model, and the
architecture. The demos were pretty excellent, and if it’s as
lightweight as claimed I should be able to easily run it locally.
LLMs are a fascinating space.
2024-01-21
Like the set on HyGPT, I’m starting a series of notes on software
dynamics, queueing and hardware behavior that I hope to learn about this
year; I’ll be categorizing these as Mechanical Sympathy.
The second chapter talks through estimating CPU utilization – in a
way that can be reasoned about. There are so many ways that
the compiler, pipelining on the CPU, or other systems can interfere that
it’s extremely valuable to be very empirical, and to check all
assumptions. The recommended way to benchmark is to run the
benchmark with multiple iterations over the same piece of code (after
confirming that the compiler isn’t eliding all the synthetic work):
once with n iterations, and once with 2n. The actual cost of n iterations
can then be determined by subtracting the two, which cancels out the
fixed overhead of the timing loop itself.
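A quick sketch of the trick (in Python rather than the book’s C, so the absolute numbers are meaningless, but the subtraction works the same way):

import time

def run(iters):
    acc = 1
    start = time.perf_counter_ns()
    for _ in range(iters):
        acc = (acc * 31) % 1_000_003  # synthetic work
    elapsed = time.perf_counter_ns() - start
    return elapsed, acc  # returning acc keeps the work from being elided

n = 5_000_000
t_n, _ = run(n)
t_2n, _ = run(2 * n)
# Loop setup and timer overhead cancel in the subtraction.
print(f"~{(t_2n - t_n) / n:.1f} ns per iteration")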
The best mechanism to deal with variations in runtime across runs is
to choose the minimum to minimize noise – this may not be
immediately intuitive, but is something I’d also learned from watching
Android Benchmarking talks. The minimum time is almost certainly the
one where nothing else interfered while benchmarking, and should truly
represent the workload.
At the same time, Queueing theory also applies to organizations and
systems, so I’d also like to complement this series with The
Principles of Product Development
Flow,
which also goes into the math.
Simple Bookmarking
As an attempt at capturing interesting links – and recording where I
find them – I’ve set up a Google Form for myself that publishes to a
spreadsheet. I use it to upload papers straight to my drive, capture
links from HN & Twitter, etc. I’m excited to see which of my technical
news sources I should maintain and which ones I should cut down on.
libSegFault.so
A colleague – CT – pointed me to libSegFault.so: a very useful
utility that can be LD_PRELOADed. Some time spent googling made
me realize I’d been missing out on some very useful infrastructure.
There are some great blogposts
on this, and it also seems like a really fun pattern I now want to apply to other
pieces of native code. libSegFault itself adds a SIGSEGV handler that prints a
lot of useful information to help isolate where the segfault happened, without
needing to recompile the code at all.
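Usage is a one-liner (the library path varies by distribution; ./my-program is a stand-in):

LD_PRELOAD=/lib/x86_64-linux-gnu/libSegFault.so SEGFAULT_SIGNALS=all ./my-program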
Meta
Mark announced that we’re building
LLaMa3
– and given that’s public, I can also say that I’m helping out with
that by building and maintaining tools for the team. This is one of
the projects I’ve been most excited about in my career so far.
2024-01-14
A fairly long week; I haven’t been able to spend as much time learning
new things as I would like; and I really need to start structuring
these notes better.
Tweaked the CSS to enable
text-size-adjust,
which makes this website significantly easier to read on mobile – I’d
been wondering what I was missing compared to
expLog.
/proc tricks
The Linux /proc filesystem is one of my favorite things: it’s a
ridiculously convenient way to explore, offers a lot of affordances and
is something you can play with live.
/proc/$pid/environ can be very useful at times, and it contains a
\0 separated list of the environment the process was started with.
I’ve actually used it twice recently, and I thought I’d record the
tricks:
Sometimes programs will unset environment variables they were
started with before handing over control, e.g. LD_LIBRARY_PATH. I
parsed /proc/$pid/environ to recreate the original environment
(split by \0 and put into a map), and then exec’d into the
process I needed with that environment. It works surprisingly
smoothly.
I was writing bash scripts I needed to be configurable, so I had
three layers of configuration:
A set of default environment variables
Config files that could be sourced and used to override these
The ability to set the variables at the CLI while running to
override both of these.
Now this becomes tricky because variables set in the parent
environment are generally the first to be
overridden. /proc/$pid/environ to the rescue again: after
sourcing the passed-in config file, I read the environ file,
pull out the config variables I care about, and explicitly
set them.
Part of this was inspired from how Bash
Idioms
deals with configurations (including self-documenting code).
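A sketch of the first trick above (argument handling kept minimal; run as restore-env.py <pid> <command...>):

import os
import sys

pid, command = sys.argv[1], sys.argv[2:]
with open(f"/proc/{pid}/environ", "rb") as f:
    raw = f.read()

# environ is a \0-separated list of KEY=VALUE entries.
env = dict(
    entry.decode().split("=", 1)
    for entry in raw.split(b"\0")
    if b"=" in entry
)

# Replace this process with the command, using the recovered environment.
os.execvpe(command[0], command, env)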
Marimo is a beautiful new Python notebook that I wish I’d
built. Someday I’m sure I’ll try and build my own – almost certainly
with HTMX and even more minimalism.
Implementation details
Tornado, React
Mix of http & sockets
Features I really liked
vim bindings out of the box
a fresh, lightweight design
seeing errors disappear as I fix imports live was excellent
autocomplete is… interesting. I couldn’t get copilot to work
Data format
The generated notebook is an executable script that relies heavily on
decorators
(app.cell)
– which is a significantly more elegant approach than json. Of
course, that also means that no notebook outputs or state is ever
persisted to disk.
Threading behavior
I’m always curious about how notebook designers implement something
like async tasks; asyncio itself seems to be slightly hard to use in
these notebooks – there’s no default event loop, and I couldn’t quite
get it to make one for me to run background tasks.
Then I tried explicitly creating a thread: printing from it doesn’t
show up in the notebook, and goes to the server’s stdout instead. I expect
this is something that’ll get fixed in the future.
Reactivity
While I think this is a very nice way to build applications, I’m not
yet sure if I’d like to use this approach to build “notebooks” for
exploration.
I’m also going to be spending some time exploring more industrial
strength visualization tools to handle exploring and visualizing much
more complex data.
I also decided to try out are.na to maintain notes on LLMs – as a
complement to these letters. Unfortunately I’m probably too used to
being able to customize my workflows to stick with it.
Books
While reading tremendous amounts of fiction, I stumbled onto Master
of Change by Brad Stulberg –
which has been surprisingly soothing and engrossing. There are also
some extraordinary quotes in there:
“It seems that all true things must change and only that which
changes remains true.”
Preparing for a brand new year! Spending some time thinking through
what I’d like to learn, do, and build – ideally with a why attached
– followed by how I should go about it.
To Learn
Large Language Models
Transformers – and large models in general – have been changing
software; I’d like to be able to regularly and comfortably keep up
with the state of the art, and train my own models for different
purposes. Whether that means training a small model from scratch, fine
tuning or applying open source models, or shrinking models to be tiny.
At the same time, building mechanical sympathy for these models
seems really important: an intuitive sense of the hardware capacity
required, the number of operations running a model takes. I expect
I’ll be writing about this and spending a lot of time here in 2024.
Of course, this needs to be complemented with actual working code,
notes and projects. I expect to level up in Systems Programming,
Networking (one of my biggest weaknesses), Cuda and HPC as part of
this.
Maths & Physics Fundamentals
I frequently run into limitations because I don’t have a good command
of undergraduate- to graduate-level Maths & Physics. So I’ll try to
hit some good old textbooks, including Feynman (both on Computation
and on Physics), and classics on Math.
Information theory, Queuing theory and more advanced simulations are
topics I’d like to dive particularly deeply into.
To Build
Hy Tools
Hy has been making programming extremely fun again because I can
bring together a lot of things I enjoy: Lisps, ML, iterating quickly
and interactively. At the same time, the available tools have
bit-rotted a bit, are in progress, or just don’t exist yet – nREPL,
tree-sitter, the emacs mode, jedhy, etc.
Once I have all of that in place, I’d like to play with some
Python/Notebook experiments that have been floating around in my head
for a really long time, where I mimic SmallTalk while still
supporting a production Python environment. With enough elbow grease
we can write code like they did in the 80s!
Orphism
I’m leaning into visualizations, composable tools and also thinking
about UIs at the moment: doubling down there and making it easily
usable and customizable.
Notes box
While writing this letter has been valuable to reflect on my week and
things I’ve been learning, I need a better mechanism to easily add and
iterate on my notes, particularly with the ability to have an LLM read
them for me (though I suspect that’s not going to be as valuable as
having an LLM read things that I haven’t read).
While I’ve generally kept my slipboxes
public, I’ll experiment with making a
private one this time.
The other thing I learned was to try and use an LSP for organizing and
cross linking between the notes easily.
To Write
An open source implementation of intermediate logging
I promised to do this during my talk at PyTorch Con '23, and I’d like
to follow through on this. I’ve structured the code in my head several
times over at this point, so it’s just a matter of getting it written
in a way that works.
Maintaining these letters
As a summary of things I’ve been learning and paying attention to over
the week, ideally cross-indexed with the notebox. At some point I may
also tell people I’ve been writing them once I have a clearer purpose
and structure – including some form of pagination.
2023-12-31
Last one for 2023!
Orphism
A short burst of hacking over Christmas Eve let me release a preview
of orphism on PyPI and
GitHub.
While I’d been calling this project
Fauvism I realized it
was already taken on
PyPI. Orphism was a
quick search and replace away – thematically it seems to fit even better,
as a derivation of cubism.
Implementation has been fun: I decoupled the code into “bucketing” and
“rendering”, which helped smooth it out a lot, but there are still a
lot of edge cases and off-by-ones to reason about that I’m not
particularly happy about. And while I can render sin(x) and cos(x)
pretty well, tan(x) looks ridiculous.
The data ink ratio is
also remarkable for the amount of data that just fits into a single
line.
I’m hoping to use this for rendering model weights live, so I’ll add
support for NaN, Inf, numpy and torch Tensors. Then I’ll
figure out how to make it fast.
Unfortunately this has been fascinating enough to draw me away from
Advent of Code, but I suspect I’ll find my way back there soon enough
anyways.
2023-12-24
Ended up relaxing a bit last week. A new week, another attempt at
learning to program!
Advent of Code
After getting really tired of thinking through the problems, I
realized that I generally have the most trouble when I pattern match
too aggressively on past problems. Instead of treating the problem as
a brand new problem to evaluate on its own merits, I’ve been
forcefully fitting it into a pattern and struggling when it didn’t
match.
I need to adopt a little bit more of a beginner’s mind here and have
more fun with the problems.
I spent a lot of time thinking about this one: particularly part 2. My
first several attempts were around recursing, caching, and reducing
the problem.
The other insight – in hindsight, questionably useful – was that
there were 2 ways to count the number of possibilities: one by trying
out two options for each “?”, and one by generating all possible
strings within the constraints. I also derived a formula to figure out
when it would be better to keep recursing instead of trying to
enumerate all possibilities. This worked well enough to solve the
sample input, but even letting it run with PyPy for over an hour
didn’t actually solve it.
A few days too late I pattern matched this to edit distance, and after
thinking through the problem over a couple of days finally figured out
a solution:
Reduce the pattern to match against by replacing runs of multiple .
with a single . – because that’s pretty much exactly the same.
Generate the actual constraint string, interspersing the # groups with .
– and also adding a . at the beginning and the end.
Solve the problem with a table counting all the options.
Each cell in the table represents the number of ways to match the
patterns up to that intersection. The bottom right corner becomes
the required answer.
If the characters don’t match, then there’s no path forward, so
that’s an easy 0. (? can match anything.)
If the constraint’s character is #, it’s the number of ways to
match to precisely (row - 1, col - 1).
If the constraint’s character is ., that can be achieved by
stretching a string matching up till (row, col - 1) or by matching
up to (row - 1, col - 1) – so the result is the sum.
The extra . prefixed and suffixed on both allows for the
constraints to start and end appropriately.
-1, -1 is given a value of 1, and any other out-of-bounds values
are 0 to allow matching.
/   .   ?   #   .
.   1   1   0   0
#   0   1   1   0
.   0   1   0   1
Google Slides
I constantly wish for an index card application that’s minimalistic,
easy to use, and can actually replace using paper index cards for
brainstorming. While thinking about the ways I could leverage
existing, widely available tools I’ve decided to experiment with using
slides.
Using a minimal theme, I’ve been using slides as a way to take quick
notes, have bullet points I can rearrange and an easy way to build
quick and dirty diagrams.
I’m worried this will be yet another aspirational book – one I’d like
to be able to read easily, but find too hard at the moment – added to
my tsundoku, but the first
few chapters have been manageable.
2023-12-17
A new week begins – another 7 Advent of Code Problems, hopefully a
small open source project and some exploration with llama.c.
A surprisingly comfortable day after day 10: I almost started
implementing the Floyd–Warshall algorithm
before realizing this was simply Manhattan distance and there was
nothing particularly fancy going on here. Part 2 was fairly reasonable
and direct as well.
My chromebook (Macbook Air 2012 with ChromeOS Flex) still keeps
troubling me though: there’s some? update at around midnight that
seems to cause it to freeze: copy-paste fails, chrome stops connecting
to the internet; all terminals stop working. I still can’t quite
figure out what’s going on – but it definitely plays havoc with my
rankings.
… and beyond
Surprisingly, I’m finding myself a little bit bored at this point. I’m
not enjoying staying up waiting for the problem to drop any more, and
have started finding the problems a little bit of a chore so taking a
break from AoC. I’ll catch up when / if I feel like it I guess.
Fauvism
So far I have a small video and less than 100 lines of code:
2023-12-10
Somewhat annoyed at how long it took me to implement a working
transformer, but I’m blaming Advent of Code for distracting
me. (That’s not the truth though, I’m just not practiced enough at
reasoning through matrices and tensors just yet. Soon.)
Just as a warning if you’re reading this: I keep editing the letter
for the current week live, moving on after midnight on the Sunday this
is published (so this particular letter stretches from 2023-12-04
00:00 to 2023-12-11 00:00, and will have the heading 2023-12-10).
Another fairly mechanical problem, I ended up spending most of my time
re-reading the second part. Given the straightforward nature of the
input format this time around, I decided to take a second stab at the
problem with
parsy: an
excellent parser-combinator library that I’d been eyeing after
spending more and more time writing regular expressions for AoC.
In retrospect, nested regexes would have worked just as well, but
that’s not as much fun. My favorite interface to write regular
expressions is in emacs, because I can see the behavior live – I’ve
seen a couple of websites that do the same but I’m yet to see
something as smooth as emacs is about it.
A classic AoC puzzle: intersect segments with some custom code. There
were some interesting solutions on Reddit that worked backwards,
several that brute forced it – but I didn’t see a Tree implementation
yet. At some point I’d like to implement this with an interval tree.
Somewhat relaxing puzzle; I spent a bit too much time writing things
out on paper first – and stumbled a little bit when the roots of the
equation were integers. Happily the test cases included exactly that
scenario, making it much faster to iterate.
Learned about math.nextafter today: I was looking for the quickest
way to get the next integer strictly above the current value, irrespective of
whether it’s an int or not.
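For illustration (assuming Python 3.9+; nudging past the value before taking the ceiling handles the already-an-integer case):

import math

print(math.nextafter(1.0, math.inf))              # 1.0000000000000002
print(math.ceil(math.nextafter(3.0, math.inf)))   # 4
print(math.ceil(math.nextafter(3.5, math.inf)))   # 4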
Feeling somewhat lethargic today, but I didn’t do too badly in
time. Today’s problem was slightly fiddly and easy to mess up; I also
wasted some time because I didn’t read the instructions carefully
enough and missed a case.
I realized I didn’t remember the APIs for Counter and had to look
them up; also simply reversing a string, etc. which is a bit
annoying. Oh well. As a test, I also recorded my attempt so I can go
back and review later.
The most elegant solution I’ve seen so far was on
Reddit
by gemdude46. Looking at
simple and well structured solutions for problems I’ve tried myself
is generally one of my favorite parts of AoC.
Someone shared a video that listed all AoC solutions, so I couldn’t
resist writing my own helper program to list solutions out. For now,
all my solutions run in around half a second; I hope I can maintain
this trend.
My Chromebook gave out again while I was working through part 1, but I
managed to catch up significantly in part 2 once I figured out what
was going on. I’d started by recording myself work, and I’ve realized
a lot of things that I should be doing to program faster that I’m
planning to do from tomorrow:
copy paste the small input from the beginning so I can constantly
exercise my code; and actually keep running it as I go.
I spend a lot of time debugging backwards for trivial mistakes
otherwise that I could have caught and dealt with significantly
sooner.
avoid spending too much time thinking about variable names, etc.
don’t get too comfortable with AoC: I wasted time implementing a
(working) cache lookup today just because that’s what I thought
“fit” the problem best without realizing it was about finding the
LCM.
make sure I sneak a peek at the actual raw input at the start
instead of completely ignoring it.
The problem itself was fine; I gave up and went with a quick regex to
unblock myself: if nothing else AoC helps stress test my regex skills.
I’ve been enjoying watching Jonathan
Paulson’s solution
videos, which also gave me a lot of hints on how to speed up my
solutions.
On another note: I spent a bunch of time today trying to run my
solutions through
GraalVM but
failed miserably; it can’t install hy, and generally installing – or
trying to install – a python package started heating up my
laptop. Hopefully some other time. Seeing Jonathan’s solutions with
Pypy I decided to give that a
shot too, but didn’t get very far – particularly because it’s not
meant for short running processes.
Finally I decided to play with an idea I’ve described earlier, and
compiled the code with Cython: and it runs
beautifully. I’m going to have to spend a little bit more time with
this, and I suspect I can get an experience with all the power of
Common Lisp and Python in a language I really enjoy.
I tried to be much more incremental this time around but still ended
up being pretty slow; I’d like to say that I was too sleepy and tired
to focus properly, but looking at the leaderboards does make me wonder
at just how much faster I could be if I keep working at this.
The problem itself was interesting – it took longer to understand
than anything else – and I still wonder if there’s a trick to finding
a mathematical solution. It feels like differentiating a series till
it turns to zero, and then figuring out the deltas / equations.
Wow, this was tricky. I solved part one with a BFS traversing the nodes
and then taking a maximum. For part 2, I tried to maintain the parity
of all my answers. It’s always surprising just how much clarity a
quick visualization can give me when compared to the raw data – this
is something I should apply more widely.
Looking at discussions and other people’s solutions, I learned about
the Shapely library, the
shoelace formula and
Pick’s theorem –
though not well enough to apply them. Several people also implemented
a floodfill after adjusting the grid, which was not an approach I’d
really thought about – but it is pretty clean.
Speeding up
As an experiment, I decided to record myself solving Day 09 while
fresh in the morning – already knowing the solution and just trying
to get a baseline of how long it would take me to solve the problem.
Analyzing my videos (original attempt, second attempt in the morning):
Notes            Actual attempt    “Ideal” attempt
Time to first    13m21s            3m11s
Time to second   20m10s            6m12s

Bugs:
1m19s   Didn’t split line
2m57s   Didn’t include original number
4m35s   Didn’t toggle the flag
With these times I would have barely made it into the leaderboard for
part 1, and not at all for part 2.
Potential follow ups:
Make it easier to get inputs from the puzzle
More utility functions / default to strip the input somehow?
Generally slow / stuck at debugging: either minimize debugging or
get better at speeding it up, especially when I don’t have the
problem saved in my head.
Explore using REPLs better – if I want to improve the tools
Keep a clearer head on the problem – if I want to generally improve myself
Somewhat disappointingly, trying to redo the problem still led to
bugs: in the past, I’ve run into this while trying to go too
fast. Instead of trying to push for speed, I think I need to focus
more on executing the code constantly in my head as I type and making
sure I avoid making mistakes instead.
Knowing the problems I’d run into the first time around helped speed
up the 3rd attempt to something around 4 minutes, but that’s so far
from a real attempt.
Transformers
Slowly working through the attention block and cross checking every
step; I hadn’t internalized that the calculations for K/Q/V are
exactly the same before masking / summing over values.
From Hacker News: LLM Visualization is
astonishingly well put together. I wish I was comfortable enough with
3-d programming to generate visualizations like that.
I’ve finally succeeded in making my attention block match the one from
the test, through sheer brute force – comparing tensor by tensor, step
by step. My mistake was at the last step: instead of summing over the
heads explicitly and then adding biases, I was using einsum to both
add the biases and sum over the heads at the same time. On top of that,
the test used a buffer IGNORE that I had just set as a
constant – but during evaluation the buffer would be overridden, making my
values diverge further.
Clearly I’m still not entirely clear on the ordering of the operations
and why they matter; I suspect the model would train the same (but
that is probably just my naivete).
I finally managed to make my Transformer work closely against the
reference GPT2 implementation provided by the TransformerLens
library. Of course,
as soon as I build the most basic familiarity with transformers,
a new paper is released that seems very promising:
Mamba. I’ll spend some time
playing with this as well, while I build more mechanical sympathy for
these models.
I’ve been spending some time thinking about what to tackle next, and
working through and implementing llama.c to reproduce the Llama2 and
actually using it for inference sounds like a good plan – and of
course, I’ll be doing all of that in Hy.
Fauvism
While playing with a terminal version of TensorBoard I started
wondering if I could render cubism-style charts instead to gain even
higher resolution than the
sparklines offered
by Textual. I also wanted something that didn’t try to fit the data
into the available width, and would instead let me scroll.
Which led to a small project I plan to use in a couple of places,
including the open source release of Intermediate Logging. Fauvism
tries to render both positive and negative values, and I’m building it
in layers (Rich Renderable -> Textual Component) and a reasonably
useful CLI app as well.
The most interesting bit is figuring out how to show negative values:
I’m experimenting with flipping the bg/fg colors and simulating
unicode blocks that start from the top instead. I still need to work
through the edge cases though – but I’ve only spent around two hours
on this so far.
This is in Hy too, but is transparent to library users (__init__.py
imports Hy, making the rest trivial).
Meta
On 2023-12-05 I completed 12 years working at Meta: 3.5 years at Facebook
California, and 8.5 years in New York. It’s been a while.
2023-12-03
Satisfying another static-site generation itch this week with a simple
table of contents generated with the strategic application of fragile
regular expressions.
Transformers
Continuing to work on a transformer implementation, I’m finding
Arena
very helpful, because I can write the transformer code layer by layer
and sanity check it against GPT2. Once I have this, I’d like to try
and implement my own completely from scratch or from a paper, but at
least this gives me something bite sized to tackle first.
Visualizing / thinking through multidimensional matrix multiplication
is giving me a massive headache, if I’m honest. Peeking at solutions
that just use einsum was bittersweet – I’m happy it’s possible to
express it so cleanly, and sad I hadn’t known about it earlier.
I definitely don’t enjoy having to deal with batches as an additional
dimension – I almost want batches to be something that can be plugged
into the model post hoc as a post-processing step.
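As a small illustration of why einsum is so pleasant here (my own toy example, not the Arena code) – attention scores over batch and head dimensions in one line, with the index names documenting the contraction:

import torch

batch, heads, seq, d_head = 2, 4, 8, 16
q = torch.randn(batch, heads, seq, d_head)
k = torch.randn(batch, heads, seq, d_head)

# scores[b, h, i, j] = sum_d q[b, h, i, d] * k[b, h, j, d]
scores = torch.einsum("bhid,bhjd->bhij", q, k) / d_head ** 0.5
print(scores.shape)  # torch.Size([2, 4, 8, 8])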
Trying to debug my Attention block – given I have a reference
implementation open right in front of me – is a good exercise in
realizing very concretely that ML Debugging tooling is very primitive
at a lot of levels; trying to build a transformer is just driving that
home very viscerally. On the plus side, this gives me good ideas for
projects and visualizations to build.
I should read more about Signal Propagation Theory. But I think I’ll
stick to having my own transformer implementation first before I keep
distracting myself.
Type Macros
Still iterating on the attention block, I’m now rewriting it into
something like what I would normally write, instead of relying heavily on
existing Module patterns. I started with a little bit of procrastination,
implementing my own typing macros in Hy – I can see a whole new world
of DSLs opening up in front of me. I tried a plain old macro and a
reader macro to replace jaxtyping’s Float[torch.Tensor, "dim1 dim2 dim3"]:
(defreader T
  (.slurp-space &reader)
  (setv out (.parse-one-form &reader))
  (setv #(dims tensortype) (.split out ":"))
  `(get ~(hy.models.Symbol tensortype) #(torch.Tensor ~(dims.replace "," " "))))
which looks like #T dim1,dim2,dim3:Float. But when used with an
annotation it becomes a bit too unwieldy for me (#^ #T ...) and I
couldn’t see a quick way to bypass that.
So I’ve settled on a simpler defmacro instead,
which I can use as (T dim1,dim2,dim3 Float). I’m not particularly
happy with this either, but it’s better than (get Float #(torch.Tensor "dim1 dim2 dim3")), which is what I’ve been living with
so far. I’ll have to think a little bit more about this, and figure
out how to write assertions on these – perhaps with a custom defn
wrapper macro instead.
Personally, I switched from Mac to Linux a couple of years ago (I’m
even typing this post on a decade old Macbook Air running ChromeOS
Flex) because while I really like Mac’s hardware, I was tired of
fighting the software.
Documentation
After writing a lot of documentation recently, I’ve been finding
myself surprisingly attracted to using presentations as a quick
documentation mechanism: I can easily annotate diagrams and text,
point people to a very specific slide – and most importantly – they
don’t seem to be as overwhelming even if there are large amounts of
content, just because of how skimmable they tend to be.
A significantly more formal – and cleaner – approach to
documentation is at Diataxis which I plan to
adopt and learn from.
Advent of Code
I must admit to being very excited about tackling Advent of Code again
with Hy this year (though I’m also tempted to try Zig or Go, just to
get another systems programming language under my belt). I’m hoping to
use it as motivation to improve some of the tools around Hy & Emacs –
potentially just updating some of the existing tools that have broken
with language changes.
Trying to run Blitzen – my Rust helper – failed; at this point it’s
probably too old to compile, so I quickly made a version in Hy; with
Requests and BeautifulSoup this turned out to be very small and
surprisingly smooth to write, reinforcing why I enjoy writing Hy /
Python so much. SPOILERS below, of course.
I can’t say I started off particularly smoothly: the first part was
reasonably quick to pull off, even though I stumbled a couple of times
– submitting my solution with bzn also went off smoothly.
The second level was painful for me though, mainly because I
misunderstood the order in which the second number could be picked
up. Regexing backward on a reversed string was the best solution I
could come up with in a pinch.
At some point, I may try to do this with a state machine instead using
a Trie just to make the parsing more efficient; writing it out in Hy
will also be interesting – all the indexing gets pretty painful
quickly.
My original solution was significantly more verbose and painful, but
lent itself to refactoring remarkably well; so I’m happy I have a
small solution up and running at the end. Of course, looking at Reddit
shows me that I could have handled overlapping strings much more
smoothly in
Python
by retaining the original letters; given the constraints we were
facing this would just work. I could also have been fancier and used
lookahead regular expressions (?=(...)).
Short and sweet; I even managed to just sneak into an under 1000 rank
for part 1, and just above 1000 for part 2. Every time I do Advent of
Code I remind myself that it would be a good idea to have some way to
very simply and quickly read strings, and perhaps I should keep some
library of parser combinators handy.
I took a mostly mechanical approach today and could generally write
functional code fairly quickly: though I was somewhat betrayed by my
laptop, which ended up freezing and had to be restarted. Given it was
just past midnight, I have to wonder if there was an update
transparently being applied that broke things.
Happily, I could still submit my solution to part 1 through emacs even
though Chrome and my terminal stopped functioning; bzn.submit-answer
happily pulled through.
2023-11-26
This week’s letter starts with a face lift, combining some of the
pieces I’ve been writing about: a color palette inspired by Dieter
Rams and a simple site
builder written in Hy. Almost everything is
typeset in Inter; though at some
point I’ll swap in Berkeley
Mono for
monospace fonts and maybe the headings.
(tl;dr; for the rest of this week: I’ve been playing with Hy.)
Transformers
There are a remarkable amount of excellent notes and explorations on
Transformers out there: this week I stumbled upon Neel
Nanda’s
blog from a Zulip discussion from RC. With
sufficient Googling I’m beginning to realize that Transformers may be
this decade’s Monad – and I probably shouldn’t write yet another
post on how to implement my own. Though I almost certainly will do one
for a Transformer in Hy.
Neel’s blog took me to
Arena
which hand-holds you through building a transformer. I’m taking a
slightly different approach from the recommended set up and working in
Hy, but maintaining everything else. Instead of copy-pasting set up
code directly, I’m manually transforming it to Hy and doing my own set
up, but so far that approach seems to be working. Learning about
einops and jaxtyping has also been interesting – working with
Hy I can see myself writing a couple of macros to make all of this
significantly more ergonomic.
Hy, Textual & UIs
I finally ended up making a working application with Textual, using Hy
as my actual language: a minimalistic version of TensorBoard that
renders scalars as sparklines. It was surprisingly satisfying,
particularly after I could enable sparklines.
Message Queues are probably my favorite mechanism to make sure the
main thread in a program is unblocked; and once I wrapped my head
around Textual’s APIs (particularly that every widget has access to
the message queue – and the thread safety of certain functions)
things became much easier. Enough to be able to set up auto-refresh
and manual refresh, with
Toasts popping up as
data quietly updates. Textual also seems to be handling mounting new
elements much more simply.
Hy, Common Lisp and Compilation
Hy keeps growing on me: while looking at old Hacker News posts on Hy,
I also came across a post on using hy that resonates a bit too much.
I slowly find myself thinking about Common Lisp’s approach to
compilation, and how Python is also reaching there:
Triton compiles code live and creates CUDA kernels; torch.compile is
the same idea. With a good REPL I can easily see myself having a
similar workflow with Hy.
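The workflow I mean, as a tiny sketch – define a function, and the first call compiles it:

```python
import torch

@torch.compile  # traced on first call; TorchInductor emits fused kernels
def gelu_ish(x: torch.Tensor) -> torch.Tensor:
    return 0.5 * x * (1.0 + torch.tanh(0.7978845608 * (x + 0.044715 * x**3)))

x = torch.randn(1024)
y = gelu_ish(x)  # first call compiles; later calls reuse the compiled code
```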
There are some tools that I miss at the moment: JedHy is a bit out of
date, the highlighting is slightly broken, and autocomplete is
accordingly somewhat minimalistic. On the other hand, the language
itself simplifies code enough to make these only trivial annoyances.
My favorite recent snippet is one for generating HTML: it’s
delightfully minimal, handles the HTML as strings, but still gives me
a remarkable amount of flexibility. In Python, I would probably end up
using with, but with the right notation I can just write it out
directly in Hy.
I’ll rewrite it to be slightly friendlier with a macro, potentially
allowing for #< instead of tag.
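For contrast, a hedged sketch of the with-based Python version I have in mind (every name here is invented):

```python
from contextlib import contextmanager

class Html:
    def __init__(self) -> None:
        self.parts: list[str] = []

    @contextmanager
    def tag(self, name: str, **attrs: str):
        # Emit the opening tag, let the body run, then close it.
        attr_str = "".join(f' {k}="{v}"' for k, v in attrs.items())
        self.parts.append(f"<{name}{attr_str}>")
        yield self
        self.parts.append(f"</{name}>")

    def text(self, s: str) -> None:
        self.parts.append(s)

    def render(self) -> str:
        return "".join(self.parts)

doc = Html()
with doc.tag("article"), doc.tag("h1"):
    doc.text("Hello")
print(doc.render())  # <article><h1>Hello</h1></article>
```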
Black Friday
My biggest set of expenses for Black Friday has been books, of
course. I’ve picked up 3 books so far (that I will almost definitely
read):
Another surprisingly busy week; weeks where I don’t have enough time
to learn something new seem disappointing and pale – I should make a
stronger effort to take time out to study, explore, and build.
I picked up one of my favorite books – The Art of Doing Science and
Engineering by Richard Hamming; apart from Small Gods, it’s one of
the books that has most influenced how I think about my life. I’m
looking forward to re-reading it with a new lens, and hopefully
getting something new out of the book.
Bash
I find myself slowly becoming more proficient with Bash; enough to be
able to quickly put things together without having to google too
much. Quoting and arrays are still a nightmare, of course, but there
are places where they just work.
Languages
Thinking about Bash, Python, and the desire to write systems
programming code, I found myself disappointed: a Lisp-like macro
system and homoiconicity seems perfect for writing efficient code, but
there was no Lisp that seemed to satisfy these requirements. I find
myself tempted to write my own. This is in stark contrast to last
week’s dreams on building an automatic profiler, but is somewhere
close by.
I find myself tempted to work through Crafting Interpreters with Hy,
using the effort to improve Hy itself, think about building my own
language and levelling up a little bit. At the same time, I’m curious
about which programming languages would be easy for a Transformer to
write programs with and get feedback; would assembly be simpler?
Of course, ChatGPT said that Python is the easiest language to write
because of the sheer amount of existing code. That said, I’m a little
surprised and suspicious.
At the same time, I’m also surprised at the lack of specific
programming tools: Copilot and ChatGPT should be able to do
significantly more analysis on the programs being written to design
real systems well and quickly.
Transformers
As a project, I expect I’ll go back to numpy or PyTorch – I haven’t
enjoyed using JAX much, and with PyTorch I should be able to write
code quickly.
I spent time standing in lines and sitting around in a cafe re-reading
How Transformers Work along with several links within the post – and
that helped things click much more clearly than they have recently,
particularly reading it after watching Karpathy’s videos a few weeks
ago.
The thing I’m still struggling with is that transformers – and
perhaps a lot of these architectures – are much more evolved and
empirically determined than designed. Why does the value of attention
heads fall off after adding 6? That’s probably some function of the
input data and information theory, and may be tied to the tokenizer.
I really appreciated that this blog post also went into the details of
tokenization, which have been somewhat obscure to me – just because
I haven’t gotten around to paying attention. There is something here
to play with, and I really enjoy Anthropic’s approach to this with
mechanistic interpretability in Transformer Circuits.
Textual
On a completely different note, I also spent time building a TUI using
Textual and Hy to let off steam (and I suspect I’ll be treating this
project as my personal video game for the coming few weeks).
I’ve been having a terrible time getting used to all the APIs and
mechanisms available in Textual to write apps – and if I had one
suggestion to make, it would be to simplify: right now the API and the
components offer too many things (worker threads, magic async
functions depending on how you define things, way too many magic
instance members that change behavior); I’d rather see it pared down
to something that just maintains views.
That’s how I’m planning to use it anyway, with all the business and
data-fetching logic extracted (something like MVVM potentially? or
MVC?) in a way that feels comfortable. Hy is beginning to feel
familiar, though I still stumble often (why does for look like
(for [x xs] (print x)) but lfor skip the []:
(lfor x xs (* x x))?). Potentially an implementation issue, but it was
surprising when I ran into it. The language is also significantly more
ergonomic than I had realized, with support for setx, which sets and
returns values, as an alternative to setv.
Next
Hopefully this Thanksgiving weekend I’ll have a chance to take
significantly more detailed notes from The Art of Doing Science and
Engineering, and potentially talk about applying it to the world
today.
I’d also like to refurbish my online presence, reset and simplify my
dotfiles, and clean up this site and my slipbox significantly. I’ll
also be taking a stab at writing out the implementation of
intermediate logging for open-sourcing.
2023-11-12
A busy week spent mostly traveling, and occasionally reading code and
how certain systems work.
Bash signals
I’ve been spending a remarkable amount of time trying to reason about
and understand how signals interact with bash scripts this week.
There’s one very important rule that may not be obvious: while a shell
script is waiting on a foreground command, bash defers handling most
signals until that command finishes. The way to get signals handled
promptly is to run the command as a background process (using &), set
handlers with trap, and wait on the child.
The other, generally simpler alternative is to simply exec into the
program you want to forward signals to, replacing the shell entirely.
Automatic profiling
After spending a lot of time seeing people profile and improve
distributed systems, I started to wonder whether it would be worth
investigating simulating a system (hardware, network, performance
characteristics, even software, etc.) and using that to optimize the
system.
ChatGPT has pointed me to a significant number of resources, and
clearly this is something that has been deeply researched; I have a
lot of reading and exploration ahead to understand this.
User Interfaces
A recurring belief I have that slowly keeps strengthening is that the
amount of effort that goes into building working user interfaces is
completely disproportionate to the value created by them. As someone
who strongly prefers minimalistic designs and genearlly appreciates
form over function I’m even more biased against spending a lot of time
polishing UIs.
In past lives I’ve spent weeks to months aligning pixels, and even
rewriting a website from scratch for being off by 1 pixel. In
certain contexts, that makes a lot of sense; but for a lot of other
jobs to be done the UI is just not important. And spending engineering
years implementing drop shadows, animations (which also consume a
surprising amount of compute & battery) has come to feel like a bit of
a waste.
I do live this as well: wherever I have the opportunity (i.e., a Linux
install) I’ll end up installing i3 or sway and then working
through them instead of dealing with other window managers. Text gets
me most of the way; simple drop-downs and affordances do the rest.
Part of this feels like a function of the ecosystem developed around
UIs, particularly cross-platform UIs. HTML, CSS, and JavaScript get us
some part of the way – but the last mile seems to be so much more
costly than I would have expected.
Reflecting on these letters
The point of this newsletter was to reflect on the past week and write
down things I found interesting; now that I have almost a month of
letters to look back at, there are some clear patterns:
- I pick up books to get just enough out of them and move on fairly
  quickly. After that they become part of my library and of my sense
  of guilt over my massive collection of unread books.
- There’s so much I could be learning; it’s been most valuable to pull
  what I need when I need it, and to record where I can go back to get
  more information – to prevent re-work when I do need to go
  deeper. This website and newsletter series should help.
  - Particularly if I can connect it to a transformer for handling.
  - I should make it easier to read and revisit what’s going on by
    cross-linking the contents of this site, which will need a little
    bit of programming.
- When I revisit books I used to consider very hard or impressive, I’m
  beginning to be a little less impressed – particularly depending on
  the book. Let Over Lambda is one such book that didn’t really stand
  up when I looked at it a second time – I definitely appreciate the
  powers of metaprogramming, but the tone actually gets annoying at
  times.
- There’s learning for the sake of learning and learning for the sake
  of application; sometimes there’s a lot more theory needed to truly
  be able to apply something. I’d like to focus more on that, and on
  the fundamentals – particularly as it keeps becoming easier to look
  up specifics with all the AI assistants cropping up every day.
- Hacker News, Twitter, and Threads throw a lot of interesting
  material my way, and I do spend a lot of time reading through these
  articles; but it’s probably much better to use them as a source of
  recommendations (particularly for recent updates) while doing
  directed study instead.
2023-11-05
Continuing the theme of not having that much progress, I still have
even more books coming in.
Books
This week I picked up Linkers and Loaders on Olivier’s
recommendation. The book is from '99, but so far it’s been extremely
helpful in filling in gaps in my understanding of just what
LD_PRELOAD, LD_LIBRARY_PATH, ldconfig, etc. do with
significantly more concrete examples. Understanding “runtimes” has
also been helpful.
Videos
Stumbled onto a new video by Antirez on building a minimal IRC server:
Smallchat. I really enjoy these kinds of projects as a way to teach
and learn different types of programs; every one I see, I want to
reimplement in Python/Hy to explore and learn. Smallchat is no
different: it sets up a chat server that clients can connect to over
telnet.
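To scratch that itch a little, a hedged Python sketch of the core loop (single-threaded and selector-based; the port is Smallchat’s, if I remember it right):

```python
import selectors
import socket

sel = selectors.DefaultSelector()
server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 7711))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)
clients: set = set()

while True:
    for key, _ in sel.select():
        sock = key.fileobj
        if sock is server:
            conn, _ = server.accept()  # new telnet client
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
            clients.add(conn)
        else:
            data = sock.recv(1024)
            if not data:  # client hung up
                sel.unregister(sock)
                clients.discard(sock)
                sock.close()
                continue
            for other in clients:  # fan the line out to everyone else
                if other is not sock:
                    other.sendall(data)
```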
Python
Continuing on last week’s theme, I’ve been exploring how conda sets
paths in the installed environments and I have to admit to being a bit
surprised at how much is patched into binaries. conda-pack does a
good job figuring things out, but it seems remarkably painful, and I
have to wonder why path resolution needs to be so complicated.
I’m also playing with Textual on the side, again using Hy, because I
often want to create a GUI to display results but generally can’t be
bothered to build one. Building command-driven GUIs seems like the
best way for me to stay sane.
Misc.
I’ve been thinking about the value of developer experience, and even
choice of programming languages. Over time, I’m becoming more cynical
on the value added by the last mile of improving tools or languages.
There are things that are essential: access to a devserver, how you
simulate a test vs production environment, something to run tests –
but then once the big things are taken care of the rest seems to be
… bikeshedding: unittest vs pytest, or vim vs emacs. Perhaps
this is a function of reading Kill It With Fire, but familiarity
seems to overwhelm the rest of the choices. Whichever toolset
resonates with you is fine; the cost of switching outweighs the value
provided by an alternative tool.
The clear winner from that approach is to lean harder into the Unix
approach of building composable tools; having files as an abstraction
works a bit too well for the most part. I wish there was something
similar for building user interfaces that composed just as well –
perhaps that’s why HTML/JS/CSS became the default UI standard:
websites have some support for composition.
Open Source also trumps custom tools in the same way; though
open-source tools may not be particularly coherent with each other,
which can be tricky.
2023-10-29
In some ways, this week’s letter is a little bit sparser: I’ve been
heads down at work, and haven’t really been able to focus as much on
learning new things as I would have liked. Looking back at last
week’s letter, I see that I ended up mostly abandoning the books I was
actively reading in favor of focusing on things I’m working on and
looking for small pieces of novelty.
Talks
As both a replacement for and an addition to Netflix, I continued
catching up on Strange Loop videos:
Building Distributed Systems was a great talk on different new
languages that can make it significantly easier to build and reason
about consistency and distributed systems. What really stood out to me
was a quote – “The speed of light is roughly 4 inches per clock
cycle” – which built much better intuition about CPU speeds than I’d
had before.
Metrics for preventing security incidents is an interesting take
on building metrics for something I would generally consider
impossible to measure. Personally, I’ve been very skeptical of the
false precision that comes from introducing numbers where none exist,
but I’m coming to understand the value of having something like this.
Books
I treated myself to a physical book as well, thinking about adding a
design to this site: a book by Dieter Rams on design. The 10
commandments are fascinating, and resonate strongly – particularly
Good design is as little design as possible. Sadly enough, the
book’s format is not as friendly to read as I would like – there are
thin columns of text with German and English translations next to each
other, making it somewhat unwieldy to read.
Python
Python’s wheel building has been wonderfully inconsistent, and I’ve
slowly been learning more – sometimes with a growing sense of terror
as I consider the sheer number of options. PyPA has some excellent
documentation, with a small page on editable installs. The
Setuptools page on Development Mode is significantly more
detailed and useful, with the corresponding PEP: PEP 660. There’s
also a library to build editable wheels. In practice, I often see
libraries installed with a custom .pth file in site-packages, with
a pointer to a custom loader.
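For reference, a hedged sketch of what such a .pth file can contain – the file and module names are made up; site.py treats each line as either a path to append to sys.path or an import statement to execute at startup:

```
# __editable__.mylib.pth, dropped into site-packages (lines with '#' are skipped)
# A bare path line gets appended to sys.path:
/home/me/src/mylib
# An import line is executed at startup; editable installs use this to
# register a custom finder on sys.meta_path that maps the package to its source:
import _mylib_editable_finder
```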
ZSH
I’ve been wanting a ZSH prompt that splits the terminal, to make it
very obvious where the screen moves from one command to the next. All
of the mechanisms I’d seen involved live calculations, but a simpler
trick has been relying on prompt truncation, documented in the ZSH
docs on Prompt Expansion. Simply using a very long line of Unicode
box-drawing characters and truncating them with %<< has worked very
well, even if the documentation is a little bit hard to parse.
The prompt also explicitly starts with a # character, so if I
copy-paste my terminal into a script it simply turns into a comment
without breaking anything. I remember seeing someone use ; instead,
which may be an even more elegant way to achieve the same thing.
Building my own transformer
Inspired by Andrej’s videos, I’ve been slowly iterating on a custom
implementation in Hy & Jax. I expect to fill that out with time.
2023-10-22
A – tentatively – weekly catalog of things I’ve been finding
interesting as a programmer. There’s always something interesting
going on, and I wanted to have some record of what’s been catching my
attention spread across time.
Writing things out – well or poorly – has generally paid off well
in clarifying what I’m thinking about, showing the gaps in my
thinking, and helping me navigate the world in general.
I hope these letters help me start – and maintain – this practice
again. And that they can capture some of the joy, curiosity,
frustration, and sense of excitement I find in programming; and mail
themselves back to me on days I find myself jaded.
If you happen to come across these, you should expect a lot of links
across several domains: programming languages, systems programming,
ML, design, systems and organizational dynamics, and whatever happens
to catch my fancy. This very first edition is likely to be
significantly longer than the rest, just because I have so much to say
that it forced me to start writing.
Books
When it comes to books, my eyes are much, much, much bigger than my
stomach. I have far too many I’m trying to read at the same time; some
of the books I’ve read over the past week include: Kill It With
Fire, a fascinating book by Marianne Bellotti which I ran into
while catching up with Strange Loop talks I wasn’t able to
attend. There are several lessons here: the incredible value of
familiarity with the existing systems, why cp and ls were named
the way they were, and more.
At the same time, I’d like to have a significantly better handle on
programming GPUs: Programming Massively Parallel Processors has
been a pleasure, both for learning about CUDA and for staying up to
date in a very fast-moving world.
On the same note, Understanding Software Dynamics brings
significantly more rigor to my understanding of performance;
embarrassingly enough, this book disappeared into one of my
collections and I forgot all about it till I stumbled back into it
recently.
Bash Idioms, the Google Shell Style Guide, and
ShellCheck have been helping me write some production-worthy
shell scripts (with several questions to ChatGPT along the
way). Misunderstanding parameter expansion led me to commit broken
code repeatedly, to the point of printing out a cheat sheet and making
a solemn promise to only ever use [[ -z ${1-} ]] and
[[ -n ${1-} ]] when testing for an argument with “strict” mode
(-u) enabled.
Videos
Given I’m working on tools used by people building transformers, and
that I spend most of my day bothering ChatGPT with questions about
documentation I can’t be bothered to read, it seemed like a good idea
to implement Transformers under my own steam. I spent most of a 6-hour
flight watching and re-watching Andrej Karpathy’s video on
NanoGPT while trying to implement pieces in HyLang and Jax
– as a way to make sure I actually understand the material. I’ve been
making slow progress on the bigram model.
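As a reference point, the bigram model itself is tiny; a hedged PyTorch sketch (my actual attempt is in Hy and Jax):

```python
import torch
import torch.nn.functional as F

class Bigram(torch.nn.Module):
    """Each token's row of the embedding table *is* the next-token logits."""

    def __init__(self, vocab_size: int):
        super().__init__()
        self.table = torch.nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return self.table(idx)  # (batch, time, vocab_size) logits

vocab_size = 65  # the tiny-shakespeare character set in the video
model = Bigram(vocab_size)
x = torch.randint(0, vocab_size, (4, 8))  # toy batch of token ids
y = torch.randint(0, vocab_size, (4, 8))  # next-token targets
logits = model(x)
loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
print(loss.item())  # ≈ ln(65) ≈ 4.17 before any training
```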
Hy Language
I enjoy using Lisp, and I enjoy writing Python. Re-finding a
surprisingly functional implementation of a Lisp that runs on Python
has been cathartic and enjoyable; I expect to use this combo for most
of my personal programs in the near future.
Hy is very usable, and I have a lot of stuck projects – Transformers,
a site generator for this website, migrating my slipbox, working
through PAIP, Let Over Lambda, and similar books – that get unblocked
as I play with this language. There are rough edges to work through,
but for the most part I find myself delighted.
Intermediate Logging
Last week, I was finally able to talk publicly about some work I did
in 2022; building support for logging intermediate values in PyTorch
– ignoring any transforms that may be applied to the model. It’s some
of the sneakiest code I’ve ever written, with significant amounts of
metaprogramming through code generation. I plan to refactor and
release the code soon; I have some ideas on how to write it in a way
that makes it both easy to understand and to use. The slides for the
talk are available online, and the video should be up soon.