Working Notes: a commonplace notebook for recording & exploring ideas.
An extra unexpected week in San Francisco. I accidentally walked past the tail end of Bay-to-Breakers on the way to getting coffee (right before writing this week's letter) and was thoroughly confused and amused. Seeing several costumed people running past made up for missing out on the Dance Parade in NYC on Saturday.
I've been using this a lot, and see a lot of potential, but haven't quite found the time/energy to actually implement the features I'm really looking forward to. Having extremely simple query functionality implemented using a mix of bash scripts and fzf has taken me surprisingly far. As much as I appreciate POSIX, I guess I never really internalized the extremely minimal API it exposes for different programs to connect, and I still find myself surprised by just how much is accomplished through that minimal API.
I'm also working to configure nvim to be a good markdown editor, building a surprisingly pleasant/effective editing experience where I can navigate through text/ideas/notes quickly. This finally feels like a flexible enough alternative that lets me keep flat files, easily swap between tools and still get all the benefits of Luhmann's methods & Notion & Index Cards and all the other tools I've used to try and keep my head in order.
With LLMs being able to easily consume text, fancier CLIs where things just work should be much more common than they seem to be today; that's something I expect to start noodling around with soon.
Something I haven't figured out the ergonomics for is maintaining a hierarchy of notes easily: the best cheap alternative I tend to have is a custom sheet where I mix in indentation by basically converting the first several columns into thin indents. That way I can show a nested hierarchy by simply starting a row from a different column, relying on how the UI overflows cell text to get nesting in a way that still lets me easily move rows around.
I'd like to actually build this out as a UI for easy modification and managing relationships between notes.
I felt a little bit lost while playing with TF-IDF: bugs in my implementation were giving me extremely broken results, yet the approach still seemed valuable, so I wanted to start levelling up in math a little bit. Based on an answer from ChatGPT-4o, I've started reading Elements of Information Theory and am generally enjoying the book.
Even after just revisiting the minimal definitions of entropy ($H(p) = -\sum_x p(x) \log p(x)$) and relative entropy between distributions ($D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$, the quantity mutual information is defined in terms of), I think I can take another stab at finding outlier logs by looking for the logfiles whose distribution differs most from the norm. I don't need real outliers, I just need the ones with the most distance.
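As a rough sketch of that idea in Go: the block below computes entropy and relative entropy over token counts and ranks logfiles by how far each one sits from a baseline. Everything named here (`Distribution`, `KLDivergence`, `RankByDivergence`, the `eps` floor, the host names) is hypothetical, and pooling all files together to form the baseline is an assumption about what "the norm" means, not something taken from my actual tooling.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Distribution turns raw token counts into probabilities that sum to 1.
func Distribution(counts map[string]int) map[string]float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	dist := make(map[string]float64, len(counts))
	if total == 0 {
		return dist
	}
	for tok, c := range counts {
		dist[tok] = float64(c) / float64(total)
	}
	return dist
}

// Entropy computes H(p) = -sum p*log2(p), in bits.
func Entropy(p map[string]float64) float64 {
	h := 0.0
	for _, pi := range p {
		if pi > 0 {
			h -= pi * math.Log2(pi)
		}
	}
	return h
}

// KLDivergence computes D(p||q) = sum p*log2(p/q), in bits.
// eps is an assumed floor for tokens missing from q, so unseen tokens
// add a large but finite penalty (q is not renormalized afterwards).
func KLDivergence(p, q map[string]float64, eps float64) float64 {
	d := 0.0
	for tok, pi := range p {
		if pi <= 0 {
			continue
		}
		qi := q[tok]
		if qi < eps {
			qi = eps
		}
		d += pi * math.Log2(pi/qi)
	}
	return d
}

// Scored pairs a logfile name with its distance from the baseline.
type Scored struct {
	Name string
	Bits float64
}

// RankByDivergence pools every file's counts into one baseline and
// returns the files sorted from most to least divergent.
func RankByDivergence(files map[string]map[string]int) []Scored {
	pooled := make(map[string]int)
	for _, counts := range files {
		for tok, c := range counts {
			pooled[tok] += c
		}
	}
	baseline := Distribution(pooled)
	ranked := make([]Scored, 0, len(files))
	for name, counts := range files {
		ranked = append(ranked, Scored{name, KLDivergence(Distribution(counts), baseline, 1e-9)})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Bits != ranked[j].Bits {
			return ranked[i].Bits > ranked[j].Bits
		}
		return ranked[i].Name < ranked[j].Name // stable order for ties
	})
	return ranked
}

func main() {
	// Made-up counts purely for illustration.
	files := map[string]map[string]int{
		"host-a.log": {"ok": 10},
		"host-b.log": {"ok": 10},
		"host-c.log": {"panic": 10},
	}
	for _, s := range RankByDivergence(files) {
		fmt.Printf("%-12s %.3f bits\n", s.Name, s.Bits)
	}
}
```

Since relative entropy is asymmetric, measuring each file against the baseline (rather than the other way round) is a deliberate choice here: it penalizes a file for putting weight on tokens the baseline rarely sees.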
Tokenization is of course critical: I've worked around it for now by simply normalizing a lot of strings: smashing numbers down to a single 0, smashing punctuation into a single _, etc. This obviously removes a lot of nuance from the logs but does let me find logs with different stack traces much faster. I think I'll take the information-theoretic approach to find logs that I should look at, and then look at cosine similarity & clustering (something else I spent time learning) to figure out batches of logs and hosts to work with.
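A minimal Go sketch of that normalization, followed by cosine similarity over the resulting token counts. The exact rules are assumptions: I'm treating "numbers" as runs of ASCII digits and "punctuation" as any run of characters that are not letters, digits, or whitespace, and splitting tokens on whitespace. The names `Normalize`, `TokenCounts`, and `CosineSimilarity` are made up for this example.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

var (
	// Assumed rule: any run of ASCII digits counts as one number.
	digitRuns = regexp.MustCompile(`[0-9]+`)
	// Assumed rule: anything that is not a letter, digit, or whitespace
	// counts as punctuation.
	punctRuns = regexp.MustCompile(`[^\p{L}\p{N}\s]+`)
)

// Normalize collapses each run of digits to "0" and each run of
// punctuation to "_", so lines differing only in such details match.
func Normalize(line string) string {
	line = digitRuns.ReplaceAllString(line, "0")
	return punctRuns.ReplaceAllString(line, "_")
}

// TokenCounts normalizes each line and counts whitespace-separated tokens.
func TokenCounts(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		for _, tok := range strings.Fields(Normalize(line)) {
			counts[tok]++
		}
	}
	return counts
}

// CosineSimilarity treats two count maps as sparse vectors and returns
// the cosine of the angle between them (0 if either vector is empty).
func CosineSimilarity(a, b map[string]int) float64 {
	var dot, normA, normB float64
	for tok, x := range a {
		dot += float64(x) * float64(b[tok])
		normA += float64(x) * float64(x)
	}
	for _, y := range b {
		normB += float64(y) * float64(y)
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Made-up log lines purely for illustration.
	a := TokenCounts([]string{"timeout after 1500ms!!!"})
	b := TokenCounts([]string{"timeout after 20ms!"})
	c := TokenCounts([]string{"panic: nil pointer dereference"})

	fmt.Println(Normalize("timeout after 1500ms!!!"))
	fmt.Printf("a vs b: %.2f\n", CosineSimilarity(a, b))
	fmt.Printf("a vs c: %.2f\n", CosineSimilarity(a, c))
}
```

The two timeout lines collapse to the same tokens and so look identical, which is exactly the loss of nuance mentioned above; the trade-off seems worth it when the goal is to separate different stack traces rather than different timings.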
Writing Go continues to be pleasant: I definitely miss the rich Python ecosystem (and have now started wondering if I could just reuse Python libraries in Go) -- there are so many good libraries apparently made for data mining.
As a reminder: none of the notes here represent the views of my employer, nor do they include confidential information (not that I expect anyone to ever read this site either). With that out of the way, I've tried to think of what happens if I extrapolate into the future while thinking through what transformers can accomplish without needing them to make a tremendous jump in capability (sitting in a cafe in SF seems like the right time to explore these ideas):
Most of these seem feasible with current technology with a lot of engineering applied to make things cheaper to deploy, faster, and better integrated -- I'd expect a lot of these kinds of applications to pop up within a decade if not sooner (a decade seems extremely conservative if I'm honest).
— Kunal