Working Notes: a commonplace notebook for recording & exploring ideas.

2024-05-05

Tempus Fugit.

I don't really remember where April went; I still think it was March just yesterday.

Multiprocessing Queues

Spent most of Saturday debugging with several teammates, only to realize that multiprocessing queues create their own private thread: `put` just places the value in a buffer, and that thread later pickles it and sends it over a pipe to the other process. The part that can bite you is mutating the object after `put` but before it gets pickled: the receiver then sees the mutated value.

I wrote up a small Thread as a teaser and also put together a demo gist to show the issue. The one reply I did get on Threads misattributed the problem to parallelism, with the subprocess getting a chance to run. To make it more explicit, I've updated the gist to only run the subprocess after the first one is finished. The race is entirely inside the parent process: the main thread puts values and then mutates them, while the Queue's inner thread is still waiting to pickle them.

Another coworker asked why my initial repros only had 1 or 11 in the values (i.e. no mutations or all mutations). I can only attribute this to the points at which Python lets threads interleave; adding a sleep(0) to the mutation lets me see a wider range of scenarios.
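A minimal sketch of the race described above, and one way to sidestep it. This is not the code from the gist: the function name `put_then_mutate`, the `[1]` payload and the `snapshot` flag are all made up for illustration. It deliberately uses no subprocess at all, since the race is between the main thread and the queue's own thread.

```python
import copy
import multiprocessing as mp


def put_then_mutate(snapshot: bool) -> list:
    """Put a list on a Queue, mutate it right away, and return what comes out."""
    queue = mp.Queue()
    payload = [1]  # hypothetical payload, not the one from the gist

    # put() returns as soon as the object is buffered; the queue's private
    # thread pickles it some time later.
    queue.put(copy.deepcopy(payload) if snapshot else payload)

    # Without a snapshot, this write races with the pickling in that thread.
    payload[0] = 11

    received = queue.get()  # read back in the same process: no subprocess needed
    queue.close()
    queue.join_thread()
    return received


if __name__ == "__main__":
    # Either [1] or [11], depending on when the queue's thread got to run.
    print("shared object:", put_then_mutate(snapshot=False))
    # Always [1]: the queue holds a private copy that nobody mutates.
    print("deep-copied:  ", put_then_mutate(snapshot=True))
```

Copying before `put` is only one option; not touching the object again after handing it to the queue works just as well and avoids the cost of the copy.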

Working through this also reminded me of the value of debugging by understanding instead of debugging by hit-and-trial: even if the cost of understanding seems significantly higher, over time debugging by hit-and-trial ends up getting nowhere.

A refreshed `.tmuxrc`

I started refreshing my dotfiles with my tmux configuration; I'm doing them piecewise (because the software to edit these configurations is also affected by them) and making sure they work well for me. There are a couple of new things I'm applying:

- Conditional code execution: I rely on using 2 tmux instances, one running locally on my laptop with the prefix C-b and one running remotely on any devservers with the prefix M-b. That keeps my keybindings more or less symmetrical, and I can easily leverage all of tmux's fancy features without having to think about it. I used to achieve this with subtly different configurations on both ends, but I found out that with if-shell I can run configuration conditionally, and the variable $SSH_CLIENT is only set in an SSH session (thanks to Claude).
- I explicitly asked Claude to review my tmux rc and suggest improvements: while there was nothing permanent, I did find out that tmux can synchronize panes (enable with :setw synchronize-panes). This let me easily manipulate a live HPC job with 2 hosts -- commands were automatically mirrored across both hosts.
  - I did find myself reaching for tricks like vi $(ls -t | head -n 1) to edit the latest file across both hosts because filenames would often be different.

Autocomplete anything on screen

Another thread I posted earlier in the week involved a new ZSH, Tmux + FZF trick I finally managed to put together (again, leaning on Claude to parse man pages for me). I put out a thread and gist about it and recorded a video for co-workers; here's an annotated version of the script:

# Grabs the contents of all the panes in the current window for easy processing.
function _print_all_panes() {
  # List all visible panes, changing the output format to only show the pane-id (used in the next set of commands)
  for pane_id in $(tmux list-panes -F '#{pane_id}'); do
    # tmux capture-pane: starting from the first visible line (`-S 0`) to the end (`-E -`). `-t` identifies which
    # pane to capture, while `-p` redirects output to stdout and `-J` makes sure wrapped lines show up as joined.
    # This is piped to `tr` to replace spaces with new lines -- giving me one word per line. `sort -u` drops
    # duplicates within the pane and `rg` gets rid of pure collections of symbols, only giving me words and numbers to complete on.
    #
    # TODO: Explore additional tokenization strategies, to allow breaking up paths/into/components.
    # TODO: Remove duplicated output across panes
    tmux capture-pane -p -J -S 0 -E - -t "$pane_id" | tr ' ' '\n' | sort -u | rg '[a-zA-Z0-9]+'
  done
}

# The actual auto-complete function
_tmux_pane_words() {
  # `LBUFFER`, `RBUFFER` and `CURSOR` are special parameters provided by `zle` (not environment variables): the entered text
  # left and right of the cursor, and the cursor's position in the line.
  # Grab any half completed word in the LBUFFER (removing a greedy match that ends with a space)
  local current_word="${LBUFFER##* }"
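  # Get rid of the half completed word in the RBUFFER if any, greedily removing non-space characters.
  # `##` (one or more) is a zsh extended glob operator, not a regex, so this needs `setopt extendedglob`;
  # I had to spend non-trivial amounts of time reading about zsh pattern matching to get the behavior I expected.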
  local new_rbuffer="${RBUFFER/#[^ ]##/}"
  # Build the prompt for fzf, using the ␣ as a way to mark insertion point for the completion
  local prompt="${LBUFFER% *} ␣ $new_rbuffer "

  # Tokenize and print the pane contents and generate an fzf window with the half-completed word from the LBUFFER as the content
  # `--layout=reverse` because I don't like needing my eyes to jump to the new cursor position when fzf pops up
  # `--no-sort` because we already did it, with the caveat of needing to de-dupe across panes
  # `--print-query` for the case when we can't find a good match; this prints the query first and any selections after
  # If the user doesn't select anything, rely on the fact that the query was filled in to choose the completion; that's why the `tail -n1`
  local selected_word=$(_print_all_panes | fzf --query="$current_word" --prompt="$prompt" --height=20 --layout=reverse --no-sort --print-query | tail -n1)

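  # Build the new lbuffer with the completion, doing the opposite of the original line: strip the half-completed
  # word and append the selection (this also works when there is no space before the cursor yet)
  local new_lbuffer="${LBUFFER%$current_word}$selected_word"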
  BUFFER="$new_lbuffer$new_rbuffer"
  # Reposition the cursor to the end of the completion
  CURSOR="${#${new_lbuffer}}"

  # Ask the zsh line editor to redraw the line with the new contents
  zle redisplay
}

# Register the completion widget and bind it to a key; I went with `Ctrl-U`.
zle -N _tmux_pane_words
bindkey '^U' _tmux_pane_words

Stanford Lectures

This week's lecture was a little more abstract but had some interesting ramifications and applications for being able to build small and focused LLMs. The main paper. There's also emphasis on the importance of finding the right starting values.

More go hacking

I've started working on building some CLI programs with Go (yet another time/notes/calendar/notion-equivalent management app); with the excellent TCell library and surprisingly powerful terminals available these days I'm much more bullish about good CLIs. The big hidden bonus is that I can shell out to Vim or Emacs for actually editing notes, while leaving the actual management to the app itself; this was partially inspired by how easy FZF makes it. I've picked up The Power of Go: Tools to help me write idiomatic Go with the right approaches faster.

FZF is also the reason why I've been so impressed with Go recently: I've begun to realize that languages end up giving programs a certain taste, for lack of a better word. Some characteristics stand out: Python programs have very distinctive CLIs and slightly noticeable sluggishness; JavaScript tends to be a bit faster and the CLIs tend to be very colorful. Rust is colorful, but generally characterized by being very fast. The most used CLIs tend to be C or similar languages. And finally some Go programs tend to be surprisingly useful: fzf, gotty, etc. Of course this heuristic isn't perfect (until a few seconds ago I thought jq was also written in Go).

The prevalence of closures and function objects in Go has been the most surprising (and pleasant) departure from my previous assumptions about Go so far; they make programming significantly more ergonomic -- though there are also some factory patterns I don't think I'm going to enjoy (such as using a function to manipulate structures to set default arguments).

Anyways, I'm calling this new project termdex for terminal index cards. More updates next week!