
Choosing a Programming Language

Every few months I find myself restless, expecting to find my ideal programming language out there: one that perfectly balances flexibility, performance, and the ability to express myself clearly, and that lets me prototype rapidly and then refine towards a great, fast solution. At which point I’ll Google endlessly, read programs, stalk language creators, watch talks, try a new project – and then abandon everything and write Python anyway.

My perfect language probably looks like a lisp: homoiconic, with excellent hygienic macros, very tight abstractions, and batteries included (even if they happen to be stolen from other languages), combined with a strong blend of pragmatism and the ability to maximize performance. If I’m honest with myself, the most likely outcome is that I’ll implement my own language at some point in the future, with the tradeoffs set just the way I like them. (Yes, I did AoC in Julia in the past; no, it didn’t stick.)

This note is a running commentary on languages I’ve tried out and compared, partly as a way to avoid redoing the same experiments repeatedly; it’s also a bit of a ramble as I figure out what I actually want from a language.

January 2025

I’ve mostly decided to spend more time playing with Modern C and building my own tiny ecosystem of languages I enjoy; but I have to say that I find myself constantly distracted: languages I’d also like to play with include OCaml (which has been looking fairly fresh) and SBCL (always dreaming of a lisp). I’ll add them to the benchmarks, but what I actually want to solve for is having a language I really enjoy writing in: expressive, powerful, not unnecessarily constrained – and something that doesn’t need specialized tools to be effective in.

And then I want to be able to rapidly build anything I’d like to use and run with it – at least in personal projects. Python checks a lot of the boxes, but it needs to be complemented by something I can quickly prototype in while I also build my programming knowledge muscles. The main thing I need to pay attention to is making sure I actually use whichever language I’m investing time in.

Professionally I expect to get better at Python, C++ and Rust – C complements those, and I can easily apply it at work as well. For stretching myself and getting more ideas, I’ll play with Erlang, Common Lisp, and OCaml later.

December 2024

Professionally I’ve been using Python & C++; there are some opportunities to start working with Rust but I’ve been hesitant because it’s been hard to iterate towards a design in Rust.

I took a small tour by implementing day 1, part 1 of Advent of Code ’24 in different languages, with naive first implementations (generally heavily helped by DeepSeek), to get a sense of what each language felt like. I’m collecting the solutions, build commands, and perf stat runtimes here – I generally tried running each binary multiple times before taking a benchmark to make sure the input file was already cached in memory.

The problem: read a text file with two numbers per line into two lists, sort both lists, and then sum the absolute differences between the corresponding pairs.
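
As a concrete example, here’s the shape of the computation in Python, using what I remember being the sample pairs from the puzzle description (treat the specific numbers as illustrative):

# The sample lists from the AoC day 1 write-up, as best I recall them.
left = [3, 4, 2, 1, 3, 3]
right = [4, 3, 5, 3, 9, 3]

# Sort both lists, pair them up smallest-to-smallest, and total the gaps.
print(sum(abs(a - b) for a, b in zip(sorted(left), sorted(right))))  # prints 11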

I’m running some extremely rough benchmarks on a Surface Pro 11 tablet through Ubuntu on WSL2, and then choosing the minimum value reported by hyperfine (a neat trick I’ve learned for minimizing noise in benchmarks, at the cost of not being entirely fair across languages). If you’re reading this post, I’d recommend Rosetta Code for better examples – this was mainly an exercise for myself.
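
For what it’s worth, pulling out that minimum isn’t anything clever: hyperfine can dump per-run statistics as JSON with --export-json, and I just read the smallest value back out. A minimal sketch in Python, assuming the results/min layout of hyperfine’s JSON export (adjust the field names if your version differs):

import json
import sys

# Usage: python3 min_time.py results.json
# where results.json came from something like:
#   hyperfine --warmup 3 -N --input input.txt ./gcc_day1 --export-json results.json
with open(sys.argv[1]) as f:
    data = json.load(f)

for result in data["results"]:
    # hyperfine reports times in seconds; "min" is the fastest observed run.
    print(f'{result["command"]}: {result["min"] * 1e6:.1f} µs')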

Nim

Something I’ve become more excited about recently, with its emphasis on Python-like syntax while still being performant. I liked how easy it was to write short code – though I had to import a surprising number of utility functions. The documentation was easy to Google and I could fix up the code quickly.

Code

import std/algorithm
import std/sequtils
import std/streams
import std/strutils
import std/math

var
  list1: seq[int]
  list2: seq[int]

for line in stdin.lines:
  let nums = split(line).filterIt(it != "").map(parseInt)
  list1.add(nums[0])
  list2.add(nums[1])

sort(list1)
sort(list2)

echo zip(list1, list2).toSeq.map(
  proc (pair: (int, int)): int = return abs(pair[0] - pair[1])).sum()

Build

nim c -f -d:release --opt:speed -o:nim_day1  day1.nim

Perf

perf stat ./nim_day1 < input.txt
3508942

 Performance counter stats for './nim_day1':

			  1.08 msec task-clock:u                     #    0.710 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
				76      page-faults:u                    #   70.318 K/sec
		   2031947      cycles:u                         #    1.880 GHz
		   8386153      instructions:u                   #    4.13  insn per cycle
   <not supported>      branches:u
			 16254      branch-misses:u

	   0.001521502 seconds time elapsed

	   0.002224000 seconds user
	   0.002227000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./nim_day1
Benchmark 1: ./nim_day1
  Time (mean ± σ):     842.3 µs ±  70.4 µs    [User: 70.6 µs, System: 18.7 µs]
  Range (min … max):   701.0 µs … 1158.0 µs    3103 runs

Zig

Zig took a lot of boilerplate to set up and run, particularly compared with the corresponding C code (not an entirely fair comparison, since I’m using an ArrayList instead of plain arrays), and it still feels remarkably verbose.

Surprisingly, the execution time is fairly disappointing and far from the numbers I expected; I suspect this is partly a function of using a development version of Zig instead of an actual release, though I got very similar numbers with the latest Zig 0.14 build as well. If I’m doing something obviously wrong, please let me know!

Code

const std = @import("std");

pub fn main() !void {
	const stdin = std.io.getStdIn().reader();
	var buffered_stdin = std.io.bufferedReader(stdin);
	const reader = buffered_stdin.reader();
	const allocator = std.heap.page_allocator;
	var list1 = try std.ArrayList(i32).initCapacity(allocator, 1024);
	defer list1.deinit();
	var list2 = try std.ArrayList(i32).initCapacity(allocator, 1024);
	defer list2.deinit();

	while (true) {
		const line = try reader.readUntilDelimiterOrEofAlloc(allocator, '\n', std.math.maxInt(usize));
		if (line) |l| {
			defer allocator.free(l);
			var iter = std.mem.splitScalar(u8, l, ' ');
			var i: u32 = 0;

			while (iter.next()) |part| {
				if (part.len == 0) continue;

				const number = try std.fmt.parseInt(i32, part, 10);
				if (i == 0) {
					try list1.append(number);
				} else if (i == 1) {
					try list2.append(number);
				} else {
					return error.UnexpectedNumbers;
				}

				i += 1;
			}
		} else {
			break;
		}
	}

	std.mem.sort(i32, list1.items, {}, comptime std.sort.asc(i32));
	std.mem.sort(i32, list2.items, {}, comptime std.sort.asc(i32));

	var difference: i32 = 0;
	for (0..list1.items.len) |i| {
		const result = list1.items[i] - list2.items[i];
		difference += if (result < 0) -result else result;
	}

	std.debug.print("{}\n", .{difference});
}

Build

zig version
0.14.0-dev.367+a57479afc

Benchmark 1: ./zig_day1
  Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 1.8 ms, System: 1.6 ms]
  Range (min … max):     4.2 ms …   6.9 ms    664 runs

Perf

~/bin/perf stat ./zig_day1 < input.txt
3508942

 Performance counter stats for './zig_day1':

			 15.02 msec task-clock:u                     #    0.911 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
			  1022      page-faults:u                    #   68.028 K/sec
		  10421190      cycles:u                         #    0.694 GHz
		  21411320      instructions:u                   #    2.05  insn per cycle
   <not supported>      branches:u
			 86960      branch-misses:u

	   0.016498599 seconds time elapsed

	   0.007535000 seconds user
	   0.007534000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./zig_day1
Benchmark 1: ./zig_day1
  Time (mean ± σ):       6.7 ms ±   1.5 ms    [User: 2.9 ms, System: 2.2 ms]
  Range (min … max):     5.0 ms …  16.9 ms    340 runs

C

The old classic: I cheated by preallocating fixed-size arrays and not bothering to handle inputs of arbitrary size. Sadly, I had to Google where to find qsort.

Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int compare(const void *a, const void *b) {
  return (*(int*)a - *(int*)b);
}

int main(int argc, char** argv) {
  char line[1024];

  int list1[8096];
  int list2[8096];
  int count = 0;

  while (fgets(line, 1024, stdin) != NULL) {
    char* token1 = strtok(line, " ");
    list1[count] = atoi(token1);

    char* token2 = strtok(NULL, " ");
    list2[count] = atoi(token2);

    count++;
  }

  qsort(list1, count, sizeof(int), compare);
  qsort(list2, count, sizeof(int), compare);

  int sum = 0;
  for (int i = 0; i < count; i++) {
    sum += abs(list1[i] - list2[i]);
  }

  printf("%d\n", sum);

  return 0;
}

Build

gcc -O3 -DNDEBUG day1.c -o gcc_day1
clang -O3 -DNDEBUG day1.c -o clang_day1

Perf

~/bin/perf stat ./gcc_day1 < input.txt
3508942

 Performance counter stats for './gcc_day1':

			  0.68 msec task-clock:u                     #    0.470 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
				51      page-faults:u                    #   74.703 K/sec
			674149      cycles:u                         #    0.987 GHz
		   1321335      instructions:u                   #    1.96  insn per cycle
   <not supported>      branches:u
			 13792      branch-misses:u

	   0.001451500 seconds time elapsed

	   0.001333000 seconds user
	   0.000000000 seconds sys


~/bin/perf stat ./clang_day1 < input.txt
3508942

 Performance counter stats for './clang_day1':

			  0.97 msec task-clock:u                     #    0.606 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
				51      page-faults:u                    #   52.833 K/sec
			659770      cycles:u                         #    0.683 GHz
		   1314873      instructions:u                   #    1.99  insn per cycle
   <not supported>      branches:u
			 13423      branch-misses:u

	   0.001592600 seconds time elapsed

	   0.000000000 seconds user
	   0.000203000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./gcc_day1
Benchmark 1: ./gcc_day1
  Time (mean ± σ):     450.2 µs ±  47.1 µs    [User: 13.2 µs, System: 12.1 µs]
  Range (min … max):   360.2 µs … 1842.4 µs    5201 runs

hyperfine --warmup 3 -N --input input.txt ./clang_day1
Benchmark 1: ./clang_day1
  Time (mean ± σ):     449.5 µs ±  45.6 µs    [User: 13.0 µs, System: 11.4 µs]
  Range (min … max):   357.6 µs … 1763.0 µs    4891 runs

Python

My default language; it took around a minute to type out the solution after having written it in so many other languages.

Code

import sys

l1, l2 = zip(*map(
	lambda x: map(int, x.split()),
	sys.stdin.readlines(),
))
print(sum(abs(x1 - x2) for (x1, x2) in zip(sorted(l1), sorted(l2))))

Perf

~/bin/perf stat python3 day1.py < input.txt
3508942

 Performance counter stats for 'python3 day1.py':

			 22.03 msec task-clock:u                     #    0.460 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
			   887      page-faults:u                    #   40.272 K/sec
		  22423685      cycles:u                         #    1.018 GHz
		  42900551      instructions:u                   #    1.91  insn per cycle
   <not supported>      branches:u
			425206      branch-misses:u

	   0.047923501 seconds time elapsed

	   0.011476000 seconds user
	   0.011491000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./day1.py
Benchmark 1: ./day1.py
  Time (mean ± σ):       6.1 ms ±   0.5 ms    [User: 3.6 ms, System: 1.1 ms]
  Range (min … max):     5.4 ms …  14.4 ms    478 runs

Janet

It was a little hard to get LLM help for Janet, but the documentation was pretty great and I could easily Ctrl-F through it.

Code

(defn parse-ints [line]
  (map scan-number (filter (fn [x] (not (= x "")))
			   (string/split " " (string/trim line)))))


(defn main [&args]
  (def list1 @[])
  (def list2 @[])

  (while (var line (file/read stdin :line))
	(def ints (parse-ints line))
	(array/push list1 (ints 0))
	(array/push list2 (ints 1)))

  (sort list1)
  (sort list2)

  (prin (sum (map math/abs (map - list1 list2))) "\n"))

Build

(declare-project
 :name "janet_day1")

(declare-executable
 :name "janet_day1"
 :entry "day1.janet")

jpm build --optimize=3

Perf

perf stat build/janet_day1 < input.txt
3508942

 Performance counter stats for 'build/janet_day1':

			  4.46 msec task-clock:u                     #    0.933 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
			   645      page-faults:u                    #  144.567 K/sec
		  14155185      cycles:u                         #    3.173 GHz
		  45143427      instructions:u                   #    3.19  insn per cycle
   <not supported>      branches:u
			104382      branch-misses:u

	   0.004784057 seconds time elapsed

	   0.004005000 seconds user
	   0.000000000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt build/janet_day1
Benchmark 1: build/janet_day1
  Time (mean ± σ):       4.3 ms ±   0.4 ms    [User: 1.9 ms, System: 0.3 ms]
  Range (min … max):     3.5 ms …   7.9 ms    717 runs

Rust

Rust was also pleasant to write, but I’ve had a lot of issues iterating quickly on code with it, which is why I generally don’t reach for Rust by default.

Code

use std::io;
use std::io::BufRead;

pub fn main() -> io::Result<()> {
	let stdin = io::stdin();
	let mut list1: Vec<i32> = Vec::new();
	let mut list2: Vec<i32> = Vec::new();

	for line in stdin.lock().lines() {
		if let Ok(line) = line {
			if !line.is_empty() {
				let parts: Vec<i32> = line
					.split(" ")
					.filter(|&x| !x.is_empty())
					.map(|x| x.parse::<i32>().unwrap())
					.collect();
				list1.push(parts[0]);
				list2.push(parts[1]);
			}
		}
	}

	list1.sort();
	list2.sort();

	println!(
		"{}",
		list1
			.iter()
			.zip(list2.iter())
			.fold(0, |acc, (a, b)| acc + (a - b).abs())
	);
	Ok(())
}

Build

rustc day1.rs -o rust_day1 --codegen opt-level=3

Perf

perf stat ./rust_day1 < input.txt
3508942

 Performance counter stats for './rust_day1':

			  0.49 msec task-clock:u                     #    0.487 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
				62      page-faults:u                    #  125.684 K/sec
			805919      cycles:u                         #    1.634 GHz
		   2047961      instructions:u                   #    2.54  insn per cycle
   <not supported>      branches:u
			  9647      branch-misses:u

	   0.001013531 seconds time elapsed

	   0.000661000 seconds user
	   0.000000000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./rust_day1
Benchmark 1: ./rust_day1
  Time (mean ± σ):     561.3 µs ±  59.3 µs    [User: 23.2 µs, System: 17.6 µs]
  Range (min … max):   443.0 µs … 2031.3 µs    5128 runs

C++

I couldn’t resist trying out C++ after writing the rest of this post, and I was surprised at how familiar the language felt. I guess I’m finally getting comfortable with it.

Code

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>


int main(int argc, char** argv) {

  std::vector<int> list1;
  std::vector<int> list2;

  int num;
  while (true) {
    if (!(std::cin >> num)) {
      break;
    }

    list1.push_back(num);
    std::cin >> num;
    list2.push_back(num);
  }

  std::sort(list1.begin(), list1.end());
  std::sort(list2.begin(), list2.end());

  int sum = 0;
  for (size_t i = 0; i < list1.size(); i++) {
    sum += std::abs(list1[i] - list2[i]);
  }

  std::cout << sum << "\n";

  return 0;
}

Build

clang++ -O3 -DNDEBUG day1.cpp -o clangcpp_day1

Perf

perf stat ./clangcpp_day1 < input.txt
3508942

 Performance counter stats for './clangcpp_day1':

			  0.94 msec task-clock:u                     #    0.737 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
			   108      page-faults:u                    #  114.516 K/sec
		   2376940      cycles:u                         #    2.520 GHz
		   6336631      instructions:u                   #    2.67  insn per cycle
   <not supported>      branches:u
			 26431      branch-misses:u

	   0.001279137 seconds time elapsed

	   0.000174000 seconds user
	   0.000000000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./clangcpp_day1
Benchmark 1: ./clangcpp_day1
  Time (mean ± σ):     983.1 µs ±  93.2 µs    [User: 117.3 µs, System: 36.6 µs]
  Range (min … max):   769.0 µs … 2400.8 µs    3473 runs

Go

I realized later (in 2025) that I had forgotten to compare Go, so I’m fixing that up quickly.

Code

package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Split(bufio.ScanWords)

	var list1 []int64
	var list2 []int64
	i := 0

	for scanner.Scan() {
		val, _ := strconv.ParseInt(scanner.Text(), 10, 32)
		if i%2 == 0 {
			list1 = append(list1, val)
		} else {
			list2 = append(list2, val)
		}

		i++
	}

	sort.Slice(list1, func(i, j int) bool { return list1[i] < list1[j] })
	sort.Slice(list2, func(i, j int) bool { return list2[i] < list2[j] })

	var sum int64 = 0
	for i := 0; i < len(list1); i++ {
		result := list1[i] - list2[i]
		if result > 0 {
			sum += result
		} else {
			sum -= result
		}
	}

	fmt.Println(sum)
}

Build

go build -o go_day1 day1.go

Perf

perf stat ./go_day1 < input.txt
3508942

 Performance counter stats for './go_day1':

			 14.70 msec task-clock:u                     #    0.536 CPUs utilized
				 0      context-switches:u               #    0.000 /sec
				 0      cpu-migrations:u                 #    0.000 /sec
			   106      page-faults:u                    #    7.211 K/sec
		   1209830      cycles:u                         #    0.082 GHz
		   3109975      instructions:u                   #    2.57  insn per cycle
   <not supported>      branches:u
			 17011      branch-misses:u

	   0.027412002 seconds time elapsed

	   0.000000000 seconds user
	   0.017138000 seconds sys

Hyperfine

hyperfine --warmup 3 -N --input input.txt ./go_day1
Benchmark 1: ./go_day1
  Time (mean ± σ):     964.8 µs ±  76.3 µs    [User: 793.0 µs, System: 154.8 µs]
  Range (min … max):   802.1 µs … 2020.8 µs    2985 runs

Summary

Language    LoC   Compile Time (s)   Instructions   Min Time (µs)   Binary size (k)
Nim         20    1.33               8386153        701.0           100
Zig         49    1.10               21411320       4200.0          2400
C (GCC)     38    0.13               1321335        360.2           14
C (Clang)   38    0.11               1314873        357.6           9
Python      8     ~                  42900551       5400.0          ~
Janet       16    0.09               45143427       3500.0          2000
Rust        34    0.68               2047961        443.0           3700
C++         34    0.26               6336631        769.0           15
Go          46    0.09               3109975        802.1           2000

Overall, I think I’m going to invest more time and energy into Nim and Janet given the results here, particularly since they also seem to have fairly nice ecosystems. C is as useful as always, and I can spend some more time there – I’m fairly tempted to try making something homoiconic that transpiles to C to start, and go from there.

The biggest change is that I’ll spend less time with Zig for the foreseeable future.

Kunal