Every few months I find myself restless, expecting to find my ideal programming language out there perfectly balancing flexibility, performance, and the ability to express myself clearly; with the ability to prototype rapidly and refine to a great, fast solution. At which point I’ll Google endlessly, read programs, stalk language creators, watch talks, try a new project and then abandon everything and write Python anyways.
My perfect language probably looks like a lisp with homoiconicity, excellent hygienic macros, very tight abstractions and batteries (even if they happen to be stolen from other languages) with a strong blend of pragmatism and ability to maximize performance. If I’m honest with myself, the most likely end is implementing my own language at some point in the future with the tradeoffs set just the way I like them. (Yes, I did AoC in Julia in the past; No, it didn’t stick).
This note is a running commentary on languages I’ve tried out and compared, partly as a way to avoid redoing the same experiments repeatedly; it’s also a bit of a ramble as I figure out what I actually want from a language.
I’ve mostly decided to spend more time playing with Modern C and building my own tiny ecosystem of languages I enjoy; but I have to say I find myself constantly distracted by languages I’d also like to play with: OCaml (which has been looking fairly fresh) and SBCL (always dreaming of a lisp). I’ll add them to the benchmarks, but what I actually want to solve for is a language I really enjoy writing in: expressive, powerful, not unnecessarily constrained – and something that doesn’t need specialized tools to be effective in.
And then I want to be able to rapidly build anything I’d like to use and run with it – at least in personal projects. Python checks a lot of the boxes, but needs to be complemented by something I can quickly prototype in, while I also build my programming knowledge muscles. The main thing I want to pay attention to is to make sure I use the language I’m investing time in.
Professionally I expect to get better at Python, C++ and Rust – C complements those, and I can easily apply it at work as well. For stretching myself and to get more ideas I’ll play with Erlang, Common Lisp, and OCaml later.
Professionally I’ve been using Python & C++; there are some opportunities to start working with Rust but I’ve been hesitant because it’s been hard to iterate towards a design in Rust.
I took a small tour by implementing day 1, part 1 of Advent of Code ’24 in different languages with naive first implementations (generally heavily helped by DeepSeek) to get a sense of what they felt like. I’m collecting the solutions, build commands, and perf stat runtimes here – I generally ran each program multiple times before taking a benchmark to make sure files were loaded into memory.
The problem: read a text file with 2 numbers per line into 2 lists, sort both lists, and then sum the absolute differences between paired elements.
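To make that concrete, here’s the whole computation on a toy six-line input (made-up numbers, not the real puzzle input):

```python
# Toy illustration of the problem: sort both lists, pair them up,
# and sum the absolute differences. Numbers are invented for the example.
left = [3, 4, 2, 1, 3, 3]
right = [4, 3, 5, 3, 9, 3]

total = sum(abs(a - b) for a, b in zip(sorted(left), sorted(right)))
print(total)  # → 11  (2 + 1 + 0 + 1 + 2 + 5)
```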
I’m running some extremely rough benchmarks on a Surface Pro 11 tablet through Ubuntu on WSL2, and then choosing the minimum value from hyperfine (this is a neat trick I’ve learned to minimize noise in benchmarks, at the cost of not being entirely fair across languages). If you’re reading this post, I’d recommend Rosetta Code instead for better examples – this was mainly an exercise for myself.
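Incidentally, the minimum doesn’t have to be eyeballed off the terminal: hyperfine can dump its results with `--export-json`, and a small throwaway helper (hypothetical name; the `results[].min` field is in seconds in hyperfine’s export format) can pull it out:

```python
import json

def min_runtimes_us(path):
    """Read a hyperfine --export-json file and return each command's
    minimum runtime in microseconds. Assumes hyperfine's export layout:
    a top-level "results" list whose entries carry a "min" field in seconds."""
    with open(path) as f:
        data = json.load(f)
    return {r["command"]: r["min"] * 1e6 for r in data["results"]}
```

e.g. `hyperfine --warmup 3 -N --input input.txt ./gcc_day1 --export-json bench.json` followed by `min_runtimes_us("bench.json")`.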
Nim is something I’ve become more excited about recently, with its emphasis on Python-like syntax while still being performant. I liked how easy it was to write short code – though I had to import a surprising number of utility functions. Documentation was easy to google and I could fix up the code quickly.
```nim
import std/algorithm
import std/sequtils
import std/streams
import std/strutils
import std/math

var
  list1: seq[int]
  list2: seq[int]

for line in stdin.lines:
  let nums = split(line).filterIt(it != "").map(parseInt)
  list1.add(nums[0])
  list2.add(nums[1])

sort(list1)
sort(list2)

echo zip(list1, list2).toSeq.map(
  proc (pair: (int, int)): int = return abs(pair[0] - pair[1])).sum()
```
nim c -f -d:release --opt:speed -o:nim_day1 day1.nim
perf stat ./nim_day1 < input.txt
3508942
Performance counter stats for './nim_day1':
1.08 msec task-clock:u # 0.710 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
76 page-faults:u # 70.318 K/sec
2031947 cycles:u # 1.880 GHz
8386153 instructions:u # 4.13 insn per cycle
<not supported> branches:u
16254 branch-misses:u
0.001521502 seconds time elapsed
0.002224000 seconds user
0.002227000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./nim_day1
Benchmark 1: ./nim_day1
Time (mean ± σ): 842.3 µs ± 70.4 µs [User: 70.6 µs, System: 18.7 µs]
Range (min … max): 701.0 µs … 1158.0 µs 3103 runs
Zig took a lot of boilerplate to set up and run, particularly compared with the corresponding C code (this isn’t entirely fair because I’m using an ArrayList instead of plain arrays), but it still feels remarkably verbose.
Surprisingly, the execution time is fairly disappointing and far from the numbers I expected; I suspect this is a function of using a development version of Zig instead of an actual release. (I got very similar numbers with the latest Zig 0.14 build as well.) If I’m doing something obviously wrong, please let me know!
```zig
const std = @import("std");

pub fn main() !void {
    const stdin = std.io.getStdIn().reader();
    var buffered_stdin = std.io.bufferedReader(stdin);
    const reader = buffered_stdin.reader();

    const allocator = std.heap.page_allocator;
    var list1 = try std.ArrayList(i32).initCapacity(allocator, 1024);
    defer list1.deinit();
    var list2 = try std.ArrayList(i32).initCapacity(allocator, 1024);
    defer list2.deinit();

    while (true) {
        const line = try reader.readUntilDelimiterOrEofAlloc(allocator, '\n', std.math.maxInt(usize));
        if (line) |l| {
            defer allocator.free(l);
            var iter = std.mem.splitScalar(u8, l, ' ');
            var i: u32 = 0;
            while (iter.next()) |part| {
                if (part.len == 0) continue;
                const number = try std.fmt.parseInt(i32, part, 10);
                if (i == 0) {
                    try list1.append(number);
                } else if (i == 1) {
                    try list2.append(number);
                } else {
                    return error.UnexpectedNumbers;
                }
                i += 1;
            }
        } else {
            break;
        }
    }

    std.mem.sort(i32, list1.items, {}, comptime std.sort.asc(i32));
    std.mem.sort(i32, list2.items, {}, comptime std.sort.asc(i32));

    var difference: i32 = 0;
    for (0..list1.items.len) |i| {
        const result = list1.items[i] - list2.items[i];
        difference += if (result < 0) -result else result;
    }
    std.debug.print("{}\n", .{difference});
}
```
zig version
0.14.0-dev.367+a57479afc
Benchmark 1: ./zig_day1
Time (mean ± σ): 4.7 ms ± 0.3 ms [User: 1.8 ms, System: 1.6 ms]
Range (min … max): 4.2 ms … 6.9 ms 664 runs
~/bin/perf stat ./zig_day1 < input.txt
3508942
Performance counter stats for './zig_day1':
15.02 msec task-clock:u # 0.911 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
1022 page-faults:u # 68.028 K/sec
10421190 cycles:u # 0.694 GHz
21411320 instructions:u # 2.05 insn per cycle
<not supported> branches:u
86960 branch-misses:u
0.016498599 seconds time elapsed
0.007535000 seconds user
0.007534000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./zig_day1
Benchmark 1: ./zig_day1
Time (mean ± σ): 6.7 ms ± 1.5 ms [User: 2.9 ms, System: 2.2 ms]
Range (min … max): 5.0 ms … 16.9 ms 340 runs
The old classic: I cheated by preallocating arrays and not bothering to handle different input sizes. Sadly, I had to google where to find qsort.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare(const void *a, const void *b) {
    return (*(int*)a - *(int*)b);
}

int main(int argc, char** argv) {
    char line[1024];
    int list1[8096];
    int list2[8096];
    int count = 0;

    while (fgets(line, 1024, stdin) != NULL) {
        char* token1 = strtok(line, " ");
        list1[count] = atoi(token1);
        char* token2 = strtok(NULL, " ");
        list2[count] = atoi(token2);
        count++;
    }

    qsort(list1, count, sizeof(int), compare);
    qsort(list2, count, sizeof(int), compare);

    int sum = 0;
    for (int i = 0; i < count; i++) {
        sum += abs(list1[i] - list2[i]);
    }
    printf("%d\n", sum);
    return 0;
}
```
gcc -O3 -DNDEBUG day1.c -o gcc_day1
clang -O3 -DNDEBUG day1.c -o clang_day1
~/bin/perf stat ./gcc_day1 < input.txt
3508942
Performance counter stats for './gcc_day1':
0.68 msec task-clock:u # 0.470 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
51 page-faults:u # 74.703 K/sec
674149 cycles:u # 0.987 GHz
1321335 instructions:u # 1.96 insn per cycle
<not supported> branches:u
13792 branch-misses:u
0.001451500 seconds time elapsed
0.001333000 seconds user
0.000000000 seconds sys
~/bin/perf stat ./clang_day1 < input.txt
3508942
Performance counter stats for './clang_day1':
0.97 msec task-clock:u # 0.606 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
51 page-faults:u # 52.833 K/sec
659770 cycles:u # 0.683 GHz
1314873 instructions:u # 1.99 insn per cycle
<not supported> branches:u
13423 branch-misses:u
0.001592600 seconds time elapsed
0.000000000 seconds user
0.000203000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./gcc_day1
Benchmark 1: ./gcc_day1
Time (mean ± σ): 450.2 µs ± 47.1 µs [User: 13.2 µs, System: 12.1 µs]
Range (min … max): 360.2 µs … 1842.4 µs 5201 runs
hyperfine --warmup 3 -N --input input.txt ./clang_day1
Benchmark 1: ./clang_day1
Time (mean ± σ): 449.5 µs ± 45.6 µs [User: 13.0 µs, System: 11.4 µs]
Range (min … max): 357.6 µs … 1763.0 µs 4891 runs
Python, my default language; it took around a minute to type out the solution after having written it in so many other languages.
```python
import sys

l1, l2 = zip(*map(
    lambda x: map(int, x.split()),
    sys.stdin.readlines(),
))

print(sum(abs(x1 - x2) for (x1, x2) in zip(sorted(l1), sorted(l2))))
```
~/bin/perf stat python3 day1.py < input.txt
3508942
Performance counter stats for 'python3 day1.py':
22.03 msec task-clock:u # 0.460 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
887 page-faults:u # 40.272 K/sec
22423685 cycles:u # 1.018 GHz
42900551 instructions:u # 1.91 insn per cycle
<not supported> branches:u
425206 branch-misses:u
0.047923501 seconds time elapsed
0.011476000 seconds user
0.011491000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./day1.py
Benchmark 1: ./day1.py
Time (mean ± σ): 6.1 ms ± 0.5 ms [User: 3.6 ms, System: 1.1 ms]
Range (min … max): 5.4 ms … 14.4 ms 478 runs
It was a little hard to get LLM help for Janet, but the documentation was pretty great, and I could easily Ctrl-F through it.
```janet
(defn parse-ints [line]
  (map scan-number (filter (fn [x] (not (= x "")))
                           (string/split " " (string/trim line)))))

(defn main [&args]
  (def list1 @[])
  (def list2 @[])
  (while (var line (file/read stdin :line))
    (def ints (parse-ints line))
    (array/push list1 (ints 0))
    (array/push list2 (ints 1)))
  (sort list1)
  (sort list2)
  (prin (sum (map math/abs (map - list1 list2))) "\n"))
```
```janet
(declare-project
  :name "janet_day1")

(declare-executable
  :name "janet_day1"
  :entry "day1.janet")
```
jpm build --optimize=3
perf stat build/janet_day1 < input.txt
3508942
Performance counter stats for 'build/janet_day1':
4.46 msec task-clock:u # 0.933 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
645 page-faults:u # 144.567 K/sec
14155185 cycles:u # 3.173 GHz
45143427 instructions:u # 3.19 insn per cycle
<not supported> branches:u
104382 branch-misses:u
0.004784057 seconds time elapsed
0.004005000 seconds user
0.000000000 seconds sys
hyperfine --warmup 3 -N --input input.txt build/janet_day1
Benchmark 1: build/janet_day1
Time (mean ± σ): 4.3 ms ± 0.4 ms [User: 1.9 ms, System: 0.3 ms]
Range (min … max): 3.5 ms … 7.9 ms 717 runs
Rust was also pleasant to write, but I’ve had a lot of trouble iterating quickly toward a design with it, which is why I generally don’t reach for Rust by default.
```rust
use std::io;
use std::io::BufRead;

pub fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut list1: Vec<i32> = Vec::new();
    let mut list2: Vec<i32> = Vec::new();

    for line in stdin.lock().lines() {
        if let Ok(line) = line {
            if !line.is_empty() {
                let parts: Vec<i32> = line
                    .split(" ")
                    .filter(|&x| !x.is_empty())
                    .map(|x| x.parse::<i32>().unwrap())
                    .collect();
                list1.push(parts[0]);
                list2.push(parts[1]);
            }
        }
    }

    list1.sort();
    list2.sort();

    println!(
        "{}",
        list1
            .iter()
            .zip(list2.iter())
            .fold(0, |acc, (a, b)| acc + (a - b).abs())
    );
    Ok(())
}
```
rustc day1.rs -o rust_day1 --codegen opt-level=3
perf stat ./rust_day1 < input.txt
3508942
Performance counter stats for './rust_day1':
0.49 msec task-clock:u # 0.487 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
62 page-faults:u # 125.684 K/sec
805919 cycles:u # 1.634 GHz
2047961 instructions:u # 2.54 insn per cycle
<not supported> branches:u
9647 branch-misses:u
0.001013531 seconds time elapsed
0.000661000 seconds user
0.000000000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./rust_day1
Benchmark 1: ./rust_day1
Time (mean ± σ): 561.3 µs ± 59.3 µs [User: 23.2 µs, System: 17.6 µs]
Range (min … max): 443.0 µs … 2031.3 µs 5128 runs
I couldn’t resist trying out C++ after writing the rest of this post, and I was surprised at how familiar the language felt. I guess I’m finally getting comfortable with it.
```cpp
#include <algorithm>
#include <iostream>
#include <vector>
#include <cmath>

int main(int argc, char** argv) {
    std::vector<int> list1;
    std::vector<int> list2;

    int num;
    while (true) {
        if (!(std::cin >> num)) {
            break;
        }
        list1.push_back(num);
        std::cin >> num;
        list2.push_back(num);
    }

    std::sort(list1.begin(), list1.end());
    std::sort(list2.begin(), list2.end());

    int sum = 0;
    for (size_t i = 0; i < list1.size(); i++) {
        sum += abs(list1[i] - list2[i]);
    }
    std::cout << sum << "\n";
    return 0;
}
```
clang++ -O3 -DNDEBUG day1.cpp -o clangcpp_day1
perf stat ./clangcpp_day1 < input.txt
3508942
Performance counter stats for './clangcpp_day1':
0.94 msec task-clock:u # 0.737 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
108 page-faults:u # 114.516 K/sec
2376940 cycles:u # 2.520 GHz
6336631 instructions:u # 2.67 insn per cycle
<not supported> branches:u
26431 branch-misses:u
0.001279137 seconds time elapsed
0.000174000 seconds user
0.000000000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./clangcpp_day1
Benchmark 1: ./clangcpp_day1
Time (mean ± σ): 983.1 µs ± 93.2 µs [User: 117.3 µs, System: 36.6 µs]
Range (min … max): 769.0 µs … 2400.8 µs 3473 runs
I realized later (in 2025) that I forgot to compare Go. Fixing that up quickly.
```go
package main

import (
    "bufio"
    "fmt"
    "os"
    "sort"
    "strconv"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    scanner.Split(bufio.ScanWords)

    var list1 []int64
    var list2 []int64
    i := 0
    for scanner.Scan() {
        val, _ := strconv.ParseInt(scanner.Text(), 10, 32)
        if i%2 == 0 {
            list1 = append(list1, val)
        } else {
            list2 = append(list2, val)
        }
        i += 1
    }

    sort.Slice(list1, func(i, j int) bool { return list1[i] < list1[j] })
    sort.Slice(list2, func(i, j int) bool { return list2[i] < list2[j] })

    var sum int64 = 0
    for i := 0; i < len(list1); i++ {
        result := list1[i] - list2[i]
        if result > 0 {
            sum += result
        } else {
            sum -= result
        }
    }
    fmt.Println(sum)
}
```
go build -o go_day1 day1.go
perf stat ./go_day1 < input.txt
3508942
Performance counter stats for './go_day1':
14.70 msec task-clock:u # 0.536 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
106 page-faults:u # 7.211 K/sec
1209830 cycles:u # 0.082 GHz
3109975 instructions:u # 2.57 insn per cycle
<not supported> branches:u
17011 branch-misses:u
0.027412002 seconds time elapsed
0.000000000 seconds user
0.017138000 seconds sys
hyperfine --warmup 3 -N --input input.txt ./go_day1
Benchmark 1: ./go_day1
Time (mean ± σ): 964.8 µs ± 76.3 µs [User: 793.0 µs, System: 154.8 µs]
Range (min … max): 802.1 µs … 2020.8 µs 2985 runs
| Language | LoC | Compile Time (s) | Instructions | Min Time (µs) | Binary size (k) |
|---|---|---|---|---|---|
| Nim | 20 | 1.33 | 8386153 | 701.0 | 100 |
| Zig | 49 | 1.10 | 21411320 | 4200.0 | 2400 |
| C (GCC) | 38 | 0.13 | 1321335 | 360.2 | 14 |
| C (Clang) | 38 | 0.11 | 1314873 | 357.6 | 9 |
| Python | 8 | ~ | 42900551 | 5400.0 | ~ |
| Janet | 16 | 0.09 | 45143427 | 3500.0 | 2000 |
| Rust | 34 | 0.68 | 2047961 | 443.0 | 3700 |
| C++ | 34 | 0.26 | 6336631 | 769.0 | 15 |
| Go | 46 | 0.09 | 3109975 | 964.8 | 2000 |
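To put the minimum times in perspective, here’s a throwaway script computing each language’s slowdown relative to the fastest run (numbers copied from the table above; C via Clang is the baseline):

```python
# Minimum hyperfine times from the table above, in microseconds.
min_us = {
    "Nim": 701.0, "Zig": 4200.0, "C (GCC)": 360.2, "C (Clang)": 357.6,
    "Python": 5400.0, "Janet": 3500.0, "Rust": 443.0, "C++": 769.0,
    "Go": 964.8,
}

fastest = min(min_us.values())  # C (Clang) at 357.6 µs
for lang, t in sorted(min_us.items(), key=lambda kv: kv[1]):
    print(f"{lang:10} {t / fastest:5.1f}x")
```

Roughly: Rust lands within ~1.2x of C, Nim around 2x, C++ and Go in the 2–3x range, and Janet, Zig, and Python an order of magnitude off.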
Overall, I think I’m going to invest more time / energy into Nim and Janet given the results I’m seeing here; particularly given they also seem to have fairly nice ecosystems. C is as useful as always, and I can spend some more time there – I’m fairly tempted to try and make something homoiconic that just transpiles to C to start, and go from there.
The biggest change is that I’ll spend less time with Zig for the foreseeable future.
— Kunal