--
v0.4.0 added search ranking, sibling surfacing, transitive callees, cognitive load stripping, smart truncation, and bloom filters. That brought cost per correct answer down -17% on Sonnet and -20% on Opus.
v0.4.1 was pure instruction tuning, with zero code changes. That alone jumped Sonnet adoption from 89% to 98% and took the cost-per-correct-answer reduction from -17% to -29%.
The instruction tuning result surprised me. The model already knew tilth tools existed — it just wasn’t choosing them consistently. Making the replacement relationship explicit in the tool description was worth more than all the search ranking work in v0.4.0.
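To give a sense of what "explicit" means here, the change is roughly the difference between a neutral description and one that tells the model what it replaces. This is an illustrative sketch in an Anthropic-style tool schema; the names and wording are hypothetical, not tilth's actual description:

    # Illustrative only: neutral vs. explicit-replacement tool descriptions.
    # "tilth_context" and the wording are placeholders, not tilth's real schema.

    neutral_tool = {
        "name": "tilth_context",
        "description": "Returns ranked code context for a symbol or file.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }

    explicit_tool = {
        "name": "tilth_context",
        "description": (
            "Returns ranked code context for a symbol or file. "
            "Use this INSTEAD of reading whole files or grepping: the result "
            "already includes the definition, its siblings, and transitive "
            "callees, truncated to fit the context window."
        ),
        "input_schema": neutral_tool["input_schema"],
    }

Spelling out the replacement relationship in the description is what moved adoption, not any ranking change.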
Haiku remains the outlier — only 42% tilth adoption despite instruction tuning.
--
https://github.com/jahala/tilth/
Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...
-- PS: I don't have the budget to run the benchmark often (especially with Opus), so if any token whales have spare capacity to run some benchmarks, please feel free to PR results.