LLM Neuroanatomy III: Why RYS Works — The Language-Agnostic Middle
Probing a 27B model shows its middle layers organise by meaning, not by language or format — weak evidence against the strong Sapir-Whorf hypothesis, and the reason RYS works.
In mid-2024, the HuggingFace Open LLM Leaderboard was the Colosseum for Open-Weight AI. Thousands of models were battling it out, submitted by both well-funded labs with teams of PhDs and fine-tuni...
Part 1 measured the dual GH200 workstation as a memory system. Part 2 used those measurements to explain why DeepSeek V4 Flash can be fast in vLLM when the model layout fits the hardw...
Small AI computers are usually sold with large dreams and shitty memory buses. I have a ridiculous server that pulls a few kilowatts, but I wanted a local Hermes Agent box that cou...
A while back I did some optimisation on my Hopper system for MiniMax M2.1, and this was followed by some deeper GH200 benchmarking, where I measured the machine as a memory-shuffling ...
This article is mostly for me, as a way to record the peculiarities of my server; but it might come in handy for the ~3 other people running a home Grace-Hopper server? In a previous ...
In Part 1, I described how duplicating a block of seven middle layers in Qwen2-72B — no weight changes, no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method, whic...
May 2016, Munich. I had just joined NanoTemper Technologies as a Bioanalytics Scientist. If you aren’t familiar with NanoTemper, they build high-end biophysical instruments. At the ti...
So you’ve built a €9,000 Grace–Hopper “desktop” (see: my previous post involving 16-million-degree GPU temperatures). Running llama.cpp benchmarks is fine, but the real test of local ...
Running large language models locally has always been a game of compromise. You either spend \$10,000+ on consumer GPUs that can barely handle 70B parameter models, or you dream about...