Building a High-End AI Desktop
Introduction
Running large language models locally has always been a game of compromise. You either spend \$10,000+ on consumer GPUs that can barely handle 70 B parameter models, or you dream about enterprise hardware you’ll never afford. The Grace-Hopper platform—Nvidia’s unified CPU-GPU superchip architecture—represents the kind of dream-rig AI infrastructure LocalLlama drools over, with systems typically costing well over \$100,000 and exclusively available to data centers and research institutions.
So when I stumbled across a Grace-Hopper system being sold for 10K euro on Reddit, my first thought was “obviously fake.” My second thought was “I wonder if he’ll take 7.5K euro?”.
This is the story of how I bought enterprise-grade AI hardware designed for liquid-cooled server racks, converted it to air cooling, survived multiple near-disasters (including GPUs reporting temperatures of 16 million degrees), and ended up with a desktop that can run 235B parameter models at home. It’s a tale of questionable decisions, creative problem-solving, and what happens when you try to turn datacenter equipment into a daily driver.
If you’ve ever wondered what it takes to run truly large models locally, or if you’re just here to watch someone disassemble $80,000 worth of hardware with nothing but hope and isopropanol, you’re in the right place.
The Deal
Early this year, while browsing r/LocalLLaMA/new, I came across a ridiculously good deal. How good? These were the specs for the server offered for 10K euro, and a serious upgrade to my 4x RTX 4090 rig:
Specs:
- 2x Nvidia Grace-Hopper Superchip
- 2x 72-core Nvidia Grace CPU
- 2x Nvidia Hopper H100 Tensor Core GPU
- 2x 480GB of LPDDR5X memory with error-correction code (ECC)
- 2x 96GB of HBM3 memory
- 1152GB of total fast-access memory
- NVLink-C2C: 900 GB/s of bandwidth
- Programmable from 1000W to 2000W TDP (CPU + GPU + memory)
- 1x High-efficiency 3000W PSU 230V to 48V
- 2x PCIe Gen4 M.2 22110/2280 slots on board
- 4x FHFL PCIe Gen5 x16
UPDATE:Since I bought this, DDR5 RAM prices have become insane. 960GB of fast DDR5 now costs more than what I paid for the whole Grace-Hopper system 🤯
Obviously fake I thought, because
- H100s cost about 30-40,000 euro each, and this system has two of them
- Grace-Hopper NVL2 systems are basically not for sale for consumers anyway!
The Reddit thread explained the reason the system was being sold cheap:
The main reason why is that it is a Frankensystem converted from liquid-cooled to aircooled. Also it is not very pretty and not rackable, because it has a 48V power supply attached. It is originally directly from Nvidia.
I immediately offered to buy it, because why not? If it was a scam, I could always back out, but I wanted to be first in line!
It turns out I live near the seller, and he runs an online shop that sells modified Nvidia server equipment as desktops. It still seemed pretty risky, so I did some research and found a video review of one of his Desktops on Youtube. With the deal now seeming at least plausible, and the seller only a two-hour drive away and agreeing to take cash, it was time to take a Bavarian road trip.
I arrived at a farmhouse in a small forest, and met Bernhard the proprietor of GPTshop.ai. He showed me a nice workshop (plasma cutters, an electronics lab, etc.) from which he fabricates custom cases for the high-end H100 desktops he builds. These desktops seem pretty damn nice, so it’s unfortunate that his webshop gives off shady vibes; the business registration in the Cayman Islands definitely doesn’t help. What I can say though is that this item was heavily discounted, and not what he usually sells.
Disclaimer: I have zero affiliation with GPTshop.ai beyond handing them a stack of cash and receiving a dust-covered server in return. If this were a sponsored post, they probably wouldn’t let me mention the 16 million degree GPU temperatures or the part where I had to free-solder components while praying to the electronics gods.
Disassembling the Grace Hopper server
The server itself was not in great condition. These things run extremely loud and high-throughput fans, and these had sucked in a lot of dust, coating the mainboard so heavily I couldn’t tell the color of the PCB. However, it booted up and ran OK, so I handed over a wad of cash, strapped it into the backseat of my car with the seatbelt (it weighed ~20 kg), and drove it home.
Did I mention it’s loud? Firing up the system is physically painful. There are 8x Sunon dual-fan modules, and each is as loud as a powerful vacuum cleaner, but with a much higher and more annoying pitch. With all 8 running at full power, hearing protection is necessary - I could hear the system running in my basement with the windows closed from 50 meters away! My wife immediately (and quite fairly), banned its use at home. We both work home-office and it was simply too loud for online meetings. But I had other plans anyway…
First things first, I of course quickly decided and then proceeded to strip down the server, after first photo-documenting the various connectors between the various PCBs, modules and mainboard.
Cleaning the Server
The majority of the dust was vacuumed off during disassembly, but there was clearly a lot more under the Grace-Hopper modules. After removing those as well, I decided to go with a full washdown of the mainboard.
I purchased a few litres of Isopropanol, and with a soft brush I went over the whole board a few times to get the remaining fine dust from inside connectors and between SMD-component pins.
I suspected there might also be dust inside the Grace-Hopper modules, but actually, I really just wanted to pop them open to poke around.
The mainboard went on my heated floor to dry for a week, while I moved on to replacing the cooling system.
A new Water Cooling system
I had looked into building a custom water-cooling block, but I was worried about leaks, when I found cheap all-in-one water cooling systems for ~40 euro each on sale. Two per GH200 module would be sufficient, so I carefully measured the dimensions of the GPU die and CPU, as well as screw locations, and threw those into Fusion 360 to model up an adapter block.
I have a Bambu X1, which came in very handy for prototyping the adapter blocks. The tolerances have to be very tight, so I printed several cut-away versions to make sure there was solid contact to the bare GPU die, and a safe margin from contact to fragile parts.
The parts were then sent for CNC milling, and were delivered as the mainboard was finished drying. After using yet more isopropanol to clean off the machining oil, they were mounted without much fuss.
Assembling the Desktop
My go-to material for this kind of project is ProfilAlu from eBay. It’s cheap, stiff, and delivered pre-cut for assembly. I put together a design in Fusion 360, and had the parts in a few days. The various mounts however were much more work. I needed to design a few dozen custom mounts for the various PCBs and air-filter fixings; this used up a few kilos of filament to get things just right.
Disaster(s)
Critical Fan Errors
The system didn’t start to boot anymore. Checking the logs, I saw 16 critical errors, one for each fan in the 8 pairs:
4 08/06/25 19:24:08 CEST Fan FAN_5_F Lower Critical going low Asserted Reading 0 < Threshold 2156 RPM 5 08/06/25 19:24:08 CEST Fan FAN_6_R Lower Critical going low Asserted Reading 0 < Threshold 2156 RPM 6 08/06/25 19:24:08 CEST Fan FAN_8_F Lower Critical going low Asserted Reading 0 < Threshold 2156 RPM 7 08/06/25 19:24:08 CEST Fan FAN_5_R Lower Critical going low Asserted Reading 0 < Threshold 2156 RPM 8 08/06/25 19:24:08 CEST Fan FAN_7_F Lower Critical going low Asserted Reading 0 < Threshold 2156 RPM 9 08/06/25 19:24:08 CEST Fan FAN_8_R Lower Critical going low Asserted Reading 0 < Threshold 2156 RPM…
With the fans removed, the BMC (Baseboard Management Controller) immediately panicked, and shut down the mainboard to prevent thermal damage, even with the water coolers in place. So, I disabled the fan-check subsystem.
1
2
3
4
5
# stops the service for the current session
systemctl stop phosphor-sensor-monitor.service
# prevents the service from starting on the next boot
systemctl disable phosphor-sensor-monitor.service
Who needs hardware monitoring? ¯\_(ツ)_/¯
Nuclear Fusion?
Great! I could start the boot process, and even reach login! But only about 1 time in 4… Not optimal. And even logged in, the server would crash within 2 minutes.
Looking into the BMC logs, I saw:
| Sep 23 08:20:18 | oberon-bmc | shutdown_ok_mon[1478] | event: FALLING EDGE offset: 26 timestamp: [571.615238550] |
| Sep 23 08:20:18 | oberon-bmc | power-status[1493] | event: FALLING EDGE offset: 18 timestamp: [571.632491062] |
| Sep 23 08:20:18 | oberon-bmc | shutdown_ok_mon[545] | SHDN_OK_L-I = 0 |
| Sep 23 08:20:18 | oberon-bmc | shutdown_ok_mon[545] | Asserting SYS_RST_IN_L-O to hold host in reset |
| Sep 23 08:20:18 | oberon-bmc | shutdown_ok_mon[545] | gpioset SYS_RST_IN_L-O = 0 |
| Sep 23 08:20:18 | oberon-bmc | power-status[697] | gpioset SYS_RST_IN_L-O = 0 |
| Sep 23 08:20:18 | oberon-bmc | power-status[697] | Set SYS_RST_IN_L-O=0 |
So, a Critical Failure at 08:20:18:
- SHDN_OK_L-I signal goes low (falling edge detected)
- This immediately triggers a shutdown sequence
- System powers off within ~30 seconds of successful boot
But why?!!? I had shut down the hardware monitoring.
Diving deeper into the logs:
| Oct 05 10:15:00 | oberon-bmc | ipmid[520] | thresholdChanged: Assert |
| Oct 05 10:15:00 | oberon-bmc | ipmid[520] | thresholdChanged: Assert |
| Oct 05 10:15:00 | oberon-bmc | ipmid[520] | thresholdChanged: Assert |
| Oct 05 10:15:00 | oberon-bmc | satellitesensor[2351] | Sensor HGX_GPU_1_TEMP_1 high threshold 92 assert: value 1.67772e+07 raw data nan |
| Oct 05 10:15:00 | oberon-bmc | satellitesensor[2351] | Sensor HGX_GPU_1_TEMP_1 high threshold 89 assert: value 1.67772e+07 raw data nan |
| Oct 05 10:15:00 | oberon-bmc | satellitesensor[2351] | Sensor HGX_GPU_1_TEMP_1 high threshold 87 assert: value 1.67772e+07 raw data nan |
| Oct 05 10:15:00 | oberon-bmc | phosphor-fru-fault-monitor[524] | /xyz/openbmc_project/logging/entry/496 created |
| Oct 05 10:15:00 | oberon-bmc | phosphor-fru-fault-monitor[524] | /xyz/openbmc_project/logging/entry/497 created |
| Oct 05 10:15:00 | oberon-bmc | sensor-monitor[499] | Starting 1000ms HardShutdownAlarmHigh shutdown timer due to sensor /xyz/openbmc_project/sensors/temperature/HGX_GPU_0_TEMP_1 value 16777214 |
Warning: Your GPU should not reach 16,777,214 Celsius during boot. Imagine what would happen under load!
This took some time to debug, as I was quite sure the sensors could not physically handle reading temperatures over 16 million Celsius… But then I noticed something interesting about that specific number:
| Decimal | Binary | Hex |
|---|---|---|
| 16,777,214 | 1111 1111 1111 1111 1111 1110 | 0xFFFFFE |
This is 2²⁴ - 2, which is suspiciously close to the maximum value of a 24-bit unsigned integer. In the hardware world, this is the equivalent of a sensor throwing up its hands and screaming “I have no idea what’s happening!” When hardware can’t read a value properly—whether due to a loose connection, damaged circuit, or initialization failure—it often returns the maximum (or near-maximum) representable value. It’s like the digital version of a shrug.
The logs confirmed this theory: seeing 1.67772e+07 (16,777,214) wasn’t evidence that my GPU had achieved nuclear fusion temperatures 🔥—it was evidence that the temperature sensor had simply stopped working. And if a sensor error is intermittent, the most likely culprit is a loose connection or physical damage.
After spending way too long pursuing software solutions (because who wants to disassemble everything again?), I finally accepted the inevitable and broke out the screwdrivers.
I happened to have bought a new microscope earlier this year, and it turned out to be the perfect tool for diagnosing and fixing the issue. Near one of the modules, I found some damaged surface mount components. The damage must have happened after cleaning, probably during the reassembly of the modules with the copper adapters. They weigh over 2 kg, so a slight bump would have easily caused this damage. Amazingly, the tiny components were still attached to the traces, and so I could measure them easily: a 100 nF capacitor, and 4.7k resistor (both of which I had on-hand, as they are standard values for decoupling circuits). The bad news? I had huge “0805” sized parts (2mm long), these were tiny “0402” (1mm long). And one of the traces was just gone.
With some very fiddly soldering, and scratching off the solder mask on the PCB to expose more trace, I was able to ‘free solder’ the parts into a wonderful 3D sculpture which was then liberally coated in UV-curing mask resin, set, and then held in place with sticky tape. Very professional. After reassembly, the system booted smoothly.
Final Touches
I 3D printed a few extra parts:
- Mounts for the E1.S 8TB SSD I found cheap online
- A full rear-panel, that mounts the 3KW 48V power supply
- Cool-looking mesh to protect the water-cooling radiators and dust filters
Getting the actual GPU working was also painful, so I’ll leave the details here for future adventurers:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Data Center/HGX-Series/HGX H100/Linux aarch64/12.8 seem to work!
wget https://us.download.nvidia.com/tesla/570.195.03/NVIDIA-Linux-aarch64-570.195.03.run
# Tell the driver to completely ignore the NVLINK and it should allow the GPUs to initialise independently over PCIe !!!! This took a week of work to find, thanks Reddit!
# create a modprobe config file:
sudo nano /etc/modprobe.d/nvidia-disable-nvlink.conf
# add the driver option
options nvidia NVreg_NvLinkDisable=1
# update the boot files:
sudo update-initramfs -u
# reboot
sudo reboot
Benchmarks
That’s what you’re here for, maybe? I have only just started, but after compiling the latest Llama.cpp version using 144 cores in 90 seconds, here’s some benchmarks on larger LLMs:
| Model | Prompt Processing | Token Generation |
|---|---|---|
| gpt-oss-120b-Q4_K_M | 2974.79 | 195.84 |
| GLM-4.5-Air-Q4_K_M | 1936.65 | 100.71 |
| Qwen3-235B-A22B-Instruct-2507-Q4_K | 1022.79 | 65.90 |
This is pretty unoptimized, but it’s looking promising so far! During the LLM tests I hit around 300W per GPU, far from the 900W max.
Cost Breakdown
Here’s what the entire build actually cost me, from the initial purchase to the final touches:
| Component | Description | Cost (EUR) |
|---|---|---|
| Grace-Hopper Server | 2x GH200 superchips with H100 GPUs (the Frankenstein special) | €7,500 |
| Storage | ‘like-new’ used 8TB E1.S NVMe SSD | €250 |
| Custom Water Cooling Adapters | 2x CNC-milled copper mounting plates for AIO coolers | €700 |
| AIO Water Coolers | 4x Arctic Liquid Freezer III 420 (B-Ware) | €180 |
| Structural Frame | Extruded aluminum profiles, pre-cut and delivered | €200 |
| 3D Printing Filament | 1kg black PLA for custom mounts and brackets | €20 |
| Hardware | Nuts, bolts, and mounting hardware | €50 |
| Cleaning Supplies | 5 liters of 99.9% isopropanol (used liberally throughout) | €20 |
| Aesthetics | LED lighting strip (because RGB makes it faster) | €10 |
| Total | €8,930 |
Not included: hearing protection (absolutely necessary), the microscope I already owned (but proved essential), several failed 3D prints, and the emotional cost of seeing “16,777,214°C” in system logs.
Conclusion
So, was it worth it? I now have a desktop that can run 235B parameter models at home for less than the cost of a single H100. It required disassembling $80,000 worth of enterprise hardware, debugging sensors that reported temperatures approaching the surface of the sun, and free-soldering components under a microscope. Your mileage may vary. Literally: I had to drive two hours to pick this thing up.







