Hunting the Repetition Loop in a Self-Hosted LLM Agent

When the Agent Kept Repeating Itself

At first I thought a request had hung. Where a tool call should have been, the model was instead generating its way toward max_tokens and getting nowhere: sometimes repeating the same sentence over and over, other times producing low-value filler that never resolved into the JSON the tool call needed. Either way it would burn through the token budget, occasionally time out, and take the whole agent loop down with it. ...

June 23, 2026 · nbdawn

Why My Second GPU Is Lazy: From PCIe to NVLink, Understanding x86 I/O Bottlenecks

Introduction

Put two identical GPUs in the same machine, run the same workload on both, and the second one will often lag. Same model, same driver, same data, yet different throughput. It is not thermals or a bad BIOS profile. The second GPU is being starved at the bus level, and the reason has nothing to do with the card itself. Most of us live on top of drivers and kernel modules and never need to look down at how x86 systems actually move bytes between the CPU, RAM, and PCIe devices. But the moment you start debugging throughput asymmetry, tuning interrupt affinity, or wondering why irqaffinity matters, hardware topology stops being an abstraction. ...

January 2, 2026 · nbdawn