LLMs: The Things We've Been Overlooking

“What temperature are you using?” If someone asks, what do you say? “The default.” “0.7.” “I don’t know — does it matter?” Most answers fall into one of those three. And if you try to justify the answer, you run out of words fast.

That’s how we use LLMs. We call the APIs every day — stuff prompts into messages, send them off, get responses. But when the question becomes “What does Temperature actually do?”, “How is Top-P different from Temperature?”, “Does Prompt Caching just work if you turn it on?”, “Will hallucinations go away with a better model?” — the answers get fuzzy. ...
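These questions have a concrete surface in the API itself. As a taste of what the post digs into, here is a minimal sketch of where Temperature and Top-P are set in a chat-completions request, assuming the OpenAI Python SDK; the model name, prompt, and parameter values are illustrative placeholders, not taken from the post.

```python
# Minimal sketch: where Temperature and Top-P live in a typical
# chat-completions call. Assumes the OpenAI Python SDK; model name,
# prompt, and values below are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Explain temperature in one line."}],
    temperature=0.7,  # rescales next-token logits; lower means more deterministic output
    top_p=0.9,        # nucleus sampling: keep only the smallest token set with >= 0.9 probability mass
)
print(response.choices[0].message.content)
```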

April 12, 2026 · nbdawn

Why My Second GPU Is Lazy: From PCIe to NVLink, Understanding x86 I/O Bottlenecks

Put two identical GPUs in the same machine, run the same workload on both, and the second one will often lag. Same model, same driver, same data — different throughput. It is not thermals or a bad BIOS profile. The second GPU is being starved at the bus level, and the reason has nothing to do with the card itself.

Most of us live on top of drivers and kernel modules and never need to look down at how x86 systems actually move bytes between the CPU, RAM, and PCIe devices. But the moment you start debugging throughput asymmetry, tuning interrupt affinity, or wondering why irqaffinity matters, hardware topology stops being an abstraction. ...
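Before the post's deep dive, one quick way to see where topology enters the picture: on Linux, sysfs reports each PCIe device's NUMA affinity, which is a first clue when two identical cards behave differently. The sketch below is an illustrative helper, not from the post; it walks /sys/bus/pci/devices and prints the NUMA node of every NVIDIA device (PCI vendor ID 0x10de).

```python
# Illustrative helper (Linux only): map each NVIDIA device on the PCI bus
# to its NUMA node by reading sysfs — a first step when chasing
# per-GPU throughput asymmetry.
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID for NVIDIA

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    vendor = (dev / "vendor").read_text().strip()
    if vendor != NVIDIA_VENDOR_ID:
        continue
    # numa_node is -1 when the platform reports no NUMA affinity
    numa = (dev / "numa_node").read_text().strip()
    print(f"{dev.name}: NUMA node {numa}")
```

If the two GPUs land on different NUMA nodes, the "lazy" one may simply be paying cross-socket costs for its host memory traffic and interrupts, which is exactly the territory the post maps out.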

January 2, 2026 · nbdawn