Why My Second GPU Is Lazy: From PCIe to NVLink, Understanding x86 I/O Bottlenecks

Introduction

How does a CPU communicate with an NVMe drive, a network card, or a GPU? Most software engineers rely on drivers and kernel modules without thinking about the underlying mechanisms. But when you're debugging performance issues, tuning interrupt handling, or trying to understand why perf shows certain bottlenecks, you need to know what's happening at the hardware level.

x86 systems use three primary mechanisms for CPU-peripheral communication:

- Port-mapped I/O (PMIO): the CPU uses dedicated I/O instructions (in/out) to access a separate address space
- Memory-mapped I/O (MMIO): peripherals expose their control registers as memory addresses
- Direct Memory Access (DMA): peripherals transfer data directly to or from RAM without CPU intervention

Modern PCIe devices (your NVMe drives, network cards, GPUs) use MMIO and DMA almost exclusively. PMIO is legacy, seen mainly in older ISA devices and backward-compatibility modes. Understanding these mechanisms helps you interpret system behavior: why DMA buffer sizes affect throughput, why interrupt moderation matters for latency, and what kernel parameters like irqaffinity actually control. ...

January 2, 2026 · nbdawn