书名：Hands-On High Performance Programming with Qt 5
作者名：Marek Krajewski
本章字数：371字
更新时间：2025-04-04 15:04:54

Caches

The attempts to overcome these problems led to a slew of architectural innovations. First, the impedance between CPU and memory speeds was fought using a classic optimization technique we already know, namely, caching, on chip level. The on-chip static RAM (SRAM) memory requires six transistors (forming a flip-flop) per bit, and all of them must be kept under voltage. This means it's expensive and it drives the power consumption up (take care not to hit that power wall!). In exchange, the memory access times are at lightning speed, as all that is needed is to apply the current to the input and read the output.

So, the idea is to add a small caching stage of expensive but fast on-chip memory in front of big, slow but inexpensive main memory. Meanwhile, modern CPUs can command up to three levels of caches, commonly denoted with L1, L2, and L3 acronyms, decreasing in density and speed but increasing in size as the cache level goes up. The figure below shows us an overview of the memory hierarchy typically found in modern CPUs:

As of the time of this writing, access times for L1 caches are in the order of 3 cycles, L2 of 12 cycles, L3 of 38 cycles, and main memory is around 100-300 cycles. The main memory access time is that high because the analog nature of DRAM requires, among other things, periodic charge refreshing, pre-charging of the read line before reading, analog-digital conversion, communication through memory controller unit (MCU), and so on.

Caches are organized in cache lines, which on the current Intel architectures are 64 bytes long. Each cache update will hence fetch the entire cache line from main memory, doing a kind of prefetching already at that level. Speaking about prefetching, Intel processors have a special prefetch instruction we can invoke in assembler code for very low-level optimizations.

In addition to data caches, there's also an instruction cache, because in the von Neumann architecture, both are kept in the common memory. The instruction caches were added to Intel Pentium Pro (P6) as an experiment, but they were never removed since then.