ConTExT: A Generic Approach for Mitigating Spectre

Published: 2023-10-24

Updated: 2023-10-24

Talk: https://www.youtube.com/watch?v=lVeK1C_AHhc

Source code: https://github.com/IAIK/contextlight

Introduction

Spectre attacks can leak data through several covert channels:

a cache covert channel（Original）
instruction timings
contention, etc

Existing defenses against transient-execution attacks only try to prevent the cache covert channel.

Contributions:

A hardware-software co-design that mitigates transient-execution attacks
Only minimal changes are necessary
Blocks all known Spectre variants

Background

A. Transient Execution

Modern processors decode instructions into simpler micro-operations (µOPs).

Optimization: µOPs are

not executed in order as given by the instruction stream, but out of order, as soon as the execution unit and the required operands are available.

Requirement:

A reorder buffer stores the intermediate results of µOPs until they can be retired in the order intended by the instruction stream.

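Speculation mainly takes two forms:

① Control flow: the program may take different branches at run time

Intel CPUs provide the following prediction structures:

PHT: Pattern History Table. Records the pattern of past branch outcomes and uses it to predict the direction of the next branch.

BHB: Branch History Buffer. Records recent branch behavior, i.e., whether and in which direction each branch was taken.

BTB: Branch Target Buffer. Records the target addresses of branch instructions so that the target is available quickly when the branch executes.

STL: Store-To-Load forwarding. Predicts whether a load depends on an earlier store; a misprediction here is what Speculative Store Bypass exploits.

RSB: Return Stack Buffer. A small hardware stack of the return addresses pushed by recent call instructions.

② Data flow: the processor speculates on the existence of data dependencies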

If the prediction was:

correct, the instructions in the reorder buffer are retired in order
wrong, the results are squashed and a rollback is performed by flushing the pipeline and the reorder buffer; in the process, all architectural but no microarchitectural changes are reverted

Conclusion: transient execution has measurable microarchitectural side effects

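B. Transient-Execution Attacks & Defenses

Definition: attacks that exploit changes in the microarchitectural state made during transient execution to extract sensitive information are called transient-execution attacks.

Classification:

Spectre-type attacks: exploit the prediction mechanisms
Meltdown-type attacks: exploit the transient execution that follows an architectural or microarchitectural fault (i.e., the deferred handling of exceptions)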

Spectre attacks

Variants:

Variant	Explanation
Spectre-PHT	Exploits the PHT and BHB so that the processor mispredicts the code path after a conditional branch
Spectre-BTB	Poisons the BTB with an attacker-chosen destination, so that code at that destination is executed transiently
Spectre-STL	Exploits that the processor transiently uses a stale value because it could not find the updated value in the store buffer, e.g., due to aliasing (several variables or expressions referring to the same memory location).
Spectre-RSB & ret2spec	When a ret is executed, the top of the RSB is used to predict the return address. An attacker can force misspeculation in various ways, e.g., by overfilling the RSB, or by overwriting the return address on the software stack.

What they have in common:

use transient execution to access data that they would not access in normal, considerate execution
use this data to influence the microarchitectural state, which can be observed using microarchitectural attacks, e.g., Flush+Reload
all are executed locally on the victim machine, requiring the attacker to run code on the machine

Meltdown attacks

Meltdown-type attacks do not exploit misspeculation; they use other techniques to get instructions executed transiently.

Between the occurrence of an exception and it being raised, instructions that access the data retrieved by the faulting instruction can execute transiently.

Variant	Explanation
Original Meltdown	Exploited the deferred page fault following a user/supervisor bit violation, allowing an attacker to leak arbitrary memory
A variation	Allows an attacker to read system registers
Microarchitectural data sampling (MDS) attacks	Have been demonstrated on other internal buffers of the CPU

Defenses

Simplest: disable speculation
- Similarly, Intel and AMD: use serializing instructions on both outcomes of a branch
Evtyushkin et al.: allow a developer to annotate branches that could leak sensitive data, which are then not predicted

The local Spectre variants recover the leaked data with Flush+Reload or Prime+Probe, both of which need a high-resolution timer; one defense is therefore to reduce the timer's accuracy.

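Flush+Reload:

First, the attacker picks a cache line in memory that it shares with the victim (e.g., a shared library);

then, the attacker flushes that line from the cache with the clflush instruction;

finally, the attacker waits for a while, reads the line again, and times the access. A fast access means the victim touched the line in the meantime, so the timing difference reveals which lines the victim accessed.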
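
The Flush+Reload steps can be sketched with a simulated cache. This is a toy model under stated assumptions: `ToyCache`, the latency constants, and the threshold are made up for illustration, and a real attack would use `clflush` and a cycle counter instead.

```python
# Toy Flush+Reload: the cache is a set of line addresses and "time" is a
# fixed latency per hit or miss. The latency numbers are invented.

HIT_CYCLES, MISS_CYCLES = 40, 200

class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self, addr):            # stands in for clflush
        self.lines.discard(addr)

    def access(self, addr):           # returns the simulated access latency
        latency = HIT_CYCLES if addr in self.lines else MISS_CYCLES
        self.lines.add(addr)
        return latency


def flush_reload(cache, shared_addrs, victim, threshold=100):
    for addr in shared_addrs:         # 1. flush every monitored shared line
        cache.flush(addr)
    victim(cache)                     # 2. wait while the victim runs
    # 3. reload and time: a fast reload means the victim touched that line
    return [addr for addr in shared_addrs if cache.access(addr) < threshold]


shared = [0x1000, 0x1040, 0x1080]
touched = flush_reload(ToyCache(), shared, victim=lambda c: c.access(0x1040))
print([hex(a) for a in touched])
```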

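Prime+Probe:

First, the attacker fills a cache set with its own data (prime);

next, the attacker waits for a while; if the victim accesses memory that maps to the same cache set, the victim's data is loaded into the cache and evicts some of the attacker's lines;

finally, the attacker accesses its own data again and times the accesses (probe). A slow access means a line was evicted, i.e., the victim used that cache set.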
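
Prime+Probe can be sketched the same way on a single simulated cache set. The sketch assumes LRU replacement and counts misses instead of measuring time; `ToySet` and the tag names are hypothetical. The probe walks the attacker's lines in reverse order so that one eviction does not cascade through the whole set.

```python
# Toy Prime+Probe on one cache set with LRU replacement (an assumption;
# real replacement policies differ).
from collections import OrderedDict

class ToySet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()    # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, insert the line and evict the LRU one."""
        hit = tag in self.lines
        if hit:
            self.lines.move_to_end(tag)
        else:
            if len(self.lines) >= self.ways:
                self.lines.popitem(last=False)
            self.lines[tag] = None
        return hit


def prime_probe(cache_set, victim):
    attacker_tags = [f"attacker-{i}" for i in range(cache_set.ways)]
    for tag in attacker_tags:         # prime: fill the set with attacker lines
        cache_set.access(tag)
    victim(cache_set)                 # wait while the victim runs
    # probe in reverse order and count misses (a miss stands for a slow access)
    return sum(not cache_set.access(tag) for tag in reversed(attacker_tags))


print(prime_probe(ToySet(4), victim=lambda s: s.access("victim")))
print(prime_probe(ToySet(4), victim=lambda s: None))
```

A non-zero miss count indicates that the victim used the monitored set; an idle victim leaves all attacker lines in place.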

Design of ConTExT

Idea: non-transient mappings

A non-transient mapping indicates that the mapped memory contains secrets that must not be accessed within the transient-execution domain

Advantages of ConTExT:

the instructions leaking the secret are not executed
independent instructions later on in the instruction stream can still be executed during the out-of-order execution.
- Although the independent instructions cannot retire, they can warm up caches and buffers, e.g., by triggering prefetchers

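A. Non-Transient Memory Mappings

Three implementation options are considered:

① Currently Reserved Page-Table Entry Bit

In an x86-64 page-table entry, the six bits 46 to 51 are currently reserved; one of them can be repurposed as the non-transient bit.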
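
As a rough illustration of option ①, the sketch below treats a page-table entry as a 64-bit integer. Using bit 51 is an assumption made here for the example (the notes only say that one of bits 46 to 51 can be used), and the helper names are hypothetical.

```python
# Assumption: bit 51, one of the reserved bits 46 to 51, serves as the
# non-transient bit. The exact position is a hypothetical choice.
NON_TRANSIENT_BIT = 51
NON_TRANSIENT_MASK = 1 << NON_TRANSIENT_BIT

def mark_non_transient(pte: int) -> int:
    """Return the entry with the non-transient bit set."""
    return pte | NON_TRANSIENT_MASK

def is_non_transient(pte: int) -> bool:
    return bool(pte & NON_TRANSIENT_MASK)

def physical_frame(pte: int, phys_bits: int = 46) -> int:
    """Extract the frame address (bits 12 to phys_bits - 1) of the entry."""
    return pte & ((1 << phys_bits) - 1) & ~0xFFF

pte = 0x1234_5000 | 0x7               # frame 0x12345000, present/writable/user
secret_pte = mark_non_transient(pte)
print(is_non_transient(pte), is_non_transient(secret_pte))
print(hex(physical_frame(secret_pte)))
```

Because the bit lies above the implemented physical-address bits, setting it leaves the frame address unchanged.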

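② Currently Ignored Page-Table Entry Bit and Control Register

The feature is enabled through a bit in one of the CPU control registers (e.g., CR4, EFER, or XCR0), by which the operating system signals that it is aware of the changed semantics of that ignored bit.

The advantage is that the physical address space can still grow to the full 4 PB (approach 1 limits it to 2 PB); the drawback is that the operating system can no longer use the retrofitted ignored bit freely.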
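
The 4 PB and 2 PB figures follow from the number of physical-address bits that remain available. A small check, assuming the architectural maximum of 52 physical-address bits and binary units (1 PiB = 2^50 bytes):

```python
PIB = 2 ** 50                         # bytes per pebibyte

def addressable_pib(physical_address_bits: int) -> int:
    """Size of the physical address space in PiB."""
    return 2 ** physical_address_bits // PIB

print(addressable_pib(52))            # approach 2: all 52 address bits stay usable
print(addressable_pib(51))            # approach 1: one address bit is repurposed
```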

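③ Memory Type using Page-Attribute Table

The Page-Attribute Table (PAT) lets the operating system configure attributes for classes of pages.

One PAT entry is set to a new non-transient memory type that uses the memory-type value '2' (x86 defines six memory types).

The advantage is that the semantics of page-table entries do not change; the drawback is that it requires more extensive operating-system changes (Linux already uses all PAT entries).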

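B. Secret Tracking

Non-transient mappings ensure that non-transient memory locations cannot be accessed during transient execution, but secrets that have already been loaded into registers still need protection.

Registers in commodity CPUs have no memory type or protection, so the hardware has to be changed to protect them.

Tainting Registers to protect the sensitive data in them
- An additional non-transient bit per register indicates whether its value is non-transient. This taint propagates from non-transient memory into registers and from register to register, and the whole register is tainted even if only part of it is accessed;
- The special register rflags is updated by many instructions, so a shadow_rflags register tracks its taint bit by bit; a branch that depends on tainted rflags bits stalls the pipeline;
- Taint propagation only considers instructions that have a register as destination operand
Untainting Registers:
- Replacing the entire content of a register without using non-transient memory or tainted registers
- Writing a tainted register to a normal memory location, i.e., one not marked as non-transient
  - Rationale: once a register is spilled to a normal (i.e., insecure) memory location, the potential secret can leak anyway. If the store was unintentional, it is a bug in the program that has to be fixed in software; in many cases, moving a secret to normal memory is intentional, because the developer considers that the register no longer holds a secret.
- The automated untainting keeps the number of tainted registers low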
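
The tainting and untainting rules can be sketched as a small register-file model. This is a simplification under stated assumptions: `TaintedRegisterFile` and its method names are hypothetical, only a few instruction kinds are modeled, and rflags tracking is left out.

```python
# Toy register file with one taint bit per register. Names are illustrative.

class TaintedRegisterFile:
    def __init__(self, names):
        self.value = {n: 0 for n in names}
        self.taint = {n: False for n in names}

    def load(self, dst, value, non_transient):
        # A load from a non-transient mapping taints the destination register.
        self.value[dst] = value
        self.taint[dst] = non_transient

    def load_imm(self, dst, value):
        # Replacing the whole register with untainted data untaints it.
        self.value[dst], self.taint[dst] = value, False

    def add(self, dst, src):
        # Taint propagates to the destination operand.
        self.value[dst] += self.value[src]
        self.taint[dst] = self.taint[dst] or self.taint[src]

    def store(self, src, to_non_transient):
        # Writing a tainted register to normal memory untaints it.
        if not to_non_transient:
            self.taint[src] = False

    def may_use_transiently(self, *regs):
        # Tainted registers must not feed transient instructions.
        return not any(self.taint[r] for r in regs)


rf = TaintedRegisterFile(["rax", "rbx"])
rf.load("rax", 0x2A, non_transient=True)   # secret loaded: rax is tainted
rf.load_imm("rbx", 1)
rf.add("rbx", "rax")                       # rbx now depends on the secret
print(rf.may_use_transiently("rbx"))
rf.store("rax", to_non_transient=False)    # spilled to normal memory: untainted
rf.load_imm("rbx", 0)                      # fully overwritten: untainted
print(rf.may_use_transiently("rax", "rbx"))
```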
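Taint Propagation across Memory Operations:
- Every register has a taint bit, but taint only propagates to other registers, not to memory. If a tainted register (one holding a secret) is written to memory, its taint bit is permanently lost;
- The compiler inevitably has to store registers that hold no secrets temporarily in memory marked as non-transient. Spilling them there and reading them back taints them, so more and more registers are gradually over-approximated as tainted.
Optimizing Performance via Caching:
- Caching reduces the impact of this taint over-approximation
- The cache stores one extra metadata bit per 64 bits of data, i.e., 8 extra bits per 64 B cache line
- On a load, the bit stored in the cache line takes precedence over the information in the TLB, i.e., the cache overrides the taint defined by the memory mapping
- If the cache line is evicted, this information is lost, and a later load from the non-transient location taints the register again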
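
A minimal sketch of the per-line taint metadata, assuming a dictionary as the cache and that a freshly filled line inherits its taint bits from the memory mapping (that initialization is an assumption of this sketch). `TaintedCacheLine` and `load_taint` are hypothetical names.

```python
LINE_BYTES, WORD_BYTES = 64, 8
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES   # 8 taint bits per 64 B line

class TaintedCacheLine:
    def __init__(self, mapping_non_transient):
        # Assumption: a new line starts with the taint of its memory mapping.
        self.taint_bits = (1 << WORDS_PER_LINE) - 1 if mapping_non_transient else 0

    def store(self, offset, register_tainted):
        # A store records the taint of the stored register for that 8-byte word.
        word = offset // WORD_BYTES
        if register_tainted:
            self.taint_bits |= 1 << word
        else:
            self.taint_bits &= ~(1 << word)

    def is_tainted(self, offset):
        return bool(self.taint_bits >> (offset // WORD_BYTES) & 1)


def load_taint(cache, line_addr, offset, mapping_non_transient):
    """Taint of a loaded register: the cached bit wins, else the mapping decides."""
    line = cache.get(line_addr)
    if line is not None:
        return line.is_tainted(offset)
    return mapping_non_transient


cache = {0x40: TaintedCacheLine(mapping_non_transient=True)}
cache[0x40].store(8, register_tainted=False)   # untainted spill to the secure stack
print(load_taint(cache, 0x40, 8, True))        # cached bit says: not a secret
del cache[0x40]                                # the line is evicted
print(load_taint(cache, 0x40, 8, True))        # falls back to the mapping
```

While the line stays cached, the spilled register keeps its untainted status; after eviction the conservative mapping-level taint applies again.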
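Taint Control:
- Reading and writing taint: a new MSR, IA32_TAINT, maps the taint bit of every architectural register to one of its bits, so the operating system can read or write all taint bits in a single operation
- Interrupt handling: on an interrupt, the IA32_TAINT MSR is the first thing that has to be saved, because it holds the taint of the interrupted context. An interrupt handler must not clobber existing register contents, so it first saves every register it uses. The hardware therefore copies IA32_TAINT to IA32_SHADOW_TAINT automatically, which preserves the taint state of all registers before the handler performs any register operation. IA32_SHADOW_TAINT can be treated like any other register; for example, the operating system can save it into a kernel structure on a context switch.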
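
The MSR handling can be modeled as a bitmask plus a snapshot on interrupt. In this sketch the bit layout (bit i covers `REGS[i]`), the `ToyCore` class, and the `context_switch` helper are hypothetical; only the names IA32_TAINT and IA32_SHADOW_TAINT come from the design.

```python
# Hypothetical bit layout: bit i of IA32_TAINT covers REGS[i].
REGS = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"] + [
    f"r{i}" for i in range(8, 16)
]

def pack_taint(tainted):
    """Set of tainted register names -> MSR value."""
    return sum(1 << i for i, reg in enumerate(REGS) if reg in tainted)

def unpack_taint(msr):
    """MSR value -> set of tainted register names."""
    return {reg for i, reg in enumerate(REGS) if msr >> i & 1}

class ToyCore:
    def __init__(self):
        self.ia32_taint = 0
        self.ia32_shadow_taint = 0

    def interrupt(self):
        # Hardware snapshots the taint before the handler touches any register.
        self.ia32_shadow_taint = self.ia32_taint

def context_switch(core, saved_contexts, old_task, new_task):
    # The OS stores the shadow value together with the other saved registers
    saved_contexts[old_task] = core.ia32_shadow_taint
    # and restores the taint of the task it switches to (0 if it never ran).
    core.ia32_taint = saved_contexts.get(new_task, 0)

core, saved = ToyCore(), {}
core.ia32_taint = pack_taint({"rax", "r12"})   # task A holds secrets in rax, r12
core.interrupt()
core.ia32_taint = 0                            # handler code reused the registers
context_switch(core, saved, "A", "B")
core.interrupt()
context_switch(core, saved, "B", "A")
print(sorted(unpack_taint(core.ia32_taint)))
```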

C. Software Support

Idea：Instead of annotating all branches that potentially lead to a secret-dependent operation, application developers simply annotate the secret variables in their applications directly.

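Compiler:
- Parses the annotations of secrets
- Register spills to the stack are unavoidable, and the calling convention requires some (possibly secret) values to be passed on the stack, so the stack must also be mapped as non-transient memory
- To reduce the performance impact of the non-transient stack, the compiler is modified to use it only where necessary. The non-transient stack holds only register spills and, where needed, function arguments and return values (memory the compiler uses as temporary storage)
Operating System:
- The operating system is responsible for setting up non-transient memory mappings
- When the operating system parses a binary, it can directly set up the non-transient mappings that the compiler marked
- Change: The operating system has to save and restore taint values on context switches. The hardware already saves the current taint value of all registers into the IA32_SHADOW_TAINT MSR upon interrupts. Thus, the operating system only has to read this register and save it together with all other saved registers.
- When the contents of non-transient memory locations are initially loaded from the binary, the data is not transferred through the non-transient user-space mapping. The operating system must therefore either disable the cache before this operation or flush the affected cache lines afterwards. (The x86 ISA already provides this functionality.)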

Implementation of ConTExT

A. Hardware Simulation

Emulator: the open-source x86-64 emulator Bochs

We implement the behavioral changes in Bochs to analyze functionality, and use ConTExT-light on a real CPU to estimate the performance overhead.

Page-Table Entry: to distinguish non-transient from normal memory mappings, every mapping has to be marked accordingly in its PTE.
Translation Lookaside Buffer: modern CPUs cache page-table entries in the TLB. For a cached page-table entry, memory accesses use the cached non-transient bit from the TLB.
Cache: Bochs only implements an instruction cache, but no data cache, which plays a vital role in our design to cache taint information. Hence, we extended Bochs with data-cache emulation by implementing an 8-way (inclusive) last-level cache. In the emulated cache, every cache line carries 8 taint bits as metadata.
Model-Specific Registers: We added two new MSRs to Bochs. Accesses to IA32_TAINT are directly mapped to the taint bits of the architectural registers, allowing the operating system to read and write all at once.

B. ConTExT-light

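ConTExT-light relies on the property that values stored in uncacheable memory generally cannot be used inside the transient-execution domain, unless the value is architecturally in a register, or microarchitecturally in the load buffer, store buffer, or line-fill buffer. In those cases the value can still reach transient instructions, which is why ConTExT-light provides only partial protection.

ConTExT-light consists of two parts: a kernel module and a runtime library. For the full ConTExT, we provide a compiler extension that minimizes the performance penalties of register spills.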

Evaluation

A. Security

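The security guarantees rest on two assumptions:

Developers annotate all secrets correctly (important)
The application does not actively leak secrets

Instructions that do not affect the microarchitectural state may still execute transiently. In the extreme case where the entire memory is secret, ConTExT does not allow any transient execution.

① Two cases are distinguished, depending on whether the application uses the secret architecturally:

Architecturally Unused Secrets: an attacker cannot obtain any information from non-transient memory regions
Architecturally Used Secrets: an attacker can obtain information from any unprotected copy, but the attack fails as long as the target is marked as secret

② Operations that copy a secret to a memory region not marked as non-transient can be attacked.

However, the compiler never generates such operations implicitly, because it only uses the stack as temporary memory. Such an operation therefore has to be written explicitly by the application developer, which violates the assumption that the application does not actively leak secrets.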

Limitations：

ConTExT-light cannot protect secrets while they are architecturally stored in registers of running threads. Furthermore, ConTExT-light is not designed as a protection against Meltdown-type attacks.
ConTExT is only effective if application developers use it correctly, i.e., if they mark all secrets as secret and do not actively leak them.

B. Performance

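Category	Overhead	Notes
Compiler Extension (using ConTExT)	1.26 % (average), 5.13 % (worst)	Setup time of the additional non-transient stack
OpenSSL RSA	71.14 % (± 4.66 %, n = 10 000)	RSA performs many in-place operations inside the secure buffer
AES	338 %	The AES key and the intermediate round keys are annotated as secret
OpenSSH	Connection time (including ConTExT-light initialization) increased by 24.7 % on average ($n = 1000$, $σ_{\overline{x}} = 0.038$)	The main asset is the private key stored in memory. To apply ConTExT-light, the global variables were annotated and the heap allocations in the sshbufs functions were changed to use the heap functions provided by ConTExT-light, a change of 14 lines of code
VeraCrypt	Mounting an encrypted container took 1.64 s instead of 1.59 s on average, an increase of 3.21 % ($n = 1000$, $σ_{\overline{x}} = 0.001$)	VeraCrypt stores sensitive data such as the master password in its SecureBuffer class, so protecting the SecureBuffer instances is sufficient
OATH One Time Password Tool	Not relevant	Supports the time-based one-time password algorithm (TOTP): from a key shared between user and service, the tool computes a cryptographic hash of the shared key and the current time. The main change ensures that the buffer holding the shared key and the buffers used for the hash computation are marked as uncacheable.
Password Manager	A slowdown from 0.162 s to 0.248 s (53 %) with ConTExT-light applied	Ensures that the buffers holding the master password and the decryption key are marked as uncacheable
NGINX	A decrease from 63 695 to 59 071 transactions (7.3 %); the average response time per transaction increased from 0.62 s to 0.65 s	NGINX is modified to use ConTExT-light to protect the certificate key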
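
The percentages in the table can be recomputed from the before/after values that the table gives. The helper below is only a convenience for that arithmetic; `relative_change` is a hypothetical name and the inputs are the rounded figures from the table.

```python
def relative_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

print(f"Password manager: {relative_change(0.162, 0.248):+.1f} %")
print(f"NGINX throughput: {relative_change(63695, 59071):+.1f} %")
print(f"NGINX response:   {relative_change(0.62, 0.65):+.1f} %")
print(f"VeraCrypt mount:  {relative_change(1.59, 1.64):+.1f} %")
```

With the rounded VeraCrypt times the result comes out near 3.1 %, slightly below the reported 3.21 %; the reported figure was presumably computed from the unrounded measurements.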