CTF-All-In-One

8.25 (State of) The Art of War: Offensive Techniques in Binary Analysis


简介

这篇文章提出了一个二进制分析框架,并实现了许多现有的分析技术。通过将这些技术系统化地实现,可以让其他研究人员直接利用并开发新的技术。此外,在统一框架中实现这些技术可以更直接地进行比较,并确定各自的优缺点。

自动化二进制分析

为了保持程序分析的可行性,往往需要在可重现性和语义理解两个方面进行权衡:

  • 可重现性:由于分析系统做出的权衡,特定分析所发现的漏洞可能无法重现。这通常与分析的覆盖方式有关:一些分析从头执行整个应用程序,因此可以推断出触发漏洞的原因;另一些分析只针对程序的某个部分,虽然可以在特定模块中发现漏洞,却无法完整推断出触发漏洞的原因,于是无法重现。

  • 语义理解:一些分析缺乏对程序语义的理解。例如,动态分析能够追踪程序执行的代码,但不能理解为什么这些代码被执行或者程序输入的哪些部分导致了这样的执行。

为了得到可重现的输入或者语义理解的能力,就需要对分析技术进行权衡。例如,高可重现性往往和低覆盖率相关,因为要想生成可重现的输入必须知道如何到达任何想要分析的代码,那么它将不能分析尽可能多的代码。另一方面,如果不能通过重现来验证漏洞,那么会产生高误报率(即并不存在漏洞)。在缺乏可重现性的情况下,这些误报必须通过启发式的方法进行过滤,反过来又会引入高漏报率。同样的,为了实现语义理解,必须存储和处理大量的数据。例如,具有语义理解能力的动态分析必须保存下程序分支的条件,而具有语义理解能力的静态分析需要适时地调整数据域。但由于系统资源有限,在分析中必须做出取舍。

下面是一个例子,可以对不同分析技术的能力有个简单的认识:

对于静态分析,它可能会将全部 3 个 memcpy 调用都标记为漏洞(即使 16 行的调用其实是安全的),因为静态分析没有足够的信息来确定漏洞是否真的会发生。另外,静态分析可以得到漏洞的位置,但不能得到触发漏洞的输入。对于动态分析(例如 fuzzing),它通过不断构造输入来尝试触发漏洞,但通常很可能漏掉需要精确输入才能触发的漏洞,例如第 10 行的漏洞。动态符号执行能够检测出第 10 行的错误并通过约束求解得到输入,也能判断出第 16 行没有漏洞,但它很可能会漏掉第 30 行,因为存在多条不触发该漏洞的潜在路径。另外,符号执行进行到循环时,可能存在路径爆炸的问题。
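"需要精确输入的漏洞"对 fuzzing 和符号执行的不同难度,可以用下面这个极简的 Python 玩具示意(MAGIC、program 等均为假设的示例,与原文的代码清单无关):

```python
import random

MAGIC = 0x31337  # 假设的"魔数",类比正文中需要精确输入才能触发的分支

def program(x):
    # 只有 x 恰好等于 MAGIC 时才进入"漏洞"分支
    return "bug" if x == MAGIC else "ok"

# 随机 fuzzing:在 32 位输入空间里盲目尝试,几乎不可能命中精确值
hits = sum(program(random.randrange(2 ** 32)) == "bug" for _ in range(10_000))

# "符号执行"视角:收集到分支条件 x == MAGIC 后,由约束求解直接得到触发输入
solved_input = MAGIC
```

随机尝试一万次,hits 几乎必然为 0;而对约束求解来说,这只是解一个等式。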

静态漏洞挖掘

Static analyses can be split into two paradigms: those that model program properties as graphs and those that model the data itself.

控制流图恢复

CFG recovery is implemented as a recursive algorithm that disassembles and analyzes a basic block, identifies its possible exits and adds them to the CFG, and then repeats the analysis recursively until no new exits are identified.
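上述递归恢复过程可以用一个极简的工作表(worklist)算法示意(PROGRAM 字典是假设的玩具程序,用来模拟"反汇编基本块并识别出口"):

```python
# 假设的玩具程序:基本块地址 -> 该块可静态识别出的后继地址(出口)
PROGRAM = {
    0x1000: [0x1010, 0x1020],   # 条件跳转:两个出口
    0x1010: [0x1030],
    0x1020: [0x1030],
    0x1030: [],                 # 函数返回,无后继
}

def recover_cfg(entry):
    """从入口开始反复"识别出口并加入 CFG",直到没有新块出现。"""
    edges, analyzed, worklist = set(), set(), [entry]
    while worklist:
        block = worklist.pop()
        if block in analyzed:
            continue
        analyzed.add(block)
        for target in PROGRAM[block]:   # 识别该基本块的所有出口
            edges.add((block, target))
            worklist.append(target)     # 递归地分析新发现的块
    return edges

cfg = recover_cfg(0x1000)
```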

CFG recovery has one fundamental challenge: indirect jumps. Specifically, indirect jumps fall into several categories:

  • Computed. The target of a computed jump is determined by the application by carrying out a calculation specified by the code. This calculation could further rely on values in other registers or in memory. A common example of this is a jump table.

  • Context-sensitive. An indirect jump might depend on the context of an application. The common example is qsort() in the standard C library.

  • Object-sensitive. A special case of context sensitivity is object sensitivity. In object-oriented languages, object polymorphism requires the use of virtual functions, often implemented as virtual tables of function pointers that are consulted, at runtime, to determine jump targets.

The goal of CFG recovery is to resolve the targets of as many of these indirect jumps as possible, in order to create a CFG. Depending on how well jump targets are resolved, the CFG recovery analysis has two properties:

  • Soundness. A CFG recovery technique is sound if the set of all potential control flow transfers is represented in the graph generated.

  • Completeness. A complete CFG recovery builds a CFG in which all edges represent actually possible control flow transfers.

值集分析

At a high level, VSA attempts to identify a tight over-approximation of the program state at any given point in the program. This can be used to understand the possible targets of indirect jumps or the possible targets of memory write operations.

动态漏洞挖掘

Dynamic techniques here are split into two main categories: concrete and symbolic execution.

动态具体执行

The most relevant application of dynamic concrete execution to vulnerability discovery is fuzzing.

  • Coverage-based fuzzing. Such fuzzers attempt to produce inputs that maximize the amount of code executed in the target application based on the insight that the more code is executed, the higher the chance of executing vulnerable code.

    • Coverage-based fuzzing suffers from a lack of semantic insight into the target application.

  • Taint-based fuzzing. Such fuzzers analyze how an application processes input to understand what parts of the input to modify in future runs.

    • While a taint-based fuzzer can understand what parts of the input should be mutated to drive execution down a given path in the program, it is still unaware of how to mutate this input.
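覆盖率引导 fuzzing 的核心循环可以粗略示意如下(target 函数、种子语料等均为假设的玩具,与 AFL 等真实 fuzzer 的实现无关):

```python
import random

random.seed(1234)  # 固定种子,便于复现(仅为示意)

def target(data):
    """玩具目标程序:返回本次执行覆盖到的"代码块"集合。"""
    cov = {"start"}
    if data[:1] == b"A":
        cov.add("branch_A")
        if data[1:2] == b"B":
            cov.add("branch_AB")    # 两个字节都要精确匹配才能到达
    return cov

corpus, total_cov = [b"\x00\x00"], set()
for _ in range(5000):
    base = random.choice(corpus)
    pos = random.randrange(len(base))
    mutated = base[:pos] + bytes([random.randrange(256)]) + base[pos + 1:]
    cov = target(mutated)
    if not cov <= total_cov:        # 产生了新覆盖,保留该输入继续变异
        total_cov |= cov
        corpus.append(mutated)
```

保留"带来新覆盖"的输入正是正文所说的核心直觉:执行的代码越多,碰到漏洞代码的机会越大。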

动态符号执行

Dynamic symbolic execution executes a program in an emulated environment with an abstract domain of symbolic variables. Such engines track the state of registers and memory throughout program execution, along with the constraints on those variables. Whenever a conditional branch is reached, execution forks and follows both paths, saving the branch condition as a constraint on the path in which the branch was taken, and the inverse of the branch condition as a constraint on the path in which it was not taken.
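上面描述的"遇分支即 fork 并记录约束"的过程,可以用一个不含求解器的玩具骨架示意(状态用字典表示,约束只以字符串形式记录,均为假设的简化):

```python
import copy

def fork_at_branch(state, cond):
    """在条件分支处复制状态:taken 记录 cond,fallthrough 记录其否定。"""
    taken = copy.deepcopy(state)
    taken["constraints"].append(cond)
    fallthrough = copy.deepcopy(state)
    fallthrough["constraints"].append(f"not ({cond})")
    return taken, fallthrough

init = {"regs": {"x": "sym_x"}, "constraints": []}
t, f = fork_at_branch(init, "sym_x > 10")   # 第一个分支
t2, _ = fork_at_branch(t, "sym_y == 0")     # taken 路径上的第二个分支
```

每条路径积累的约束(路径谓词)交给 SMT 求解器,即可得到走到该路径所需的具体输入。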

  • Classical dynamic symbolic execution. These engines analyze an application by performing path exploration until a vulnerable state is identified.

  • Symbolic-assisted fuzzing. Such fuzzers modify inputs identified by the fuzzing component by processing them in a dynamic symbolic execution engine. Dynamic symbolic execution uses a more in-depth understanding of the analyzed program to properly mutate inputs, providing additional test cases that trigger previously-unexplored code and allow the fuzzing component to continue making progress.

  • Under-constrained symbolic execution. These engines execute only parts of an application in order to increase the tractability of dynamic symbolic execution.

angr 分析引擎

设计目标

  • Cross-architecture support

  • Cross-platform support

  • Support for different analysis paradigms

  • Usability

子模块:Intermediate Representation

We leveraged libVEX, the IR lifter of the Valgrind project. libVEX produces an IR, called VEX, that is specifically designed for program analysis. We used PyVEX to expose the VEX IR to Python.

子模块:Binary Loading

The task of loading an application binary into the analysis system is handled by a module called CLE. CLE abstracts over different binary formats to handle loading a given binary and any libraries that it depends on, resolving dynamic symbols, performing relocations, and properly initializing the program state.

子模块:Program State Representation/Modification

The SimuVEX module is responsible for representing the program state. The state, named SimState in SimuVEX terms, is implemented as a collection of state plugins, which are controlled by state options specified by the user or analysis when the state is created.

  • Registers. SimuVEX tracks the values of registers at any given point in the program as a state plugin of the corresponding program state.

  • Symbolic memory. To enable symbolic execution, SimuVEX provides a symbolic memory model as a state plugin.

  • Abstract memory. The abstract memory state plugin is used by static analyses to model memory. Unlike symbolic memory, which implements a continuous indexed memory model, the abstract memory provides a region-based memory model.

  • POSIX. When analyzing binaries for POSIX-compliant environments, SimuVEX tracks the system state in this state plugin.

  • Log. SimuVEX tracks a log of everything that is done to the state in this plugin.

  • Inspection. SimuVEX provides a powerful debugging interface, allowing breakpoints to be set on complex conditions, including taint, exact expression makeup, and symbolic conditions. This interface can also be used to change the behavior of SimuVEX.

  • Solver. The Solver is a plugin that exposes an interface to different data domains, through the data model provider.

  • Architecture. The architecture plugin provides architecture-specific information that is useful to the analysis. The information in this plugin is sourced from the archinfo module, which is also distributed as part of angr.

子模块:Data Model

Claripy abstracts all values to an internal representation of an expression that tracks all operations in which it is used. These expressions are represented as “expression trees” with values being the leaf nodes and operations being non-leaf nodes.
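这种"记录所有参与运算"的表达式树,可以用几行 Python 粗略示意(Expr、leaf 均为假设的玩具类型,并非 Claripy 的真实 API):

```python
class Expr:
    """极简表达式树:叶子节点保存值,非叶子节点保存操作及子表达式。"""
    def __init__(self, op, args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Expr("add", [self, other])    # 每次运算都生成新的非叶子节点

    def leaves(self):
        if self.op == "leaf":
            return [self.args[0]]
        return [v for a in self.args for v in a.leaves()]

def leaf(value):
    return Expr("leaf", [value])

e = leaf(3) + leaf(4) + leaf(5)   # 表达式树 ((3 + 4) + 5)
```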

At any point, an expression can be translated into data domains provided by Claripy’s backends. User-facing operations, such as interpreting the constructs provided by the backends into Python primitives, are provided by frontends. A frontend augments a backend with additional functionality of varying complexity.

  • FullFrontend. This frontend exposes symbolic solving to the user, tracking constraints, using the Z3 backend to solve them, and caching the results.

  • CompositeFrontend. Splitting constraints into independent sets reduces the load on the solver. The CompositeFrontend provides a transparent interface to this functionality.

  • LightFrontend. This frontend does not support constraint tracking, and simply uses the VSA backend to interpret expressions in the VSA domain.

  • ReplacementFrontend. The ReplacementFrontend expands the LightFrontend to add support for constraints on VSA values.

  • HybridFrontend. The HybridFrontend combines the FullFrontend and the ReplacementFrontend to provide fast approximation support for symbolic constraint solving.

子模块:Full-Program Analysis

Project is the analyst-facing part of angr, which provides complete analyses, such as dynamic symbolic execution and control-flow graph recovery.

  • Path Groups. A PathGroup is an interface to dynamic symbolic execution.

  • Analyses. angr provides an abstraction for any full program analysis with the Analysis class.

实现:控制流图恢复

  • CFGAccurate. Given a specific program, angr performs an iterative CFG recovery, starting from the entry point of the program, with some necessary optimizations. angr leverages a combination of forced execution, backwards slicing, and symbolic execution to recover, where possible, all jump targets of each indirect jump. Moreover, it generates and stores a large quantity of data about the target application, which can be used later in other analyses such as data-dependence tracking.

  • CFGFast. A secondary algorithm that uses a quick disassembly of the binary (without executing any basic block), followed by heuristics to identify functions, intra-function control flow, and direct inter-function control flow transitions.

假设

angr’s CFGAccurate makes several assumptions about binaries to optimize the run time of the algorithm.

  • All code in the program can be distributed into different functions.

  • All functions are either called by an explicit call instruction, or are preceded by a tail jump in the control flow.

  • The stack cleanup behavior of each function is predictable, regardless of where it is called from. This lets CFGAccurate safely skip functions that it has already analyzed while analyzing a caller function and keep the stack balanced.

迭代生成 CFG

Throughout CFG recovery, CFGAccurate maintains a list of indirect jumps, Lj, whose jump targets have not been resolved. When the analysis identifies such a jump, it is added to Lj. After each iterative technique terminates, CFGAccurate triggers the next one in the list. This next technique may resolve jumps in Lj, may add new unresolved jumps to Lj, and may add basic blocks and edges to the CFG C. CFGAccurate terminates when a run of all techniques results in no change to Lj or C, as that means that no further indirect jumps can be resolved with any available analysis.
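这个"轮流触发各项技术,直到不动点"的主循环,可以示意如下(JUMP_TABLE 与 resolve_by_table 是假设的玩具"解析技术",仅用于演示迭代框架):

```python
def iterate_cfg(techniques, Lj, C):
    """轮流触发各项技术,直到完整一轮后 Lj 与 C 均不再变化(到达不动点)。"""
    while True:
        before = (frozenset(Lj), frozenset(C))
        for tech in techniques:
            tech(Lj, C)     # 技术可能解析 Lj 中的跳转,也可能向 C 加边
        if (frozenset(Lj), frozenset(C)) == before:
            return C

# 假设的玩具"解析技术":能从跳转表中查到目标的间接跳转,转化为 CFG 边
JUMP_TABLE = {0x2000: 0x3000}

def resolve_by_table(Lj, C):
    for jump in list(Lj):
        if jump in JUMP_TABLE:
            C.add((jump, JUMP_TABLE[jump]))
            Lj.discard(jump)

unresolved = {0x2000, 0x2004}
cfg = iterate_cfg([resolve_by_table], unresolved, set())
```

查不到表的 0x2004 会留在 Lj 中,等待后续更重的技术(符号执行、后向切片)处理。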

  • Forced Execution. angr’s CFGAccurate leverages the concept of Dynamic Forced Execution for the first stage of CFG recovery. Forced Execution ensures that both directions of a conditional branch will be executed at every branch point. CFGAccurate maintains a work-list of basic blocks, Bw, and a list of analyzed blocks, Ba. When the analysis starts, it initializes its work-list with all the basic blocks that are in C but not in Ba. Whenever CFGAccurate analyzes a basic block from this work-list, the basic block and any direct jumps from the block are added to C. Indirect jumps, however, cannot be handled this way. So each indirect jump is stored in the list Lj for later analysis.

  • Symbolic Execution. For each jump J ∈ Lj, CFGAccurate traverses the CFG backwards until it finds the first merge point or up to a threshold number of blocks. From there, it performs forward symbolic execution to the indirect jump and uses a constraint solver to retrieve possible values for the target of the indirect jump (with a threshold of 256 on the size of the solution set). If the jump is resolved successfully, J is removed from Lj and edges and nodes are added to the CFG for each possible value of the jump target.

  • Backward Slicing. CFGAccurate computes a backward slice starting from the unresolved jump. The slice is extended through the beginning of the previous call context. That is, if the indirect jump being analyzed is in a function Fa that is called from both Fb and Fc, the slice will extend backward from the jump in Fa and contain two start nodes: the basic block at the start of Fb and the one at the start of Fc. CFGAccurate then executes this slice using angr’s symbolic execution engine and uses the constraint engine to identify possible targets of the symbolic jumps, with the same threshold of 256 for the size of the solution set for the jump target. If the jump target is resolved successfully, the jump is removed from Lj, and the edges representing the control flow transitions, together with the target basic blocks, are added to the recovered CFG.

The goal of the fast CFG generation algorithm, CFGFast, is to generate a graph, with high code coverage, that identifies at least the location and content of functions in the binary.

  • Function identification. We use hard-coded function prologue signatures, which can be generated from techniques like ByteWeight, to identify functions inside the application.

  • Recursive disassembly. Recursive disassembly is used to recover the direct jumps within the identified functions.

  • Indirect jump resolution. Lightweight alias analysis, dataflow tracking, combined with pre-defined strategies are used to resolve intra-function control flow transfers.
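其中基于序言签名的函数识别,可以用最朴素的线性扫描示意(签名硬编码为 x86 的 push ebp; mov ebp, esp,仅作演示;真实系统中的签名可由 ByteWeight 等技术生成):

```python
# x86 常见函数序言:push ebp; mov ebp, esp(假设的硬编码签名)
PROLOGUE = bytes.fromhex("558bec")

def find_function_starts(code):
    """在整个字节流中线性扫描签名,把命中的偏移当作函数入口候选。"""
    starts, i = [], 0
    while True:
        i = code.find(PROLOGUE, i)
        if i < 0:
            return starts
        starts.append(i)
        i += 1

# 玩具代码段:90(nop)| 55 8b ec ... c3(ret)| 55 8b ec
code = bytes.fromhex("90558becc3558bec")
starts = find_function_starts(code)
```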

实现:值集分析

Value-Set Analysis (VSA) is a static analysis technique that combines numeric analysis and pointer analysis for binary programs. It uses an abstract domain, called the Value-Set Abstract domain, for approximating possible values that registers or abstract locations may hold at each program point.

  • Creating a discrete set of strided-intervals. The basic data type of VSA, the strided interval, is essentially an approximation of a set of numbers. It is great for approximating a set of normal concrete values. We developed a new data type called “strided interval set”, which represents a set of strided intervals that are not unioned together. A strided interval set will be unioned into a single strided interval only when it contains more than K elements, where K is a threshold that can be adjusted.

  • Applying an algebraic solver to path predicates. Tracking branch conditions helps us constrain variables in a state after taking a conditional exit or during a merging procedure, which produces a more precise analysis result. We implemented a lightweight algebraic solver that works on the strided interval domain, based on modulo arithmetic, which takes care of some of the affine relations. When a new path predicate is seen, we attempt to simplify and solve it to obtain a number range for the variables involved in the path predicate. Then we perform an intersection between the newly generated number range and the original values for each corresponding variable.

  • Adopting a signedness-agnostic domain. Wrapped Interval Analysis is a signedness-agnostic interval domain for analyzing LLVM code, which handles signed and unsigned numbers at the same time. We based our signedness-agnostic strided-interval domain on this theory, applied to the VSA domain.
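"跨距区间集合"在元素超过阈值 K 时才合并的思路,可以用如下玩具实现示意(区间用 (stride, lo, hi) 三元组表示,si_union 的合并方式是假设的简化版):

```python
from math import gcd

K = 2  # 阈值:集合中的跨距区间超过 K 个才真正 union(可调)

def si_union(sis):
    """把若干跨距区间 (stride, lo, hi) 过近似地并成单个区间。"""
    lo = min(s[1] for s in sis)
    hi = max(s[2] for s in sis)
    stride = 0
    for s in sis:
        stride = gcd(stride, s[0])
        stride = gcd(stride, s[1] - lo)   # 各下界之差也要能被 stride 整除
    return (stride, lo, hi)

def add_value(si_set, si):
    si_set = si_set | {si}
    if len(si_set) > K:             # 超过阈值才合并;未超过时分开保存以保精度
        return {si_union(si_set)}
    return si_set

s = add_value(set(), (0, 4, 4))     # 单值 4
s = add_value(s, (0, 8, 8))         # {4, 8}:仍分开保存
s = add_value(s, (0, 16, 16))       # 超过 K=2,并为一个过近似区间
```

最终得到 (4, 4, 16),即 {4, 8, 12, 16},是真实集合 {4, 8, 16} 的一个过近似。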

The main interface that angr provides into a full-program VSA analysis is the Value Flow Graph. The VFG is an enhanced CFG that includes the program state representing the VSA fix-point at each program location.

实现:动态符号执行

The dynamic symbolic execution module of our analysis platform is mainly based on the techniques described in Mayhem. Our implementation follows the same memory model and path prioritization techniques.

We use Claripy’s interface into Z3 to populate the symbolic memory model (specifically, SimSymbolicMemory) provided by SimuVEX. Individual execution paths through a program are managed by Path objects, provided by angr, which track the actions taken by paths, the path predicates, and various other path-specific information. Groups of these paths are managed by angr’s PathGroup functionality, which provides an interface for managing the splitting, merging, and filtering of paths during dynamic symbolic execution.

angr has built-in support for Veritesting, implementing it as a Veritesting analysis and exposing transparent support for it with an option passed to PathGroup objects.

实现:非约束的符号执行

We implemented under-constrained symbolic execution (UCSE), as proposed in UC-KLEE, and dubbed it UC-angr. UCSE is a dynamic symbolic execution technique where execution is performed on each function separately.

We made the following changes to the technique described in UCSE:

  • Global memory under-constraining. We mark all global data as under-constrained, allowing us to lower our false positive rate.

  • Path limiters. We abort the analysis of a function when we find that it is responsible for a path explosion. We detect this by hard-coding a limit: when a single function branches over this many paths, we replace the function with an immediate return and rewind the analysis from the call site of that function.

  • False positive filtering. When we detect an exploitable state, we attempt to ensure that the state is not incorrectly made exploitable by a lack of constraints on under-constrained data.

实现:符号辅助的 fuzzing

Our implementation of symbolic-assisted fuzzing, called Driller, uses the AFL fuzzer as its foundation and angr as its symbolic tracer.

实现:崩溃重现

We implemented the approach proposed by Replayer to recover missing relationships between input values and output values.

We can define the problem of replaying a crashing input as the search for an input specification i_s that brings a program from an initial state s to a crash state q. Our implementation symbolically executes the path from s_a to q_a, using the input i_a. It records all constraints that are generated while executing P. Given these constraints, the execution path, the program P, and the new initial state s_b, we can symbolically execute P with an unconstrained symbolic input, following the previously recorded execution path until the new crash state q_b is reached. At this point, the constraints on the input and output can be analyzed, and relationships between them can be recovered. This relationship data is used to generate the input specification i_s, allowing the crashing input to be replayed.

实现:利用生成

We generate exploits by performing concolic execution on crashing program inputs using angr. We drive concolic execution forward, forcing it to follow the same path as a dynamic trace gathered by concretely executing the crashing input applied to the program. Concolic execution is stopped at the point where the program crashed, and we inspect the symbolic state to determine the cause of the crash and measure exploitability. By counting the number of symbolic bits in certain registers, we can triage a crash into a number of categories such as frame pointer overwrite, instruction pointer overwrite, or arbitrary write, among others.
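"按符号位数量给崩溃分类"的思路可以示意如下(输入格式与类别名均为假设的简化,并非 angr 的真实接口):

```python
def triage_crash(symbolic_bits):
    """symbolic_bits: 名称 -> 崩溃状态中该寄存器/地址里符号位的数量(假设的输入格式)。
    按"哪些关键位置被符号数据控制"粗略归类崩溃。"""
    if symbolic_bits.get("ip", 0) > 0:
        return "instruction pointer overwrite"   # 指令指针可被输入控制
    if symbolic_bits.get("bp", 0) > 0:
        return "frame pointer overwrite"         # 帧指针可被输入控制
    if symbolic_bits.get("write_addr", 0) > 0:
        return "arbitrary write"                 # 写地址可被输入控制
    return "uncategorized"

category = triage_crash({"ip": 64})
```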

实现:利用强化

To harden exploits against modern mitigation techniques, we implemented a ROP chain compiler based on the ideas in Q.

  • Gadget discovery. We scan all executable code in the application, at every byte offset, to identify ROP gadgets and classify them according to their effects. To carry out the classification, our analysis leverages the action history provided by angr’s Path objects and symbolic relations provided by Claripy.

  • Gadget arrangement. The ROP chain compiler then determines arrangements of gadgets that can be used to perform high-level actions.

  • Payload generation. After the ROP compiler identifies the requisite set of gadget arrangements, it combines these gadgets into a chain to carry out high-level actions. This is done by writing gadget arrangements into a program state in angr, constraining their outputs to the provided arguments, and querying the SMT solver for a solution for their inputs.
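gadget 发现中"在每个字节偏移扫描"的第一步,可以用纯字节匹配粗略示意(只找以 ret(0xc3)结尾的短序列,不做真实系统所需的反汇编与语义分类):

```python
RET = 0xC3  # x86 的 ret 指令(单字节)

def find_gadget_bytes(code, max_len=3):
    """在每个字节偏移处收集以 ret 结尾的短字节序列,作为 gadget 候选。"""
    gadgets = set()
    for end, b in enumerate(code):
        if b == RET:
            for start in range(max(0, end - max_len), end + 1):
                gadgets.add(code[start:end + 1])
    return gadgets

# 玩具字节流:5d c3(pop ebp; ret)、90 90(nop)、58 c3(pop eax; ret)
code = bytes.fromhex("5dc3909058c3")
gadgets = find_gadget_bytes(code)
```

由于 x86 指令变长,从任意偏移切入都可能得到合法指令序列,这正是"at every byte offset"扫描的意义所在。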

比较评估
