L09 Machine-Level Programming V: Advanced Topics

Source: https://www.bilibili.com/video/BV1iW411d7hd?p=9

Today

内存安排 Memory Layout

缓冲区溢出 Buffer Overflow

漏洞 Vulnerability

防护 Protection

联合体 Unions【这是一种数据结构】

x86-64 Linux 内存安排

--------------------- \  <- 0000 7FFF FFFF FFFF 由于硬件限制，内存仅使用 47 位(128TB)而非 64 位
|       Stack       |  \
---------------------   \
|         |         |    8MB 此为常用机器的设置，你可以使用 limit 命令查看得到  stacksize 这一数据
|         v         |   /
|                   |  /
--------------------- /
|                   |
|                   |
|                   |
---------------------
|  Shared Libraries |
---------------------
|                   |
|         ^         |
|         |         |
---------------------
|                   |       
|       Heap        |
|                   |
---------------------
|       Data        |
---------------------
|       Text        |
--------------------- <- Hex address 0000 4000 0000 0000
|                   | 
--------------------- <- Hex address 0000 0000 0000 0000

栈 Stack

Runtime stack (8MB limit)

例如本地变量 e. g. local variables

堆 Heap【这部分用于存放所有因 malloc() 或其相关函数申请的变量】

Dynamically allocated as needed

When call malloc(), calloc(), new()

数据区 Data 【这部分存放了所有的全局变量】

（存储）静态分配数据 Statically allocated data

例如全局变量、静态变量、字符串常量 e.g. global vars, static vars, string constants

文本段 Text / Shared Libraries 【文本段用于存放可执行程序】【SL 用于存放库函数对应的可执行程序，动态加载】

（存储）可执行机器指令 Executable machine instructions

只读 Read-only

例子如下：

char big_array[1L<<24];       /*  16 MB */ // -> 0000 0000 8060 1060 Data
char huge_array[1L<<31];      /*   2 GB */ // -> 0000 0000 0060 1060 Data

int global = 0;

int useless() { return 0; }                // -> 0000 0000 0040 0590 Text

int main() {                               // -> 0000 0000 0040 060c Text
    void *p1, *p2, *p3, *p4;
    int local = 0;                         // -> 0000 7ffe 4d3b e87c Stack 
    p1 = malloc(1L << 28);    /* 256 MB */ // -> 0000 7f72 62a1 e010 Heap (Upper)
    p2 = malloc(1L << 8);     /* 256  B */ // -> 0000 0000 8359 d010 Heap (Lower)
    p3 = malloc(1L << 32);    /*   4 GB */ // -> 0000 7f71 62a1 d010 Heap (Upper)
    p4 = malloc(1L << 8);     /* 256  B */ // -> 0000 0000 8359 d120 Heap (Lower)
    /* Some print statements ... */
}

可以看到有上下两个堆，其扩展方向也相反，小空间接近数据区，大空间接近栈区，原因是 malloc() 使用 mmap 分配大块内存，使用 brk 分配小块内存。

缓冲区溢出

一般称作缓冲区溢出 Generally called a "buffer overflow"

当超过了为数组分配的内存大小时 when exceeding the memory size allocated for an array

为什么很严重？ Why a big deal?

这是第一技术漏洞 It's the #1 technical cause of security vulnerability

在所有起因中第一的时社会工程/用户忽略 #1 overall cause is social engineering / user ignorance

最常见的形式 Most common form

未检查字符串输入的长度 Unchecked lengths on string inputs

部分是因为栈上有界字符数组 Particularity for bounded character arrays on the stack

sometimes referred to as stack smashing

字符串库代码 String Library Code

Unix 中函数 gets() 的实现 Implementation of Unix function gets()

/* Get string form stdin */
char *gets(char *dest) {
    int c = getchar();
    char *p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c;
        c = getchar();
    }
    *p = '\0';
    return dest;
}

不去特殊地限制读入字符数量 No way to specify limit on number of characters to read

其他库函数也有类似问题 Similar problems with other library functions

strcpy()、strcat()：复制一定长度的字符串 Copy strings of arbitrary length

scanf()、fscanf()、sscanf()，特别是使用 %s 时 when given %s conversion specifically.

有漏洞的缓冲区代码 Vulnerable Buffer Code

/* Echo Line */
void echo() {
    char buf[4];  /* Way too small! */
    gets(buf);
    puts(buf);
}
void call_echo() {
    echo();
}

unix>./bufdemo-nsp
Type a string:012345678901234567890123

缓冲区溢出反编译 Buffer Overflow Disassembly

0000 0000 0040 06cf <echo>:
  4006cf: 48 83 ec 18        sub     $0x18,%rsp
  4006d3: 48 89 e7           mov     %rsp,%rdi
  4006d6: e8 a5 ff ff ff     callq   400680 <gets>
  4006db: 48 89 e7           mov     %rsp,%rdi
  4006de: e8 3d fe ff ff     callq   400520 <puts@plt>
  4006e3: 48 83 c4 18        add     $0x18,%rsp
  4006e7: c3                 retq

call_echo()：

  4006e8: 48 83 ec 08        sub     $0x8,%rsp
  4006ec: b8 00 00 00 00     mov     $0x0,%eax
  4006f1: e8 d9 ff ff ff     callq   4006cf <echo>
  4006f6: 48 83 c4 08        add     $0x8,%rsp
  4006fa: c3                 retq

缓冲区溢出到栈 Buffer Overflow Stack

--------------------- 
|    Stack Frame    |
|   for call_echo   |   
|                   |   
|                   |  
--------------------- 
|  Return Address   |
|     (8 Bytes)     |
---------------------
|  20 Bytes Unused  |
|                   |
---------------------
|  [3]|[2]|[1]|[0]  | buf <- %rsp
---------------------

如上图所示，我们输入 23 个字符，恰好不越界（字符串结尾自带值为 00 的 \0 字符），用光了整个缓冲区。

再加，则会溢出到返回地址区域，并引发段错误（segmentation fault）【当试图访问不被允许访问的内存区域就会段错误】

代码注入攻击 Code Injection Attacks

void P() {
    Q();
    ...   // <- return address
}
int Q() {
    char buf[64];
    gets(buf);
    ...
    return ...;
}

---------------------  ---
|                   |   ^
|                   |   |
|                   |   P stack frame
---------------------   |---
|         B         |   v ^
---------------------  ---|--- 
|        pad        |     | ^
|                   |     da|ta written by gets()
---------------------     | |
|      exploit      |     v Q stack frame
|       code        | <--B- |
---------------------       |
|                   |       v
---------------------      ---

换句话说，攻击者通过输入特定字符串，将程序返回地址篡改了，从而劫持了程序运行的控制权，进而执行任何该控制权可以执行的命令（即执行注入的代码），当这个命令为最高权限时，攻击者就完全掌握了这个机器。【此事在 NAD pwn 中亦有记载】

Input string contains byte representation of executable code

Overwrite return address A with address of buffer B

When Q() executes ret, will jump to exploit code.

代码注入攻击曾经是一个相当大的问题（你看这些输入字符串的库函数随随便便拉出来一个都有这漏洞），1988 年，第一个互联网攻击被称为「莫里斯蠕虫」（the Morris worm），当时互联网规模还比较小，所以它基本上感染了当时互联网上大部分机器（笑点解析：MIT6.824 分布式系统的课程的作者就是该病毒的作者）

基于缓冲区溢出的攻击 Exploits Based on Buffer Overflows

缓冲区溢出漏洞能让远端机器在受害机上执行任意命令 Buffer overflow bugs can allow remote machines to execute arbitrary code on victim machines

Distressingly common in real programs

Programmers keep making the same mistakes.

Recent measures make these attacks much more difficult

Examples across the decades

Original "Internet worm" (1988)

"IM wars" (1999)

Twilight hack on Wii (2000s)

... and many, many more

You will learn some of the tricks in attack lab

Hopefully to convince you to never leave such holes in your programs.

Example: the original Internet worm (1988)

Exploit a few vulnerability to spread

Early versions of the finger server (fingered) used gets() to read the argument sent by the client.

finger droh @cs.cmu.edu 【现在被广泛禁用的 finger 命令，用于向远端机器发送消息】

Worm attacked fingered server by sending phony argument:

finger "exploit-code padding new-returen-address"

exploit code: executed a root shell on the victim machine with a direct TCP connection to the attacker

Once on a machine, scanned for other machines to attack

invaded about 6000 computers in hours (10% of the Internet)

see June 1989 article in Comm. of the ACM

the young author of the worm was prosecuted...

and CERT was formed... still homed at CMU

Example 2: IM War

July, 1999

Microsoft launches MSN Messenger (instant messaging system)

Messenger clients can access popular AOL Instant Messaging Service (AIM) servers

                              AIM client
                                    ^
                                    |
                                    v 
MSN server <-> MSN client <-> AIM server
                                    ^
                                    |
                                    V
                              AIM client

August, 1999

Mysteriously, Messenger clients can no longer access AIM servers

Microsoft and AOL begin the IM war:

AOL changes server to disallow Messenger clients

Microsoft makes changes to clients to defeat AOL changes

At least 13 such skirmishes

What was really happening?

AOL had discovered a buffer overflow bug in their own AIM clients

They exploited it to detect and block Microsoft: the exploit code returned a 4-byte signature (the bytes at some location in the AIM client to server)

When Microsoft changed code to match signature, AOL changes signature location

AOL 利用漏洞偷信息这件事被一个自称为 Phil Bucking 的人披露，后来确认这封邮件是从微软内部发出的。

异同：蠕虫和病毒 Aside: Worms and Viruses

蠕虫：一段这样的程序 Worm: A program that

可以自行运行 Can run by itself

可以传播一个自己的可工作版本到其他计算机上 Can propagate a fully working version of itself to other computers.

病毒：这样的代码 Virus: Code that

将自己添加到其他程序当中 Adds itself to other programs

无法独立运行 Does not run independently

两者都（通常）被设计出来在计算机之间传播，并且大肆破坏 Both are (usually) designed to spread among computers and to wreak havoc.

如何处理缓冲区溢出 OK, what to do about buffer overflow attacks

1. 避免溢出漏洞 Avoid overflow vulnerabilities

2. 利用系统级别防护 Employ system-level protections

3. 编译器在栈上使用「金丝雀」Have compiler use "stack canaries"【金丝雀是一个常用比喻，以前矿工下煤矿，会带一只金丝雀等脆弱动物来探测矿洞内气体成分变化，人无法察觉出来的气体成分变化能够被它们察觉到，因此金丝雀用于比喻敏感的探测器。】

Lets talk about each...

避免溢出漏洞

/* Echo Line */
void echo() {
    char buf[4]; /* Way too small! */
    fgets(buf, 4, stdin);
    puts(buf);
}

例如，使用限制字符串长度的库函数 For example, use library routines that limit string lengths

使用 fgets() 而不是 gets() fgets instead of gets

使用 strncpy 而不是 strcpy strncpy instead of strcpy

不要使用带 %s 的 scanf() Don't use scanf with %s conversion specification

使用 fgets() 函数来读入字符串 Use fgets to read the string

或者使用 %ns，其中 n 是一个合适的整数 Or use %ns where n is a suitable integer.

系统级别防护 System-Level Protection can help

随机栈偏移 Randomized stack offsets【这种技术也被称为「地址空间布局随机化」ASLR, address space layout randomization】

At start of program, allocate random amount of space on stack

Shifts stack addresses for entire program

Makes it difficult for hacker to predict beginning of inserted code

e.g. 5 executions of memory allocation code

stack repositioned each time program executes

不可执行代码段 Nonexecutable code segments

In traditional x86, can mark region of memory as either "read-only" or "writeable"

Can execute anything readable

x86-64 added explicit "execute" permission【这逐渐演化成了 3 位的读写执权限码】

stack marked as nonexecutable

使用金丝雀 Stack Canaries can help

Idea

Place special value ("canary") on stack just beyond buffer

Check for corruption before exiting function

GCC Implementation

-fstack-pretector

Now the default (disabled earlier)

unix>./bufdemo-sp
Type a string:0123456
0123456

unix>./bufdemo-sp
Type a string:01234567
*** stack smashing detected ***

受保护的缓冲区反编译 Protected Buffer DIsassembly

echo:

  40072f: sub     $0x18,%rsp 
  400733: mov     %fs:0x28,%rax                  #
  40073c: mov     %rax,0x8(%rsp)                 #
  400741: xor     %eax,%eax
  400743: mov     %rsp,%rdi
  400746: callq   4006e0 <gets>
  40074b: mov     %rsp,%rdi
  40074e: callq   400570 <puts@plt>
  400753: mov     0x8(%rsp),%rax                 #
  400758: xor     %fs:0x28,%rax                  #
  400761: je      400768 <echo+0x39>
  400763: callq   400580 <__stack_chk_fail@plt>  #
  400768: add     $0x18,%rsp
  40076c: retq

针对以上常见保护措施，又有了新的攻击手段，其被称为面向返回编程（return oriented programming），成功率可观。前两种都可以比较轻易地破解，金丝雀比较难。

Return-Oriented Programming Attacks

Challenge (for hackers)

Stack randomization makes it hard to predict buffer location

Marking stack nonexecutable makes it hard to insert binary code

Alternative Strategy

Use existing code

e.g. library code from stdlib

string together fragments to achieve overall desired outcome

Does not over come stack canaries

Construct program from gadgets （gadget 指部分可执行程序的一系列字节）

Sequence of instructions ending in ret

Encoded by single byte 0xc3

Code positions fixed from run to run

Code is executable

【面向返回编程就是利用 gadget，gadget 利用各个可执行代码中 ret 及前面的一部分代码，因为返回地址可以得知，所以可以使用缓冲区溢出跳转到这些地方，然后每到一个地方执行一部分代码后返回，这个返回又跳转到下一个 gadget 处，由此拼凑出了可以完成目标的指令序列。】

联合体

联合体内存分配

根据最大元素分配，一次只能使用一个字段 Allocate according to largest element. Can only use one field at a time.

union U1 {
    char c;
    int i[2];
    double v;
} *up;
struct S1 {
    char c;
    int i[2];
    double v;
} *sp;

---
|c|
--+--+-----
|i[0]|i[1]|
-----+-----
|v        |
-----------
up+0      up+8

-----+----+----+----+---------
|c\3B|i[0]|i[1]\4B  |v       |
-----+----+----+----+---------
sp+0 sp+4 sp+8 sp+12sp+16    sp+24

typedef union {
    float f;
    unsigned u;
} bit_float_t;

float bit2float(unsigned u) {
    bit_float_t arg;
    arg.u = u;
    return arg.f;
}
unsigned float2bit(float f) {
    bit_float_t arg;
    arg.f = f;
    return arg.u;
} 
// 此处 u 和 f 的空间完全重叠。

Byte Ordering Revisited

Idea

Short/long/quad words stored in memory as 2/4/8 consecutive bytes

Which byte is most (least) significant?

Can cause problems when exchanging binary data between machines

大端方式 Big Endian

Most significant byte has lowest address

Sparc

小端方式 Little Endian

Least significant byte has lowest address

Intel x86, ARM Android and IOS

双端方式 Bi Endian

Can be configured either way

ARM

联合体最初是用来解决字节序列切换问题的，让数据在大端小端之间切换。

IA32 使用小端，Sun 使用大端，x86-64 使用小端，总之现在大多数机器都是小端

Summary of Compound Types in C

Arrays

Contiguous allocation of memory

Aligned to satisfy every element's alignment requirement

Pointer to first element

No bounds checking

Structures

Allocate bytes in order declared

Pad in middle and at end to satisfy alignment

Unions

Overlay declaration

Way to circumvent type system