Secure Dev Part 1 - Memory Safety in C
this series is about writing code that doesn’t get you or your company into a CVE report. not “sanitize your inputs lol.” but real life bugs and the exact tools that catch them. part 1 is memory safety in C. part 2 will be privilege separation. part 3 input validation. by the end you should be able to sit in a security audit, read every finding, and know exactly what went wrong at the assembly level.
this post is long. open a terminal next to it. the only way any of this actually lands is if you run the commands yourself.
The email
three weeks into your internship at Pied Piper. your first real task: write a small C utility for the compression pipeline. it takes a username, compression level, session token. logs stuff, stores session state. you spent two days on it. it compiled, it ran, you went home feeling like a person who had accomplished something.
then you woke up to this.
From: bertram.gilfoyle@piedpiper.com
To: you@piedpiper.com
Subject: re: your C submission
I went through your code.
It compiled. I want to be transparent about how meaningless that is, so I’ll put it this way: my mother’s Windows XP laptop with 512MB of RAM running her slot machine game also compiles, and yet nobody is putting it in a security-critical compression pipeline. The bar you cleared is the same bar. Congratulations.
Now. I found eight classes of memory safety vulnerabilities in sixty lines of C. Not bugs. Classes. Stack buffer overflow, heap overflow, integer overflow feeding your malloc so the size is wrong before anything else can be wrong, use-after-free because apparently freeing memory and then reading from it is just how you live your life, a format string vulnerability where you passed user input directly as the format argument to sprintf which I want you to think about for a second and feel something, an uninitialized struct you read a return value from, a signed-to-unsigned mismatch in the function you called “safe_read” which is the funniest fucking thing I’ve seen this quarter, and a memory leak.
If this binary ever processes network input, which it will, because that is what session token processors do you dense fuck, someone competent is going to spend an afternoon reading our private keys out of heap memory, overwrite your return address with a ROP chain, and escalate to root. Before dinner. Using a laptop. Probably your laptop since you apparently left it unlocked.
I’m attaching the source. Study it until you understand what each of those things means at the register level, not at the “I googled it” level. Don’t come back with “I added bounds checking” because I will ask you what the stack layout looks like when strcpy overflows and if you cannot draw it I’m not interested.
— Gilfoyle
P.S. naming a function “safe_read” and then passing a signed int directly to memcpy is the most confident wrongness I have encountered in a professional context. Your mom would be proud. I am not. She probably would be though, she seems like a warm and supportive woman.
okay. let’s go through every single thing he found. not at a surface level. at the level where you actually understand what’s happening to registers and memory.
What a binary actually is before you run it
before touching any bugs you need to understand what happens between gcc pipeline.c -o pipeline and the first instruction executing. this is not optional. the memory layout that makes buffer overflows possible comes directly from how ELF binaries are structured and how the OS loads them.
ELF stands for Executable and Linkable Format. it’s the binary format Linux uses for executables, shared libraries, and object files. every ELF file starts with a fixed-size header that describes the file type, target architecture, where to find the entry point, and where the section headers and program headers live on disk. you can read this header directly:
$ readelf -h pipeline
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Entry point address: 0x401060
Start of program headers: 64 (bytes into file)
Start of section headers: 13672 (bytes into file)
Number of sections: 31
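the same fields can be read from C. a minimal sketch, assuming a Linux system that ships <elf.h>. elf64_entry is a made-up helper name, not part of any library. it checks the four magic bytes and the class byte, then pulls out the entry point:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical helper: returns 1 and stores the entry point if buf starts
   with a 64-bit ELF header, returns 0 otherwise */
int elf64_entry(const unsigned char *buf, size_t len, uint64_t *entry) {
    Elf64_Ehdr hdr;

    if (len < sizeof hdr) return 0;                 /* too short to hold a header */
    memcpy(&hdr, buf, sizeof hdr);                  /* copy out to avoid alignment issues */

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0) return 0;   /* 7f 45 4c 46 */
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64) return 0;         /* the "Class: ELF64" byte */

    *entry = hdr.e_entry;                           /* the "Entry point address" field */
    return 1;
}
```

feed it the first 64 bytes of any binary read with fread and it should agree with what readelf -h prints.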
that magic value at the start, 7f 45 4c 46, is how tools like file and the dynamic linker figure out they’re looking at an ELF binary in about 4 bytes. the Entry point address: 0x401060 is where the CPU actually starts executing, and it’s not main. it’s a function called _start that the linker adds. _start sets up the C runtime environment and then calls your main. you can see this:
$ objdump -d -M intel pipeline | grep -A 20 "<_start>:"
0000000000401060 <_start>:
401060: xor ebp, ebp
401062: mov r9, rdx
401065: pop rsi
401066: mov rdx, rsp
401069: and rsp, 0xfffffffffffffff0
40106d: push rax
40106e: push rsp
40106f: mov r8, 0x401200
401076: mov rcx, 0x401190
40107d: mov rdi, 0x401136 ; address of main()
401084: call 401030 <__libc_start_main@plt>
401089: hlt
__libc_start_main gets the address of main in rdi (first argument register on x86-64). it sets up atexit handlers, initializes the C library, then calls your main. that @plt suffix after __libc_start_main is the PLT, which becomes relevant when we talk about RELRO.
now look at the sections inside the binary. an ELF binary is organized into sections, each with a specific purpose and permissions:
$ readelf -S pipeline | grep -E "Name|\.text|\.data|\.bss|\.rodata|\.plt|\.got|\.interp|\.dynamic"
[ 1] .interp PROGBITS ... A interpreter path
[11] .plt PROGBITS ... AX PLT stubs (executable)
[14] .text PROGBITS ... AX your compiled code
[16] .rodata PROGBITS ... A read-only constants
[22] .dynamic DYNAMIC ... WA dynamic linking info
[23] .got PROGBITS ... WA global offset table
[24] .got.plt PROGBITS ... WA PLT's got entries
[25] .data PROGBITS ... WA initialized globals
[26] .bss NOBITS ... WA zero-initialized globals
the Flg column: A means the section gets allocated into memory when the binary runs. X means executable. W means writable. .text is AX: your code lives there, the CPU reads and executes it, nobody writes to it during normal execution. .data is WA: writable, not executable. hardware page permissions enforce this separation, and it’s what NX means in checksec output.
.bss is NOBITS, which is interesting. it takes zero bytes in the ELF file on disk but gets real memory allocated at runtime. the OS fills it with zeros. your uninitialized global variables live here and they start clean. your local stack variables don’t get this treatment. that difference becomes the uninitialized memory read bug later.
$ size pipeline
text data bss dec hex filename
4321 632 48 5001 1389 pipeline
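a quick way to see the .bss guarantee from inside a program. a sketch with made-up names (session_table, bss_starts_zeroed). the only thing it relies on is the C rule that objects with static storage duration and no initializer start out as zero:

```c
#include <stddef.h>

/* no initializer: the toolchain places this in .bss, so it costs
   runtime memory but almost nothing in the file on disk */
static unsigned char session_table[4096];

/* returns 1 if every byte is still zero, 0 otherwise */
int bss_starts_zeroed(void) {
    for (size_t i = 0; i < sizeof session_table; i++) {
        if (session_table[i] != 0) return 0;
    }
    return 1;
}
```

a local array declared inside a function gets no such promise, which is the whole problem in the uninitialized read section.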
.interp contains the path to the dynamic linker, the user-space program that loads your binary and resolves shared library dependencies. check it:
$ readelf -p .interp pipeline
String dump of section '.interp':
[ 0] /lib64/ld-linux-x86-64.so.2
when you run ./pipeline, the kernel maps the binary’s loadable segments, sees the PT_INTERP program header, maps ld-linux-x86-64.so.2 alongside it, and hands control to the dynamic linker. ld-linux then loads the shared library dependencies using mmap, does relocations, and finally jumps to _start. a lot of code has already run by the time your main starts.
The PLT, GOT, lazy binding, and why RELRO exists
okay this is the part that makes checksec output actually mean something instead of just being words you memorize.
when your C code calls printf, that function lives in libc.so. the linker doesn’t know where libc will be in memory at compile time because ASLR randomizes it. so instead of a direct call, your code calls a stub in the PLT.
the PLT (Procedure Linkage Table) is a section of short stubs, one per external function. look at what it looks like:
$ objdump -d -M intel pipeline | grep -A 8 "<printf@plt>"
0000000000401040 <printf@plt>:
401040: jmp QWORD PTR [rip+0x2fc2] # 404008 <printf@GLIBC_2.2.5>
401046: push 0x2
40104b: jmp 401010 <_init+0x20>
that jmp QWORD PTR [rip+0x2fc2] reads an address from location 0x404008 and jumps to it. 0x404008 is a slot in the GOT (Global Offset Table), specifically in .got.plt.
first time you call printf@plt: the GOT slot points back into the PLT stub itself, to the push 0x2 instruction. so the jump lands there, pushes an identifier for printf onto the stack, then jumps to a default PLT stub that calls the dynamic linker. the dynamic linker looks up printf in libc, gets its real address, writes it into that GOT slot. done. subsequent calls: the GOT slot has the real printf address, the jmp goes directly there, dynamic linker not involved. this whole deferred resolution process is lazy binding.
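the mechanism is easier to hold in your head as a function pointer that rewrites itself. this is a toy model, not how glibc implements it. got_slot, resolver_stub, and real_add are invented names standing in for the GOT entry, the PLT stub plus dynamic linker, and the library function:

```c
typedef int (*add_fn)(int, int);

static int resolver_stub(int a, int b);   /* plays the PLT stub plus dynamic linker */

static add_fn got_slot = resolver_stub;   /* the "GOT slot": starts out pointing at the stub */
static int resolve_count = 0;

/* plays the real function living in a shared library */
static int real_add(int a, int b) { return a + b; }

static int resolver_stub(int a, int b) {
    resolve_count++;          /* the expensive symbol lookup would happen here, once */
    got_slot = real_add;      /* patch the slot with the real address */
    return got_slot(a, b);    /* then finish the call that triggered the lookup */
}

/* every caller goes through the slot, like jmp QWORD PTR [got entry] */
int add_via_slot(int a, int b) { return got_slot(a, b); }

int times_resolved(void) { return resolve_count; }
```

call add_via_slot twice and times_resolved stays at 1: the second call never touches the resolver. it also shows the weakness: got_slot is an ordinary writable variable, so whoever can write to it decides where every later call goes.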
you can watch it happen in real time:
$ gdb -q ./pipeline
(gdb) b printf
(gdb) run <<< "alice 1 5"
(gdb) info address printf
Symbol "printf" is a function at address 0x7f... (this is libc's address)
(gdb) x/xg 0x404008
0x404008: 0x00007f... (same address now stored in GOT)
the GOT is writable memory. if an attacker can overwrite a GOT entry, every subsequent call to that function goes to the attacker’s address. format string %n can do this. heap overflows that corrupt adjacent memory can do this.
full RELRO fixes this. with -Wl,-z,relro,-z,now, the linker forces eager binding at startup, resolves all PLT entries before main runs, then marks the .got.plt region read-only with mprotect. after that, no process-level write can touch those pages. format string attack that overwrites a GOT entry? SIGSEGV on the write, not on execution.
# partial RELRO (default): .got is protected, .got.plt is still writable
$ checksec --file=./pipeline
RELRO: Partial RELRO
# full RELRO: everything resolved at startup, whole GOT read-only
$ gcc -Wl,-z,relro,-z,now pipeline.c -o pipeline_relro
$ checksec --file=./pipeline_relro
RELRO: Full RELRO
the tradeoff with full RELRO is startup time: all library functions get resolved at program start rather than first use. for a long-running server, completely negligible. for a program that starts and exits in milliseconds, sometimes matters.
Symbols and what strip does to your binary
when gcc compiles your code, it emits symbols: a mapping from human-readable names to addresses. function symbols say “this function lives at this address and is this many bytes long”. you can see them with readelf -s or nm:
$ nm pipeline | grep -E "process_user|safe_read|log_entry"
00000000004011a6 T process_user
0000000000401289 T safe_read
000000000040118a T log_entry
T means the symbol is in the text (code) section. t would mean local. U means undefined, referenced but not defined here (like printf).
stripped binaries have these removed:
$ strip pipeline
$ nm pipeline 2>&1
nm: pipeline: no symbols
$ nm -D pipeline # dynamic symbols stay even after strip
0000000000000000 w __gmon_start__
0000000000000000 U printf@GLIBC_2.2.5
0000000000000000 U strcpy@GLIBC_2.2.5
.dynsym stays because the dynamic linker needs it at runtime to resolve dependencies. .symtab goes because it’s only useful for debugging and reverse engineering. this is why stripped malware is harder to analyze: objdump still gives you the assembly, but function names become 0x4011a6 instead of process_user.
symbols make disassembly dramatically easier. without them, figuring out where one function ends and another begins in objdump output requires recognizing function prologues and epilogues by pattern. with the -g flag during compilation, you also get DWARF debug info embedded in the binary, which gives you source line numbers in GDB and Valgrind output. always compile with -g during development.
Stack frames, the x86-64 ABI, and what actually gets overwritten
x86-64 Linux uses the System V AMD64 ABI. when you call a function, arguments go in registers: rdi, rsi, rdx, rcx, r8, r9 for the first six integer/pointer values. overflow beyond six? stack. return value comes back in rax.
the rsp register is the stack pointer, always pointing to the top of the stack. stack grows down toward lower addresses. the rbp register is the base pointer, anchoring the current frame. call instruction pushes the return address onto the stack then jumps. ret pops it back into rip.
standard function prologue:
push rbp ; push caller's base pointer
mov rbp, rsp ; our frame's base is now here
sub rsp, 0x70 ; allocate 112 bytes for locals
now the frame looks like this, from bottom (lower addr) to top:
[rbp - 0x70] ... local variables, buffers
[rbp - 0x08] ... more locals
[rbp + 0x00] <- saved rbp (caller's base pointer)
[rbp + 0x08] <- return address (CPU jumps here on ret)
the return address is 8 bytes above rbp. local buffers sit below rbp, and writes into them move toward higher addresses. overflow a local buffer past its end and you travel upward through memory, hitting saved rbp at an offset equal to the buffer’s distance below rbp, and the return address 8 bytes after that.
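you can ask the compiler to confirm that layout. a small sketch using two GCC/Clang builtins, assuming an x86-64 build where the function keeps a frame pointer (calling __builtin_frame_address(0) normally makes the compiler set one up). the function name is made up:

```c
/* noinline so the function gets a real stack frame of its own */
__attribute__((noinline))
int return_address_sits_above_saved_rbp(void) {
    void **frame = (void **)__builtin_frame_address(0);   /* the value of rbp in this frame */
    void *ret = __builtin_return_address(0);              /* where ret will jump */

    /* frame[0] is the saved rbp, frame[1] is [rbp + 8] */
    return frame[1] == ret;
}
```

if it returns 1, the 8 bytes directly above the saved base pointer are the return address, which is the slot an overflow is aiming for.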
let’s confirm with actual disassembly of process_user:
$ gcc -g -O0 -fno-stack-protector -no-pie pipeline.c -o pipeline_debug
$ objdump -d -M intel pipeline_debug | grep -A 30 "<process_user>:"
00000000004011a6 <process_user>:
4011a6: push rbp
4011a7: mov rbp,rsp
4011aa: sub rsp,0x70 ; 112 bytes reserved for locals
4011ae: mov QWORD PTR [rbp-0x68],rdi ; username
4011b2: mov DWORD PTR [rbp-0x6c],esi ; level
4011b5: mov DWORD PTR [rbp-0x70],edx ; count
...
; buf[64] is at [rbp-0x50], which is rbp minus 80
4011be: lea rax,[rbp-0x50]
4011c2: mov rdi,rax
4011c5: mov rsi,QWORD PTR [rbp-0x68]
4011c9: call 401060 <strcpy@plt>
buf is at [rbp-0x50] = 80 bytes below rbp. to reach the saved rbp: write 80 bytes. to reach the return address: write 88 bytes.
The code
here is what you submitted. every bug is in here. it looks completely ordinary:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LOG 256
#define TOKEN_SIZE 32
typedef struct {
    int user_id;
    int level;
    char tag[8];
} UserSession;
void log_entry(char *fmt, char *msg) {
    char logbuf[MAX_LOG];
    sprintf(logbuf, fmt, msg); /* format string bug */
    fprintf(stderr, "%s\n", logbuf);
}
char *init_token_store(int count) {
    return malloc(count * TOKEN_SIZE); /* integer overflow in size calc */
}
int process_user(char *username, int level, int count) {
    char buf[64];
    UserSession session; /* declared, never initialized */
    strcpy(buf, username); /* stack buffer overflow */
    char *store = init_token_store(count);
    if (!store) return -1;
    memcpy(store, username, count * TOKEN_SIZE); /* heap overflow */
    log_entry("User logged in: %s", username);
    free(store);
    char *backup = malloc(64);
    /* backup is never freed = memory leak */
    return session.user_id; /* reading uninitialized memory */
}
int safe_read(char *buf, int max_len, int user_len) {
    if (user_len < max_len) { /* signed/unsigned mismatch */
        memcpy(buf, "input", user_len);
    }
    return 0;
}
int main() {
    char username[64];
    int level, count;
    printf("Username: ");
    fgets(username, sizeof(username), stdin);
    username[strcspn(username, "\n")] = '\0';
    printf("Level: ");
    scanf("%d", &level);
    printf("Count: ");
    scanf("%d", &count);
    process_user(username, level, count);
    return 0;
}
compiled fine. no warnings. zero flags. let’s go.
Stack buffer overflow
strcpy(buf, username). strcpy(char *dst, const char *src) copies bytes until null terminator. the whole function is basically:
while (*dst++ = *src++) ;
no size argument. no length awareness. the function signature has no place to put destination size because C strings are null-terminated not length-prefixed. strcpy has no idea buf is 64 bytes, and it doesn’t care.
buf is 64 bytes. what’s above it on the stack? saved rbp starts 80 bytes in. the return address starts 88 bytes in. write more than 80 bytes of username into the 64-byte buffer and you’re overwriting the saved base pointer. write more than 88 and you’re overwriting where the CPU jumps when this function returns: bytes 88 through 95 of the input become the new return address.
$ gcc -g -O0 -fno-stack-protector -no-pie pipeline.c -o pipeline_vuln
$ python3 -c "import sys; sys.stdout.buffer.write(b'A'*88 + b'\n')" | ./pipeline_vuln
Segmentation fault (core dumped)
$ python3 -c "
import sys
# 88 bytes cover buf, the padding above it, and saved rbp; the next 8 overwrite the return addr
payload = b'A' * 88 + b'BBBBBBBB' + b'\n'
sys.stdout.buffer.write(payload)
" | gdb -q ./pipeline_vuln
Program received signal SIGSEGV
(gdb) info registers rip
rip 0x4242424242424242 0x4242424242424242
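0x4242424242424242 is BBBBBBBB. we picked where the CPU jumped. (strictly, that value is not a canonical x86-64 address, so the fault fires on the ret itself and some setups show rip still parked on the ret with the Bs sitting at the top of the stack. a real payload uses a valid address and lands in rip directly.) from here an attacker checks what mitigations are present: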
$ checksec --file=./pipeline_vuln
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE
NX means the stack isn’t executable so jumping to injected shellcode won’t work. but no PIE means the binary loads at the same address every run, which means objdump addresses are valid every run. no canary means nothing detects the return address overwrite before ret executes.
NX without PIE means ROP. you find short instruction sequences that already exist in the binary’s code, ending in a ret, called gadgets. chain their addresses together in the overflow payload and the CPU bounces between them executing your logic using code that was already there:
$ ROPgadget --binary ./pipeline_vuln | grep "pop rdi ; ret"
0x00000000004012a3 : pop rdi ; ret
$ ROPgadget --binary ./pipeline_vuln | grep ": ret$"
0x000000000040101a : ret
basic shell chain: overflow fills to return address, then lay down addresses: pop rdi ; ret gadget, then address of a /bin/sh string in memory, then address of system. CPU hits pop rdi, pulls /bin/sh address into rdi (first arg), hits ret, jumps to system. shell.
add a canary and the picture changes:
$ gcc -g -O0 -no-pie pipeline.c -o pipeline_canary
$ python3 -c "import sys; sys.stdout.buffer.write(b'A'*88 + b'\n')" | ./pipeline_canary
*** stack smashing detected ***: terminated
the canary is a random 8-byte value inserted between your locals and rbp by the compiler. before ret executes, the function reads the canary back and compares it to the original value stored in the thread-local storage region (fs:0x28). mismatch means someone wrote past the buffer and the process dies before jumping anywhere.
look at it in the assembly:
$ objdump -d -M intel pipeline_canary | grep -A 40 "<process_user>:"
<process_user>:
push rbp
mov rbp,rsp
sub rsp,0x80
mov rax, QWORD PTR fs:0x28 ; read thread-local canary
mov QWORD PTR [rbp-0x8],rax ; store it just below saved rbp
...
; at function exit:
mov rax, QWORD PTR [rbp-0x8] ; load stored canary
xor rax, QWORD PTR fs:0x28 ; xor with original
je .L_clean ; match = return normally
call __stack_chk_fail ; mismatch = abort
the canary value comes from fs:0x28, a per-thread slot initialized randomly per process. different every run. you can’t guess it.
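the same check, written out by hand. this is a simulation, not what the compiler emits: one byte array stands in for the frame (buffer first, canary right after it) so the overrun stays inside a single object and the demo itself is well defined. BUF_SIZE and copy_and_check_canary are made-up names:

```c
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 16 };

/* returns 0 if the canary survived the copy, -1 if it was smashed */
int copy_and_check_canary(const void *input, size_t len, uint64_t canary) {
    unsigned char frame[BUF_SIZE + sizeof(uint64_t)];   /* [ buf | canary ] */
    uint64_t stored;

    if (len > sizeof frame) len = sizeof frame;         /* keep the demo itself in bounds */

    memcpy(frame + BUF_SIZE, &canary, sizeof canary);   /* prologue: place the canary above buf */
    memcpy(frame, input, len);                          /* the copy that ignores BUF_SIZE */
    memcpy(&stored, frame + BUF_SIZE, sizeof stored);   /* epilogue: read it back */

    return stored == canary ? 0 : -1;                   /* mismatch = __stack_chk_fail territory */
}
```

16 bytes of input fit and return 0. the 17th byte lands on the canary and the function returns -1, which is the point where the real thing aborts instead of executing ret.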
you can bypass it if you have an information leak first. the format string bug in this same codebase would let you read the canary off the stack before triggering the overflow. that’s why you need to fix all the bugs, not just the one Gilfoyle is yelling about.
real world: CitrixBleed (CVE-2023-4966). buffer overread in Citrix NetScaler. attackers read live session tokens straight out of process memory. no auth, no noise, no warning in the logs. LockBit ransomware operationalized it within weeks. Boeing, Comcast, and others. the bug: the code treated snprintf’s return value (the length the output would have needed) as the number of bytes to send, so an oversized request made it send whatever sat past the end of the response buffer.
the fix:
/* strncpy: copies at most n bytes. will NOT null-terminate if src is n bytes or longer.
passing sizeof(buf) - 1 and adding the null byte yourself keeps the string terminated */
strncpy(buf, username, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0';
/* strlcpy (BSD; in glibc since 2.38, otherwise via libbsd): always null-terminates.
returns strlen(src), the length it tried to create.
if return >= sizeof(buf), truncation happened. check this. */
size_t written = strlcpy(buf, username, sizeof(buf));
if (written >= sizeof(buf)) {
return -1; /* don't silently accept truncated data in security code */
}
/* snprintf: most portable, same guarantees as strlcpy */
snprintf(buf, sizeof(buf), "%s", username);
checking the strlcpy return value matters. silent truncation can itself be a security bug. if a username gets truncated from admin_alice to admin and downstream code does a lookup by that value, you might be authenticating as a different user. always check.
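if strlcpy isn’t available on your platform, the same reject-on-truncation behavior can be built from snprintf’s return value. a minimal sketch. copy_or_reject is a made-up name, and clearing the destination on failure is a design choice, not a requirement:

```c
#include <stddef.h>
#include <stdio.h>

/* copies src into dst only if it fits completely, including the terminator.
   returns 0 on success, -1 on bad arguments or truncation */
int copy_or_reject(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || src == NULL || dst_size == 0) return -1;

    int needed = snprintf(dst, dst_size, "%s", src);   /* length src wanted, excluding '\0' */

    if (needed < 0 || (size_t)needed >= dst_size) {
        dst[0] = '\0';        /* don't leave a truncated name lying around */
        return -1;
    }
    return 0;
}
```

with an 8-byte buffer, "alice" is accepted and "admin_alice" is refused outright instead of quietly turning into "admin_a".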
gets was literally removed from C11. strcat, sprintf: same size-blindness problem, bounded replacements exist for both.
Heap buffer overflow
malloc(count * TOKEN_SIZE) allocates the store. memcpy(store, username, count * TOKEN_SIZE) fills it. the sizes look like they match. the bug is that count is user-supplied and unbounded, but that’s the next section. the deeper point is what happens when you write past a heap allocation’s end.
when you call malloc(n), glibc’s ptmalloc2 allocator doesn’t hand you exactly n raw bytes. it gives you a chunk. every chunk has a 16-byte header sitting immediately before the pointer you receive. the header contains prev_size (size of previous chunk, used during coalescing on free) and size (this chunk’s size, with the three lowest bits used as flags):
$ cat heap_layout.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main() {
char *a = malloc(8);
char *b = malloc(8);
/* read chunk headers: they sit 8 bytes before the pointer */
uint64_t a_size = *((uint64_t*)(a - 8));
uint64_t b_size = *((uint64_t*)(b - 8));
printf("a ptr: %p\n", (void*)a);
printf("a chunk size: 0x%lx (size: %lu, PREV_INUSE: %lu)\n",
a_size, a_size & ~7UL, a_size & 1);
printf("b ptr: %p\n", (void*)b);
printf("b chunk size: 0x%lx\n", b_size);
printf("gap: %ld bytes\n", b - a);
return 0;
}
$ gcc heap_layout.c -o heap_layout && ./heap_layout
a ptr: 0x55f1a2b012a0
a chunk size: 0x21 (size: 32, PREV_INUSE: 1)
b ptr: 0x55f1a2b012c0
b chunk size: 0x21
gap: 32 bytes
you asked for 8 bytes. you got a 32-byte chunk, the smallest chunk 64-bit glibc hands out: header plus your data, rounded up to 16-byte alignment. chunk A’s data is at 0x...2a0, chunk B’s at 0x...2c0, exactly 32 bytes later. they’re physically adjacent in memory, and B’s size field sits at b - 8, only 24 bytes past the start of A’s data.
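the rounding can be written down as arithmetic. a sketch of the request-to-chunk-size calculation under the usual 64-bit glibc assumptions (8-byte size field, 16-byte alignment, 32-byte minimum chunk). the constant and function names are mine, and the real allocator also guards against huge requests, which this skips:

```c
#include <stddef.h>

/* assumptions for 64-bit glibc, not values read from the library */
enum { SIZE_FIELD = 8, ALIGNMENT = 16, MIN_CHUNK = 32 };

/* how big a chunk a malloc(request) ends up occupying */
size_t chunk_size_for(size_t request) {
    size_t need = request + SIZE_FIELD;                                   /* data plus size field */
    size_t rounded = (need + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);   /* round up to 16 */
    return rounded < MIN_CHUNK ? MIN_CHUNK : rounded;
}
```

chunk_size_for(8) and chunk_size_for(24) both give 32, and 25 bytes is the first request that needs a 48-byte chunk. that is why small allocations of slightly different sizes end up as neighbors with identical spacing.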
overflow into the adjacent chunk:
$ cat heap_overflow.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
int main() {
char *a = malloc(8);
char *b = malloc(8);
printf("before: b size field = 0x%lx\n", *((uint64_t*)(b-8)));
memset(a, 0x41, 32); /* overflow: 32 bytes into 8-byte buffer */
printf("after: b size field = 0x%lx\n", *((uint64_t*)(b-8)));
free(a);
free(b); /* allocator reads b's now-corrupted header */
return 0;
}
$ gcc -g heap_overflow.c -o heap_overflow && ./heap_overflow
before: b size field = 0x21
after: b size field = 0x4141414141414141
free(): invalid pointer
Aborted (core dumped)
chunk B’s size field is now garbage. when free(b) runs, the allocator tries to find the next chunk by adding the (corrupt) size to the current address, jumps somewhere ridiculous, tries to read chunk metadata there, dies.
in exploitation you put specific values in that corrupted header. glibc 2.26+ uses tcache, a per-thread free list. when you free a chunk, the allocator writes a pointer to the next free chunk into the first 8 bytes of the freed chunk’s user data. corrupt that pointer and the next malloc can return an attacker-chosen address:
$ cat tcache_poison.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int target = 0xdeadbeef; /* we want to overwrite this */
int main() {
void *a = malloc(32);
void *b = malloc(32);
free(b); /* b enters tcache, b's user area now holds: next_free_ptr = NULL */
free(a); /* a enters tcache, a's user area now holds: next_free_ptr = b */
/* corrupt a's tcache forward pointer to point at &target */
*((long*)a) = (long)&target;
void *x = malloc(32); /* pops a from tcache */
void *y = malloc(32); /* pops our poisoned entry -> returns &target */
printf("y == &target: %s\n", y == (void*)&target ? "yes" : "no");
*(int*)y = 0xcafebabe; /* write to y = write to target */
printf("target is now: 0x%x\n", target);
return 0;
}
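in real exploitation the forward pointer corruption comes from the heap overflow itself, not from direct memory access like this demo. but the chain is the same: overflow corrupts a tcache metadata pointer, next malloc returns attacker-chosen address, write goes wherever the attacker wants. on glibc 2.32 and later the stored pointer is XOR-mangled with the chunk’s own address (safe-linking) and malloc aborts on a misaligned result, so this demo as written only behaves like this on older glibc, and a modern exploit needs a heap address leak first.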
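stripped of glibc specifics, the trick is just a free list that believes whatever its chunks say. a toy model with invented names (chunk, toy_free, toy_alloc). it is not tcache, only the same shape: a LIFO list whose next pointer lives inside the freed memory:

```c
#include <stddef.h>

typedef union chunk {
    union chunk *next;          /* meaningful only while the chunk is free */
    unsigned char data[32];     /* meaningful only while the chunk is in use */
} chunk;

static chunk *free_list = NULL;

static void toy_free(chunk *c) { c->next = free_list; free_list = c; }

static chunk *toy_alloc(void) {
    chunk *c = free_list;
    if (c != NULL) free_list = c->next;   /* trusts the pointer stored in the chunk */
    return c;
}

/* returns 1 if the second allocation lands on the attacker's target */
int poisoned_alloc_returns_target(void) {
    static chunk a, b, target;

    free_list = NULL;
    toy_free(&b);
    toy_free(&a);              /* list is now a -> b */

    a.next = &target;          /* stands in for an overflow rewriting a's forward pointer */

    chunk *x = toy_alloc();    /* hands out a */
    chunk *y = toy_alloc();    /* hands out target instead of b */
    return x == &a && y == &target;
}
```

nothing in toy_alloc can tell a legitimate next pointer from a forged one, which is the gap the allocator hardening tries to close.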
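real world: CVE-2024-38812, VMware vCenter Server heap overflow in its DCERPC protocol handling. unauthenticated remote code execution for anyone with network access to the server. reported through the 2024 Matrix Cup contest.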
/* fix */
char *init_token_store(size_t count) {
if (count == 0 || count > 1024) return NULL;
/* calloc checks the multiplication for overflow internally
and zeros the allocation, killing two bugs at once */
return calloc(count, TOKEN_SIZE);
}
Integer overflow in allocation size
malloc(count * TOKEN_SIZE) with count as user-supplied int has a time bomb in it. int is 32-bit signed. max value: 2,147,483,647. TOKEN_SIZE is 32. multiply anything above INT_MAX / 32 (67,108,863) by 32 and the product doesn’t fit in 32 bits. signed overflow is formally undefined behavior in C. in practice, on a typical unoptimized x86-64 build, it wraps.
$ cat intover.c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
int main() {
int count = 134217729; /* 0x08000001 */
int token_size = 32;
int product = count * token_size;
printf("expected: %lld\n", (long long)count * token_size);
printf("actual: %d\n", product);
printf("as size_t to malloc: %zu\n", (size_t)product);
printf("malloc would get: %zu bytes\n", (size_t)(count * token_size));
return 0;
}
$ gcc intover.c -o intover && ./intover
expected: 4294967328
actual: 32
as size_t to malloc: 32
malloc would get: 32 bytes
134217729 * 32 should be 4,294,967,328. doesn’t fit in 32-bit int. wraps to 32. malloc(32) succeeds and returns a 32-byte buffer. in this code the write also uses the same overflowed expression so it also writes 32 bytes, which happens to fit. in real codebases the allocation and the write often happen in different scopes with different type contexts and evaluate to different values. you get a 32-byte allocation and a write of several gigabytes into whatever comes after it in the heap.
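the "different type contexts" problem fits in two functions. a sketch with made-up names. it uses unsigned 32-bit arithmetic so the wraparound is defined behavior rather than the undefined signed case, but the numbers come out the same:

```c
#include <stdint.h>

/* the size as computed in 32-bit arithmetic: wraps modulo 2^32 */
uint32_t size_in_32_bits(uint32_t count, uint32_t item_size) {
    return count * item_size;
}

/* the size the caller actually meant: widened before multiplying */
uint64_t size_in_64_bits(uint32_t count, uint32_t item_size) {
    return (uint64_t)count * item_size;
}
```

for count = 134217729 and item_size = 32 the first returns 32 and the second returns 4294967328. allocate with one and copy with the other and you have the heap overflow.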
-Woverflow won’t catch this. that flag only catches constants that overflow at compile time. this is a runtime arithmetic overflow. the compiler has no visibility into what count will be at runtime. you need explicit checks:
#include <stdint.h>
#include <stdlib.h>
/* option 1: manual overflow check before multiplying */
char *init_token_store(size_t count) {
    if (count == 0 || count > SIZE_MAX / TOKEN_SIZE) {
        return NULL;
    }
    return calloc(count, TOKEN_SIZE);
}
/* option 2: let GCC or Clang emit the overflow check for you */
char *init_token_store_builtin(size_t count) {
    size_t total;
    if (count == 0 || __builtin_mul_overflow(count, (size_t)TOKEN_SIZE, &total)) {
        return NULL;
    }
    return malloc(total);
}
__builtin_mul_overflow leans on the CPU’s flags. on x86-64 the multiply sets the carry and overflow flags when the result doesn’t fit, and the compiler emits a single conditional jump (jo or jc) right after it. one extra branch, effectively free.
the rule that kills this class: use size_t for anything representing a size, length, count, or index. size_t is defined in <stddef.h> and is 64 bits wide on x86-64. overflowing a 64-bit size_t takes values in the exabyte range, far beyond anything a legitimate request needs, so pair it with a sane upper bound and keep the overflow check anyway.
Format string vulnerability
void log_entry(char *fmt, char *msg) {
char logbuf[MAX_LOG];
sprintf(logbuf, fmt, msg);
called as log_entry("User logged in: %s", username). looks fine because the caller passes a literal. the bug is that the function accepts any pointer as fmt. nothing in the type system stops a future caller from passing user input as the format string. and the more common version in real codebases:
printf(user_input);
fprintf(logfile, user_input);
sprintf(buf, user_input);
to understand why this is catastrophic you need to know how printf works internally.
printf(const char *format, ...) is variadic. on x86-64, arguments go in registers: rdi holds the format string, then rsi, rdx, rcx, r8, r9 hold up to six arguments, then the stack. when printf encounters %x, it reads the next argument from wherever arguments come from. if there are no more actual arguments, it reads from the stack anyway. it has no way to know how many arguments were actually passed.
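any variadic function has the same blind spot. a minimal sketch with a made-up sum_ints: the callee walks the argument list with va_arg and has to take the caller’s word for how many arguments there are.

```c
#include <stdarg.h>

/* sums `count` ints. nothing verifies the caller really passed that many */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* takes the next slot, whatever happens to be in it */
    }
    va_end(args);
    return total;
}
```

sum_ints(3, 1, 2, 3) is fine. sum_ints(5, 1, 2, 3) compiles too, and reads two values nobody passed. printf does the same thing, except the count comes from the format string.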
$ cat fmtstr.c
#include <stdio.h>
#include <string.h>
int main() {
char secret[32] = "tok:s3cr3t_api_k3y_9a2f";
char input[128];
fgets(input, sizeof(input), stdin);
input[strcspn(input, "\n")] = '\0';
printf(input); /* user controls format string */
printf("\n");
return 0;
}
$ gcc -O0 -g fmtstr.c -o fmtstr
$ echo "%x %x %x %x %x %x %x %x %x %x" | ./fmtstr
f7fb8780 0 5665620c 3a6b6f74 72633373 615f7433 6b5f6970 395f7933 663261 0
those hex values are stack contents. let’s decode them:
>>> vals = ["3a6b6f74","72633373","615f7433","6b5f6970","395f7933","663261"]
>>> for v in vals:
...     b = bytes.fromhex(v)[::-1] # little-endian reversal
...     print(repr(b))
...
b'tok:'
b's3cr'
b'3t_a'
b'pi_k'
b'3y_9'
b'a2f'
it printed tok:s3cr3t_api_k3y_9a2f, our secret string, straight out of main’s own stack frame. in a real program that’s a session token, a recently-compared password, a private key in memory. exact values and positions depend on the build: on x86-64 the first five %x come from argument registers and each %x shows only the low 32 bits of an 8-byte slot, so %lx or %p walks the stack more faithfully there.
%p is more useful to attackers because it prints pointer-sized values and reveals heap and stack addresses. six %ps can defeat ASLR by leaking the base addresses you need to compute gadget locations.
%n is the one that makes format string bugs arbitrary writes. it reads the next argument as an int * and writes the count of characters printed so far to that address. if there’s no real argument there, it reads a stack value and writes to that address. an attacker who controls the format string can arrange to have their chosen addresses on the stack, then use %n to write to those addresses. overwrite a GOT entry. overwrite a variable that controls authentication state. glibc still supports %n. what it adds, when you build with -D_FORTIFY_SOURCE=2, is an abort if %n shows up in a format string that lives in writable memory. some other C libraries disable %n by default (MSVC does), but plenty of code runs without those protections, and the attack class still applies to custom format string implementations.
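a harmless way to watch %n do its one job, using a literal format string and a real argument so nothing here is attacker-controlled. chars_before_percent_n is a made-up name:

```c
#include <stdio.h>

/* returns the value %n stored: the number of characters produced before it */
int chars_before_percent_n(void) {
    char out[32];
    int written = -1;

    /* "abcd" is 4 characters, so %n stores 4 through the pointer we passed */
    snprintf(out, sizeof out, "abcd%nefgh", &written);
    return written;
}
```

in an attack the pointer is not &written but a value the attacker placed on the stack, and the count is steered with padding like %1234x. same mechanism, different owner.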
WU-FTPD (2000): the SITE EXEC command passed user-controlled input to a printf-family function as the format argument. remote root on one of the most deployed FTP servers of the time. few people took format string bugs seriously before that.
$ gcc -Wformat=2 -Wformat-security fmtstr.c -o fmtstr
fmtstr.c:12:5: warning: format not a string literal and no format arguments [-Wformat-security]
12 | printf(input);
| ^~~~~~
the compiler literally tells you. enable these flags.
/* wrong: user input as format string */
printf(user_input);
/* right: user input as argument to a literal format string */
printf("%s", user_input);
snprintf(buf, sizeof(buf), "%s", user_input);
if you’re writing a variadic logging wrapper, mark it with the format attribute and the compiler checks callers exactly like it checks printf callers:
#include <stdarg.h>
#include <stdio.h>
/* wrong: format string parameter means caller can accidentally pass user input */
void log_entry(const char *fmt, const char *msg) {
    char logbuf[MAX_LOG];
    snprintf(logbuf, sizeof(logbuf), fmt, msg); /* caller controls fmt */
}
/* right: no format string parameter at all (replaces the version above) */
void log_entry(const char *msg) {
    char logbuf[MAX_LOG];
    snprintf(logbuf, sizeof(logbuf), "User logged in: %s", msg);
}
/* or for real variadic logging, tell gcc to typecheck callers */
__attribute__((format(printf, 1, 2)))
void my_log(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
}
Uninitialized memory read
UserSession session; /* space reserved on stack, no value written */
/* ... */
return session.user_id; /* read whatever bytes happen to be there */
declaring a local variable allocates space in the stack frame. no write happens. whatever bytes were at that memory location from the previous function call that used the same stack space are still physically there. session.user_id returns garbage.
this matters beyond returning a wrong value. the stack gets reused constantly. every function call reuses the same physical memory that previous calls used. if process_user got called after an authentication function that had a plaintext password in a local variable, those bytes might still be sitting exactly where session got allocated.
$ cat uninit.c
#include <stdio.h>
#include <string.h>

void leave_secret() {
    char creds[32] = "admin:hunter2_prod";
    volatile char x = creds[0]; /* volatile: optimizer can't remove this */
    (void)x;
    /* returns. bytes stay on stack. */
}

void read_uninitialized() {
    char buf[32]; /* same stack region, never written */
    printf("uninit: %s\n", buf);
}

int main() {
    leave_secret();
    read_uninitialized();
    return 0;
}
$ gcc -O0 uninit.c -o uninit && ./uninit
uninit: admin:hunter2_prod
reading an uninitialized buffer is undefined behavior, so this output is not guaranteed. -O0 on x86-64 usually reproduces it because both functions end up with the same frame layout; other flags or compilers may print garbage or nothing at all.
Heartbleed (CVE-2014-0160) was the heap cousin of this leak. OpenSSL’s heartbeat handler received a payload_length and a payload, and was supposed to echo the payload back. it allocated a response buffer, copied payload_length bytes out of the incoming record into it, and sent that back. the bug was not checking that payload_length matched the actual payload size. send 1 byte of payload, claim payload_length = 64000. the handler copies your 1 byte plus roughly 64000 bytes of whatever sat after it in that heap region, and sends it all back. strictly that is an out-of-bounds read rather than an uninitialized one, but the effect is the same: stale memory handed to an attacker. session tokens from other users’ TLS connections. private key material. decrypted plaintext. up to 64KB per request, zero auth required, no log entry. the bug shipped for two years before disclosure.
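the missing check is small. a sketch under loud assumptions: echo_payload is a hypothetical function, and the record layout here (one type byte, a two-byte big-endian claimed length, then the payload) is a simplification that ignores the padding real heartbeat messages carry. it is not OpenSSL’s code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical layout: [type:1][claimed_len:2, big-endian][payload...].
   Returns the number of bytes written to out, or -1 if the record is rejected. */
long echo_payload(const uint8_t *rec, size_t rec_len,
                  uint8_t *out, size_t out_cap) {
    assert(rec != NULL && out != NULL);
    if (rec_len < 3) return -1;                  /* header incomplete */
    size_t claimed = ((size_t)rec[1] << 8) | rec[2];
    size_t actual  = rec_len - 3;                /* bytes that really arrived */
    if (claimed > actual) return -1;             /* the check that was missing */
    if (claimed > out_cap) return -1;            /* never overrun the reply buffer */
    memcpy(out, rec + 3, claimed);
    return (long)claimed;
}
```

a record that carries 1 byte and claims 64000 is dropped instead of answered. the length the peer claims is never trusted over the length that was actually received.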
$ valgrind --track-origins=yes ./pipeline <<< "alice 1 5"
==1234== Use of uninitialised value of size 4
==1234== at 0x401289: process_user (pipeline.c:28)
==1234== Uninitialised value was created by a stack allocation
==1234== at 0x401189: process_user (pipeline.c:15)
Valgrind tells you the read location, the allocation line, and the kind of allocation (stack, here). --track-origins=yes makes it trace back to where the uninitialized region came from.
/* fix: initialize at declaration */
UserSession session = {0};
/* or explicitly */
memset(&session, 0, sizeof(session));
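/* or: compiler auto-zeroes all locals (needs GCC 12+ or a recent Clang) */
$ gcc -ftrivial-auto-var-init=zero pipeline.c -o pipeline
small runtime cost from zeroing stack memory you might not use. the Linux kernel has defaulted to the equivalent option (CONFIG_INIT_STACK_ALL_ZERO) since 5.15 when the compiler supports it.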
Signed/unsigned mismatch
int safe_read(char *buf, int max_len, int user_len) {
    if (user_len < max_len) { /* passes when user_len is -1 */
        memcpy(buf, "input", user_len);
    }
    return 0;
}
user_len is signed 32-bit int. memcpy’s third argument is size_t, unsigned 64-bit. the check user_len < max_len uses signed comparison. (-1 < 64) is true. check passes. then -1 gets passed to memcpy as size_t.
in two’s complement, -1 in 32 bits is 0xFFFFFFFF. converting a signed 32-bit value to a 64-bit size_t sign-extends it: the high 32 bits fill with 1s. 0xFFFFFFFF becomes 0xFFFFFFFFFFFFFFFF. that’s 18,446,744,073,709,551,615. memcpy is told to copy 18 exabytes.
$ cat signmatch.c
#include <stdio.h>
#include <stdint.h>

int main() {
    int user_len = -1;
    size_t extended = (size_t)user_len; /* sign extension */
    printf("signed int: %d\n", user_len);
    printf("as size_t: %zu\n", extended);
    printf("hex: 0x%016zx\n", extended);
    return 0;
}
$ gcc signmatch.c -o signmatch && ./signmatch
signed int: -1
as size_t: 18446744073709551615
hex: 0xffffffffffffffff
the bounds check is right there in the code. it fails silently and the wrong value flows into memcpy. with the right flags the compiler tells you immediately:
$ gcc -Wsign-conversion -Wsign-compare safe_read.c
safe_read.c:3:22: warning: conversion to 'size_t' from 'int' may change the sign of the result [-Wsign-conversion]
one warning, on the exact line where -1 becomes a size_t. -Wsign-compare stays quiet here because both sides of user_len < max_len are int; it earns its place when one side is unsigned, such as comparing an int length against sizeof(buf). -Wsign-conversion is not part of -Wall or -Wextra, so default GCC settings hide this.
/* fix: size_t for all size/length parameters */
int safe_read(char *buf, size_t max_len, size_t user_len) {
    static const char src[] = "input";
    /* both unsigned, comparison is correct; also never read past the source */
    if (user_len < max_len && user_len <= sizeof(src)) {
        memcpy(buf, src, user_len);
    }
    return 0;
}
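size_t is unsigned so there’s no such thing as a negative size_t. a caller passing -1 still compiles, because the int converts implicitly, but the value arrives as SIZE_MAX, fails the user_len < max_len check, and never reaches memcpy. -Wsign-conversion also flags the conversion at the call site, which is where the bad value actually originates.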
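when a length arrives as an int anyway (parsed input, a legacy API), do the conversion once at the boundary and reject negatives there. a minimal sketch; length_from_int is a hypothetical helper, not part of the pipeline code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boundary check for lengths that arrive as int.
   Returns 0 and stores the value on success, -1 if the length is
   negative and must be rejected. *out is untouched on failure. */
int length_from_int(int value, size_t *out) {
    assert(out != NULL);
    if (value < 0) return -1;   /* reject instead of wrapping to a huge size_t */
    *out = (size_t)value;       /* safe: value is known to be non-negative */
    return 0;
}
```

everything downstream of this call can then work in size_t, and the explicit cast sits next to the check that justifies it.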
Use-after-free
free(store) happens and then later store = NULL. that specific case looks handled. in real production functions that grow over time, the pattern is: something allocates, something else frees it in an error path, a third code path reads from it assuming it’s still valid.
$ cat uaf.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char *token = malloc(32);
    strcpy(token, "session_abc_123");
    printf("before free: '%s' at %p\n", token, (void*)token);
    free(token);
    char *other = malloc(32); /* allocator may return the same address */
    strcpy(other, "attacker_data!");
    printf("dangling read: '%s' at %p\n", token, (void*)token);
    printf("other: '%s' at %p\n", other, (void*)other);
    free(other);
    return 0;
}
$ gcc -g uaf.c -o uaf && ./uaf
before free: 'session_abc_123' at 0x55a6b1c012a0
dangling read: 'attacker_data!' at 0x55a6b1c012a0
other: 'attacker_data!' at 0x55a6b1c012a0
token and other got the same address. reading through the dangling token pointer returns whatever other put there. in real exploitation, the attacker controls the timing of allocations and controls what goes into the freed slot. put a fake function pointer there, a crafted struct the program treats as trusted, a vtable pointer.
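writes through dangling pointers are worse. freed tcache chunks use their first 8 bytes for the forward pointer in the free list. write through a dangling pointer and you corrupt that forward pointer, and a later malloc of that size returns an attacker-chosen address. glibc 2.32 and later mangle the stored pointer (safe-linking), which means the attacker usually needs a heap address leak first.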
ASan catches this with full context:
$ gcc -fsanitize=address -g uaf.c -o uaf_asan && ./uaf_asan
=================================================================
==5678==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010
READ of size 1 at 0x602000000010 thread T0
#0 in __interceptor_printf
#1 in main uaf.c:18
0x602000000010 was freed by thread T0 here:
#0 in __interceptor_free
#1 in main uaf.c:13
previously allocated by thread T0 here:
#0 in __interceptor_malloc
#1 in main uaf.c:5
three call stacks: where it was allocated, where it was freed, where it was used after free. line numbers, everything. the overhead is a shadow memory region that tracks the state of every byte of your heap.
real world: CVE-2024-1086, a use-after-free in Linux kernel netfilter nf_tables. a malformed verdict value led to a packet buffer being freed twice. local privilege escalation to root. a public exploit followed the fix, and CISA later added the bug to its Known Exploited Vulnerabilities catalog.
free(store);
store = NULL; /* null immediately after every free */
/* then before every subsequent use: */
assert(store != NULL); /* or check explicitly */
nulling one pointer doesn’t help if two pointers reference the same allocation. the solution is one owner. one pointer that owns the allocation and is responsible for freeing it. all other references borrow from the owner and must not outlive it. C doesn’t enforce this; it’s a discipline. C++ enforces single ownership at compile time via unique_ptr (copying one doesn’t compile), though raw pointers borrowed from it can still dangle.
for multithreaded code where another thread might free the allocation: reference counting or a lock protecting teardown.
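a minimal reference-counting sketch using C11 atomics. token_t, token_new, token_get, and token_put are hypothetical names, and this assumes a compiler with <stdatomic.h>. it covers only the counting; what each thread does with the token still needs its own synchronization.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted session token. */
typedef struct {
    atomic_int refs;
    char value[32];
} token_t;

token_t *token_new(const char *value) {
    token_t *t = calloc(1, sizeof *t);
    if (!t) return NULL;
    atomic_init(&t->refs, 1);                     /* creator holds one reference */
    snprintf(t->value, sizeof t->value, "%s", value);
    return t;
}

/* Take a reference before handing the pointer to another thread. */
void token_get(token_t *t) {
    assert(t != NULL);
    atomic_fetch_add(&t->refs, 1);
}

/* Drop a reference. Returns 1 if this call freed the token. */
int token_put(token_t *t) {
    assert(t != NULL);
    if (atomic_fetch_sub(&t->refs, 1) == 1) {     /* we held the last reference */
        free(t);
        return 1;
    }
    return 0;
}
```

nobody calls free directly. the allocation goes away only when the last holder calls token_put, so a thread that still holds a reference can’t have the memory pulled out from under it.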
Memory leak
backup is allocated and never freed:
char *backup = malloc(64);
/* ... used ... */
/* function returns without free(backup) */
in a process that runs once and exits: the OS reclaims everything on exit, no problem. in a server handling ten thousand requests an hour: every call to process_user permanently loses 64 bytes. after one million requests that’s 64MB gone. at that rate a week of production traffic (about 1.7 million requests) leaks roughly 107MB; at a thousand requests a second it’s over 38GB a week. eventually the Linux OOM killer looks at memory pressure and sends SIGKILL to the process with the highest badness score, usually the one using the most memory. no warning. no graceful shutdown. just gone, mid-request.
the C pattern for handling multiple allocations with multiple early exit points:
int process_user(const char *username, int level, size_t count) {
    int result = -1;
    char *store = NULL; /* initialize ALL pointers to NULL */
    char *backup = NULL;

    store = calloc(count, TOKEN_SIZE);
    if (!store) goto done;

    backup = malloc(64);
    if (!backup) goto done;

    /* all the actual work */
    result = 0;

done:
    free(store); /* free(NULL) is a defined no-op per C99 */
    free(backup); /* safe to call even if allocation never succeeded */
    return result;
}
free(NULL) is defined by the C99 standard to do nothing. so you put unconditional free calls in the cleanup section and jump there from any early return. whichever allocations succeeded get freed. whichever didn’t are still NULL. every code path cleans up correctly.
Valgrind finds leaks:
$ valgrind --leak-check=full --show-leak-kinds=all ./pipeline <<< "alice 1 5"
==9999== HEAP SUMMARY:
==9999== in use at exit: 64 bytes in 1 blocks
==9999== total heap usage: 4 allocs, 3 frees
==9999==
==9999== 64 bytes in 1 blocks are definitely lost in loss record 1 of 1
==9999== at 0x483B7F3: malloc
==9999== by 0x401289: process_user (pipeline.c:38)
==9999== by 0x401315: main (pipeline.c:57)
“definitely lost” means no remaining pointer to that allocation exists anywhere in the process. “indirectly lost” means reachable only through a block that is itself definitely lost. “still reachable” means you have a pointer but never freed, which is less serious but still indicates resource mismanagement.
How to find all of this before Gilfoyle does
eight bugs in code that compiled clean on default GCC settings. the compiler trusted you and you were wrong. here’s the tooling that catches it during development.
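AddressSanitizer instruments every memory access at compile time. shadow memory tracks the state of every 8-byte granule of memory: addressable, partially addressable, freed, or redzone. it catches stack overflows, heap overflows, use-after-free, and (with detect_stack_use_after_return enabled) use-after-return. it does not track uninitialized reads; that’s a job for Valgrind or MemorySanitizer: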
$ gcc -fsanitize=address -fsanitize=undefined -g pipeline.c -o pipeline_asan
$ python3 -c "import sys; sys.stdout.buffer.write(b'A'*80 + b'\n')" | ./pipeline_asan
=================================================================
==1234==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd...
WRITE of size 80 at 0x7ffd... thread T0
#0 0x... in strcpy
#1 0x... in process_user pipeline.c:22
#2 0x... in main pipeline.c:48
Shadow bytes around the buggy address:
0x7ffd...00: f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7ffd...10: 00 00 00 00[f2]f2 f2 f2 f2 f2 f2 f2 f3 f3 f3 f3
the shadow bytes: 00 is accessible memory. f1 marks beginning of a stack frame. f2 is the gap between stack variables (for catching overflows that jump over the adjacent variable). f3 is end of the frame. [f2] is where the write landed, in the inter-variable gap past buf’s valid range.
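the lookup behind those shadow bytes is simple arithmetic. a sketch assuming the x86-64 Linux shadow offset (other platforms use different constants) and an access that stays inside one 8-byte granule; shadow_address and access_allowed are hypothetical names, not ASan’s own.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow offset ASan uses on x86-64 Linux; other platforms differ. */
#define SHADOW_OFFSET 0x7fff8000ULL

/* Each shadow byte describes one aligned 8-byte granule of memory. */
uint64_t shadow_address(uint64_t addr) {
    return (addr >> 3) + SHADOW_OFFSET;
}

/* Simplified form of the check inserted before an access of `size`
   bytes (1..8) at addr. shadow is that granule's shadow byte:
   0 means all 8 bytes are addressable, 1..7 means only the first k
   bytes are, negative values (f1, f2, f3, ...) mean poisoned. */
int access_allowed(int8_t shadow, uint64_t addr, unsigned size) {
    assert(size >= 1 && size <= 8);
    if (shadow == 0) return 1;
    if (shadow < 0) return 0;
    uint64_t last = (addr & 7) + size - 1;   /* offset of the last byte touched */
    return last < (uint64_t)shadow;
}
```

a write that lands on an f2 byte fails this check, which is the moment the report above gets printed.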
ASan typically costs around 2x in runtime and 2-3x in memory, depending on the workload. completely acceptable for development and CI. not for production.
to inspect symbols and library dependencies on a binary before running it:
# what libraries does this binary need?
$ ldd pipeline
linux-vdso.so.1 => (0x00007fff6edd4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f67c26de000)
/lib64/ld-linux-x86-64.so.2 (0x0000561e62fe5000)
# what external functions does it call?
$ nm -D pipeline | grep " U "
U __libc_start_main@GLIBC_2.2.5
U free@GLIBC_2.2.5
U malloc@GLIBC_2.2.5
U strcpy@GLIBC_2.2.5
# what strings are embedded in the binary?
$ strings pipeline | grep -E "User|token|session"
User logged in: %s
Username:
ldd may actually execute the binary (or the ELF interpreter it names) to compute dependencies, so don’t run it on untrusted binaries outside a sandbox. use readelf -d pipeline | grep NEEDED for a safer static alternative.
strace shows every system call the binary makes at runtime. useful for understanding what a binary is doing without full disassembly:
$ strace ./pipeline <<< "alice 1 5" 2>&1 | head -20
execve("./pipeline", ["./pipeline"], ...) = 0
brk(NULL) = 0x1053000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, ...) = 0x7f703477e000
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
read(3, ...) = 832
...
the full compiler flag set you should always use:
$ gcc \
-Wall \
-Wextra \
-Wformat=2 \
-Wformat-security \
-Wsign-conversion \
-Wsign-compare \
-Wstrict-overflow=5 \
-D_FORTIFY_SOURCE=2 \
-O2 \
-fstack-protector-strong \
-fPIE -pie \
-Wl,-z,relro,-z,now \
pipeline.c -o pipeline_hardened
-D_FORTIFY_SOURCE=2 only takes effect with optimization enabled (-O1 or higher; -O2 is the usual choice). at compile time it replaces strcpy, sprintf, memcpy, etc. with size-checked variants when buffer sizes are visible to the optimizer. the only runtime cost is the check itself:
$ cat fortify_test.c
#include <string.h>

int main() {
    char buf[8];
    strcpy(buf, "this is definitely longer than 8 bytes");
    return 0;
}
$ gcc -D_FORTIFY_SOURCE=2 -O2 fortify_test.c -o fortify_test && ./fortify_test
*** buffer overflow detected ***: terminated
Aborted (core dumped)
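conceptually, the substituted call does something like the sketch below. strcpy_checked is a hypothetical stand-in, not glibc’s function: the real checked variant aborts the process, while this one returns NULL so the behavior can be tested, and dst_size stands in for the object size the compiler works out on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fortified strcpy. Returns dst on
   success, NULL when src (plus its terminator) does not fit. */
char *strcpy_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t need = strlen(src) + 1;       /* include the terminator */
    if (need > dst_size) return NULL;    /* would overflow: refuse */
    memcpy(dst, src, need);
    return dst;
}
```

the check only exists when the compiler can see how big the destination is. a pointer that arrives through a function parameter with no size information gets the plain, unchecked call.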
after building with hardening flags:
$ checksec --file=./pipeline_hardened
[*] '/home/user/pipeline_hardened'
Arch: amd64-64-little
RELRO: Full RELRO <- GOT read-only, format string/heap attacks can't corrupt it
Stack: Canary found <- stack smashing detected before ret executes
NX: NX enabled <- stack/heap not executable, no injected shellcode
PIE: PIE enabled <- randomized load address, gadget addresses change per run
none of these fix the code. canary doesn’t prevent the overflow, it detects it before exploitation completes. NX doesn’t prevent ROP chains, only injected shellcode. PIE with ASLR makes hardcoded addresses wrong, but information leaks defeat ASLR. they’re speed bumps that raise the skill required to exploit existing bugs, not eliminations of the bugs.
The fixed version
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define MAX_LOG 256
#define TOKEN_SIZE 32
#define MAX_COUNT 1024

typedef struct {
    int user_id;
    int level;
    char tag[8];
} UserSession;

/* takes message, not format string. literal format string is inside this function. */
static void log_entry(const char *msg) {
    char logbuf[MAX_LOG];
    snprintf(logbuf, sizeof(logbuf), "User logged in: %s", msg);
    fprintf(stderr, "%s\n", logbuf);
}

static char *init_token_store(size_t count) {
    if (count == 0 || count > MAX_COUNT) return NULL;
    /* calloc: checks multiplication overflow, returns zeroed memory */
    return calloc(count, TOKEN_SIZE);
}

int process_user(const char *username, int level, size_t count) {
    int result = -1;
    char *store = NULL;
    char *backup = NULL;
    char buf[64];
    UserSession session = {0}; /* zero-initialized at declaration */

    /* strlcpy: always null-terminates, return value tells you if truncation happened.
       BSD function: glibc ships it from 2.38, older systems need libbsd. */
    if (strlcpy(buf, username, sizeof(buf)) >= sizeof(buf)) {
        goto done; /* reject rather than silently truncate */
    }

    store = init_token_store(count);
    if (!store) goto done;

    /* strnlen won't read past sizeof(buf) even if username has no null byte */
    size_t ulen = strnlen(username, sizeof(buf));
    if (ulen > count * TOKEN_SIZE) goto done;
    memcpy(store, username, ulen);

    log_entry(username);

    backup = malloc(64);
    if (!backup) goto done;

    session.user_id = level;
    result = session.user_id;

done:
    free(store); /* free(NULL) is a no-op per C99 */
    free(backup);
    return result;
}

/* size_t for all size/length parameters. always. */
static int safe_read(char *buf, size_t max_len, size_t user_len) {
    static const char src[] = "input";
    /* bounded by the destination and by the source */
    if (user_len < max_len && user_len <= sizeof(src)) {
        memcpy(buf, src, user_len);
    }
    return 0;
}
compile with the full flag set. run ASan. run Valgrind. run checksec. zero findings.
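one portability note on that listing: strlcpy comes from BSD, and glibc only gained it in 2.38. on older systems you either link libbsd or write the reject-on-truncation step yourself. a minimal sketch of the second option; copy_or_reject is a hypothetical name, and strnlen is POSIX rather than ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for the strlcpy-and-reject step.
   Returns 0 on success, -1 if src (plus its terminator) does not
   fit in dst. Never truncates. */
int copy_or_reject(char *dst, size_t cap, const char *src) {
    assert(dst != NULL && src != NULL);
    if (cap == 0) return -1;
    size_t len = strnlen(src, cap);   /* reads at most cap bytes of src */
    if (len == cap) return -1;        /* no room left for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

in process_user this would replace the strlcpy comparison with if (copy_or_reject(buf, sizeof(buf), username) != 0) goto done;.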
What Gilfoyle was saying
all eight of those bugs compiled without a warning. the binary ran. it looked correct from the outside. C is the language of operating systems, kernels, network equipment, embedded firmware, and security tools themselves. it’s not going anywhere. the reason it’s used is direct memory access and no runtime overhead. the reason it’s dangerous is exactly the same.
the skill isn’t knowing that strcpy is bad. everyone knows that. the skill is being able to look at objdump output and know immediately that buf at [rbp-0x50] means the return address is at offset 88, so any write past 88 bytes controls execution. being able to look at ptmalloc chunk headers and understand that corrupting the size field gives you an exploitation primitive. being able to read %x %x %x %x format string output and decode what it just leaked off your stack.
add ASan to your CI pipeline. add the compiler flags to your Makefile from day one. run checksec on your release builds. make it reflexive, not reactive.
part 2 is privilege separation. what you do when you’ve accepted that memory bugs exist and probably always will, and you design your process architecture so the blast radius of a compromise stays as small as possible.