Bomb Lab CMU 15-213

May 18th, 2023.

I have just finished all my final yesterday. Farewell my sophomore year! After playing the new Zelda: Tears of the Kingdom for the whole evening, which was released just a few days ago (and without any doubt, the best game ever made in human history), I decided to do something a little bit more meaningful and fulfilling.

I started on the lab assignments of 15-213 when I was still a very naive and stupid freshman (I’ve got to admit that I am kind of stupid in the freshman year…). Back then I don’t know anything about computer systems. It is a terrible terrible mistake to start Intro to Computer Sys when you don’t have a clue what C and POSIX and OS is. The labs will just gradually suck out the false confidence that you got from the undemanding NYU curriculum and the all too high level Python. Apparently, this course is by no means, as its name suggests, an intro one. We’ve got to learn to respect CMU.

After being through some pretty traumatic experience at Operating Systems and Distributed Systems classes, I decided that I am tortured enough to begin again on the lab assignment of CS:APP. Being a seasoned and perhaps too experienced C programmer, I think I can easily handle an introductory undergraduate course on Systems. (Or can I? Let’s find out!)

I will skip Data Lab (the first lab) at least for now, because it’s not a particularly interesting one. I will come back to it when I have the time to do so. Bomb lab should be the fun part. Let’s begin with that.

If you are a student enrolled in CS15-213 at CMU, DON’T MOVE ON!!! Keep in mind that this would constitute a violation of academic integrity. This material is intended to give self-study students some intuition and help when they are stuck and alone with no one to seek help from.


Phase 0: Preparation

You can either download the source code from the CS:APP website, or from the mirror on this site.

bombDownload

I find this GDB document particularly helpful. It’s clear and detailed.

gdbnotes-x86-64Download

Some commands that are not in the document, but I personally use a lot, are:

1
2
> layout src
> layout asm

This way the assembly / code are juxtaposed with the output in a way that is more convenient to read.


Phase 1: Starting with the Simple One

After downloading the bomb, first check out the source file to get a rough idea of what this program is doing. However the source file alone is not very informative.

Basically for every phase, the program will read an input, and if the input is not correct, it will trigger a signal to set off the alarm. Otherwise that phase is defused.

The real hint hides within the object file. You may way to first find out what is in the symbol table, by using:

1
> objdump -t bomb

Some interesting tags include: phase 1…6, sigalrm_handler, phase_defused… Oh look, there is even something called a secret phase. Is that a bonus?

Still not very informative, but at least we know where we can set out breakpoints.

Breaking at phase_1, and run the program, you are required to input a string, as was indicated when we examine the source code. Since I am proud of my own culture and heritage, I will just put here China. (remember that this is a 5 byte string, it is an important hint later).

Moving on, let’s examine what is in the assembly of phase 1.

seems like it moves 0x402400 to %esi. An easy guess would be that 0x402400 is a memory address, since we are talking about string here. We cannot put a string in a register, only the address of it. It is worth noting that %esi is the second argument register. So it is clear that we are passing information into a function. Phase_1+9 tells us that this function is called strings not equal.

But what about %rdi? After all %rdi is the first argument register, and if we use %rsi, that means that %rdi is also used. That is a valid question, and it seems to me that %rdi stores what we input initially, in the initialize_bomb() function call. It will be verified later.

Remember that before, in phase 1, a function called strings_not_equal is called. Now let’s step into it, and examine what it does.

The two arguments are each passed first into a function called string_length. It’s not too hard to guess it’s meaning from the name. Apparently, we will jump to strings_not_squal+99 if the length of the two arguments differ. That line will return 1 to phase_1 function, and phase_1, after seeing 1 in %rax, will set off the bomb. BOOM!

Checking the register value, we found that %r12d has value 5, which happens to be the length of the word “China”. What a wonderful coincidence!

%rax has value 52, so ah, it seems like the answer string has length 52. We are getting very close to the answer.

Remember how we stored the address of the correct answer in %rsi, and thar address is 0x402400. All that is left to do is to examine what is store at that address, and exactly 52 words from that, nothing more, nothing less.

Use the command in GDB to get what you want:

1
> x/52s 0x402400

The answer is in the picture. Honestly, I didn’t expect it to be this political LOL.


Phase 2: Let’s Talk About LOOPS!

Phase two is an easy one, took me less than 15 minutes to solve. It is pretty intuitive, given that I got stuck and trained in phase 1.

As always, let’s first set a breakpoint at phase_2, and then check what is in the assembly. What I find most intuitive to do is to unfold the loop functions in a series of commands, so that we don’t have to think too much about loop, and that is exactly what I will do.

First, the compiler saves some callee saved registers that may be altered, then it decrements the stack pointer by 0x28 (pay attention, we are talking about hex here, you will be seriously wrong if you think about decimal, like I once did).

Then the function calls a function named <read_six_numbers>. I don’t even bother to step into that particular function, because its purpose is already all too clear from the name itself. It just read 6 numbers, and it seems that this time, we need to input 6 numbers as the answer.

Six numbers, that is really a lot. Apparently, the compiler will not use the register to save these 6 numbers, what it will do is to store the 6 numbers on the stack, starting from the stack pointer %rsp obviously.

The phase_2 assembly from the previous picture is not complete, with this pic, it is complete now.

We definitely want to jump at line phase_2+18, otherwise the bomb will explode! To jump, we must make sure that the number in *(%rsp) is 0x1, so the first number we input should be 1.

Now that we have jumped to line phase_2+52, %rbx = 4 + %rsp (we are not yet dereferencing here), %rbp = 18+%rsp. And then we jump back to line phase_2 + 27.

Starting from line phase_2+27, %eax = * (%rbx-4) = * (%rsp) (we are dereferencing here), %eax = 2 * %eax, since we want to jump at line phase_2 + 34 to avoid triggering the bomb at line 36, %eax should be equal to *(%rbx), in other words, *(4+rsp) = 2 * (*%rsp). So what should be the second element in the 6 int array we input then? Apparently 2!!!

Jumping to line 41, %rbx = %rbx + 4 = %rsp + 8. If %rbx != %rsp + 24, then we loop back to line 27, where we must make sure that the latter element is twice as big as the previous element. However, if we are lucky enough to have %rbx = %rsp + 24, which will eventually happen after 6 iterations, we will just go to line 64, restore the stack, and then return. With that said, it is pretty apparent that the sequence of number should be 1 2 4 8 16 32, since we are inputting 6 numbers as specified.

We tried that out, it is indeed the solution!!!!!!! So exciting!!!!!!!!


Phase 3: Jumping Around!

Today is May, 19th, 2023. After a long hard day working on the impossible research project, I finally had some time to myself. (Maybe I will write a post about that project a little bit later, I do believe that it is a very meaningful and truly inaugurating project, but it looks VERY formidable).

Phase 3 of the bomb lab is a lot more difficult than the previous two, I have got to admit. I have to spend around an hour to solve this problem.

As always, first we will set a break point at phase_3, and run the program as usual. You can first input anything, as long as we quit GDB before the bomb is set off, we are fine.

if you do that, here should be what you see:

Stopping right before phase_3 begins.

I am utterly curious what is store at address 0x4025cf, since it is passed as an argument to the sscanf function at line 24. So I examine what was in that address.

As it shows, it stores a string, whose content is two place holders for decimal values. (Why do I print it as String? Well, the sscanf function kind of gives you a hint. sscanf function should take as input a string (out input, and a string representing the place holders like %d %d).

This belief is collaborated by the fact the at line phase_3 + 19, the program requires that the return value of sscanf should be bigger than 0, otherwise, the bomb will explode. The return value of sscanf should be the number of integers parsed successfully. Though trial and failure, you will indeed find that, no matter how many numbers you try to input, the result will still be 2 at most, since we only have 2 place holders for integers.

With that knowledge in mind, we should adjust our input accordingly, so that we can at least survive past line 34. I will just input 6 8. They are all lucky numbers in China!

Since we are moving on to deal with the content in the stack, I think that it was high time that we examine what is now in the stack.

We find that 6 and 8 that we input before is now in the stack, thanks to the sscanf function. What are the first 2 bytes, I really don’t care. I just need to know where my inputs are stored.

So apparently, we are jumping at line phase_3 + 50, to a mysterious place based on the stuff in *($rsp+8), which happens to be our first input number.

There are several lines doing similar things, storing a number in %rax, and jump to line 123. At line 123, we want to make sure that *(12+%rsp)==%eax, whatever was put into %rax at previous steps. Note that 12+%rsp is the place where we store our second input number!!!

Now things are clearing up!!!!!!!!!!!! We jump to a certain line based on out first input, the program will then put a certain number into %rax, we just need to make sure that our second input is the same as the number put into that register!!!!!!!!!! GOOD!!!!!!!!!!!!!!!

For simplicity, I will make the first number 0. There is no reason to calculate which line we jumps to by hand, use GDB’s nexti to find out where we jump to, and then adjust the second input number accordingly.

So we find that with 0 we jump to line 57, which puts 0xcf into %rax. 0xcf is 207 in decimal. Try 0 207 as a tentative answer, and phase 3 will be defused successfully!

As I alluded to, you can also try 1 as the first input, step with GDB, and you will find that the second input should be decimal 311. 1 311 is also a valid answer!

You can try more, but I will stop here. Hoped that you enjoyed it so far!!!


Phase 4: Function Calls

Today is May 21st, 2023. I am trying to finish one phase per day of this lab. This is a little ambitious, because I play too hard during the summer holiday.

Luckily, phase 4 of the lab is not a difficult one, as I get more experience with assembly along the way. It took me around half an hour to solve this problem.

As always, let’s set a breakpoint at phase_4 first, and then step through the assembly. Here is what we get when we inspect phase_4 of the code.

Okay, apparently, we are using sscanf to turn our string input into numbers, or something else. The question that remains is how many numbers we should input? We could find this out by inspecting the second argument of sscanf, which is what is stored in %esi: 0x4025cf. This is quite similar to what we did in phase_3, and because I still remembered that from two days ago, this step is quite intuitive.

It is expecting a string of the format “%d %d”.

So by inspecting the format stored at 0x4025cf, we should input two numbers. Now the central question becomes, which two numbers should we input? To answer that, we need to inspect the assembly further.

First we allocate on the stack 18 bytes. %rcx = 12 + %rsp. %rdx = 8 + %rsp. Again, these two registers are arguments passed into the sscanf function. This alludes to the fact that the two numbers that we input to the program are each stored in 8+%rsp and 12+%rsp. This fact can be checked by inputing two random numbers and then inspect the stack, which I will omit here since it has been demonstrated previously. I really cannot decipher what ws in the first 8 bytes starting at %rsp, nor do I really care about it.

Then, it is apparent that if we want to avoid triggering the bomb, then we should really make sure that we jump at line 39. In order to jump, we have to make sure that the first input is smaller than or equal to the number 0xe, which is 14 in decimal.

Line 69 also requires that the second input, which is stored at 0xc(%rsp), should be equal to 0.

It is calling a procedure called , and the arguments are: %rdx=14, %rsi=0, %rdi=first input. Let’s move on to inspect what is going on in , shall we?

Stepping into

Now we are stepping into func4, %rax = %rdx = 14, %eax = %eax - %esi = 14 - 0 = 14, %ecx = 14. Shift %ecx right by 31 bits, and store the result back in %ecx, so %ecx should be the sign bit of 14, which is 0. %eax = %eax + %ecx = %eax + 0 = %eax = 14. %eax shift arithmetic right by 1 bit, which makes %eax 7 (divide by 2). At line 17, %ecx = %rax + 1 * %rsi = 7 + 1 * 0 = 7. We don’t really want to call recursively at line 27, so %edi had better be less than or equal to %ecx, with %edi being the first input and %ecx being 7.

Assume that we jumped successfully to line 36, which moves 0 to %eax (this is what we want, since we want at line phase_4 + 65, that the return value of func4 being 0, to avoid setting the bomb off!!!). Then we really want %edi >= %ecx, to avoid the recursive calls to func4. Remember that previously, we want %edi <= %ecx, so these two should really be equal, which gives us the right first input: 7.

Just to give you the full picture what is doing…

Recall that I said that the second input should be 0, and we need two numbers exactly. The answer is here:

7 0

There may be more than 1 answer, but I will stop here.

And we are successful, it is indeed the answer!!!


Phase 5: Cracking ASCII Code

Hey guys, it’s midnight on May 21, 2023, 23:53. I have just finished phase 5 of the bomb lab (two phases in one night, because it is really super FUN!!!!!!). I am writing overnight in case I forget how to solve this problem when I wake up in the morning…. And also partly because I am so excited right now LOL.

This phase is worth extra credits, because it is supposed to be more difficult. However, I found the difficulty very acceptable, and it took me around 30 minutes to solve this problem. So don’t be terrified by the apparent complexity of this problem. If you have been following along, you have what it takes to solve this problem very easily.

As usual, let’s set a breakpoint at phase_5, and then examine the assembly via GDB.

So it allocate for the stack a space of 32 bytes, but the last 8 byte of the stack is protected by the canary to protect the process from stack overflow attack (this is a GCC default). So really the actual amount of space we can use on the stack is 24 bytes.

Then we use xor to force %eax to become 0, and then call the procedure <string_length>. We want to make sure that the return value in %rax is 6, otherwise the bomb will be triggered. So apparently the length of our input should be of a string of length 6, and we will jump to line 112.

Line phase_5 + 112 just clears the content in %eax, and then jump back to line 41, which, as we will see, is a loop that serves as the main logic.

Moving back to line 41… It looks that we are in a do-while loop.

Let’s examine this part of the ASM closely. It first stores the character at address (%rbx + 1 * %rax) = %rbx = input string starting address, to the register %ecx. It moves the lowest byte of %ecx to address (%rsp). It then moves the content at address (%rsp) to register %rdx, and AND with 0xf the last 4 bits. So now what is left in %rdx is the last four bits of the first character of our input.

Now we are at line phase_5 + 55, which is moving the byte at address 0x4024b0 + %rdx to register %edx, and moves the lowest byte to address %rsp + 1 * %rax + 16, and then reenter the loop until all 6 characters have been iterated. In other words, the i th byte that we load to the upper part of the stack frame (starting at the 16th byte and should not exceed the 24th byte because of the stack protector of GCC) depends on the last 4 bits of the ith character of our input stirng.

Then we should make sure that the new string at the upper part of the stack frame should be equal to the string stored at address 0x40245e, both address are passed as arguments into a function called <strings_not_equal>. If they are the same, then the bomb will be defused.

So let’s first find out which string we are trying to match here.

So apparently we are trying to construct a string called “flyers”. And we are selecting from a reserve of characters, starting at address 0x4024b0 from line phase_5 + 55.

So we have”maduiersnfotvbyl” to choose from, which contains all characters in the string “flyers”. Also note that we only have 16 characters here, which is the maximum number of choices that can be represented with 4 bits, since we are only using the last 4 bits of each character of our input as offset into this character reserve. This cannot be a coincidence, we are in the right place bro.

So we want our offset at %edx to be 9, 15, 14, 5, 6, 7 to get “flyers”, which translate into binary “1001 1111 1110 0101 0110 0111”, and all we need to do is to find 6 characters whose lower 4 bits are exactly that.

Checking the ASCII table, this string could be, but not necessarily, “9 ? > 5 6 7“ since the lowest 4 bits matches.

We try this, and this is indeed the answer!

Not too hard, right? Told you


Phase 6: Linked List and Sorting

The last phase of the bomb is significantly more complex than previous phases. It makes heavy use of pointers and array, which makes this procedure call a lot more complex. I will try my best to explain what is going on.

Let’s give out the answer first, because, why not? We input 4 3 2 1 6 5, and then see what will happen.

The first part of the code, from the first line to <phase_6+93>, is doing these: first read in 6 numbers from the standard input, put them at the beginning of the stack frame (as was done usually, this can be verified by printing the stack), check that all numbers are equal to or less then 6 (line phase_6+52, 56), and then check, using a for loop, that every number after is different from itself. This can be roughly reverse engineered to the following code:

1
2
3
4
5
6
7
8
9
10
11
12
read_six_numbers from stdin;

store them on the stack, starting from %rsp; (use arr to represent the six number array)

for (int i = 0; i < 6; i++){
if (arr[i] > 6) {
ABORT;
}
for (int j = i; j < 6; j++) {
if (arr[j] == arr[i]) { ABORT;}
}
}

The second part of the assembly, from <phase_6+95> to <phase_6+121>, is subtracting each number from 7, and store the result of the subtraction at the original stack position. This can be translated into the following pseudo code.

1
2
3
for (int i = 0; i < 6; i++){
arr[i] = 7 - arr[i];
}

The third part of the code puts address of each node to the stack frame, starting at position rsp+32. The most important address to understand is 0x6032d0. Print it and you will discover that the memory that this address points to store a linked list, though all its nodes are in a contiguous memory.

Printing the memory at 0x6032d0, something that I did not notice the first time is that the author gives you hint that this is a linked list data structure. Notice the yellow part of the disassembly actually says “node1”. The first field of each node seems to store the element, the second field is the index, the third field is the pointer to the next node, and the fourth field is what we call alignment, which speed up the memory fetch of a struct.

In hex, makes the address that the second word of each node points to clearer. The linked list is actually contiguous in memory here.

It takes some time to get a grasp of what the assembly is doing here. The rough idea is that it is storing the address of each node in the stack, and the sequence is in accordance of the 6 numbers we input (remember that we take 7 - i for each i we input, so the actual sequence is reversed. In other words, if the first number we input is 2, then the 3rd node will be stored in the 5th place of the array on the stack). If we take a breakpoint after all this madness has ended <at phase_6 + 183>, and then print out the stack memory, we will find that this is indeed the case.

After all madness ended, this is the memory layout. Starting at the third line, every word stores the starting address of the node. Nodes are put in a sequence that is relevant with our input to the program.

After executing through <phase_6+183> to <phase_6+220>

From line phase_6+183 to phase_6+220, the code does not do much. As we can tell from the above picture, which is the result of the execution of this part of the code, it just make the singly linked list a ring, pointing from the tail to the head.

The last part of the code checks that each node stored in the array on the stack has a bigger element than the next node (here by next I don’t mean the next node pointer, but rather the sequence we input). If it is bigger, than we can defuse the bomb successfully. Otherwise, the bomb goes off.

It is okay if you cannot decipher every line of the assembly, just focus on the result of each part of the code, and try to guess (or in a fancier term, reverse engineer) what is going on in each part. In all previous phases, I can understand each line of the disassembly with ease, but the last phase is different. I learned to focus on the result, instead of trying to decipher every line of the disassembly, because there are just too many lines.

So apparently, we kind of get the rough idea, after dividing the code into blocks, and then interpret the result of the execution of each code block, that we are sorting a linked list here, and the number we input reflects the order of the element of nodes. In the last part, we want to make sure that on the stack, nodes are put in decreasing order: 3 4 5 6 1 2. But since we take each number off 7 previously, the sequence we give to the program should be 4 3 2 1 6 5.

Put 4 3 2 1 6 5 to sol.txt, and then run the program.

Test it, and indeed, this is the result. We have defused the bomb successfully!!!

This concludes this lab assignment.


Author

Yuncheng Yao

Posted on

2023-05-18

Updated on

2024-01-09

Licensed under