Instrumenting Program Source Code for S2E

s2e-env and s2e.so automate most of the tedious tasks of setting up virtual machines, S2E configuration files, launch scripts, etc. This tutorial assumes that you have already built S2E and prepared a VM image as described on the Building the S2E Platform page, that you are familiar with s2e-env, and that you tried symbolic execution on simple Linux binaries.

In this tutorial, you will learn in more details what happens under the hood of s2e-env and s2e.so, in particular how they actually make input symbolic, and how you can do the same for arbitrary programs. This may be useful on targets and platforms that are not supported yet by S2E’s project creation scripts. We advise you to study how these tools work and adapt them to your own needs. Don’t hesitate to look at their source code.

Introduction

We want to cover all of the code of the following program by exploring all the possible paths through it.

#include <stdio.h>
#include <string.h>

int main(void) {
    char str[3];

    printf("Enter two characters: ");

    if (!fgets(str, sizeof(str), stdin)) {
        return 1;
    }

    if (str[0] == '\n' || str[1] == '\n') {
        printf("Not enough characters\n");
    } else {
        if (str[0] >= 'a' && str[0] <= 'z') {
            printf("First char is lowercase\n");
        } else {
            printf("First char is not lowercase\n");
        }

        if (str[0] >= '0' && str[0] <= '9') {
            printf("First char is a digit\n");
        } else {
            printf("First char is not a digit\n");
        }

        if (str[0] == str[1]) {
            printf("First and second chars are the same\n");
        } else {
            printf("First and second chars are not the same\n");
        }
    }

    return 0;
}

Compiling and running

First, compile and run the program as usual, to make sure that it works.

$ gcc -m32 -O3 tutorial1.c -o tutorial1
$ ./tutorial1
Enter two characters: ab
First char is lowercase
First char is not a digit
First and second chars are not the same

Then, create a new project using s2e-env . Once it is created, run it, and check that it runs properly in the guest. You will need to run S2E with graphics output enabled so that you can type the input.

s2e new_project /path/to/tutorial1

Preparing the program for symbolic execution

In order to execute the program symbolically, it is necessary to specify what values should become symbolic. There are many ways to do it in S2E, but the simplest one is to use the S2E opcodes library. This library provides a way for guest code to communicate with the S2E system.

In order to explore all possible paths through the program that correspond to all possible inputs, we want to make these inputs symbolic. To accomplish this, we replace the call to fgets() by a call to s2e_make_symbolic():

...
char str[3];
// printf("Enter two characters: ");
// if(!fgets(str, sizeof(str), stdin))
//   return 1;
s2e_make_symbolic(str, 2, "str");
str[3] = 0;
...

Finally, it would be interesting to see an example of input value that cause a program to take a particular execution path. This can be useful to reproduce a bug in a debugger, independently of S2E. For that, use the s2e_get_example() function. This function gives a concrete example of symbolic values that satisfy the current path constraints (i.e., all branch conditions along the execution path).

After these changes, the example program looks as follows:

#include <stdio.h>
#include <string.h>
#include <s2e/s2e.h>

int main(void) {
    char str[3] = { 0 };

    // printf("Enter two characters: ");
    // if (!fgets(str, sizeof(str), stdin)) {
    //     return 1;
    // }

    s2e_make_symbolic(str, 2, "str");

    if (str[0] == '\n' || str[1] == '\n') {
        printf("Not enough characters\n");
    } else {
        if (str[0] >= 'a' && str[0] <= 'z') {
            printf("First char is lowercase\n");
        } else {
            printf("First char is not lowercase\n");
        }

        if (str[0] >= '0' && str[0] <= '9') {
            printf("First char is a digit\n");
        } else {
            printf("First char is not a digit\n");
        }

        if (str[0] == str[1]) {
            printf("First and second chars are the same\n");
        } else {
            printf("First and second chars are not the same\n");
        }
    }

    s2e_get_example(str, 2);
    printf("'%c%c' %02x %02x\n", str[0], str[1],
           (unsigned char) str[0], (unsigned char) str[1]);

    return 0;
}

Note

There are alternatives to s2e_get_example to get test cases. The simplest one is using the TestCaseGenerator plugin, which is enabled by default, and outputs test cases in s2e-last/debug.txt.

Compile the program and try to run it on your host:

$ gcc -I$S2ESRC/guest/common/include -O3 tutorial1.c -o tutorial1
$ ./tutorial1
Illegal instruction

You see the Illegal instruction message because all s2e_* functions use special CPU opcodes that are only recognized by S2E.

Running the program in S2E

Now rerun the program above in S2E, using the launch scripts generated by s2e-env. You should see several states forked, one for each possible program input. Each state is a completely independent snapshot of the whole system. You can even interact with each state independently, for example by launching different programs. Try to launch tutorial1 in one of the states again!

In the host terminal (i.e., the S2E standard output), you see various information about state execution, forking and switching. This output is also saved into the s2e-last/debug.txt log file. As an exercise, try to follow the execution history of a state through the log file.

Terminating execution paths

By default, S2E runs paths forever and needs a special order in order to terminate an execution path. The s2e-env tool wraps programs in a script that will take care of terminating paths when the program returns or when it crashes. Sometimes, you may want to terminate the execution path yourself, directly from your program. This is particularly useful if you run S2E on a system that is not yet supported by s2e-env, such as embedded OSes.

Terminating an execution path is accomplished with the s2e_kill_state() function. A call to this function immediately stops the execution of the current path and exits S2E if there are no more paths to explore. Add a call to this function just before the program returns control to the OS. Before that, you may want to print example values in the S2E log using s2e_printf():

int main(void)
{
  char str[3] = { 0 };

  ...

  s2e_get_example(str, 2);
  s2e_printf("'%c%c' %02x %02x\n", str[0], str[1], (unsigned char) str[0], (unsigned char) str[1]);
  s2e_kill_state(0, "program terminated");

  return 0;
}

When you rerun the program, you will see that the logs contain the message program terminated.

You can also terminate the execution from a script, using the s2ecmd tool.

guest$ ./tutorial; ./s2ecmd kill 0 "done"