Symbolic Execution of Linux Binaries

In this tutorial, we will show how to symbolically execute Linux binaries, without modifying their source code. Before you start, make sure that you have a working S2E environment.

Getting started

We will symbolically execute the echo utility. For this, create a new analysis project as follows:

# $S2EENV is the root of your S2E environment created with s2e init
s2e new_project -i debian-11.3-i386 $S2EENV/images/debian-11.3-i386/guestfs/bin/echo abc

This command creates a new analysis project that invokes echo with one parameter abc, which would normally print abc on the standard output.

Note that this command uses the echo binary from the guest VM image as the analysis target. Do not use /bin/echo as the target, as this will take the one from your host OS, which may not be compatible with the VM image that S2E uses.

Finally, we must explicitly pass the image name using -i, because we s2e new_project may decide to use a 64-bit image to run the binary, because it is in theory possible to run a 32-bit binary on a 64-bit guest. However, this could also fail because echo might use libraries specific to the 32-bit guest.

Using `s2e.so`

In this section, we show how to make command line arguments symbolic. Open the bootstrap.sh file and locate the following line:

S2E_SYM_ARGS="" LD_PRELOAD=./s2e.so ./${TARGET} abc > /dev/null 2> /dev/null

Modify this line as follows:

S2E_SYM_ARGS="1" LD_PRELOAD=./s2e.so ./${TARGET} abc > /dev/null 2> /dev/null

This makes argument 1 (abc) symbolic. The process works like this:

The s2e.so library is preloaded in the binary using LD_PRELOAD.
s2e.so reads from the S2E_SYM_ARGS environment variable which arguments to make symbolic.
If the variable is missing, s2e.so leaves all arguments concrete and proceeds with normal execution.
If not, s2e.so overwrites the specified arguments with symbolic values. It is possible to make only some arguments symbolic and leave others concrete by specifying the corresponding argument IDs.
S2E_SYM_ARGS="<ID_0> <ID_1> .. <ID_N>" # Mark argument <ID_N> as symbolic

You may have noticed > /dev/null 2> /dev/null at the end of the command. This avoids printing symbolic characters on the screen and eliminates forks in the kernel. There are some other tricks that s2e-env enables in bootstrap.sh in order to minimize unwanted forks:

Do not print crashes in the syslog with sudo sysctl -w debug.exception-trace=0
Prevent core dumps from being created with ulimit -c 0 (you may want to re-enable them if needed)

Warning

You must specify default concrete arguments, so that s2e.so can overwrite them with symbolic data. The following command will not work because there is no argument to make symbolic (abc is missing).

S2E_SYM_ARGS="1" LD_PRELOAD=./s2e.so ./${TARGET} > /dev/null 2> /dev/null

Warning

You cannot make the content of a file symbolic by just marking the file name symbolic. In other words, the following will not have the intended consequence:

S2E_SYM_ARGS="1" LD_PRELOAD=./s2e.so /bin/cat /path/to/myfile

Instead of making the content of /path/to/myfile symbolic, it makes the file name itself symbolic. The next section explains how to make the content of the file symbolic.

Warning

Your binary must be dynamically linked, otherwise you cannot use s2e.so. In case you want to make arguments symbolic for a statically-linked binary, see workarounds below.

What about other symbolic input?

Piping symbolic data. You can also feed symbolic data to your program through stdin. The idea is to pipe the symbolic output of one program to the input of another. Symbolic output can be generated using the s2ecmd utility. The command below passes four symbolic bytes to cat:

./s2ecmd symbwrite 4 | cat

If your binary is statically linked, you could pass it symbolic arguments as follows:

/bin/echo $(./s2ecmd symbwrite 4)

Note that this may be much slower than using s2e.so as symbolic data has to go through several layers of OS and libraries before reaching the target binary.

Using symbolic files. If your binary takes a file name as a parameter and you want the content of that file to be symbolic, the simplest is to create your analysis project as follows:

# The @@ is a placeholder for a concrete file name that contains symbolic data
s2e new_project -i debian-11.3-i386 $S2EENV/images/debian-11.3-i386/guestfs/bin/cat @@

This generates a bootstrap file that creates a symbolic file in ramdisk (i.e., in /tmp on Linux), writes some symbolic data to that file, and passes the path to that file to cat. The symbolic file must be stored in RAM (hence the ramdisk, or tmpfs). Writing symbolic data to a hard drive will concretize it.

Note

In case of cat, you may not see any forks with the command above, as the standard output is redirected to /dev/null and the symbolic data is therefore never branched upon. You must tweak the command line according to the aspects of the binary you want to test.

Using seed files. This is the preferred way of using S2E. Unconstrained files created by @@ may be less efficient at guiding the program towards an interesting path. Instead, you can use the concrete data of a file to guide path exploration:

s2e new_project -i debian-11.3-i386 $S2EENV/images/debian-11.3-i386/guestfs/bin/cat /path/to/file/on/host

This commands scans the command line for arguments that look like paths (e.g., /path/to/file/on/host) and configures bootstrap.sh to download such paths into the guest. In addition to that, it creates a .symranges file in the project directory that specifies which byte ranges of the file to make symbolic.

Configuring S2E for use with `s2e.so`

s2e-env automatically configures all plugins required to use s2e.so. Read this section if you want to know more about the configuration. You do not need to worry about this during normal use and can skip the rest of this tutorial.

s2e.so requires two plugins to operate: BaseInstructions and LinuxMonitor. The first provides general infrastructure to communicate with plugins, while the second keeps track of various OS-level events (e.g., process loads or thread creation). The S2E configuration file must contain default settings for these plugins, as follows:

plugins = {
  -- Enable S2E custom opcodes
  "BaseInstructions",

  -- Track when the guest loads programs
  "LinuxMonitor",
}

Besides making command line arguments symbolic, s2e.so also reads /proc/self/maps to figure out which shared libraries are loaded by the process and communicates their location to LinuxMonitor. LinuxMonitor then broadcast this information to any interested plugins. For example, the code coverage plugin uses this information to map program counters to a module name.

Warning

There is no s2e.so for Windows yet. In order to make program arguments symbolic, you must modify the source code manually. However, writing symbolic data to the standard input or to the ramdisk works like on Linux. On Windows, programs and shared libraries are tracked by a special guest driver, s2e.sys, that communicates with WindowsMonitor.

Modifying and building `s2e.so`

If you use s2e-env and stock VM images, s2e.so is automatically copied into the guest VM each time you start the analysis. You do not have to do anything special unless you want to modify it.

The s2e.so library source can be found in the guest folder of the S2E source directory and is built during the S2E build process. It can also be built manually by running make -f $S2ESRC/Makefile guest-tools-install from the build directory. This creates guest-tools32 and guest-tools64 in $S2EDIR/build/$S2E_PREFIX/bin (by default $S2E_PREFIX is equal to opt).

The latest build of s2e.so is copied in your guest VM next time you start the analysis, so all you need is to run the make command above if you modify the source code of s2e.so.

Symbolic Execution of Linux Binaries

Getting started

Using s2e.so

What about other symbolic input?

Configuring S2E for use with s2e.so

Modifying and building s2e.so

Using `s2e.so`

Configuring S2E for use with `s2e.so`

Modifying and building `s2e.so`