Symbolic Execution of Linux Binaries

In this tutorial, we will show how to symbolically (or concolically) execute existing Linux programs, without modifying their source code. This tutorial shows how to do this directly from the program’s command line.

To do so, we use the s2e.so shared library and LD_PRELOAD. This library intercepts the call to the main function and inserts user-configured symbolic arguments.

Obtaining and compiling s2e.so

The s2e.so library can be found in the guest folder of the S2E distribution and is built during the standard S2E build process. It can also be built by running make guest-tools-install. This will create guest-tools32 and guest-tools64 in $S2EDIR/build-s2e/$S2E_PREFIX/bin (remember that by default $S2E_PREFIX is equal to opt). Copy the directory matching your guest’s architecture into the guest virtual machine.

Configuring S2E for use with s2e.so

s2e.so communicates with several S2E plugins in order to restrict symbolic execution to the program of interest. The S2E configuration file must contain default settings for these plugins, as follows:

s2e.so communicates process module information to both the RawMonitor and LinuxMonitor plugins. However, only one of these plugins can be enabled at any one time. LinuxMonitor is preferred over RawMonitor because it provides more information. However LinuxMonitor can only be used with a modified Linux kernel.

plugins = {
  -- Enable S2E custom opcodes
  "BaseInstructions",

  -- Track when the guest loads programs
  "LinuxMonitor",

  -- Alternatively, RawMonitor may be used
  -- "RawMonitor",
}

Using s2e.so

The s2e.so library must be preloaded with your binary using LD_PRELOAD. s2e.so intercepts the program’s entry point invocation, checks environment variables for the desired configuration and configures symbolic execution.

For example, the following example invokes echo from GNU Coreutils making arguments arg1 and arg2 symbolic. The execution path is terminated after echo returns:

S2E_SYM_ARGS="1 2" LD_PRELOAD=/path/to/guest/s2e.so /bin/echo arg1 arg2
/path/to/guest/s2ecmd/s2ecmd kill 0 "echo done"

The s2ecmd utility allows the user to control S2E from the guest’s command line. Here, we ask it to kill the execution path after echo returns. It is important to do so otherwise S2E will run forever and the 100s of paths generated by echo will eventually wait indefinitely at the prompt.

The s2e.so library supports the S2E_SYM_ARGS environment variable. This environment variable allows the user to specify which arguments should be considered symbolic. If S2E_SYM_ARGS is not used all arguments are considered concrete. When specifying symbolic arguments <ID_0> corresponds to the program’s name (i.e. argv[0]).

S2E_SYM_ARGS="<ID_0> <ID_1> .. <ID_N>"       Mark argument <ID_N> as symbolic

Analyzing large programs with concolic execution

Depending on the program under analysis, symbolic execution may get stuck in a large state space. In general it is better to use concolic execution. The following command runs echo in concolic mode, based on the concrete parameter abc (i.e. the first path will print abc, while the others will print other strings).

S2E_SYM_ARGS="1" LD_PRELOAD=/path/to/guest/s2e.so /bin/echo abc
/path/to/guest/s2ecmd/s2ecmd kill 0 "echo done"

You may also want to do the following to prevent the program from forking in the kernel:

  • Add > /dev/null to prevent forking when printing symbolic content
  • Don’t print crashes in the syslog via sudo sysctl -w debug.exception-trace=0
  • Prevent core dumps from being created with ulimit -c 0

What about other symbolic input?

You can easily feed symbolic data to your program via stdin. The idea is to pipe the symbolic output of one program to the input of another. Symbolic output can be generated using the s2ecmd utility.

/path/to/guest/s2ecmd/s2ecmd symbwrite 4 | cat

The command above will pass 4 symbolic bytes to cat.

The easiest way to have your program read symbolic data from files (other than stdin) currently involves a ramdisk. You need to redirect the symbolic output of s2ecmd symbwrite to a file residing on the ramdisk, then have your program under test read that file. On many Linux distributions, the /tmp filesystem resides in RAM, so using a file in /tmp works. This can be checked using the df command: it should print something similar to tmpfs 123 456 123 1% /tmp.