Binary Instrumentation in QEMU

Introduction
Binary Instrumentation
Implementing function hooking in QEMU
Defining function hooks
Adding instrumentation code
The result
Conclusion

Introduction

QEMU can be used to dynamically instrument a binary file. This comes in handy when you want to analyze the behavior of a firmware image or when doing reverse engineering. For example, if you want to observe the arguments of function calls or a specific memory value, binary instrumentation is a possible solution. The advantage of using QEMU for instrumentation is that you don’t need to modify the target binary file or the source code (if you even have access to it). All instrumentation steps can be done transparently in the emulator.

In this article, I will explain how minimal instrumentation can be achieved with QEMU. If you’re interested in advanced functionality, there are multiple projects that provide full frameworks for this kind of task (e.g. PyREBox and PANDA).

Binary Instrumentation

There are multiple ways to instrument functions in a binary file. Some techniques require modifications of the source code or changes at compile time. In my case, I did not have access to the source code of the target firmware, so such techniques were not suitable. Therefore, I looked for methods that only require extending QEMU.

I came up with the following three approaches. They provide simple ways to instrument any function in QEMU.

Static function hooking by address

The first and simplest option is to statically hook a function by its address in the firmware. In my article on satellite firmware fuzzing, I used this approach. When using this option, QEMU checks if the current program counter is equal to the provided target address and, if so, runs the instrumentation code.

The advantage of this technique is that it works with minimal modification of QEMU and without much preparation.

The disadvantage is that you need to know the exact address of the target function in the binary file. If you want to work on multiple binaries that contain the same function (e.g. a libc function or a function from an SDK), you need to modify the hooking code for each binary. This makes automatic workflows much harder.
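To make the idea concrete, here is a minimal sketch of a static hook. It does not use real QEMU types: the struct hook_demo_cpu, the register index HOOK_DEMO_PC_REG and the address HOOK_DEMO_TARGET_ADDR are all hypothetical stand-ins that I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the emulated CPU state; the real state
 * struct in QEMU has many more fields. */
#define HOOK_DEMO_PC_REG 15

struct hook_demo_cpu {
    uint32_t r[16];
};

/* Assumed address of the target function in ONE specific firmware image.
 * A different build of the firmware would need a different value here. */
#define HOOK_DEMO_TARGET_ADDR 0xd0001234u

/* Returns 1 if the program counter sits at the hard-coded target address. */
int static_hook_hit(const struct hook_demo_cpu *env)
{
    assert(env != NULL);
    if (env->r[HOOK_DEMO_PC_REG] == HOOK_DEMO_TARGET_ADDR) {
        printf("[FUNCTION HOOK] target function was called\n");
        return 1;
    }
    return 0;
}
```

The hard-coded constant is exactly what makes this approach brittle: it is only valid for the one binary it was taken from.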

Dynamic function hooking by symbol name

If you’re lucky and have access to an ELF file of your target binary, you can hook a function by its name. Tools like Frida also provide such options. Here, you provide the name of the target function to the instrumentation code, and the corresponding address is determined by searching the ELF symbol table.

The advantage of this approach is that you can hook a function without knowing its address beforehand. Once this functionality is implemented, the address search can be done automatically. If you are working on multiple binaries, this makes automated workflows much easier.

However, the address of the function has to be looked up in the ELF symbol table. This requires access to the ELF file of the target and some preparation, as the lookup functions need to be implemented.

Dynamic function hooking by function opcodes

If you do not have access to the ELF information of your target binary, you can still find the target function address. Here, you provide the hooking function with the opcode pattern of the function, also called Array Of Bytes (AOB). Then you can search the emulated memory that holds the program code for the provided pattern.

This approach does not rely on any metadata or additional information and works with raw binaries. Of course, you need to find the pattern of a function. For example, you can reverse engineer the binary file, or you can look at the ELF-file of another binary that uses the same SDK and compiler as your target.

Depending on the build chain, the opcode pattern of one function might be different across multiple binaries. This approach also requires some preparation, as the AOB scan needs to be implemented.

Implementing function hooking in QEMU

To instrument a function in a binary, we first need to extend QEMU so that it can hook emulated functions.

Let’s start by adding a hooking.c and a hooking.h file. Because I only want to work on AVR32 binaries in this example, we can place the files in target/avr32 and add them to the AVR32 build configuration.

As I already showed how static function hooking can be achieved, we will implement the two dynamic approaches I described above. Inside hooking.h, we need to define a new struct:

struct sym_hook {
    const char* name;
    uint32_t address;
    uint32_t p_len;
    const uint8_t* pattern;
};

We will use this data structure to specify our target function, either by name or by pattern. Function declarations will also be placed in hooking.h. To keep the article short, I will not write them down here.

The first function that we implement is used to populate the address fields of our hooks. The function first checks if a hook contains an opcode pattern. If so, we search for the pattern in the emulated memory. If there is no pattern, we search for the symbol address in the ELF symbol table. The function should be called right at the start of the emulation, for example in the CPU initialization function, so that the hooks are active as early as possible.

void init_sym_hooks(CPUAVR32AState *env){
    //Number of entries in the hook table
    const size_t n_hooks = sizeof(symbol_hooks)/sizeof(symbol_hooks[0]);

    for(size_t i = 0; i < n_hooks; i++){
        //Hooks with a pattern are resolved by an AOB scan
        if(symbol_hooks[i].pattern != NULL){
            printf("[FUNCTION HOOK] Performing AOB scan for symbol %s\n", symbol_hooks[i].name);
            symbol_hooks[i].address = aob_scan(symbol_hooks[i].pattern, symbol_hooks[i].p_len, env);
            if(symbol_hooks[i].address != (uint32_t)-1){
                printf("[FUNCTION HOOK] Found address of %s at: 0x%04x\n", symbol_hooks[i].name, symbol_hooks[i].address);
            }
            else{
                printf("[FUNCTION HOOK] Could not find AOB address for %s.\n", symbol_hooks[i].name);
            }
        }
        //All other hooks are resolved by name in the ELF symbol table
        else{
            search_sym_tab_entry(&symbol_hooks[i]);
        }
        //An unresolved hook (address still -1) is treated as a fatal error
        if(symbol_hooks[i].address == (uint32_t)-1){
            //error_report appends the newline itself
            error_report("[FUNCTION HOOK] Address of %s could not be found!", symbol_hooks[i].name);
            exit(1);
        }
    }
}

Maybe you noticed the function search_sym_tab_entry. It searches the symbol table of the ELF file for the specified function name. Loading and searching the ELF file is out of scope for this article, as it would make the text far too long. Maybe I will write about it later. For now, keep in mind that an ELF file contains a table with all symbols and their location inside the binary. If we go through this table and compare the symbol names with the name of our target function, we will find the corresponding address.
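As a rough illustration of that table walk, here is a sketch that works on a symbol table and string table that are already loaded into host memory. This is not the search_sym_tab_entry function from my code. The struct mirrors the standard 32-bit ELF symbol entry layout, but the names example_elf32_sym and example_find_symbol are made up, and I assume the fields were already converted to host byte order and that the string table ends with a zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of a 32-bit ELF symbol table entry. */
struct example_elf32_sym {
    uint32_t st_name;   /* offset of the symbol name in the string table */
    uint32_t st_value;  /* symbol value, for functions the address */
    uint32_t st_size;
    uint8_t  st_info;
    uint8_t  st_other;
    uint16_t st_shndx;
};

/* Walks the symbol table and compares each name with the target name.
 * Returns the symbol address, or UINT32_MAX if the name is not present. */
uint32_t example_find_symbol(const struct example_elf32_sym *symtab, size_t n_syms,
                             const char *strtab, size_t strtab_len,
                             const char *name)
{
    assert(symtab != NULL && strtab != NULL && name != NULL);
    for (size_t i = 0; i < n_syms; i++) {
        /* Skip entries whose name offset points outside the string table. */
        if (symtab[i].st_name >= strtab_len) {
            continue;
        }
        if (strcmp(strtab + symtab[i].st_name, name) == 0) {
            return symtab[i].st_value;
        }
    }
    return UINT32_MAX;
}
```

Returning UINT32_MAX for "not found" matches the -1 that the hook table uses for unresolved addresses.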

The function aob_scan receives a pattern as input and then searches the emulated memory for it. To access QEMU’s emulated memory, we need a CPUAVR32AState object, which is also passed as an argument.

uint32_t aob_scan(const uint8_t *pattern, uint32_t p_len, CPUAVR32AState *env){
    //Addresses are unsigned: 0xd0000000 does not fit into a signed int
    uint32_t idx = 0;
    uint32_t ptr = 0xd0000000; //Start address in OPS-SAT simulator
    uint32_t match_start = 0;

    uint32_t max_address = ptr + 32*1024*1024; //Size of flash memory in Nanomind A3200

    //An empty pattern cannot be matched
    if(p_len == 0){
        return (uint32_t)-1;
    }

    while(ptr < max_address){
        uint8_t data = cpu_ldub_data(env, ptr);

        //First or next pattern match
        if(data == pattern[idx]){
            if(idx == 0){
                match_start = ptr;
            }
            idx++;
            //Address is found
            if(idx >= p_len){
                return match_start;
            }
        }
        //There was a match, but now there is a miss
        else if (idx > 0){
            //Pointer is increased after else if, so no endless loop occurs
            ptr = match_start;
            idx = 0;
        }
        ptr++;
    }
    //End of memory area, no match found
    return (uint32_t)-1;
}

As you can see, the search algorithm is good enough for a proof-of-concept, but it’s not optimal. After a partial match fails, it always jumps back to match_start + 1. The performance can be improved by jumping to the next potential match instead. If the partially matched bytes contain no other possible start of the pattern, the scan can simply continue at the current pointer address. There are a bunch of string-search algorithms that perform better than the naive approach. However, the implementation works well enough for a small memory area.
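One of those algorithms is Boyer-Moore-Horspool. The sketch below is not what I use in my QEMU code. It searches a plain host buffer, so to use it for an AOB scan you would first have to copy the flash region into such a buffer or replace the hay[] accesses with reads from emulated memory. The function name example_horspool is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Boyer-Moore-Horspool search over a host-side buffer.
 * Returns the offset of the first match, or (size_t)-1 if there is none. */
size_t example_horspool(const uint8_t *hay, size_t hay_len,
                        const uint8_t *pat, size_t pat_len)
{
    size_t shift[256];

    assert(hay != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > hay_len) {
        return (size_t)-1;
    }

    /* A byte that does not occur in the pattern lets us skip a whole
     * pattern length. */
    for (size_t i = 0; i < 256; i++) {
        shift[i] = pat_len;
    }
    /* Bytes that occur in the pattern (except at its last position)
     * only allow a skip up to their last occurrence. */
    for (size_t i = 0; i + 1 < pat_len; i++) {
        shift[pat[i]] = pat_len - 1 - i;
    }

    size_t pos = 0;
    while (pos <= hay_len - pat_len) {
        /* Compare the current window from its end backwards. */
        size_t j = pat_len;
        while (j > 0 && hay[pos + j - 1] == pat[j - 1]) {
            j--;
        }
        if (j == 0) {
            return pos;
        }
        /* Skip according to the last byte of the current window. */
        pos += shift[hay[pos + pat_len - 1]];
    }
    return (size_t)-1;
}
```

For a 16-byte pattern, most windows are rejected after a single comparison and skipped in steps of up to 16 bytes, which is where the speed-up over the byte-by-byte scan comes from.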

Defining function hooks

With the ability to find the address of any target function, we can now define actual instrumentation targets. I will again use the OPS-SAT firmware image as my target binary. Of course, you can use any binary file you have at hand. You only have to make sure that you target functions that are part of your binary.

I will observe the arguments of any call to vprintf. I will also detect when the main function is called, and I will modify the arguments of the csp_sfp_recv function.

At the start of the file, add the following code:

#define SYM_HOOK_VPRINTF 0
#define SYM_HOOK_MAIN 1
#define SYM_HOOK_CSP_SFP_REV 2

const uint8_t AOB_MAIN[] = { 0xd4, 0x21, 0xe0, 0x6a, 0xe1, 0x00, ...};

struct sym_hook symbol_hooks[] = {
        {"vprintf", -1},
        {"main", -1, 16, AOB_MAIN},
        {"csp_sfp_recv", -1},
};

The symbol_hooks array was already used in the init function. You can see that we will hook vprintf and csp_sfp_recv by their names and main by its pattern. You can simply select the pattern in Ghidra and use the Copy Special function in the right-click context menu to get a formatted C array.

The struct only stores a pointer to the pattern, and C cannot recover the length of an array from a pointer, so we also need to set the pattern length at this point. In this case, I copied 16 bytes of opcode. It is possible that a pattern occurs multiple times in a binary. Therefore, it is advisable to copy longer sections.
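If the pattern is defined as a real array in the same file as the hook table, sizeof still sees the full array, so the length does not have to be counted by hand. The following sketch shows one possible way to do this. The macro EXAMPLE_AOB, the helper example_hook_is_valid and the pattern bytes are made up for illustration and are not part of my QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym_hook {
    const char *name;
    uint32_t address;
    uint32_t p_len;
    const uint8_t *pattern;
};

/* Hypothetical helper macro: expands to "length, pointer" for a real array.
 * It must not be used with a pointer, as sizeof would then yield the
 * pointer size instead of the pattern length. */
#define EXAMPLE_AOB(arr) (uint32_t)sizeof(arr), (arr)

/* Placeholder bytes for illustration only, not a real function prologue. */
const uint8_t EXAMPLE_PATTERN[] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };

struct sym_hook example_hooks[] = {
    { "by_name",    UINT32_MAX, 0, NULL },
    { "by_pattern", UINT32_MAX, EXAMPLE_AOB(EXAMPLE_PATTERN) },
};

/* Sanity check for one entry: a pattern pointer must come with a length. */
int example_hook_is_valid(const struct sym_hook *h)
{
    assert(h != NULL);
    return h->pattern == NULL || h->p_len > 0;
}
```

With this, changing the number of bytes in the pattern array automatically updates p_len.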

Adding instrumentation code

The last missing thing is the actual instrumentation code. Let’s define a function that is called by a QEMU helper at the start of every translation block (look here to see how this is done).

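void do_hook(CPUAVR32AState *env)
{
    //Resolve the hook addresses only once, not on every translation block
    static bool hooks_initialized = false;
    if(!hooks_initialized){
        init_sym_hooks(env);
        hooks_initialized = true;
    }

    //Address of the translation block that is about to be executed
    uint32_t cur_addr = env->r[AVR32A_PC_REG];

    //r12 holds the first argument: the format string
    if(cur_addr == symbol_hooks[SYM_HOOK_VPRINTF].address){
        printf("[FUNCTION HOOK] vprintf was called with: '%s'\n", read_string(env->r[12], env));
    }

    if(cur_addr == symbol_hooks[SYM_HOOK_MAIN].address){
        printf("[FUNCTION HOOK] main() was called\n");
    }

    //r9 holds the fourth argument, which is overwritten with 1
    if(cur_addr == symbol_hooks[SYM_HOOK_CSP_SFP_REV].address){
        printf("[FUNCTION HOOK] csp_sfp_recv(): performing argument substitution for r9: 0x%x => 1\n", env->r[9]);
        env->r[9] = 1;
    }
}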

Here, we simply print out that our target functions were called. In the case of vprintf, we also read the first argument as a char pointer and print its content. The read_string function simply reads bytes from the emulated memory until it reaches a zero byte (C strings are terminated by a zero byte). As the first argument of vprintf is a char pointer, we can use it as an input for read_string.
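I did not show read_string above, so here is a sketch of how such a function could look. It is not my actual implementation: instead of reading emulated memory directly, it takes a byte-reader callback, and it copies into a buffer supplied by the caller. The names example_read_string, example_read_byte_fn, example_mem and example_mem_read are all made up. In QEMU, the callback would wrap a call like the cpu_ldub_data call used in aob_scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte reader that abstracts the access to emulated memory. */
typedef uint8_t (*example_read_byte_fn)(void *ctx, uint32_t addr);

/* Copies a zero-terminated guest string into out (at most out_len - 1 bytes)
 * and always terminates it. Returns the number of bytes copied, not
 * counting the terminator. */
size_t example_read_string(example_read_byte_fn read_byte, void *ctx,
                           uint32_t addr, char *out, size_t out_len)
{
    size_t n = 0;

    assert(read_byte != NULL && out != NULL && out_len > 0);
    while (n + 1 < out_len) {
        uint8_t b = read_byte(ctx, addr + (uint32_t)n);
        if (b == 0) {
            break;
        }
        out[n++] = (char)b;
    }
    out[n] = '\0';
    return n;
}

/* Example backing store for testing: a flat host buffer mapped at a
 * guest base address. */
struct example_mem {
    uint32_t base;
    const uint8_t *bytes;
    size_t len;
};

uint8_t example_mem_read(void *ctx, uint32_t addr)
{
    const struct example_mem *m = ctx;
    uint32_t off = addr - m->base;
    /* Reads outside the buffer return 0, which ends the string. */
    return off < m->len ? m->bytes[off] : 0;
}
```

The length limit matters in practice: a register that does not actually hold a string pointer could otherwise make the hook read a huge amount of memory.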

We can do this with any function argument, as we have access to the emulated CPU registers.

To make use of the register contents, you need to look at the function definition or do some reverse engineering. Otherwise, a value like 0x12341234 does not provide much information.

We could also send the function arguments to an external file or change them. In the last if-clause, we modify the value of the 4th call argument and set it to 1. This is a specific workaround to skip a longer waiting time before some tasks in my target firmware are started.
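The register numbers in the hook follow a simple scheme: the first argument is in r12 and the fourth in r9. The sketch below wraps this mapping in two helpers. It assumes that the first five integer or pointer arguments are passed in r12 down to r8, so please check the calling convention of your target before relying on it. The struct arg_demo_cpu and the helper names are made up and stand in for the real QEMU CPU state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: arguments 1..5 are passed in r12, r11, r10, r9 and r8. */
#define ARG_DEMO_MAX_REG_ARGS 5

/* Hypothetical stand-in for the emulated CPU state. */
struct arg_demo_cpu {
    uint32_t r[16];
};

/* Returns the n-th call argument (1-based) from its register. */
uint32_t example_get_arg(const struct arg_demo_cpu *env, unsigned n)
{
    assert(env != NULL && n >= 1 && n <= ARG_DEMO_MAX_REG_ARGS);
    return env->r[13 - n];
}

/* Overwrites the n-th call argument (1-based) before the function runs. */
void example_set_arg(struct arg_demo_cpu *env, unsigned n, uint32_t value)
{
    assert(env != NULL && n >= 1 && n <= ARG_DEMO_MAX_REG_ARGS);
    env->r[13 - n] = value;
}
```

With such helpers, the substitution in the hook would read example_set_arg(env, 4, 1) instead of a direct write to r9.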

The result

When I start QEMU with my target binary, I receive this output:

...
[FUNCTION HOOK] Found symbol vprintf => 0x****
[FUNCTION HOOK] Performing AOB scan for symbol main
[FUNCTION HOOK] Found address of main at: 0xd0056d04
[FUNCTION HOOK] Found symbol csp_sfp_recv => 0x*****
...
[FUNCTION HOOK] main() was called
...
[FUNCTION HOOK] vprintf was called with: 'Ram image'
[FUNCTION HOOK] vprintf was called with: 'Mounting /uffs/flash...'
[FUNCTION HOOK] vprintf was called with: 'UFFS: Mount ok'
...
[FUNCTION HOOK] csp_sfp_recv(): performing argument substitution for r9: 0x2710 => 1
[FUNCTION HOOK] vprintf was called with: 'Started ADCS server on port %u'

As you can see, our implementation finds the addresses of our target functions and then executes our instrumentation code every time one of the targets is called. Even the argument modification works as expected.

Conclusion

Binary instrumentation is a very useful tool when analyzing firmware images. QEMU provides a simple base for such tasks, and custom instrumentation code can be added without much effort.

Of course, the instrumentation I showed here is quite basic. But it can be easily expanded, and depending on the context and the specific task, it is a lightweight alternative to instrumentation frameworks. Custom instrumentation in particular is simpler to implement this way. As I showed in my fuzzing article, even the static approach is helpful for certain tasks.