How to add a new architecture to QEMU - Part 3

TCG: the general concept
Getting binary data
Parsing an instruction
Translating an instruction
Editing the build system
The next steps

So far, we have created the emulated hardware and the CPU for our new AVR32 QEMU implementation. Now it’s time for the real magic. In this article, I will show you how QEMU loads data from the firmware image, how it decides which instructions need to be executed, and how we tell QEMU which operations to perform.

TCG: the general concept

I already encouraged you to read the article about TCG internals. If you haven’t done so yet, now would be a good time. But if you don’t want to, don’t worry: I will go over the most important aspects now.

QEMU uses the Tiny Code Generator (TCG) to emulate the instructions of another CPU architecture. When a binary file is loaded into QEMU and the setup is done, the emulated program counter should point to the start of the program text. QEMU then loads a few bytes of data and performs pattern matching to identify the next CPU instruction. If an instruction is found, QEMU calls the corresponding handler function, which we need to implement. Inside the handler function, we use QEMU’s TCG operations to create an Intermediate Representation (IR) of the instruction’s operations. For example, there are TCG operations that add two emulated registers and place the result in a third register.

This is repeated until a translation block (TB) ends. A translation block is a sequence of guest CPU instructions; it typically ends at a jump or branch instruction. Then, QEMU’s internals automatically translate the IR into instructions for the host CPU architecture that QEMU runs on (probably x86). Next, the host code of the TB is executed, and the process starts again at the new program counter position. This is called the execution loop. Translated blocks are also kept in a cache, so a TB that is reached again does not have to be translated again.
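The loop described above can be illustrated with a small, self-contained toy model. It is a sketch under simplifying assumptions, not QEMU code: the guest ISA (OP_ADDI, OP_BNZ, OP_HALT), the TB and Machine types and the functions translate, execute and run are all hypothetical, and the “IR” is simply a list of decoded instructions that gets interpreted instead of being compiled to host code. What it does show is the split between translation time and execution time, and the reuse of cached blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy guest ISA (hypothetical, for illustration only). */
enum opcode { OP_ADDI, OP_BNZ, OP_HALT };

typedef struct {
    enum opcode op;
    int reg;        /* register operand */
    int32_t imm;    /* ADDI: immediate, BNZ: branch target */
} GuestInsn;

#define MAX_TB_LEN 16
#define MAX_PC     64

/* A translation block: the recorded "IR" for a run of guest instructions. */
typedef struct {
    bool valid;
    int len;
    int next_pc;    /* PC after the last instruction of the block */
    GuestInsn ops[MAX_TB_LEN];
} TB;

typedef struct {
    int32_t regs[4];
    TB cache[MAX_PC];   /* translated blocks, indexed by start PC */
    int translations;
    int executions;
} Machine;

/* Translation: collect instructions until a branch or halt ends the TB.
 * This function never reads m->regs: register values are unknown here. */
static TB *translate(Machine *m, const GuestInsn *prog, int start_pc)
{
    assert(start_pc >= 0 && start_pc < MAX_PC);
    TB *tb = &m->cache[start_pc];
    if (tb->valid) {
        return tb;                      /* reuse the cached block */
    }
    int pc = start_pc;
    tb->len = 0;
    for (;;) {
        GuestInsn insn = prog[pc++];
        tb->ops[tb->len++] = insn;
        if (insn.op != OP_ADDI || tb->len == MAX_TB_LEN) {
            break;                      /* branch/halt (or size limit) ends the TB */
        }
    }
    tb->next_pc = pc;
    tb->valid = true;
    m->translations++;
    return tb;
}

/* Execution: run the recorded ops; return the next PC, or -1 on halt. */
static int execute(Machine *m, const TB *tb)
{
    m->executions++;
    for (int k = 0; k < tb->len; k++) {
        const GuestInsn *o = &tb->ops[k];
        switch (o->op) {
        case OP_ADDI:
            m->regs[o->reg] += o->imm;
            break;
        case OP_BNZ:
            return m->regs[o->reg] != 0 ? o->imm : tb->next_pc;
        case OP_HALT:
            return -1;
        }
    }
    return tb->next_pc;                 /* block was cut at MAX_TB_LEN */
}

/* The execution loop: translate (or look up) the TB at pc, run it, repeat. */
static void run(Machine *m, const GuestInsn *prog)
{
    for (int pc = 0; pc >= 0; ) {
        pc = execute(m, translate(m, prog, pc));
    }
}

/* Demo: r1 += 5 three times, using r0 as a loop counter. */
static const GuestInsn demo_prog[] = {
    { OP_ADDI, 0, 3 },    /* 0: r0 += 3             */
    { OP_ADDI, 1, 5 },    /* 1: r1 += 5             */
    { OP_ADDI, 0, -1 },   /* 2: r0 -= 1             */
    { OP_BNZ,  0, 1 },    /* 3: if (r0 != 0) goto 1 */
    { OP_HALT, 0, 0 },    /* 4: stop                */
};
```

Running demo_prog leaves r0 = 0 and r1 = 15. Three blocks are translated (starting at PC 0, 1 and 4), but four are executed, because the block at PC 1 is taken from the cache the second time. Note that translate() never looks at register values; only execute() does.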

Getting binary data

The IR is generated in the target/avr32/translate.c file. Because this file is very large, I will not include every function here. I will cover the most important ones now and maybe add a few others later. You can find the full code in my GitHub repository. We create the file and start by implementing the loading function. It loads 2 or 4 bytes from memory and returns them:

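static uint32_t decode_insn_load_bytes(DisasContext *ctx, uint32_t insn,
                                       int i, int n)
{
    // Called by the decoder that QEMU generates from insn.decode:
    // i = number of bytes already loaded, n = number of bytes needed (2 or 4).
    // The instruction is left-aligned: the first halfword ends up in bits 31..16,
    // the second halfword of a 32-bit instruction in bits 15..0.
    for (; i < n; i += 2) {
        // cpu_lduw_be_data loads an unsigned big-endian word (16 bits in QEMU)
        // from the emulated memory.
        insn |= cpu_lduw_be_data(ctx->env, ctx->base.pc_next + i) << (16 - i * 8);
    }

    // No instruction was loaded. This should never happen.
    // (Note: according to the ADD_rd_rs pattern below, 0x0000 would also
    // match ADD r0, r0.)
    if (insn == 0x0) {
        gen_helper_raise_illegal_instruction(cpu_env);
    }
    return insn;
}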
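To see how the i and n arguments fit together, here is a small host-only simulation of the same loading scheme. It assumes that the generated decoder calls the load function with i = bytes already loaded and n = bytes needed, first with (0, 2) and, for a 32-bit instruction, again with (2, 4). The flat memory array and the names lduw_be, load_bytes, load_insn and demo_mem are hypothetical stand-ins; the “top bits 111 means 32 bits” check only holds for the two ADD patterns in this article, whereas the real decoder derives its length checks from the whole insn.decode file.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for cpu_lduw_be_data(): read a big-endian 16-bit word from a
 * flat array that plays the role of guest memory (hypothetical). */
static uint32_t lduw_be(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 8) | mem[addr + 1];
}

/* Same loading scheme as decode_insn_load_bytes(): i bytes are already in
 * insn, n bytes are needed. Instructions are left-aligned in 32 bits. */
static uint32_t load_bytes(const uint8_t *mem, uint32_t pc,
                           uint32_t insn, int i, int n)
{
    assert(n == 2 || n == 4);
    for (; i < n; i += 2) {
        insn |= lduw_be(mem, pc + i) << (16 - i * 8);
    }
    return insn;
}

/* Rough model of the generated decode_insn_load(): fetch 2 bytes and, if the
 * patterns need more (for the two ADD patterns: top bits 111), 2 more. */
static uint32_t load_insn(const uint8_t *mem, uint32_t pc, int *len)
{
    uint32_t insn = load_bytes(mem, pc, 0, 0, 2);
    *len = 2;
    if ((insn >> 29) == 0x7) {
        insn = load_bytes(mem, pc, insn, 2, 4);
        *len = 4;
    }
    return insn;
}

/* ADD r5, r1 (format 1) followed by ADD r3, r1, r2 << 1 (format 2). */
static const uint8_t demo_mem[] = { 0x02, 0x05, 0xE2, 0x02, 0x00, 0x13 };
```

With demo_mem, the halfword at address 0 is a complete 16-bit instruction (0x02050000 after left alignment), while the instruction at address 2 needs a second load and becomes 0xE2020013.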

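We do not call the loading function directly: the decoder that QEMU generates for us (more on that below) contains a function decode_insn_load() that calls it. decode_insn_load() and the decode function are called by avr32_tr_translate_insn, which we will later register with QEMU as the translation function.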

static void avr32_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
{
    DisasContext *ctx = container_of(dcbase, DisasContext, base);
    uint32_t insn;

    //Move the current location into the emulated program counter register
    tcg_gen_movi_tl(cpu_r[PC_REG], ctx->base.pc_next);

    //load the next instruction
    insn = decode_insn_load(ctx);
    //Call the decode function
    if (!decode_insn(ctx, insn)) {
        error_report("[AVR32-TCG] avr32_tr_translate_insn, illegal instr, pc: 0x%08" PRIx32, (uint32_t)ctx->base.pc_next);
        gen_helper_raise_illegal_instruction(cpu_env);
    }
}

The decode function is generated at build time by QEMU’s decodetree script. We only need to provide the correct input for it, which we will come to in a moment. First, we need to tell QEMU which functions are responsible for the different steps of the translation process. This is done in a TranslatorOps structure:

static const TranslatorOps avr32_tr_ops = {
        .init_disas_context = avr32_tr_init_disas_context,
        .tb_start           = avr32_tr_tb_start,
        .insn_start         = avr32_tr_insn_start,
        .translate_insn     = avr32_tr_translate_insn,
        .tb_stop            = avr32_tr_tb_stop,
        .disas_log          = avr32_tr_disas_log,
};

We need to implement each of these functions. For now, we will focus on the translation of individual instructions. You can find the implementation of the other functions in my GitHub repository.
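To get a feeling for when each TranslatorOps hook runs, here is a toy model of the hook order, loosely based on QEMU’s translator_loop(). All types and functions in it (ToyDisasBase, ToyTranslatorOps, toy_translator_loop and the my_* hooks) are hypothetical; the real loop also handles details such as breakpoints, page boundaries and logging, and its exact signature differs between QEMU versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for QEMU's DisasJumpType and DisasContextBase. */
typedef enum { TOY_DISAS_NEXT, TOY_DISAS_TOO_MANY, TOY_DISAS_JUMP } ToyJumpType;

typedef struct {
    const char *code;     /* toy guest memory: one char per instruction */
    size_t pc_next;
    int num_insns;
    int max_insns;
    ToyJumpType is_jmp;
    char log[64];         /* one letter per hook call, for illustration */
} ToyDisasBase;

typedef struct {
    void (*init_disas_context)(ToyDisasBase *db);
    void (*tb_start)(ToyDisasBase *db);
    void (*insn_start)(ToyDisasBase *db);
    void (*translate_insn)(ToyDisasBase *db);
    void (*tb_stop)(ToyDisasBase *db);
} ToyTranslatorOps;

static void note(ToyDisasBase *db, char c)
{
    size_t len = strlen(db->log);
    if (len + 1 < sizeof(db->log)) {
        db->log[len] = c;
        db->log[len + 1] = '\0';
    }
}

/* Hook order, loosely following QEMU's translator_loop(). */
static void toy_translator_loop(const ToyTranslatorOps *ops, ToyDisasBase *db)
{
    assert(db->max_insns > 0);
    db->is_jmp = TOY_DISAS_NEXT;
    db->num_insns = 0;
    ops->init_disas_context(db);
    ops->tb_start(db);
    for (;;) {
        db->num_insns++;
        ops->insn_start(db);
        ops->translate_insn(db);
        if (db->is_jmp != TOY_DISAS_NEXT) {
            break;                           /* the instruction ended the TB */
        }
        if (db->num_insns >= db->max_insns) {
            db->is_jmp = TOY_DISAS_TOO_MANY; /* TB size limit reached */
            break;
        }
    }
    ops->tb_stop(db);
}

/* Example hooks: 'J' acts as a jump, every other char is a plain insn. */
static void my_init(ToyDisasBase *db)  { note(db, 'I'); }
static void my_start(ToyDisasBase *db) { note(db, 'S'); }
static void my_insn(ToyDisasBase *db)  { note(db, 'i'); }
static void my_stop(ToyDisasBase *db)  { note(db, 'E'); }

static void my_trans(ToyDisasBase *db)
{
    note(db, 't');
    if (db->code[db->pc_next] == 'J') {
        db->is_jmp = TOY_DISAS_JUMP;
    }
    db->pc_next++;
}

static const ToyTranslatorOps my_ops = {
    .init_disas_context = my_init,
    .tb_start           = my_start,
    .insn_start         = my_insn,
    .translate_insn     = my_trans,
    .tb_stop            = my_stop,
};
```

A TB over "AAJA" produces the hook trace ISitititE (init, tb_start, three times insn_start/translate_insn, tb_stop) and stops at the 'J', which sets is_jmp much like DISAS_JUMP does in our ADD translation. With max_insns = 2, the same loop stops after two instructions with TOY_DISAS_TOO_MANY.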

Parsing an instruction

Now that we have loaded an instruction and called the decode function, we need to provide instruction patterns to QEMU. As mentioned earlier, QEMU uses pattern matching to identify the correct instruction from a sequence of bytes. A pattern contains the opcode (fixed bits that are always the same for a given instruction) and fields. The fields have a fixed length and position inside the instruction. For example, an instruction can contain two 4-bit fields that specify the numbers of two registers. The field values are set by the AVR32 compiler (more precisely, the assembler) when a program is built. The AVR32 Architecture Document defines the operations and the bit sequence of every instruction in the instruction set. Note that an instruction can have more than one encoding (called a format).

Let’s look at an example:

The ADD (Add without Carry) instruction has two formats that perform the following operations:

Format 1: \(Rd \leftarrow Rd + Rs\)

Here, the registers Rd and Rs are added, and the result is stored in Rd. Rd and Rs are placeholders for the real register numbers that are set when the program is compiled.

Format 2: \(Rd \leftarrow Rx + (Ry \ll sa)\)

The second format uses three registers: one for the result and two for the operands. A fourth field, sa, provides a 2-bit shift amount for the second operand.

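The opcode is provided by the Architecture Document, which lists the bit layout of every format. For the two ADD formats, the layout is as follows (bit 15 or bit 31 is the most significant bit):

Format 1 (16 bit): 15-13: 000 | 12-9: Rs | 8-4: 00000 | 3-0: Rd
Format 2 (32 bit): 31-29: 111 | 28-25: Rx | 24-20: 00000 | 19-16: Ry | 15-6: 0000000000 | 5-4: sa | 3-0: Rd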

As you can see, the Architecture Document specifies the bit positions of the fields inside the instruction. It also gives us the opcode (the fixed bits) that is used to identify the instruction.
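Conceptually, each pattern boils down to a mask of fixed opcode bits plus the values those bits must have. The sketch below shows that idea, using the bit patterns of the two ADD formats as they reappear in the insn.decode file below. It is not how decodetree works internally (the script generates C code at build time); parse_pattern and insn_matches are hypothetical helpers, and the instruction is assumed to be left-aligned in 32 bits as in our loading function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a decodetree-style bit string ('0'/'1' = fixed opcode bit,
 * '.' = field bit, spaces ignored) into a mask/match pair, with the
 * instruction left-aligned in 32 bits. Returns the width in bits,
 * or -1 if the string describes more than 32 bits. */
static int parse_pattern(const char *s, uint32_t *mask, uint32_t *match)
{
    int width = 0;
    *mask = 0;
    *match = 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ') {
            continue;
        }
        if (width == 32) {
            return -1;
        }
        uint32_t bit = 1u << (31 - width);
        if (*s == '0' || *s == '1') {
            *mask |= bit;                 /* fixed bit: must be compared */
            if (*s == '1') {
                *match |= bit;
            }
        }
        width++;
    }
    return width;
}

/* An instruction matches if all fixed bits have the expected values. */
static bool insn_matches(uint32_t insn, uint32_t mask, uint32_t match)
{
    assert((match & ~mask) == 0);         /* match may only use fixed bits */
    return (insn & mask) == match;
}
```

For "000 .... 00000 ...." this yields the mask 0xE1F00000 with match 0, and for the 32-bit format the mask 0xE1F0FFC0 with match 0xE0000000. Because the two formats differ in their top three bits, an instruction can match at most one of them.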

In QEMU, we will use the file target/avr32/insn.decode to specify the opcode and fields:

&rs_rd              rs rd
&rd_rx_ry_sa        rd rx ry sa

@op_rs_rd           ... rs:4 ..... rd:4                         &rs_rd
@op_rd_rx_ry_sa     ... rx:4 ..... ry:4 .... .... .. sa:2 rd:4  &rd_rx_ry_sa

ADD_rd_rs       000 .... 00000 ....                          @op_rs_rd
ADD_rd_rx_ry_sa 111 .... 00000 .... 0000 0000 00 .. ....     @op_rd_rx_ry_sa

Let’s start from the bottom up: we first set an identifier for an instruction (ADD_rd_rs). It is always on the left side. You can use any string you want, but it is wise to use a consistent naming scheme. Next, we specify the bit pattern of the instruction. We replace the fields with wildcards (dots), as we do not know their values at this point. When an instruction is loaded, QEMU extracts the fields as described by the format that we reference on the right side (@op_rs_rd) and passes them to our translation function.

The format definitions are placed above the instructions. In a format, we replace the opcode bits with wildcards, as they are already checked by the instruction pattern. The fields are written as a name followed by a colon and their length (for example, rs:4). On the right side, we reference the corresponding argument set. Argument sets are defined at the top and let us name the fields with any string we want. Later, we can use these names to access the contents of the fields.
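As a rough mental model, an argument set becomes a C struct, a format becomes a function that fills that struct from the instruction bits, and each pattern becomes a test that calls the matching trans_ function. The following is a hand-written approximation of that plumbing, not the code decodetree actually emits (the generated names, field order and control flow differ). The type names arg_rs_rd and arg_ADD_rd_rs follow the naming used by the translation function later in this article; extract_bits, the extract_op_* functions, decode_insn_sketch and the recording variable seen are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DisasContext DisasContext;   /* opaque in this sketch */

/* Argument sets become structs with one int per field. */
typedef struct { int rd; int rs; } arg_rs_rd;
typedef struct { int rd; int rx; int ry; int sa; } arg_rd_rx_ry_sa;

/* Each pattern gets a typedef named after its identifier. */
typedef arg_rs_rd arg_ADD_rd_rs;
typedef arg_rd_rx_ry_sa arg_ADD_rd_rx_ry_sa;

/* Like QEMU's extract32(): length bits starting at bit start. */
static int extract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (int)((value >> start) & (~0u >> (32 - length)));
}

/* Formats become extract functions (insn is left-aligned in 32 bits). */
static void extract_op_rs_rd(arg_rs_rd *a, uint32_t insn)
{
    a->rs = extract_bits(insn, 25, 4);
    a->rd = extract_bits(insn, 16, 4);
}

static void extract_op_rd_rx_ry_sa(arg_rd_rx_ry_sa *a, uint32_t insn)
{
    a->rx = extract_bits(insn, 25, 4);
    a->ry = extract_bits(insn, 16, 4);
    a->sa = extract_bits(insn, 4, 2);
    a->rd = extract_bits(insn, 0, 4);
}

/* Our trans_ functions. Here they only record their arguments;
 * in translate.c they emit TCG ops instead. */
static struct { int rd, rs, rx, ry, sa; } seen;

static bool trans_ADD_rd_rs(DisasContext *ctx, arg_ADD_rd_rs *a)
{
    (void)ctx;
    seen.rd = a->rd;
    seen.rs = a->rs;
    return true;
}

static bool trans_ADD_rd_rx_ry_sa(DisasContext *ctx, arg_ADD_rd_rx_ry_sa *a)
{
    (void)ctx;
    seen.rd = a->rd;
    seen.rx = a->rx;
    seen.ry = a->ry;
    seen.sa = a->sa;
    return true;
}

/* Patterns become mask/match tests that call the matching trans_ function;
 * returning false makes the caller raise an illegal-instruction exception. */
static bool decode_insn_sketch(DisasContext *ctx, uint32_t insn)
{
    if ((insn & 0xE1F00000u) == 0x00000000u) {        /* ADD_rd_rs */
        arg_ADD_rd_rs a;
        extract_op_rs_rd(&a, insn);
        return trans_ADD_rd_rs(ctx, &a);
    }
    if ((insn & 0xE1F0FFC0u) == 0xE0000000u) {        /* ADD_rd_rx_ry_sa */
        arg_ADD_rd_rx_ry_sa a;
        extract_op_rd_rx_ry_sa(&a, insn);
        return trans_ADD_rd_rx_ry_sa(ctx, &a);
    }
    return false;
}
```

This is also why the names matter: the generated decoder expects a function called trans_ plus the pattern identifier, taking a pointer to arg_ plus the same identifier.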

We also need to add the instructions to the target/avr32/disas.c file, which is used for disassembly:

#define REG(x) avr32_cpu_r_names[x]

INSN(ADD_rd_rs,          ADD,      "%s, %s",                   REG(a->rd), REG(a->rs))
INSN(ADD_rd_rx_ry_sa,    ADD,      "%s, %s, %s << %d",         REG(a->rd), REG(a->rx), REG(a->ry), a->sa)

On the right side, you can see how we access the field contents through the names we chose in insn.decode.
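The definition of the INSN macro is not shown here. One plausible shape, similar to the INSN macro in QEMU’s existing target/avr/disas.c, is a macro that generates a trans_ function per pattern which prints text instead of emitting TCG ops. The sketch below assumes that shape; DisasContext (here just a text buffer), the argument structs, reg_name and the register name table (r13 to r15 shown as sp, lr and pc) are hypothetical stand-ins. It also prints the shift amount sa for the 32-bit format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the real disassembler types. */
typedef struct { char buf[64]; } DisasContext;
typedef struct { int rd; int rs; } arg_ADD_rd_rs;
typedef struct { int rd; int rx; int ry; int sa; } arg_ADD_rd_rx_ry_sa;

/* Assumed AVR32 register names: r13-r15 are SP, LR and PC. */
static const char *const avr32_cpu_r_names[16] = {
    "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
    "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc",
};

static const char *reg_name(int x)
{
    assert(x >= 0 && x < 16);
    return avr32_cpu_r_names[x];
}
#define REG(x) reg_name(x)

/* One possible INSN(): generate trans_<id>() that prints
 * "<MNEMONIC> <operands>" into ctx->buf instead of emitting TCG ops. */
#define INSN(id, mnemonic, fmt, ...)                                     \
    static bool trans_##id(DisasContext *ctx, arg_##id *a)               \
    {                                                                    \
        snprintf(ctx->buf, sizeof(ctx->buf), "%s ", #mnemonic);          \
        size_t n = strlen(ctx->buf);                                     \
        snprintf(ctx->buf + n, sizeof(ctx->buf) - n, fmt, __VA_ARGS__);  \
        return true;                                                     \
    }

INSN(ADD_rd_rs,       ADD, "%s, %s",           REG(a->rd), REG(a->rs))
INSN(ADD_rd_rx_ry_sa, ADD, "%s, %s, %s << %d", REG(a->rd), REG(a->rx),
     REG(a->ry), a->sa)
```

With rd = 5 and rs = 1, trans_ADD_rd_rs() writes "ADD r5, r1"; the second format prints, for example, "ADD r3, r1, r2 << 1".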

Translating an instruction

Now we come to the fascinating part: translating the operations of the instruction. For each instruction, the generated decoder calls a translation function that we have to write inside translate.c. Its name must be the prefix trans_ followed by the identifier that we set for the instruction:

static bool trans_ADD_rd_rs(DisasContext *ctx, arg_ADD_rd_rs *a){
    TCGv res = tcg_temp_new_i32();
    TCGv Rd = tcg_temp_new_i32();
    TCGv Rs = tcg_temp_new_i32();
    tcg_gen_mov_i32(Rd,  cpu_r[a->rd]);
    tcg_gen_mov_i32(Rs, cpu_r[a->rs]);

We start the translation by creating a few temporary variables. We use them for calculations, because we may need the original register contents later. Again, you can see how we use the field values from the instruction’s bitfield (a->rd, a->rs) to select the registers. We copy the contents of these registers into the temporary variables.

Important: You cannot access, change, or inspect the contents of a TCGv or of the emulated registers inside the translation function. Remember: QEMU loads an instruction, generates the IR, and repeats this until a translation block ends. The TB is only executed after that. So if you look at the value of a register in the translation function, you will not see the value the instruction will operate on, because the emulation has not happened yet. Even the register values in the CPU state at translation time cannot be relied on, because QEMU caches translated blocks and may execute the same TB many times with different register contents.

Now let’s calculate the addition:

    tcg_gen_add_i32(res, Rd, Rs);
    tcg_gen_mov_i32(cpu_r[a->rd], res);

QEMU’s TCG provides various frontend operations that generate IR code working on the emulated registers. We just used tcg_gen_add_i32, which adds two 32-bit values. res is the temporary TCGv for the result that we created above; the destination register Rd receives the same sum.

If you look at the definition of the ADD instruction, you will notice that it also changes the status register. We add this functionality here:

    // set N flag: N ← RES[31]
    tcg_gen_shri_i32(cpu_sflags[sflagN], res, 31);

    // set Z flag: Z ← (RES[31:0] == 0)
    tcg_gen_setcondi_i32(TCG_COND_EQ, cpu_sflags[sflagZ], res, 0);

    // keep only bit 31 (the sign bit) of both operands and the result
    tcg_gen_shri_i32(Rd, Rd, 31);
    tcg_gen_shri_i32(Rs, Rs, 31);
    tcg_gen_shri_i32(res, res, 31);

    // V-flag
    set_v_flag_add(Rd, Rs, res, cpu_sflags);

    // C-flag
    set_c_flag_add(Rd, Rs, res, cpu_sflags);

    if(a->rd == PC_REG){
        ctx->base.is_jmp = DISAS_JUMP;
    }

If the destination register is the program counter, execution will likely continue at another location. Because the register number is encoded in the instruction’s bitfield, we can check at translation time whether it is the PC register. If so, we set is_jmp to DISAS_JUMP to tell QEMU that the translation block must end here, because execution continues at a new address.

At the end, we need to advance ctx->base.pc_next, the address of the next instruction to translate, by 2, as this instruction is 16 bits long:

    ctx->base.pc_next += 2;
    return true;
}

That’s it. We created our first translation function.
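Because the TCG code only describes what should happen, it can be hard to see whether the flag logic is right. One option is a plain-C reference model to compare the emulator’s results against. The sketch below assumes the usual definitions of signed overflow (V) and unsigned carry out of bit 31 (C) for a 32-bit addition; please check these formulas against the ADD description in the AVR32 Architecture Document. The names AddFlags and add_rd_rs_ref are hypothetical, and the set_v_flag_add and set_c_flag_add helpers from the repository are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status flags affected by ADD (only N, Z, V, C are modeled). */
typedef struct {
    bool n, z, v, c;
} AddFlags;

/* Reference model of ADD format 1: Rd <- Rd + Rs, plus flags.
 * d, s and r are the sign bits (bit 31) of Rd, Rs and the result. */
static uint32_t add_rd_rs_ref(uint32_t rd, uint32_t rs, AddFlags *f)
{
    uint32_t res = rd + rs;          /* unsigned add wraps modulo 2^32 */
    bool d = rd >> 31;
    bool s = rs >> 31;
    bool r = res >> 31;

    f->n = r;                        /* N <- RES[31] */
    f->z = (res == 0);               /* Z <- (RES == 0) */
    f->v = (d && s && !r) || (!d && !s && r);     /* signed overflow */
    f->c = (d && s) || (d && !r) || (s && !r);    /* carry out of bit 31 */
    return res;
}
```

For example, 0x7FFFFFFF + 1 yields 0x80000000 with N and V set, while 0xFFFFFFFF + 1 yields 0 with Z and C set.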

Editing the build system

We need to add our new files to the build system:

#file Kconfig
config AVR32
    bool

#file meson.build
gen = [
  decodetree.process('insn.decode', extra_args: [ '--decode', 'decode_insn',
                                                  '--varinsnwidth', '32'  ])
]
avr32_ss = ss.source_set()
avr32_softmmu_ss = ss.source_set()

avr32_ss.add(gen)
avr32_ss.add(files(
  'cpu.c',
  'disas.c',
  'translate.c'
  ))
avr32_softmmu_ss.add(files('machine.c'))
target_arch += {'avr32': avr32_ss}
target_softmmu_arch += {'avr32': avr32_softmmu_ss}

Now you should be able to actually compile QEMU with AVR32 support:

./configure --target-list=avr32-softmmu
make -j 16

If everything is correct, you should now have a working QEMU build.

The next steps

With just one instruction, we can’t do much. Try adding another instruction on your own. Start with the MOV instruction, as we will need it in the next article. If you encounter any issues, feel free to look at my code on GitHub, or ask me directly (see contact info).

In the next article, I will show you how we can perform branch operations and use them to test our implementation.