How to add a new architecture to QEMU - Part 8
My previous article in this series explained how coprocessor operations can be implemented in QEMU. However, I did not cover the actual operations of the AT32UC floating-point unit (FPU).
Working with floating-point numbers is a bit more complex than working with integers. If you studied computer science (or anything similar), you probably remember how much fun it is to do these calculations by hand. Luckily, QEMU already provides the means to perform floating-point arithmetic, so I will give a brief overview of how to use it.
Some background
Computers are good when it comes to calculations with discrete numbers (like 3 or 789345). But they are not built to calculate with floating-point numbers (like 3.143). Some floating-point numbers have infinitely many digits (take Pi), so we would need infinite memory to store them exactly. That's simply not an option.
A solution for this issue is rounding. Usually, computers do not store exact floating-point numbers, but a close enough rounded value. This comes with downsides: besides a loss of precision, there is a loss of performance.
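To make the rounding concrete, here is a small stand-alone C example (not part of the QEMU code, just an illustration I added): the decimal value 0.1 has no exact binary representation, so the stored single-precision value is already rounded.
#include <stdio.h>

int main(void)
{
    float a = 0.1f; /* stored as the nearest representable float, not exactly 0.1 */
    printf("%.20f\n", a); /* prints something like 0.10000000149011611938 */
    return 0;
}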
For this reason, many processors contain dedicated hardware units solely for the purpose of floating-point calculations.
The AT32UC processor contains an FPU that supports float values as defined by the C standard, which in turn is based on the IEEE 754 floating-point standard. If you want to learn more about this subject, I highly recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Floating-point numbers in QEMU
When I started working on the FPU implementation, I was afraid that I would need to manipulate binary floating-point numbers manually. Usually, operations on guest values in QEMU are generated through the Tiny Code Generator (TCG). The TCG provides various operations, but they are all intended to be used on integers. As it turns out, however, there is already an IEEE 754 implementation ready to use.
As described in my previous article, I used a helper for the FPU implementation. At the beginning of the helper file, I added an include for QEMU's FPU implementation:
#include "fpu/softfloat.h"
Now we can use various arithmetic operations on float values. For example, let's take a look at the fmul.s operation:
static void fmuls(CPUAVR32AState *env, uint32_t rd, uint32_t rx, uint32_t ry)
{
    env->r[rd] = float32_mul(env->r[rx], env->r[ry], &env->fp_status);
}
Here, a simple function call is everything we need. fmul.s multiplies two registers and stores the result in a third register. The function assumes that the rx and ry registers already contain valid IEEE 754 representations of float numbers. How did they get there? That's something the firmware that calls the coprocessor has to take care of.
One option is to use the fcastsw.s operation, which converts a signed integer into a float number. Again, QEMU provides a function for this:
static void fcastsws(CPUAVR32AState *env, uint32_t rd, uint32_t rx)
{
    env->r[rd] = int32_to_float32(env->r[rx], &env->fp_status);
}
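The opposite direction is available as well. Here is a rough sketch of how a float-to-signed-integer cast helper could look; the helper name is hypothetical and the exact semantics of the corresponding AVR32 operation are an assumption on my part, but float32_to_int32 itself comes straight from fpu/softfloat.h:
// Hypothetical sketch: convert the IEEE 754 value in rx back to a signed
// 32-bit integer, using the rounding mode configured in fp_status.
static void fcastws_sketch(CPUAVR32AState *env, uint32_t rd, uint32_t rx)
{
    env->r[rd] = float32_to_int32(env->r[rx], &env->fp_status);
}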
If you need to implement a specific float behavior, I recommend taking a look at the fpu/softfloat.h file. Most likely, there is already an implementation of what you need.
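To give an impression of what is in there: float32_add, float32_sub, float32_div and float32_sqrt all follow the same calling pattern as float32_mul. As an illustration (the helper name is my own invention, not taken from the actual AVR32 code), an fadd.s-style helper could look like this:
// Sketch of an addition helper, analogous to fmuls above.
static void fadds_sketch(CPUAVR32AState *env, uint32_t rd, uint32_t rx, uint32_t ry)
{
    env->r[rd] = float32_add(env->r[rx], env->r[ry], &env->fp_status);
}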
Additional things
Did you notice the &env->fp_status argument in the examples above? We passed it along in every helper without really using it, but it has a purpose. Some FPU operations may result in an error, for example when the value in a register is not a valid IEEE 754 representation. Another case is the floating-point check (fchk.s) operation. It checks whether a value is not a number (NaN). In that case, the value in the specified register is 0x7FC00000, as defined in the AT32UC manual.
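Based on that description, a check helper could look roughly like the sketch below. float32_is_any_nan is part of fpu/softfloat.h; the helper name and the exact behavior of the real fchk.s (which presumably also updates status flags) are assumptions on my part:
// Sketch: if the value in rx is any kind of NaN, write the canonical NaN
// pattern from the AT32UC manual into rd.
static void fchks_sketch(CPUAVR32AState *env, uint32_t rd, uint32_t rx)
{
    if (float32_is_any_nan(env->r[rx])) {
        env->r[rd] = 0x7FC00000;
    }
}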
NaN-related options like these are set in the fp_status variable. The variable is initialized in the cpu.c file:
//...
env->sr = 0;
// Configure how the softfloat library treats NaN values for this CPU
env->fp_status.default_nan_mode = false;
env->fp_status.no_signaling_nans = 0x7FC00000;
//...
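fp_status controls more than NaN handling. For instance, the rounding mode can be set through the helpers in fpu/softfloat-helpers.h; whether the AVR32 target needs a non-default rounding mode is an assumption here, but a call like the following, placed in the same initialization code, would be the place to configure it:
#include "fpu/softfloat-helpers.h"

// Example only: select round-to-nearest-even as the FPU rounding mode.
set_float_rounding_mode(float_round_nearest_even, &env->fp_status);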
Conclusion
As always, you can find the full code of the FPU implementation in my GitHub repository. For now, most of the FPU operations are implemented. There is still a need to add some tests and a few very specific operations. But overall, the implementation should do a decent job.
In my next article, I will talk about more hardware interactions.