PS2 Linux Programming
Using Inline Assembly
The intention of this tutorial is to provide enough
knowledge to get started coding MIPS Assembly using the GCC inline assembler.
The MIPS EE core has hundreds of different instructions which can be found in
the “EE Core Instruction Set Manual”.
MIPS Assembly (EE)
A simple inline ASM routine in GCC is given below:
int iAnswer, iX = 10, iY = 30;
asm __volatile__(
    "addu %0, %1, %2    # iAnswer = iX + iY\n"
    : "=r"(iAnswer)
    : "r"(iX), "r"(iY));
This code adds the two integer variables iX and iY together and stores the result in iAnswer. The first thing to notice is that the int variables are declared normally, as if they were going to be used with C/C++. The “__volatile__” keyword tells the compiler that the asm statement must not be deleted, or merged with another asm statement, by the optimiser, even if its outputs appear to be unused. The asm keyword can be thought of as a pseudo-function with the following prototype:
void asm(char * strASMCode : <output list> : <input list>);
strASMCode is the string containing the assembly language instructions. The input and output lists name the variables that are to be used in the ASM code block. If a variable is to be changed, it goes in the output list. If a variable is only to be read from, it goes in the input list.
The format of an item in the input/output list is as follows: “type”(variable). The type is known as a constraint, and for the moment the only one of concern is “r”. This means that the variable should be placed in a general purpose (integer) register. Notice that in the output list iAnswer has the type “=r”: the “=” means that the ASM code writes a value into the iAnswer variable.
The variables passed in to the ASM block are referred to as %n, where n is the position of the variable in the output list followed by the input list, counting from 0. iAnswer is first and is therefore %0, iX is second and is therefore %1, and iY is %2. This means that the line “addu %0, %1, %2” is the same as saying “addu iAnswer, iX, iY”.
The function of the addu instruction can be found in
the EE Core Instruction Set Manual on page 27. The important information is:
Format: “ADDU rd, rs, rt”
Description: “GPR[rd] = GPR[rs] + GPR[rt]”
The format indicates that this instruction takes three operands. The description indicates what the instruction does: GPR[xx] means the general purpose register specified by operand xx. It can be seen that ADDU adds together the values in registers rs and rt and puts the result in register rd.
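One detail worth knowing: the U in ADDU does not restrict it to unsigned values. ADD and ADDU compute the same 32-bit sum; the difference is that ADDU never raises an overflow exception, so the result simply wraps around. A minimal portable C model of that 32-bit behaviour (the name addu_model is hypothetical) is:

```c
#include <stdint.h>

/* Models the 32-bit result of ADDU: the sum wraps modulo 2^32 and no
   overflow exception is raised (ADD would trap on signed overflow). */
int32_t addu_model(int32_t rs, int32_t rt)
{
    uint32_t sum = (uint32_t)rs + (uint32_t)rt; /* well-defined wraparound */
    return (int32_t)sum; /* GCC converts out-of-range values modulo 2^32 */
}
```

For example, addu_model(INT32_MAX, 1) wraps to INT32_MIN, where ADD would have trapped.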
Comments can be inserted into the ASM block with the # prefix: everything from “#” to the end of the line is a comment, just like “//” in C++.
Try experimenting with simple functions like this by performing subtract, multiply and divide with two integers (with multiply and divide it may be necessary to look into the MFLO and MFHI instructions). There are also EE Core specific instructions which may be better and easier to use than the standard MIPS instructions in certain circumstances. For example, the EE core's multiply and add instruction (MADD) can also write its result directly to a general purpose register, which may remove the need to read the HI and LO registers when performing some types of multiplication.
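For the multiply and divide experiments it helps to know where the standard MIPS instructions leave their results. MULT forms the full 64-bit signed product, with the upper 32 bits in HI and the lower 32 bits in LO; DIV leaves the quotient in LO and the remainder in HI. MFHI and MFLO then copy those values into a general purpose register. The C model below (HiLo, mult_model and div_model are hypothetical names) mirrors that split, so it can be used to check the values obtained from the asm versions.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair. */
typedef struct {
    int32_t hi;
    int32_t lo;
} HiLo;

/* MULT rs, rt: signed 64-bit product, upper half to HI, lower half to LO. */
HiLo mult_model(int32_t rs, int32_t rt)
{
    int64_t product = (int64_t)rs * (int64_t)rt;
    HiLo r;
    r.hi = (int32_t)(product >> 32);   /* what MFHI would return */
    r.lo = (int32_t)(uint32_t)product; /* what MFLO would return */
    return r;
}

/* DIV rs, rt: quotient to LO, remainder to HI.
   rt must not be zero, and INT32_MIN / -1 is excluded because it
   overflows in C. Both C and MIPS truncate the quotient toward zero. */
HiLo div_model(int32_t rs, int32_t rt)
{
    HiLo r;
    r.lo = rs / rt;
    r.hi = rs % rt;
    return r;
}
```

For example, div_model(30, 7) gives LO = 4 and HI = 2, and mult_model(0x10000, 0x10000) gives HI = 1 and LO = 0 because the product needs more than 32 bits.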
Notice that in the integer add example none of the registers were accessed directly. However, it is also possible to access registers directly, alongside the %n syntax.
The EE core has 32 GPRs (General Purpose Registers). These can be accessed via the pseudo variables $0 through $31. $0 is hard-wired to 0 and cannot be changed; writes to it are ignored. While it is possible to write to any register, it is advisable to use $8 to $15 (called t0 to t7, t for temporary register). When a register is written to, it must be added to what is called the “clobber” list, an additional list located after the input list in the ASM block as shown below:
void asm(char * strASMCode : <output list> : <input list> : <clobber list>);
The clobber list informs the compiler that the contents of the registers in this list may be changed by the ASM block, so the compiler will not rely on those registers holding any of its own values across the block.
If the integer add example given at the start of this tutorial had been written in a manner which used the t0, t1 and t2 registers as temporaries, the last line would be:
: "=r"(iAnswer) : "r"(iX), "r"(iY) : "$8", "$9", "$10");
Note that if a register is only being read from (e.g. $0), there is no need to add it to the clobber list.
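Putting the pieces together, the sketch below is one way the full integer add example could look when it routes the values through t0 to t2 ($8 to $10). It assumes a GCC toolchain targeting MIPS (tested with the __mips__ macro) and falls back to plain C elsewhere; the function name add_via_temps is hypothetical.

```c
/* Hypothetical helper: the integer add example rewritten to use t0-t2
   ($8-$10) as temporaries, which are therefore in the clobber list. */
int add_via_temps(int x, int y)
{
    int answer;
#if defined(__mips__)
    asm __volatile__(
        "move $8, %1         # t0 = x\n"
        "move $9, %2         # t1 = y\n"
        "addu $10, $8, $9    # t2 = t0 + t1\n"
        "move %0, $10        # answer = t2\n"
        : "=r"(answer)
        : "r"(x), "r"(y)
        : "$8", "$9", "$10");
#else
    answer = x + y; /* portable fallback for non-MIPS hosts */
#endif
    return answer;
}
```

Because %0 is written only by the last instruction, after both inputs have been read, it is safe for GCC to give the output the same register as one of the inputs. If the output were written before the inputs had been read, it would need GCC's early-clobber form “=&r” instead.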
The Floating Point Unit (FPU)
The FPU is connected to the EE Core via a 32-bit
coprocessor connection (cop1). The FPU performs high speed floating point
calculations.
The FPU has 32 registers ($f0 to $f31). $f0 holds a function's return value, so it is best not to write to it. $f12 and $f14 should also be avoided, as they are used for argument passing, as should $f20 upwards, since those must be preserved across function calls. All the other $fX registers can be used as temporaries, provided they are added to the clobber list like any other register that is written to.
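As a sketch of using an FPU temporary, the hypothetical helper below computes x * y + z through $f1, which is not one of the registers to avoid, and therefore names it in the clobber list. It assumes a GCC toolchain targeting hard-float MIPS (tested with the __mips__ and __mips_hard_float macros) and falls back to plain C elsewhere. Note that the add.s has to wait for the mul.s result, which is exactly the kind of stall discussed later in this tutorial.

```c
/* Hypothetical helper: computes x * y + z using $f1 as a scratch
   register, which is therefore listed in the clobber list. */
float mul_add_via_temp(float x, float y, float z)
{
    float result;
#if defined(__mips__) && defined(__mips_hard_float)
    asm __volatile__(
        "mul.s $f1, %1, %2    # $f1 = x * y\n"
        "add.s %0, $f1, %3    # result = $f1 + z\n"
        : "=f"(result)
        : "f"(x), "f"(y), "f"(z)
        : "$f1");
#else
    result = x * y + z; /* portable fallback for non-MIPS hosts */
#endif
    return result;
}
```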
An example of using the FPU to divide two floating
point numbers is given below:
float fAnswer;
float fX = 1.0f;
float fY = 3.0f;
asm __volatile__(
    "div.s %0, %1, %2    # fAnswer = fX / fY\n"
    : "=f"(fAnswer)
    : "f"(fX), "f"(fY));
As can be seen, this example is very similar to the
previous one. In the input/output lists “r” has been replaced by “f” as
floating point numbers are now being used.
All the FPU instructions can be found in the COP1 (FPU) Instruction Set section of the EE Core Instruction Set Manual. Try using some of these instructions. You will notice that some of them mention the accumulator (ACC); these instructions are discussed in the next section.
In an ASM program, if the result of an instruction is used by a later instruction before the first instruction has finished its execution, the program will stall (instructions don’t necessarily finish before the next one has started). The CPU can’t do anything while it is stalling, which is obviously bad for performance. Most FPU arithmetic instructions have a latency of 4 clock cycles. Instruction latency is the number of clock cycles that must pass before the result of the calculation is available for use. For a full list of latencies see page 58 of the EE Core User’s Manual.
This is where the accumulator comes into play. The accumulator always has a latency of one, so there is no need to worry about stalling the EE if two or more accumulator instructions are executed one after the other. Most accumulator instructions end in A.S (e.g. MADDA.S, MULA.S, etc.).
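To make the latency argument concrete, the sketch below counts stall cycles in a deliberately simplified, hypothetical pipeline model: instructions issue in order, at most one per cycle, and an instruction waits until every register it reads is ready. It uses the figures quoted above (4 cycles for ordinary FPU arithmetic, 1 for the accumulator) and ignores many details of the real EE pipeline, so its numbers are illustrative only. The register numbering, the ACC index and all names are assumptions made for this example.

```c
#include <stddef.h>

#define NUM_REGS 33 /* model: 0-31 stand for $f0-$f31, 32 for ACC */
#define ACC 32
#define NONE (-1)   /* unused source slot */

typedef struct {
    int dst;     /* register written */
    int src[3];  /* registers read (NONE if unused) */
    int latency; /* cycles until dst can be read by a later instruction */
} Instr;

/* Returns the total number of cycles spent stalled under the model. */
int count_stalls(const Instr *prog, size_t n)
{
    int ready[NUM_REGS] = {0}; /* cycle at which each register is usable */
    int cycle = 0;             /* earliest cycle the next instruction can issue */
    int stalls = 0;

    for (size_t i = 0; i < n; i++) {
        int issue = cycle;
        for (int s = 0; s < 3; s++) {
            int r = prog[i].src[s];
            if (r != NONE && ready[r] > issue)
                issue = ready[r]; /* wait for this source to be ready */
        }
        stalls += issue - cycle;
        ready[prog[i].dst] = issue + prog[i].latency;
        cycle = issue + 1;
    }
    return stalls;
}

/* Dot product with plain multiplies and adds; a = $f10-$f12, b = $f13-$f15. */
static const Instr dot_plain[] = {
    {1, {10, 13, NONE}, 4}, /* mul.s $f1, $f10, $f13 */
    {2, {11, 14, NONE}, 4}, /* mul.s $f2, $f11, $f14 */
    {3, {12, 15, NONE}, 4}, /* mul.s $f3, $f12, $f15 */
    {4, {1, 2, NONE}, 4},   /* add.s $f4, $f1, $f2 */
    {0, {4, 3, NONE}, 4},   /* add.s $f0, $f4, $f3 */
};

/* The same dot product using the accumulator. */
static const Instr dot_acc[] = {
    {ACC, {10, 13, NONE}, 1}, /* mula.s  $f10, $f13 */
    {ACC, {11, 14, ACC}, 1},  /* madda.s $f11, $f14 */
    {0, {12, 15, ACC}, 4},    /* madd.s  $f0, $f12, $f15 */
};
```

Under this model the plain sequence loses 5 cycles (the first add.s waits 2 cycles for the second multiply, and the second add.s waits 3 cycles for the first add.s), while the accumulator sequence loses none.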
To write an ASM function that calculates the dot product of two 3D vectors, it might seem that three multiplies and two adds are required. While this would work, it takes at least 5 instructions, and the adds depend on earlier results, so there is also the performance hit of stalls to worry about. The accumulator instructions allow the calculation to be done in 3 instructions without the worry of stalls. This is illustrated below.
float fAnswer;
float fV1X = 1.0f, fV1Y = 2.0f, fV1Z = 3.0f;
float fV2X = 4.0f, fV2Y = 3.0f, fV2Z = 3.5f;
asm __volatile__(
    "mula.s  %1, %4        # ACC = fV1X * fV2X\n"
    "madda.s %2, %5        # ACC = ACC + (fV1Y * fV2Y)\n"
    "madd.s  %0, %3, %6    # fAnswer = ACC + (fV1Z * fV2Z)\n"
    : "=f"(fAnswer)
    : "f"(fV1X), "f"(fV1Y), "f"(fV1Z),
      "f"(fV2X), "f"(fV2Y), "f"(fV2Z));
This function should be fairly self-explanatory, as the comments show what the instructions are doing.
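As with the PS2Maths.cpp routines mentioned below, it can help to keep a plain C equivalent alongside the asm. The hypothetical function below performs the same three steps in C, so its result can be compared with fAnswer. For ordinary values the two should agree closely, though not necessarily bit-for-bit, since the EE FPU does not implement full IEEE 754 behaviour.

```c
/* Hypothetical plain C equivalent of the mula.s/madda.s/madd.s sequence. */
float dot3(float ax, float ay, float az, float bx, float by, float bz)
{
    float acc = ax * bx;  /* mula.s:  ACC = ax * bx */
    acc = acc + ay * by;  /* madda.s: ACC = ACC + ay * by */
    return acc + az * bz; /* madd.s:  result = ACC + az * bz */
}
```

With the vectors used in the example, dot3(1.0f, 2.0f, 3.0f, 4.0f, 3.0f, 3.5f) gives 4 + 6 + 10.5 = 20.5.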
Unfortunately, a small document like this can never
hope to give more than a brief introduction to assembly language. For those
interested in learning more here are a few resources:
For a good introduction to MIPS Assembly Language:
http://chortle.ccsu.edu/AssemblyTutorial/tutorialContents.html
A good site to find out more about EE ASM:
http://cosmos.raunvis.hi.is/~johannos/mips/mips-howto.html
For more information about stalling and the
accumulator:
http://ps2dev.org/kb/files/sparky.html
A good place to find some more complex examples of assembly language is the PS2Maths.cpp file that accompanies the sample code used in this tutorial series (thanks to Sparky for these routines). The good thing about these is that at the bottom of each function is the C/C++ equivalent, so it is relatively easy to see exactly what the overall assembly code routine is doing.
Aside from these, the manuals that accompany the
PS2 Linux kit are an invaluable resource for learning assembly language on the
PS2.
Dr Henry S Fortuna
University of Abertay
Dundee