PS2 Linux Programming
Using Vector Unit 0 In MACRO Mode
Vector Unit 0 (VU0) is coupled to the EE core as a coprocessor
(cop2) via a 128-bit data bus and also to the main 128-bit wide system bus via
a Vector Interface unit (VIF0). VU0 has two modes of operation: it can execute
micro-programs, or execute macro-instructions as a coprocessor. This tutorial
is only concerned with VU0 operating in MACRO mode.
VU0 Macro Instructions
VU0 macro instructions not only allow vector operations
to be performed at very high speed, they are also a good introduction to
programming the vector units in general. The VU0 macro instructions are described
in sections 5 and 6 of the VU Users Manual.
A simple example of a VU0 macro mode function is
given below:
float fVecAns[4] ALIGN_QW;
float fVecX[4] ALIGN_QW = {0.14f, 0.53f, 0.22f, 1.05f};
float fVecY[4] ALIGN_QW = {0.14f, 0.53f, 0.22f, 1.05f};

asm __volatile__(
    "lqc2      vf1, 0(%1)        # vf1 = fVecX      \n"
    "lqc2      vf2, 0(%2)        # vf2 = fVecY      \n"
    "vadd.xyzw vf3, vf1, vf2     # vf3 = vf1 + vf2  \n"
    "sqc2      vf3, 0(%0)        # fVecAns = vf3    \n"
    : /* no outputs */
    : "r" (fVecAns), "r" (fVecX), "r" (fVecY)
    : "memory" /* the asm stores to fVecAns through its address */
);
This code adds fVecX to fVecY and stores the result
in fVecAns. As can be seen, each vector is an array of 4 floats. When the
vectors are declared, ALIGN_QW is appended at the end of the declaration. This
is a macro defined in PS2Defines.h that tells the compiler to align the vector
on a quadword (16-byte) boundary. While it is possible to load vectors that are
not aligned on a QW boundary, it is not recommended as it is much slower (the
Linux kernel has to emulate this).
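The alignment requirement can be checked on any machine with a few lines of portable C. This is only a sketch: the definition of ALIGN_QW shown here is an assumption (the real one lives in PS2Defines.h and may differ), and IsQwAligned and fVecDemo are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Assumed stand-in for the ALIGN_QW macro from PS2Defines.h (GCC syntax). */
#define ALIGN_QW __attribute__((aligned(16)))

/* Returns 1 if p sits on a 16-byte (quadword) boundary, 0 otherwise. */
int IsQwAligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* A vector declared the same way as the vectors in the tutorial. */
float fVecDemo[4] ALIGN_QW = {0.14f, 0.53f, 0.22f, 1.05f};
```

Calling IsQwAligned(fVecDemo) should return 1, while the address of fVecDemo[1] (4 bytes further on) is not quadword aligned and should give 0.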
Notice that the floating point vectors are passed not
as “f” but as “r”. This is because it is the address of the array that is being
passed, not the actual values, and addresses are integers. Note too that even
though the fVecAns array is written to, it must still go in the inputs
section, not the outputs: the data changes, but the address of
the variable stays the same. This leaves the outputs list empty.
The vector units have their own set of registers.
There are 32 floating point vector registers, VF00 to VF31. Each of these
registers holds four 32-bit floating point numbers, and the four fields of a VF
register are named x, y, z and w. VF00 is a special register in that
its x, y and z fields are hard-coded to 0.0 and its w field is
hard-coded to 1.0.
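One way to picture the VF registers is as a small software model in plain C. The sketch below is not PS2 code; Vf, VfFile, VfRead and VfWrite are hypothetical names, and the model simply assumes that a write to VF00 has no effect because the register is constant.

```c
/* One VF register: four 32-bit floats named x, y, z and w. */
typedef struct { float x, y, z, w; } Vf;

/* Hypothetical software model of the 32 VF registers. */
typedef struct { Vf vf[32]; } VfFile;

/* Read register n; register 0 always yields (0, 0, 0, 1). */
Vf VfRead(const VfFile *f, int n)
{
    const Vf vf00 = {0.0f, 0.0f, 0.0f, 1.0f};
    return (n == 0) ? vf00 : f->vf[n];
}

/* Write register n; writes to VF00 are discarded in this model. */
void VfWrite(VfFile *f, int n, Vf v)
{
    if (n != 0)
        f->vf[n] = v;
}
```

In this model, reading register 0 gives a ready-made (0, 0, 0, 1) vector regardless of what has been written to it.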
There are also sixteen 16-bit integer registers (VI00
to VI15). These registers are unlikely to be needed in macro
mode.
There are a number of other registers, the most
important being Q, which holds the result of VDIV and a few other instructions.
Like the FPU, the vector unit has an accumulator (ACC). It works just like the
FPU accumulator except that this ACC is a 4-element floating point vector.
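The roles of ACC and Q can be illustrated with a scalar C model. This is a sketch under stated assumptions rather than a description of the hardware: VuState, AccMul, AccMadd and DivToQ are hypothetical names, and the comments name VMULA, VMADD and VDIV only as the VU instructions that play the corresponding roles (see the VU Users Manual for their exact behaviour).

```c
/* Hypothetical model state: a 4-element accumulator and the scalar Q register. */
typedef struct {
    float acc[4];
    float q;
} VuState;

/* ACC = a * b, element by element (the role played by VMULA). */
void AccMul(VuState *s, const float a[4], const float b[4])
{
    int i;
    for (i = 0; i < 4; i++)
        s->acc[i] = a[i] * b[i];
}

/* dst = ACC + a * b, element by element (the role played by VMADD). */
void AccMadd(const VuState *s, float dst[4], const float a[4], const float b[4])
{
    int i;
    for (i = 0; i < 4; i++)
        dst[i] = s->acc[i] + a[i] * b[i];
}

/* Q = num / den: a single scalar result (the role played by VDIV). */
void DivToQ(VuState *s, float num, float den)
{
    s->q = num / den;
}
```

The point of the model is that ACC holds a whole vector between instructions, whereas Q holds only one float.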
In the example above, the two vectors are first
loaded into VU0 registers before they are added. Since VF00 is constant, the
first writeable VF register is VF1 (this can be written as VF1 or VF01 in the
code; both are the same register). The lqc2 instruction is used to place the
vector into vf1. Page 234 of the VU Users Manual gives the format
of lqc2 as:
lqc2 ft, offset(base)
where offset is a constant and base is an EE GPR.
The offset is added to the base to produce a final address from which the data
is loaded. In the first two lines of ASM code the vectors are loaded into vf1
and vf2 respectively.
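The base plus offset addressing of lqc2 can be mimicked in portable C. The function below is a hypothetical model (ModelLqc2 is not a PS2 or library function): it copies one quadword from base + offset into a 4-float destination, which stands in for the VF register.

```c
#include <string.h>

/* Model of "lqc2 ft, offset(base)": copy one quadword (four floats)
   from the address base + offset into ft. */
void ModelLqc2(float ft[4], const void *base, int offset)
{
    memcpy(ft, (const char *)base + offset, 4 * sizeof(float));
}
```

With two vectors stored back to back in memory, an offset of 0 selects the first and an offset of 16 selects the second, so a single base register can address both.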
The third line of assembly is:
vadd.xyzw vf3, vf1, vf2
This adds the two vectors together and stores the
result in vf3. The instruction “vadd” is followed by “.xyzw” to indicate that all
4 elements of the vector are to be added.
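The effect of the field suffix can be shown with a scalar C model. This is a sketch only: ModelVadd and the FIELD_ flags are hypothetical names, and the flag values are chosen for this model. It assumes, as the VU Users Manual describes, that a suffix naming fewer fields (for example .xy) leaves the unnamed fields of the destination unchanged.

```c
/* Flags for the destination fields, named after the suffix letters. */
enum { FIELD_X = 8, FIELD_Y = 4, FIELD_Z = 2, FIELD_W = 1, FIELD_XYZW = 15 };

/* Model of "vadd.<fields> fd, fs, ft": only the selected fields of fd change. */
void ModelVadd(float fd[4], const float fs[4], const float ft[4], int fields)
{
    const int flag[4] = {FIELD_X, FIELD_Y, FIELD_Z, FIELD_W};
    int i;
    for (i = 0; i < 4; i++) {
        if (fields & flag[i])
            fd[i] = fs[i] + ft[i];
    }
}
```

Passing FIELD_XYZW reproduces the vadd.xyzw of the example, while passing FIELD_X | FIELD_Y updates only the first two elements of the destination.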
Finally, the sqc2 instruction works exactly like
lqc2 except that it stores the data from VFxx (in this case VF3) into main memory.
Some examples of using VU0 in macro mode are
provided in the code accompanying this tutorial.
A very simple example of using VU0 in macro mode
has been presented here. This should allow the reader to experiment further
with VU0.
Dr Henry S Fortuna
University of Abertay
Dundee