Introduction

Vector Unit 0 (VU0) is coupled to the EE core as a coprocessor (COP2) via a 128-bit data bus, and is also connected to the main 128-bit system bus via a Vector Interface unit (VIF0). VU0 has two modes of operation: it can execute micro-programs, or it can execute macro-instructions as a coprocessor. This tutorial is concerned only with VU0 operating in macro mode.

VU0 Macro Instructions

VU0 macro instructions not only allow vector operations to be performed at very high speed; they are also a good introduction to coding for the vector units in general. The VU0 macro instructions are described in sections 5 and 6 of the VU Users Manual.

A simple example of a VU0 macro mode function is given below:

float fVecAns[4] ALIGN_QW;
float fVecX[4] ALIGN_QW = {0.14f, 0.53f, 0.22f, 1.05f};
float fVecY[4] ALIGN_QW = {0.14f, 0.53f, 0.22f, 1.05f};

asm __volatile__(
    "lqc2      vf1, 0(%1)       \n"  /* vf1 = fVecX */
    "lqc2      vf2, 0(%2)       \n"  /* vf2 = fVecY */
    "vadd.xyzw vf3, vf1, vf2    \n"  /* vf3 = vf1 + vf2 */
    "sqc2      vf3, 0(%0)       \n"  /* fVecAns = vf3 */
    : /* no outputs */
    : "r" (fVecAns), "r" (fVecX), "r" (fVecY)
    : "memory");                     /* the asm writes to fVecAns through %0 */

This code adds fVecX to fVecY and stores the result in fVecAns. Each vector is an array of four floats. ALIGN_QW is appended to each vector declaration; this is a macro defined in PS2Defines.h that tells the compiler to align the vector on a quadword (16-byte) boundary. While it is possible to load vectors that are not aligned on a quadword boundary, it is not recommended because it is much slower (the Linux kernel has to emulate the access).
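The alignment requirement can be checked on any host compiler. The sketch below assumes that ALIGN_QW expands to GCC's aligned(16) attribute; the actual definition in PS2Defines.h is not reproduced in this tutorial, so that expansion is an assumption, and the helpers is_qw_aligned and require_qw_aligned are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed expansion of ALIGN_QW (GCC attribute syntax). The real definition
   is in PS2Defines.h and may differ. */
#ifndef ALIGN_QW
#define ALIGN_QW __attribute__((aligned(16)))
#endif

/* Hypothetical helper: returns 1 if p lies on a 16-byte (quadword) boundary. */
int is_qw_aligned(const void *p)
{
    return ((uintptr_t)p & 0xFu) == 0;
}

/* Hypothetical debug guard: aborts if p is not quadword aligned. */
void require_qw_aligned(const void *p)
{
    assert(is_qw_aligned(p));
}

/* A vector declared as in the tutorial. */
float fVecX[4] ALIGN_QW = {0.14f, 0.53f, 0.22f, 1.05f};
```

A guard such as require_qw_aligned(fVecX) could be called before a pointer is handed to lqc2 or sqc2, so that a misaligned vector is caught during development rather than silently taking the slow path.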

Notice that the floating point vectors are passed not as “f” but as “r”. This is because it is the address of the array that is being passed, not the actual values (addresses are integers). Note too that even though the fVecAns array is being written to, it must still go in the inputs section, not the outputs. This is because, while the data changes, the address of the variable stays the same. This leaves the outputs list empty.

The vector units have their own set of registers. The floating point vector registers are VF00 to VF31, each of which holds four 32-bit floating point numbers. The four elements of a VF register are called x, y, z and w. VF00 is special in that its x, y and z elements are hard-coded to 0.0 and its w element is hard-coded to 1.0.
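The register layout described above can be pictured with a small host-side model. This is only a sketch in portable C, not code that runs on VU0; the names VFReg, VFRegFile, vf_read and vf_write are hypothetical. The model expresses the constant nature of VF00 by always returning (0, 0, 0, 1) for it and by ignoring writes to it.

```c
#include <assert.h>

/* Host-side model of one VF register: four 32-bit floats named x, y, z, w. */
typedef struct {
    float x, y, z, w;
} VFReg;

/* Hypothetical model of the register file VF00 to VF31. */
typedef struct {
    VFReg vf[32];
} VFRegFile;

/* Read register n. VF00 always reads as (0.0, 0.0, 0.0, 1.0). */
VFReg vf_read(const VFRegFile *rf, int n)
{
    assert(n >= 0 && n < 32);
    if (n == 0) {
        VFReg constant = {0.0f, 0.0f, 0.0f, 1.0f};
        return constant;
    }
    return rf->vf[n];
}

/* Write register n. In this model, writes to VF00 are ignored. */
void vf_write(VFRegFile *rf, int n, VFReg value)
{
    assert(n >= 0 && n < 32);
    if (n != 0) {
        rf->vf[n] = value;
    }
}
```

The constant w = 1.0 in VF00 is convenient in practice, for example as a ready-made source of 1.0 when setting the w element of a position vector.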

There are also sixteen 16-bit integer registers (VI00 to VI15). It is unlikely that these registers will be used at all in macro mode.

There are a number of other registers, the most important being Q, which contains the result of VDIV and a few other instructions. Like the FPU, the vector unit has an accumulator (ACC); it works just like the FPU accumulator except that it is a four-element floating point vector.

In the example above, before the two vectors are added they are first loaded into VU0 registers. Since VF00 is constant, the first writeable VF register is VF1 (this can be specified as VF1 or VF01 in the code; both are the same register). The lqc2 instruction is used to place the vector into vf1. Page 234 of the VU Users Manual gives the format of lqc2 as:

lqc2 ft, offset(base)

where offset is a constant and base is an EE general-purpose register (GPR). The offset is added to the contents of base to produce the final address from which the data is loaded. In the first two lines of assembly code the vectors are loaded into vf1 and vf2 respectively.
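The address calculation can be illustrated with a host-side model in portable C. This is a sketch of the behaviour described above, not the instruction itself; Vec4 and lqc2_model are hypothetical names, and the offset is assumed to be a byte offset. The model insists on a quadword-aligned address, in line with the alignment advice given earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for a VF register (four floats, 16 bytes). */
typedef struct {
    float x, y, z, w;
} Vec4;

/* Model of "lqc2 ft, offset(base)": copies the quadword found at
   base + offset (offset in bytes) into *ft. */
void lqc2_model(Vec4 *ft, int offset, const void *base)
{
    const unsigned char *addr = (const unsigned char *)base + offset;

    /* The model only accepts quadword-aligned addresses. */
    assert(((uintptr_t)addr & 0xFu) == 0);

    memcpy(ft, addr, sizeof *ft);
}
```

With two vectors stored back to back in an aligned array, an offset of 0 selects the first vector and an offset of 16 selects the second, so a single base register can address several vectors.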

The third line of assembly is:

vadd.xyzw vf3, vf1, vf2

This adds the two vectors together and stores the result in vf3. The instruction “vadd” is followed by “.xyzw” to indicate that all 4 elements of the vector are to be added.
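The role of the “.xyzw” suffix can be shown with a scalar reference model in portable C. This is a sketch only: Vec4, vadd_model and the DEST_* flags are hypothetical names, and the flag values are chosen for this example rather than taken from the instruction encoding. Only the elements named in the suffix are written; the remaining elements of the destination keep their previous values.

```c
#include <assert.h>

/* Host-side stand-in for a VF register. */
typedef struct {
    float x, y, z, w;
} Vec4;

/* Flags naming the destination elements (values chosen for this sketch). */
enum {
    DEST_X = 8,
    DEST_Y = 4,
    DEST_Z = 2,
    DEST_W = 1,
    DEST_XYZW = DEST_X | DEST_Y | DEST_Z | DEST_W
};

/* Model of "vadd.dest fd, fs, ft": fd = fs + ft for the selected elements.
   Elements that are not selected are left untouched in *fd. */
void vadd_model(Vec4 *fd, unsigned dest, const Vec4 *fs, const Vec4 *ft)
{
    assert(dest <= (unsigned)DEST_XYZW);

    if (dest & DEST_X) fd->x = fs->x + ft->x;
    if (dest & DEST_Y) fd->y = fs->y + ft->y;
    if (dest & DEST_Z) fd->z = fs->z + ft->z;
    if (dest & DEST_W) fd->w = fs->w + ft->w;
}
```

Calling vadd_model(&r, DEST_XYZW, &a, &b) corresponds to vadd.xyzw, while vadd_model(&r, DEST_X | DEST_Y, &a, &b) corresponds to vadd.xy and leaves r.z and r.w as they were.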

Finally, the sqc2 instruction works exactly like lqc2, except that it stores the data from a VF register (in this case VF3) into main memory.

Some examples of using VU0 in macro mode are provided in the code accompanying this tutorial.

Conclusions

A very simple example of using VU0 in macro mode has been presented here. This should allow the reader to experiment further with VU0.

Dr Henry S Fortuna

University of Abertay Dundee

h.s.fortuna@abertay.ac.uk