Using the DMAC in Games Programming
Introduction
This article describes how the various Direct Memory Access Control Tags (DMATags) can be used to help manage the transfer of model and texture data through the graphics pipeline of the PlayStation2 in a typical computer game application.
The internal structure and main data paths within the PS2 are shown in figure 1. The DMAC is responsible for transferring data between main memory and each of the independent processors and between main memory and scratchpad RAM.
Figure 1
During the execution of typical game code, the DMAC is responsible for transferring vertex data and transformation/lighting matrices to Vector Unit 1 (VU1), and image data for primitive texturing to the Graphics Synthesiser (GS). In order to maintain an effective frame rate it is important that as much of this data as possible is pre-compiled and efficiently organised prior to run time. Such organisation frees up the main processor from this mundane task and allows it to perform other important game related functions such as AI and game logic during game execution.
Image data used for texturing is normally sent to the GS via Path 2 or Path 3. Path 3 is a direct path to the GIF, whilst Path 2 runs through VIF1 to the GIF. Path 2 carries a few additional overheads but has the advantage of providing inherent synchronisation between texture and vertex data.
Typical image data may be many kilobytes in size and is generally larger than the 4 kByte memory block allocation size provided under SPS2. It is therefore necessary to split the image data into 4 kByte blocks and stitch these blocks together with appropriate DMATags. As discussed above, such organisation of the texture data should be undertaken prior to run time. Achieving this with memory stitching is outlined below.
Memory Stitching
The process of pre-compiling image data will be demonstrated using two different methods of memory stitching. The first method uses cnt and next tags and the second uses ref tags.
Organising data with cnt and next tags is illustrated in Figure 2. A cnt tag with its qword count field (QWC) set to 254 is inserted at the start of each full 4k block; a 4k block holds 256 qwords, and two of these are taken up by the cnt tag and by the next tag that closes the block. The value in the address field (ADDR) is not used with cnt tags and can be cleared to zero. The cnt tag instructs the DMAC to transfer the QWC qwords of data following the tag, then read the qword after that data as the next DMATag, which in this case is a next tag. The purpose of the next tag is to direct the DMAC to the start of the next 4k block to be transferred. This is achieved by setting the ADDR field of the next tag to address A1 (the start of the next 4k block) and its QWC field to zero to indicate that no data is to be transferred with this tag. The DMAC therefore reads the cnt tag at address A1 as the next instruction, and this process repeats until the last block is reached. The QWC of the cnt tag in the last block is set to the amount of data remaining to be transferred, and the transfer is ended by inserting an appropriately configured end tag after the final data section.
Figure 2
It is interesting to note that the final end tag could be replaced with a ret tag if the data packet is part of a call chain, but this will be described in more detail later in this article.
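The cnt/next layout described above can be sketched in C as follows. This is a minimal illustration and not SPS2 code: the names qword_t, make_tag and stitch_cnt_next are hypothetical, and the tag bit positions (QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62) and ID values are assumptions that should be checked against the EE documentation. The array addrs[] stands for whatever address the DMAC must be given for each block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit quadword; the DMAC reads the tag from the low 64 bits. */
typedef struct { uint64_t lo, hi; } qword_t;

enum { ID_CNT = 1, ID_NEXT = 2, ID_END = 7 };     /* tag ID values (assumed) */
enum { BLOCK_QW = 256, DATA_QW = BLOCK_QW - 2 };  /* 4k block minus cnt and next tags */

/* Pack QWC, ID and ADDR into a tag (bit positions are an assumption). */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/* Copy total_qw qwords from src into the 4k blocks, inserting the tags.
   blocks[i] is the CPU pointer to block i, addrs[i] the address the DMAC
   should use for it. Returns the number of blocks used, or 0 on failure. */
static size_t stitch_cnt_next(qword_t *blocks[], const uint32_t addrs[],
                              size_t nblocks, const qword_t *src, size_t total_qw)
{
    size_t needed = (total_qw + DATA_QW - 1) / DATA_QW;
    if (total_qw == 0 || needed > nblocks)
        return 0;
    for (size_t b = 0; b < needed; b++) {
        size_t n = total_qw < DATA_QW ? total_qw : DATA_QW;
        qword_t *blk = blocks[b];
        blk[0] = make_tag(ID_CNT, (unsigned)n, 0);        /* ADDR unused by cnt */
        memcpy(&blk[1], src, n * sizeof(qword_t));
        blk[1 + n] = (b + 1 < needed)
                   ? make_tag(ID_NEXT, 0, addrs[b + 1])   /* jump to next block */
                   : make_tag(ID_END, 0, 0);              /* last block: stop   */
        src += n;
        total_qw -= n;
    }
    return needed;
}
```

With 600 qwords of image data, for example, this fills two full blocks (254 qwords each) and places the remaining 92 qwords in a third block, followed by the end tag.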
Organisation of data with ref tags is illustrated in figure 3. In this case, the 4k blocks contain only the data to be transferred and there are no embedded DMATags within the data. A separate area of memory is required to build the DMAC command chain, which is constructed using ref tags and ended with a refe tag. The tag at address A3 is the first to be read and this instructs the DMAC to transfer the 4k block starting at address A0, then read the tag after the one at A3 as the next tag. This process continues until the final refe tag is reached, which transfers the final section of data and then ends the transfer. In this case, if the DMA chain is part of a call chain, the final refe tag can be replaced by an appropriately configured ret tag.
Figure 3
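A ref chain of the kind shown in figure 3 might be built as in the sketch below. The function name build_ref_chain and the helper make_tag are hypothetical, and the tag bit layout and ID values are assumptions to be verified against the EE documentation. Each full block contributes 256 qwords because no tags are embedded in the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit quadword; the DMAC reads the tag from the low 64 bits. */
typedef struct { uint64_t lo, hi; } qword_t;

enum { ID_REFE = 0, ID_REF = 3 };  /* tag ID values (assumed) */
enum { BLOCK_QW = 256 };           /* a 4k block holds 256 qwords of data */

/* Pack QWC, ID and ADDR into a tag (bit positions are an assumption). */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/* Write one ref tag per data block into chain[], using refe for the last.
   block_addrs[i] is the address the DMAC should use for data block i.
   Returns the number of tags written, or 0 on failure. */
static size_t build_ref_chain(qword_t *chain, size_t chain_cap,
                              const uint32_t block_addrs[], size_t total_qw)
{
    size_t ntags = (total_qw + BLOCK_QW - 1) / BLOCK_QW;
    if (total_qw == 0 || ntags > chain_cap)
        return 0;
    for (size_t i = 0; i < ntags; i++) {
        size_t n = total_qw < BLOCK_QW ? total_qw : BLOCK_QW;
        int last = (i + 1 == ntags);
        chain[i] = make_tag(last ? ID_REFE : ID_REF, (unsigned)n, block_addrs[i]);
        total_qw -= n;
    }
    return ntags;
}
```

For 600 qwords of data this produces three tags: two ref tags with QWC 256 and a final refe tag with QWC 88.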
There are relative advantages to both of these methods of memory stitching. The use of cnt and next tags requires only one area of memory to be configured, whilst the use of ref tags requires two areas of memory but only about half the number of tags.
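The "about half" figure can be checked with a little arithmetic. In the sketch below (function names are hypothetical), the cnt/next method is counted as two tags per block, with the last block holding a cnt tag and an end tag, and the ref method as one tag per block.

```c
#include <assert.h>
#include <stddef.h>

/* Blocks needed by the cnt/next method: 254 data qwords fit in each 4k block. */
static size_t blocks_cnt_next(size_t data_qw) { return (data_qw + 253) / 254; }

/* Two tags per block: cnt + next, or cnt + end in the last block. */
static size_t tags_cnt_next(size_t data_qw) { return 2 * blocks_cnt_next(data_qw); }

/* One ref (or refe) tag per 256-qword data block. */
static size_t tags_ref(size_t data_qw) { return (data_qw + 255) / 256; }
```

For a 64 kByte texture (4096 qwords), the cnt/next method needs 17 blocks and 34 tags, whereas the ref method needs 16 data blocks and 16 tags held in a separate chain.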
Each of the DMAC channels to VIF0, VIF1 and the GIF contains tag address save registers which can be used to facilitate the creation of data subroutines. Data subroutines are similar to normal program subroutines in that, once called, the subroutine performs its function then returns control to the main line of execution.
An example of a call chain is illustrated in figure 4. The data section at the right of the figure is stitched together into as large a packet as required and is ended with a return (ret) tag. The organisation of the data into this format would be undertaken prior to run time. The transfer is initiated when the DMAC reads the first call tag from the start of the call chain shown on the left hand side of figure 4.
On reading the first call tag from the call chain, the DMAC pushes the address of the following qword (which in this case is the next call tag) onto the call stack and reads the qword pointed to by the ADDR field of the call tag as the next tag. The qword immediately after the call tag is the one saved because the qword count (QWC) field of the call tag is set to zero, so no data is transferred with the tag. DMAC control then passes to the first cnt tag in the data section, which is the first qword of the stitched data to be transferred. When the DMAC reads the ret tag at the end of the data, it transfers the number of qwords following this tag (which in this case is zero) then reads the qword at the address popped from the call stack as the next tag. The next tag will thus be the second call tag in the call chain. This process repeats until the final end tag is reached in the call chain and the transfer is ended.
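The sequence described above can be followed with a small software model of the DMAC's tag handling. This is a simplified sketch for illustration only: dmac_walk is a hypothetical function, addresses are qword indices into an array rather than real memory addresses, the tag bit layout and ID values are assumptions, and a two-entry call stack is assumed to match the two tag address save registers per channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit quadword; the tag lives in the low 64 bits. */
typedef struct { uint64_t lo, hi; } qword_t;

/* Tag ID values (assumed). */
enum { ID_REFE = 0, ID_CNT = 1, ID_NEXT = 2, ID_REF = 3,
       ID_CALL = 5, ID_RET = 6, ID_END = 7 };

/* Pack QWC, ID and ADDR into a tag (bit positions are an assumption). */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/* Walk a chain starting at qword index start. The low word of every data
   qword that would be transferred is appended to out[]. Returns the number
   of qwords recorded. */
static size_t dmac_walk(const qword_t *mem, uint32_t start,
                        uint64_t *out, size_t out_cap)
{
    uint32_t stack[2];          /* models the two tag address save registers */
    int sp = 0;
    uint32_t tadr = start;
    size_t n = 0;

    for (;;) {
        uint64_t tag = mem[tadr].lo;
        unsigned qwc = (unsigned)(tag & 0xFFFFu);
        unsigned id = (unsigned)((tag >> 28) & 7u);
        uint32_t addr = (uint32_t)(tag >> 32) & 0x7FFFFFFFu;
        /* ref and refe fetch data from ADDR; the others from after the tag. */
        uint32_t data = (id == ID_REF || id == ID_REFE) ? addr : tadr + 1;

        for (unsigned i = 0; i < qwc && n < out_cap; i++)
            out[n++] = mem[data + i].lo;

        switch (id) {
        case ID_CNT:  tadr = tadr + 1 + qwc; break;   /* tag follows the data */
        case ID_NEXT: tadr = addr; break;             /* tag is at ADDR       */
        case ID_REF:  tadr = tadr + 1; break;         /* tag follows this tag */
        case ID_CALL:
            if (sp == 2) return n;                    /* stack overflow: stop */
            stack[sp++] = tadr + 1 + qwc;             /* save return address  */
            tadr = addr;
            break;
        case ID_RET:
            if (sp == 0) return n;                    /* empty stack: stop    */
            tadr = stack[--sp];
            break;
        default:
            return n;                                 /* refe, end, or unmodelled ID */
        }
    }
}
```

Placing two call tags and an end tag at indices 0 to 2, with each call pointing at a short cnt packet ended by ret, makes the walker visit the first packet, return to index 1, visit the second packet, return to index 2 and stop.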
Now that the process of creating and transferring pre-compiled data chains has been described, the use of such techniques in the writing of games programs will be discussed.
Consider the situation of a game consisting of several animated 3D models which must be sent down the graphics pipeline for rendering. It is advisable in such situations to cull as many objects as possible as early as possible within the pipeline, thus saving valuable processing time. A simple, first approximation method might be to generate bounding spheres round each model and test each sphere against the view frustum. Models inside or partly inside the frustum will require further processing whilst models fully outside the frustum can be culled. Consider therefore the pseudo-code shown in figure 5:
Main chain:
    Test visibility of model1;
    if visible (CALL pointing to Subchain1);
    Test visibility of model2;
    if visible (CALL pointing to Subchain2);
    END
Subchain1:
    REF pointing to model1 texture;
    REF pointing to model1 matrix data;
    REF pointing to model1 vertex data;
    RET
Subchain2:
    REF pointing to model2 texture;
    REF pointing to model2 matrix data;
    REF pointing to model2 vertex data;
    RET
In the main chain, the visibility of each model is checked and the appropriate sub chain is called only if the model is visible and thus requires further processing.
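One way to read the pseudo-code is that the visibility tests run on the CPU each frame while the main chain is being written, and only the call tags and the final end tag end up in memory for the DMAC. The sketch below takes that interpretation. All names are hypothetical, the tag bit layout and ID values are assumptions, and the frustum is represented as a set of planes whose normals point inwards. The sphere test is conservative: it never culls a model that is partly inside, but may keep a few that lie just outside a corner of the frustum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit quadword; the tag lives in the low 64 bits. */
typedef struct { uint64_t lo, hi; } qword_t;

enum { ID_CALL = 5, ID_END = 7 };  /* tag ID values (assumed) */

/* Plane n.p + d = 0 with the normal pointing into the frustum. */
typedef struct { float nx, ny, nz, d; } plane_t;
typedef struct { float x, y, z, r; } sphere_t;

/* Pack QWC, ID and ADDR into a tag (bit positions are an assumption). */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/* Returns 0 only when the sphere lies fully outside at least one plane. */
static int sphere_visible(const plane_t *planes, size_t nplanes, sphere_t s)
{
    for (size_t i = 0; i < nplanes; i++) {
        float dist = planes[i].nx * s.x + planes[i].ny * s.y
                   + planes[i].nz * s.z + planes[i].d;
        if (dist < -s.r)
            return 0;
    }
    return 1;
}

/* Write a call tag for each visible model, then an end tag.
   chain[] must have room for nmodels + 1 tags. Returns tags written. */
static size_t build_main_chain(qword_t *chain,
                               const plane_t *planes, size_t nplanes,
                               const sphere_t *bounds,
                               const uint32_t *subchain_addr, size_t nmodels)
{
    size_t n = 0;
    for (size_t i = 0; i < nmodels; i++)
        if (sphere_visible(planes, nplanes, bounds[i]))
            chain[n++] = make_tag(ID_CALL, 0, subchain_addr[i]);
    chain[n++] = make_tag(ID_END, 0, 0);
    return n;
}
```

The sub chains themselves never change under this scheme, so they can remain pre-compiled; only the short main chain is rewritten per frame.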
Another use of call chains in games programming is in the rendering of animated models in either 2 or 3 dimensions. Consider that the data for an animated model is precompiled and organised in the manner shown in figure 6.
All of the data necessary to render any animation frame for the model is contained within the model data section. Various call chains are configured within the call chain section to call the appropriate model data needed for a specific animation frame. For example, the call chain for frame 0 may call the model data sections 0, 1, 2, 7, 9 and 12; the call chain for frame 6 may call the model data sections 0, 1, 5, 7, 10 and 11. Given that the data is pre-compiled into the correct format, it is thus possible to quickly render a specific animation frame for a model at run time with minimal processing overheads.
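A per-frame call chain of this kind could be generated from a list of section indices, as in the sketch below. The name build_frame_chain is hypothetical, the tag bit layout and ID values are assumptions, and each model data section is assumed to be a stitched packet that ends with a ret tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit quadword; the tag lives in the low 64 bits. */
typedef struct { uint64_t lo, hi; } qword_t;

enum { ID_CALL = 5, ID_END = 7 };  /* tag ID values (assumed) */

/* Pack QWC, ID and ADDR into a tag (bit positions are an assumption). */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/* Write one call tag per listed section, followed by an end tag.
   section_addr[i] is the address of model data section i.
   Returns the number of tags written, or 0 on failure. */
static size_t build_frame_chain(qword_t *chain, size_t chain_cap,
                                const uint32_t section_addr[], size_t nsections,
                                const unsigned frame_sections[], size_t count)
{
    if (count + 1 > chain_cap)
        return 0;
    for (size_t i = 0; i < count; i++) {
        if (frame_sections[i] >= nsections)
            return 0;                      /* index outside the model data */
        chain[i] = make_tag(ID_CALL, 0, section_addr[frame_sections[i]]);
    }
    chain[count] = make_tag(ID_END, 0, 0);
    return count + 1;
}
```

Because the chains depend only on the section addresses, they can be built once when the model is loaded; rendering frame 0 or frame 6 at run time then amounts to starting the DMA transfer at the corresponding chain.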
This article has illustrated the use of DMATags for the organisation of precompiled data within a computer game application. Pre-compiling and efficiently organising data prior to run time is essential in order to achieve effective application performance.
Much of the information presented here has been gleaned from various posts on the Playstation2-linux.com developer forum. The author is grateful to the many contributors to this forum.
Lecturer in Computer Games Technology
University of Abertay
Dundee
Scotland UK
30 March 2004