Wednesday, November 26, 2008

Triangle Optimizations

I've been looking a bit at triangle optimization - first some terminology:
  • Indexed triangles means that the vertices in a mesh are referred to by index numbers.  This is the scheme OBJ8 uses.  The advantage of indexing is that if a single vertex is used by many triangles (that share a corner) you only have to include the vertex data once, and then use that data many times by index.  (The savings from indexing depend on how often vertices are shared.)
  • Triangle strips are strips of triangles sharing common edges.  Because triangles in strips share so many common vertices, they can be stored in a compact form, for a savings of almost 3x.
Back in the old days, triangle strips were critical for performance (hence the presence of strips in the OBJ2 and OBJ7 formats).  However with modern hardware, indexing is more efficient - the slight increase in data size (due to the index) isn't as expensive as the cost of specifying "we're done with one strip, start the next one".  (Consider that if we use indexed triangles, we can submit all triangles in one batch - with strips, we need one batch per strip.)  Thus OBJ8 uses indexing and doesn't provide any strip primitives.

There is one other concept to be aware of: cache utilization.  Graphics cards remember the last few vertices they processed, so if a mesh repeats a vertex shortly after using it, the graphics card can save work.  Triangle strips naturally use a cache somewhat well because vertices occur in close succession.

Strips and DSF

DSF allows for triangle strips (and triangle fans) as a space-saving measure.  Even with indexing, the indices can be compressed if strips and fans are used, and with DSF, file size was a very high priority.

When the DSF file is loaded, the data is rebuilt into indexed triangles (and reindexed - the DSF internal structures don't provide as good indexing as the DSF loader can create) - in version 803 we first started using indexed triangles and found it to be a big win.

MeshTool will generate triangle fans (as a space saving measure) - if you build a DSF by hand (using DSF2Text), use strips/fans to compress file size.

Because DSF focuses on file size, the quality of mesh output is a function of the DSF loader, which has to run while flying.  So while I can imagine some improvements in future performance, I don't expect to be able to get huge wins because the very best mesh optimizing algorithms are much too slow for real-time use.

The DSF loader already produces full indexing and preserves cache utilization from strips and fans - the next logical optimization would be to reorder non-strip, non-fan triangles for better cache use on load; the order in the DSF file may be optimized for file size and not cache utilization.

Optimizing OBJs

Where I believe there could be real improvement is in OBJ8 generation.  The OBJ loader currently loads the indexed OBJ triangles exactly as specified in the file - build a smarter file and we can get faster framerate.  There are two possible ways to win:
  • Cache utilization - by ordering vertices for cache use, we can get better throughput.
  • Hidden surface removal - by putting the exterior triangle earlier in the OBJ, we can draw them first, occluding the interior of an object, which cuts down fill rate.  (In an airplane, you would want the exterior fuselage first in the OBJ, before the seats inside, so that only the pixels visible through the window are drawn.)
This second form of optimization may be of limited utility in that an OBJ8 optimizer has to respect authoring decisions about translucency, attributes, etc.

I am investigating OBJ optimization now - my hope would be to put optimization into a new version of the ac3d exporter and ObjConverter.

Strips and the iphone

There is one place that triangle strips do matter: the iphone.  It turns out that the iphone will process triangles a lot faster if they are presented in a strip-like order.  So the iphone DSFs are the first to use triangle strips (instead of fans), and the OBJ exporter for the iphone optimizes the OBJ mesh into triangle strip order.

My tests indicate that strip order makes no difference on modern ATI and nVidia GPUs, so there is no point in releasing these optimizations in the main X-Plane tools.  In the long term, I expect our OBJ tools will have two optimization paths - a strip-based path for the iphone and a cache utilization-based path for the desktop.

2 comments:

Anonymous said...

Hi Ben, What a coincidence! I was just about to drop you an email with question about triangles in base mesh.
My tool G2xpl after converting original base mesh outputs only triangles (skips fans and strips). Each primitive (between BEGIN_PRIMITIVE and END_PRIMITIVE) contains at most 3 triangles (someone mentioned that there is some limit of number of triangles in 1 primitive, not sure if this is correct and what's the meaning of primitive is).
You just blogged that strips don't matter so much on modern desktop machines... Are there any other means to optimize my output mesh for better performance or I should leave it all to be optimized by DSF loader?
Thanks/Rob

Benjamin Supnik said...

Hi Rob,

The limit is 255 vertices per primitive, so you could have 85 tris per primitive. This only matters for DSF size though.

But if you do have easy access for strips, it's not bad to use them; while the strips themselves don't matter (they are turned to indexed tris) as of now, the only way to control vertex order is by using strips/fans. (The vertices are ordered by the strip in this case.)

So my suspicion is that your cache coherency will be better with strips than individual tris (which get sorted by, well, lord knows what).