Monday, May 12, 2008

Multi-Core Texture Loading

In a previous post I discussed the basic ideas behind using multiple threads in an application to get better performance out of a multi-core machine.

Now before I begin, I need to disclaim some things, because I get very nervous posting anything involving hardware. This blog is me running my mouth, not buying advice; if you are the kind of person who would be grumpy if you bought a $3000 PC and found that it wouldn't let you do X with X-Plane (where X includes run at a certain rendering setting, framerate, or make your laundry) my advice is very simple: don't spend $3000. So...
  • I do not advocate buying the biggest fastest system you can get; you pay a huge premium to be at the top of the hardware curve, particularly for game-oriented technologies like fast-clock CPUs and high-end GPUs.
  • I do not advocate buying the Mac Pro with your own money; it's too expensive. I have one because my work pays for it.
  • 8 cores are not necessary to enjoy X-Plane. See above about paying a lot of money for that last bit of performance.
Okay...now that I have enough crud posted to be able to say "I told you so"...

My goal in reworking the threading system inside X-Plane for 920 (or whatever the next major patch is called) is, among other things, to get X-Plane's work to span across as many cores as you have, rather than across as many tasks as are going on. (See my previous post for more on this.)

Today I got just one bit of the code doing this: the texture loader. The texture loader's job is to load textures from the hard drive to the video card (using the CPU, via main memory) while you fly. In X-Plane 901 it will use up to one core to do this, that core also being shared with building forests and airports.

With the new code, it will load as many textures at a time as it can, using as many cores as you have. I tested this on RealScenery's Seattle-Tacoma custom scenery package - the package is an ENV with about 1.5 GB of custom PNGs, covering about half of the ENV tile with non-repeating orthophotos.
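To make "as many textures at a time as you have cores" concrete, here is a minimal C++ sketch of the general idea. It is not X-Plane's actual loader: the names worker_count, load_all and DecodeFn are made up for illustration, the decode step is a stand-in for real PNG reading and decoding, and the sketch assumes that step is safe to run on several threads at once. It also ignores the final upload to the video card.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for "read one PNG from disk and decode it".
// Real code would call an image library here; the return value is just
// whatever the caller wants back per file (e.g. decoded size in bytes).
using DecodeFn = std::function<std::size_t(const std::string& path)>;

// One worker per hardware thread, and never fewer than one.
inline unsigned worker_count() {
    unsigned n = std::thread::hardware_concurrency(); // 0 means "unknown"
    return n == 0 ? 1u : n;
}

// Decode every file in `paths` using `workers` threads.
// Each thread repeatedly claims the next unclaimed index from a shared
// counter, so a thread that finishes a small texture immediately picks up
// another one instead of waiting for its neighbors.
inline std::vector<std::size_t> load_all(const std::vector<std::string>& paths,
                                         const DecodeFn& decode,
                                         unsigned workers) {
    std::vector<std::size_t> results(paths.size());
    std::atomic<std::size_t> next{0};

    auto run = [&]() {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= paths.size()) return;   // nothing left to claim
            results[i] = decode(paths[i]);   // each slot has exactly one writer
        }
    };

    std::vector<std::thread> pool;
    workers = std::max(1u, workers);
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(run);
    for (auto& t : pool) t.join();           // wait until every file is done
    return results;
}
```

A caller would use it as load_all(paths, my_decode, worker_count()). The point of the shared counter is that the work spreads over however many cores exist, rather than being tied to a fixed number of tasks.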

On my Mac Pro, 901 will switch to KSEA from LOWI in about one minute - the vast majority of the time is spent loading about 500 PNG files. The CPU monitor shows one core maxed out. With the new code, the load takes fourteen seconds, with all eight cores maxed out.

(This also means that the time from when the scenery shifts to when the new scenery has its textures loaded would be about fourteen seconds, rather than a minute, which means very fast flight is unlikely to get to the new area before the textures are loaded and see a big sea of gray.)
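For the curious, the one-minute and fourteen-second figures can be turned into a rough speedup estimate. The sketch below is back-of-the-envelope arithmetic only: the function names are made up, "about one minute" is taken as 60 seconds, and the last function assumes Amdahl's law is a fair model of the load, which is an assumption and not something I measured.

```cpp
// Observed speedup: old time divided by new time.
inline double speedup(double old_seconds, double new_seconds) {
    return old_seconds / new_seconds;
}

// Parallel efficiency: speedup divided by core count (1.0 = perfect scaling).
inline double efficiency(double s, unsigned cores) {
    return s / cores;
}

// Amdahl's law, s = 1 / ((1 - p) + p / n), solved for the parallel
// fraction p given a speedup s on n cores:
//   p = (1 - 1/s) / (1 - 1/n)
// Only meaningful for n > 1.
inline double amdahl_parallel_fraction(double s, unsigned cores) {
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / cores);
}
```

With 60 and 14 seconds on eight cores this gives a speedup of roughly 4.3x, an efficiency of a little over 50%, and (under the Amdahl assumption) a parallel fraction of roughly 88%. Since the one-minute number is approximate, treat these as ballpark values.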

Things to note:
  • Even if we don't expect everyone to have eight cores, knowing that the code can run on a lot of cores proves the design - the more the code can "spread out" over a lot of cores, the more likely the sim will use all hardware available.
  • Even if you only have two or four cores, there's a win here.
  • Texture load time is only a factor for certain types of scenery; we'll need to keep doing this type of work in a number of cases.
This change is the first case where X-Plane will actually spread out to eight cores for a noticeable performance gain. Of course the long-term trend will be more efficient use of multi-core hardware in more cases.

6 comments:

Anonymous said...

Hello,

I'm happy to hear about multithreading and using all of our HW, it's very good news! But I don't see how it can improve loading times for 2 core machines like mine. I supposed the texture loader was already loading textures using the second core. But now it makes sense to buy a 4 core machine for using tools like g2xpl.

Next big thing 64 bit? :)

Thank you for the great work. You guys are getting flying simulation a big step forward compared to the rest.

Regards!
Quino

Anonymous said...

Great news Ben! I was looking forward to this.
What about the "dataref-driven texture selection" feature? Any chance it could be supported for ver 920? Or is it not such a trivial enhancement?
Robert

Benjamin Supnik said...

Dual-core will speed up texture preload by a factor of 2. The reason is that while 901 already uses your second core while you fly (the first runs the flight model and rendering), during pre-load of scenery (with the blue loading screen) the second core loads textures and the first one sits idle. The new code will recycle the core dedicated to the main flight loop when in a preload situation, benefiting even dual-core users.

Dataref-driven textures: not sure yet. A lot of the infrastructure work for the feature is now in place but I'm not sure which release it might make it into.

Anonymous said...

Ben this is great to hear! Really great! Instead of that 2x, 4x speed, we get Nx the speed - exactly the way it should be! So yeah, I'm like so glad to hear this, and as the previous commenter said, I hope next step is 64-bit. My 8 GB of RAM are ready for some fun =)

Anonymous said...

For fun and giggles, I gotta wonder how the Linux version would handle a PS3 Cell processor. Can the SPUs even be used for file streams?

Benjamin Supnik said...

64-bit is still not terribly high on my priority list - see the Linux forum on x-plane.org, it's been discussed thoroughly.

Re Cell, I don't know. It would work about as well as the Cell can proxy for a general purpose CPU, which is to say, at best the Cell's immense powers would be totally wasted. See John Carmack's comments on google videos about the need for generally fast execution of complex and ugly C code, and the limitations of DSP-type processors to make existing problems faster. What he says about fps shooters and C code applies to X-Plane as well in a lot of cases.

(Basically everything that's easy to make DSP-friendly has already been outsourced to the GPU, the biggest baddest DSP of them all. This makes it hard to find a good use for any kind of additional DSP in the system, be it physics, sound, or a CPU with a lot of DSP-like goo like SSE or MMX or a Cell. The remaining non-graphics problems just don't need that capability. What they need to do is process ugly conditional linear code really fast.)