Friday, April 25, 2008

Threads and Cores

Now that multi-core machines are mainstream, you'll hear a lot of talk about "threads". What is a thread, and how does it relate to using more cores?

Definitions

A "core" is an execution unit in a CPU capable of doing one thing. So an 8-core machine might have two CPUs, each with four cores, and it can do eight tasks at once.

A "thread" is a single stream of work inside an application - every application has at least one thread. Basically a two-threaded application can do two things at once (think of driving and talking on your cellular phone at the same time).

Now here's the key: only one core can run a thread at one time. In other words, if you have an eight core machine and a one-thread application, only one core can run that application, and the other seven cores have to find something else to do (like run other applications, or do nothing).

Two more notes: a thread can be "blocked" - this means it's waiting for something to happen. Blocked threads don't use a core and don't do anything. For example, if a thread asks for a file from disk, it will "block" until the disk drive comes up with the data. (By CPU standards, disk drives are slower than snails, so the CPU takes a nap while it waits.)

So if you want to use eight cores, it's not enough to have eight threads - you have to have eight unblocked threads!

If there are more unblocked threads than cores, the operating system makes them take turns, and the effect is for each of them to run slower. So if we have an application with eight unblocked threads and one core, it will still run, but at one eighth the speed of an eight core machine.

It's not quite that simple, there are overheads that come into play. But for practical purposes we can say:
  • If you have more unblocked threads than cores, the execution speed of those threads slows down.
  • If you have more cores than unblocked threads, some of those cores are doing nothing.
Trivial Threads

When a thread is blocked, it does not use any cores. So while X-Plane has a lot of threads, most of them are blocked either most or all of the time. For all practical purposes we don't need to count them when asking "how many cores do we use". For example, G1000 support is done on a thread so that we keep talking to the G1000 even if the sim is loading scenery. But the G1000 thread spends about 99.9% of its time blocked (waiting for the next time it needs to talk) and only 0.1% actually communicating.

What Threads Are Floating Around

So with those definitions, what threads are floating around X-Plane? Here's a short list from throwing the debugger on X-Plane 9.0. (Some threads may be missing because they are created as needed.
  • X-Plane's "main" thread which does the flight model, drawing, and user interface processing.
  • A thread that can be used to debug OpenGL (made by the video driver, it blocks all the time).
  • Two "worker" threads that can do any task that X-Plane wants to "farm out" to other cores. (Remember, if we want to use more cores, we need to use more threads.)
  • The DSF tile loader (blocks most of the time, loads DSF tiles while you fly).
  • At least 3 threads made by the audio driver (they all block most of the time).
  • At least four threads made by the user operating system's user interface dode (they block most of the time).
  • The G1000 worker thread (blocks most of the time, or all the time if you don't have the G1000 support option).
  • The QuickTime thread (only exists when QuickTime recording is going on).
So if there's anything to take away from this it is: X-Plane has a lot of threads, but most of them block most of the time.

Core Use Now


So how many cores can we use at once? We only need to look at threads that aren't blocked to add it up. In the worst flying case I can think of:
  1. The main thread is rendering while
  2. The DSF tile loader is loading a just-loaded tile while
  3. One of the pool threads is building forests while
  4. You are recording a QuickTime movie (so the QT thread is compressing data).
Yep. If you really, really put your mind to it, you can use four cores at once. :-) Of course, two cores is a lot more common (DSF tile loading or forests, but not both at once, and no QuickTime.

Core Use In the Future

Right now some of X-Plane's threads are "task" oriented (e.g. this thread only loads DSF tiles), while others can do any work that comes up (the "pool threads", it's like having a pool car at the company, anyone can take one as needed). The problem with this scheme is that sometimes there will be too many threads and sometimes too few.
  • If you have a dual-core machine, having forests building and DSF loading at the same time is bad - with the main thread that's three threads, two cores; each one runs at two-thirds speed. But you don't want the main thread to slow down by 66%, that's a fps hit.
  • If you have a four-core machine, then when the DSF tile is not loading, you have cores being wasted.
Our future design will allow any task to be performed on a "pool thread". The advantage of this is that we'll execute as many tasks as we have cores. So if you have a dual-core machine, when a DSF tile load task comes along while there is forests being done, the one pool thread will alternate tasks, leaving one core to do nothing but render (at max fps). If you have a four-core machine, the DSF load and forests can run at the same time (on two pool threads) and you'll have faster load times.*

* Who cares about load time if you're flying? Well, if you crank up the settings a lot and fly really fast, the loader can get behind, and you'll see trees missing. X-Plane is always building trees in front of you as you fly and deleting the ones behind you. So using more cores to build the forests faster means you're less likely to fly right out of the forest zone at high settings.

No comments: