Tuesday, June 09, 2009

Multi-Threading Is a Weird Feature Request

Over and over, whether it is a feature request list for X-Plane or another simulator, I see the same thing: "multi-core support" or "multi-threading" as a feature request.

Now before I continue, I must remind everyone: X-Plane is already multi-threaded and will take advantage of multi-core hardware. How much we use those cores depends on the type of scenery loaded.

The problem is that multi-threading (as a way to use multi-core hardware) is a solution technique, not a problem statement. What is threading going to be used for? If I simply program the other 7 cores of your computer to calculate PI to 223,924 digits, have I met the feature request? This probably isn't what anyone wants.

Implicit in the request for multi-core is (I speculate) a request for better framerate. (I did see one user who wanted multi-core to be used for a more accurate flight model. This strikes me as a poor trade-off for hardware based on my understanding of the flight model - we would use a lot of hardware for only a marginal accuracy improvement - but I commend the user for stating the problem and not just a possible solution.) But is multi-threading the best way to get framerate?

If I had two patches to X-Plane, one that doubled fps by using two cores and one that doubled fps by using more efficient code, which would be better?  To me the obvious answer is: the code that is more efficient.  It will run on any hardware (not just multi-core) and if you have multi-core hardware, we still have that second core free for some other functionality.

So to me the feature request should be something like: "higher framerate - and yes I have multi-core hardware".  Or perhaps "more visual detail at the same framerate - and yes I have multi-core hardware".

All feature requests need to be in terms of problem statements, not possible solutions.  This lets us find the set of problems that can be solved together in a coherent manner, and it lets us pick a solution that meets our engineering goals.

21 comments:

sothis said...

At this point I want to point out that OpenGL allows swapping buffers only in the thread where the drawing context was created. This means that you can draw from another thread, but the 'main'-thread has to wait until all drawing is done before it can swap buffers and as such display the new frame on the screen. So regarding the raw drawing code, multithreading capabilities buy you absolutely nothing. You can't trick the physics here, you can't double the performance of your graphics adapter by having 8 cores in the main cpu :-).

What you can do, however, is reduce the time necessary for processing all input data needed by the drawing subsystem, to lower the delay until the main thread has the chance to swap buffers again. But this of course only makes sense _if_ there is any relevant data to be processed. If you sit in your plane at an airport and watch paint dry, there is no scenery which has to be loaded and no flight physics to be calculated, stuff which consumes CPU cycles and can be accelerated by using multiple cores. At this point probably 7 of your 8 cores do nothing and there's nothing a programmer can do to raise the FPS from 60 to 140. Then it's just the graphics hardware and its drivers setting the bar of what can be done and what not :-)

Benjamin Supnik said...

Janos is spot on. In particular, there are two ways to try to lower "latency to actually render":

1. Pipeline the actual rendering loop. I tried this. It doesn't really help. Most of the CPU to render is driver state change, that can't be pipelined; culling isn't worth off-loading. Remember that a pipelined renderer introduces overall costs in synchronization, thread-safe GL code, etc. so it would have to be a big win to overcome its own drag.

2. Async build-up of 3-d meshes - this is exactly what we do! Building those taxiway signs, taxiway lines, draped pavement, that's all multi-core already!
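To make the second approach concrete, here is a minimal sketch of asynchronous mesh build-up. It is not X-Plane's code: `build_mesh`, `worker`, and `run_frames` are hypothetical names, the "mesh" is a toy list of tuples, and the frame is simulated with a short sleep. It only illustrates the shape of the idea: a worker thread builds geometry while the main loop keeps producing frames and picks up finished meshes without ever blocking. (In CPython the interpreter lock keeps pure-Python work from running truly in parallel, so treat this as a structural illustration rather than a benchmark.)

```python
import queue
import threading
import time


def build_mesh(tile_id):
    # Hypothetical stand-in for expensive CPU work: vertex data for one tile.
    return [(tile_id, i, i * 0.5) for i in range(4)]


def worker(requests, finished):
    # Runs off the main thread: pull tile ids, build meshes, hand results back.
    while True:
        tile_id = requests.get()
        if tile_id is None:  # sentinel: no more work
            break
        finished.put((tile_id, build_mesh(tile_id)))


def run_frames(tile_ids):
    requests, finished = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, finished))
    thread.start()
    for tile_id in tile_ids:
        requests.put(tile_id)
    requests.put(None)

    loaded = {}
    frames = 0
    while len(loaded) < len(tile_ids):
        # The main loop never waits on the worker; it only takes what is ready.
        try:
            while True:
                tile_id, mesh = finished.get_nowait()
                loaded[tile_id] = mesh
        except queue.Empty:
            pass
        frames += 1          # "draw" a frame with whatever is loaded so far
        time.sleep(0.001)    # simulated frame time
    thread.join()
    return loaded, frames
```

Calling `run_frames([1, 2, 3])` returns the three built meshes plus the number of frames that were "drawn" while waiting. The point of the design is that drawing and swapping stay on one thread, and only the data preparation moves elsewhere.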

Anonymous said...

What about the flight models of the 19 Ai craft flying around you as you watch paint dry?

I distinctly remember the hit that 8.40? brought to the sim when Ai craft gained flight models...

Wouldn't it make sense to off load those to another core? Maybe they are already?

I'm also keen to hear what kind of load scenery animations (cement trucks) make on the cpu?

Keltek said...

Hello,
I don't know the computing structure of X-Plane, but what about using the GPU for computing (CUDA)? The idea of using a multicore (multithreaded) architecture for multiplayer is not so bad (if I run XSquawkBox, the fps drops rapidly).

Benjamin Supnik said...

The FM could be run in parallel per plane when there are many planes. Running the FM against other parts of the sim creates a locking scenario whose cost would probably outweigh the benefits. CUDA is not useful for the types of tasks X-Plane has to compute.
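As an illustration of "in parallel per plane", here is a minimal sketch, assuming each plane's state can be advanced without touching any other plane or any shared sim data. `PlaneState`, `step_plane`, and `step_all` are hypothetical names, and the "flight model" is a toy that only integrates altitude; nothing here reflects the real X-Plane flight model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaneState:
    altitude_m: float
    climb_rate_mps: float


def step_plane(state, dt):
    # Toy flight model: integrate altitude only. Reads and writes no shared
    # state, which is what makes the per-plane split lock-free.
    return PlaneState(state.altitude_m + state.climb_rate_mps * dt,
                      state.climb_rate_mps)


def step_all(planes, dt, max_workers=4):
    # One independent task per plane; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda plane: step_plane(plane, dt), planes))
```

For example, `step_all([PlaneState(1000.0, 5.0), PlaneState(2000.0, -2.0)], 0.5)` advances both planes by half a second. The locking problem mentioned above appears as soon as `step_plane` needs to read something the rest of the sim is modifying at the same time, which this sketch deliberately avoids.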

Dirk Hohndel said...

Higher framerate? I don't think that's really a multi-core issue. Well, actually it is - your graphics card is TRULY multi threaded. High end cards today have hundreds of parallel execution units.

What I want X-Plane to do with the eight cores on my system is
1) render what I see
2) calculate the flight model
3) provide decent ATC (SERIOUSLY - this is the biggest disappointment switching from MSFS)
4) provide real life weather / clouds (that's separate from rendering!)
5) use the remaining cores to provide true AI for the scenery and most importantly the other airplanes - have them participate in ATC, etc

Benjamin Supnik said...

Dirk, this brings me exactly back to my original point with this post:

"5) use the remaining cores to provide true AI for the scenery and most importantly the other airplanes - have them participate in ATC, etc"

Why do you care which cores are used for which features, or that any particular multi-core strategy is employed?

Users never mention that they would like us to use octrees or bsp-trees or quad trees...why are cores different?

Daveduck said...

Ben, it strikes me that you're basically griping that people aren't using the right vocabulary.

For the 99% of XP users who don't happen to know or care about specialized programming issues, FPS is, definitely, *the* issue. Since we've been fed a steady diet of marketing B.S. that equates more processors with faster computer performance, then obviously you're going to get a lot of queries related to whether XP takes full advantage of the hardware it runs on. That's a fair question, even if you don't like how it's phrased.

Agreed with previous responses: AI aircraft cripple my FPS and I keep them turned off as a result; and ATC has so long been a joke that improving it--even at the expense of some FPS--would be most welcome.

Finally, isn't the day fast approaching that the vast majority of your users will be using multicore machines, and that you can program accordingly, perhaps even make multicores a minimum requirement?

Benjamin Supnik said...

Daveduck - that's exactly it. I am griping about vocabulary. In particular, that failure of vocabulary causes people to tell me how they want a problem solved and not what the problem is...both in areas of hardware use and SDK design.

I am not surprised that this happens, given the amount of marketing buzz that goes into pushing next-gen hardware. But that's exactly why I am trying to call attention to it.

Wanting the hardware to be fully utilized makes sense, but...is it possible to fully utilize the hardware? Maybe, maybe not. In the case of ATC, as listed below, I assure you the problem with our ATC system is _not_ CPU usage...it's developer time.

Jim Royal said...

"Why do you care which cores are used for which features, or that any particular multi-core strategy is employed?

Users never mention that they would like us to use octrees or bsp-trees or quad trees...why are cores different?"


The answer is that users can simply look at their Activity Monitor/Task Manager and see when their computer is not being used to its fullest potential. It's visceral. When I first ran X-Plane on my quad-core Mac Pro, I admit I was surprised to see that most of the computer's capacity was left untapped.

I do have apps that use all four cores in my machine, and those apps are blindingly fast. In comparison, I see very little difference between XPv8 running on my old G4 and XPv9 running on my Mac Pro. I know that the Mac Pro is eight to ten times faster -- but not when I run X-Plane.

I accept that some mathematical problems simply cannot be computed in parallel. And if you say this is the case with XP, I will accept that. But I don't think it surprising or wrongheaded at all for people to think something is wrong when a computationally-intensive program is using only a quarter of their CPU capacity.

Dirk Hohndel said...

Benjamin,

the reason why I care about cores is because fundamentally single core performance is not going to increase as much in the next five years as it did in the last five (which was less than the five before then). I happen to work for a CPU maker and have a pretty good idea that the majority of overall CPU performance gain will come from utilizing multiple cores.

So you are of course welcome to fly all the AI planes on the same core that you do everything else. And to do ATC on that same core as well (but please, remember, today X-Plane's ATC is complete and utter crap and its AI planes are an utter embarrassment).

The thing that you don't seem to realize is that you will max out on what you can do "per core". But this year eight cores are standard on a workstation style system, next year it will be 16 cores and 32 threads. So having X-Plane use (to be very generous) 2 of these cores to do all its CPU calculation is just silly.

You are de-coupling yourself from the direction in which computers are going to get faster.

So you are right - I don't care which data structures you use (between bsp- and quad-trees I think I'm partial to bsp-trees, but that depends on your concrete use case, I guess); I SHOULDN'T have to care about multi core support. But since the X-Plane developers apparently do care about data structures and don't care or don't understand the advantages of efficient use of multiple cores I feel that I need to speak up and request more attention to the issue.

Unless of course your goal is to keep the status quo - good flight model, decent graphics, crap ATC, crap AI, mediocre weather, etc. In that case, go ahead, ignore all of us and continue to treat a 2009 machine as if it was built in 2001.

Benjamin Supnik said...

"The thing that you don't seem to realize is that you will max out on what you can do "per core"."

Dirk, you have either misunderstood my writing, or I wrote something completely different from what I meant to write.

My point is NOT that X-Plane should avoid parallelization, avoid multi-core, or assume that single cores will get faster. That would be silly!

My point IS that the decision as to when and how to apply multi-core technology cannot be made without understanding the internal data structures and algorithms of the product, and thus it is not something that a casual user can meaningfully comment on.

(I have received emails from other game developers discussing techniques for parallel rendering, and those have been fruitful discussions. But this blog post is in response to seeing multi-core lumped in with user-level features.)

In other words, without the code, you're just shooting blind...some suggestions will turn out to be good for multi-core, some will turn out to not be good (and sometimes for obvious reasons, sometimes for surprising reasons).

The cause of my rant is that I _agree_ with you. Multi-core is where hw is going. Thus suggesting we use it is like suggesting that we use RAM or a GPU...we consider the full hardware capabilities in EVERY feature we do.

Please do not mistake my side point (a better algorithm is preferable to using a second core) for a statement that we should not fully use the hardware. I believe efficiency improvements are better than using more hardware because they help on every machine and save the hardware for yet another improvement. But I am not trying to avoid using the hardware.

The actual reality of X-Plane is just about the opposite of what you describe in your comment...a steady, incremental increase in the number of cores used with each new patch, and steady improvements in the decoupling of those threads for better multi-threaded usage.

If we don't use as many cores as you would like us to, I can only argue that making X-Plane multi-core isn't easy or obvious. Threaded programming is not simple programming, and it's made worse by the state of the OpenGL API (WRT threading). But the trend in X-Plane's implementation over the last few years is: use more cores, use them more efficiently.

(By efficiently here I mean: the code that can run on other cores needs to run independently so we don't idle the cores more than necessary when there is pending work to be done.)

Right now we use multiple cores for:
- Loading new DSF tiles.
- Changing the coordinate system.*
- Building 3-d scenery from the DSF (this includes airports, forests, roads, facades, planting objects, etc.).
- Loading textures.*
- Recording QuickTime movies.
- Supporting G1000 hardware in the pro versions.
* These tasks are not bound to a finite number of cores.

That list isn't nearly as complete as it could be...and most of the tasks should someday be in the * category and aren't. But with every patch the list gets a bit better.

Dirk Hohndel said...

Benjamin,

not to belabor the point - you don't know who I am and what I do for a living. I understand the issues of multi threaded programming (and getting the best possible performance out of multi core systems) better than you might assume.

I am happy to hear that the X-Plane developer team is striving to take better advantage of the available cores. I'll admit that looking at both the 9.30 betas and 9.22 I am rather disappointed with the success of those attempts so far. My suggestions were based on my understanding of which tasks can be implemented independently from the core of the software (yes, it's really hard to use multiple threads for the rendering - see my first comment). So putting things that are inherently independent (like the ATC or the weather engine or AI planes) on separate cores to relieve the main thread and at the same time increase the realism of the simulator seemed like a good idea.

If there are reasons in the X-Plane design that make that impossible then I think there might be something odd going on here that needs to be addressed, eventually.

Anyway, I love the software, I wish it would do some things better or differently, but I'm in no position to comment on the code, not having seen it (there's a reason why I'm mostly an open source person...)

Dirk Hohndel said...

Oh, and I forgot to agree with you.

Yes, "multi threading" in itself is indeed a weird request in that it talks not about a feature but a programming technique.

Anonymous said...

Hi ben!

Interesting discussion! I just have to put my word in, as I feel there are a -huge- number of users who feel just the way I do. That is, I feel you are a bit too concerned that XP has to be able to run on a 2001 machine. This really holds back development, even though you could add options to turn this and that off. You see, the whole mentality around the programming gets influenced, and in the end one reaches the same conclusion as before: don't add features that are -too- revolutionary. Yet those features are really what XP users are expecting nowadays. I don't think that many are running XP on 800 MHz PIIIs with 256 MB RAM. And if they are, well, how much fun is it to run the sim with basically -everything- turned off or at "low"? That would just make XP look like a 1997 game.

This may have been confusing to read, but in the end I just want to encourage you to take some fresh initiatives and make the sim require HW that is from, say, '05 at least. Sim enthusiasts like myself are running good HW already. Speaking for myself, I am running the sim with an i7 920, 6 GB RAM and a 285 GF card. My friends are running XP with at least dual cores and other hardware that is at most 2 years old.

Anyway, thanks for the development and good job you are putting into XP! It's really appreciated!

Benjamin Supnik said...

hi Dirk,

That is true - I do not know your qualifications - you could be an expert in the field. I only know that you don't have access to X-Plane's source...so you may be very qualified to solve X-Plane's problems, but may not have a lot of situational awareness as to X-Plane's internal development road map.

(Of course, this isn't actually true...we did have one user who is also a developer run Shark on the shipping binary and correctly point out that we had let memory allocation code run in the main flight loop. His analysis was completely correct despite not having source access!)

Anon: I do not agree with what you have posted here, but I will save it for another blog post because it is both a complex and somewhat unrelated discussion. (That is, the lack of multi-core in X-Plane isn't about wanting to run on one core, it's about not having had time so far to FINISH the threading of the AI planes* and modern GPUs not playing with threaded renderers very well.) I do agree that we can't be too concerned about old hardware - I will post tomorrow why we can't actually get much leverage from dropping hardware.

* We have already done some work to thread the AI planes - it was in either 863 or 900. But since it isn't 100% done, you get zero threading benefit. :-(

Dirk Hohndel said...

Benjamin,

I have a couple of questions / suggestions that are not fit for the comments here - could you drop me an email to continue this, please? First at Last dot org

Jimmi said...

I can certainly understand why multi-threading is a much requested "feature".

People who just spent lots of money on a quad-core CPU don't want to see CPU usage at just 25% while the sim struggles to maintain a flyable framerate. Even if it only resulted in a 20% performance boost, seeing close to 100% CPU usage would still be reassuring. At least you know your computer is "trying" to maintain a flyable framerate - the remaining three cores aren't just twiddling their proverbial thumbs.

Single-threaded performance won't improve much over the coming years. The future is 8-core, hyperthreading CPUs (16 threads) even on the desktop. Therefore, performance improvements in X-Plane will eventually stagnate. There's only so much you can do in terms of optimizing the code. Why not do both? Write highly optimized, multi threaded code. One doesn't have to exclude the other.

Benjamin Supnik said...

Jimmi: "why not do both?" If we have a finite amount of developer time, it has to be prioritized. We can't really just write twice as much code for a given, finite, maxed out dev schedule. Something has to give. It's not a question of exclusion, rather prioritization.

As mentioned before, improving pure fps with multiple cores is nearly impossible for architectural reasons.

Jimmi said...

Impossible with the current X-Plane engine maybe, but not impossible in general.

http://www.tomshardware.com/reviews/multi-core-cpu,2280-10.html

All games show a significant (near 50%) jump from single to dual core and many show a noticeable gain going from two to three cores.

The Unreal3 engine (and games based on it) also shows a significant improvement even when going from three to four cores. As do many other modern engines.

You just can't keep adding more and more work for a single core while the performance of each core in future 6 and 8-core CPUs will remain roughly the same as today (clearly evident in the public roadmaps of both Intel and AMD). That's like driving at 150mph straight towards a concrete wall.

Benjamin Supnik said...

I do have to point out that if you were to do those benchmarks with X-Plane you'd find almost the same thing: huge improvement with the second core, a bit more with the third, not much benefit to number 4. But it depends a LOT on what you're doing.

- Put in a big texture-paged photoscenery pack...more cores the merrier.

- Crank up forests to tree hugger and airport detail to "insane"...first vs. second core becomes a big deal.

- Watch fps as you go over the DSF boundary...the third core becomes a big deal.

- Depending on your vid driver that 4th one might be nice too.

My point is this: if an application is bound by its pure ability to talk to the video card sequentially, a core isn't going to help. And X-Plane is getting close to that.

If those other games benefit from a second core, we can only conclude that when running in single-core mode, non-render-loop work is being done on the first core. This is exactly the case for X-Plane.
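The reasoning here is essentially Amdahl's law: if only part of each frame's work can be spread across cores, the sequential part caps the total speedup. A small sketch, with made-up fractions that are NOT measured X-Plane figures (`amdahl_speedup` is just an illustrative helper name):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the work scales across `cores`.

    The remaining (1 - parallel_fraction) is assumed strictly sequential,
    e.g. a render loop that must talk to the video card from one thread.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: half the frame parallelizable vs. only a tenth.
for fraction in (0.5, 0.1):
    for cores in (1, 2, 4, 8):
        print(fraction, cores, round(amdahl_speedup(fraction, cores), 2))
```

With half the work parallelizable, a second core helps a lot and later cores help progressively less; with only a tenth parallelizable, even eight cores barely move the number. That matches the pattern described above: a big jump when non-render-loop work moves off the first core, and little gain once the frame is bound by sequential drawing.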

My original point was that there isn't much low hanging fruit on the main thread. Dirk is absolutely correct that parallelizing multiple physics simulations is a win - he's right, and it's something we've already started the architectural work on (a while ago).

But if you look at X-Plane's main loop, it's mostly sequential drawing, with a little bit of flight model (unless you have the AI planes on).

The big win in future multi-core will be using those multiple cores to further pre-process the rendering load to maximize the job the first core can do in splatting out triangles. This is already what we do for forests. (Compare v8 to v9 forests to see!)

Those who know how to use "Shark" might note that the 3-d clouds chew up some fps and speculate as to whether that can go to a thread. The answer is perhaps, but the 3-d clouds are also an algorithm dying for GPU optimization...I think we need to pick a more GPU-friendly approach first, then decide how multi-core cleans up the rest.

(As a developer, I consider multi-core to be a more flexible technology than shaders, so my suspicion is that it will be easier to find a way to thread what works on the GPU, not to find a way to use the GPU for a threaded algorithm.)