Saturday, December 05, 2009

No Raster Land Use Data

The X-Plane version 8/9 default scenery uses raster land use data (that is, a low-resolution grid that categorizes how each square area of land is used) as one of its inputs for generating the global scenery. When you use MeshTool, this raster data comes in the .xes file that you must download. So... why can't you change it?

The short answer is: you could change it, but the results would be so unsatisfying that it's probably not worth adding the feature.

The global scenery uses GLCC land use data: a 1 km data set with about 100 land classes based on the OGE2 spec.

Now here's the thing: the data sucks.

That's a little harsh, and I am sure the researchers tried hard to create the data set. But using the data set directly in a flight simulator is immensely problematic:
  1. With 1 km spatial resolution (and some alignment error) the data is not particularly precise in where it puts features.
  2. The categorizations are inaccurate. The data is derived from thermal imagery, and it is easily fooled by mixed-use land. For example, mixing suburban houses into trees will result in a new forest categorization, because of the heat from the houses.
  3. The data can produce crazy results: cities on top of mountains, water running up steep slopes, etc.

That's where Sergio and I come in. During the development of the v8 and v9 global scenery, Sergio created a rule set and I created the processing algorithms. Together, this system picks a terrain type from several factors: not just climate and land use, but also slope, elevation, etc.

To give a trivial example, the placement of rock cliffs is based on the steepness of terrain, and overrides land use. So if we have a city on an 80 degree incline, our rule set says "you can't have a city that slanted - put a rock face there instead."
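To make the idea of rule priority concrete, here is a toy sketch in Python. It is not Sergio's rule set or its file format: the cell fields, terrain names, thresholds, and the "first matching rule wins" evaluation order are all hypothetical assumptions chosen for illustration. Putting the cliff rule first is what lets slope override land use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cell:
    landuse: str        # raster land-use class (hypothetical labels, e.g. "urban")
    climate: str        # e.g. "temperate", "arid"
    slope_deg: float    # terrain steepness in degrees
    elevation_m: float  # height above sea level in meters


@dataclass
class Rule:
    terrain: str
    matches: Callable[[Cell], bool]


# Hypothetical rules and thresholds, for illustration only.
# Rules are checked top to bottom, so earlier rules take priority.
RULES: List[Rule] = [
    Rule("rock_cliff", lambda c: c.slope_deg >= 45.0),  # steepness beats land use
    Rule("snow", lambda c: c.elevation_m >= 4000.0),
    Rule("city", lambda c: c.landuse == "urban" and c.slope_deg < 15.0),
    Rule("desert_scrub", lambda c: c.climate == "arid"),
    Rule("forest", lambda c: c.climate == "temperate" and c.slope_deg < 30.0),
]

DEFAULT_TERRAIN = "grassland"


def pick_terrain(cell: Cell, rules: List[Rule] = RULES) -> str:
    """Return the terrain of the first rule that matches the cell."""
    for rule in rules:
        if rule.matches(cell):
            return rule.terrain
    return DEFAULT_TERRAIN


if __name__ == "__main__":
    # A "city" on an 80 degree incline becomes a rock face.
    print(pick_terrain(Cell("urban", "temperate", 80.0, 500.0)))  # rock_cliff
    # The same land use on flat ground stays a city.
    print(pick_terrain(Cell("urban", "temperate", 5.0, 500.0)))   # city
```

With an ordered list like this, the land-use field only matters for cells that reach a rule that actually reads it, which is one way to picture how a large rule set can end up barely using land use at all.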

Sergio made something on the order of 1800 rules. (No one said he isn't thorough!!) And when we were done, we realized that we barely use landuse.

In developing the rule set, Sergio looked for the parameters that would best predict the real look of the terrain. And what he found was that climate and slope are much better predictors of land use than the actual land use data. If you didn't realize that we were ignoring the input data, well, that speaks to the quality of his rule set.

No One Is Listening

Now back to MeshTool. MeshTool uses the rule set Sergio developed to pick terrain when you have an area tagged as terrain_Natural. If you were to change the land use data, 80% of your land would ignore your markings because the ruleset is based on many other factors besides landuse. Simply put, no one would be listening.

(We could try some experiments with customizing the land use data... there is a very small number of land uses that are keyed into the rule set. My guess is that this would be a very indirect and thus frustrating way to work, e.g. "I said city goes here, why is it not there?")

The Future

I am working with alpilotx - he is producing a next-gen land-use data set, and it's an entirely different world from the raw GLCC that Sergio and I had a few years ago. Alpilotx's data set is high res, extremely accurate, and carefully combined and processed from several modern, high quality sources.

This of course means the rules have to change, and that's the challenge we are looking at now: how much do we trust the new land use data vs. the other indicators that have proved reliable?

Someday MeshTool may use this new landuse data and a new ruleset that follows it. At that point it could make sense to allow MeshTool to accept raster landuse data replacements. But for now I think it would be an exercise in frustration.

6 comments:

naoliv said...

This new data set is only available for North America and Europe, right?

Benjamin Supnik said...

Not just US and Europe - there is also new global 300m data...Andras can say more about that.

Previously it would have been very difficult to mix high res localized (US/Europe) data with global land use data because of the huge gap in quality...you couldn't design an algorithm to use both.

But I think Andras has found enough high quality sources and enough coverage to make something consistent, global, and high quality! :-)

Dan said...

This is really interesting, but what role will OpenStreetMap data play in all of this?

Will OSM data be used solely for roads, or will other mapping features such as land use (forest, residential, retail, etc.) and water (coastlines, lakes and rivers) be used as well?

Benjamin Supnik said...

I hope to use OSM for roads and in some cases water. I expect raster data to drive land use mostly, because while OSM is rich in land use info, it is also very inconsistent in quality and quantity.

Wayne Conrad said...

You could, in theory, have two sets of land use data. One is the "really crappy data we often override." The other would be human overrides, with "transparent" cells that would allow the "crappy data + rules" cells to show through, and "opaque" cells that would be used without having any rules applied.

A flag on a single set of land-use data meaning "a human set this one, so no rules" would have the same effect, but might be harder to manage, since it obliterates data (once you override a cell, you can't easily change your mind and go back).
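A minimal sketch of the two-layer idea in Python, assuming a hypothetical representation where both layers are same-sized 2D grids, `None` in the override layer means "transparent," and `apply_rules` stands in for the real rule set (here just a toy function). Nothing like this exists in MeshTool; it only illustrates why a separate layer keeps edits reversible.

```python
from typing import Callable, List, Optional

Grid = List[List[str]]
OverrideGrid = List[List[Optional[str]]]


def resolve(base_landuse: Grid, overrides: OverrideGrid,
            apply_rules: Callable[[str], str]) -> Grid:
    """Combine the two layers cell by cell.

    None in the override layer is "transparent": the base land use is run
    through the rule set. Any other value is "opaque": it is used as-is and
    no rules are applied.
    """
    result = []
    for base_row, over_row in zip(base_landuse, overrides):
        row = []
        for base, over in zip(base_row, over_row):
            row.append(apply_rules(base) if over is None else over)
        result.append(row)
    return result


def toy_rules(landuse: str) -> str:
    # Hypothetical stand-in for the real rule set.
    return "city" if landuse == "urban" else "natural"


if __name__ == "__main__":
    base = [["urban", "crop"], ["crop", "urban"]]
    overrides: OverrideGrid = [[None, "city"], [None, None]]  # a human paints one city cell
    print(resolve(base, overrides, toy_rules))  # [['city', 'city'], ['natural', 'city']]

    overrides[0][1] = None  # change your mind: the base data was never touched
    print(resolve(base, overrides, toy_rules))  # [['city', 'natural'], ['natural', 'city']]
```

Because the base grid is never modified, clearing an override cell restores the rule-driven result, which is the advantage over the single-layer flag approach.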

I'm not saying that any of the above blathering is a good idea.

Nick Ward said...

This does help to explain why scenery in Canadian prairies / parkland is often totally unrealistic. Areas that in RL have about 1 or 2 grain farms per square mile appear in XP to be residential and absurdly lit up at night.
Thank you for reassuring me that this will improve.