Thursday, July 17, 2008

Why We Must Edit the Source Data

My previous post on scenery tools stirred up some discussion; y-man brought up a point fundamental enough that I think it warrants explanation.

The question is whether to fix broken DSFs by editing the source data or by editing the DSFs themselves.

Let me be clear: both are viable options, both have limitations, and neither is possible today. So in choosing one over the other, I am picking the strategy I think is superior. I am discounting the other not because I think it is useless, but because it is less useful and development time is very limited (doing both would take time away from other good features).

Basically the two ways to fix a DSF are:
  1. Edit the DSF itself until it looks correct.
  2. Edit the source data and then rebuild the DSF from scratch.
I strongly believe we must pursue the second strategy for this simple reason: if we correct a DSF but don't fix the source data, the same mistakes will be made in the future.

We have to keep recutting the global scenery to keep up with:
  • Higher capacities for detail in newer computers.
  • New global data that becomes available.
  • Improved generation algorithms that make better results from existing data.
To lose any of these would be a big set-back in scenery quality, so not recutting DSFs isn't a great option. (Furthermore, improvements in scenery often come from new global data, so picking user changes over new data would be a tough choice.)

By letting users change the source data, we can have the best of both worlds: problems are fixed while new technology is adopted.
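To make the "best of both worlds" argument concrete, here is a toy sketch in Python. It assumes, hypothetically, that source data is just a dict of elevation samples, that a user correction is stored as an override on those samples, and that "building a DSF" keeps every `step`-th sample. `build_dsf`, `user_fix` and all the numbers are invented for illustration and have nothing to do with the real scenery tools. The point it shows is that a correction stored against the source is re-applied on every recut, even with new data and a new algorithm, while a fix that lived only in the old DSF is not an input to the rebuild.

```python
# Toy model (hypothetical names and data, not X-Plane's real formats or tools).

def build_dsf(dem, corrections, step):
    """'Cook' a DSF: apply user corrections to the source elevations first,
    then keep every `step`-th sample as a stand-in for mesh generation."""
    merged = {**dem, **corrections}  # corrections override raw source samples
    return {x: merged[x] for x in sorted(merged) if x % step == 0}


# 2008 source DEM: flat at 10 m, with one bad sample at x=4.
dem_2008 = {x: 10.0 for x in range(9)}
dem_2008[4] = 250.0

# The user's fix is recorded once, against the SOURCE data.
user_fix = {4: 10.0}
dsf_2008 = build_dsf(dem_2008, user_fix, step=2)

# Later recut: new data (a void at x=7 is filled in; the bad sample at x=4
# is still there) plus a better algorithm that keeps every sample.
dem_2009 = dict(dem_2008)
dem_2009[7] = 11.0
dsf_2009 = build_dsf(dem_2009, user_fix, step=1)

# A fix that lived only inside dsf_2008 is not an input to the recut,
# so the bad sample comes back.
dsf_2009_without_fix = build_dsf(dem_2009, {}, step=1)

print(dsf_2008[4], dsf_2009[4], dsf_2009_without_fix[4])  # 10.0 10.0 250.0
```

The design choice is simply where the fix lives: because `user_fix` is an input to the build rather than an edit to its output, keeping it costs nothing when the inputs or the algorithm change.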

To answer y-man's direct questions:
"But that data is not available to us users, is it? If it is, where is it, and where is the spec to work with it?"
It is not available yet! I am proposing to focus on making the data available and creating the infrastructure to share data and receive improvements (choice 2) rather than providing a DSF mesh editor (choice 1).
"Alternatively, a user who goes through the trouble of correcting a base DSF could send in the modified DSF, or maybe even a diff between the before and after states of the DSF text files, and attach an explanation of its purpose."
Here's the problem: we can't preserve the diff to the DSF and apply it to future renderings.

Imagine, for example, that you relocate 500 mesh vertices in the existing DSFs to correct a coastline. (I am being generous here and assuming a clear, single, unifying edit; some authors would more likely move the vertices to make a number of improvements.)

In the meantime, another author creates an airport nearby, and someone else improves the SRTMs (there is at least one person credited in our About box who has been collecting void-filled SRTM tiles). The nearby airport's pavement changes size and the raw elevation changes height, and together these greatly change how the mesh is generated: 400 of the 500 vertices that were moved simply no longer exist, and there are now 450 new vertices in the nearby area that were not in the original DSF.

This case is known as a "merge conflict" in programming terms (it happens when two changes to a program are "merged"). The problem is that we can't sanely know what to do with our 500 edited vertices in this case. Do we take the 100 vertices that still exist and move them without the others? What if that produces very strange results? (Triangles might become inside-out, because some vertices are moved a lot while their neighbors are not moved at all, since the neighbors did not match the diff.)

We could try to apply some kind of change to the new vertices similar to what happened to the old, but how do we know if this is making things better or worse? What if that change simply deforms the outline of the airport that was added?
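The vertex bookkeeping in this example can be sketched in a few lines of Python. This is a made-up model, not real DSF data: a "DSF diff" is assumed to be a dict from an old vertex position to the position the user moved it to, and the regenerated mesh keeps 100 of the old vertices and adds 450 new ones, matching the numbers in the scenario. `apply_vertex_diff` is a hypothetical helper.

```python
# Toy illustration with invented data: a "DSF diff" maps an old vertex
# position to the position the user moved it to.

def apply_vertex_diff(mesh_vertices, diff):
    """Apply a position-keyed diff to a mesh. Returns the patched vertex list
    and the diff entries whose vertex no longer exists (orphaned edits)."""
    present = set(mesh_vertices)
    patched = [diff.get(v, v) for v in mesh_vertices]
    orphans = [old for old in diff if old not in present]
    return patched, orphans


# Original mesh: 500 coastline vertices, every one nudged north by the user.
old_mesh = [(i, 0) for i in range(500)]
user_diff = {(i, 0): (i, 1) for i in range(500)}

# Regenerated mesh after the new airport and better SRTM data: only 100 of
# the old vertices survive, and 450 new ones appear in between.
new_mesh = [(i, 0) for i in range(100)] + [(i + 0.5, 0) for i in range(450)]

patched, orphans = apply_vertex_diff(new_mesh, user_diff)
moved = sum(1 for before, after in zip(new_mesh, patched) if before != after)

# 100 surviving vertices move, 400 edits have nothing to attach to, and the
# 450 new neighbours stay put.
print(moved, len(orphans))  # 100 400
```

Nothing in the diff says what should happen to the 450 new vertices or to the 400 orphaned edits; any rule we invent for them is a guess, and that guess is exactly the merge conflict.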

I can go on and on with these kinds of examples, but the point is this: you can't unbake a cake. A DSF is similar; you can't necessarily recover the sources that were processed together to form the final product.

This is why it's important that we create infrastructure to correct source data, rather than focus on editing the final DSFs. I can assure you that my negative attitude toward editing DSFs is not a negative attitude toward user participation!

There are still a lot of details to be worked out about how we can work together. Who will own the data, and on what terms? How will it be distributed? How will it be shared?

Unfortunately I can't answer those questions yet. I still need to do more research into what is technologically possible - then we can figure out how to proceed.


Benjamin Supnik said...

Rob wrote this:

"Hi Ben,
Would it be possible to establish some kind of centralised database which stored only corrections (contributed by X-Plane users) to default X-Plane scenery (base mesh, objects and landclass info)?
In this way Laminar R would ensure that a full X-Plane installation is required, and maybe limit the amount of data to store and transfer. X-Plane users could easily download corrections from the database (on request) with a tool similar to the X-Plane installer, which would merge the local DSF with the downloaded patches.
Since supervising the database and solving problems with conflicting data would require a lot of man-hours, the database would be run by the community rather than by Laminar R. I am sure that people can do amazing things if you provide them with tools capable of doing this...
Would it be feasible? Rob"

Rob, I'm sorry I had to move your comment - I'm not sure which post you were trying to post to - I accidentally dumped a blank post up last night.

Anyway, the answer is basically: no. Please see today's post. We can't work with "diffs" to DSFs, we need to work with the source data - otherwise all the fixes will be lost on a regular basis.

We need to track changes to the SOURCE materials.

y-man said...

Thanks for explanation.

Looking forward to the time we'll be able to do corrections to base DSF.
Both correcting bad sampling errors and making the "runway follows land contours" option (which I love, but had to give up on) usable depend on this.

Would you be kind enough to give an explanation of why the airport flattening causes weird water cliffs/hills when the airport is close to shore?

Those bad water artifacts do not occur with "runway follows land contours" on, but it has its own problems, like very steep taxiways and converted scenery objects sinking/floating.

Benjamin Supnik said...

y-man, this old post explains it I think:

krz said...

can be done ;)

or u might take a look at google docs. they do it too. but i understand that this is not something that can be implemented over the weekend... and it's not really necessary for x-plane these days.

im looking forward to the way the scenery data will be shared and what kind of version-control (& sign-in sign-out) u implement.

if that is done right the sim could benefit a lot.

i still dream about a joint-venture with the google earth team since both programs work on the same thing and redundancy in such case is always wasted effort.

Benjamin Supnik said...

KRZ: I agree that multiple edits to a DSF from multiple authors COULD be merged (with results that might be somewhere between good and lousy). But that's not my point!

My issue is that the edits are being made to a lossy derived data format and not the original source!

To use a coding example:

Users submitting the DSF edits would be the equivalent of users submitting their patches to Linux in ASSEMBLY LANGUAGE and not C.

It would be much harder to merge those patches because the "context" of the patch (assembly) would not be stable due to edits to the original source.

From a google documents perspective, it's as if you make an edit, but while you did, I converted the entire thing to French! Context - gone!

The fundamental problem is that the final cooked data (DSF) appears to change radically in a way that isn't easily predictable from mods to the sources. This is what invalidates DSF-level patches.
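The assembly-language analogy can be shown with a toy Python sketch. Assume, hypothetically, that a cooked file stores vertices in sorted order and that a DSF-level patch says "move vertex #2", the way an assembly patch refers to an instruction offset. When the source gains two cut lines from a new airport boundary, the same index now points at a different vertex. `cook` and `apply_patch` are invented for illustration.

```python
# Toy illustration: cooked output addressed by position in the file
# ("vertex #N"), much like machine code addressed by offset.

def cook(cut_lines):
    """Hypothetical cooker: emit the mesh's cut lines in sorted order."""
    return sorted(cut_lines)


def apply_patch(dsf, patch):
    """Apply an index-addressed patch {vertex_index: new_x} to cooked output."""
    out = list(dsf)
    for index, new_x in patch.items():
        out[index] = new_x
    return out


source_v1 = [0.0, 10.0, 20.0, 30.0]
dsf_v1 = cook(source_v1)

# A patch written against dsf_v1: "move vertex #2 (the coastline at x=20) to x=21".
patch = {2: 21.0}

# Someone else adds an airport whose boundary introduces two new cut lines.
source_v2 = source_v1 + [5.0, 7.0]
dsf_v2 = cook(source_v2)

patched_v1 = apply_patch(dsf_v1, patch)  # moves 20.0, as intended
patched_v2 = apply_patch(dsf_v2, patch)  # moves 7.0, the airport edge

print(patched_v1)  # [0.0, 10.0, 21.0, 30.0]
print(patched_v2)  # [0.0, 5.0, 21.0, 10.0, 20.0, 30.0]
```

The patch still applies cleanly, which is the dangerous part: it moves the airport edge at x=7 instead of the coastline vertex at x=20, and it even leaves the vertices out of order.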

So the issue here isn't one of merging, it is one of de-compiling.

(And lest I get myself into even more trouble: yes, it IS possible to derive the original data back from the DSF; we could attempt to isolate the road mesh, the DEM, etc. But do we really want to do this? DSF is a LOSSY format. The result would be an increase in the error bounds of the data every time we took a patch.)

If we were doing image editing, would we want our patches submitted as edits to the original layers in a Photoshop document, or as edits to a 50 KB JPEG that was optimized for the web?

Dan31 said...

OK to share scenery, but what if I use orthophotos to make scenery which could run only with a gigabyte video card?
Nearly nobody could use it, so there is no use in sharing this kind of data. But I need to modify the DSF to make this scenery, so we need a tool to help us modify the DSF easily.

Benjamin Supnik said...

Dan31, I still do not agree entirely.

I do agree that orthophoto-based scenery would not be scenery that you share as part of collective work to improve the global scenery; it might require too much hardware, or too much storage, to be part of the global scenery. But you still have the same question: do you build the new DSF from source data + orthos, or from the old DSF + orthos?

Now you CAN (today) make new DSFs out of old ones using PhotoSceneryX. I won't be providing this tool not only because it is not my preferred approach, but because Justin has already done it!!

But you can also make that new DSF using source data (coastlines, DEM, land use, photo placements).

I prefer this technique because:
- It lets you correct elevation as well as photo placement... DEMs are easy to get with SRTM around, and editing them is pretty straightforward as well.

- You don't pick up accumulated error from importing and rebuilding the finished DSF.

- The mesh will be allocated more efficiently if it is rebuilt with orthophoto placements, rather than simply having additional cuts burned in to place the photos.

Basically, if you ONLY want to repaint orthophotos, then editing the DSF might make sense - but this technique isn't "scalable" - as soon as you want to edit the DEM, change coastlines, etc., rebuilding from source makes more sense.

We provide our land use data now to make MeshTool a workable option... DEMs and coastlines are up to you. (DEMs are easy to get... coastlines are a bit tricky... I hope to add shape-file coastlines to MeshTool someday so that you can use SWBD as a coastline source.)

Final note: the reason I say the mesh is "less efficient" is this... our mesh algorithm places more vertices where there is more terrain detail: lots of vertices on mountains and very few in a flat valley. If you provide a NEW DEM and simply edit the existing DSF, the vertices are not reallocated, so they might not be in the most "interesting" places, resulting in less accuracy for a given triangle count.
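Here is a toy 1-D sketch of that allocation argument in Python. The greedy mesher below (repeatedly add the sample with the largest interpolation error) is a simple stand-in assumed for illustration, not X-Plane's actual mesh algorithm, and the terrain is a made-up profile of alternating 0/100 m "rough" samples in one part of the tile with flat ground elsewhere. It compares keeping the old vertex positions and just resampling a new DEM (editing the DSF) against re-running the mesher on the new DEM (rebuilding from source), with the same vertex budget. In this extreme, invented case the new DEM puts the rough terrain in a different part of the tile, say because the old data was misregistered.

```python
def interp_error(heights, a, b, i):
    """Vertical error at sample i when interpolated between vertices a and b."""
    t = (i - a) / (b - a)
    return abs(heights[i] - (heights[a] + t * (heights[b] - heights[a])))


def greedy_mesh(heights, budget):
    """Toy 1-D mesher (a stand-in, not X-Plane's algorithm): start from the two
    end samples and keep adding the sample with the largest error until
    `budget` vertices are chosen. Ties go to the leftmost sample."""
    chosen = [0, len(heights) - 1]
    while len(chosen) < budget:
        best_i, best_err = None, -1.0
        for a, b in zip(chosen, chosen[1:]):
            for i in range(a + 1, b):
                err = interp_error(heights, a, b, i)
                if err > best_err:
                    best_i, best_err = i, err
        if best_i is None:  # every sample is already a vertex
            break
        chosen.append(best_i)
        chosen.sort()
    return chosen


def mean_error(heights, chosen):
    """Mean absolute error of the mesh over every sample of `heights`."""
    total = sum(
        interp_error(heights, a, b, i)
        for a, b in zip(chosen, chosen[1:])
        for i in range(a + 1, b)
    )
    return total / len(heights)


N, BUDGET = 101, 30
# Old DEM: rough samples for x < 50, flat valley after that.
old_dem = [100.0 if (i % 2 and i < 50) else 0.0 for i in range(N)]
# New DEM (invented): the rough samples are at 50 <= x < 90 instead.
new_dem = [100.0 if (i % 2 and 50 <= i < 90) else 0.0 for i in range(N)]

old_mesh = greedy_mesh(old_dem, BUDGET)  # vertices crowd into x < 50

# "Edit the DSF": keep the old vertex positions, just resample new heights.
reused_err = mean_error(new_dem, old_mesh)

# "Rebuild from source": let the mesher reallocate vertices for the new DEM.
new_mesh = greedy_mesh(new_dem, BUDGET)
rebuilt_err = mean_error(new_dem, new_mesh)

print(round(reused_err, 2), round(rebuilt_err, 2))
```

With the same 30 vertices, the rebuilt mesh follows the new rough area and has a lower mean error, while the reused mesh still spends most of its vertices where the old DEM had detail.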