Tuesday, August 05, 2008

Why Is Installer Scanning Slow?

When you update X-Plane, the installer scans about 10,000 files before it runs the update. This scan checks every single byte of every file, so it's a bit slow - it takes a few minutes.

With the new installer, when you add or remove scenery, we do the same scan on the installed scenery. This scan is very slow - 78 GB is a lot of data!

When the next installer comes out (2.05, which will be cut after 920 goes final), the add-remove scenery function will do a quick scan that only checks which files exist. This is a lot faster - even with full scenery installed, the scan takes only 10-20 seconds on a slower computer.

I can't pull the same trick for updates - we need to detect any possible defect in a file during an update to ensure that the install is correct!
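To make the difference concrete, here's a rough sketch in Python of the two kinds of scan (the function names and structure are illustrative, not our actual installer code):

```python
import hashlib
from pathlib import Path

def full_scan(root: Path) -> dict:
    """The slow update-style scan: hash every byte of every file."""
    manifest = {}
    for f in sorted(root.rglob("*")):
        if f.is_file():
            h = hashlib.md5()
            with f.open("rb") as fh:
                # Read in 1 MB chunks so huge scenery files don't blow up memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            manifest[str(f.relative_to(root))] = h.hexdigest()
    return manifest

def quick_scan(root: Path) -> set:
    """The fast scenery-style scan: just note which files exist."""
    return {str(f.relative_to(root)) for f in root.rglob("*") if f.is_file()}
```

The quick scan only touches directory metadata, which is why it finishes in seconds even when the full scan takes minutes.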

4 comments:

Anonymous said...

Couldn't a hash be used here? Something like the following scheme perhaps...

The current X-Plane install would have a manifest file containing a (path, hash) tuple for each known file in the install. The hash would be the output of some hash function (MD5, SHA-1, ...) run on the file contents. The new installer would then just download the new manifest file and compare it with the one in the current installation. If the hash value differs for the same path, then that file needs to be downloaded.

This way you never need to read individual files or even traverse the directory hierarchy. You only need to keep the manifest file up to date.
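A sketch of that comparison step might look like this (assuming each manifest maps a relative path to a hex digest; the names are illustrative):

```python
def files_to_update(local: dict, remote: dict) -> list:
    """Paths whose remote hash differs from the local one, or that are missing locally."""
    return [path for path, digest in remote.items() if local.get(path) != digest]

# Example:
#   local  = {"a.txt": "d41d8...", "b.txt": "9e107..."}
#   remote = {"a.txt": "d41d8...", "b.txt": "e4d90...", "c.txt": "45c48..."}
#   files_to_update(local, remote)  ->  ["b.txt", "c.txt"]
```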

Benjamin Supnik said...

A hash is EXACTLY how it works now! The manifest list for an install is a set of MD5 sums, one for each file.

The speed question is: when do we compute the LOCAL hash for the LOCAL install?

Right now, we scan the local files and hash them when you update - that's when we say "scanning." That's the slow part... it requires us to read every byte of the install.

So the optimization would be to cache our local hashes... but that's the step I don't want to take - if a file changed, the local cache would be out of date and we could miss a file that needs updating.

Basically what I'm saying is: I don't want to get clever and cache the hashing work on an update because the scan isn't THAT slow and the consequences of getting a file wrong are unbounded tech support nightmares.

By comparison, I'm willing to play fast and loose with the hash on global scenery because a file mismatch is a lot rarer in that case, and hashing is horribly slow for that much data.
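To illustrate the kind of cache I'm rejecting - a hypothetical sketch keyed on file size and modification time, not anything we actually ship:

```python
import os
import hashlib

def cached_md5(path: str, cache: dict) -> str:
    """Return an MD5 for path, reusing a cached digest when size and mtime match.

    This is exactly the risky step: a file whose bytes changed but whose size
    and mtime happen to match the cached entry gets reported with its stale
    hash, and the update misses it."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[:2] == (st.st_size, st.st_mtime):
        return entry[2]  # trust the cache without reading the file
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()  # whole-file read; fine for a sketch
    cache[path] = (st.st_size, st.st_mtime, digest)
    return digest
```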

Anonymous said...

Well, you're certainly right about the support nightmare in the case of locally cached hash values. (I suppose one could provide an option in the installer that says "yes, I know what I'm doing, just use the local cache instead of recomputing everything", but I guess this would bring support problems of its own....)

Matt Schifter said...

Hi,

Actually, since .zip files provide internal integrity checks, you could create one very large zip file, copy it over, verify it using standard zip integrity testing, run an MD5 on that, and then unzip all the files on the destination at once. Copying one large file from the CD instead of thousands of small ones would multiply installation performance by orders of magnitude. I'm seeing an installation rate of about 170 KB per second. Part of the performance lag must be that I'm using an external drive for storage, though it is not a slow drive.
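For instance, Python's standard zipfile module can do the integrity check I mean (the archive name and destination here are just placeholders):

```python
import zipfile

with zipfile.ZipFile("scenery_payload.zip") as zf:  # placeholder archive name
    bad = zf.testzip()  # reads every member and checks its CRC-32; None means all OK
    if bad is not None:
        raise IOError("corrupt member in archive: " + bad)
    zf.extractall("Custom Scenery")  # placeholder destination
```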

"Real-time" Regards,

Matt