Showing posts with label off topic. Show all posts

Saturday, May 31, 2008

Probability and Certainty

I've been reading Fooled by Randomness (highly recommended - it will either change your life or you won't finish it because Taleb's writing style annoys you) - and it has me thinking about the nature of certainty in software development.  Consider two approaches to uncertainty and how they are completely at odds with each other.

No Weird Behavior

The "No Weird Behavior" approach goes like this: you never want a harmless behavior inside your code that you don't understand.  The reason is that if you don't understand the behavior, you don't really know that it's harmless!

In fact this isn't just a theoretical problem - I only truly started to believe in "No Weird Behavior" after fixing several very hard to find, hard to reproduce crash bugs, only to discover (once the analysis was done) that the broken code also produced a very frequent, less harmful behavior.  Had I taken the time to investigate the "weird behavior" first (despite it not being of high importance) it would have led me to a ticking time bomb.

No Weird Behavior implies that a bug isn't fixed until you know why it's fixed, and that randomly changing code until the behavior goes away is absolutely unacceptable.  This shouldn't be surprising; if a bridge were swaying, would you accept an engineer who fixed it by randomly tightening and loosening screws until it stopped?

Wait And See

I get a lot of email with bug reports, questions, and reports of strange sim behavior.  These emails have some unfortunate statistical properties:
  • A good number of them are resolved by the user who emailed within a day or two.
  • A certain percentage are simply never resolved.  (Often I email a question that would point to a clear diagnosis and the user never writes back.  I can only speculate that the user answered the question, found the problem, fixed it, and promptly forgot about our emails.)
  • A certain percentage are solved by the user, who tells me what the problem was, and it was something I would never, ever be able to help them with.  (Things like "I have this program called WickedMP3Player - it turns out if I set its visualizer setting to 'Rainbow' then X-Plane stops crashing" when I've never even heard of the WickedMP3Player program to begin with.)
  • There is a high correlation between bugs reported by only a very small number of users and a resolution from the above categories, or a resolution by the user changing his or her local configuration.
So playing the odds, the right thing to do when presented by a third party with weird behavior is to wait and see who else reports it; the frequency of reports gives us information about the likely resolution.
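
To make "the frequency of reports gives us information" concrete, here is a minimal sketch using Bayes' rule in odds form. The function name, the prior of 0.2, and the likelihood ratio of 3.0 are all made-up assumptions for illustration, not numbers measured from real X-Plane bug reports, and the model assumes each reporter is independent.

```python
def probability_real_bug(num_reporters, prior=0.2, likelihood_ratio=3.0):
    """Posterior probability that weird behavior is a genuine bug in our code.

    Hypothetical model: the first report leaves us at the prior, and each
    additional independent reporter multiplies the odds by a fixed
    likelihood ratio (how much more likely a repeat report is when the
    bug is real than when it is a local configuration problem).
    """
    if num_reporters < 1:
        raise ValueError("need at least one report")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** (num_reporters - 1)
    # Convert odds back into a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, round(probability_real_bug(n), 3))
```

With these assumed numbers a single report stays at the prior, while a handful of independent reports pushes the estimate well past even odds, which is the "wait and see who else reports it" rule in numeric form.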

(And for those of you who think our tech support are being lame when they ask if you've updated your drivers, they are playing the odds to the hilt - they ask you about drivers first because updating drivers fixes an absurdly huge number of the tech support calls we get.)

Getting It Wrong

So we have motivation to investigate everything (no weird behavior), motivation to ignore everything (wait and see) and a rule of thumb that the frequency of reports can help us pick which strategy is best.  Of course, sometimes the rule of thumb is completely wrong.

One user reported a crash using the current web updater (version 2.04) - I had not seen this crash anywhere and it was inside the operating system UI code, so I assumed it was a local problem, e.g. some kind of extension or add-on that caused the OS problems.

The problem, it turns out, is simply low frequency - as the incorrect code made it into 902b1, I got a few reports from more users and realized that this wasn't a local problem, it was weird behavior!  (The bug will be fixed in the 205 installer and 902b2, both of which will be out in June.)

Consider this: if you make a measurement of a known quantity with a dubious measuring device, you learn more about the measuring device than the object you are measuring.  (If you have a ruler and you don't know the units, you could determine them by measuring yourself.)

In a number of cases, we have seen serious "should-happen-all-the-time" crash bugs that get reported by very few users.  (Once we know the actual root cause of the bug, we can deduce logically whether the bug should happen to all users with the configuration or be intermittent.) We can then look back at the actual number of reports to make a judgement call on how much testing is really happening.

For example, we can make some guesses about how quickly a new Macintosh has saturated the X-Plane user base when there is a hardware-specific serious bug in just that machine.

We had this with the iMacs (where the runway lights were broken) and we could watch the machines sell - slowly at first, but then quite quickly.  (BTW, I think that 10.5.3 fixes this and anti-aliasing problems.)  We can even see who runs with anti-aliasing when there is a setting-specific bug (a lot of you do)!
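
The "learn about the measuring device" reasoning can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function names, the report rate (the fraction of affected users who bother to email), and the user-base size are all hypothetical, and the bug is assumed to hit every user with the affected configuration.

```python
def estimate_exposed_users(reports, report_rate):
    """Estimate how many users actually ran the affected configuration.

    Assumes a deterministic bug (everyone who runs the configuration hits
    it) and a guessed fraction of affected users who send a report.
    """
    if not 0.0 < report_rate <= 1.0:
        raise ValueError("report_rate must be in (0, 1]")
    return reports / report_rate


def estimate_coverage(reports, report_rate, user_base):
    """Fraction of the user base that appears to exercise the configuration."""
    exposed = estimate_exposed_users(reports, report_rate)
    # Cap at 1.0: a bad guess for report_rate can overshoot the user base.
    return min(1.0, exposed / user_base)


if __name__ == "__main__":
    # Hypothetical numbers: 5 reports, 1 in 100 affected users writes in,
    # 10,000 users running the build.
    print(estimate_exposed_users(5, 0.01))
    print(estimate_coverage(5, 0.01, 10_000))
```

Since the bug is the known quantity here, a low report count says more about how few people are running that configuration than about the bug itself.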

In the end, I think the right approach to balancing "no weird behavior" and "wait and see" is to remember a truth about uncertainty that is very hard for humans to grasp:

The most likely outcome of an uncertain situation is not guaranteed to happen - in fact a lot of the time it won't.*

So we can play the rule of thumb and wait and see, but we always have to keep one eye toward the improbable, which is still possible!
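
A small worked example of why the improbable deserves that one eye: if the "most likely" diagnosis is right with probability p in each case, and the cases are independent, the chance that it is wrong at least once in n cases is 1 - p^n. The function name and the sample numbers below are illustrative assumptions only.

```python
def chance_of_at_least_one_surprise(p_likely, num_situations):
    """Probability that the most likely outcome fails at least once.

    Assumes each situation is independent and that the most likely
    outcome occurs with the same probability p_likely every time.
    """
    if not 0.0 <= p_likely <= 1.0:
        raise ValueError("p_likely must be a probability")
    return 1.0 - p_likely ** num_situations


if __name__ == "__main__":
    # Hypothetical: the rule of thumb is right 90% of the time.
    for n in (1, 5, 20):
        print(n, round(chance_of_at_least_one_surprise(0.9, n), 3))
```

Under these assumptions, a rule of thumb that is right 90% of the time still misleads at least once in roughly 88% of runs of 20 cases.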

* Blatant blog cross-promotion...the point of my big rant here was essentially the same...it's easy to expect the most likely outcome, but the unlikely outcome will happen some of the time. 

Friday, May 16, 2008

Commodification and Operating Systems

I'll warn you in advance: this is going to start off topic and go way off topic. "Catching up" with the changes to Mac OS, Windows and Linux has me thinking about the nature of technology. I feel a little bit guilty about this post because it's going to turn into a rant about Vista, and ranting about Vista is like shooting fish in a barrel. On the other hand, having used Vista, well, I have a lot of rant to give.

One of the most important things to understand about technology (and computers are no exception) is that changes in the scale of the technology change the very nature of the technology. That is, as you make computers faster and cheaper, at some point the sum of all of those small improvements changes the fundamental nature of the beast. We've seen this as the computer transformed from mainframe to desktop (which is really just a change in cost and size), finding an entirely new audience, and now again as the computer changes from what we now know as a computer to cell phones, MP3 players, and other small, mobile devices.

"Commodification" is what happens when, as things get better, cheaper, faster, etc., consumers stop caring about the marginal improvement. Back in the days of Windows 95 and 386s, you could still improve the operating system and hardware in substantial ways; a doubling in processor speed and a rewrite of the operating system got you protected memory, which meant less data loss.

A few years ago we reached the point where desktop hardware became commodified. For the average user, 1.8 vs 2.2 GHz makes no difference at all. It's a question of how quickly your computer can wait for keystrokes and data from the internet. (Answer: even a lowly Celeron is light-years faster than the I/O devices it typically has to talk to. Even if you're the last kid in your class at Harvard, you're going to be bored discussing politics with a bunch of four-year-olds.) At that point things became very difficult for major vendors like IBM (sold out), HP and Compaq (merged), Gateway (bought out of its misery), etc. The price of a desktop plummeted from over $1000 to less than $400.

I believe we've reached the point where operating systems have become a commodity as well:
  • Every major operating system has all of the features of a "real" operating system - that is, protected memory, virtual memory, plug & play driver support, etc.
  • The performance for normal applications is just about the same; there are some specific variations that matter in the server market, but for all practical purposes the operating system is not in the way, and the machine is much faster than users need anyway.
  • Every major operating system has a similarly designed GUI experience that, once you get used to the quirks of where the close box is, is just about the same, more or less. (Mac users - keep your pants on. :-)
And this is why life is not so good for Microsoft. In a non-commodified market, you can charge a premium for incremental improvements over the competition. That's a game Microsoft can play - they have a lot of capital to invest in changing their operating system, as long as they are rewarded with a lot of cash for doing it. (And normally they are - about six billion dollars for a major OS revision, I'm told.)

The problem is that operating systems are now a commodity. Simply put, users don't need a new operating system. There are no big ticket features missing from OS X 10.4, Windows XP, or Linux 2.6. This makes Microsoft's business model fundamentally vulnerable to Linux for the first time. If the name of the game is:
  • Keep costs down, as low as possible.
  • Incrementally improve quality very slowly without ever causing the pain of a major OS upgrade.
That's a game Linux, with their army of distributed bug fixers and free source code, is going to win.

When I looked at Windows XP and Ubuntu 6.06 I was afraid that Linux wouldn't gain traction in the desktop market. I blamed the adoption of X11, the KDE/GNOME schism, and the Linux community's being made up of shell nerds for the merely tolerable desktop experience.

But look where we are now: Vista is a vehicle for bloat. Combine "we make money by shipping major features" and "there are no more major features to ship" and you get Vista...an attempt to change a lot of things when you should have left things alone.*

By comparison, Ubuntu pretty much just works - you put the live CD in your machine, it asks you some questions and installs...it knows about more hardware, has fewer bugs, more drivers, and a better user experience. In a commodified operating-system space, the only thing to do is try to avoid a bad user experience - if you can't offer a really juicy carrot to users, try to avoid hitting them with a stick.

And it is in this environment that the Mac is actually gaining market share. Apple's business model has always been at odds with the industry. Complete vertical integration meant higher costs, lack of market share, and out-of-date technology - back when having more for less meant something, that was a real weakness, and explains why the Mac never dominated in market share.

But what a difference a decade makes! Hardware is now commodified (and Apple is integrated at the system-building level, leveraging cheap third party parts like they always should have). Operating systems are commodified. But on the one frontier left, quality of user experience, Apple's vertical integration gives it an immense advantage.

The question is: why does an operating system "just work"?
  • Vista: it doesn't. There are too many systems and not enough testers and engineers trying to solve the problem.
  • Linux: massive distributed engineering. For any given hardware system, eventually a Linux nerd will integrate it. Anyone can solve the problem of poor user experience.
  • Apple: they have it easy. With only half a dozen machines in production (and maybe another two dozen legacy configurations to support) they have a much smaller configuration space to worry about than anyone else.
I don't know what Microsoft's future is, but it can't be very good. At some point they are going to have to transition from a "major revision for cash" to an "incremental tuning" approach to operating systems. As long as they have market share, they still get the "Windows tax" - that is, their OEM pricing from major vendors on every new computer that is built. It's going to be harder and harder to convince the entire world to make a major jump (see how well XP to Vista went). In this situation, they'd be better off with a more solid operating system. It's unfortunate that they're going to have to try to sustain market share with Vista.

Their best-case scenario is that they eventually get Vista back to an XP-quality experience, in which case all they've done is spend a huge amount of R&D money and piss off a lot of customers to maintain the status quo.

* I have mixed opinions on Vista's video-driver-model change. But that's a different post.

Saturday, April 12, 2008

Spam My Wiki, Please

I'll post more about scenery soon; this will be short and not terribly topical.

The X-Plane Plugin Wiki used to have no login requirements - anyone could just click and edit. All was good for a while, and then I logged in one day to find some of the most highly used pages stuffed to the gills with the phrase "Nigritude Ultramarine" over, and over, with links to other sites.

You can read about this phrase, why it was invented and what was going on here.

Our response was to put a user login requirement on the site, and we haven't had a spam problem since (knock on wood) although we do seem to get what appear to be bogus signup requests. (They don't really hurt anything, they just clog the user database. I'm not sure why anyone would sign up if they don't intend to actually do anything.) But a few thoughts on Nigritude Ultramarine and people's attempts to get junk spam sites into the top of Google's search listing:
  • I was pleased to see a real site (this FAQ) as the number two search hit for the term...this real link from a real blog to them can be sort of a contribution to their page rank.
  • I have faith that Google will continue to fight the technology arms race against search engine optimizers...Google has gobs of money and an immense motivation to do so.
  • Apparently link farmers, in an attempt to raise their page rank, have been using bots to automatically steal blog content. I haven't seen this myself yet and I've never had an X-Plane query go to a junk site. But some blogs I read have complained of this happening.
Only mildly related, there is an X-Plane Wiki, and I'll try to point people toward it next week; I've been posting things there rather randomly in an attempt to get underdocumented stuff written down somewhere.