Frequently, error messages are totally uninformative -- or, worse, just plain wrong. Here, we look at how meaningful error messages can make it easier for users to correct problems without having to rely on technical support, and how poorly chosen messages can turn users into ex-users.
When something goes wrong with your computer, the worst error message is, of course, none at all. For instance, on the classic MacOS, renaming a file consisted of clicking on the file name, and waiting a second or so for the name to turn into an editable text field. A wonderful interface -- assuming you could rename the file. Sometimes you couldn't, in which case, nothing happened. No error message. No warning. You simply didn't get any feedback; you might easily have thought you'd placed the cursor a few pixels off. The explanation? Most often, Apple's file sharing protocol thought it owned the file or disk in question, so it couldn't be renamed. Unfortunately, the only way to find this out was to ask someone who knew about this quirk.
Silent failure is a dangerous, awful thing. Users depend on being told, at the very least, that something has gone wrong. For example, imagine that you are backing up the data on your PDA one last time before taking it in for a repair (where there's a good chance you'll lose all your data). Wouldn't you want to know that the backup failed?
The problem is, of course, the difficulty of finding all possible errors and testing for them. Many programmers advocate a policy of not checking for errors that you can't handle. When applied judiciously, this policy works; for instance, if the error is that you are unable to produce output, it will be impossible to provide a message. However, many people extend this theory to all manner of errors -- memory allocation, file handling, and so on. One product I saw maintained hundreds of files... and never checked for disk space. If you attempted to save when there wasn't enough space, it would systematically destroy every file it managed -- with no warnings or errors of any sort. Even if you, the programmer, can't fix the error, you can at least notify the user that there is one.
Thus, the first rule of error messages: Get some.
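To make the disk-space story concrete, here is a minimal sketch in Python of a save routine that checks before it writes and says so when it fails. Everything in it is an assumption for illustration: the function name save_files, the report callback, and the wording of the messages are hypothetical, and a real product would need more care (the free-space check, for instance, is only a best guess, since other programs may be writing at the same time).

```python
import os
import shutil
import tempfile


def save_files(directory, files, report=print):
    """Write each name -> bytes entry into directory.

    Returns True only if every file was saved. Hypothetical helper:
    the name, signature, and message wording are illustrative.
    """
    needed = sum(len(data) for data in files.values())
    try:
        free = shutil.disk_usage(directory).free
    except OSError as exc:
        report(f"Cannot save to {directory}: {exc.strerror or exc}. "
               "Nothing was changed.")
        return False
    if free < needed:
        report(f"Cannot save to {directory}: {needed} bytes needed, "
               f"{free} available. Nothing was changed.")
        return False

    ok = True
    for name, data in files.items():
        target = os.path.join(directory, name)
        tmp_path = None
        try:
            # Write to a temporary file first, so that a failure
            # cannot destroy the existing copy of the file.
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)
            os.replace(tmp_path, target)
        except OSError as exc:
            ok = False
            report(f"Could not save {target}: {exc.strerror or exc}. "
                   "The previous version, if any, is untouched.")
            if tmp_path and os.path.exists(tmp_path):
                os.remove(tmp_path)
    return ok
```

The point is not the particular checks. The point is that every path that can fail ends in a message to the user, and that a failed save leaves the old files alone.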
The next worst error message is the one that tells you nothing at all. UNIX is often criticized for terse error messages, but terseness is better than nothing. Once again, Apple has won a place in my heart in this regard. From the "Type 11 error" that used to be so common in early PPC MacOS systems, to the cryptic "Error #-6986" messages that I am told are a problem with a secure socket layer library, Apple's systems are full of errors identified only by large negative numbers. Many of the numbers are documented -- some more than once. The fault for this lies partially with application developers, who should presumably be checking for errors and emitting more useful results -- but some of the onus lies with Apple, who should have produced a standard way to generate useful descriptions for errors.
Every system has some of these. Years ago, a friend of mine used a compiler that had only one diagnostic message: "Error in code." Not even a line number. Similarly, Windows systems frequently announce that an operation cannot be completed "because an unexpected error occurred." Sadly, there's nothing unexpected about this happening from time to time.
Thus, the second rule of error messages: Describe the error.
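Turning a bare number into a description is usually cheap, because most platforms already ship a table of messages. As a rough sketch, assuming the number is a standard C errno value, Python's errno and os modules can supply both the symbolic name and the system's own text. The function name describe_os_error is hypothetical, and the exact numbers and wording vary from platform to platform.

```python
import errno
import os


def describe_os_error(code):
    """Return 'NAME (number): system message' for an errno value.

    Hypothetical helper. On some platforms os.strerror() raises
    ValueError for numbers it does not recognize.
    """
    name = errno.errorcode.get(code, "unknown error")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_os_error(errno.ENOSPC))
```

The number is still there for the support staff, but the user gets words as well.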
Once upon a time, I installed a new version of Windows on a brand new computer with a freshly formatted drive. During the installation, Windows said "Setup is now searching your system for installed components."
This phase of the installation took a good four or five minutes. Now, consider the circumstances: There were no files at all on the drive being searched. What part of this search took effort? I don't know what it was doing, but "searching for installed components" was probably a poor description.
Installers are not the only offenders. Programs alert you that a disk is full when, in fact, the write protect tab is enabled. (Does anyone here remember write protect tabs?) Web sites inform you that your order has been cancelled when the problem is really that their server is down.
An error message that is wrong can be worse than no message at all, but often it's still possible for the user to figure a few things out. While "disk full" may not be the problem, it suggests that the problem has something to do with writing to the disk; I may stumble across the write protect tab. A less lucid error message may leave me with no hope at all.
The UNIX C-shell wins a special prize in this category for its policy of printing "progname: No match" (where "progname" is the name of the command you've attempted to run) whenever it can't find a file to match a wildcard. This gives the mistaken impression that the program in question has actually been run, and couldn't find a file. (It does, however, produce an amusing response to "got a light?")
Thus, the third rule of error messages: Describe the error correctly.
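The "disk full" versus write-protect mix-up usually comes from code that treats every failed write as the same failure. Here is a small sketch of the alternative, assuming the operating system reports a standard errno value with the failure. The function name explain_write_failure and the message wording are hypothetical; the errno constants are the standard ones.

```python
import errno

# Hypothetical wording for a few causes the program can name with confidence.
_WRITE_FAILURES = {
    errno.ENOSPC: "the disk is full",
    errno.EROFS: "the disk is read-only (write-protected)",
    errno.EACCES: "you do not have permission to write there",
}


def explain_write_failure(path, exc):
    """Build a message for a failed write from the OSError that caused it."""
    reason = _WRITE_FAILURES.get(exc.errno)
    if reason is None:
        # Do not guess: pass along what the system actually reported.
        reason = f"the system reported: {exc.strerror or exc}"
    return f"Could not write {path}: {reason}."
```

When the cause isn't one the program recognizes, the fallback repeats the system's own words rather than picking the nearest familiar message.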
(In Latin, "Unto the root is born a brother.")
Some error messages are simply hard to read. This is not always unacceptable, but it guarantees a lot of technical support involvement. GE's Multics system used Latin for all error messages; the intent was that this would prevent the end users from trying to outsmart the system, and would encourage them to call the adequately trained support staff to determine the nature of the problem. Furthermore, it eliminated the problem of end user paraphrasing; consider, for instance, how difficult it would be to determine which error message the user is referring to when he reports "It said there was an error." GE's Latin method is a marked improvement over tricks such as using "File not found" in one module, and "Couldn't find file" in another module. Users do paraphrase errors, so distinguishing problems should not depend on subtleties of language or phrasing.
The above story is beautiful, but unfortunately in error. In fact, the condition involved was "impossible" -- a result of broken hardware -- and the error message would otherwise never have been left in shipping code; it was an incredibly strange boundary condition. The general philosophy of error messages in Latin is something I heard somewhere and failed to research correctly. Thanks to Tom Van Vleck for the correction.
For most programs, however, the GE solution would be unacceptable. Imagine having to call technical support every single time a program failed to do anything, for any reason! While this might be tolerable (even fun!) if technical support were quick and responsive, and programs rarely failed, for day-to-day usage it would be insane. Many failures are the result of simple problems -- printer out of paper, no cable plugged into network card, disk full -- problems that can be solved reliably by end users.
Many, many programs produce error messages that are just as hard to read. Even very verbose error messages can be uninformative if they don't define enough terms. I once saw a rash of complaints about a program that aborted with an error message along the lines of "INSIST(x>0)". This tells the user nothing.
Thus, the fourth rule of error messages: Describe the error intelligibly.
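An internal check like "INSIST(x>0)" can keep doing its job and still speak to the user. The sketch below is one way to do that in Python. I don't know what "x" was in the program I saw, so the "worker count" setting here is made up, as are the names UsageError, insist, and set_worker_count.

```python
class UsageError(Exception):
    """An error whose message is written for the end user."""


def insist(condition, message):
    """Like a bare assertion, but fails with a sentence the user can act on."""
    if not condition:
        raise UsageError(message)


def set_worker_count(count):
    # 'worker count' is a made-up setting, used only for illustration.
    insist(
        count > 0,
        f"The worker count must be at least 1, but the configuration "
        f"sets it to {count}. Edit the 'workers' setting and try again.",
    )
    return count
```

The check is the same one the programmer wanted. The difference is that the message names the thing being checked, the value that was found, and what to do about it.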
When errors can occur, it's critical to check for them. (You may even want to check for errors that "can't" occur.) When you do find an error, describe it to the user as clearly and intelligibly as possible. If you want to make sure that your support staff can tell which error is being referred to, assign codes or values to errors. Do not use subtle variations in phrasing to distinguish errors. Don't use codes alone -- provide them along with descriptions of what has happened. You can get a lot of mileage from something as simple as the UNIX standard "perror()", which emits messages such as:
fopen: Permission denied
read_file: No such file or directory
Don't be afraid to print two error messages if you think one won't communicate the problem clearly -- but remember that users may only read one or the other.
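One way to follow this advice is to keep a single catalog of errors, each with a stable code and one fixed description, and to build every message from it. The sketch below assumes such a catalog; the codes ("E1001" and so on), the wording, and the function name format_error are all hypothetical.

```python
# Hypothetical catalog: the codes and wording are illustrative only.
ERROR_CATALOG = {
    "E1001": "The file could not be found",
    "E1002": "The file could not be opened",
}


def format_error(code, detail, cause=None):
    """Combine a stable code, its one fixed description, and the specifics.

    cause, if given, is the OSError that triggered the problem.
    """
    text = f"[{code}] {ERROR_CATALOG[code]}: {detail}"
    if cause is not None:
        # Append the system's own description, perror-style.
        text += f" ({cause.strerror or cause})"
    return text


if __name__ == "__main__":
    print(format_error("E1001", "report.txt"))
    # prints: [E1001] The file could not be found: report.txt
```

Because each description lives in exactly one place, two modules can't drift into "File not found" and "Couldn't find file", and a user who paraphrases the text can still read the code off the screen.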
This week's action item: Introduce errors into the inputs for any programs or pages you maintain. Do the responses you get provide the information you need to fix the problems?