C: The Complete Nonsense

Revised for the 4th Edition

Last revision: February 8th, 2012

Introduction

Back in 1996, having heard horror stories about Herbert Schildt's C: The Complete Reference, I decided to check it out. I flipped the book open; I found glaring errors. I paged through it. I found more glaring errors. In short, the book had lived up to the hype; it was awful. Being a pedantic sort, and having just recently started having a web site, I wrote about it, in the previous version of this page.

Time passed. People forgot about it. A 4th edition of the book came out (back in 2000), but I didn't even know this. Until, more recently, I started hearing queries and concerns. My half-baked page, thrown together in an afternoon, had become the topic of Fearsome Disputes. People were arguing over whether it was any good. Worse, people were arguing that the flaws (and there are some, certainly) in my web page somehow showed that the book wasn't utter crap.

Well, it is. And the 4th edition, while it fixes some of the errors reported (including the infamous void main which Schildt long defended as not being an error), preserves some, introduces others, and continues to be an exceptionally bad book.

I am no longer some random guy who likes C. I spent about a decade on the C committee—and unlike Schildt, I actually showed up, submitted papers and proposals, worked to resolve defects, and otherwise contributed to the process. I am no longer the mildly autistic kid who had never really studied writing or communication; I'm now a mildly autistic adult with years of experience writing and communicating, including experience writing publishable material.

So if people want a better-written page on the topic, well, I'm willing to provide one. Readers may also appreciate Clive Feather's very helpful review of Schildt's The Annotated ANSI C Standard, which also illustrates that Schildt does not seem to have a good grasp on the C language.

A note about editions

The previous version of this page made no mention of editions, but was based on the 3rd edition of C:TCR. This page is based primarily on the 4th edition, but I have my copy of the 3rd edition (complete with my receipt, from January of 1996) around for comparisons; in some cases, the comparisons cast light on the origins or nature of some of the errors.

There are two reasons this matters. One is that I've been criticized for not pointing out which edition I was criticizing. (One prominent critic insisted that it was the first edition, despite the clear impossibility of the claim.) The second is that the criticisms you are about to read are not merely criticisms of someone's rough draft. They are criticisms of code (or writing) which may well have been revised two or three times.

Pick a page, any page

In comp.lang.c, some recent discussions of this led to an observation. Flip C:TCR open to a random page. You will probably find an error. This game is easier with the 2nd and 3rd editions, because of the prevalence of the incorrect void return type for main(), but even in the fourth edition, there's plenty of room for fun. I've sorted these in numerical order.

While some of these (pages 259, 264, 455, and 463) were picked by randomly flipping the book open, other pages were picked because I came across them while looking up Schildt's coverage of a particular issue.

Page 51

Here's a fairly typical example of the kind of thing you get wrong if you don't have much real experience with portability:

As bits are shifted off one end, zeroes are brought in the other end. (In the case of a signed, negative integer, a right shift will cause a 1 to be brought in so that the sign bit is preserved.)

Clear, lucid, but not actually true. What happens when you right-shift a signed, negative number is "implementation-defined" (C99, 6.5.7, paragraph 5); that is to say, the implementation has to document what happens. While the behavior Schildt describes is one of the common choices, other systems have been known to shift in zeroes. There may be other behaviors out there. This ties in somewhat with Schildt's comments about the use of two's complement representations, where the behavior he describes is common but not actually required.

Page 259

Here's what I found the first time I tried flipping the 4th edition book to a random page. This is page 259 in the 4th edition, or 251 in the 3rd edition. (The only difference between them is the declaration for main.)

The following program uses freopen() to redirect stdout to a file called OUTPUT:

#include <stdio.h>

int main(void)
{
  char str[80];
  
  freopen("OUTPUT", "w", stdout);
  
  printf("Enter a string: ");
  gets(str);
  printf(str);
  
  return 0;
}

In general, redirecting the standard streams by using freopen() is useful in special situations, such as debugging. However, performing disk I/O using redirected stdin and stdout is not as efficient as using functions like fread() and fwrite().

So, what's wrong with that, you ask? Here's a list. I do not promise that I caught everything.

  1. The printf() call does not send a newline, nor is the stream flushed. On many systems, doing this to the console would not display the prompt until some later point. (See Newlines and Output.)
  2. For that matter, the file was opened in text mode, not binary mode. In text mode, you may have to provide a newline at the end of your output; since it is permissible for an implementation to require this, you should always do it, especially in sample programs in a book purporting to teach C.
  3. The gets() function is dangerous, and should never be used; it has no provision for preventing input data from overrunning the buffer given to it. (This may seem "harmless", but a reference book should not illustrate such bad habits.) Schildt acknowledges the dangers elsewhere in the book, but uses it anyway. (Note that gets() is officially deprecated, as of Technical Corrigendum 3 to C99; it's not even present in the C201X drafts anymore.)
  4. The gets() function strips the newline from a line of input; however, printf() does not add one, so the line printed by printf() will be missing its newline.
  5. For that matter, it's incredibly dangerous to pass an arbitrary string as the first parameter to printf -- if the user happens to enter something like %s somewhere in the string, your program is likely to behave unexpectedly, or crash. So that should have been printf("%s", str) or more likely puts(str).
  6. No error check on freopen().
  7. The first item in this list was misleading. See, I implied that the prompt would have gone to the console, assuming that freopen() succeeded. Of course, it wouldn't; it would have gone to the just-opened file "OUTPUT" instead.
  8. The comment about using stdin and stdout being "less efficient" has no particular basis in reality. It's not even coherent; you can use fread() and fwrite() with those streams; the presented dichotomy between redirected standard streams and the other file I/O functions is not a real dichotomy. It's like saying that using int isn't as efficient as using + and -.

What this tells you is that this code was never actually tried; printing the prompt is incoherent after the freopen(), and the mismatch between input and output would have been caught by even casual testing. This is an atrocious example. This should never have been written, let alone made it past whatever presumed technical review might have happened.

Page 264

Another "let's just flip the book open and see what we get". This example also occurred in the 3rd edition on page 262, although it wasn't quite the same.

[...] In this way, if you need to change the size of the array, you will need to change only the #define statement and then recompile your program. For example:

#define MAX_SIZE 100
/* ... */
float balance[MAX_SIZE];
/* ... */
for(i=0; i<MAX_SIZE; i++) printf("%f", balance[i]);
/* ... */
for(i=0; i<MAX_SIZE; i++) x =+ balance[i];

The last line of this example is new in the fourth edition. So, what's wrong with this one?

  1. The printf loop has no newlines or spaces, so all the numbers would be run together. Not a huge problem with the code as such, but certainly a shoddy bit of work.
  2. Usually, when the size of an array is called MAX_SIZE, that implies that the actual size may well be some smaller value. This is a nitpick; we could reasonably assume that the implication is that the whole array has been initialized.
  3. There hasn't been an =+ operator in C since the 1970s.

You might think the =+ wouldn't compile, but in fact, it will. C89 standardized the "unary +" operator, which exists only for symmetry with a leading - used on negative numbers. Thus, this is equivalent to x = +balance[i] which is in turn equivalent to x = balance[i], so the last loop is precisely equivalent to the non-loop statement x = balance[MAX_SIZE - 1]; (at least, assuming that x isn't volatile...). Oops.

Again, this kind of stuff should never have made it past any kind of review.

Page 264, again.

While looking for the previous example in the 3rd edition, I happened to look at the example after it. This is on pages 264-265 of the 4th edition, and pages 262-263 of the 3rd edition.

This form of a macro is called a function-like macro. For example:

#include <stdio.h>

#define ABS(a)  (a) < 0 ? -(a) : (a)

int main(void)
{
  printf("abs of -1 and 1: %d %d", ABS(-1), ABS(1));
  
  return 0;
}

When this program is compiled, a in the macro definition will be substituted with the values -1 and 1. The parentheses that enclose a ensure proper substitution in all cases. For example, if the parentheses around a were removed, this expression

ABS(10-20)

would be converted to

10-20 < 0 ? -10-20 : 10-20

after macro replacement and would yield the wrong result.

Ahh, Mr. Schildt. So close, and yet, so far. Here's a little thought experiment:

printf("ABS(-3) - 1: %d\n", ABS(-3) - 1);

See how that works? (It prints 3, not 2.) Schildt forgot the most important part of parenthesizing a function-like macro; you must parenthesize the entire definition. He had a great opportunity here to cover the reasons for both parentheses around the entire definition, and parentheses around each individual macro argument. He missed it, instead claiming that the partial solution worked correctly "in all cases", which it does not.

He also had a great opportunity to point out that ABS(x++) would be inadvisable, as it would increment x twice. Again, this is stuff which someone explaining macros really ought to cover, and it gets ignored.

Page 314

In the 3rd edition (page 298), the description of fflush() was just plain wrong; it made the claim that fflush() on an input stream flushed buffers. (There exist systems where this is true, but it's not a language feature.) In the 4th edition, it's perhaps been improved some:

If stream is associated with a file opened for writing, a call to fflush() causes the contents of the output buffer to be physically written to the file. The file remains open.

There are a couple of questionable bits here, although they're arcane enough that you might reasonably call them nitpicks.

  1. Streams in C can be opened for reading, writing, or "update"—both reading and writing. You can safely use fflush() on an update stream only if the most recent operation on it was not a read (writes or seeks, for instance, are allowed). Schildt misses that distinction, although I simply can't tell whether he fails to point out that you can sometimes flush an update stream, or fails to point out that you sometimes can't flush an update stream.
  2. More perniciously, it's worth pointing out that fflush() does not cause the contents of the output buffer to be physically written to the file. It causes the contents of the output buffer to be delivered to the host operating environment, which may well have additional layers of buffering. On most modern systems, this could result in the file living in an operating-system level write cache for seconds or minutes, or for that matter, living in a cache on the disk drive's internal microcontroller for some time. (That "16MB cache!" you see advertised on hard drive packaging is, in many cases, not something the C implementation can force to be flushed out to the physical platters.)

That second point may seem like it's picky and trivial, but think about the implications for a moment. Say that you're writing financial software, or something comparable, where it is extremely important that you be absolutely sure what has or hasn't been physically stored and will survive, say, a power outage. Wouldn't it be nice to know that, if you want to be sure something's been physically written to the disk, you need to find out about some operating-system specific feature?

NEW edit: der Mouse helpfully pointed out that I'd missed a more severe error. There's an additional problem. The standard's language refers to "an output stream". Schildt refers to a stream "associated with a file open for writing". This creates a subtle error; while there's no simple way to do this within standard C, there are ways to create a stream which is open for writing on a file which is opened read-only. (Unix users will guess that fdopen() is involved.) It is perhaps relevant to note that the writing/reading distinction that fflush() cares about is at the stream level, not the file level.

But enough with the actual description of fflush(); now let's have a look at the sample program:

/*
  Assume that fp is associated with an output file.
*/

for(i=0; i<MAX; i++) {
  fwrite(buf, sizeof(some_type), 1, fp);
  fflush(fp);
}

It's hard to guess the intent here. This writes the same value to fp, doing so MAX times. Normally, you'd expect a loop to write each member of an array (something like &buf[i]). But for that matter, normally, you'd write this with fwrite(buf, sizeof(some_type), MAX, fp), since that's the entire reason fwrite() takes both a size per object and a number of objects.

There do exist cases where you might want to flush output after each object; explaining why it might matter would have been useful. So would explaining why it might be extremely slow and inefficient.

It's a slight improvement over the third edition; it's still not a good example.

Page 455

This is the same as material on page 491 of the 3rd edition.

The mblen() function returns the length (in bytes) of a multibyte character pointed to by str. Only the first size number of characters are examined. It returns -1 on error.

[...]

This statement displays the length of the multibyte character pointed to by mb.

printf("%d", mblen(mb, 2));

Short, simple, to the point, and rich with errors.

  1. First, the purely grammatical one; "the first size number of characters" should just be "the first size characters". Would you write "the first two number of characters"? No? Apparently Schildt would.
  2. Secondly, the entire point of functions like mblen() is to distinguish between bytes and characters. The entire point is that the string being looked at contains a single character, which is a multibyte character. So in fact, mblen() examines at most the first size bytes.
  3. Furthermore, in some character encodings, such as UTF-8, it is possible for a single character to contain more than two bytes. In such a case, mblen(mb, 2) will not display the length of the multibyte character, because it won't examine more than 2 bytes.

Impressive, no? In just a couple of sentences, Schildt manages to completely reverse the entire point of the distinction between multibyte characters and bytes, and offer an example which doesn't really do what he says it does.

Page 463

This is the description of strtol(). Surprisingly, the code isn't more wrong than usual (it has the usual use of gets(), prompts with no newline or flush, and so on). The text, however, is especially bad.

strtol

#include <stdlib.h>
long int strtol(const char *start, char **end, int radix);

The strtol() function converts the string representation of a number stored in the string pointed to by start into a long int and returns the result. [... more description, which is mostly okay ...]

The strtol() function works as follows. [... more snipped ...] Finally, end is set to point to the remainder, if any, of the original string. This means that if strtol() is called with " 100 Pliers", the value 100L will be returned, and end will point to the space that precedes "Pliers".

Whoops. For those who missed it, Schildt just mistook a pointer for the thing pointed to. The value end (called, more correctly, endptr in the standard's description) is not set to point to anything. Rather, if the value endptr is not a null pointer, then the pointer it points to is set to point to the remainder of the string. That's a pretty significant difference! The difference between a pointer, and the thing it points to, is not something you should casually overlook, whether in production code (where it could be fatal) or in a tutorial or reference (where it could lead to readers being confused).

For extra credit, I also direct your attention to the description on page 465 of strtoll(), which states:

The strtoll() function is similar to strtol() except that it returns a long long int. If the result cannot be represented by a long integer, LLONG_MAX or LLONG_MIN is returned, and the global errno is set to ERANGE, indicating a range error.

Subtle, perhaps, but I'd think it'd be worth noting that this should be "... cannot be represented by a long long integer", or better yet, "cannot be represented by a long long int". Just a copy and paste error, maybe, but more of the sort of shoddy thing we've come to expect. For reference, the equivalent sentence from the description of strtol() (page 464) is:

If the result cannot be represented by a long int, LONG_MAX or LONG_MIN is returned and the global errno is set to ERANGE, indicating a range error.

The usage of "long int", contrasted with "long integer", could be argued to somehow indicate a colloquial use for "long integer" that was not intended to refer specifically to the long int type. The same issue occurs with strtoull(), with the phrase "unsigned long integer".

The third edition book used the phrase "if a conversion error occurs", which I think is definitely worse. However, the usage here seems inconsistent, and a typical reader, looking only at the strtoll() description, is likely to view this as an error. If you have to compare with three other functions and a previous edition to conclude that something may well have been intended such that you could interpret it as being correct, it needs to be rewritten.

An in-depth look

One of the concerns with a listing like the "pick a page" list is that you can't be sure that it's particularly representative. To address this, I asked some random guy on Usenet to pick a number between 1 and 700, proposing to review the ten consecutive pages starting with that one. He picked page 168.

Page 168 starts partway through the "Function Prototypes" section of Chapter 6 (Functions). The section starts on page 166. This section is far from the worst in the book, and the complaints I have about it are largely nitpicks, but they're illustrative of the sheer density of errors, questionable choices, or poor explanations.

Page 168 begins with a description of the option of omitting parameters in function prototypes. It continues:

Function prototypes help you trap bugs before they occur. In addition, they help verify that your program is working correctly by not allowing functions to be called with mismatched arguments.

[...] Remember, although prototypes are optional in C, they are required by C++. This means that every function in a C++ program must be fully prototyped. Because of this, most C programmers also fully prototype their programs.

What the first paragraph giveth, the second paragraph taketh away. Most (experienced) C programmers fully prototype their programs because prototypes improve reliability (and in some cases efficiency). Compatibility with another programming language is not one of the primary motivators. Schildt here gives the impression that, if you were sure you did not need compatibility with C++, perhaps you wouldn't need to pay attention to the previous suggestion that you use prototypes.

We now move on to the discussion of "Old-Style Function Declarations" (pages 168-169).

In the early days of C, prior to the creation of function prototypes, there was still a need to tell the compiler in advance about the return type of a function so that the proper code could be generated when the function was called. (Since sizes of different data types differ, the size of the return type needs to be known prior to a call to a function.) [...]

Nice try, but no. The sizes are not the whole issue; the real point is that calling conventions may vary. There are systems where floating point numbers and integers of the same size use different calling conventions. For instance, on some systems, floating point return values go in a floating-point register, rather than a general-purpose register (or the stack). Schildt is hamstrung here by his assumption that all the world uses a stack for all function parameters and returns.

The example program for this isn't immediately obviously awful:

#include <stdio.h>

  double div(); /* old-style function declaration */

  int main(void)
  {
    printf("%f", div(10.2, 20.0));

    return 0;
  }

  double div(double num, double denom)
  {
    return num / denom;
  }

There are a few issues here. Of course, we have the standard newline issue; Schildt relies on the assumption that the environment doesn't require a trailing newline on output.

However, there are more serious issues. The most obvious is that the name div() is reserved for use by the implementation, which provides a function div() already; Schildt documents this function on page 448 (listed as a "utility" function rather than a "mathematical" function, possibly because the standard uses that description for <stdlib.h>). The names of standard library functions are always reserved for use as identifiers with external linkage (see C99, 7.1.3). What that means is that you cannot declare a function named div() of your own, especially not one with an incompatible type!

Past that, there's a more fundamental issue. Schildt has chosen a case in which the old-style declaration is compatible with the new-style definition. Not all definitions would work that way! For instance, if div() had been declared with arguments of type float, this program would be invalid. As gcc so eloquently puts it, "an argument type that has a default promotion can't match an empty parameter name list declaration". Schildt's explanation is useless:

This old-style function type declaration tells the compiler that div() returns a value of type double. This allows the compiler to correctly generate code for calls to div(). It does not, however, say anything about the parameters to div().

Useless, but worse than that, wrong. The old-style function declaration tells the compiler that the parameters to div() should be subject to a set of promotion rules called the "default argument promotions" (C99 6.5.2.2, paragraph 6). One of these, for instance, is that objects of type float would be "promoted" to double. Thus, a declaration float mydiv(); would not be compatible with the definition float mydiv(float num, float denom);.

Schildt's focus on the "return type" has blinded him to the fact that, in modern C (since 1989), prototyped functions have often had different conventions for passing parameters, so the types of parameters are not merely a useful convenience, but a necessary piece of information for calling functions.

He also neglects to mention the new rules for variadic functions (like printf()), which cannot be safely called without a full prototype in scope. This is something that could easily affect a reader who has to port or modernize some old code; it should have been mentioned.

On to "Standard Library Function Prototypes" (page 169).

Any standard library function used by your program must be prototyped. To accomplish this, you must include the appropriate header for each library function. All necessary headers are provided by the C compiler. In C, the library headers are (usually) files that use the .h extension.

He gets credit for the "usually". Unfortunately, the rest isn't so good. First, it is not always necessary to prototype these functions. For functions compatible with the default argument promotions, an old-style declaration would suffice. Furthermore, you are not strictly required to include the headers; if you provide a correct external declaration, that's okay too. These are arguably nitpicks, largely because while he's not right that you must include the headers to get these prototypes, it is certainly the case that you should. Still, a reference should be precise.

On to the coverage of "implicit int" (page 170).

The most common use of the implicit int rule was in the return type of functions. Years ago, many (probably most) C programmers took advantage of the rule when creating functions that returned an int result. Thus, years ago a function such as

int f(void) {
  /* ... */
  return 0;
}

would often have been written like this:

f(void) { /* return type int by default */
  /* ... */
  return 0;
}

Not awful, but I think much more common would have been f() {...}; the void type was not added to the language until late enough that I didn't see much use of implicit int in programs that used void. Even back when implicit int was a normal part of the language, many style guides and coding standards mandated the use of explicit types.

He continues:

Remember, the implicit int rule is not supported by C99 or C++. Thus, its use in C89-compatible programs is not recommended. It is best to explicitly specify every type used by your program.

Like the previous example about prototypes, the reason given is superfluous. It was enough, for many years, to observe that implicit int was error-prone; compatibility with newer standards or other languages is not the only reason to avoid the implicit int rule. When the C99 committee decided to remove implicit int, a key component of the discussion was that every implementation anyone could think of already warned people about any use of it, so people were already long out of the habit of using it, before the feature was removed.

On to "Old-Style vs. Modern Function Parameter Declarations" (pages 171-172).

For example, this modern declaration

float f(int a, int b, char ch)
{
  /* ... */
}

will look like this in its old-style form:

float f(a, b, ch)
int a, b;
char ch;
{
  /* ... */
}

Notice that the old-style form allows the declaration of more than one parameter in a list after the type name.

Another opportunity to discuss the default argument promotions, and why these two declarations are actually different, passed up. Nothing exceptionally awful, just sort of clumsy.

And now, the end of the chapter, "The inline Keyword" (page 172).

C99 has added the keyword inline, which applies to functions. It is described fully in Part Two, but a brief description is given here. By preceding a function declaration with inline, you are telling the compiler to optimize calls to the function. Typically, this means that the function's code will be expanded in line, rather than called. However, inline is only a request to the compiler, and can be ignored.

I would have thought a quick mention of some of the implications of inline would have been useful here; maybe just a quick mention that the semantics of inline functions are different, or that, as an example, "An inline definition of a function with external linkage shall not contain a definition of a modifiable object with static storage duration, and shall not contain a reference to an identifier with internal linkage." (C99, 6.7.4, paragraph 3.)

Of course, the clever reader will note that Schildt forward-references a future discussion in which inline is "described fully". That's on pages 282-283. No mention of the various restrictions on inline functions occurs there either; this is a serious liability, as many of them are not immediately obvious to a novice programmer who has tried to declare something inline to "speed it up". Schildt instead talks about arguments being pushed onto the stack and gives some very vague, general, introductory-level advice about the use of inline. A detailed description of the problems is beyond the scope of this section (as it's outside the pages I'm looking at); the point is that "described fully" does not, in fact, mean that the crucial and significant limitations that inline imposes on a function are discussed, and that the statement that inline "can be ignored" is incorrect. There exist functions which require the compiler to emit a diagnostic message if you try to declare them inline.

Page 173 is the title page of Chapter 7. I found no errors on it.

Chapter 7 is on "Structures, Unions, Enumerations, and typedef". It begins with an introduction:

The C language gives you five ways to create a custom data type:

Well, where to start. Just leaping out at me: The term aggregate data type is used to refer both to structures and to arrays. The standard describes arrays and function declarations, along with structures, unions, and enumerations, as derived types. I have never heard the term conglomerate used to refer to structures in C. Structures do not allow the grouping of "variables"; they create a new type of which each instance is an aggregate of several objects (of potentially disparate types, which is why a structure is not just a fancy name for an array). A "variable" is the thing you declare. Grouping variables would be if you could take several separate declarations and then group those things together somehow; you can't. Similarly, unions allow you to access the same hunk of memory as two or more different types of objects, not variables. Schildt's confusion about objects and variables is not new to this chapter, but it really is shown off here.

Pedant Break! In fact, the C standard never defines the noun "variable"; the word is used almost always as an adjective. The word "variables" is used in precisely three places, two of them non-normative footnotes. The phrase "a variable" occurs more often, but nearly always as a modifier, as in "a variable length array type". It occurs in footnote 171 (7.3.9.5) in the context "For a variable z of complex type...". However, on the rare occasions when it is used, it seems to be consistently used in the sense that an object which is declared with a given name is a "variable". When you declare a structure, its members are not objects which have been declared with a given name; the type declaration of the struct gives them names, but the actual declaration of the object gives a name only to the structure as a whole. For an illustrative example, consider int *ip = malloc(sizeof(*ip));. After this declaration, assuming malloc() succeeds, there are two objects, but only one variable. The variable ip is an object of type pointer-to-int, and it contains a pointer to an unnamed object with allocated storage duration, which is of a size (and alignment) suitable for holding an object of type int. However, that object is not usually called a "variable".

So, let's look at the description of structures, starting on page 174.

A structure is a collection of variables referenced under one name, providing a convenient means of keeping related information together. [...]

Same issue as before. Anyway, let's carry on.

[...] The following code fragment shows how to declare a structure that defines the name and address fields. The keyword struct tells the compiler that a structure is being declared.

struct addr
{
  char name[30];
  char street[40];
  char city[20];
  char state[3];
  unsigned long int zip;
};

Notice that the declaration is terminated by a semicolon. This is because a structure declaration is a statement. Also, the structure tag addr identifies this particular data structure and is its type specifier.

Schildt gets credit here for a couple of things. He's provided 3 characters for state, to allow for null termination, and he's chosen a type large enough to hold standard US zip codes. Assuming you're in the US, and don't need a second line in your street address, this is pretty good. I might quibble about "the name and address fields" (emphasis mine), especially because no field named address is defined, but we can generalize to "the fields to contain an address".

The explanation at the end, though, is wrong. A declaration is not a statement (in C; C++ describes declarations as a kind of a statement). Statements can occur only inside compound statements (pairs of {}, possibly with stuff inside them), or as part of a function's definition. Declarations can occur outside of any function. The reason declarations also take semicolons is one of more general consistency. Furthermore, the structure tag is not a type specifier; the type specifier is struct addr, not just addr. (Another thing which would perhaps have been correct had this been a book about C++.)
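
Both points can be sketched in a few lines. This uses a cut-down version of struct addr; the names Addr, home, office, and set_zip are mine, invented purely for illustration.

```c
struct addr {
  char name[30];
  unsigned long int zip;
};

/* A declaration at file scope: legal, and not a statement. */
struct addr home;        /* "struct addr" is the type specifier */

/* addr office; */       /* error in C: "addr" alone names no type */

/* A typedef is what makes a one-word type name available in C. */
typedef struct addr Addr;
Addr office;

/* Statements, by contrast, can only appear inside a function body. */
unsigned long set_zip(unsigned long z)
{
  home.zip = z;          /* an expression statement */
  office = home;         /* another one */
  return office.zip;
}
```

Uncommenting the addr office; line should produce a diagnostic from a C compiler, while a C++ compiler would accept it.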

On to the description on page 175.

Figure 7-1 shows how addr_info appears in memory, assuming 4-byte long integers.

Well, not really. It shows a bunch of boxes of various sizes. He doesn't illustrate any possible padding, or mention padding at all. That would have been nice, especially because the structure in question would indeed have padding on many systems (as it has a 4-byte object at an offset which is not a multiple of 4, assuming no padding between previous components). The index has no reference to "padding", either as its own topic or as a subtopic of "structures". So far as I can tell, he simply never mentions that objects in a structure are not necessarily adjacent! This is a major oversight of something that readers are likely to need to know.
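
For what it's worth, checking for padding takes very little code. The sketch below reuses the structure from the book; the helper names addr_zip_offset and addr_padding_bytes are my own, and the numbers they return are implementation-specific.

```c
#include <stddef.h>  /* offsetof, size_t */

struct addr {
  char name[30];
  char street[40];
  char city[20];
  char state[3];
  unsigned long int zip;
};

/* Offset of zip from the start of the structure. The four char
   arrays occupy 93 bytes, so any value above 93 means padding
   was inserted before zip. */
size_t addr_zip_offset(void)
{
  return offsetof(struct addr, zip);
}

/* Total padding: size of the whole structure minus the sizes of
   its members. */
size_t addr_padding_bytes(void)
{
  return sizeof(struct addr)
       - (30 + 40 + 20 + 3 + sizeof(unsigned long int));
}
```

On an implementation where unsigned long needs 4-byte or 8-byte alignment, I would expect addr_zip_offset() to be 96 rather than 93, but nothing in the standard promises any particular value.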

Page 176 continues the illustration of struct.

The general form of a structure declaration is

struct tag {
  type member-name;
  type member-name;
  type member-name;
  .
  .
  .
} structure-variables;

where either tag or structure-variables may be omitted, but not both.

This isn't horribly wrong, but it isn't much good. You could nitpick that the declaration char name[30] does not match this style; after all, there's more stuff after the member name but before the semicolon. For that matter, int *ptr; isn't actually of that form; the * is part of the individual declarator, not part of the type specifier (see C99, 6.7 and 6.7.5). Still, it's a sort of acceptable approximation. However, he neglects to mention that you can declare multiple members of a single type in a single declaration within a structure. Furthermore, it is permissible (according to the grammar) to declare a struct with neither tag nor structure-variables, although doing so is essentially useless.

(Pedantic break: I previously claimed he was wrong to call this a declaration, rather than a definition, but it turns out I was wrong on that; the word "definition" has a fairly specific definition in the C standard, and structure types are not defined, only declared. Only macros, objects, functions, enumeration constants, and typedef names can be "defined".)

The real issue here is simply that this isn't the "general form". It's an oversimplified summary of some common use cases, offered right where he should have been explaining the options available in more depth. A better "general form" might be something like:

struct tag {
  declarations;
} variable-names;

This could then be followed by examples of the sorts of declarations possible.
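
As a rough idea of what such examples might look like, here is a sketch of my own (struct sample and sample_member_demo are hypothetical names, not anything from the book) showing several kinds of member declaration in one structure.

```c
struct sample {
  int x, y;                   /* two members from one declaration */
  char name[30];              /* array declarator */
  int *ptr;                   /* pointer declarator; the * belongs to ptr */
  double *p, d;               /* p is a pointer to double, d is a double */
  unsigned int flag : 1;      /* bit-field */
  struct { int a, b; } inner; /* nested structure with no tag */
};

/* Touch a few of the members to show how each is accessed. */
int sample_member_demo(void)
{
  struct sample s = {0};

  s.x = 1;
  s.y = 2;
  s.d = 0.5;
  s.p = &s.d;                 /* a member may point at another member */
  s.flag = 1;
  s.inner.a = s.x + s.y;
  return s.inner.a;
}
```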

Enough about how structures are declared; let's look at the discussion of how to use structure members; this starts on page 176, but only gets really interesting on page 177:

In the same fashion, the character array addr_info.name can be used in a call to gets(), as shown here:

gets(addr_info.name);

This passes a character pointer to the start of name.

Since name is a character array, you can access the individual characters of addr_info.name by indexing name. For example you can print the contents of addr_info.name one character at a time by using the following code:

for(t=0; addr_info.name[t]; ++t)
  putchar(addr_info.name[t]);

Use of gets() is, as always, bad; this would be a great opportunity to point out the possibility of buffer overruns. It might be worth saying "pointer to the start of the name member of addr_info."—this would reinforce the point he previously made about each structure variable having its own "copies" of the members.

The example of printing character-by-character would be much improved by pointing out the need to, say, in some way verify that you haven't gone past the end of the object. Of course, that's omitted; Schildt just assumes that the array will be null-terminated.
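
Neither fix is difficult. The following is a minimal sketch, assuming the input arrives one line at a time; read_field and print_bounded are names I made up, and the choice to strip the newline is mine rather than anything the book's example implies.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf without overrunning it; fgets() stores at
   most size - 1 characters plus a terminating null. Returns 1 on
   success, 0 on end-of-file or error. */
int read_field(char *buf, int size, FILE *in)
{
  if (fgets(buf, size, in) == NULL)
    return 0;
  buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if any */
  return 1;
}

/* Print at most size characters of buf, stopping early at a null
   character. Returns the number of characters written. */
size_t print_bounded(const char *buf, size_t size, FILE *out)
{
  size_t t;

  for (t = 0; t < size && buf[t] != '\0'; ++t)
    putc(buf[t], out);
  return t;
}
```

With the book's structure, the calls would be read_field(addr_info.name, sizeof addr_info.name, stdin) and print_bounded(addr_info.name, sizeof addr_info.name, stdout). The second stays inside the array even if the terminator is missing.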

We get one last code example on page 177:

#include <stdio.h>

int main(void)
{
  struct {
    int a;
    int b;
  } x, y;

  x.a = 10;

  y = x;  /* assign one structure to another */

  printf("%d", y.a);

  return 0;
}

After the assignment, y.a will contain the value 10.

This one is questionable. According to C99 (6.2.6.1, paragraph 6), "the value of a structure or union object is never a trap representation, even though the value of a member of the structure or union may be a trap representation". I am pretty sure the intent is that you are allowed to assign a structure to another structure even if one of its members is uninitialized, but I would in general not recommend relying on this. (This wording is new, and may not have been in the C standard when Schildt's book was written, but I think it reflects the intent of the standard.) On the other hand, even if it's permissible, I would think it would make a better example to show that assignment copies all of the members.

A book that aimed to teach everything you need to know about C, rather than a superficial subset, would also point out the issues with copying structures some of whose members are pointers—many novice programmers are surprised that, after an assignment, both structures then share a single pointed-to object.
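
The surprise is easy to demonstrate. This is a small sketch of my own, with a hypothetical struct person, showing that assignment copies the pointer but not the thing it points to.

```c
#include <string.h>

struct person {
  char *name;   /* points at storage owned elsewhere */
  int age;
};

/* Returns 1 if a change made through the copy's pointer is visible
   through the original, while the plain int member stays separate. */
int copy_shares_name(void)
{
  char buffer[] = "Alice";
  struct person a, b;

  a.name = buffer;
  a.age = 30;

  b = a;               /* copies the pointer, not the characters */
  b.name[0] = 'E';     /* modifies the one shared array */
  b.age = 31;          /* age is a distinct object in each structure */

  return strcmp(a.name, "Elice") == 0 && a.age == 30;
}
```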

So there you have it. An arbitrarily-selected ten-page hunk of C:TCR. I would not recommend this book to anyone based on this sample. This section has crucial omissions, misses multiple opportunities to explain crucial things, has examples which work by lucky coincidence (and avoid teaching the reader about real-world cases where a similar example wouldn't work), and contains a number of outright errors. This goes a long way, in my mind, towards establishing that the sorts of complaints leveled against the rest of the book are genuinely typical and representative. The list of miscellaneous pages given above is not a complete or exhaustive list; it is merely a representative sample of what you find opening the book to a random page. Searching through an arbitrarily-selected section doesn't support the occasional claims that most of the book is just fine, and only a few examples are flawed.

Higher level problems

Of course, it can't all be fun and games. There are a number of cases where Schildt makes more fundamental errors; not individual failed code fragments, but a consistent and general mis-explanation of fundamental concepts. I hardly have a complete list available, but here are the ones I spotted easily during a quick review.

global, static, automatic

Schildt very helpfully decides to use terms like "global" and "local" rather than the formal language of the standard. This, it turns out, results in some amazingly bad material. No one is surprised.

On page 26, he tells us:

Unlike local variables, global variables are known throughout the program and may be used by any piece of code. Also, they will hold their value throughout the program's execution.

He's conflating two aspects of file-scope variables; their scope, and their storage duration. This doesn't seem like it'd be a big deal, until he starts talking about things that would affect storage duration or scope. Such as, say, the static keyword.

On pages 33-34, things get confusing, in a way that follows directly from his poor (or at least, incorrect for C) definitions of "local" and "global". The C storage-class specifier static is, well, confusing. It's confusing because its semantics are sort of non-obvious and differ with context. Unfortunately, Schildt only makes it worse. He uses the term "local variable" to refer to what C would call an "automatic" variable, normally. Then he explains that, if you declare it with static, a local variable has permanent storage reserved for it. Great. Then we get to the incoherent heading static Global Variables (page 34):

Applying the specifier static to a global variable instructs the compiler to create a global variable known only to the file in which it is declared.

Come and see the rare and amazing wild oxymoron in its native habitat! Bring your kids! It's a global variable which isn't visible anywhere else.

The problem here is that he should have stuck with the way the language is actually defined—distinguishing between storage duration, scope, and linkage. That would make it easy to explain that, for variables declared in a restricted scope, static imposes the "static storage duration", and that for variables declared outside of any function, which necessarily have static storage duration, static restricts linkage. It's confusing, sure. The weirdness of the meaning of static has troubled writers trying to explain C for a long time. However, that's no excuse for nonsense such as a "global variable known only to the file in which it is declared."
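
To make the three properties concrete, here is a sketch of my own (every identifier in it is hypothetical) in which each declaration is labeled with its scope, linkage, and storage duration.

```c
/* File scope, external linkage, static storage duration: another
   translation unit can refer to this with "extern int shared_total;". */
int shared_total = 0;

/* File scope, internal linkage, static storage duration: the same
   lifetime as above, but invisible to other translation units. */
static int file_private_total = 0;

int next_id(void)
{
  /* Block scope, no linkage, static storage duration: the name is
     usable only inside next_id(), but the object persists between
     calls. */
  static int last_id = 0;

  /* Block scope, no linkage, automatic storage duration: a fresh
     object on every call. */
  int step = 1;

  last_id += step;
  file_private_total += step;
  shared_total += step;
  return last_id;
}

/* The only way other files can see file_private_total's value. */
int private_total(void)
{
  return file_private_total;
}
```

Note that static does two different jobs here: on last_id it changes the storage duration, and on file_private_total it changes the linkage.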

("Linkage", you ask? Scope is whether something is defined only inside a particular function or block, or visible to all functions. Linkage is whether it's visible only within a given translation unit or to the whole program. Thanks to an eagle-eyed reader for spotting that I'd mistakenly called it "scope".)

More generally, the confusion results in additional issues elsewhere; for instance, on page 27-28, Schildt claims that:

File scope identifiers are visible throughout the entire file. Variables that have file scope are global.

No, they're not. They're visible within that file, and could be linked to from another file that provided a declaration for them. This is the kind of oversimplification that results in people being unable to figure out why their code doesn't compile.

For another example, describing the C99 variable-length array feature, he says that VLAs can only be local variables. No, they can only be automatic variables; you can't declare a static local VLA.

Memory layout

This one has been a bit controversial, just because, back when the 2nd and 3rd editions of C:TCR came out, a large number of users would have been on MS-DOS systems, on which the description given was moderately accurate. However, even in the 3rd edition, the book proudly claims to be usable in any environment, not just DOS or Windows.

The fun starts on page 13:

A compiled C program creates and uses four logically distinct regions of memory. The first region is the memory that actually holds the program's executable code. The next region is memory where global variables are stored. The remaining two regions are the stack and the heap. The stack is used for a great many things while your program executes. It holds the return addresses of function calls, arguments to functions, and local variables. It will also save the current state of the CPU. The heap is a region of free memory that your program can use via C's dynamic memory allocation functions.

Although the exact physical layout of each of the four regions of memory differs among CPU types and C implementations, the diagram in Figure 1-2 shows conceptually how your C programs appear in memory.

He shows, of course, the classic DOS memory map; program code at the bottom, global variables above that, heap above that growing up, stack at the top growing down. That's not an unheard-of implementation, but by the time this book came out (2000 or so), it was actually sort of unusual; most operating systems I know of separate stack and heap completely, such that they can't possibly interact.

I can't tell you what he means by "save the current state of the CPU". Maybe he's talking about register saving for task switching in multitasking environments? I'm not sure that's always done on the stack. It's vague enough that I can't tell you whether it's wrong or irrelevant, though.

In the third edition, there was a section on "Stack-Heap Collisions" (page 743). In the fourth edition, it's been corrected to a somewhat improved piece about stack overruns. However, it's still got issues.

All C compilers use the stack to store local variables, return addresses, and parameters passed to functions. [...]

This is a plausible first approximation. It should not be in Chapter 28, purporting to cover "Efficiency, Porting, and Debugging", and clearly well past the "tutorial" section of the material.

Quite simply, not every compiler even has a "stack". Some systems don't really have any such feature. Every compiler for C has some kind of mechanism for handling function calls, but that doesn't mean it's a stack. More significantly, it is quite common for function parameters or local variables not to be stored on any "stack", but to be stored in CPU registers. That distinction can matter a lot, and should have been covered, rather than hand-waved away.

Several people have pointed out that, conceptually, any design sufficiently powerful to support recursion will necessarily have a data structure which is "a stack" in a more general computer science sense; a data structure which allows efficient last-in, first-out access, so the most recently stored data can be retrieved fastest, corresponding nicely to the way in which function arguments and return values would typically be used. That's true. However, there is a difference between "a stack" and "the stack"; "a stack" could be a general computer science term, but would specify nothing about the representation or layout of the data stored. Schildt refers, quite clearly, to a contiguous region of memory, in which each function's data is adjacent to the memory for the function calling it, or the functions it calls. This is not necessarily true, nor is it useful. Some systems have handled recursive function calls by dynamic allocation, meaning that memory used for function calls could later be reused by other parts of the program, and could be interspersed with dynamically allocated memory.

The issue here isn't that most readers are likely to work on such a system (I haven't, myself, but I have been told that some mainframe systems use such designs); it's that the "stack" concept, while it could be a useful analogy, is better used as an analogy to explain the effects of function calls, than offered as a flatly literal explanation of how things work. A good teaching book ought to explain in some detail what the implications of its description of this are; Schildt basically ignores them, although at least he's removed the parts where he described them and they were just plain wrong.

This is a case where I do think the fourth edition has improved noticeably from the third edition, but it's still not a good fit at all for the material. It is also a case where a specific complaint from my previous page has been addressed largely by removing things or just avoiding a difficult or interesting problem.

The claim about global variables is mostly just confused. Really, it's usually static variables (including globals, but also including static variables declared in functions) that get a block of space, and in some cases, there's a distinction between the block for variables initialized with non-zero values, and those initialized with zeroes. Similarly, "the heap" is not necessarily a single region.

It would have been better to cover this in terms of C's actual storage rules, talking about static, automatic, and allocated storage durations.

Newlines and Output

Schildt frequently omits trailing newlines on output. This is a common habit among DOS programmers, as the MS-DOS command prompt apparently used to start with a newline, so that programs which had already produced a newline were followed by a blank line.

However, it's not portable, in two ways. The first is that, in general, C does not define the behavior of output to a text stream which is never terminated by a newline. ("Whether the last line requires a terminating new-line character is implementation-defined."; C99, 7.19.2, paragraph 2.)

The second is that streams may be buffered: output sent to a stream may not be delivered to the host environment until flushed (using fflush()) or until some other event occurs. There are three levels of buffering: unbuffered (self-explanatory), line-buffered (data are buffered until a new-line character is encountered), and fully buffered (data are buffered until a block of a certain size is filled, such as 8,192 bytes). The standard input and output streams are very often line-buffered, although the standard does not require this; all it says is that standard input and standard output are "fully buffered if and only if the stream can be determined not to refer to an interactive device" (C99, 7.19.3, paragraph 7).

What this means is that, on many systems, printing a prompt which does not end with a newline produces no visible output until you either explicitly flush the stream or send something ending in a newline. (There may be other circumstances under which output is sent, but only those are guaranteed to deliver the output to the host environment.) One common exception is that some systems will automatically flush output streams when waiting for input; on such systems, the prompt would actually get displayed. This behavior is strongly recommended by C99. We could imagine that Schildt was aware of this, and relying on it, but the examples in the 3rd edition book did the same thing, back when it resulted in no prompt being displayed on most systems. (See C99, 7.19.3, paragraph 3, and 5.1.2.3, paragraph 5.)
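
The portable way to get a prompt without a trailing newline onto the screen is to flush explicitly. A minimal sketch, with show_prompt being a name I invented:

```c
#include <stdio.h>

/* Write a prompt with no trailing newline, then flush so that it
   reaches the host environment before any input is requested.
   Returns 0 on success, EOF on failure. */
int show_prompt(const char *text, FILE *out)
{
  if (fputs(text, out) == EOF)
    return EOF;
  return fflush(out);
}
```

A call such as show_prompt("Enter a number: ", stdout) before reading input does not depend on the implementation flushing stdout when it waits for input.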

This may seem fairly trivial, but in many cases the net result is that a newbie programmer trying one of the sample programs would, instead of getting a prompt, see no output at all. This kind of thing sometimes surprises users. It could easily have been avoided. Schildt does sometimes attach newlines to output; he's just not consistent about it. Consistency would have been useful here.

One other issue applies to the case where a program's output ends without a newline. The C standard leaves it implementation-defined whether a trailing newline is required on a text stream. However, even on systems where output is produced correctly, it may be hard to read. For instance, most Unix-like systems do not display a newline before their prompts; a program which prints "3" may result in the prompt 3$, which isn't as readable as a 3 on a line by itself would have been.

Handling EOF

Throughout these books, in both editions, Schildt uses feof() in ways which are either incorrect or merely extremely convoluted and inefficient.

In C, the general convention is that I/O operations can indicate failure. Once an I/O operation has failed, you can use ferror() and feof() to determine whether the file ended or something else went wrong. In particular, the feof() function does not indicate that the end-of-file has been reached until after the first read which has failed as a result of being at end-of-file.

Here's what a conventional C loop for processing a file, character by character, looks like:

int c;
while ((c = getchar()) != EOF) {
  /* do stuff with c */
}

Schildt rarely uses this. Here's a sample loop he wrote (page 236-237):

char ch;
[...]
do {
  ch = getchar();
  putc(ch, fp);
} while (ch != '$');

This is very, very, wrong.

  1. getchar() returns an int, not a char. This is so that the special sentinel value EOF can have a value which can never be mistaken for any other returned character.
  2. If the file ends, or an input error occurs, getchar() returns EOF, which is then converted to a char value; this could be out of the representable range, which could in theory produce a signal, although I have yet to see an implementation where this would come up. Most likely, the call to putc() gets whatever character value you get by converting EOF to a char. This is usually wrong.
  3. If the file ends, it will keep looping forever, printing those probably-meaningless characters, because it will never find a $.

It's not as though this was simpler to show than the idiomatic C loop would have been.
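
For comparison, here is one way the same job could be written so that it terminates on end-of-file. This is my sketch, not the book's code; copy_until is a hypothetical name, and unlike Schildt's do/while it does not copy the stop character itself, which is a design choice on my part.

```c
#include <stdio.h>

/* Copy characters from in to out until the stop character or the
   end of input, whichever comes first. Returns the number of
   characters copied, or -1 on a read or write error. */
long copy_until(FILE *in, FILE *out, int stop)
{
  int c;            /* int, so EOF stays distinct from every character */
  long count = 0;

  while ((c = getc(in)) != EOF && c != stop) {
    if (putc(c, out) == EOF)
      return -1;
    ++count;
  }
  return ferror(in) ? -1 : count;
}
```

The book's loop corresponds roughly to copy_until(stdin, fp, '$').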

He does actually show the correct loop on the next page (238)... almost.

int main(int argc, char *argv[])
{
  FILE *fp;
  char ch;
  [...]
  ch = getc(fp);   /* read one character */
  
  while (ch!=EOF) {
    putchar(ch);  /* print on screen */
    ch = getc(fp);
  }
  
  fclose(fp);
  
  return 0;
}

That's closer. However, it's still wrong. On a system where char is unsigned, this loops forever printing the character value you would get by converting EOF (a negative value) to char. On other systems, there may well exist a legitimate character which the program will mistake for EOF, because the conversion from the int returned by getc() to char loses some data. The problem here, again, is the use of a char to hold the return value.
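
The two failure modes can be modeled without any file at all. In this sketch (the function names are mine), the argument stands in for a value returned by getc(); the result of converting an out-of-range value to a signed char is implementation-defined, so what char_sees_eof reports varies between systems, which is rather the point.

```c
#include <stdio.h>

/* Models "ch = getc(fp); ... ch != EOF" with ch declared as char.
   Returns 1 if the stored value would be taken for end-of-file. */
int char_sees_eof(int from_getc)
{
  char ch = (char)from_getc;   /* the lossy conversion */

  return ch == EOF;
}

/* The same test with the value kept in an int, as getc() returned it. */
int int_sees_eof(int from_getc)
{
  return from_getc == EOF;
}
```

On a system with an 8-bit signed char and EOF equal to -1 (common, but not required), char_sees_eof(0xFF) reports end-of-file for a perfectly good byte. Where char is unsigned, char_sees_eof(EOF) never reports end-of-file at all. The int version gets both cases right.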

But he's not done innovating! Here's yet another variant (page 239).

while(!feof(in)) {
  ch = getc(in);
  if(!feof(in)) putc(ch, out);
}

This one will actually work, as long as there are no input errors before you reach the end of the file, but it's gratuitously convoluted.

By the way, if you look closely at these examples, you'll note that Schildt never picked a consistent answer as to whether the leading parenthesis for an if or while condition ought to be adjacent to the keyword or separated from it by a space. A minor issue, to be sure, but it doesn't speak to much care for quality; coding style should be consistent in a book with only one author.

But back to the fun part. Let's make it all the way to page 241, and see what new madness awaits us.

char str[80];
[...]
while(!feof(fp)) {
  fgets(str, 79, fp);
  printf(str);
}

Note the superstitious passing of 79 to fgets() with an 80-character buffer. This is not needed, because fgets() reads at most one character less than the specified size into the buffer, then null-terminates the string. Schildt even gets this right in his description of fgets() (page 317).

At end of file, fgets() fails, returning a null pointer which is ignored, and leaving the string untouched. The string is then printed unconditionally, so the last line in the file is printed twice. Of course, we see here another of Schildt's incompetent mannerisms; he passes arbitrary data to printf() as the format string rather than supplying a format string of his own. If the data happens to contain a % conversion specification, the behavior is undefined, and a crash is a likely outcome.

But, I have to say, this example does something practically unique in this barren wasteland of a book -- it correctly uses fgets() to limit the size of input strings. Too bad they were all written to the file after being read in by an unchecked gets().
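
A version of that loop without the duplicated last line or the format-string problem might look like the following. It is a sketch under my own naming (copy_lines is hypothetical); it tests the return value of fgets() instead of calling feof(), and uses fputs() so that file data is never interpreted as a format.

```c
#include <stdio.h>

/* Copy in to out one line (or buffer-sized piece of a long line) at
   a time. Returns the number of successful fgets() calls. */
long copy_lines(FILE *in, FILE *out)
{
  char str[80];
  long pieces = 0;

  /* fgets() returns NULL at end-of-file or on error, so the loop
     body only ever sees freshly read data. */
  while (fgets(str, sizeof str, in) != NULL) {
    fputs(str, out);    /* never use file data as a format string */
    ++pieces;
  }
  return pieces;
}
```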

Digging deeper on EOF

Nearly every input loop in the book is incorrect, because Schildt didn't actually understand how EOF works in C. (He did address one of the issues, partially; see Schildt and C:TCN for some more information.)

You can see this again in the summary (page 238) of feof():

As just described, getc() returns EOF when the end of the file has been encountered. However, testing the value returned by getc() may not be the best way to determine when you have arrived at the end of a file. First, the C file system can operate on both text and binary files. When a file is opened for binary input, an integer value that will test equal to EOF may be read. This would cause the input routine to indicate an end-of-file condition even though the physical end of the file had not been reached. [...]

This is wrong in two major ways. First, outside of an extreme special case (no one has yet presented a single example of such a system, and consensus seems to be that it would be a flawed implementation), there is no character which, converted to unsigned char and then further converted to int, compares equal to EOF. That is because EOF is a negative value, and on any system where int is larger than unsigned char (nearly everything) or where the range of valid character values is smaller than the range of int (everything else I've ever heard of), it will never, ever, compare equal to a genuine character value.

The second, more subtle, is the red herring of mentioning binary streams. There is no reason such characters, if they existed, couldn't occur in text streams. Indeed, in the most common case, there are many systems where the value you get by converting -1 to unsigned char is a valid text character.

In short, Schildt makes it very clear that he does not understand how the character input functions work, or how EOF works. He seems to think that text mode and binary mode are properties of files, and that binary files can contain arbitrary values, but text files can't. On most systems, however, both text and binary files can contain arbitrary values; the difference between text and binary mode is a difference primarily in how the C library interacts with a given stream, not necessarily in the underlying file.

If you wanted to show that Schildt did not understand the getc() and getchar() functions, it would be enough to point out that he always (or at least, in every case I've yet seen) stores the result in an object of type char. The consistently wrong claims about detecting end-of-file using feof() because EOF is a "valid integer value" merely emphasize that he simply doesn't get it. A responsible writer, upon fixing the problem in the description of getchar() (see the discussion of the Page 314 error), would have fixed the other related claims and updated the sample programs to correctly use an object of type int. Schildt went for the shortest path; he fixed the one single copy that was explicitly named, and did nothing about the fundamental misunderstanding.

Herbert Schildt and C: The Complete Nonsense

While working on this, I noticed a couple of things. The first was that the 4th edition of C:TCR appeared to be substantially improved—most of my complaints had been addressed. The second was that looking through the book at random, it still appeared to be generally awful.

Having researched this further, I believe I can state with reasonable confidence that this is not a coincidence; rather, it appears that Schildt specifically addressed the things complained about in the original C:TCN, but did not in any way generalize from them. The "smoking gun" was Page 314, but several of the others are instructive.

The original page (now at http://www.seebs.net/c/c_tcn3e.html) has been preserved (unaltered except for an introductory section at the top) in order to show these issues; you can browse along there. Comments there which don't seem to have been addressed are omitted here; we can assume that Schildt didn't agree.

C:TCN, "Page 53"

For an instructive example, let us look at the famous "page 53" code fragment—an exceptionally poor example that was widely panned as being illustrative of the horror of the 3rd edition of C:TCR.

/* Write 6 integers to a disk file. */
void put_rec(int rec[6], FILE *fp)
{
  int len;

  len = fwrite(rec, sizeof rec, 1, fp);
  if(len != 1) printf("write error");
}

Coded as shown, put_rec() compiles and runs correctly on any computer, no matter how many bytes are in an integer.

As you might guess, this is incorrect. There are a number of flaws. Some are mere nitpicks; there is nothing requiring fp to be a "disk file", it could just as well be stdout, or any other stream. However, that's not the big problem. The big problem is that, in C, there is really no such thing as an array argument to a function; instead, there's a pointer parameter which has been written as though it were an array. The above definition could have started as void put_rec(int *rec, FILE *fp) with no change in its meaning.

Which means that sizeof rec is the same as sizeof(int *), not the same as sizeof(int [6]). Which means that the example is completely, totally, wrong. Of course, it's better than it was in the 2nd edition of the book; in the 2nd edition, the test used <> instead of !=, which wouldn't even compile.
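
Anyone who doubts this can check it directly. In the sketch below (function names mine), the prototype is written with an array parameter and the definition with a pointer parameter; the compiler accepts the pair because they declare the same function.

```c
#include <stddef.h>

/* These two declarations are compatible because the parameter
   "int rec[6]" is adjusted to "int *rec"; the 6 is ignored. */
size_t size_seen_by_callee(int rec[6]);

size_t size_seen_by_callee(int *rec)
{
  return sizeof rec;     /* size of a pointer, not of six ints */
}

size_t size_seen_by_caller(void)
{
  int rec[6] = {0};

  return sizeof rec;     /* here rec really is an array: 6 * sizeof(int) */
}
```

The two results agree only where a pointer happens to be the same size as six ints, which I would not expect to find anywhere.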

In the fourth edition, though, the example has been "corrected":

/* Write 6 integers to a disk file. */
void put_rec(int rec[6], FILE *fp)
{
  int len;

  len = fwrite(rec, sizeof(int)*6, 1, fp);
  if(len != 1) printf("Write Error");
}

Coded as shown, put_rec() compiles and runs correctly in any environment, including those that use 16- and 32-bit integers.

So, is it fixed? Well, it now does roughly what is described. However, this is an extremely bad illustration of sizeof. There's nothing to explain the semantics of fwrite here, but to put it bluntly, the conventional way to write that would have been fwrite(rec, sizeof(int), 6, fp);, to distinguish between the size of each object (which is an int) and the number of objects (6). This is better for error recovery: fwrite() returns the number of complete elements written, so when the array is written as a single large block and fwrite() manages to write 5 integers but not the 6th, it returns 0, and you can't tell that anything was written successfully. A clearly buggy example, which illustrated only a misunderstanding of C, has been replaced with a boring and useless example, which doesn't really illustrate anything.

Better would have been to declare the parameter as int *rec, and use sizeof(*rec) rather than sizeof(int); in real-world code, multiple copies of the same piece of information (such as the type of the members of rec) tend to lead to bugs.
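
Putting those suggestions together gives something like the following. This is my sketch, not a corrected listing from the book: the extra count parameter, the return value, and the wording of the message are all my own choices, and %zu is the C99 conversion for size_t.

```c
#include <stdio.h>

/* Write count integers to a stream. Returns the number of integers
   actually written, which is less than count only on error. */
size_t put_rec(const int *rec, size_t count, FILE *fp)
{
  /* sizeof *rec follows the element type if it ever changes. */
  size_t written = fwrite(rec, sizeof *rec, count, fp);

  if (written != count)
    fprintf(stderr, "write error after %zu of %zu integers\n",
            written, count);
  return written;
}
```

A caller holding int rec[6] would write put_rec(rec, sizeof rec / sizeof rec[0], fp), which keeps the 6 in exactly one place.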

That an example for something so fundamental would have gone through at least two revisions (I have no idea what it looked like in the 1st edition, if it even existed then) is a bit disturbing. That the first one wouldn't even compile on any C compiler, and the second was completely wrong, and only the third even does roughly what it says it does, is extremely disturbing.

The other comment C:TCN made about Page 53 (pointing out the use of "%f" as a format string for sizeof(float)) was partially addressed; the new code uses "%d ". He has not addressed the observation that it is incorrect to use "%d" for values of type size_t, probably because he tried it on a little-endian machine (or one where size_t and int have compatible size and representation, for reasonably-small values), and it appeared to work.

C:TCN, "Page 59"

The 3rd edition of C:TCR claimed that the shorthand of writing x = x+10 as x += 10 applied to "all the binary operators in C (those that require two operands)". I pointed out that this was not true of && or ||, as well as some other cases (pedantic details to follow). In the 4th edition, this section has been removed entirely; there is no longer anything after the "Spacing and Parentheses" header (page 61).

Pedantry Break! I listed the structure member operator . as a binary operator in the previous incarnation of this page; there is some dispute as to whether this is accurately described as a "binary operator". The standard itself does refer to the structure member name as "the second operand", but I now feel that, regardless, the term is ill-chosen. Nonetheless, the boolean logic operators (&& and ||) remain clear examples. The statement was wrong; however, it would not have been hard to fix. I'd probably have said "most of C's binary operators", or possibly "all of..." if I'd been writing a book, and thus inclined to spend the time reading through the complete list to verify that.

C:TCN, "Page 131"

Schildt did indeed fix this one. Interestingly, he also fixed another point I didn't mention. In the 3rd edition, C:TCR said (page 131, of course):

Local variables use the stack. [...]

Memory allocated by C's dynamic allocation functions is obtained from the heap—the region of free memory that lies between your program and its permanent storage area and the stack.

I pointed out that there is no such layout mandated, but didn't mention the subtle error in the first sentence quoted above. In the 4th edition (Page 138), he now says:

Nonstatic, local variables use the stack. [...]

Memory allocated by C's dynamic allocation functions is obtained from the heap. The heap is free memory that is not used by your program, the operating system, or any other program running in the computer.

This is substantially improved. The assumption that there is "the stack" is still wrong (as is the assertion that non-static local variables necessarily use it; in many implementations, such variables are quite likely to be stored in registers only, with no physical storage in memory reserved for them), but he corrected an error I missed.

Of course, the description of the heap is still wrong. Memory you have allocated using malloc() is still usually considered "part of the heap", even though you are using it. Some systems allocate a fixed block of memory to a program at startup, which that program can use as a heap, but which is not a general free memory pool. In short, the description is still wrong, merely improved.

C:TCN, "Page 163"

Ahh, the infamous void main. Schildt believed, erroneously, that it was generally permissible to declare the main() function as returning void to avoid returning a value. This was a common extension, which usually resulted in garbage values being returned to the calling environment, but some systems would magically act as though 0 had been returned.

In the 4th edition (page 164), he has removed the claim, although he now states:

If main() does not explicitly return a value, the value passed to the calling process is technically undefined. In practice, most C compilers return 0, but do not rely on this if portability is a concern.

A well-stated piece. Unfortunately, while covering C99, Schildt apparently missed that the C99 specification explicitly requires that the compiler make it look as though main() returned 0 (but see below for a pedantic interlude about this) if execution reaches the end of the function without a return or exit(). On the other hand, many compilers still default to C89 mode, and do nothing of the sort. He also failed to take this opportunity to discuss the question of what the supported values are; specifically, that 0 and the predefined constant EXIT_SUCCESS indicate successful exit, and the predefined constant EXIT_FAILURE indicates a failure. No other values may be portably used, though many systems establish additional conventions. In particular, exit(1) indicates success on at least one operating system, even though it's often used as an error indicator in Unix-like systems. Still, somewhat improved.
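The portable choices are few enough to show in a couple of lines. The helper below is purely illustrative (exit_status and do_work are hypothetical names); a real program would more likely use the two macros directly.

```c
#include <stdlib.h>

/* Hypothetical helper: map a success flag to a portable exit status.
 * Only 0, EXIT_SUCCESS and EXIT_FAILURE have standard meanings. */
int exit_status(int succeeded)
{
    return succeeded ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Typical use, in a program's main():
 *
 *     int main(void)
 *     {
 *         int ok = do_work();        // do_work is hypothetical
 *         return exit_status(ok);    // or: exit(exit_status(ok));
 *     }
 */
```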

Pedantry Break! C99 sort of allows for non-int return types for main(), in that it is implementation-defined whether other forms of main() are accepted. If other forms are accepted, and they have a return type not compatible with int, the returned value if you fall off the end of main() is unspecified.

C:TCN, "Page 247"

This one's straight-up fixed. The fseek() example used an arbitrary value as an argument to fseek() on a text stream, with the mode SEEK_SET, which is undefined behavior. In the new edition, he correctly changed the open mode to "rb", making the behavior well-defined. (However, he still uses exit(1) to get out of the program; see the preceding discussion on "Page 163". To be fair, though, I didn't catch that either.)
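As a rough sketch of why the open mode matters (this is my own illustration, not the book's example, and byte_at is a hypothetical name): on a binary stream, any byte offset is a valid argument to fseek() with SEEK_SET, whereas on a text stream the offset must be zero or a value previously returned by ftell().

```c
#include <stdio.h>

/* Hypothetical helper: return the byte at `offset` in a stream that was
 * opened in binary mode (for example with "rb"), or EOF on failure. */
int byte_at(FILE *fp, long offset)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return EOF;
    return getc(fp);   /* int, so EOF stays distinguishable from data */
}
```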

C:TCN, "Page 253"

The entire section on the Unix-like I/O system is removed. I don't object to this, and I don't have any reason to believe it was related to the criticism in C:TCN.

C:TCN, "Page 283"

The example (which was useful, except for the error) has been removed. Correcting the error would have made this a good example to help users understand what prototypes are for. Removing it makes the claims about headers providing prototypes much less useful to the reader.

C:TCN, "Page 284"

Schildt fixed the issue (header file names given in all caps, when the standard uses lowercase).

C:TCN, "Page 314"

Let's start with the 4th edition's description of getc(), on page 329:

The getc() function returns the next character from the specified input stream, and increments the file position indicator. The character is read as an unsigned char that is converted to an integer.

If the end of the file is reached, getc() returns EOF. However, since EOF is a valid integer value, you must use feof() to check for the end-of-file condition. If getc() encounters an error, EOF is also returned. If working with binary files, you must use ferror() to check for file errors.

[...]

The following program reads and displays the contents of a text file:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  FILE *fp;
  char ch;
  
  if((fp=fopen(argv[1], "r"))==NULL) {
    printf("Cannot open file.\n");
    exit(1);
  }
  
  while((ch=getc(fp))!=EOF) {
    printf("%c", ch);
  }
  
  fclose(fp);
  
  return 0;
}

You'll note that he still declares ch as a char, meaning that the test is wrong in the way he's described. The problem is that he's just plain wrong. The key is that the character is converted to unsigned char and then to int. On most systems, this guarantees a non-negative value, and EOF is a negative value (there are theoretical exceptions, which I cover at the end of this subsection). There is no possibility of confusion if you use the right type for the return of getc().

Note that the comment about "binary" files is also wrong; this has nothing to do with binary-mode files. His sample program can fail with text-mode files too. There may well exist values which, converted to char, compare equal to EOF. At least, there may if char is signed. If char is unsigned, it is impossible for this loop to terminate, because no unsigned char will compare equal to EOF (but see the upcoming Pedantry Break in the section on the "Page 348" comments).

Last but not least, note that he does not check whether argc is at least 2 before trying to access argv[1], and he uses plain printf() to display an error message, rather than taking the opportunity to show the reader what stderr is for.
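For contrast, here is roughly what I would expect such an example to look like. It is a sketch, not a drop-in replacement: copy_stream and show_file are names I made up, and I've written the driver as an ordinary function so that a real main() would simply call show_file(argc, argv) and turn the result into an exit status.

```c
#include <stdio.h>

/* Copy every character from in to out.
 * Returns 0 on success, -1 if a read or write error occurred. */
int copy_stream(FILE *in, FILE *out)
{
    int ch;   /* int, not char: holds every unsigned char value plus EOF */

    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            return -1;
    }
    /* EOF means end-of-file or error; ferror() tells them apart. */
    return ferror(in) ? -1 : 0;
}

/* Hypothetical driver: checks argc before touching argv[1], and sends
 * diagnostics to stderr rather than stdout. */
int show_file(int argc, char *argv[])
{
    FILE *fp;
    int rc;

    if (argc < 2) {
        fprintf(stderr, "usage: %s filename\n", argc > 0 ? argv[0] : "show");
        return -1;
    }
    if ((fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "Cannot open %s\n", argv[1]);
        return -1;
    }
    rc = copy_stream(fp, stdout);
    fclose(fp);
    return rc;
}
```

With ch declared as int, no feof() call is needed to detect the end of the file, in text mode or binary mode.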

Here's where it gets interesting. That same warning is present in the 3rd edition, for both getc() (page 312) and getchar() (page 314). In the 4th edition, it's been removed from getchar() but not from getc(). So what, you ask? Well. The previous version of this document pointed out the error on page 314, but didn't mention the identical error on page 312. A suspicious reader might note that it's conspicuous that most of the errors I pointed out on that page got fixed, but precisely identical wording and errors even two pages previously did not.

In fact, there are many more examples like this.

C:TCN, "Page 333"

Schildt fixed the error (missing single quotes around a character constant). With the wisdom of accumulated years of debugging, I consider it a pretty minor typo, but I do wish the 3rd edition had been better proofread. Glad to see it fixed, though.

C:TCN, "Page 348"

Another day, another program which uses something other than EOF as a sentinel value and will run forever if it runs out of input. In the previous edition, this program used ' ' (a character constant space) as a sentinel value. I complained about it. Here's the "fixed" version, from page 363 of the 4th edition:

This program checks each character read from stdin and reports all uppercase letters:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
  char ch;
  
  for(;;) {
    ch = getchar();
    if(ch == '.') break;
    if(isupper(ch)) printf("%c is uppercase\n", ch);
  }
  
  return 0;
}

Try it yourself. Give it the input "HELLO. MY NAME IS BOB." and see whether it reports all the uppercase letters. The new sentinel isn't really a big improvement, and the loop is still broken (see Handling EOF for more on this).

There's a secondary issue here, which is that, on systems where char is signed, it is undefined behavior to pass a negative-value char (with a value other than EOF, if that happens to be in the range of char) to any of the is*() functions. They accept values in precisely the form that getchar() returns them: converted to unsigned char and then to int. Converting back to char can break things. This is certainly a problematic design in C, but this is the sort of thing that a "Complete Reference" ought to point out about these functions.
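When the data really does live in a char (a string, say), the usual idiom is to convert to unsigned char at the call. A minimal sketch, with count_upper as a hypothetical name:

```c
#include <ctype.h>

/* Count uppercase letters in a string. Each char is converted to
 * unsigned char before the call, so the argument is always in the
 * range isupper() is defined for, even where plain char is signed. */
int count_upper(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```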

Of course, he changed the sentinel in everything, so now the 4th edition has (page 361):

for(;;) {
  ch = getchar();
  if(ispunct(ch)) printf("%c is punctuation\n", ch);
  if(ch == '.') break;
}

Gosh, Mr. Schildt! It sure would be cool if the language had provided some kind of magic sentinel value you could use to always determine that input was done and there were no more characters! It's also interesting to note that the many such sample programs (each of the is*() functions has one) aren't quite consistent; for instance, the one for isdigit checks whether ch is a period before checking whether it's a digit.

He also wins a prize for using the same loop, complete with printing the character using %c, for iscntrl(), which by definition indicates that a character is not a printing character (C99, section 7.4, paragraph 3: "the term control character refers to a member of a locale-specific set of characters that are not printing characters."). The whole section is an embarrassment. He has addressed the complaint in a purely literal fashion, in that the code no longer uses a space as a sentinel value, but he hasn't addressed the substance, which is that the loops should have been done using EOF to terminate.
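For what it's worth, the EOF-terminated version is no longer than his. This is a sketch under my own naming (report_upper is hypothetical, and it takes the streams as parameters only so it can be exercised without a terminal); calling report_upper(stdin, stdout) gives the behavior the book's description promises.

```c
#include <ctype.h>
#include <stdio.h>

/* Report each uppercase letter read from in, stopping at EOF rather
 * than at an in-band sentinel. Returns the number of letters reported. */
int report_upper(FILE *in, FILE *out)
{
    int ch;      /* int: getc() returns an unsigned char value or EOF */
    int count = 0;

    while ((ch = getc(in)) != EOF) {
        if (isupper(ch)) {          /* ch is already in the accepted range */
            fprintf(out, "%c is uppercase\n", ch);
            count++;
        }
    }
    return count;
}
```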

Pedantry Break! It is in theory possible for a machine to have char be larger than 8 bits, though it's not very common for hosted environments (basically, desktop computers and the like -- the only environments required to even have the <stdio.h> functions to begin with). On such a machine, contrary to many people's expectations, sizeof(char) is still 1; what changes is the value CHAR_BIT. The relevance of this is that, on such a machine, it is possible for int to still have its required range, but to be the same size as a char. Thus, on such a machine, there might exist at least one value of unsigned char such that, converted to int, it was a negative value, and compared equal to EOF. However, to the best of my knowledge, all such systems that have provided the getchar() function and related functions have ensured that, in fact, the EOF value is distinct from any value which can actually be read from a file. For instance, char could be a 32-bit type, but you would still only see values 0-255 when reading "characters" from a file.
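If you want to see where your own implementation stands, the relevant facts are all in <limits.h>. The two helpers below are illustrative only (both names are mine); the second reports whether every unsigned char value fits in a non-negative int, which is the condition that keeps EOF out of band.

```c
#include <limits.h>

/* Width of an int in bits, including any padding bits.
 * sizeof counts in units of char, and sizeof(char) is 1 by definition,
 * so multiplying by CHAR_BIT gives bits. */
unsigned int_bits(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}

/* Nonzero if every unsigned char value is representable as a
 * non-negative int, so no character read can compare equal to EOF. */
int eof_is_distinct(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```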

C:TCN, "Page 735"

This one is just gob-smackingly bad. Schildt has no concept of what parentheses do. He writes, on page 57:

Parentheses are operators that increase the precedence of the operations inside them.

The really great material on this was in the 3rd edition. It's missing in the fourth. While normally I would frown on simply removing material, it was bad enough that, yes, it is better to say nothing at all.

Just because it's such a beautiful example, though, I present Schildt's masterfully incoherent claim about Order-of-Evaluation errors (C:TCR, 3rd Edition, page 735).

The way an order-of-evaluation error usually occurs is through changes to an existing statement. For example, you may enter the statement

x = *p++;

which assigns the value pointed to by p to x and then increments the pointer p. Say, however, that later you decide that x really needs the value pointed to by p squared. To do this, you try

x = *p++ * (*p);

However, this can't work because p has already been incremented. The proper solution is to write

x = *p * (*p++);

This is probably the most totally wrong thing I've ever seen in any book on C. Schildt even knows better; he correctly states, on page 58 (4th edition) or page 56 (3rd edition):

This leaves the compiler free to rearrange an expression to produce more optimal code. However, it also means that your code should never rely on the order in which subexpressions are evaluated. For example, the expression

x = f1() + f2();

does not ensure that f1() will be called before f2().

(That's the 4th edition version; the 3rd edition is substantively identical.)

So, even back in the 3rd edition days, Schildt knew that order of evaluation is not guaranteed... But then wrote the above nonsense. In the 4th edition, the example for "Order-of-Evaluation Errors" is now an unexceptional example, which doesn't point out the much, much, more common problem of the sort of code he illustrated (incorrectly) in the previous edition. The new version (page 700):

The way an order-of-evaluation error usually occurs is through changes to an existing statement sequence. For example, when optimizing a piece of code, you might change the sequence

/* original code */
x = a + b;
a = a + 1;

into the following:

/* "improved" code -- wrong! */
x = ++a + b;

The trouble is that the two code fragments do not produce the same results. The reason is that the second way increments a before it is added to b. This was not the case in the original code!

This is not an atypical response; on the rare occasions when Schildt seems to have accepted that he made a mistake, he seems to be unwilling to genuinely correct it and produce a good example, but rather, takes the shortest path to something that won't get those complaints anymore, even if it's no longer remotely useful for teaching C.

Interestingly, this looks to be one of the cases where the exact thing complained about in my older complaints about C:TCR has been addressed, but only in effect by removing the affected code. The original point was to address difficulties that could occur when differences in the order in which parts of an expression were evaluated could cause problems. The new example is in no way an illustration of an order-of-evaluation problem within an expression.

Beginning programmers often find the order of evaluation of expression components confusing. Schildt had a great opportunity here to explain exactly when C does, and does not, specify the order of evaluation of components of an expression, and how to spot possible problems caused by relying on a particular ordering. Instead, apparently not understanding the rules, he simply avoided the question.
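To give a flavor of what such an explanation might include, here is a short sketch (the function names are my own). The 3rd edition's x = *p++ * (*p); modifies p and reads it again with no intervening sequence point, which is undefined behavior however the operands are arranged. Splitting the work across statements makes it well-defined. The second function shows one of the places where C does guarantee an order: && evaluates its left operand first and skips the right one if the left is false.

```c
#include <stddef.h>

/* Sum of the squares of n ints starting at p. */
long sum_of_squares(const int *p, int n)
{
    long total = 0;
    while (n-- > 0) {
        int v = *p++;          /* one read and one increment of p */
        total += (long)v * v;  /* v is a plain value; order cannot matter */
    }
    return total;
}

/* && has a sequence point after its left operand, so the null check
 * is guaranteed to happen before the dereference. */
int first_is_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```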

While it's nice that the 4th edition is less wrong about this than the 3rd, it would have been nicer still to have some real discussion of these very significant issues.

Objections

Some correspondents have raised various objections about the previous version of this document, and I think they deserve some consideration and time.

Nitpicking

Some have argued that the previous version of this document picked on way too many issues which were trivial, unimportant, or disputed. One example is Schildt's claim that numbers are "in general" represented using two's complement. The phrase "in general" is sometimes used in English to indicate that something is a reliable or safe assumption, and otherwise used specifically to call out the existence of exceptions. I interpreted the phrase more in the former sense; I am not sure whether this was right or not, but I still feel that the explanation is at most unnecessary. The right time to talk about representations would be while discussing bitwise operators, where it's reasonable to mention the variances and common implementation choices. (Interestingly, Schildt got that wrong too; see Page 51.)

Typos

Some of the errors have been dismissed as "typos". This is quite possibly true. However, usually, when there are typos in a book, one would expect to see some errata. (I'm aware that, as of this writing, some of the errata for my own book never seem to have made it through the publisher; I'm going to go submit them myself, since I still have the list floating around.) However, others are harder to dismiss as typos. Worse, typos of the sorts in question probably ought to be getting caught during technical review or proofreading; it would be understandable if there were only a few, but there are comparatively many.

Qualifications

One occasional complaint is that I am somehow "unqualified" to raise these criticisms. This is nonsense. Not because of any particular qualifications (though I am apparently now generally regarded as being qualified to comment on writing about C), but because it's a pure argumentum ad hominem; it doesn't matter who raised the criticisms, it matters whether they are correct or not. As demonstrated by the fact that Schildt has indeed corrected many of them, we can reasonably infer that even he grants that they were correct. The qualifications don't really matter at that point; what matters is whether Schildt's book is in fact accurate or inaccurate.

Seebs makes mistakes

Another objection I've seen is that, since I make mistakes, I should not criticize Schildt. Again, this is ridiculous. The question of whether these criticisms are accurate stands on its own merit. Yes, I make mistakes. Lots of them. I'm very error-prone, and will get even trivial things wrong on the first pass, often enough for it to be an issue. I address this by carefully testing code before putting it into production, using a code review process, sending my writing out for technical review, and otherwise taking steps to ensure that the final product is correct, even if the first draft was not. Schildt, by contrast, is on the fourth edition of a book, published by people who presumably have the resources to hire a technical reviewer, and yet we see no errata, no acknowledgements of errors, and very few improvements even when there are unambiguous mistakes in his work.

It's a tutorial, not a reference

I've heard a couple of times the assertion that the book is "more a tutorial than a reference". This is not particularly true, honestly; a large portion of the material in the book consists of lists of functions with descriptions of their semantics and possibly tiny sample code, but without any real explanation of how you would use them. However, even if it were true, it does not seem that it would change anything. C is not a language which rewards casual approximations of correctness. If you wish to learn C, you will be best served by learning it properly, including all the acknowledged quirks, design weaknesses, and other limitations of the language. Trying to "simplify" C to teach it produces victims who cannot work effectively in C. A good tutorial might present simplifications, but would identify them as such and correct them later, rather than presenting them as the real and whole truth. Furthermore, as revealed in the in-depth look, the book is a very poor tutorial, frequently omitting key points or concepts.

Summary

I wish the examples above had been more carefully cherry-picked than they were, but really, the examples here appear to be pretty representative. The "pick a page" section consists of things obtained by flipping the book to a random page. I did, once, flip to a random page and not find any errors. Yes, exactly once. The whole book is like this. The explanatory material is garbage. The explanations of some concepts are exceptionally bad, and it seems very clear that Schildt simply does not understand the material. This isn't just typos; this is genuine failure to understand what's going on.

14 years ago, I wrote this summary:

C: The Complete Reference is a popular programming book, marred only by the fact that it is largely tripe. Herbert Schildt has a knack for clear, readable text, describing a language subtly but quite definitely different from C.

With fourteen years' more experience, including most of a decade of active participation in C standardization, thousands of lines of code written, and a few years of work as a professional writer, I still stand by the evaluation of the book. I do not so much stand by the claim that Schildt's writing is "clear"; readable, yes, clear, not so much. The article itself was not particularly well-written, but the ultimate analysis of Schildt's writing was spot-on. If anything, I now know enough to spot many more of the subtle ways in which his sample programs are likely to screw over any poor fool who tries to learn to program from his books.

This is not a "retraction" of the previous page; it's a follow-up and update. That page was certainly not very well-written, and some of the complaints in it may have been a little too nitpicky. However, several of them remain solid, clear criticisms of shoddy workmanship and poor writing. Furthermore, a careful study of the 4th edition of Schildt's C: The Complete Reference reveals that, while he may have fixed a few problems after being called on them, he continues to be unwilling or unable to correctly explain many of the basics of effective and correct C programming.

One thing that really stands out after looking at ten consecutive pages in detail is that prior criticisms of Schildt's writing have focused mostly on what he says that is wrong. However, the things he fails to cover are also quite significant. This undermines substantially the widespread claim that a few errors are livable in a book which does a good job of teaching concepts; Schildt's writing omits fundamental concepts and information, and does a poor job of explaining others.

Acknowledgements

A number of people contributed to the current revision of this page. For inspiration, I must of course credit Edward Nilges, whose tireless crusade against the deficiencies of the previous version made it clear that a more complete treatment was needed. Malcolm McLean made the interesting key observation that it was possible to have a bad critique of a bad book. Many participants in comp.lang.c contributed examples of errors, analysis of some of the code fragments, and comparisons with the 2nd edition of C:TCR.

Several people have helped by contributing analysis and technical review of some of this material. Special thanks to der Mouse and to Keith Thompson, both of whom found any number of mistakes or poor choices of wording in my analysis.

It's entirely possible that I have made some mistakes in this page; feel free to let me know of any. To the best of my knowledge, any typos in the quoted code fragments or text examples which I didn't comment on are my own; while I am very disappointed with the quality of the code and explanatory text, the textual proofreading in C:TCR appears to be quite good.