Many implementations of the C runtime environment
(most notably the UNIX operating system)
provide, aside from the standard I/O library,
a set of unbuffered I/O services.
The Committee has decided not to standardize the latter set.
A suggested semantics for these functions in the UNIX world may be found in the emerging IEEE P1003 standard. The standard I/O library functions use a file pointer for referring to the desired I/O stream. The unbuffered I/O services use a file descriptor (a small integer) to refer to the desired I/O stream.
Due to weak implementations of the standard I/O library,
many implementors have assumed
that the standard I/O library was used for small records
and that the unbuffered I/O library was used for large records.
However, a good implementation of the standard I/O library can match
the performance of the unbuffered services on large records.
The user also has the capability of tuning the performance of the standard
I/O library (with
setvbuf) to suit the application.
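The tuning mentioned above might look like the following minimal sketch. The helper name write_records_buffered and its parameters are illustrative assumptions, not part of the Standard; the sketch asks for a large, fully buffered stream and carries on with default buffering if the request is refused.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write `count` copies of `record` (each `len` bytes)
   to a scratch stream after requesting a full buffer of `bufsize` bytes.
   Returns the number of records written, or -1 if no stream could be opened. */
long write_records_buffered(const char *record, size_t len,
                            long count, size_t bufsize)
{
    FILE *fp = tmpfile();
    char *buf;
    long written = 0;

    if (fp == NULL)
        return -1;

    buf = malloc(bufsize);
    /* setvbuf must precede any other operation on the stream. A nonzero
       return means the request was not honored; the stream remains usable
       with whatever buffering the library chose. */
    if (buf != NULL && setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        free(buf);
        buf = NULL;
    }

    while (written < count && fwrite(record, len, 1, fp) == 1)
        written++;

    fclose(fp);   /* flush and close before releasing the buffer */
    free(buf);
    return written;
}
```

A call such as write_records_buffered(rec, 16, 1000, 65536) requests a 64 KB buffer; whether the library actually uses it is up to the implementation.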
Some subtle differences between the two sets of services can make the implementation of the unbuffered I/O services difficult.
_IOFBF, _IOLBF, and _IONBF are enumerations of the third argument to
setvbuf, a function adopted from UNIX System V.
SEEK_CUR, SEEK_END, and SEEK_SET have been moved to
<stdio.h> from a header specified in the Base Document and not retained in the Standard.
FOPEN_MAX and TMP_MAX are added environmental limits of some interest to
programs that manipulate multiple temporary files.
FILENAME_MAX is provided so that buffers to hold file
names can be conveniently declared. If the target system supports
arbitrarily long filenames, the implementor should provide some
reasonable value (80?, 255?, 509?) rather than something unusable.
C inherited its notion of text streams from the UNIX environment in which it was born. Having each line delimited by a single new-line character, regardless of the characteristics of the actual terminal, supported a simple model of text as a sort of arbitrary length scroll or ``galley.'' Having a channel that is ``transparent'' (no file structure or reserved data encodings) eliminated the need for a distinction between text and binary streams.
Many other environments have different properties, however. If a program written in C is to produce a text file digestible by other programs, by text editors in particular, it must conform to the text formatting conventions of that environment.
The I/O facilities defined by the Standard are both more complex and more restrictive than the ancestral I/O facilities of UNIX. This is justified on pragmatic grounds: most of the differences, restrictions and omissions exist to permit C I/O implementations in environments which differ from the UNIX I/O model.
Troublesome aspects of the stream concept include:
Some environments represent text lines as blank-filled fixed-length records. Thus the Standard specifies that it is implementation-defined whether trailing blanks are removed from a line on input. (This specification also addresses the problems of environments which represent text as variable-length records, but do not allow a record length of 0: an empty line may be written as a one-character record containing a blank, and the blank is stripped on input.)
ftell returns a file position indicator, which has no necessary interpretation except that an fseek operation with that indicator value will position the file to the same place. Thus an implementation may encode whatever file positioning information is most appropriate for a text file, subject only to the constraint that the encoding be representable as a long. Use of fsetpos removes even this constraint.
Buffering is addressed in the Standard by specifying the setbuf and setvbuf functions, but permitting great latitude in their implementation. A conforming library need neither attempt the impossible nor respond to a program attempt to improve efficiency by introducing additional overhead.
Thus, the Standard imposes a clear distinction between text streams, which must be mapped to suit local custom, and binary streams, for which no mapping takes place. Local custom on UNIX (and related) systems is of course to treat the two sorts of streams identically, and nothing in the Standard requires any changes to this practice.
Even the specification of binary streams requires some changes to accommodate a wide range of systems. Because many systems do not keep track of the length of a file to the nearest byte, an arbitrary number of characters may appear on the end of a binary stream directed to a file. The Standard cannot forbid this implementation, but does require that this padding consist only of null characters. The alternative would be to restrict C to producing binary files digestible only by other C programs; this alternative runs counter to the spirit of C.
The set of characters required to be preserved in text stream I/O is the set needed for writing C programs; the intent is that the Standard should permit a C translator to be written in a maximally portable fashion. Control characters such as backspace are not required for this purpose, so their handling in text streams is not mandated.
It was agreed that some minimum maximum line length must be mandated; 254 was chosen.
The as if principle is once again invoked to define the nature of input and output
in terms of just two functions, fgetc and fputc.
The actual primitives in a given system may be quite different.
Buffering, and unbuffering, is defined in a way suggesting the desired interactive behavior; but an implementation may still be conforming even if delays (in a network or terminal controller) prevent output from appearing in time. It is the intent that matters here.
No constraints are imposed upon file names, except that they must be representable as strings (with no embedded null characters).
The Base Document provides the
unlink system call to remove files.
The UNIX-specific definition of this function prompted
the Committee to replace it with a portable function.
The rename function has been added to provide
a system-independent atomic operation
to change the name of an existing file;
the Base Document only provided the
link system call,
which gives the file a new name without removing the old one,
and which is extremely system-dependent.
The Committee considered a proposal that rename
should quietly copy a file if simple
renaming couldn't be performed in some context,
but rejected this as potentially too expensive at execution time.
rename is meant to give access to an underlying facility of
the execution environment's operating system.
When the new name is the name of an existing file,
some systems allow the renaming
(and delete the old file or make it inaccessible by that name),
while others prohibit the operation.
The effect of
rename is thus implementation-defined.
The tmpfile function is intended to allow users to create binary ``scratch'' files.
The as if principle implies that the information in such a file need
never actually be stored on a file-structured device.
The temporary file is created in binary update mode, because it will presumably be first written and then read as transparently as possible. Trailing null-character padding may cause problems for some existing programs.
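A minimal sketch of the write-then-read pattern described here follows. The function name scratch_round_trip is an illustrative assumption. The sketch reads back exactly the number of bytes it wrote, so any trailing null-character padding an implementation might add is never seen.

```c
#include <stdio.h>

/* Sketch: use tmpfile() as scratch storage. `len` bytes are written,
   the stream is rewound, and the same `len` bytes are read into `out`.
   Returns 0 on success, -1 on any failure. */
int scratch_round_trip(const void *data, size_t len, void *out)
{
    FILE *fp = tmpfile();     /* binary update stream */
    int ok = -1;

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) == len) {
        rewind(fp);           /* positioning call between output and input */
        if (fread(out, 1, len, fp) == len)
            ok = 0;
    }
    fclose(fp);               /* the temporary file is removed on close */
    return ok;
}
```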
The tmpnam function allows for more control than tmpfile:
a file can be opened in binary mode or text mode,
and files are not erased at completion.
There is always some time between the call to
tmpnam and the use (in
fopen) of the returned name.
Hence it is conceivable that in some implementations
the name, which named no file at the call to tmpnam,
has been used as a filename by the time of the call to fopen.
Implementations should devise name-generation strategies which minimize
this possibility, but users should allow for this possibility.
On some operating systems it is difficult, or impossible, to create a file unless something is written to the file. A maximally portable program which relies on a file being created must write something to the associated stream before closing it.
The fflush function ensures that output has been
forced out of internal I/O buffers for a specified stream.
Occasionally, however, it is necessary to ensure that all
output is forced out, and the programmer may not conveniently be
able to specify all the currently-open streams (perhaps because
some streams are manipulated within library packages). [Footnote: For instance, on a system (such as UNIX) which supports
process forks, it is usually necessary to flush all output buffers just
prior to the fork.]
To provide an implementation-independent method of flushing all
output buffers, the Standard specifies that this is the result of
fflush with a NULL argument.
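As a brief illustration, the sketch below writes to two independent streams and then flushes everything with one call. The function name write_and_flush_all is an assumption made for the example; only fflush(NULL) itself comes from the Standard.

```c
#include <stdio.h>

/* Sketch: write to two scratch streams, then force all buffered output
   out with a single call. Returns 0 on success, -1 on any failure. */
int write_and_flush_all(void)
{
    FILE *a = tmpfile();
    FILE *b = tmpfile();
    int status = -1;

    if (a != NULL && b != NULL
        && fputs("first stream\n", a) != EOF
        && fputs("second stream\n", b) != EOF
        && fflush(NULL) == 0)   /* flushes a, b, and every other output stream */
        status = 0;

    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return status;
}
```

The caller does not need to know which streams a library package may have opened, which is the situation the paragraph above describes.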
The b type modifier has been added to deal with the text/binary dichotomy.
Because of the limited ability to seek within text files (see §126.96.36.199),
an implementation is at liberty to treat the old update
+ modes as if
b were also specified.
Table 4.1 tabulates the capabilities and actions
associated with the various specified mode string arguments to fopen.
                                       r   w   a   r+  w+  a+
file must exist before open            x   -   -   x   -   -
old file contents discarded on open    -   x   -   -   x   -
stream can be read                     x   -   -   x   x   x
stream can be written                  -   x   x   x   x   x
stream can be written only at end      -   -   x   -   -   x
Buffer sizes may be specified using the setvbuf function. (See §188.8.131.52.) An implementation may choose to allow additional file specifications as part of the mode string argument. For instance,
file1 = fopen(file1name,"wb,reclen=80");
might be a reasonable way, on a system which provides record-oriented binary files, for an implementation to allow a programmer to specify record length.
A change of input/output direction on an update file
is only allowed following an fsetpos, fseek, rewind, or fflush operation,
since these are precisely the functions
which assure that the I/O buffer has been flushed.
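The rule can be illustrated with a short sketch that alternates reading and writing on an update stream. The helper name patch_first_byte is hypothetical; each change of direction is preceded by a flush or a positioning call.

```c
#include <stdio.h>

/* Sketch: replace the first byte of an update stream with `newch`
   and read it back. Returns the byte read back, or EOF on failure. */
int patch_first_byte(FILE *fp, int newch)
{
    int c;

    if (fseek(fp, 0L, SEEK_SET) != 0)      /* position before reading */
        return EOF;
    c = getc(fp);                          /* input */
    if (c == EOF)
        return EOF;

    if (fseek(fp, 0L, SEEK_SET) != 0)      /* input to output: reposition */
        return EOF;
    if (putc(newch, fp) == EOF)            /* output */
        return EOF;

    if (fflush(fp) != 0)                   /* output to input: flush first */
        return EOF;
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return EOF;
    return getc(fp);                       /* input again */
}
```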
The Standard (§4.9.2) imposes the requirement that binary files
not be truncated when they are updated.
This rule does not preclude an implementation from supporting additional
file types that do truncate when written to,
even when they are opened with the same sort of fopen call.
Magnetic tape files are an example of a file type that must be
handled this way. (On most tape hardware it is impossible to write
to a tape without destroying immediately following data.)
Hence tape files are not ``binary files'' within the meaning of the Standard.
A conforming hosted implementation must provide (and document) at
least one file type (on disk, most likely) that behaves exactly
as specified in the Standard.
setbuf is subsumed by setvbuf,
but has been retained for compatibility with old code.
setvbuf has been adopted from UNIX System V,
both to control the nature of stream buffering
and to specify the size of I/O buffers.
An implementation is not required to make actual use of a buffer
provided for a stream,
so a program must never expect the buffer's contents to reflect I/O in any way.
Further, the Standard does not require that the requested buffering be implemented;
it merely mandates a standard mechanism for requesting whatever buffering
services might be provided.
Although three types of buffering are defined, an implementation may choose to make one or more of them equivalent. For example, a library may choose to implement line-buffering for binary files as equivalent to unbuffered I/O or may choose to always implement full-buffering as equivalent to line-buffering.
The general principle is to provide portable code with a means of requesting the most appropriate popular buffering style, but not to require an implementation to support these styles.
Use of the
L modifier with floating conversions has been added
to deal with formatted output of the new type long double.
Note that the %X and %x formats
expect a corresponding int argument;
%lX and %lx must be supplied with a
long int argument.
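A small sketch of matching length modifiers to argument types is given below. The function name format_wide_values is an assumption for illustration. Note that the hexadecimal conversions formally take the unsigned counterpart of the integer type, so the sketch passes an unsigned long.

```c
#include <stdio.h>

/* Sketch: the length modifier in each conversion must agree with the
   type of the corresponding argument. `out` must be large enough for
   the formatted text. Returns the number of characters written. */
int format_wide_values(char *out, long double x, unsigned long bits)
{
    /* %Lf consumes a long double; %lx consumes an unsigned long. */
    return sprintf(out, "%.2Lf %lx", x, bits);
}
```

With the arguments 1.5L and 255UL this produces the text "1.50 ff".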
The conversion specification %p
has been added for pointer conversion,
since the size of a pointer is not necessarily the same as the size of an int.
Because an implementation may support more than one size of pointer,
the corresponding argument is expected to be a
(void *) pointer.
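The following sketch shows the pointer conversion in use, with the argument passed as a void pointer. The function name pointer_round_trip is illustrative only. It also reads the text back with the matching input conversion; the printed form itself is implementation-defined, so the sketch compares pointers rather than text.

```c
#include <stdio.h>

/* Sketch: print a pointer with %p and read it back with %p.
   Returns 1 if the value read back compares equal to the original. */
int pointer_round_trip(void *p)
{
    char text[64];
    void *back = NULL;

    sprintf(text, "%p", p);               /* argument must be a void * */
    if (sscanf(text, "%p", &back) != 1)
        return 0;
    return back == p;
}
```

A caller with a pointer of another object type would convert it explicitly, as in pointer_round_trip((void *)&x).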
The %n format has been added to permit ascertaining the number
of characters converted up to that point in the current invocation of the formatter.
Some pre-Standard implementations switch formats for %g
at an exponent of -3 instead of (the Standard's) -4:
existing code which requires the format switch at -3 will
have to be changed.
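The switch point can be observed with a minimal sketch such as the one below; the wrapper name format_g is an assumption for illustration. With the default precision, a value whose decimal exponent is -4 is printed in fixed notation, and one whose exponent is -5 is printed in exponential notation.

```c
#include <stdio.h>

/* Sketch: format a double with %g so the notation chosen by the
   library can be inspected. `out` must hold the formatted text. */
int format_g(char *out, double x)
{
    return sprintf(out, "%g", x);
}
```

Under the Standard's rule, format_g applied to 0.0001 yields "0.0001", while 0.00001 yields "1e-05".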
Some existing implementations provide %D and %O
as synonyms or replacements for %ld and %lo.
The Committee considered the latter notation preferable.
The Committee has reserved lower case conversion specifiers for future standardization.
The use of leading zero in field widths to specify zero padding has been superseded by a precision field. The older mechanism has been retained.
Some implementations have provided the format %r
as a means of indirectly passing a variable-length argument list.
The functions vfprintf, vprintf, and vsprintf
are considered to be a more controlled method of effecting this indirection,
so %r was not adopted in the Standard.
The printing formats for numbers are not entirely specified. The requirements of the Standard are loose enough to allow implementations to handle such cases as signed zero, not-a-number, and infinity in an appropriate fashion.
The specification of fscanf is based in part on these principles:
One-character pushback is sufficient for the implementation of fscanf. Given the invalid field ``-.x'', the characters ``-.'' are not pushed back.
The conversions performed by fscanf are compatible with those performed by strtod and strtol.
Input pointer conversion with %p has been added, although it is obviously risky, for symmetry with fprintf. The %i format has been added to permit the scanner to determine the radix of the number in the input stream; the %n format has been added to make available the number of characters scanned thus far in the current invocation of the scanner.
White space is now defined by the isspace function.
An implementation must not use the ungetc function
to perform the necessary one-character pushback.
In particular, since the unmatched text is left ``unread,''
the file position indicator as reported by the ftell function
must be the position of the character remaining to be read.
Furthermore, if the unread characters were themselves pushed back via
ungetc calls, the pushback in
fscanf must not affect
the push-back stack in ungetc.
A scanf call that matches
N characters from a stream must leave the stream in the same state
as if N consecutive
getc calls had been issued.
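The stream-state requirement can be sketched as follows. The helper name char_after_number is hypothetical. It scans an integer and then reports the next character, which must be the first character the conversion did not match.

```c
#include <stdio.h>

/* Sketch: scan one decimal integer from `input` and return the first
   character left unread, or EOF if nothing could be scanned. */
int char_after_number(const char *input, int *value)
{
    FILE *fp = tmpfile();
    int c = EOF;

    if (fp == NULL)
        return EOF;
    fputs(input, fp);
    rewind(fp);

    if (fscanf(fp, "%d", value) == 1)
        c = getc(fp);        /* the unmatched text is still unread */

    fclose(fp);
    return c;
}
```

For the input "123abc" the value scanned is 123 and the character returned is 'a', exactly as if three getc calls had consumed the digits.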
See comments of section §184.108.40.206 above.
See comments in section §220.127.116.11 above.
See §18.104.22.168 for comments on output formatting.
In the interests of minimizing redundancy,
sprintf has subsumed the older, rather uncommon, ecvt, fcvt, and gcvt.
The behavior of
sscanf on encountering end of string has been clarified.
See also comments in section §22.214.171.124 above.
vfprintf, vprintf, and vsprintf have been adopted from UNIX System V
to facilitate writing special purpose formatted output functions.
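A special purpose output function of the kind meant here might be sketched as below. The function name report and its tag parameter are illustrative assumptions; the sketch forwards its variable arguments to vfprintf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "[tag] " followed by a formatted message.
   Returns the total number of characters written, or a negative value
   if either output call fails. */
int report(FILE *out, const char *tag, const char *fmt, ...)
{
    va_list ap;
    int n, m;

    n = fprintf(out, "[%s] ", tag);
    if (n < 0)
        return n;

    va_start(ap, fmt);
    m = vfprintf(out, fmt, ap);   /* forwards the caller's arguments */
    va_end(ap);

    return m < 0 ? m : n + m;
}
```

A call such as report(stderr, "warn", "%d of %d\n", 2, 5) writes "[warn] 2 of 5" followed by a new-line.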
Because much existing code assumes that fgetc and fputc
are the actual functions equivalent to the macros getc and putc,
the Standard requires that they not be implemented as macros.
The fgets function subsumes gets,
which has no limit to prevent storage
overwrite on arbitrary input (see §126.96.36.199).
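A bounded line reader built on fgets might look like the following sketch. The name read_line and its return convention are assumptions made for the example. Because the buffer size is passed in, an overlong line is truncated rather than overwriting storage.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: read one line into `buf` (capacity `size`), removing the
   new-line. Returns 1 for a complete line, 0 if the line was cut
   short to fit the buffer, and -1 at end of file or on error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    /* No new-line stored: either the buffer filled up, or the last
       line of the file had no terminating new-line. */
    return feof(fp) ? 1 : 0;
}
```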
getc and putc have often been implemented as unsafe macros,
since it is difficult in such a macro to touch the
stream argument only once.
Since this danger is common in prior art, these two functions
are explicitly permitted to evaluate
stream more than once.
puts(s) is not exactly equivalent to fputs(s, stdout);
puts also writes a new line after the argument string.
This incompatibility reflects existing practice.
The Base Document requires that at least one character be read before
ungetc is called, in certain implementation-specific cases.
The Committee has removed this requirement,
thus obliging a FILE structure
to have room to store one character of pushback regardless of the state
of the buffer;
it felt that this degree of generality makes clearer the ways
in which the function may be used.
It is permissible to push back a different character than that which was read;
this accords with common existing practice.
The last-in, first-out nature of
ungetc has been clarified.
ungetc is typically used to handle algorithms, such as tokenization,
which involve one-character lookahead in text files.
fseek and ftell are used for random access, typically in binary files.
So that these disparate file-handling disciplines are not unnecessarily linked,
the value of a text file's file position indicator immediately after ungetc
has been specified as indeterminate.
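The tokenization use of ungetc can be sketched as follows. The function name read_number is an illustrative assumption. The sketch reads digits until it sees a character that does not belong to the token, then pushes that one character back for the next reader.

```c
#include <ctype.h>
#include <stdio.h>

/* Sketch: read an unsigned decimal token from `fp`, leaving the first
   non-digit unread. Stores the value and returns the digit count. */
int read_number(FILE *fp, unsigned long *value)
{
    int c, digits = 0;

    *value = 0;
    while ((c = getc(fp)) != EOF && isdigit(c)) {
        *value = *value * 10 + (unsigned long)(c - '0');
        digits++;
    }
    if (c != EOF)
        ungetc(c, fp);   /* one character of pushback is always available */
    return digits;
}
```

The sketch never calls ftell between the pushback and the next read, so it does not depend on the indeterminate file position indicator.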
Existing practice relies on two different models of the effect of ungetc.
One model can be characterized as writing the pushed-back character
``on top of'' the previous character.
This model implies an implementation in which the pushed-back characters are
stored within the file buffer and bookkeeping is performed by
setting the file position indicator to the previous character position.
(Care must be taken in this model to recover the overwritten character
values when the pushed-back characters are discarded as a result of
other operations on the stream.)
The other model can be characterized as pushing the character ``between''
the current character and the previous character.
This implies an implementation in which the pushed-back characters
are specially buffered (within the FILE structure, say) and accounted
for by a flag or count.
In this model it is natural not to move the file position indicator.
The indeterminacy of the file position indicator while pushed-back
characters exist accommodates both models.
Mandating either model
(by specifying the effect of ungetc on
a text file's file position indicator)
creates problems with implementations that have assumed the other model.
Requiring the file position indicator not to change after ungetc
would necessitate changes in programs which combine random access
and tokenization on text files,
and rely on the file position indicator marking the end of a token
even after pushback.
Requiring the file position indicator to back up would create severe
implementation problems in certain environments,
since in some file organizations it can be impossible to find the previous
input character position without having read the file sequentially to
the point in question. [Footnote:
Consider, for instance, a sequential file of variable-length records in which
a line is represented as a count field followed by the characters in the line.
The file position indicator must encode a character position
as the position of the count field plus an offset into the line;
from the position of the count field and the length of the line,
the next count field can be found.
Insufficient information is available for finding the previous
count field, so backing up from the first character of a line necessitates,
in the general case, a sequential read from the start of the file.]
size_t is the appropriate type both for an object size and for an array
bound (see §188.8.131.52),
so this is the type of the size and element-count arguments of fread.
fgetpos and fsetpos have been added to allow random access
operations on files which are too large to handle with fseek and ftell.
Whereas a binary file can be treated as an ordered sequence of bytes,
counting from zero, a text file need not map one-to-one to its
internal representation (see §4.9.2).
Thus, only seeks to an earlier reported position are permitted for text files.
The need to encode both record position and position within a record
in a long value may constrain the size of text files upon which
fseek and ftell can be used
to be considerably smaller than the size of binary files.
Given these restrictions,
the Committee still felt that this function has enough utility,
and is used in sufficient existing code,
to warrant its retention in the Standard.
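The permitted pattern, seeking only to a position previously reported by ftell, can be sketched as below. The function name reread_second_line is hypothetical. For convenience the sketch uses a tmpfile stream, which is binary; the same pattern is the one the text above describes for text streams.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: remember where the second line starts, read it, return to
   the remembered position, and read it again. The ftell value is
   treated as an opaque token. Returns 0 if both reads agree. */
int reread_second_line(void)
{
    char first[32], second[32], again[32];
    long mark;
    int ok = -1;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fputs("alpha\nbeta\ngamma\n", fp);
    rewind(fp);

    if (fgets(first, sizeof first, fp) != NULL
        && (mark = ftell(fp)) != -1L              /* -1L reports failure */
        && fgets(second, sizeof second, fp) != NULL
        && fseek(fp, mark, SEEK_SET) == 0         /* only a reported position */
        && fgets(again, sizeof again, fp) != NULL
        && strcmp(second, again) == 0)
        ok = 0;

    fclose(fp);
    return ok;
}
```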
fgetpos and fsetpos have been added to deal with files
which are too large to handle with fseek and ftell.
The fseek function will reset the end-of-file flag for the stream;
the error flag is not changed unless an error occurs, when it will be set.
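ftell can fail for at least two reasons:
the stream is associated with a terminal, or some other file type for which a file position indicator is meaningless; or
the file may be positionable, but the file position indicator cannot be represented as a long int.
Thus a method for ftell to report failure has been specified.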
See also §184.108.40.206.
Resetting the end-of-file and error indicators
was added to the specification of rewind
to make the specification more logically consistent.
At various times, the Committee considered providing a form of perror
that delivers up an error string version of
errno without performing any output.
It ultimately decided to provide this capability in a separate function, strerror.
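A minimal sketch of obtaining the message text without producing output follows, using strerror. The helper name describe_error and its layout of the message are assumptions for illustration; the wording of the message itself is implementation-defined.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Sketch: build "<what>: <message>" in `buf` without writing to any
   stream. Returns 0 on success, -1 if `buf` is too small. */
int describe_error(char *buf, size_t size, const char *what, int err)
{
    const char *msg = strerror(err);

    /* room for both strings, the ": " separator, and the terminator */
    if (strlen(what) + 2 + strlen(msg) + 1 > size)
        return -1;
    sprintf(buf, "%s: %s", what, msg);
    return 0;
}
```

The caller would normally pass the value of errno saved immediately after the failing call, and is then free to log or display the text however it chooses.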