Clive Feather on CPL and BCPL

BCPL to B to C: other articles by Mark Brader, Alan Watson, and Dennis Ritchie

All my attempts to locate a copy of the article that Dennis Ritchie is referring to have failed. So I have made an Orwellian attempt to recreate it; as a long-time BCPL user, these are my comments on Mark's and Alan's articles, and the BCPL parts of Dennis's paper on the development of C.

Clive D.W. Feather

Mark Brader wrote:

> To go back to the beginning, once upon a time in England there was a
> language called CPL. I've heard this acronym explained both as Cambridge
> Programming Language and as Combined ...

CPL officially stood for "Combined Programming Language", because it was originally going to be developed by a joint team from Cambridge and London Universities; it thus gained the nickname "Cambridge Plus London".

> the first version of B there [...] did not support floating-point. Later
> it was added by means of adding floating-point operators: #+, #*, and so on.

These operators appear in most versions of BCPL other than on 16-bit machines; I don't know whether B picked them up from BCPL or the other way round.
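
Separate floating-point operators are needed because a word in B or BCPL carries no type, so the operator itself has to say how the bits are to be interpreted. The following is only a rough C sketch of that idea, assuming a 32-bit word and a float of the same size; the names word, word_add, word_fadd and word_fmul are invented for illustration and do not come from any B or BCPL library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t word;  /* hypothetical untyped 32-bit cell */

/* Reinterpret the bits of a word as a float (no numeric conversion). */
float word_as_float(word w)
{
    float f;
    assert(sizeof f == sizeof w);  /* assumption: float is 32 bits wide */
    memcpy(&f, &w, sizeof f);
    return f;
}

/* Reinterpret the bits of a float as a word (no numeric conversion). */
word float_as_word(float f)
{
    word w;
    assert(sizeof f == sizeof w);
    memcpy(&w, &f, sizeof w);
    return w;
}

/* Plain "+": both cells are treated as integers. */
word word_add(word a, word b)
{
    return a + b;
}

/* "#+": the same kind of cells, treated as floating point. */
word word_fadd(word a, word b)
{
    return float_as_word(word_as_float(a) + word_as_float(b));
}

/* "#*": floating-point multiply on untyped cells. */
word word_fmul(word a, word b)
{
    return float_as_word(word_as_float(a) * word_as_float(b));
}
```

Applying word_add to two cells that hold floating-point bit patterns compiles without complaint but produces a meaningless result, which is exactly the kind of mistake a typeless language cannot catch.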

> Here is a bit of C code and its B equivalent:

>     float infact (n) int n;
>     /* or, of course, the newer float infact (int n) */
>     {
>             float f = 1;
>             int i;
>             extern float fact[];
>
>             for (i = 0; i <= n; ++i)
>                     fact[i] = f *= i;
>
>             return f;
>     }
>
>     #define TOPFACT 10
>     float fact[TOPFACT+1];

> And now in B:

>     infact (n)
>     {
>             auto f, i, j;   /* no initialization for auto variables */
>             extrn fact;     /* "What would I do differently if designing
>                              *  UNIX today?  I'd spell creat() with an e."
>                              *  -- Ken Thompson, approx. wording */
>
>             f = 1.;         /* floating point constant */
>             j = 0.;
>             for (i = 0; i <= n; ++i) {
>                     fact[i] = f =#* j;      /* note spelling =#* not #*= */
>                     j =#+ 1.;               /* #+ for floating add */
>             }
>
>             return (f);     /* at least, I think the () were required */
>     }
>
>     TOPFACT = 10;   /* equivalent of #define, allows numeric values only */
>     fact[TOPFACT];

The BCPL equivalent would be:

    MANIFEST $( TOPFACT = 10 $)   // Equivalent of "const int TOPFACT = 10"
    LET infact (n) = VALOF
    $(
            LET f, j = 1., 0.

            FOR i = 0 TO n        // Declares i for the next block only
            $(
                    f #*:= j;     // := is assign, = is compare
                    fact!i := f;  // assignment doesn't return a value
                    j #+:= 1.
            $)
            RESULTIS f
    $)
    AND fact = VEC TOPFACT;       // As in B, allocates 0 to TOPFACT

The use of AND means that fact is declared throughout infact.

> The last line is of particular interest because it actually declares 12,
> not 10, words of storage. In B the subscripts run from 0 to the declared
> value, so [0] denoted a 1-element array. The extra word was a pointer
> initialized to the first element of the array

This worked in exactly the same way in BCPL, though the declaration used the keyword VEC, and subscripting was done with the ! operator instead of [].
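
The storage arithmetic can be modelled in C. This is a sketch under stated assumptions, not a description of any real compiler: memory is a flat array of cells, addresses are indices into it, the pointer cell is placed immediately before the elements, and the names memory, alloc_vec and subscript are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t word;  /* hypothetical untyped cell; also used as an address */

enum { MEMORY_WORDS = 64 };
static word memory[MEMORY_WORDS];
static word next_free = 0;

/* Model of "fact[top]" in B or "VEC top" in BCPL.
 * Reserves one pointer cell plus elements 0..top, and returns the
 * address of the pointer cell. */
word alloc_vec(word top)
{
    assert(next_free + top + 2 <= MEMORY_WORDS);
    word pointer_cell = next_free;
    memory[pointer_cell] = pointer_cell + 1;  /* points at element 0 */
    next_free += top + 2;                     /* 1 pointer + (top + 1) elements */
    return pointer_cell;
}

/* Model of v!i (BCPL) or v[i] (B): the cell at address v + i. */
word *subscript(word v, word i)
{
    assert(v + i < MEMORY_WORDS);
    return &memory[v + i];
}
```

With top equal to 10, alloc_vec advances next_free by 12 cells: the pointer cell and the 11 elements numbered 0 to 10.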

> If you wanted to deal with character strings, you could either store one
> character per word in an array and index them directly, or store one
> character per byte and access them with library functions.

Early versions of BCPL had the same problem, but most compilers had a % operator which offered byte subscripting of memory, making string access much simpler.

Alan Watson wrote:

> I'm not sure it was ever fully implemented.

As far as I know, it never was. According to one of my supervisors, writing a complete CPL compiler as part of one's B.A. degree at Cambridge is either a guaranteed pass or a guaranteed fail, depending on who the chief examiner is that year.

> Further trivia is that on IBM mainframes (or at least those running Phoenix),
> "/*" serves the same purpose as ctrl-d on a Unix system -- it terminates
> input from the terminal. Was the adoption of this as a comment delimiter
> an inside joke by Ritchie?

I doubt it (though DMR may contradict me, of course). Every compiler I remember using allowed both // and /* */ type comment delimiters. Some also allowed | and \ to be used instead of /, so that || was also a comment-to-end-of-line, and \* ... *\ was an alternate block comment symbol. The latter was particularly useful, because it could be used to comment out blocks of code that included /* ... */ comments (as with C, comments do not nest). We used comments with vertical bars to implement a variety of conditional compilation:

	|**||| IF
	normal code
	|*|||| ELSE
	alternate code
	|*|||| CANCEL ELSE
	more normal code
	|*|||| ELSE
	more alternate code
	|**||| ENDIF

By default, this would compile the "normal code". To switch to the "alternate code", the first line was changed to |**||* or |*|||| instead. Because this comment symbol was used, the code could contain normal comments and the "commenting-out" reverse comments I described above.
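
To see why the markers behave this way, it helps to trace them through a comment stripper. The C function below is a minimal sketch, assuming only the two rules described above (|| comments to end of line, |* ... *| is a non-nesting block comment); it ignores string literals and the / and \ comment forms, and strip_bar_comments is a name invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src to dst, dropping "||" line comments and "|* ... *|" block
 * comments. Newlines are always kept so line counts stay the same.
 * dst must have room for strlen(src) + 1 bytes. */
void strip_bar_comments(const char *src, char *dst)
{
    int in_block = 0;
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    while (src[i] != '\0') {
        if (in_block) {
            if (src[i] == '*' && src[i + 1] == '|') {
                in_block = 0;            /* "*|" closes the block comment */
                i += 2;
            } else {
                if (src[i] == '\n')
                    dst[o++] = '\n';
                i++;
            }
        } else if (src[i] == '|' && src[i + 1] == '*') {
            in_block = 1;                /* "|*" opens a block comment */
            i += 2;
        } else if (src[i] == '|' && src[i + 1] == '|') {
            while (src[i] != '\0' && src[i] != '\n')
                i++;                     /* "||" skips the rest of the line */
        } else {
            dst[o++] = src[i++];
        }
    }
    dst[o] = '\0';
}
```

Under these rules |**||| is a complete block comment followed by a line comment, so it has no lasting effect. |*|||| opens a block comment when read as code, and closes one (then starts a line comment) when read inside a comment, so each such marker flips the state. |**||* closes itself and then opens a new block comment, which is why it selects the alternate code.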

> the Phoenix system (layered over MVT and then MVS on the IBM 3081s and 3084s
> at Cambridge and London)

Actually, Phoenix was developed for MVT on an IBM 370-165. It was only much later (a few weeks before I graduated) that the 3081 replaced it.

And now, with my heart in my mouth:
Dennis Ritchie wrote:

> With less success, they also use library procedures to specify interesting
> control constructs such as coroutines and procedure closures.

I don't know where it came from, but there was a coroutine library for BCPL floating around the Cambridge scene. The fact that everything was just a word made this quite easy: there was only one non-portable statement in the entire package, to extract and replace the return address on the stack.

> When in BCPL one writes
> let V = vec 10
> or in B,
> auto V[10];
> the effect is the same: a cell named V is allocated, then another group of
> 10 contiguous cells is set aside,

At least in BCPL, and Mark Brader thinks also in B, the group would contain 11, and not 10, cells.

> Individual characters in a BCPL string were usually manipulated by spreading
> the string out into another array, one character per cell, and then repacking
> it later; B provided corresponding routines, but people more often used other
> library functions that accessed or replaced individual characters in a
> string.

This was true in early BCPL, and remained so if you were writing for maximal portability, but, as I said above, nearly all compilers provided a % operator to do byte subscripting:

    LET string = "This string ends with X"
    /* ... */
    string % (string % 0) := 'A'

This would change the last character of the string to an A.
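
This works because byte 0 of a BCPL string holds its length, so string % 0 is the index of the last character. A rough C equivalent follows, assuming 32-bit words with the string bytes packed in memory order; pack_string and byte_at are hypothetical helpers written for this example, not BCPL library routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of s % i: byte i of the storage that starts at word s. */
unsigned char *byte_at(uint32_t *s, size_t i)
{
    return (unsigned char *)s + i;
}

/* Pack a C string into BCPL form: byte 0 is the length, and the
 * characters follow in bytes 1..length. dst must provide at least
 * strlen(src) / 4 + 1 words. */
void pack_string(uint32_t *dst, const char *src)
{
    size_t n = strlen(src);

    assert(n <= 255);  /* the length has to fit in one byte */
    *byte_at(dst, 0) = (unsigned char)n;
    memcpy(byte_at(dst, 1), src, n);
}
```

After pack_string(s, "This string ends with X"), the assignment *byte_at(s, *byte_at(s, 0)) = 'A' corresponds to the BCPL statement above and overwrites the final X.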

> For example, B introduced generalized assignment operators, using x=+y to
> add y to x. The notation came from Algol 68

These operators were introduced into BCPL as well, using the exact notation of Algol 68 (x +:= y), probably by cross-fertilization from the Algol 68C compiler work at Cambridge.

> Finally, the B and BCPL model implied overhead in dealing with pointers
> [...]
> Each pointer reference generated a run-time scale conversion from the pointer
> to the byte address expected by the hardware.

As far as I know, every real compiler for byte-addressed systems did it this way, presumably on the assumption that arithmetic was more important than indexing. In reality, the opposite was more often the case, and it is surprising that no compiler generated code optimized for this, by ignoring the two least significant bits of every word (on a 32-bit system). This would make every address a byte address, while still preserving the property that consecutive words have addresses differing by 1 (which would be represented as 0...0100 in binary). The trade-off would be that the >> operator would have to be implemented as a shift followed by AND-ing out the last two bits, and the multiplication and division operators (which are relatively slow anyway) would require their arguments (and result for division) to be adjusted.
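
The scheme can be sketched in C to show where the adjustments fall. This is purely an illustration of the idea, not code from any compiler: it assumes unsigned 32-bit words carrying 30-bit values, and the type cell and the cell_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t cell;  /* hypothetical word: value held in the top 30 bits,
                           low 2 bits always zero */

cell cell_from(uint32_t v)   { return v << 2; }  /* 1 becomes 0...0100 */
uint32_t cell_value(cell c)  { return c >> 2; }

/* Addition and subtraction need no adjustment at all. */
cell cell_add(cell a, cell b) { return a + b; }

/* Multiplication: scale one argument down first. */
cell cell_mul(cell a, cell b) { return (a >> 2) * b; }

/* Division: the scale factors cancel, so scale the result back up. */
cell cell_div(cell a, cell b)
{
    assert(b != 0);
    return (a / b) << 2;
}

/* Right shift: shift, then AND out the last two bits. */
cell cell_shr(cell a, unsigned n)
{
    assert(n < 32);
    return (a >> n) & ~(cell)3;
}

/* Indirection: the cell is already a byte offset, so no scaling is
 * needed before handing it to the hardware. */
uint32_t cell_load(const unsigned char *memory, cell p)
{
    uint32_t w;
    memcpy(&w, memory + p, sizeof w);
    return w;
}
```

Adding cell_from(1) to an address steps it by four bytes, that is, to the next word, so the cost moves from every pointer reference to the less frequent shifts, multiplications and divisions.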
