Dissecting the UNIX v6 Allocator

ljrk

2021-04-10

I recently stumbled upon the first known-to-me libc implementation of the alloc and free functions (before being renamed to malloc ¹). You find the source code on the website of The UNIX History Society here.

Reading the code is not that easy, since much of C has changed since 1975, which was even 3 years before the 1st edition K&R was released.

The code is nonetheless interesting, as it’s probably one of the most compact freelist allocators out there, with much care taken to make it efficient.

Early C Prerequisites

Structures

According to Brian W. Kernighan, the struct keyword was the last keyword that had to be added to C for Ken Thompson to succeed in porting the pure assembly UNIX to the more portable C. Early C structs are quite different from their modern counterparts. For one, they weren’t types.

Let’s take the following struct definition from alloc.c:

struct fb {
    char    *size;
    char    *next;
};

This looks almost as expected, although it might be a bit confusing that the size is stored in a pointer, but we will come to that later.

Unfortunately this structure definition didn’t supply us with a type, so we couldn’t just create a list from it:

struct fb *freelist = { 0, -1 };

To understand why, let me give you a valid line of historic C instead:

100->next = 42;

The struct in old C was just a way to structurally address arbitrary memory. The -> implies we are dereferencing memory as the above is equivalent to:

(*100).next = 42;

This line writes to the word (2 Byte value) on address 102–103 with the value 42. The compiler reasons that -> (or .) implies that the address to the left is an address to a structure. Furthermore, the right-hand side specifies that we want to access the second element of that structure, which is offset from the first by 2 Byte (the first element of the structure is the 2 Byte char *size).

But: How does the compiler know that we want to use struct fb and not any other structure with a next member? Well, simply because there isn’t. All structures share a namespace, but luckily, as we can see in the file, there are no headers which could mess up our namespace. The compiler searches for the first structure to have a next member and rolls with it. Simpler times.

No `size_t`

Let’s revisit the structure member char *size.

Early C didn’t have unsigned integers such as size_t (or even simply unsigned), pointers were thus the way to get unsigned integers ². Incidentally, a pointer is naturally able to hold arbitrary addresses (and thus sizes) to memory. Furthermore, a pointer to char behaves just like normal integers when it comes to arithmetics, pointer arithmetics doesn’t make a difference here.

In C, if you declare a pointer variable T *p and write p+1, the result of the expression is not the directly following “address” in memory. That is, if p = 42 then p+1 isn’t necessarily 43, but 42 + \text{sizeof} (T). More generically, p+n evaluates to the address of p+n \cdot \text{sizeof} (T).

This makes it easy to iterate over consecutive elements of p in memory. Since a char is guaranteed to be of sizeof (char) == 1 byte size, there’s no difference between usual arithmetics and pointer arithmetics for this pointer type.

K&R Function Definitions

While nowadays we are used to C function prototypes of the form:

int main(void);
int main(int argc, char *argv[]);

Early C didn’t have such a thing. In fact, a forward declaration of a function didn’t contain any information about the number of arguments, at all ³:

int main();

Only functions could specify arguments, but even then: Instead of a parameter list containing both argument name and type in the parenthesis, programmers used to only list the identifiers:

int main(argc, argv)

The types were either implicit, or given after the closing parenthesis and the opening brace of the function block ⁴:

int main(argc, argv)
int argc;
char **argv;
{
    return (0);
}

Implicit types? Yes, any variable (or function) without a (return) type was, and is, implicitly int.

Thus, the canonical way for a forward declaration would have been:

main();

And for the function definition:

main(argc, argv)
char **argv;
{
    return (0);
}

Freelist Algorithm Overview

Before diving into the details of the algorithm implementation, let’s cover the idea.

An Empty Freelist

The initial freelist is defined as

int freelist[] {
    0,
    -1,
};

For a more modern C this almost only lacks the assignment =, as in the early time of the language, global variables were not “assigned to”, since this would mean something active, something that happens on runtime.

This definition creates the initial head element of our freelist. We remember, we can’t use the struct fb type so instead an int[] is chosen. On the original PDP an int was of the same size as a char*, both one word or 2 Bytes.

Thus, the definition would nowadays have been written:

struct fb *freelist = {
    0,
    -1
};

The Structure of the Freelist

The linked list starting at the head *freelist contains blocks of struct fb (2 times 2 Byte), which store the metadata of that free block: Its size and the next in line. Used memory is not recorded in this list. The size recorded in a free-block includes the size needed for the metadata itself, with the obvious exception of the head. The last entry of the list is marked with next == -1.

If, for example, we have 100 B of contiguous free memory, the list could look like this:

[ 0 | . ]    [ 104 | -1 ||  <100 B of free memory>  ]
|     \_____/^
 \
  HEAD

Our head element’s next entry points to a not necessarily immediately following memory location. This location contains another block, with the entry size = 104 (4 Bytes extra for the block itself), and a next = -1 to terminate the list. Immediately following the block are 100 Bytes of free memory that were acquired earlier.

If we have multiple free memory locations, then they are listed in increasing order by memory location:

[ 0 | . ]    [ 104 | . ||  <100 B of free memory>  ]  [ 8 | -1 || <4 B> ]
|     \_____/^        \______________________________/^
 \
  HEAD

We guarantee that freelist->next < freelist->next->next.

Allocated Memory

But what happens if the first block of 100 Bytes is to be allocated and given to the user? We need to remove it from the list, but we will keep the metadata. However, as it’s not part of the list anymore, it doesn’t maintain a next pointer, so the only metadata that’s kept is the size:

[ 0 | . ]    [ 104 ||   <102 B of allocated memory>   ]  [ 8 | -1 || <4 B> ]
|     \______________________________________________/^
 \
  HEAD

This also means that a free-block of 104 Bytes size actually reserves enough memory for an allocated block of 102 Bytes! Thus, the right allocation request to alloc to have a perfect fit would be alloc(102).

The memory returned by alloc starts just after our size metadata of 102. An experienced user can use this knowledge to read the first two bytes of the returned memory in order to find the pointer to the next free-block ;)

Freeing Memory

The size metadata is important when we release the memory again. The user passes the pointer to the formerly allocated memory block to free.

[ 0 | . ]    [ 104 ||   <102 B of allocated memory>   ]  [ 8 | -1 || <4 B> ]
|     \_____________^___________________________________/^
 \                  |
  HEAD               passed by user

We then subtract 2 Bytes from the memory location to create a pointer to the metadata block containing the size = 102 of the block to-be-freed. We then proceed to re-link it into the list by writing a next entry which pointing to the following free list block and hooking up the HEAD to point to the newly freed block. That way, we recreate the status from before.

If our freed memory block happens to be adjacent to another free block, we will merge them. More on that later.

Giving the Program a Break

Before we can see how alloc/free manage the free program memory, we need to understand how a program actually gets hold of said memory.

On UNIX (and Linux), a program is sectioned into parts that contain code (.text), initialized data (.data) and uninitialized data (.bss).

Furthermore, when the program is mapped from the disk into memory, these sections are located at the lower addresses, with .text being the lowest and .bss being the highest. The end of .bss is also called the “program break”.

_____________ 
[ .bss  ]    ^ program break
[ .data ]
[ .text ]

Everything after that is, at first, unallocated memory. There are, in general two ways to allocate memory from this “space”. The program heap which grows starting from the program break towards the higher addresses.

|   ^   |
|   |   |    < Heap area
|_______|____ 
[ .bss  ]    ^ original program break
[ .data ]
[ .text ]

The second way to allocate memory is the stack which grows from the highest addresses downwards to the lower ones.

________
|   |   |
|   ↓   |    < Stack area
|       |
|_______|
|       |
|       |
|       |    < Yet unallocated area
|       |
|_______|____
|       |    ^ new program break
|   ^   |
|   |   |    < Heap area
|_______|____ 
[ .bss  ]    ^ original program break
[ .data ]
[ .text ]

If you go overboard and there are no security restrictions both areas can meet and overwrite each other.

The stack is only used for so-called automatic storage allocation, which is what happens if you simply declare a variable at the beginning of a function: It will be automatically allocated and when the function terminates, it will be deallocated. This means, a variable cannot be passed “out” from a function to its caller when it was allocated by the callee. The upside is, it’s fast and easy; you simply decrement the CPU stack pointer register, which points to the lowest address in the stack (i.e., top of stack). This is enough to consider the storage allocated.

We are considering dynamic allocation, however. To increase the heap area we need to move the program break up, which can only be done by the operating system kernel. The sbrk system call does just that. We pass it the amount of additional storage we need and it will move the program break accordingly.

We are returned the start of the newly allocated memory.

The Algorithm

Part I: Allocation

How Much Memory Do You Need?

According to the man page alloc(3), the complete signature of the alloc function would look like:

char *alloc(char *size);

First, three new variables, all of pointer type char* are declared. The code checks whether the allocation size asize == 0, and if so, simply returns the null pointer. In the same statement, it assigns size the value of asize. For those less comfortable with C, the following happens:

      1. assign size = asize
     ____|______
    /           \
if ((size = asize) == 0)
    \_________________/
             |
           2. check expr for 0

Any expression, such as 1+3 has a value when evaluated (in this case, 4). It turns out that a=42 is an expression, and as such, has a value when evaluated—next to the side effect of a being set to 42. This value of an assignment expression is the value that is being assigned, so a=42 evaluates to 42. We can use this value as part of another (assignment expression):

b = (a = 42); /* sets a to 42, then b to 42 */

Due to the order of evaluation, this is identical to the common idiom of:

b = a = 42;

With this knowledge we understand that the above if-statement could be rewritten as ⁵:

size = asize;
if (size == 0) /* or: asize == 0 */

What follows are the rather obscure lines

size =+ 3;
size =& ~01;

Disregarding the fact that nowadays we write += and &= ⁶, the meaning of these two lines is rather difficult to decipher. It’s clearer when we rewrite it:

size += 3;
if (size % 2 != 0) { size--; }

That is, we increment by the magic number 3 and, if the result was odd, decrement by 1 again. Another alternative writing would be:

if (size % 2) {
    size += 2;
} else {
    size += 3;
}

We add at least 2 to the value of size and, if the result is odd, we add another one for good measure.

The goal of these lines is to guarantee the following:

size >= sizeof (struct fb)
size >= asize + sizeof (char*)
size is even.

The first follows from size > 0—if it were 0 we’d have jumped the ship earlier. Thus size >= 1. If it’s indeed equal to 1 we add 3 (1 is odd), if it’s 2, we add 2; in case of 3 we add 1 and in case of 4 we are already done (the condition is fulfilled). But we still add 2!

This is because of the 2nd guarantee, a char* is one word or 2 Bytes in size.

But why do we want to assert these properties? The last property is (to my best guess) due to alignment issues, performance and/or aesthetics. Even numbers are nice!

The second point is needed as we usually need to create a new block. That is, if we don’t have a perfect fit as shown in the overview, we want/need to split a free-block.

Considering the example from before:

[ 0 | . ]    [ 104 | -1 ||         <100 B of free memory>           ]

An allocation request of 10 Bytes would result in:

[ 0 | . ]    [ 12 ||   <10 B>   ][ 92 | -1 || <88 B of free memory> ]

Due to the additional allocation block metadata to store the size of 12, we actually consumed 12 Bytes for a request of 10! Thus, when searching for free memory in the list, a space of 11 wouldn’t have sufficed!

This leaves us with a justification for the first requirement, clearly 2 Bytes [to store the size] ought to be enough for every allocation?

Indeed, for these small sizes of char* (2 Bytes) from point 2 and 3 the first one even follows: \begin{align*} &\; \text{size} > 2 \land \text{size} \equiv 0 \pmod 2 \\ \Rightarrow &\; \text{size} \geq 4 \end{align*} Yet, if the next and size members were of different types (or on an architecture with 64 bit / 8 Byte pointers), then this isn’t the case.

And, while in theory, yes, the first requirement wouldn’t be needed, however, we will stumble on code that will require the new free-block node to not take up memory previously occupied by the old free-block node.

Let’s demonstrate this with an allocation of just 3 Bytes (violating rule 3):

                 4 Bytes
              _______|_______
             /                \
[ 0 | . ]    [ 104 | -1       ||     <100 B of free memory>     ]
[ 0 | . ]    [ 3  || <1 B> ][ 101 | -1 || <99 B of free memory> ]
              \____________/\__________/
                    |            |
                 3 Bytes      4 Bytes

                            ^^^^^^\ 1 Byte overlap

The old free-block node containing the tuple (104, -1) overlaps with the memory occupied by the new one (101, -1). Although the memory isn’t used anymore by the old node, since we shrunk it to 2 Bytes plus 1 Byte of allocated memory, the code is written in a way that this poses a problem. We will come to the why later.

In Need of More Memory

We now enter an unbounded loop—but we won’t search forever, luckily.

/* outer unbounded loop */
for (;;) {
    /* inner loop:  iterate through freelist */
    for (cp = freelist; (np = cp->next) != -1; cp=np) {
        /* ... */
    }
    /* no big enough free-block found */

    /* ... */
}

We can roughly divide our problem of allocating N bytes of memory into two cases:

There’s a big enough free-block already available.
There isn’t.

The first case we will come back to later, let’s just say that the inner loop for (cp... deals with finding, allocating and returning that memory in that case. That means, if there’s enough memory available, neither will the line after said inner loop asize = size < ... execute nor will the outer unbounded loop actually loop.

Given that there wasn’t a big enough block, we need to request more memory from the operating system kernel. Since this is an expensive task, we will always request at least 1 kilo Byte, even if less are needed (note that 1024 Byte also fulfill our three rules!).

In case that size >= 1024 we allocate size, otherwise 1024. The request is done using the earlier mentioned sbrk syscall, which returns -1 on error, which we simply pass on.

The next part is interesting, let’s consider what happens if we didn’t allocate anything yet, that is, only the HEAD is in the list.

                 &cp->next
                     \
[ 0 | -1 ]    [ 1024 ||  <1022 B of allocated memory>  ]
|             ^\
 \              cp, returned by sbrk
  HEAD

We are returned with a chunk of 1024 Bytes (or more, if size > 1024) of memory. We then write the 1024 into the first two bytes of the chunk and… call free on the address that follows directly thereafter!

The idea is that to add a free block of 1024 Bytes to the freelist, we first create an artificial allocated block of 1024 Bytes and call the free function on it. The way our free function works (we will cover the details later) it will simply hook up this piece of memory with the freelist, resulting in:

[ 0 | . ]    [ 1024 | -1 ||  <1020 B of free memory>  ]
|     \_____/^
 \
  HEAD

Brilliant! We now have reduced the case of inserting a free block to first inserting an allocated block and then freeing it.

Further, we have reduced our second case “there’s not enough memory to accommodate the request” to the first! As we reach the end of the outer loop, we now enter the inner loop which searches our freelist for a fitting block.

Since we just inserted one, this should succeed!

Searching For A Heart Of Gold

Now we can consider the inner loop, how does it actually work? We start by defining a current pointer, starting at freelist. At the start of each iteration we also assign a next pointer np to be np = cp->next, making sure that np != -1 as this would mean that np has reached the end of the list.

In an empty freelist contains just the HEAD with next = -1 and will thus immediately return.

Since a non-empty free list at least has two free-blocks, the first iteration will have cp = freelist and np = freelist->next when entering the loop body.

We are greeted with a big if-statement containing the whole body of the loop. For readability, we will negate it’s condition, rewriting the code in the loop as such:

if (np->size < size) {
    continue;
}

if (size+slop >= np->size) {
    cp->next = np->next;
    return(&np->next);
}
cp = cp->next = np+size;
cp->size = np->size - size;
cp->next = np->next;
np->size = size;
return(&np->next);

This means, if our np (we skip over the HEAD pointed to by cp) is not big enough for the request, we simply skip the rest of this iteration and jump to the next.

However, if we did find a block that’s large enough we take it. This is a first-fit strategy: We don’t bother finding a more suitable block.

The next if-statement checks whether we have a “perfect fit”, that is, if the requested amount of memory is precisely the one offered by this block. But the line doesn’t read:

if (size == np->size)

To be fair, we are a bit sloppy here with the definition of “precise” or “perfect”. We do allow the free-block to be slop amount larger than requested. Thus, if size is 42, slop is 2, then a free-block of 44 would be also considered a perfect fit!

Note here, that the 42 already includes the amount of memory to possibly add another metadata block, the actual request may have been 40 or 39!

As documented in the manual of Research UNIX v5 (but oddly enough not in my v6 man page) about the slop value:

The external variable slop (which is 2 if not set) is a number such that if n bytes are requested, and if the first free block of size at least n is no larger than n+\text{slop}, then the whole block will be allocated instead of being split up. Larger values of slop tend to reduce fragmentation at the expense of unused space in the allocated block.

The “fragmentation” mentioned in this snippet refers to external fragmentation only, since the “unused space in the allocated block” is also known as internal fragmentation.

So in case of a “sloppy perfect fit” we enter the if statement and simply remove the block in question from the free list, and return its address.

Thus for an allocation request of 102 (or 100/99 before rule-of-three adjustment) in the following case:

[ 0 | . ]    [ 104 | . ||  <100 B of free memory>  ]  [ 8 | -1 || <4 B> ]
|     \_____/^        \______________________________/^
 \
  HEAD

This happens:

              &np->next
                   \
[ 0 | . ]    [ 104 ||   <102 B of free memory>     ]  [ 8 | -1 || <4 B> ]
|     \_____x         x______________________________/^
 \          \_________/
  HEAD

The special case where np is the last element of the list works as well, since we just copy the -1 from np->next to cp and thus mark the previous element as the new last.

Memory Splitting

If our block doesn’t even fit “sloppily”, we split it to conserve memory (returning a 1024 Byte block for a request of 5 Byte does seem a bit… extra).

We do that by effectively moving the old freeblock that we want to split further down in memory. Again, considering this example and an allocation request of 42 Bytes, after adjustment:

[ 0 | . ]    [ 104 | . ||  <100 B of free memory>  ]  [ 8 | -1 || <4 B> ]
|     \_____/^        \______________________________/^
 \
  HEAD

First, we calculate the position of the new freeblock node, which must be 42 Bytes farther to the right than the current freeblock of 104 Bytes size, to which np points.

                           cp
                            \
[ 0 | . ]    [ 104 | . ||  <100 B of free memory> ]  [ 8 | -1 || <4 B> ]
|     \               \_____^_______________________/^
 \     \___________________/
  HEAD

We first overwrite the next pointer of cp with it, hooking up the freeblock-to-be with its predecessor (in our case HEAD), and then make cp point to our newly-made block.

This points right into the free/unused area of the original freeblock.

As a next step, we calculate the size of our new / just moved freeblock to be np->size - size, that is, in our case, 104-42 = 62. We now proceed to store this information at the address pointed to by cp->size.

            np             cp
             \              \
[ 0 | . ]    [ 104 | . ][ 62 | ? || <58 B free> ]  [ 8 | -1 || <4 B> ]
|     \               \_____^_____________________/^
 \     \___________________/
  HEAD

And this is where the rule one is crucial: Consider the alternative case of a request of 3 Bytes (violating rule 3 and 1): Since cp->size would point to a value not after the old node, but in the new node, we’d calculate the new size of 104-3 = 101, but we will store it in a location that was previously occupied by the latter part of the original freeblock. Specifically, np->next, the pointer to the next element of the old block.

                    np->next
                    ___|___
                   /       \
[ 0 | . ]    [ 3  ||   .    ]. |     ]  [ 8 | -1 || <4 B> ]
[ 0 | . ]    [ 3  ||   ][ 101  | ... ]  [ 8 | -1 || <4 B> ]
                        \______/
                            |
                         cp->size

                        ^^^^^\ 1 Byte overlap

And indeed, overwriting np->next would prove catastrophically in the next step: Making np’s successor the successor of cp.

We take the value of np->next (hopefully not overwritten by cp->size) and copy it to cp->next. We don’t bother “deleting” the old reference of np->next, as soon as we return the memory to the user, they will likely override it anyway—and why bother (security!)?

            np             cp
             \              \
[ 0 | . ]    [ 104 ||  ][ 62 | . || <58 B free> ]  [ 8 | -1 || <4 B> ]
|     \________________/^      \__________________/^
  HEAD

What remains is to update the metadata of the allocated block. Currently it still is set to 104, but the block we will return is only 42 Bytes in size since that was the requested amount.

               &np->next
            np    \        cp
             \     \        \
[ 0 | . ]    [ 42 || <40 B> ][ 62 | . || <58 B free>] [ 8 | -1 || <4 B> ]
|     \_____________________/^      \________________/^
 \
  HEAD

We return &np->next and are done!

I want to be^Wbreak free!

We already touched upon the usage of free when discussing the case of allocating memory that we first have to request from the operating system kernel. From the memory returned by the kernel we crafted an “artifical” used-block and passed it to free in order to hook it up with our free list. And that’s pretty much what we’ll be doing—plus some tricks!

First, let’s re-consider the introductory example:

           ptr    aptr
             \     \
[ 0 | . ]    [ 104 ||   <102 B of allocated memory>   ]  [ 8 | -1 || <4 B> ]
|     \__________________________________________________/^
 \
  HEAD

The user passed the address to the 102 Bytes of allocated memory as aptr. We calculate aptr-2 to create a pointer ptr to the used-block and proceed to search through the freelist for free-block directly before the ptr to be cp (and np to be the one after, cp->next).

Note that since the last element of the list has next = -1 but pointers are treated as unsigned, this is actually the largest representable number. Thus, if cp->next = -1 (cp is the last element), the loop terminates.

In our case, cp is the HEAD, and np points to the tuple (8, -1).

The next two if-else statements check whether

The freed block is immediately followed by the next free-block np, and
The freed block is immediately precursed by the previous free-block cp.

In our example there is a gap between either, and the calculation will thus in both cases determine that the else-clause is to be taken:

 cp+cp->size                                 ptr+ptr->size
cp    \    ptr                                       \ np
\      \     \                                        \ \
[ 0 | . ]    [ 104 ||   <102 B of allocated memory>   ]  [ 8 | -1 || <4 B> ]
      \__________________________________________________/^

         ^^^^^\ gap                                    ^^\ gap

This simply results in the just following two changes:

ptr->next = np;
cp->next = ptr;

And suddenly we are back as a free block:

cp         ptr                                         np
\            \                                          \
[ 0 | . ]    [ 104 | . ||  <100 B of free memory>  ]  [ 8 | -1 || <4 B> ]
      \_____/^        \______________________________/^

(E)merging Freedom

But what if our to-be-free block is directly followed or preceeded by another free-block?

Let’s look at our allocation of 42 Bytes. The pointer to the 40 Byte memory area is passed as aptr, we subtract 2 as this is the size of the size element and reach ptr. We find the surrounding free-blocks cp and np:

cp          ptr   aptr    np
 \           \     \        \
[ 0 | . ]    [ 42 || <40 B> ][ 62 | . || <58 B free>] [ 8 | -1 || <4 B> ]
      \_____________________/^      \________________/^

We calculate ptr+ptr->size, and since starting from the address of ptr we have 2 Bytes of metadata plus 40 Bytes of use data, we reach the end of the block—which coincedes with the next block, np.

Instead of simply linking the block, as done when not merging, we use our new block and add the following block to its size, resulting in 42+62=104.

cp          ptr   aptr    np
 \           \     \        \
[ 0 | . ]    [ 104|| <40 B> ][ 62 | . || <58 B free>] [ 8 | -1 || <4 B> ]
      \_____________________/^      \________________/^

We then link the successor, but instead of linking ptr->next to np, we skip the following block and link:

ptr->next = np->next;

This is, because np will be consumed by our block of now newly 104 Byte size:

cp          ptr   aptr    np
 \           \     \        \
[ 0 | . ]    [ 104 |    .    ][ 62 | . || <58 B free>] [ 8 | -1 || <4 B> ]
      \__________________\ _/^      \________________/^
                          \_________________________/

Again, we don’t bother overwriting the earlier np, the 100 Bytes following ptr->next are considered unallocated. Thus, cleaned up, our graphic looks like:

cp          ptr   aptr    np
 \           \     \        \
[ 0 | . ]    [ 104 | . || <100 B of free memory> ] [ 8 | -1 || <4 B> ]
       \_____________\ /^                         /^
                      \__________________________/

And that’s it for the merging with the successor ⁷. Obviously, cp->next still needs to be changed, or merged if needed. Merging with predecessor is analogous to the merging of the successor and thus omitted here.

Summary

At below 80 LOC this allocator is remarkably concise with much thought given to any single character written. While nowadays we would, without a question, use braces, comments, variables etc. more freely, it’s awe-inspiring how much can be achieve with so little.

Indeed, keeping code as short as reasonable (but no more), is something that UNIX always strove for. And while we are producing ever more code, we sometimes should probably take a step back and what we are producing. Or, to use Dijkstra’s words, we shouldn’t think about how many lines of code we’ve “produced” but refer to it as “the number of lines of code spent” ⁸.

Final remarks

The value of slop is not completely arbitrary. For too small values of slop, to my understanding, memory can corrupt.

The problem is when on allocation free-blocks are splitted into two blocks, one allocated, one free, and the latter has a size smaller than one free block structure. This can trigger a similar memory violation as rule 1 tries to guard against, at least in my experiments.

The kernel already had malloc and mfree functions before. The libc’s variants were renamed in Research UNIX v7.↩︎
Pointed out by @fuzxxl.↩︎
Due to backwards-compatibility with K&R C, the standards up to C17 still interpret a declaration of foo(); outside of the implementation with the body as unspecified number of arguments. In order to tell the compiler that the function takes no arguments, a declaration of the form foo(void) is required. The former declaration would allow a call to foo(42) without warning or error (although, possibly, program crash).↩︎
Did you ever ask yourself why the {} could be omitted in statements like for, while, if and so on, but not in function definitions? Probably not, but that’s why: Disambiguating type specifications from variable declarations in the function body!↩︎
Alternatively, using the comma operator (not to be confused with the , as argument separator: if (size = asize, size == 0).↩︎
The =X operators are ambiguous: Does a=-5 mean a = (-5) or a =- 5?↩︎
The line np = ptr is unnecessary, as np is never accessed afterwards anymore.↩︎
Transcription of the talk here ↩︎