Day 030: Bridges in No Man's Land
Thirty days. I did not expect to still be going this strong at thirty days. Today K&R section 4.2 broke the illusion that C types are just labels. They are not. They are physical contracts. And when you break the contract, the CPU does not throw an error. It just grabs the wrong register and keeps moving.
What I Did
Worked through section 4.2, K&R’s atof implementation. The function
builds a floating-point value from a string, character by character.
Integer part, decimal part, scientific notation. The mechanics were
interesting but not the main event.
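A minimal sketch of those mechanics, in the shape of the book's version. I renamed it my_atof (my name, not K&R's) so it does not collide with the atof in <stdlib.h>, and I added unsigned char casts for the <ctype.h> calls. It handles only the sign, integer part, and fractional part; exponent handling for scientific notation is left out of this sketch.

```c
#include <ctype.h>

/* my_atof: convert string s to double.
   Sketch only: sign, integer part, fractional part. No exponent. */
double my_atof(const char s[])
{
    double val, power;
    int i, sign;

    /* skip leading white space */
    for (i = 0; isspace((unsigned char)s[i]); i++)
        ;

    /* optional sign */
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;

    /* integer part: shift left one decimal digit, add the new digit */
    for (val = 0.0; isdigit((unsigned char)s[i]); i++)
        val = 10.0 * val + (s[i] - '0');

    if (s[i] == '.')
        i++;

    /* fractional part: keep accumulating digits, remember the scale */
    for (power = 1.0; isdigit((unsigned char)s[i]); i++) {
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;
    }

    return sign * val / power;
}
```

The trick I like here is that the fraction is never treated differently from the integer part. Every digit goes into val the same way, and power just records how far to shift the decimal point back at the end.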
The main event was K&R’s warning about implicit declarations.
If you call atof before you declare it, a C89 compiler assumes it returns
int. That assumption gets baked into the machine code at the call site.
Then, on x86_64, when atof actually runs and puts a double into XMM0, the
caller is already reading RAX. Two registers. No overlap. No error. Just
silence and wrong data.
I typed out the calculator example and traced the declaration placement.
Then I started section 4.3 and got into extern and the definition versus
declaration question, which I am carrying into tomorrow.
The Questions That Came Up
Why can’t the compiler just look ahead?
My first instinct was: just scan the whole file first. But the C compiler is a transcriber, not a reader. It works through the file top to bottom in a single pass. When it hits a function call, it has to settle the type of that call using only the declarations it has seen so far. It does not go back and revise the call once the real definition shows up later. That constraint is a design decision from an era when memory was measured in kilobytes. The consequence is that omitting a prototype is not a style issue. It is an instruction to emit wrong machine code.
Does it matter if the declaration is inside main or at file scope?
Yes, but not in the way I first assumed. Inside main, the declaration is
private to that function. Every other function in the same translation unit
is back in No Man’s Land for atof. File scope fixes that for the whole
file. Neither option scales to a multi-file project. That pressure is
exactly what created header files.
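Here is a small sketch of the two placements. The function names (half, first_caller, second_caller) are made up for illustration and are not from the book.

```c
/* Block-scope declaration: only first_caller knows what half returns. */
double first_caller(void)
{
    double half(int);   /* private to this function body */
    return half(3);
}

/* At this point in the file, half is NOT declared at file scope.
   A call written here would fall back on the implicit-int rule under
   C89, and is diagnosed under C99 and later. */

/* File-scope declaration: visible from here to the end of the file. */
double half(int n);

double second_caller(void)
{
    return half(5);     /* covered by the file-scope declaration */
}

/* The definition itself, placed last on purpose. */
double half(int n)
{
    return n / 2.0;
}
```

Both callers work, but only because each one can see a declaration before its call. Move the file-scope line into a header and every file that includes it gets the same contract.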
The Feynman Test
I used to think a function’s return type was just information for the programmer. A note in the code.
It is not. It is a direction on a map.
On x86_64, when a function finishes and hands back a result, there are
rules for where that result goes. Integer comes back in RAX. Floating
point comes back in XMM0. The caller has to know which register to look
in before the function even runs. That decision is locked into the machine
code when the file compiles.
If the caller thinks it is getting an integer, it looks in RAX. If the
function actually returned a double and put it in XMM0, the caller misses
it completely. It reads whatever happened to be in RAX last. A memory
address. A loop counter. Anything.
The program does not crash. It just continues with hallucinated data. The CPU did exactly what it was told to do. It was told the wrong thing.
That is what a missing prototype costs you.
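To see the map for yourself, a sketch like this is enough. The function names are hypothetical, and the register comments assume the x86_64 rules described above. Compiling with something like cc -S and reading the generated assembly should show where each result travels, though the exact instructions depend on the compiler and optimization level.

```c
/* Integer return: on x86_64 the value comes back in RAX
   (EAX, its low half, for a 32-bit int). */
int return_int(void)
{
    return 42;
}

/* Floating-point return: on x86_64 the value comes back in XMM0. */
double return_double(void)
{
    return 42.0;
}

/* Both functions are declared above this point, so the compiler knows
   which register to read after each call and converts the int to
   double before adding. */
double add_results(void)
{
    return return_int() + return_double();
}
```

Nothing in the C source mentions a register. The return type is the only thing that tells the caller where to look.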
Hacker Connection
Entry 034 in the vulnerability notebook: Register-Level Type Confusion, CWE-843.
The exploitation angle is subtle but real. If an attacker can influence a value that gets used after an ABI mismatch, the corrupted result is deterministic based on register state. That is not random garbage. It is predictable garbage, which is a different problem.
The broader pattern carries past implicit declarations. Any time caller and callee disagree on type, the ABI breaks. Mismatched headers. Wrong prototype in a build system. A variadic function called with arguments that do not match its format string. The register confusion is the underlying hardware reality behind all of them.
C99 technically killed implicit function declarations, along with implicit
int. But legacy codebases, embedded toolchains, and -std=c89 builds kept
them alive. The pattern matters beyond the specific rule that created it.
What Is Next
Section 4.3 formally introduces extern and the distinction between a
definition and a declaration. I have a working intuition from today and
from the overnight question I answered this morning. Tomorrow I need to
find the precise line K&R draws between the two and why it matters for
everything that comes after.
Overnight question: what exactly does it mean to define a variable versus declare one, and what does storage allocation have to do with it?
Day 30 of 365. One month down. The machine is less of a black box, and No Man’s Land is ours.