On Hacker news, imh asks,

I never understood the step about how a system that can do basic arithmetic can express the "I am not provable in F" sentence. Does anyone have an ELI30 version of that?

I think this is a great question and I would like to try to answer it.

First of all, it is important to understand that we have quite a bit more than just basic arithmetic.
Let us define basic arithmetic as a language with symbols for the constants `0`

and `1`

, the operations for `+`

and `×`

, and the `=`

relation (anything similar to this will work as well).

On top of basic arithemetic we add the language of first-order logic.
That is we add logical operations for `and`

, `or`

, `not`

, etc., we add an infinite number of logical variables `x`

, `y`

, etc., and we add universal quantification, `∀`

, and existential quantification, `∃`

.

We provide logical and arithmetic axioms that defines these symbols to be our usual interpretation of them. The domain of the quantifiers is implicit in first-order logic. We intend our domain to be the natural numbers, and our axioms are compatible with this interpretation. Technically there will be other valid interpretations of our axioms, but that will not matter for our purposes today.

Our language of first-order logic over basic arithmetic is vastly more expressive than basic arithmetic alone. Building a formula in this language that expresses “I am not provable in F” requires three key ingredients.

Cantor pairing function

The first ingredient is the ability to define data structures. To do this, I will be using the Cantor pairing function, to define a functional relation that maps ordered pairs of natural numbers to a single natural number:

Pair(x, y, p) := p = ((x + y + 1) × (x + y)) ÷ 2 + y

To encode a pair of numbers, say 2 and 3, we plug them into the `x`

, `y`

parameters.
Then the variable `p`

is forced to be the value 18, which represents the ordered pair, 〈2, 3〉.
To decode a number, say 7, we plug it into the `p`

parameter.
Then the variables `x`

, and `y`

, which are natural numbers, are forced to be the values 2 and 1, which is the ordered pair that 7 represents.

We have a problem that the our definition of `Pair`

does not quite fit our definition of basic arithemtic because of the `÷`

operator.
However this is easily fixed by providing an equivalent definition that does not use `÷`

.

Pair(x, y, p) := 2 × p = (x + y + 1) × (x + y) + 2 × y

where `2 := 1 + 1`

(in general we will let

denote the sum of `n``n` occurances of the numeral `1`

).

Using first-order logic we can compose this ‘`Pair`

’ relation to encode and decode nested pairs and start building fancy data structures that you find in computer programming.
In particular, we can build data structures like lists or structures for the language of basic arithmetic or the language of first-order logic.
For example we may choose to represent a formula from basic arithemetic with the following recursive structure.

⸢0⸣ := 〈0, 0〉 ⸢1⸣ := 〈1, 0〉 ⸢X + Y⸣ := 〈2, 〈⸢X⸣, ⸢Y⸣〉〉 ⸢X × Y⸣ := 〈3, 〈⸢X⸣, ⸢Y⸣〉〉 ⸢X = Y⸣ := 〈4, 〈⸢X⸣, ⸢Y⸣〉〉

By using the Cantor pairing function, we can encode and decode these sturctures, letting us represent any basic arithemetic formula as a single natural number.

I should note that Gödel did not use the Cantor pairing function, and instead use prime decomposition. However, the computer scientist in me perfers the pairing function, and it has a simpler arithmetic relation that defines the encoding.

Gödel’s β function

The second ingredient is Gödel’s β function. While the Cantor pairing function can be use to build lists, the problem is that it only possible to access fixed location of that list. The β function encodes lists in such a way that it is easy to access a variable locations within the list. The β function is defined as

β(x, y, i) := x mod (1 + (i + 1) × y)

Gödel’s β funciton lets us encode a list of numbers, `a`

_{0}, `a`

_{1}, ..., `a`

_{n} with a pair of numbers `x`

and `y`

.
in such a way that for all `i`

less than `n`

, the `β(x, y, i) = a`

.
It is possible to choose _{i}`y`

’s value to find arbitrarly large and arbitrarily long arithemetic sequences of mutually coprime numbers.
Then, given a sufficent `y`

value, the Chinese remainder theorem lets us encode the content of the list as a single value `x`

.

Below we define β as functional arithemetic relation.

BETA(x, y, i, a) := ∃q. x = q × (1 + (i + 1) × y) + a and ∃z. a + z = (i + 1) × y

`BETA`

is defined so that `BETA(x, y, i, a)`

holds if and only if `β(x, y, i) = a`

.

Using the Cantor pairing function we could pair up the values `x`

and `y`

and `n`

into a single number that represents any list of numbers.
Using this new ability we can now define a structure that represents a trace for arbitrary computations.
For example we can define a relation that defines the substitutions for our encoding of formulas of first-order logic.

SUB(f, i, t, a) := “there exists a trace of the computation of substituting the arithmetic term encoded by t

into the variable labeled by i within the formula f and the result of that computation is a.”

the result being that the functional relation `SUB(⸢φ⸣, i, ⸢t⸣, a)`

that holds if and only if `a`

equals `⸢φ[x`

.
_{i}↦t]⸣

Similarly we can define a functional relation `CODE(x, y)`

such that it holds if an only if `y`

equals ⸢1 + 1 + … + 1⸣ where there are `x`

number of occurances of `1`

. We can also define predicate `PROVABLE_IN_F(x)`

that defines what it means for an encoding of a formula to be provable in some axiomatic system `F`, as long as it is decidable what the axioms of `F` are.

Quines

We now have a basic idea how we can specify datastructures and computations of those data structures using arithemtic. But how do we build a self-referential formula? While we can define arbitrary computations using first-order logic, it is not like we can write a computer program with a variable that has a contains a copy of the entire program itself. Or can we?

A Quine is a program that that prints out its own source code. However, it is just as easy to write a program that creates a variable contains the entire source code of program itself; a self-referential program! The technique to do this is the same as found in the proof of Kleene’s recursion theorems, and the implementation of the Y Combinator. In this way we can build self-referential arithemtic formulas.

Theorem. For any formula `ψ(y)`

, there exists a formula `φ`

such that `φ`

is equivalent to `ψ(⸢φ⸣)`

.

In other words, if we have a formula with a free variable `y`

, we can build a self-referential formula by subsituting that variable with the encoding of the resulting formula.
Like the Y Combinator, the proof of this theorem is both short and confusing.

Proof.
Let `θ := ∃y. ψ(y) and ∃z. SUB(x`

.
Let _{0}, 0, z, y) and CODE(x_{0}, z)`φ := θ[x`

.
Then _{0}↦⸢θ⸣]`φ`

is equal to `∃y. ψ(y) and ∃z. SUB(⸢θ⸣, 0, z, y) and CODE(⸢θ⸣, z)`

.
By the property of `CODE`

, the variable `z`

, must be equal to `⸢⸢θ⸣⸣`

,
and by the property of `SUB`

, the variable `y`

, must be equal to `⸢θ[x`

, which is equal to _{0}↦⸢θ⸣]⸣`⸢φ⸣`

.
Thus `φ`

is equivalent to `ψ(⸢φ⸣)`

.
Qed.

Now we can finally build the self-referential Gödel formula by applying `ψ(y) := not (PROVABLE_IN_F(y))`

to the above theroem to get a formula `φ`

such that `φ`

is equivalent to `not (PROVABLE_IN_F(⸢φ⸣))`

.