Infinities as numbers: purging the epsilons and deltas from proofs

Part 6 of a six-part series of adaptations from Terence Tao’s book “Structure and Randomness“.

[1.5] Ultrafilters, non-standard analysis, and epsilon management

In the previous article we saw that fruitful analogies between finitary and infinitary mathematics can allow the techniques of one to shed light on the other. Here we borrow the power of infinitary math—in particular, ultrafilters and non-standard analysis—to simplify proofs of finitary statements.

Arguments in hard analysis are notorious for their profusion of “epsilons and deltas”, a more familiar example being the $(\epsilon, \delta)$-definition of convergence from high school calculus. One may have to keep track of a whole army of epsilons, some of which are “small”, “very small” (i.e. negligible as compared to even the “small” epsilons), “very very small” and so on. This “epsilon management” is exacerbated with those unsightly quantifiers (“for every $\varepsilon$ there exists $N$ such that…”) sprinkled within any statement. These quantifers need care to weave together, and need careful untangling to comprehend. To borrow a rather mild example from the previous article,

Finite convergence principle. If $\varepsilon > 0$ and $F$ is a function from the positive integers to the positive integers, and $0 \leq x_1 \leq x_2 \leq \dotsb \leq x_M \leq 1$ is such that $M$ is sufficiently large depending on $F$ and $\varepsilon$, then there exists an integer $N$ with $1 \leq N < N + F(N) \leq M$ such that $\lvert x_n - x_m\rvert \leq \varepsilon$ for all $N \leq n, m \leq N + F(N)$.

(Anyone knows of more convoluted examples?)

“Automating” epsilon management has progressed in “asymptotic notation” like the $O()$ family of notations, as well as the $\ll$– and $\sim$-type symbols, which rigorously formulate the respective qualitative ideas of “bounded by”, “much smaller than” and “comparable in size to”, without resorting to explicit quantities like $\varepsilon$ and $N$.

However, the absence of actual quantities inhibits detailed study; for instance, sums and products of “bounded numbers” (i.e. $O(1)$) are also bounded, but it’s meaningless (obstructed by an axiom of set theory) to say that the set of $O(1)$ is closed under addition and multiplication. Non-standard analysis solves this problem by adding new numbers into our number system, including infinities and infinitesimals.

Such “new numbers” are defined using non-principal ultrafilters, which are method to find the $p$-limit of any sequence $(x_n) = x_1, x_2, \dotsc$ of real numbers. If $(x_n)$ converges then the $p$-limit is the usual limit. However, the sequence $1, 2, 3, \dotsc$ has a $p$-limit of the ordinal $\omega$, which you can think of as “the smallest infinity”. You can get bigger infinities from the $p$-limits of sequences like $\omega, 2\omega, 3\omega, \dotsc$ (which understandably converges to $\omega^2$).

The standard real number system together with all possible $p$-limits forms the set of non-standard numbers, or hyperreal numbers. The analogue of an $O(1)$ number is then a hyperreal number that is smaller than some standard real number. In fact, this set is a ring and we can readily apply every insight from ring theory to it. This is made possible from the principle that non-standard numbers can be manipulated just like standard numbers, also known as:

Transfer principle. Every proposition valid over real numbers is also valid over the hyperreals.

This allows us to take reciprocals of infinities to get infinitesimals for use in calculus. In fact, allowing calculus to rigorously work with infinitesimals was a major motivation for the development of non-standard analysis. For any infinitesimal $\varepsilon$, the  $p$-limit of the sequence $1, \varepsilon, \varepsilon^2, \dotsc$ is much, much smaller than $\varepsilon$. This process can be iterated to churn out a heirarchy of infinitesimals that shrink at a ridiculous pace, simplifying epsilon management. Tao says that:

“it lets one avoid having to explicitly write a lot of epsilon-management phrases such as ‘Let $\eta_2$ be a small number (depending on $\eta_0$ and $\eta_1$) to be chosen later’ and ‘… assuming $\eta_2$ was chosen sufficiently small depending on $\eta_0$ and $\eta_1$‘, which are very frequent in hard analysis literature…”

I guess ultrafilters do not change a proof in essence but greatly simplifies its language, freeing one’s attention for the big picture, as opposed to wading in a swamp of $\forall\varepsilon_1\exists\varepsilon_2\forall N_1 \exists N_2 \gg N_1/(\varepsilon1\varepsilon_2)$.

Further discussion

The original blog post details several important limitations to the above properties. It also develops some interesting properties of ultrafilters, such as their connection to the usual limits (and identities concerning them) and propositional logic. Many of these properties are explained with a wonderfully illustrative analogy of an ultrafilter as a “voting system”: In a sequence $(x_n)$, each integer $i = 1, 2, \dotsc$ “votes” on some real number $x_i$, and the $p$-limit is the elected candidate. Different ultrafilters are distinguished by how much influence each integer has on the final decision (voting is unfair!)

The connection with propositional logic comes with asking each integer a yes-no question and evaluating the $p$-limit, which would be either yes ($1$) or no ($0$). A property $P(n)$ of integers (i.e. $n > 0$) is “$p$-true” (i.e. almost surely true) if the “decision” after asking the integers “do you satisfy $P$?” is “yes”. $p$-truths satisfy the laws of logic, but tautologically true statements are $p$-true!

References

[1.5] Tao, Terence. Ultrafilters, non-standard analysis, and epsilon management. In Structure and Randomness: Pages from Year One of a Mathematical Blog. American Mathematical Society (2008).