Question on this proof?

Question

anonymous · Answer

Let I_1,I_2,I_3,\dots be half-open intervals in the real line, with the length of I_n equal to 2^{-n}. In other words, I_n is an interval of the form [a_n,a_n+2^{-n}). Can the union of these intervals contain every real number in the interval [0,2)?

At first sight, the answer seems obvious: the lengths of the intervals I_n add up to 1 (since they are \frac 12, \frac 14, \frac 18, and so on). Since the interval from 0 to 2 has length 2, we don't have enough length to cover this interval.

Since that reasoning sounds quite convincing, let us try to turn it into a rigorous proof. It might go something like this. Let us imagine that we place the intervals I_n down one by one, possibly overlapping, on to the real line. When we put down the first one, then four things can happen: we may miss the interval [0,2) altogether, we may overlap it and contain some numbers larger than 2, we may overlap it and contain some numbers less than 0, or our interval may be contained in [0,2). In the first case, what is left uncovered is all of [0,2); in the second it is an interval of the form [0,a) with a\leq 2; in the third it is an interval of the form [a,2) with a\geq 0; and in the fourth it is a set of the form [0,a)\cup[b,2) with b-a=1/2. In all four cases the uncovered points form a union of at most two intervals, with total length at least 3/2.

Continuing in this vein, we can prove quite easily (and rigorously) that after we have placed the intervals I_1,\dots,I_n, we are left with a finite union of disjoint intervals of total length at least 2-\frac 12-\frac 14-\dots-\frac 1{2^n}. If we write R_n for this union (the letter R standing for "residue") then we find that R_1\supset R_2\supset \dots and that every R_n is a finite union of disjoint intervals with total length greater than 1.

Surely in that case it follows that the intersection of the sets R_n has total length at least 1. Why can't we just say that and have done with it? The problem is that we have to say what we mean by "length", and while that is easy enough for finite unions of disjoint intervals (just add up the lengths of the intervals, where the length of the interval (a,b) is defined to be b-a, and similarly for closed and half-open intervals), it is not quite so obviously easy for intersections of infinitely many such unions, even if they are nested. For example, the Cantor set is a pretty complicated set, but there is no difficulty in representing it as the intersection of a nested sequence of finite unions of intervals.

Perhaps we could define the length of such a set to be the limit of the (decreasing sequence of the) lengths of the finite unions of disjoint intervals. Doesn't that solve the problem in a nice simple way? Unfortunately not, because if we do that then we must prove that this definition is consistent. That is, we must prove that if we represent the same set in two different ways as a nested intersection of finite unions of disjoint intervals, as we undoubtedly can, then we will arrive at the same answer for the length. This is directly important to us, since otherwise it is conceivable that we could assign a length of 1 to the empty set, and if we could do that then we could not say with any confidence that the intersection of the sets R_n is non-empty, which is what we are trying to prove.

Here is another possible argument. Let us place the intervals down in such a way that each one covers as much of [0,2) as possible. Clearly, the best way to do this is to make them disjoint. For example, we could set I_1=[0,1/2), I_2=[1/2,3/4), I_3=[3/4,7/8), and so on. But then the union of all the I_n is [0,1). Since this union was clearly as big as it could possibly be, it is impossible to cover all of [0,2).

What is wrong with this argument? The answer is more or less the same as before. What was meant by "as much ... as possible"? That suggests some notion of size for subsets of [0,2), and for the argument to work, that notion of size must apply to the union of the intervals I_n, and once again we find ourselves needing to prove that such an assignment can be done in a consistent way.

Now this can be done: the appropriate notion is called Lebesgue measure. However, it is not at all straightforward to set up Lebesgue measure, and if you try to do it you will find that at some point you need to convince yourself of a fact that is very similar to the fact that we are trying to prove. So even this answer is somewhat question-begging.♦

It is at this point that we should try to follow the advice in the title of this article: we shall see if we can prove in a rigorous way that our approaches so far cannot establish the statement that we are trying to prove. But to do this we need to be very clear about what our approaches are. Here is an attempt to do this.

    First approach: if R_1\supset R_2\supset R_3\supset\dots are finite unions of disjoint intervals, and the total length of the intervals that make up R_n is always at least 1, then the intersection of all the R_n clearly cannot be empty.

    Second approach: if a disjoint collection of intervals does not cover [0,2), then you obviously cannot make it cover [0,2) by translating the intervals.

As ever, if you want to find the weak point in an argument, look out for words and phrases such as "clearly", "obviously", or "it is easy to see that". Often they indicate the place that the writer is not quite 100% sure about.

Let's focus on the second argument for a while, and try to see why it seems plausible. One possible reason is this. Imagine that you arranged your intervals to be [0,1/2), [1/2,3/4), [3/4,7/8), and so on. Now imagine that we translate these intervals one by one to their new places. Then at each stage there is no hope of covering [0,2), so we don't cover [0,2) by the end of the process.

If we examine that last sentence carefully, we see that the first assertion can be justified (because at each stage we have a disjoint finite union of intervals of total length at most 1) but the "so" part involves something more. In general it's absolutely not permitted to say that if something happens at every finite stage of an infinite process then it happens for the whole process. To give an immediate example, we have a finite union of disjoint intervals at every finite stage of our process, but we do not necessarily have a finite union of disjoint intervals at the end of the process.

We seem to keep running into similar problems. But what does it mean to prove rigorously that our approach cannot work? It can in fact mean various different things, but one way of demonstrating convincingly that an approach cannot work is to show that the same approach in a different context gives rise to a false result. So let's see if we can think of a different context. We'll need a context in which we can make sense of the notion of an interval, or at least of a type of set where we can say how large it is, and where the largeness doesn't change when we translate a set. And we'll want it to be the case that the class of disjoint finite unions of sets of the given type is closed under finite unions and intersections. And we'll want the sizes of all these sets to be well-defined. And so on.

The easiest example in this case is obtained by looking at intervals of rational numbers rather than intervals of real numbers. How does one come up with this example? Just by looking at the given example and seeing whether anything in it could be generalized. One candidate is "real number", which one might consider generalizing to "element of some number system". But for intervals to be sensible objects, we need the number system to be one-dimensional and "dense", so the rational numbers are the obvious example (even if one could imagine other examples such as \mathbb{Q}(\sqrt{2})).

And sure enough, the result is false for rational numbers. To see why, just enumerate all the rationals in [0,2) and make sure that the nth rational is contained in I_n.

What this example demonstrates is that any proof must use some feature of the reals that is not true of the rationals. And it's not hard to guess what this feature will be: the completeness property. This gives us a big clue about what a correct proof will have to look like. Almost certainly it will prove that the intervals don't cover [0,2) not by directly coming up with a number that is not covered, but by coming up with a sequence of numbers such that the limit is not covered. And once we've had that thought, we should recognise that we are probably looking for a just-do-it proof.

Let's go back then to our sequence of sets R_1\supset R_2\supset R_3\supset \dots, each of which was a finite union of disjoint intervals of total length at least 1. We'd like to find a point that belongs to every single R_n. A natural way to try to do this would be to build a sequence of nested closed intervals J_1\supset J_2\supset\dots with J_n\subset R_n, but this doesn't work because there is no reason for R_{n+1} to intersect J_n. But we can use the more general fact that a nested sequence of closed bounded sets has a nonempty intersection. Then our task is to find subsets F_n\subset R_n with F_n closed. Recall that what we know about R_n is that it is a finite union of disjoint intervals.

The obvious way of finding a closed subset F_n of R_n, given that it suits us to have F_n as big as possible, is to take the intervals that make up R_n and make them closed, and, if necessary, very slightly smaller. For instance, if one of the intervals is [a,b) then we'll change it to [a,b-\epsilon] for some very small \epsilon. But that doesn't quite work because we want the sets F_n to be nested too. Just to express more easily what we shall do about this, let us write \lambda(R) for the total length of the intervals that make a finite union R of disjoint intervals.

We now define S_n to be a finite union of disjoint closed intervals such that \lambda(S_n)\geq\lambda(R_n)-\epsilon_n. Then we define F_n to be the intersection of the sets S_1,\dots,S_n. This will be R_n with some intervals removed, the total length of which is at most \epsilon_1+\dots+\epsilon_n. So if we choose the \epsilon_n to sum to 1/2, say, then \lambda(F_n) is always at least 1/2, which implies that F_n is nonempty. And now we are done.

For the purposes of this article, the main point of interest in the above proof is that it used the fact that the intersection of a nested sequence of closed bounded sets is nonempty, which is a theorem that's very much about the real numbers. (It follows easily from the Bolzano-Weierstrass theorem.) And we were led to it by realizing that we had to use such a theorem because the result was false for the rationals.