10. Balanced binary reduction

Let us introduce the problem of finding not just the smallest of n elements, but the smallest and second smallest. The problem has a very distinguished pedigree, it was first addressed by a well-known British mathematician Charles Dodgson (Lewis Carroll). If you haven’t heard of him, you should. There is a very important book which he wrote, not the mathematical book, but the book called “Alice’s Adventures in Wonderland”. If you haven’t read it, do¹. No person should be hired ever unless they read “Alice in Wonderland”. In any case, he was also a mathematician. He also dabbled in all kind of games. Apparently he invented Scrabble and a bunch of other games².

At some point he decided that there is a clear problem with lawn tennis tournaments. He observed that with a very high probability, if you have say a tournament with 64 players, the guy who gets the second prize is actually not the second strongest. He observed that the strongest and second strongest could be paired in the first round. Therefore the second strongest guy gets eliminated and doesn’t get the second prize, in spite of his prowess. This is why they now use a technique known as seeding to assure that people of similar ability are spread out to different parts of the tree. But he wanted to come up with an algorithm which assures that the second-placed guy is truly the second best player. He published it in 1883³. The algorithm wasn’t quite an algorithm⁴ and it was clearly not optimal. It took 50 more years before the problem was stated correctly. People realized that you could talk about the minimum number of comparisons but it took another thirty years, until 1964 when a Russian mathematician Sergei S. Kislitsyn published a paper which proved there is an optimal algorithm and described it.

By the way, all of this information is available in a book called “The Art of Computer Programming” by Donald Knuth⁵. You really should buy it. You make a certain commitment. You spend $150 saying that you really care about programming. This is not a book which is “useful”, meaning it’s not “programming in Python for idiots”, or “information retrieval for 21st century” or something like that. This is one of the fundamental books which you buy and then spend your lifetime getting the information out of it. You get beautiful things which you then could use for programming in Python or information retrieval or other things. I’m going to be mentioning Knuth throughout the course. It’s not a perfect book, it is just the greatest book we’ve got. Some people think it’s a good reference book. No, it’s not, because you have to basically do linear search to find what you’re interested in. Another important thing; do not spend too much time solving problems. Read the solutions. They are right at the end. Lots of very important algorithms are described in the solutions to his problems. Reading Knuth has to become a lifelong activity.

Smallest and second smallest element

The problem of finding the smallest and second smallest element is described fully in “The Art of Computer Programming”, but somehow Knuth does not implement it. As we shall see it’s actually a little tricky.

How many comparisons do you need to solve this problem? Same as minmax_element from last time? No. You can do it in fewer. Let us try to use some logic. How many comparisons do we need to find the winner of the tournament? n - 1 because it is necessary to find the winner in order to find the second place guy. We could sketch a proof of this. Let us assume there are two potential guys greater than the second place guy. If neither are greater than him, he is first place, not second place. If both of them are, he isn’t second place.

What do we know about the second place guy, specifically about the games he lost? He only lost one game, and it was to the winner. This is a very important property which tells us why we don’t need to do many comparisons. If the winner remembers all the games he won, and who he played how do we find second place? We determine the best from the subset of players he beat.

How many people does the winner beat to win? For example, Wimbledon has 64 players who are admitted. The winner doesn’t play 63 games. The tournament is structured as a binary tree. This tree is how deep? If you have n elements? It’s ceil(log_2(n)). We could somehow arrange our tournament so that the winner will defeat this many people.

Now that we have a list of the ceil(log_2(n)) people who played the winner, we just find the best out of them which is ceil(log_2(n)) - 1 comparisons.

To review, we have n - 1 comparisons to get the winner. ceil(log_2(n)) - 1 to get the second best, from the list he played. So an upper bound on the comparisons for the algorithm is:

Actually implementing this algorithm effectively will require us to create several components.

What about divide and conquer?

It might appear you could use divide and conquer. First split the list of elements in two, find the min and second min of the first half, and the second half, and then merge them together doing two comparisons. It sounds very elegant because it’s all recursive. But, let us think about how many comparisons it’s going to do using simple mathematics.

That’s not what we’re trying to accomplish, so divide and conquer doesn’t always do what we think.

Tournament tree shapes

To get the number of comparisons that we want, we need to rearrange the tournament we play. Right now min_element plays a tree structure that looks like this:

It has n - 1 internal nodes. But we don’t want the winning element to be compared n - 1 times. We need to transform that into the way they play tennis tournaments. We need to balance the tree.

How do we do it? One way is to just pair up elements and build up. But then we need lots of memory to save the intermediate results⁶. Note that once a bottom-level round has been played, they are ready to move up. Our goal is basically to become eager. Whenever elements are ready to be paired together, we want to pair and compare them.

So if we store only the winner at each level, we only need to store log(n) things. We can define the power of each element to be the number of games they have played.

Realize that suddenly we see something which has nothing to do with our problem. The foundation of our algorithm is the ability to take a tree, like the linear (unbalanced) tree, and transform it into a balanced tree. What mathematical property allows us to do such a transformation? Why can we convert one kind of computation to the other? Associativity⁷. As long as our operation is associative, what property don’t we need? Commutativity⁸. We keep the elements in the same order, we’re just rebalancing parentheses⁹.

Binary counting and reduction

Here we come to the amazing idea of how to do this transformation. This is one of the most beautiful ideas which they kept secret from you. They should have taught it in high school. But, they want to publish papers themselves and not tell you the general mechanism.

Let us assume we have elements of type T that need to be paired or combined in some way, whether with min, +, merge, or any other associative operation on T. What we can do is create an array called a “counter”.

The nth slot of the counter will store the element that has had n “victories” so far. So if there is a guy in slot 0 he hasn’t played any games yet. If there is a guy in slot 2 he has won 2 games, and so on. This structure will help us to only pair up elements that have the same power.

The following example using min as the operation should make this clear. Initially the counter has zero in every entry:

Take a new guy x who has never played any games, and look at the guy in the first slot of the counter. The existing guy is either zero or not. If it’s zero, put the new guy x in the counter at index 0 (he has not played any games).

Now take another guy y. Since x is in the first slot of the counter, we must pair them up. The winner moves on up to the next slot in the counter, as they have now won a game. So if y wins:

What if the index 1 slot was non-zero, after comparing x and y? Then the guy there already won one game. So, we must carry propagate¹⁰. Repeat the same process all the way up the counter, until we find a slot which is zero. What if the counter is full, and has no zero slots? That’s called an overflow.

We borrow terminology from binary integer arithmetic because our counter works just like a binary integer counting up:

But instead of 0 and 1 in each slot or “bit” we have arbitrary elements that are combined with an associative operation.

Handling overflow

What do we do if the counter overflows? Whenever we don’t know how to proceed, do something sensible and let whomever uses it figure out what is a sensible thing to do. Return the carry. If the return is non-zero the programmer who called the counter will know it overflowed and can decide what to do. Maybe they will extend the counter or throw an error. It’s his business not ours.

Let us be lazy. The great success in programming comes because there are lazy people who say, “I don’t want to know now, I’ll find out later.” Right now we are solving this problem. We have an associative binary operation of some kind on type T and what we discovered is if we have associativity, we can make this counter and it will work for us.

Implementation

Now we have to write the code. The first function will add an element to the counter using the process we just described.

op is not commutative, so we need to be careful about the order it is evaluated in. Elements in the counter got there before, so they were to the left of the element we are inserting.

Notice that zero is const T& reference because we don’t plan to modify it, but we do modify carry, so it should be passed by value.

Reduction

After we finish adding all our elements to the counter, they might not all be reduced to one element. There may be several elements left sitting at various levels of the counter. We need to do one more pass of the operation to combine them into the final result.

This second function does that. It applies the operation, in the same manner to the elements left sitting in the counter.

We have to be careful with zero. Sometimes it will work with the operation so apply op(x, zero) gives you x, but sometimes that won’t happen. So we can’t really initialize to zero.

Binary counter class

Start with algorithms

We want to take these two algorithms and combine them into an object. We have two beautiful algorithms but they’re stateless. Everything is outside, and this is what we should always do, no matter what we have been taught in software engineering classes by very wise object-oriented professors. You start with algorithms. You don’t start with objects. Figure out what you’re going to do first. But you don’t have to stop there. Because you can then put things together into an object.

It’s very easy when you write an algorithm to have a minimal iterator interface. In this case, the iterators externalize the counter. We say, “we don’t want to know about him, we’re just algorithms people”. We assume the principle that we will have no state for about five minutes, and stay very functional, but then turn around and deal with state. We look at the whole thing.

Counter storage

So what do we think is the state of this counter? How should we store the counter? Don’t we have millions of elements to reduce? The counter is size log(n) which will never be greater than 64. So, it’s actually a small fixed size. We will store it in a std::vector.

counter is private, we don’t want people to muck up our counter. Same with our operation.

Using initializer lists instead of assignment in constructor is important. If you initialize in the body, it will first call default constructors for members, then you overwrite all the work with an assignment.

I think it is very beautiful. We could compete with Steve Jobs for elegance of our design¹¹.

Exercise: In numerical analysis, whenever you sum up large numbers you don’t really want to add small quantities to big quantities. Bad things happen to the errors¹². Use this code to write a function which sums arrays of double.

Exercise: Rewrite min_element using this code (just min_element, don’t worry about second best).

Exercise: If you want to implement merge sort you can use exactly the same device, since merge is associative. The idea with merge sort, is you only want to merge lists if they are roughly the same length and this helps you do it (see Chapter 12). Write the associative binary operation merge which can combine two sorted arrays into a sorted array.

Exercise: When we become grownups we learn about advanced data structures, such as binomial forest. They use the same idea. Learn about this data structure and try to figure out where the counter could be used.

What is in-place memory usage?

How significant is the storage of our counter? We use the term in-place to indicate the memory usage of an algorithm is not significant. A long time ago people thought algorithms were in-place if they didn’t require any extra memory. Then they decided constant memory was enough. But, then they said it doesn’t quite work because our favorite algorithm doesn’t work. Of course the most important algorithm is quicksort. It’s not in-place because it’s recursive. Recursive algorithms tend to require O(log(n)). Quicksort splits roughly in the middle let’s assume on average.

So people say log(n) is good so that will count as in-place. Then they said, “what if we have nested things?” So is log(n)^2 ok? In our universe log(n) <= 64 so this is 4096. Then theoreticians said it’s really alright if we have “poly-logarithmic” storage Basically as long as the memory requirement is O(p(log(n))) where p is a polynomial, it’s alright. Polynomials get a lot bigger than square, but let’s go with “poly-logarithmic” being “in-place”.

Code

The book is freely available from Project Gutenberg.↩
The invention of Scrabble is attributed to Lewis Carroll’s brief journal entry: “A game might be made of letters, to be moved about on a chess board till they form words” (Dec 19th, “The Life and Letters of Lewis Carroll” ). See also “History of Scrabble” and “The games of Lewis Carroll”.

His book “The Game of Logic” teaches formal logic using a board game. It is also available from Project Gutenberg.
↩
“Lawn tennis tournaments; the true method of assigning prizes with a proof of the fallacy of the present method”. London, Macmillan and co., 1883.↩
Knuth: it is not formulated precisely enough to qualify as an algorithm.↩
See chapter 5.3.3 in Volume 3 of “The Art of Computer Programming”.↩
Alex: What do we mean when we say lots of memory? O(n) is bad, O(sqrt(n)) is pretty bad. See the definition of “in-place memory usage” at the end of the chapter.↩
A binary function f is associative if the following holds for all a, b, c in its domain:
```
f(f(a, b), c)) = f(a, f(b, c))
```
Informally, f can be applied in any order. For example, addition of integers is associative:
```
(a + b) + c = a + (b + c)
```
Subtraction of integers is not associative:
```
(3 - 2) - 1 != 3 - (2 - 1)
```
You can see why associativity is often referred to as being able to “re-parenthesize” expressions.
↩
A binary function f is commutative if for all a, b in its domain:
```
f(a, b) = f(b, a)
```
Informally, f gets the same result, regardless of the order of the inputs. For example, multiplication of integers is commutative:
```
a * b = b * a
```
In Chapter 9.1 of “From Mathematics to Generic Programming”, Alex gives a neat visual proof of this fact for integer multiplication:
```
              * * *
* * * * *     * * *
* * * * *  =  * * *
* * * * *     * * *
              * * *
```
Or as Dirichlet put it: “Whether you arrange soldiers in rows or columns, you still have the same number of soldiers”.

An example of an operation which is not commutative is string concatenation.
```
"Hello, " + "World!" != "World!" + "Hello, "
```
↩
Alex: If you think about it, our min is not quite commutative. In mathematics min is commutative. But because we want to preserve stability, it is not. We distinguish between the left and right argument.↩
The terms carry and overflow are closely associated with the implementation of binary counting or addition as an electrical circuit, called an adder.

A single bit adder has two inputs a and b for the two digits to add together. It outputs a digit s which is the digit to display for this place value. There is a supplementary output called the carry, which is the amount that needs to move to the higher place value.

The circuit behavior is determined by the following input/output table:
```
a  b   c  s
----------
1  0 | 0  1
0  1 | 0  1
1  1 | 1  0
```
This adds a single digit, but we want to add entire numbers. To do so we add an additional input l for carried digits from lower place values.
```
a  b  l    c  s
----------------
1  0  0  | 0  1
0  1  0  | 0  1
1  1  0  | 1  0
1  0  1  | 1  0
0  1  1  | 1  0
1  1  1  | 1  1
```
Now we can chain n of them together to be able to add n digits
```
    s_1 c_1-----|      s_2 c_2----  ...
    |   |       |      |   |
 -----------    |   ------------
 |    |    |    |   |    |    |
l_1  a_1  b_1   |_ l_2   a_2  b_2
```
An overflow is when the last adder has a non-zero carry.
↩
Alex: Maybe we should make it in China. That’s a necessary prerequisite for beautiful design. Designed in Cupertino, assembled in China. So let’s try to assemble our machine in Palo Alto.↩
Alex is referring to the following issue: When floating point numbers become very large they can no longer represent small nearby increments. So if many small numbers are accumulated in order, the sum may grow large enough to ignore the contribution of any particular element. To see for yourself, observe that:
```
double x = 1.45 * pow(2, 60);
assert(x == x + 1.0);
```
This is because floats are represented as a base and exponent, and the magnitude of the exponent constrains the precision.

If we forget about computers, and think about science or engineeing, it’s natural to consider:
```
1.45 * 2^60 + 1 ~= 1.45 * 2^60
```
because the 1 is not very significant to the overall magnitutde.
↩