Divide-and-conquer algorithm
In
The divide-and-conquer technique is the basis of efficient algorithms for many problems, such as
Designing efficient divide-and-conquer algorithms can be difficult. As in mathematical induction, it is often necessary to generalize the problem to make it amenable to a recursive solution. The correctness of a divide-and-conquer algorithm is usually proved by mathematical induction, and its computational cost is often determined by solving recurrence relations.
Divide and conquer
The divide-and-conquer paradigm is often used to find an optimal solution of a problem. Its basic idea is to decompose a given problem into two or more similar, but simpler, subproblems, to solve them in turn, and to compose their solutions to solve the given problem. Problems of sufficient simplicity are solved directly. For example, to sort a given list of n natural numbers, split it into two lists of about n/2 numbers each, sort each of them in turn, and interleave both results appropriately to obtain the sorted version of the given list (see the picture). This approach is known as the merge sort algorithm.
The name "divide and conquer" is sometimes applied to algorithms that reduce each problem to only one sub-problem, such as the
An important application of divide and conquer is in optimization,[
Early historical examples
Early examples of these algorithms are primarily decrease and conquer – the original problem is successively broken down into single subproblems, and indeed can be solved iteratively.
Binary search, a decrease-and-conquer algorithm where the subproblems are of roughly half the original size, has a long history. While a clear description of the algorithm on computers appeared in 1946 in an article by John Mauchly, the idea of using a sorted list of items to facilitate searching dates back at least as far as Babylonia in 200 BC.[5] Another ancient decrease-and-conquer algorithm is the Euclidean algorithm to compute the greatest common divisor of two numbers by reducing the numbers to smaller and smaller equivalent subproblems, which dates to several centuries BC.
An early example of a divide-and-conquer algorithm with multiple subproblems is Gauss's 1805 description of what is now called the Cooley–Tukey fast Fourier transform (FFT) algorithm,[6] although he did not analyze its operation count quantitatively, and FFTs did not become widespread until they were rediscovered over a century later.
An early two-subproblem D&C algorithm that was specifically developed for computers and properly analyzed is the merge sort algorithm, invented by John von Neumann in 1945.[7]
Another notable example is the
numbers in operations (in Big O notation). This algorithm disproved Andrey Kolmogorov's 1956 conjecture that operations would be required for that task.As another example of a divide-and-conquer algorithm that did not originally involve computers,
Advantages
Solving difficult problems
Divide and conquer is a powerful tool for solving conceptually difficult problems: all it requires is a way of breaking the problem into sub-problems, of solving the trivial cases, and of combining sub-problems to the original problem. Similarly, decrease and conquer only requires reducing the problem to a single smaller problem, such as the classic Tower of Hanoi puzzle, which reduces moving a tower of height to move a tower of height .
Algorithm efficiency
The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It was the key, for example, to Karatsuba's fast multiplication method, the quicksort and mergesort algorithms, the Strassen algorithm for matrix multiplication, and fast Fourier transforms.
In all these examples, the D&C approach led to an improvement in the
Parallelism
Divide-and-conquer algorithms are naturally adapted for execution in
Memory access
Divide-and-conquer algorithms naturally tend to make efficient use of
The same advantage exists with regards to other hierarchical storage systems, such as NUMA or virtual memory, as well as for multiple levels of cache: once a sub-problem is small enough, it can be solved within a given level of the hierarchy, without accessing the higher (slower) levels.
Roundoff control
In computations with rounded arithmetic, e.g. with
Implementation issues
Recursion
Divide-and-conquer algorithms are naturally implemented as recursive procedures. In that case, the partial sub-problems leading to the one currently being solved are automatically stored in the procedure call stack. A recursive function is a function that calls itself within its definition.
Explicit stack
Divide-and-conquer algorithms can also be implemented by a non-recursive program that stores the partial sub-problems in some explicit data structure, such as a
Stack size
In recursive implementations of D&C algorithms, one must make sure that there is sufficient memory allocated for the recursion stack, otherwise, the execution may fail because of stack overflow. D&C algorithms that are time-efficient often have relatively small recursion depth. For example, the quicksort algorithm can be implemented so that it never requires more than nested recursive calls to sort items.
Stack overflow may be difficult to avoid when using recursive procedures since many compilers assume that the recursion stack is a contiguous area of memory, and some allocate a fixed amount of space for it. Compilers may also save more information in the recursion stack than is strictly necessary, such as return address, unchanging parameters, and the internal variables of the procedure. Thus, the risk of stack overflow can be reduced by minimizing the parameters and internal variables of the recursive procedure or by using an explicit stack structure.
Choosing the base cases
In any recursive algorithm, there is considerable freedom in the choice of the base cases, the small subproblems that are solved directly in order to terminate the recursion.
Choosing the smallest or simplest possible base cases is more elegant and usually leads to simpler programs, because there are fewer cases to consider and they are easier to solve. For example, an FFT algorithm could stop the recursion when the input is a single sample, and the quicksort list-sorting algorithm could stop when the input is the empty list; in both examples, there is only one base case to consider, and it requires no processing.
On the other hand, efficiency often improves if the recursion is stopped at relatively large base cases, and these are solved non-recursively, resulting in a
Thus, for example, many library implementations of quicksort will switch to a simple loop-based insertion sort (or similar) algorithm once the number of items to be sorted is sufficiently small. Note that, if the empty list were the only base case, sorting a list with entries would entail maximally quicksort calls that would do nothing but return immediately. Increasing the base cases to lists of size 2 or less will eliminate most of those do-nothing calls, and more generally a base case larger than 2 is typically used to reduce the fraction of time spent in function-call overhead or stack manipulation.
Alternatively, one can employ large base cases that still use a divide-and-conquer algorithm, but implement the algorithm for predetermined set of fixed sizes where the algorithm can be completely
The generalized version of this idea is known as recursion "unrolling" or "coarsening", and various techniques have been proposed for automating the procedure of enlarging the base case.[12]
Dynamic programming for overlapping subproblems
For some problems, the branched recursion may end up evaluating the same sub-problem many times over. In such cases it may be worth identifying and saving the solutions to these overlapping subproblems, a technique which is commonly known as
See also
- Akra–Bazzi method
- Decomposable aggregation function
- "Divide and conquer"
- Fork–join model
- Master theorem (analysis of algorithms)
- Mathematical induction
- MapReduce
- Heuristic (computer science)
References
- ISBN 978-0-511-77637-3.
- ISBN 978-0-262-53305-8.
- ^ Brassard, G., and Bratley, P. Fundamental of Algorithmics, Prentice-Hall, 1996.
- ^ Anany V. Levitin, Introduction to the Design and Analysis of Algorithms (Addison Wesley, 2002).
- ^ a b c Donald E. Knuth, The Art of Computer Programming: Volume 3, Sorting and Searching, second edition (Addison-Wesley, 1998).
- ^ Heideman, M. T., D. H. Johnson, and C. S. Burrus, "Gauss and the history of the fast Fourier transform", IEEE ASSP Magazine, 1, (4), 14–21 (1984).
- ISBN 0-201-89685-0.
- Bibcode:1963SPhD....7..595K.
- S2CID 62758836.
- ^ Nicholas J. Higham, "The accuracy of floating-point summation", SIAM J. Scientific Computing 14 (4), 783–799 (1993).
- ^ S2CID 6644892.
- ^ Radu Rugina and Martin Rinard, "Recursion unrolling for divide and conquer programs" in Languages and Compilers for Parallel Computing, chapter 3, pp. 34–48. Lecture Notes in Computer Science vol. 2017 (Berlin: Springer, 2001).