Euclidean minimum spanning tree
A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.
The edges of the minimum spanning tree meet at angles of at least 60°, at most six to a vertex. In higher dimensions, the number of edges per vertex is bounded by the
A Euclidean minimum spanning tree, for a set of points in the
Publications on the Euclidean minimum spanning tree commonly abbreviate it as "EMST".[4][5] They may also be called "geometric minimum spanning trees",[6][7] but that term may be used more generally for geometric spaces with non-Euclidean distances, such as Lp spaces.[8] When the context of Euclidean point sets is clear, they may be called simply "minimum spanning trees".[9][10][11]
Several other standard geometric networks are closely related to the Euclidean minimum spanning tree:
- The Steiner tree problem again seeks a system of line segments connecting all given points, but without requiring the segments to start and end only at given points. In this problem, additional points may be added as segment endpoints, allowing the Steiner tree to be shorter than the minimum spanning tree.[12]
- In the Euclidean traveling salesperson path problem, the connecting line segments must start and end at the given points, like the spanning tree and unlike the Steiner tree; additionally, each point can touch at most two line segments, so the result forms a polygonal chain. Because of this restriction, the optimal path may be longer than the Euclidean minimum spanning tree, but is at most twice as long.[2]
- Geometric spanners are low-weight networks that, like the minimum spanning tree, connect all of the points. Unlike the minimum spanning tree, all of these connecting paths are required to be short, having length proportional to the distance between the points they connect. To achieve this property, these networks generally have cycles and so are not trees.[13]
Properties
Angles and vertex degrees
Whenever two edges of a Euclidean minimum spanning tree meet at a vertex, they must form an angle of 60° or more, with equality only when they form two sides of an equilateral triangle. This is because, for two edges forming any sharper angle, one of the two edges could be replaced by the third, shorter edge of the triangle they form, forming a tree with smaller total length.[14] In comparison, the Steiner tree problem has a stronger angle bound: an optimal Steiner tree has all angles at least 120°.[12]
The same 60° angle bound also occurs in the kissing number problem, of finding the maximum number of unit spheres in Euclidean space that can be tangent to a central unit sphere without any two spheres intersecting (beyond a point of tangency). The center points of these spheres have a minimum spanning tree in the form of a star, with the central point adjacent to all other points. Conversely, for any vertex of any minimum spanning tree, one can construct non-overlapping unit spheres centered at and at points two units along each of its edges, with a tangency for each neighbor of . Therefore, in -dimensional space the maximum possible degree of a vertex (the number of spanning tree edges connected to it) equals the kissing number of spheres in dimensions.[15] Planar minimum spanning trees have degree at most six, and when a tree has degree six there is always another minimum spanning tree with maximum degree five.[7] Three-dimensional minimum spanning trees have degree at most twelve.[15] The only higher dimensions in which the exact value of the kissing number is known are four, eight, and 24 dimensions.[16]
For points generated at random from a given continuous distribution, the minimum spanning tree is almost surely unique. The numbers of vertices of any given degree converge, for large number of vertices, to a constant times that number of vertices. The values of these constants depend on the degree and the distribution. However, even for simple cases—such as the number of leaves for points uniformly distributed in a unit square—their precise values are not known.[17]
Empty regions
For any edge of any Euclidean minimum spanning tree, the lens (or vesica piscis) formed by intersecting the two circles with as their radii cannot have any other given vertex in its interior. Put another way, if any tree has an edge whose lens contains a third point , then it is not of minimum length. For, by the geometry of the two circles, would be closer to both and than they are to each other. If edge were removed from the tree, would remain connected to one of and , but not the other. Replacing the removed edge by or (whichever of these two edges reconnects to the vertex from which it was disconnected) would produce a shorter tree.[12]
For any edge of any Euclidean minimum spanning tree, the rhombus with angles of 60° and 120°, having as its long diagonal, is disjoint from the rhombi formed analogously by all other edges. Two edges sharing an endpoint cannot have overlapping rhombi, because that would imply an edge angle sharper than 60°, and two disjoint edges cannot have overlapping rhombi; if they did, the longer of the two edges could be replaced by a shorter edge among the same four vertices.[12]
Supergraphs
Certain
- The relative neighborhood graph, which has an edge between any pair of points whenever the lens they define is empty.
- The Gabriel graph, which has an edge between any pair of points whenever the circle having the pair as a diameter is empty.
- The Delaunay triangulation, which has an edge between any pair of points whenever there exists an empty circle having the pair as a chord.
- The Urquhart graph, formed from the Delaunay triangulation by removing the longest edge of each triangle. For each remaining edge, the vertices of the Delaunay triangles that use that edge cannot lie within the empty lune of the relative neighborhood graph.
Because the empty-region criteria for these graphs are progressively weaker, these graphs form an ordered sequence of subgraphs. That is, using "⊆" to denote the subset relationship among their edges, these graphs have the relations:
Another graph guaranteed to contain the minimum spanning tree is the Yao graph, determined for points in the plane by dividing the plane around each point into six 60° wedges and connecting each point to the nearest neighbor in each wedge. The resulting graph contains the relative neighborhood graph, because two vertices with an empty lens must be the nearest neighbors to each other in their wedges. As with many of the other geometric graphs above, this definition can be generalized to higher dimensions, and (unlike the Delaunay triangulation) its generalizations always include a linear number of edges.[20][21]
Total length
For points in the unit square (or any other fixed shape), the total length of the minimum spanning tree edges is . Some sets of points, such as points evenly spaced in a grid, attain this bound.[12] For points in a unit hypercube in -dimensional space, the corresponding bound is .[22] The same bound applies to the expected total length of the minimum spanning tree for points chosen uniformly and independently from a unit square or unit hypercube.[23] Returning to the unit square, the sum of squared edge lengths of the minimum spanning tree is . This bound follows from the observation that the edges have disjoint rhombi, with area proportional to the edge lengths squared. The bound on total length follows by application of the Cauchy–Schwarz inequality.[12]
Another interpretation of these results is that the average edge length for any set of points in a unit square is , at most proportional to the spacing of points in a regular grid; and that for random points in a unit square the average length is proportional to . However, in the random case, with high probability the longest edge has length approximately
Any
Subdivision
If every edge of a Euclidean minimum spanning tree is subdivided, by adding a new point at its midpoint, then the resulting tree is still a minimum spanning tree of the augmented point set. Repeating this subdivision process allows a Euclidean minimum spanning tree to be subdivided arbitrarily finely. However, subdividing only some of the edges, or subdividing the edges at points other than the midpoint, may produce a point set for which the subdivided tree is not the minimum spanning tree.[25]
Computational complexity
For points in any dimension, the minimum spanning tree can be constructed in time by constructing a complete graph with an edge between every pair of points, weighted by Euclidean distance, and then applying a graph minimum spanning tree algorithm such as the Prim–Dijkstra–Jarník algorithm or Borůvka's algorithm on it. These algorithms can be made to take time on complete graphs, unlike another common choice, Kruskal's algorithm, which is slower because it involves sorting all distances.[13] For points in low-dimensional spaces, the problem may be solved more quickly, as detailed below.
Computing Euclidean distances involves a square root calculation. In any comparison of edge weights, comparing the squares of the Euclidean distances, instead of the distances themselves, yields the same ordering, and so does not change the rest of the tree's computation. This shortcut speeds up calculation and allows a minimum spanning tree for points with integer coordinates to be constructed using only integer arithmetic.[20]
Two dimensions
A faster approach to finding the minimum spanning tree of planar points uses the property that it is a subgraph of the Delaunay triangulation:
- Compute the Delaunay triangulation, which can be done in time. Because the Delaunay triangulation is a planar graph, it has at most edges.
- Label each edge with its (squared) length.
- Run a graph minimum spanning tree algorithm. Since there are edges, this requires time using any of the standard minimum spanning tree algorithms.
The result is an algorithm taking time,[2] optimal in certain models of computation (see below).
If the input coordinates are integers and can be used as
Higher dimensions
The problem can also be generalized to points in the -dimensional space . In higher dimensions, the connectivity determined by the Delaunay triangulation (which, likewise, partitions the convex hull into -dimensional simplices) contains the minimum spanning tree; however, the triangulation might contain the complete graph.[4] Therefore, finding the Euclidean minimum spanning tree as a spanning tree of the complete graph or as a spanning tree of the Delaunay triangulation both take time. For three dimensions the minimum spanning tree can be found in time , and in any greater dimension, in time
The optimal time complexity for higher-dimensional minimum spanning trees remains unknown,[29] but is closely related to the complexity of computing bichromatic closest pairs. In the bichromatic closest pair problem, the input is a set of points, given two different colors (say, red and blue). The output is a pair of a red point and a blue point with the minimum possible distance. This pair always forms one of the edges in the minimum spanning tree. Therefore, the bichromatic closest pair problem can be solved in the amount of time that it takes to construct a minimum spanning tree and scan its edges for the shortest red–blue edge. Conversely, for any red–blue coloring of any subset of a given set of points, the bichromatic closest pair produces one edge of the minimum spanning tree of the subset. By carefully choosing a sequence of colorings of subsets, and finding the bichromatic closest pair of each subproblem, the minimum spanning tree may be found in time proportional to the optimal time for finding bichromatic closest pairs for the same number of points, whatever that optimal time turns out to be.[4][11]
For uniformly random point sets in any bounded dimension, the Yao graph[20] or Delaunay triangulation have linear expected numbers of edges, are guaranteed to contain the minimum spanning tree, and can be constructed in linear expected time.[21][6][30] From these graphs, the minimum spanning tree itself may be constructed in linear time, by using a randomized linear time algorithm for graph minimum spanning trees.[31] However, the poor performance of these methods on inputs coming from clustered data has led algorithm engineering researchers to develop methods with a somewhat slower time bound, for random inputs or inputs whose distances and clustering resemble those of random data, while exhibiting better performance on real-world data.[8][32][5]
A well-separated pair decomposition is a family of pairs of subsets of the given points, so that every pair of points belong to one of these pairs of subsets, and so that all pairs of points coming from the same pair of subsets have approximately the same length. It is possible to find a well-separated pair decomposition with a linear number of subsets, and a representative pair of points for each subset, in time . The minimum spanning tree of the graph formed by these representative pairs is then an approximation to the minimum spanning tree. Using these ideas, a -approximation to the minimum spanning tree may be found in time, for constant . More precisely, by choosing each representative pair to approximate the closest pair in its equivalence class, and carefully varying the quality of this approximation for different pairs, the dependence on in the time bound can be given as for any fixed dimension.[33]
Dynamic and kinetic
The Euclidean minimum spanning tree has been generalized in many different ways to systems of moving or changing points:
- If a set of points undergoes a sequence of dynamic insertions or deletions of points, each of these updates induces a bounded amount of change to the minimum spanning tree of the points. When the update sequence is known in advance, for points in the plane, the change after each insertion or deletion can be found in time per insertion or deletion.[34] When the updates must be handled in an online manner, a slower (but still poly-logarithmic) time bound is known.[35] For higher-dimensional versions of the problem the time per update is slower, but still sublinear.[36]
- For points moving linearly with constant speed, or with more general algebraic motions, the minimum spanning tree will change by a sequence of swaps, in which one edge is removed and another replaces it at a point in time where both have equal length.[37] For linear motions, the number of changes is at most slightly larger than .[38] For more general algebraic motions, there is a near-cubic upper bound on the number of swaps, based on the theory of Davenport–Schinzel sequences.[39]
- The minimum moving spanning tree problem again concerns points moving linearly with constant speed, over an interval of time, and seeks a single tree that minimizes the maximum sum of weights occurring at any instant during this interval. It is NP-hard to compute exactly, but can be approximated to within a factor of two in polynomial time.[40]
- The kinetic Euclidean minimum spanning tree problem asks for a kinetic data structure that can maintain the minimum spanning tree as its points undergo both continuous motions and insertions and deletions. Several papers have studied such structures,[41][42][43][44][45] and a kinetic structure for algebraically moving points with near-cubic total time, nearly matching the bound on the number of swaps, is known.[44]
Lower bound
An asymptotic lower bound of of the Euclidean minimum spanning tree problem can be established in restricted models of computation. These include the
Applications
An obvious application of Euclidean minimum spanning trees is to find the cheapest network of wires or pipes to connect a set of places, assuming the links cost a fixed amount per unit length. The first publications on minimum spanning trees more generally concerned a geographic version of the problem, involving the design of an electrical grid for southern Moravia,[47] and an application to minimizing wire lengths in circuits was described in 1957 by Loberman and Weinberger.[48]
Minimum spanning trees are closely related to single-linkage clustering, one of several methods for hierarchical clustering. The edges of a minimum spanning tree, sorted by their length, give the order in which to merge clusters into larger clusters in this clustering method. Once these edges have been found, by any algorithm, they may be used to construct the single-linkage clustering in time .[1] Although the long thin cluster shapes produced by single-linkage clustering can be a bad fit for certain types of data, such as mixtures of Gaussian distributions, it can be a good choice in applications where the clusters themselves are expected to have long thin shapes, such as in modeling the dark matter halos of galaxies.[5] In geographic information science, several researcher groups have used minimum spanning trees of the centroids of buildings to identify meaningful clusters of buildings, for instance by removing edges identified in some other way as inconsistent.[49]
Minimum spanning trees have also been used to infer the shape of curves in the plane, given points sampled along the curve. For a smooth curve, sampled more finely than its local feature size, the minimum spanning tree will form a path connecting consecutive points along the curve. More generally, similar methods can recognize curves drawn in a dotted or dashed style rather than as a single connected set. Applications of this curve-finding technique include particle physics, in automatically identifying the tracks left by particles in a bubble chamber.[50] More sophisticated versions of this idea can find curves from a cloud of noisy sample points that roughly follows the curve outline, by using the topology of the spanning tree to guide a moving least squares method.[51]
Another application of minimum spanning trees is a
Realization
The realization problem for Euclidean minimum spanning trees takes an abstract
See also
- Rectilinear minimum spanning tree, a minimum spanning tree with distances measured using taxicab geometry
References
- ^ MR 0242315
- ^ S2CID 40615455
- MR 2257270
- ^ MR 1115099
- ^ S2CID 186025
- ^ S2CID 22176641
- ^ S2CID 30101649
- ^ a b Narasimhan, Giri; Zachariasen, Martin; Zhu, Jianlin (2000), "Experiments with computing geometric minimum spanning trees", Proceedings of the 2nd Workshop on Algorithm Engineering and Experiments, pp. 183–196
- ^ S2CID 5634139
- ^ a b King, James A. (2006), "Realization of degree 10 minimum spanning trees in 3-space" (PDF), Proceedings of the 18th Annual Canadian Conference on Computational Geometry, CCCG 2006, August 14-16, 2006, Queen's University, Ontario, Canada, pp. 39–42
- ^ a b Krznaric, Drago; Levcopoulos, Christos; Nilsson, Bengt J. (1999), "Minimum spanning trees in dimensions", Nordic Journal of Computing, 6 (4): 446–461, .
- ^ MR 0223269
- ^ MR 1746681
- MR 0875330
- ^ S2CID 16040977
- ^ Pfender, Florian; Ziegler, Günter M. (September 2004), "Kissing numbers, sphere packings, and some unexpected proofs" (PDF), Notices of the American Mathematical Society: 873–883
- S2CID 29026025
- S2CID 206656565
- doi:10.1049/el:19800611; reply by Urquhart, pp. 860–861
- ^ MR 0677663
- ^ S2CID 17238717
- MR 0986667
- MR 0958215
- MR 1442317
- MR 0491324
- ^ S2CID 11316974
- MR 2107027
- S2CID 60203
- ^ O'Rourke, J.; Demaine, E. (2001–2002), "Problem 5: Euclidean Minimum Spanning Tree", The Open Problems Project, Smith College
- MR 1098813
- S2CID 832583
- ISBN 978-3-642-13192-9
- MR 3478461
- MR 1291541
- S2CID 47454142
- S2CID 7339165
- MR 1314960
- S2CID 18966889
- ^ Rahmati, Zahed; Zarei, Alireza (2010), "Combinatorial changes of Euclidean minimum spanning tree of moving points in the plane" (PDF), Proceedings of the 22nd Annual Canadian Conference on Computational Geometry, Winnipeg, Manitoba, Canada, August 9-11, 2010, pp. 43–45
- S2CID 234599877
- S2CID 15556637
- S2CID 2559456
- MR 2960341
- ^ S2CID 18971251
- S2CID 4709616
- MR 1105929
- S2CID 10555375
- S2CID 7320964
- S2CID 46772272
- ^ Zahn, C. T. (1973), "Using the minimum spanning tree to recognize dotted and dashed curves", 1st International Computing Symposium, Davos, Switzerland, 4–7 September 1973
- MR 1733203
- S2CID 17514182
- S2CID 1297518
- S2CID 17863487
- S2CID 11982404
- ISBN 978-3-540-27580-0
- MR 1394494
- MR 3123788
External links
- EMST tutorial, mlpack documentation