Farthest-first traversal

A farthest-first traversal of a compact metric space is a sequence of points in the space, where the first point is selected arbitrarily and each successive point is as far as possible from the set of previously selected points. The same concept can also be applied to a finite set of geometric points, by restricting the selected points to belong to the set or, equivalently, by considering the finite metric space generated by these points.^[1] For a finite metric space or finite set of geometric points, the resulting sequence forms a permutation of the points, also known as the greedy permutation.^[2]

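Every prefix of a farthest-first traversal is a set of points that are widely spaced yet close to all remaining points. A traversal of a finite point set may be constructed in polynomial time, and in spaces of bounded doubling dimension it may be approximated in near-linear time.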

Definition and properties

A farthest-first traversal is a sequence of points in a compact metric space, with each point appearing at most once. If the space is finite, each point appears exactly once, and the traversal is a permutation of all of the points in the space. The first point of the sequence may be any point in the space. Each point $p$ after the first must have the maximum possible distance to the set of points earlier than $p$ in the sequence, where the distance from a point to a set is defined as the minimum of the pairwise distances to points in the set. A given space may have many different farthest-first traversals, depending both on the choice of the first point in the sequence and on ties for the maximum distance among later choices.^[2]

Farthest-first traversals may be characterized by the following properties. Fix a number $k$, and consider the prefix formed by the first $k$ points of a farthest-first traversal of any metric space. Let $r$ be the distance from the final point of the prefix to the set of earlier points in the prefix. Then this prefix has the following two properties:

All pairs of the selected points are at distance at least $r$ from each other, and
All points of the metric space are at distance at most $r$ from the subset.

Conversely, any sequence having these properties, for all choices of $k$, must be a farthest-first traversal. These are the two defining properties of a Delone set, so each prefix of the farthest-first traversal forms a Delone set.^[3]
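The two properties can be checked directly on a small example. The following is a minimal sketch, assuming points in the Euclidean plane given as coordinate tuples and a traversal given as a list of indices; the function names `prefix_radius` and `is_delone_prefix` are illustrative and do not come from the cited sources.

```python
import itertools
import math

def prefix_radius(points, order, k):
    """Distance r from the k-th point of `order` to the k-1 points before it (k >= 2)."""
    last = points[order[k - 1]]
    return min(math.dist(last, points[i]) for i in order[:k - 1])

def is_delone_prefix(points, order, k, tol=1e-12):
    """Check the packing and covering properties for the first k points of `order`."""
    prefix = [points[i] for i in order[:k]]
    r = prefix_radius(points, order, k)
    # Packing: all pairs of selected points are at distance at least r.
    packing = all(math.dist(a, b) >= r - tol
                  for a, b in itertools.combinations(prefix, 2))
    # Covering: every point of the space is within distance r of the prefix.
    covering = all(min(math.dist(q, s) for s in prefix) <= r + tol
                   for q in points)
    return packing and covering

# Five collinear points; the order 0, 1, 2, 3, 4 is a farthest-first traversal.
points = [(0, 0), (10, 0), (4, 0), (7, 0), (2, 0)]
print([is_delone_prefix(points, [0, 1, 2, 3, 4], k) for k in range(2, 6)])
# An order whose second point is not the farthest one fails the covering test.
print(is_delone_prefix(points, [0, 4, 1, 2, 3], 2))
```

The check takes quadratic time per prefix, which is adequate for illustration but not meant as an efficient test.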

Applications

approximation ratio for this method, they show that in practice it often works better than other insertion methods with better provable approximation ratios.^[4]

Later, the same sequence of points was popularized by

approximation ratio of 2 for both clustering problems.^[3]

Gonzalez's heuristic was independently rediscovered for the metric $k$-center problem by

P = NP.^[3]^[6]

As well as for clustering, the farthest-first traversal can also be used in another type of facility location problem, the max-min facility dispersion problem, in which the goal is to choose the locations of $k$ different facilities so that they are as far apart from each other as possible. More precisely, the goal is to choose $k$ points from a given metric space or a given set of candidate points so as to maximize the minimum pairwise distance between the selected points. Again, this can be approximated by choosing the first $k$ points of a farthest-first traversal. If $r$ denotes the distance of the $k$th point from all previous points, then the selected points are pairwise at distance at least $r$, and every point of the metric space or the candidate set is within distance $r$ of the first $k-1$ points. Because the optimal solution has $k$ points but only $k-1$ points were chosen earlier, by the pigeonhole principle some two points of the optimal solution (whatever it is) must both be within distance $r$ of the same point among these first $k-1$ chosen points, and (by the triangle inequality) within distance $2r$ of each other. Therefore, the optimal minimum distance is at most $2r$, and the heuristic solution given by the farthest-first traversal is within a factor of two of optimal.^[8]^[9]^[10]
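The factor-of-two guarantee can be observed on a small instance by comparing the heuristic with an exhaustive search. This is a sketch under stated assumptions: the candidate points are distinct points of the Euclidean plane, $k$ is at most the number of candidates, and the helper names (`farthest_first_prefix`, `min_pairwise`, `optimal_dispersion`) are hypothetical. The exhaustive search is exponential in $k$ and is only suitable for tiny inputs.

```python
import itertools
import math

def farthest_first_prefix(points, k, start=0):
    """Indices of the first k points of a farthest-first traversal (assumes distinct points)."""
    chosen = [start]
    dist = [math.dist(points[start], q) for q in points]  # distance to the chosen set
    while len(chosen) < k:
        p = max(range(len(points)), key=dist.__getitem__)  # farthest remaining point
        chosen.append(p)
        dist = [min(d, math.dist(points[p], q)) for d, q in zip(dist, points)]
    return chosen

def min_pairwise(points, indices):
    """Minimum pairwise distance among the selected points (the dispersion objective)."""
    return min(math.dist(points[i], points[j])
               for i, j in itertools.combinations(indices, 2))

def optimal_dispersion(points, k):
    """Best achievable objective, found by trying every k-subset."""
    return max(min_pairwise(points, subset)
               for subset in itertools.combinations(range(len(points)), k))

candidates = [(0, 0), (3, 1), (1, 4), (5, 5), (6, 0), (2, 2), (4, 3), (0, 5), (6, 4)]
k = 4
heuristic = min_pairwise(candidates, farthest_first_prefix(candidates, k))
optimum = optimal_dispersion(candidates, k)
print(heuristic, optimum, heuristic >= optimum / 2)
```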

Other applications of the farthest-first traversal include color quantization (clustering the colors in an image to a smaller set of representative colors),^[11] progressive scanning of images (choosing an order to display the pixels of an image so that prefixes of the ordering produce good lower-resolution versions of the whole image rather than filling in the image from top to bottom),^[12] point selection in the

sensor networks,^[19] modeling phylogenetic diversity,^[20] matching vehicles in a heterogeneous fleet to customer delivery requests,^[21] uniform distribution of geodetic observatories on the Earth's surface^[22] or of other types of sensor network,^[23] generation of virtual point lights in the instant radiosity computer graphics rendering method,^[24] and geometric range searching data structures.^[25]

Algorithms

Greedy exact algorithm

The farthest-first traversal of a finite point set may be computed by a greedy algorithm that maintains the distance of each point from the previously selected points, performing the following steps:^[3]

Initialize the sequence of selected points to the empty sequence, and the distances of each point to the selected points to infinity.
While not all points have been selected, repeat the following steps:
- Scan the list of not-yet-selected points to find a point $p$ that has the maximum distance from the selected points.
- Remove $p$ from the not-yet-selected points and add it to the end of the sequence of selected points.
- For each remaining not-yet-selected point $q$ , replace the distance stored for $q$ by the minimum of its old value and the distance from $p$ to $q$ .

For a set of $n$ points, this algorithm takes $O(n^2)$ steps and $O(n^2)$ distance computations.^[3]
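The steps above can be sketched in a few lines of Python. This is an illustrative implementation rather than code from the cited source; it assumes the points are coordinate tuples compared with the Euclidean distance by default, takes the first point as a `start` index instead of choosing it arbitrarily, and breaks ties in favor of the lowest index.

```python
import math

def farthest_first_traversal(points, start=0, metric=math.dist):
    """Return the indices of `points` in farthest-first order, beginning at `start`."""
    n = len(points)
    if n == 0:
        return []
    dist = [math.inf] * n              # distance from each point to the selected set
    unselected = [i for i in range(n) if i != start]
    order = [start]
    p = start
    while unselected:
        # Update the stored distances using the most recently selected point p.
        for q in unselected:
            dist[q] = min(dist[q], metric(points[p], points[q]))
        # Scan for the unselected point farthest from the selected set.
        p = max(unselected, key=dist.__getitem__)
        unselected.remove(p)
        order.append(p)
    return order

points = [(0, 0), (2, 0), (4, 0), (7, 0), (10, 0)]
print(farthest_first_traversal(points))  # [0, 4, 2, 3, 1]
```

Each of the $n$ rounds scans and updates at most $n$ stored distances, which matches the quadratic bound stated above. Passing a different `metric` function allows the same sketch to be used for other finite metric spaces.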

Approximations

A faster approximation algorithm, given by Har-Peled & Mendel (2006), applies to any subset of points in a metric space with bounded doubling dimension, a class of spaces that includes the Euclidean spaces of bounded dimension. Their algorithm finds a sequence of points in which each successive point has distance within a $1-\varepsilon$ factor of the farthest distance from the previously selected points, where $\varepsilon$ can be chosen to be any positive number. It takes time $O(n\log n)$.^[2]
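To make the approximation guarantee concrete, the sketch below checks whether a given ordering satisfies it. This is not the Har-Peled and Mendel algorithm, only a quadratic-time verifier of its output condition; it assumes Euclidean points, an ordering given as a permutation of indices, and the hypothetical name `is_approx_traversal`.

```python
import math

def is_approx_traversal(points, order, eps, tol=1e-12):
    """True if every point after the first is at least (1 - eps) times as far from
    the earlier points as the farthest point available at that step."""
    if sorted(order) != list(range(len(points))):
        return False                    # not a permutation of the points
    dist = [math.inf] * len(points)     # distance to the already selected points
    for step, p in enumerate(order):
        if step > 0:
            farthest = max(dist)        # selected points contribute distance 0
            if dist[p] < (1 - eps) * farthest - tol:
                return False
        dist = [min(d, math.dist(points[p], q)) for d, q in zip(dist, points)]
    return True

points = [(0, 0), (2, 0), (4, 0), (7, 0), (10, 0)]
exact = [0, 4, 2, 3, 1]     # an exact farthest-first traversal
relaxed = [0, 4, 3, 2, 1]   # third choice is at distance 3, the farthest is at 4
print(is_approx_traversal(points, exact, 0.0))     # True
print(is_approx_traversal(points, relaxed, 0.25))  # True
print(is_approx_traversal(points, relaxed, 0.1))   # False
```

With `eps = 0` the condition reduces to the definition of an exact farthest-first traversal.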

The results for bounded doubling dimension do not apply to high-dimensional Euclidean spaces, because the constant factor in the big O notation for these algorithms depends on the dimension. Instead, a different approximation method based on the Johnson–Lindenstrauss lemma and locality-sensitive hashing has running time $O(\varepsilon^{-2}n^{1+1/(1+\varepsilon)^{2}+o(1)})$. For metrics defined by shortest paths on weighted undirected graphs, a randomized incremental construction based on Dijkstra's algorithm achieves time $O(\varepsilon^{-1}m\log n\log{\tfrac{n}{\varepsilon}})$, where $n$ and $m$ are the numbers of vertices and edges of the input graph, respectively.^[26]

Incremental Voronoi insertion

For selecting points from a continuous space such as the Euclidean plane, rather than from a finite set of candidate points, these methods will not work directly, because there would be an infinite number of distances to maintain. Instead, each new point should be selected as the center of the largest empty circle defined by the previously-selected point set.^[12] This center will always lie on a vertex of the Voronoi diagram of the already selected points, or at a point where an edge of the Voronoi diagram crosses the domain boundary. In this formulation the method for constructing farthest-first traversals has also been called incremental Voronoi insertion.^[27] It is similar to Delaunay refinement for finite element mesh generation, but differs in the choice of which Voronoi vertex to insert at each step.^[28]
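A brute-force sketch of this selection rule, assuming the domain is the unit square, is given below. Rather than maintaining a Voronoi diagram, it enumerates a superset of the candidate locations: circumcenters of all triples of selected points (which include every Voronoi vertex) and all points where the perpendicular bisector of a pair crosses the boundary (which include every crossing of a Voronoi edge). The corners of the square are added as an extra assumption, because the farthest point of a polygonal domain can also be one of its corners. All function names are illustrative.

```python
import itertools
import math

CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

def circumcenter(a, b, c):
    """Point equidistant from a, b and c, or None if they are (nearly) collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    return ((sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d,
            (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d)

def bisector_crossings(a, b):
    """Points where the bisector of a and b meets the lines x=0, x=1, y=0, y=1."""
    nx, ny = 2 * (b[0] - a[0]), 2 * (b[1] - a[1])   # bisector: nx*x + ny*y = rhs
    rhs = b[0] ** 2 + b[1] ** 2 - a[0] ** 2 - a[1] ** 2
    out = []
    for c in (0.0, 1.0):
        if ny != 0:
            out.append((c, (rhs - nx * c) / ny))    # crossing with the line x = c
        if nx != 0:
            out.append(((rhs - ny * c) / nx, c))    # crossing with the line y = c
    return out

def next_point(selected, tol=1e-9):
    """Center of the largest empty circle among the enumerated candidates."""
    candidates = list(CORNERS)
    for a, b in itertools.combinations(selected, 2):
        candidates += bisector_crossings(a, b)
    for a, b, c in itertools.combinations(selected, 3):
        center = circumcenter(a, b, c)
        if center is not None:
            candidates.append(center)
    # Keep candidates inside the square, clamping away rounding error.
    inside = [(min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0))
              for x, y in candidates
              if -tol <= x <= 1 + tol and -tol <= y <= 1 + tol]
    return max(inside, key=lambda p: min(math.dist(p, s) for s in selected))

def insert_points(first, count):
    """Return `count` points of the square, starting from `first`."""
    selected = [first]
    while len(selected) < count:
        selected.append(next_point(selected))
    return selected

print(insert_points((0.5, 0.5), 6))
```

Each step of this sketch examines a cubic number of candidates, so it only illustrates the selection rule; an incremental Voronoi or Delaunay data structure would be needed for larger inputs.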

References

[1] MR 2136964

[2] S2CID 37346335

[3] MR 0807927

[4] S2CID 14764079

[5] MR 0797340

[6] MR 0793876

[7] For prominent examples of incorrect attribution of the farthest-first heuristic to Hochbaum & Shmoys (1985), see, e.g.,
- Dasgupta, Sanjoy (2002), "Performance guarantees for hierarchical clustering", in Kivinen, Jyrki; Sloan, Robert H. (eds.), Computational Learning Theory, 15th Annual Conference on Computational Learning Theory, COLT 2002, Sydney, Australia, July 8-10, 2002, Proceedings, Lecture Notes in Computer Science, vol. 2375, Springer, pp. 351–363, ISBN 978-3-540-43836-6 (corrected in the 2005 journal version of the same paper)
- Agarwal, Sameer; Ramamoorthi, Ravi; Belongie, Serge J.; Jensen, Henrik Wann (2003), "Structured importance sampling of environment maps", ACM Trans. Graph., 22 (3): 605–612, doi:10.1145/882262.882314
- Baram, Yoram; El-Yaniv, Ran; Luz, Kobi (2004), "Online choice of active learning algorithms" (PDF), J. Mach. Learn. Res., 5: 255–291
- Basu, Sugato; Bilenko, Mikhail; Banerjee, Arindam; Mooney, Raymond J. (2006), "Probabilistic semi-supervised clustering with constraints", in Chapelle, Olivier; Schölkopf, Bernhard; Zien, Alexander (eds.), Semi-Supervised Learning, The MIT Press, pp. 73–102, ISBN 978-0-262-03358-9
- Lima, Christiane Ferreira Lemos; Assis, Francisco M.; de Souza, Cleonilson Protásio (2011), "A comparative study of use of Shannon, Rényi and Tsallis entropy for attribute selecting in network intrusion detection", IEEE International Workshop on Measurement and Networking, M&N 2011, Anacapri, Italy, October 10-11, 2011, IEEE, pp. 77–82, S2CID 7510040
- "Class FarthestFirst", Weka, version 3.9.5, University of Waikato, December 21, 2020, retrieved 2021-11-06 – via SourceForge

[8] MR 1154657; White credits the use of the farthest-first traversal as a heuristic for this problem to Steuer, R. E. (1986), Multiple-Criteria Optimization: Theory, Computation, and Applications, New York: Wiley

[9] MR 1129392

[10] S2CID 16489402

[11] S2CID 17713417

[12] PMID 18283019

[13] doi:10.1613/jair.468

[14] Moenning, C.; Dodgson, N. A. (2003), "A new point cloud simplification algorithm", 3rd IASTED International Conference on Visualization, Imaging, and Image Processing

[15] S2CID 10608234

[16] doi:10.1155/S1110865704403217

[17] S2CID 6001942

[18] Girdhar, Y.; Giguère, P.; Dudek, G. (2012), "Autonomous adaptive underwater exploration using online topic modelling" (PDF), Proc. Int. Symp. Experimental Robotics

[19] doi:10.1021/ie201850k

[20] PMID 19085326

[21] ISBN 978-3-642-00922-8

[22] ISBN 978-3-642-64107-7

[23] Vieira, Luiz Filipe M.; Vieira, Marcos Augusto M.; Ruiz, Linnyer Beatrys; Loureiro, Antonio A. F.; Silva, Diógenes Cecílio; Fernandes, Antônio Otávio (2004), "Efficient Incremental Sensor Network Deployment Algorithm" (PDF), Proc. Brazilian Symp. Computer Networks, pp. 3–14

[24] S2CID 18626929

[25] S2CID 6286186

[26] S2CID 18316279

[27] hdl:2433/84849

[28] doi:10.1006/jagm.1995.1021
