Geometric median
In geometry, the geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions. It is also known as the spatial median,[1] Euclidean minisum point,[1] Torricelli point, [2] or 1-median.
The geometric median is an important
The special case of the problem for three points in the plane (that is, m = 3 and n = 2 in the definition below) is sometimes also known as Fermat's problem; it arises in the construction of minimal
Wesolowsky (1993) provides a survey of the geometric median problem. See Fekete, Mitchell & Beurer (2005) for generalizations of the problem to non-discrete point sets.
Definition
Formally, for a given set of m points with each , the geometric median is defined as the sum of the L2 distances minimizer
Here,
Properties
- For the 1-dimensional case, the geometric median coincides with the median. This is because the univariate median also minimizes the sum of distances from the points. (More precisely, if the points are p1, ..., pn, in that order, the geometric median is the middle point if n is odd, but is not uniquely determined if n is even, when it can be any point in the line segment between the two middling points and .) [10][11]
- The geometric median is unique whenever the points are not collinear.[12]
- The geometric median is Cartesian coordinates by which the sample data is represented. In contrast, the component-wise median for a multivariate data set is not in general rotation invariant, nor is it independent of the choice of coordinates.[13]
- The geometric median has a robust estimatorfor the location of the uncorrupted data.
Special cases
- For 3 (non-collinear) points, if any angle of the triangle formed by those points is 120° or more, then the geometric median is the point at the vertex of that angle. If all the angles are less than 120°, the geometric median is the point inside the triangle which subtends an angle of 120° to each three pairs of triangle vertices.[10] This is also known as the Fermat pointof the triangle formed by the three vertices. (If the three points are collinear then the geometric median is the point between the two other points, as is the case with a one-dimensional median.)
- For 4 Radon point of the four points.[14]
Computation
Despite the geometric median's being an easy-to-understand concept, computing it poses a challenge. The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each point, can be found by a simple formula — its coordinates are the averages of the coordinates of the points — but it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and kth roots, can exist in general for the geometric median. Therefore, only numerical or symbolic approximations to the solution of this problem are possible under this model of computation.[15]
However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type can be derived from the fact that the sum of distances to the sample points is a
One common approach of this type, called Weiszfeld's algorithm after the work of
This method converges for almost all initial positions, but may fail to converge when one of its estimates falls on one of the given points. It can be modified to handle these cases so that it converges for all initial points.[12]
Bose, Maheshwari & Morin (2003) describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem.
which can be solved in polynomial time using common optimization solvers.
Characterization of the geometric median
If y is distinct from all the given points, xi, then y is the geometric median if and only if it satisfies:
This is equivalent to:
which is closely related to Weiszfeld's algorithm.
In general, y is the geometric median if and only if there are vectors ui such that:
where for xi ≠ y,
and for xi = y,
An equivalent formulation of this condition is
It can be seen as a generalization of the median property, in the sense that any partition of the points, in particular as induced by any hyperplane through y, has the same and opposite sum of positive directions from y on each side. In the one dimensional case, the hyperplane is the point y itself, and the sum of directions simplifies to the (directed) counting measure.
Generalizations
The geometric median can be generalized from Euclidean spaces to general Riemannian manifolds (and even metric spaces) using the same idea which is used to define the Fréchet mean on a Riemannian manifold.[17][18] Let be a Riemannian manifold with corresponding distance function , let be weights summing to 1, and let be observations from . Then we define the weighted geometric median (or weighted Fréchet median) of the data points as
- .
If all the weights are equal, we say simply that is the geometric median.
See also
Notes
- ^ a b c Drezner et al. (2002)
- ^ Cieslik (2006).
- ^ Lawera & Thompson (1993).
- ^ Dodge & Rousson (1999).
- ^ Eiselt & Marianov (2011).
- ^ Krarup & Vajda (1997).
- ^ Spain (1996).
- ^ Brimberg (1995).
- ^ Bose, Maheshwari & Morin (2003).
- ^ a b c Haldane (1948)
- ^ Claim 18.10, Geometric Methods and Optimization Problems, V. Boltyanski, H. Martini, V. Soltan, Springer, 1999.
- ^ a b Vardi & Zhang (2000)
- ^ a b c Lopuhaä & Rousseeuw (1991)
- ^ Cieslik (2006), p. 6; Plastria (2006). The convex case was originally proven by Giovanni Fagnano.
- ruler and compass
- ^ Weiszfeld (1937); Kuhn (1973); Chandrasekaran & Tamir (1989).
- ^ Fletcher, P. Thomas; Venkatasubramanian, Suresh; Joshi, Sarang (23 June 2008). "Robust statistics on Riemannian manifolds via the geometric median". 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA: IEEE.
- ^ Fletcher, Venkatasubramanian & Joshi (2009).
References
- .
- .
- .
- Brimberg, J. (1995). "The Fermat–Weber location problem revisited". S2CID 206800756.
- Chandrasekaran, R.; Tamir, A. (1989). "Open questions concerning Weiszfeld's algorithm for the Fermat-Weber location problem". S2CID 43224801.
- Cieslik, Dietmar (2006). Shortest Connectivity: An Introduction with Applications in Phylogeny. Combinatorial Optimization. Vol. 17. Springer. p. 3. ISBN 9780387235394.
- Cockayne, E. J.; Melzak, Z. A. (1969). "Euclidean constructability in graph minimization problems". JSTOR 2688541.
- Cohen, Michael; Lee, Yin Tat; ISBN 978-1-4503-4132-5.
- Dodge, Yadolah; Rousson, Valentin (September 1999). "Multivariate L1 mean". Metrika. 49 (2): 127–134. .
- Drezner, Zvi; MR 1933966.
- Eiselt, H. A.; Marianov, Vladimir (2011). Foundations of Location Analysis. International Series in Operations Research & Management Science. Vol. 155. Springer. p. 6. ISBN 9781441975720.
- Fekete, Sándor P.; S2CID 1121.
- Fletcher, P. Thomas; Venkatasubramanian, Suresh; Joshi, Sarang (2009). "The geometric median on Riemannian manifolds with application to robust atlas estimation". NeuroImage. 45 (1 Suppl): s143–s152. PMID 19056498.
- .
- Krarup, Jakob; Vajda, Steven (1997). "On Torricelli's geometrical solution to a problem of Fermat". IMA Journal of Mathematics Applied in Business and Industry. 8 (3): 215–224. MR 1473041.
- S2CID 22534094.
- Lawera, Martin; Thompson, James R. (1993). "Some problems of estimation and testing in multivariate statistical process control" (PDF). Proceedings of the 38th Conference on the Design of Experiments. U.S. Army Research Office Report. Vol. 93–2. pp. 99–126. Archived from the original on May 17, 2014.
- Lopuhaä, Hendrick P.; JSTOR 2241852.
- Nie, Jiawang; Parrilo, Pablo A.; S2CID 16558095.
- Ostresh, L. (1978). "Convergence of a class of iterative methods for solving Weber location problem". .
- Zbl 1126.90046..
- Spain, P. G. (1996). "The Fermat point of a triangle". Mathematics Magazine. 69 (2): 131–133. MR 1573157.
- Vardi, Yehuda; Zhang, Cun-Hui (2000). "The multivariate L1-median and associated data depth". Proceedings of the National Academy of Sciences of the United States of America. 97 (4): 1423–1426 (electronic). PMID 10677477.
- Weber, Alfred (1909). Über den Standort der Industrien, Erster Teil: Reine Theorie des Standortes (in German). Tübingen: Mohr.
- Wesolowsky, G. (1993). "The Weber problem: History and perspective". Location Science. 1: 5–23.
- S2CID 21000317.