Dunn index
The Dunn index (DI), introduced by J. C. Dunn in 1974, is a metric for evaluating clustering algorithms.
Preliminaries
There are many ways to define the size or diameter of a cluster. It could be the distance between the two farthest points inside the cluster, the mean of all pairwise distances between data points inside the cluster, or the mean distance of the data points from the cluster centroid. Each of these formulations is given mathematically below.
Let $C_i$ be a cluster of vectors, and let x and y be any two n-dimensional feature vectors assigned to the same cluster $C_i$.
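- $\Delta_i = \max_{x, y \in C_i} d(x, y)$, which calculates the maximum distance between two points of the cluster (the version proposed by Dunn). Here $d(x, y)$ is the distance between x and y.
- $\Delta_i = \frac{1}{|C_i| (|C_i| - 1)} \sum_{x, y \in C_i, x \neq y} d(x, y)$, which calculates the mean distance between all pairs.
- $\Delta_i = \frac{\sum_{x \in C_i} d(x, \mu)}{|C_i|}$, with $\mu = \frac{\sum_{x \in C_i} x}{|C_i|}$, which calculates the mean distance of all the points from the cluster mean.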
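The three diameter formulations can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance for d(x, y) and represents a cluster as a list of coordinate tuples, and the function names are chosen here for illustration only.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def max_diameter(cluster):
    """Largest pairwise distance inside the cluster (Dunn's original choice)."""
    return max((dist(x, y) for x, y in combinations(cluster, 2)), default=0.0)


def mean_pairwise_diameter(cluster):
    """Mean distance over all pairs of distinct points in the cluster."""
    pairs = [dist(x, y) for x, y in combinations(cluster, 2)]
    return sum(pairs) / len(pairs) if pairs else 0.0


def centroid_diameter(cluster):
    """Mean distance of the points from the cluster centroid."""
    n = len(cluster)
    centroid = [sum(coords) / n for coords in zip(*cluster)]
    return sum(dist(x, centroid) for x in cluster) / n


# Assumed toy data: the four corners of a square with side 2.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(max_diameter(cluster))            # length of the diagonal
print(mean_pairwise_diameter(cluster))  # average of 4 sides and 2 diagonals
print(centroid_diameter(cluster))       # half the diagonal
```

Averaging over unordered pairs gives the same value as the ordered-pair sum normalized by |C_i|(|C_i| - 1), because the distance is symmetric.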
The same holds for the intercluster distance, where similar formulations can be made using the closest two data points, one in each cluster (used by Dunn), the farthest two, the distance between the centroids, and so on. The definition of the index admits any such formulation, and the family of indices so formed is called Dunn-like indices. Let $\delta(C_i, C_j)$ be this intercluster distance metric between clusters $C_i$ and $C_j$.
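The intercluster variants mentioned above might be written as in the sketch below. It again assumes Euclidean distance and clusters given as lists of coordinate tuples; the names `single_linkage`, `complete_linkage` and `centroid_distance` are labels chosen for this example.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def single_linkage(a, b):
    """Distance between the closest two points, one in each cluster (Dunn)."""
    return min(dist(x, y) for x in a for y in b)


def complete_linkage(a, b):
    """Distance between the farthest two points, one in each cluster."""
    return max(dist(x, y) for x in a for y in b)


def centroid_distance(a, b):
    """Distance between the centroids of the two clusters."""
    def centroid(cluster):
        return [sum(coords) / len(cluster) for coords in zip(*cluster)]
    return dist(centroid(a), centroid(b))


# Assumed toy data: two clusters on a line.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(a, b))     # 3.0
print(complete_linkage(a, b))   # 6.0
print(centroid_distance(a, b))  # 4.5
```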
Definition
With the above notation, if there are m clusters, then the Dunn Index for the set is defined as:
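- $DI_m = \frac{\min_{1 \le i < j \le m} \delta(C_i, C_j)}{\max_{1 \le k \le m} \Delta_k}$, where $\delta(C_i, C_j)$ is the intercluster distance between $C_i$ and $C_j$ and $\Delta_k$ is the diameter of cluster $C_k$.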
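As a rough sketch of this definition, the function below computes the index in Dunn's original form: the smallest closest-pair distance between any two clusters divided by the largest within-cluster maximum distance. Euclidean distance, the list-of-tuples data layout and the name `dunn_index` are assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def dunn_index(clusters):
    """Minimum intercluster separation divided by maximum cluster diameter."""
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    # delta(C_i, C_j): closest pair of points, one from each cluster
    min_separation = min(
        dist(x, y)
        for a, b in combinations(clusters, 2)
        for x in a
        for y in b
    )
    # Delta_k: largest pairwise distance inside cluster k
    max_diameter = max(
        max((dist(x, y) for x, y in combinations(c, 2)), default=0.0)
        for c in clusters
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster has zero diameter; index is undefined")
    return min_separation / max_diameter


# Assumed toy data: two tight clusters that lie far apart.
clusters = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(5.0, 0.0), (6.0, 0.0)],
]
print(dunn_index(clusters))  # 4.0
```

Compact, well-separated clusters give a large value: here the separation is 4 and the largest diameter is 1.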
Explanation
Defined in this way, the DI depends on m, the number of clusters in the set. If the number of clusters is not known a priori, the m for which the DI is highest can be chosen as the number of clusters. There is also some flexibility in the definition of d(x,y), where any of the well-known metrics can be used, such as the Euclidean or Manhattan distance.
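The selection of m can be illustrated with a small sketch. It assumes the candidate partitions have already been produced by some clustering procedure (they are hard-coded here as hypothetical data) and passes the point metric d(x, y) in as a parameter; the Euclidean and Manhattan functions are two illustrative choices.

```python
from itertools import combinations


def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def dunn_index(clusters, d):
    """Dunn's original index under an arbitrary point metric d(x, y).

    Assumes at least two clusters and at least one cluster that
    contains two distinct points.
    """
    separation = min(
        d(x, y) for a, b in combinations(clusters, 2) for x in a for y in b
    )
    diameter = max(
        d(x, y) for c in clusters for x, y in combinations(c, 2)
    )
    return separation / diameter


def best_cluster_count(candidates, d):
    """Return the m whose partition has the highest Dunn index."""
    return max(candidates, key=lambda m: dunn_index(candidates[m], d))


# Hypothetical partitions of the same six points into m = 2 and m = 3 clusters.
candidates = {
    2: [[(0, 0), (0, 1), (5, 0), (5, 1)], [(10, 0), (10, 1)]],
    3: [[(0, 0), (0, 1)], [(5, 0), (5, 1)], [(10, 0), (10, 1)]],
}

for d in (euclidean, manhattan):
    print(d.__name__, best_cluster_count(candidates, d))
```

For this toy data both metrics select m = 3, since merging the first two groups inflates the largest diameter while the smallest separation stays at 5.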
External links
- Pakhira, Malay K.; Bandyopadhyay, Sanghamitra; Maulik, Ujjwal (2004). "Validity index for crisp and fuzzy clusters". Pattern Recognition. 37 (3): 487–501.
- Bezdek, J.C.; Pal, N.R. (1995). "Cluster validation with generalized Dunn's indices". Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems. IEEE Xplore. pp. 190–193. S2CID 7816379.