Hopkins statistic

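The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.^[1] It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.^[2] With the definition given below, the value approaches 1 if the points are aggregated (clustered) and tends to 0.5 if they are randomly distributed; some sources use the complementary convention, in which aggregated data give values near 0.^[3]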

Preliminaries

A typical formulation of the Hopkins statistic follows.^[2]

Let $X$ be the set of $n$ data points.

Generate a random sample $\overset{\sim}{X}$ of $m \ll n$ data points sampled without replacement from $X$.

Generate a set $Y$ of $m$ uniformly randomly distributed data points.

Define two distance measures: $u_{i}$, the minimum distance (given some suitable metric) of $y_{i} \in Y$ to its nearest neighbour in $X$, and $w_{i}$, the minimum distance of $\overset{\sim}{x}_{i} \in \overset{\sim}{X} \subseteq X$ to its nearest neighbour $x_{j} \in X$, $\overset{\sim}{x}_{i} \neq x_{j}$.
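The sampling steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the metric is assumed to be Euclidean, and the uniform points $Y$ are assumed to be drawn from the axis-aligned bounding box of $X$ (the text does not specify the sampling window).

```python
import math
import random


def nearest_distance(point, candidates):
    """Euclidean distance from `point` to the closest element of `candidates`."""
    return min(math.dist(point, c) for c in candidates)


def hopkins_distances(X, m, rng=None):
    """Return the lists (u, w) for data X (equal-length tuples) and sample size m.

    Assumption: Y is drawn uniformly from the bounding box of X.
    """
    rng = rng or random.Random()
    d = len(X[0])
    lows = [min(p[k] for p in X) for k in range(d)]
    highs = [max(p[k] for p in X) for k in range(d)]

    # X~: m indices drawn without replacement from X.
    sample_idx = rng.sample(range(len(X)), m)
    # Y: m points drawn uniformly from the bounding box.
    Y = [tuple(rng.uniform(lows[k], highs[k]) for k in range(d)) for _ in range(m)]

    # u_i: distance from each y_i to its nearest neighbour in X.
    u = [nearest_distance(y, X) for y in Y]
    # w_i: distance from each sampled point to its nearest *other* point in X
    # (the point itself is excluded by index, so exact duplicates give w_i = 0).
    w = [
        nearest_distance(X[i], [X[j] for j in range(len(X)) if j != i])
        for i in sample_idx
    ]
    return u, w
```

The brute-force neighbour search costs on the order of $m \cdot n$ distance evaluations, which is acceptable for small data sets; a spatial index would be the usual substitute for larger ones.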

Definition

With the above notation, if the data are $d$-dimensional, then the Hopkins statistic is defined as:^[4]

$H={\frac {\sum _{i=1}^{m}{u_{i}^{d}}}{\sum _{i=1}^{m}{u_{i}^{d}}+\sum _{i=1}^{m}{w_{i}^{d}}}}\,$
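As a small worked sketch of this formula, the function below takes the two distance lists and the dimension $d$ and returns $H$. The function name and the toy input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def hopkins_statistic(u, w, d):
    """Compute H = sum(u_i^d) / (sum(u_i^d) + sum(w_i^d))."""
    if len(u) != len(w):
        raise ValueError("u and w must both contain m distances")
    sum_u = sum(ui ** d for ui in u)
    sum_w = sum(wi ** d for wi in w)
    return sum_u / (sum_u + sum_w)


# Toy values: uniform points lie twice as far from the data as data points
# lie from each other, which pushes H above 0.5.
h = hopkins_statistic(u=[2.0, 2.0], w=[1.0, 1.0], d=2)
# sum_u = 8, sum_w = 2, so h = 8 / 10
```

When the $u_{i}$ and $w_{i}$ are of similar size the two sums balance and $H$ sits near 0.5; when the $w_{i}$ are much smaller than the $u_{i}$, as in clustered data, $H$ moves towards 1.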

Under the null hypothesis, this statistic has a $\mathrm{Beta}(m,m)$ distribution.
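The Beta null distribution can be turned into a tail probability. The sketch below is one possible approach using only the standard library: for integer $m$, the $\mathrm{Beta}(m,m)$ cumulative distribution function equals a binomial tail sum, so no numerical integration is needed. The function names are hypothetical, and the one-sided direction (large $H$ taken as evidence of clustering) is an assumption that follows from the definition of $H$ above.

```python
from math import comb


def beta_mm_cdf(h, m):
    """P(H <= h) for H ~ Beta(m, m), with integer m >= 1 and 0 <= h <= 1.

    Uses the identity I_h(m, m) = sum_{j=m}^{2m-1} C(2m-1, j) h^j (1-h)^(2m-1-j).
    """
    n = 2 * m - 1
    return sum(comb(n, j) * h ** j * (1 - h) ** (n - j) for j in range(m, n + 1))


def clustering_p_value(h, m):
    """Upper-tail probability P(H >= h) under the null hypothesis."""
    return 1.0 - beta_mm_cdf(h, m)
```

By symmetry of $\mathrm{Beta}(m,m)$ about 0.5, `beta_mm_cdf(0.5, m)` is 0.5 for every $m$, which is a convenient sanity check on the implementation.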

Notes and references

[1] doi:10.1093/oxfordjournals.aob.a083391.

[2] S2CID 36701919.

[3] S2CID 13595565.

[4] doi:10.1016/B978-0-08-027618-2.50054-1.

External links

http://www.sthda.com/english/wiki/assessing-clustering-tendency-a-vital-issue-unsupervised-machine-learning
