Clustering Metrics (Clustering)

Visual representation of the clustering metrics:

Clustering Quality

Clustering Quality is the Silhouette Score of the clustering:

  • A higher value is better.

  • The worst value is -1, which means samples have been assigned to the wrong clusters.

  • A value near 0 is still poor: the clusters overlap, so the distance between them is not significant.

  • The best value is 1, which means the clusters are well apart from each other and clearly distinguished.

The Silhouette Score is the mean, over all samples, of the per-sample coefficient s = (b − a) / max(a, b), where:

  • a – The mean distance between a sample and all other points in the same cluster.

  • b – The mean distance between a sample and all points in the nearest neighboring cluster.
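
The definition above can be sketched in plain Python. This is only an illustrative sketch, not the product's implementation: the function name mean_silhouette, the use of Euclidean distance, and the convention of scoring single-sample clusters as 0 are all assumptions made for the example.

```python
from math import dist
from statistics import mean


def mean_silhouette(points, labels):
    """Mean silhouette coefficient, assuming Euclidean distance (hypothetical helper)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    coefficients = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: a single-sample cluster scores 0.
            coefficients.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster
        # (the point's distance to itself is 0, so only the divisor changes).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        coefficients.append((b - a) / denom if denom > 0 else 0.0)
    return mean(coefficients)


# Two tight groups far apart: a sensible labeling scores near 1,
# a labeling that mixes the groups scores below 0.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
```

The same four points are scored twice so the effect of a wrong assignment on the sign of the score is visible.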

Variance Ratio Criterion

Variance Ratio Criterion is the Calinski-Harabasz Index, which represents the ratio of between-cluster dispersion to within-cluster dispersion.

  • A higher value is better.

  • The worst is 0.

  • The value depends on the number of samples, so it can differ widely between two datasets.

  • There is no upper bound, so there is no perfect value.

The Calinski-Harabasz Index is the ratio of the between-cluster dispersion to the within-cluster dispersion, each summed over all clusters, where dispersion is defined as the sum of squared distances.
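
As a rough illustration of that ratio, the sketch below computes the index in plain Python. It assumes the standard form of the Calinski-Harabasz Index, in which the between-cluster dispersion is divided by (k − 1) and the within-cluster dispersion by (n − k) for k clusters and n samples. The function name and the choice to return infinity when the within-cluster dispersion is zero are assumptions of this example only.

```python
def calinski_harabasz(points, labels):
    """Variance ratio criterion for a labeling (hypothetical helper)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    if not 1 < k < n:
        raise ValueError("need more than one cluster and fewer clusters than samples")

    def centroid(pts):
        return tuple(sum(col) / len(pts) for col in zip(*pts))

    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        center = centroid(members)
        # Between-cluster dispersion: squared centroid offsets, weighted by size.
        between += len(members) * sq_dist(center, overall)
        # Within-cluster dispersion: squared distances to the cluster centroid.
        within += sum(sq_dist(p, center) for p in members)

    if within == 0:
        return float("inf")  # assumption of this sketch: perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = calinski_harabasz(points, [0, 0, 1, 1])
bad = calinski_harabasz(points, [0, 1, 0, 1])
```

Because the result is not bounded above, it is most useful for comparing labelings of the same data, as in the good and bad labelings here.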

Cluster Separation

Cluster Separation is calculated from the Davies-Bouldin Index as:

"Cluster separation" = 1 / (1 + Davies-Bouldin Index)

  • A higher value is better.

  • A perfect value is 1.

A lower Davies-Bouldin Index relates to a model with better separation between the clusters; its minimum value is 0.

It is computed as the average similarity of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. Thus, clusters which are farther apart and less dispersed result in a lower index and therefore a higher Cluster Separation.
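
A minimal sketch of this computation follows, assuming the usual formulation: the scatter of a cluster is the mean Euclidean distance of its points to its centroid, and the similarity of two clusters is the sum of their scatters divided by the distance between their centroids. The function names are hypothetical, and the sketch does not handle clusters whose centroids coincide.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin Index with Euclidean distance (hypothetical helper)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    centroids, scatters = [], []
    for members in clusters.values():
        center = tuple(sum(col) / len(members) for col in zip(*members))
        centroids.append(center)
        # Scatter: mean distance of the cluster's points to its centroid.
        scatters.append(sum(dist(p, center) for p in members) / len(members))

    k = len(centroids)
    worst = []
    for i in range(k):
        # Similarity of cluster i with its most similar cluster.
        worst.append(max(
            (scatters[i] + scatters[j]) / dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        ))
    return sum(worst) / k


def cluster_separation(db_index):
    """Map the index onto (0, 1], where 1 is the best value."""
    return 1.0 / (1.0 + db_index)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = cluster_separation(davies_bouldin(points, [0, 0, 1, 1]))
bad = cluster_separation(davies_bouldin(points, [0, 1, 0, 1]))
```

The transformation 1 / (1 + index) turns a "lower is better" index into a "higher is better" score, which keeps it consistent with the other metrics on this page.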

Combined Score

The Combined Score is the product of all the scores, after each score has been scaled by its maximum over all numbers of clusters:

  • Scaled Clustering Quality = Clustering Quality / Max(“Clustering Quality” for all numbers of clusters)

  • Scaled Variance Ratio Criterion = Variance Ratio Criterion / Max(“Variance Ratio Criterion” for all numbers of clusters)

  • Scaled Cluster Separation = Cluster Separation / Max(“Cluster Separation” for all numbers of clusters)

Combined Score = “Scaled Clustering Quality” * “Scaled Clustering Quality” * “Scaled Variance Ratio Criterion” * “Scaled Cluster Separation”
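
The scaling and product above might be sketched as follows. The function name, the dictionary layout (number of clusters mapped to a raw metric value), and the sample numbers are all made up for illustration. The product reproduces the formula literally as written above, so Scaled Clustering Quality appears twice.

```python
def combined_scores(quality, variance_ratio, separation):
    """Combine per-k metrics; each argument maps number of clusters -> raw value."""
    def scale(values):
        # Divide by the maximum over all numbers of clusters.
        # Assumes that maximum is positive (not handled in this sketch).
        top = max(values.values())
        return {k: v / top for k, v in values.items()}

    q = scale(quality)
    v = scale(variance_ratio)
    s = scale(separation)
    # Literal transcription of the formula: the quality factor is applied twice.
    return {k: q[k] * q[k] * v[k] * s[k] for k in quality}


# Illustrative values only, not measured results.
quality = {2: 0.4, 3: 0.8, 4: 0.6}
variance_ratio = {2: 100.0, 3: 200.0, 4: 150.0}
separation = {2: 0.5, 3: 0.4, 4: 0.25}

scores = combined_scores(quality, variance_ratio, separation)
best_k = max(scores, key=scores.get)
```

Scaling by the maximum puts the three metrics on a comparable 0 to 1 range before they are multiplied, so the unbounded Variance Ratio Criterion does not dominate the product.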
