Overview (Clustering)

Purpose

Clustering, especially clustering of customers, can be a powerful tool for businesses looking to better understand and target their pricing. By identifying meaningful clusters, businesses can develop more effective pricing and marketing strategies, increase customer satisfaction and retention, and ultimately drive revenue growth.

This Clustering Accelerator provides a fairly simple process for grouping items, one that does not require a data scientist.

Data is often too sparse or too granular to be actionable, especially when you need a pricing strategy that remains manageable. Clustering is therefore valuable for grouping customers, products, points of sale, or any other relevant dimension of your transactions. The technique regroups the items of one dimension according to how they behave across another attribute.

The initial intention here is to better understand the relationships between customers and products. The clustering idea is to regroup customers that buy the same products in the same proportions to create a data-driven typology of customers. This data-driven typology combined with the sizing of customers provides an accurate understanding of who buys what and allows you to adjust the pricing accordingly.

The approach is versatile: it can also regroup products according to who buys them, where they are bought, or when. Any pair of transaction dimensions can be explored to build a new dimension that enriches the transactions with data-driven labels, which in turn help define relevant pricing strategies.

Pricefx Solution

Our clustering model is dedicated to enriching transactional data. To operate it, you need to define the following:

Grouping dimension to label, called groups (e.g. customers)
Observable dimension to characterize the groups, called based-ons (e.g. products or product categories)
Metric

Different metrics serve different purposes, from spend-pattern analysis to more common statistical metrics. Spend-pattern analysis computes, for each group, the share of its revenue spent on each based-on (typically a product category). Alternatively, statistical metrics (mean, median, sum) can be used, for example to build clusters of customers with similar discounts (the metric would then be the average discount). The similarity between groups is then computed, and clustering is applied to produce the clusters.
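As an illustration of the spend-pattern metric, here is a minimal Python sketch. It is not the accelerator's implementation: the function names (spend_shares, pattern_distance), the sample transactions, and the choice of half the L1 distance as a dissimilarity measure are all assumptions made for this example.

```python
from collections import defaultdict


def spend_shares(transactions):
    """Return {group: {based_on: share of the group's total revenue}}.

    `transactions` is an iterable of (group, based_on, revenue) tuples.
    """
    revenue = defaultdict(lambda: defaultdict(float))
    for group, based_on, amount in transactions:
        revenue[group][based_on] += amount

    shares = {}
    for group, per_based_on in revenue.items():
        total = sum(per_based_on.values())
        if total <= 0:
            continue  # assumption: groups without positive revenue are skipped
        shares[group] = {b: r / total for b, r in per_based_on.items()}
    return shares


def pattern_distance(a, b):
    """Half the L1 distance between two share profiles (0 = identical mix).

    Assumption: the accelerator's actual similarity measure is not
    documented here; this one is used only for illustration.
    """
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys) / 2


# Hypothetical sample data: (customer, product category, revenue)
transactions = [
    ("C1", "Tools", 600.0),
    ("C1", "Paint", 400.0),
    ("C2", "Tools", 60.0),
    ("C2", "Paint", 40.0),
    ("C3", "Garden", 500.0),
]

shares = spend_shares(transactions)
d_same_mix = pattern_distance(shares["C1"], shares["C2"])   # same proportions
d_other_mix = pattern_distance(shares["C1"], shares["C3"])  # no category in common
```

In this sample, C1 and C2 differ in size by a factor of ten but buy the same categories in the same proportions, so their profiles are identical and the distance between them is zero. C3 shares no category with C1, so the distance is at its maximum.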

We use a hierarchical clustering algorithm, together with some additional evaluations, to recommend a final number of clusters. The user can still adjust that number.

To refine the meaning of the clusters, an intermediate analysis helps the user focus the clustering on the most relevant groups and based-ons. The numerous less relevant groups (e.g. very small customers or long-tail products) can then be labeled based on their similarity to the groups that have already been clustered.

Outputs

The output of the model is a list of items grouped into clusters, typically customers grouped by customer segments.

A set of dashboards is also provided in order to review and assess the outputs.

Outputs can be exported directly to a Data Source and joined to a Datamart. The clusters can then be used to define pricing strategies or as a segmentation level in Negotiation Guidance.

Approach

This Clustering Accelerator is based on hierarchical clustering, a type of unsupervised machine learning algorithm used to group similar objects or data points into clusters. The algorithm works by repeatedly merging the most similar pairs of data points or clusters.
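The merging process can be sketched in a few lines of plain Python. This is a simplified illustration and not the accelerator's code: the average-linkage rule, the Euclidean distance, the function names, and the sample profiles are assumptions chosen for the example.

```python
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    Returns a list of clusters, each a list of indices into `points`.
    Assumption: average linkage; the accelerator may use another rule.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > n_clusters:
        # combinations() yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return clusters


# Hypothetical spend-share profiles for four customers
points = [
    (0.60, 0.40, 0.00),
    (0.62, 0.38, 0.00),
    (0.00, 0.10, 0.90),
    (0.05, 0.05, 0.90),
]
result = agglomerate(points, 2)  # [[0, 1], [2, 3]]
```

Calling agglomerate with different values of n_clusters on the same data corresponds to cutting the same hierarchy at different levels.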

One of the main advantages of hierarchical clustering is that it produces a hierarchy of clusters, which can then be explored to find the best number of clusters and present users with an optimal set of clusters.

Also, hierarchical clustering is robust to small changes in the dataset, meaning clusters would remain fairly similar over time and should not be affected by small changes of scope.

Additional refinements can be used to define minimum revenue within a cluster or remove less meaningful data (like small customers) before extending the defined clusters by assigning each and every case to a cluster. This is part of the “extended cluster affection”.
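One way to picture this extension step is a nearest-centroid assignment, sketched below. The source only states that items are assigned by similarity to the existing clusters, so the centroid rule, the squared Euclidean distance, and all names and sample values here are assumptions for illustration.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def extend_clusters(clustered, remaining):
    """Assign each remaining item to the cluster with the closest centroid.

    clustered: {cluster_label: [profile, ...]} from the main clustering
    remaining: {item: profile} for items left out of the main clustering
    Assumption: nearest centroid is used as the similarity rule.
    """
    centroids = {label: centroid(vs) for label, vs in clustered.items()}
    return {
        item: min(
            centroids,
            key=lambda label: squared_distance(profile, centroids[label]),
        )
        for item, profile in remaining.items()
    }


# Hypothetical clusters built from the most relevant customers
clustered = {
    "A": [(0.60, 0.40, 0.00), (0.62, 0.38, 0.00)],
    "B": [(0.00, 0.10, 0.90), (0.05, 0.05, 0.90)],
}
# Hypothetical small customers excluded from the main clustering
remaining = {
    "small_1": (0.70, 0.30, 0.00),
    "small_2": (0.10, 0.00, 0.90),
}
assignment = extend_clusters(clustered, remaining)
```

With this rule, every item ends up with a cluster label, while the cluster definitions themselves are driven only by the items kept for the main clustering.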

Limitations

Minimum number of items to cluster – Although the model can run with as few as 7 items, the clustering result might then be less relevant. As a rule of thumb, the number of items should exceed 10 times the expected number of clusters.

Metric required – The clustering approach relies on a metric (4 options are currently offered) that is used to find patterns in the data and group the items together. A metric is therefore a prerequisite: it must be defined and computable at the right granularity, i.e. group attribute x based-on attribute (e.g. Customer x Product Group).

No predefined extension point – There is currently no out-of-the-box extension point. If you intend to use your own metric, custom code must be written. (The accelerator then becomes specific to your implementation, so it cannot be updated without extra effort to port those modifications.)

Data requirements – See Data Requirements (Clustering).
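The item-count guidance above can be turned into a quick pre-check before running the model. The sketch below only encodes the two figures stated in the limitations (7 items, 10 times the expected number of clusters). The function name and the returned labels are hypothetical and not part of the accelerator.

```python
MIN_ITEMS = 7           # minimum number of items the model can run with
ITEMS_PER_CLUSTER = 10  # rule of thumb: items should exceed 10 x clusters


def check_item_count(n_items, expected_clusters):
    """Classify a planned clustering run against the stated guidance.

    Returns "too_few" below the minimum, "weak" when the rule of thumb
    is not met, and "ok" otherwise. The labels are hypothetical.
    """
    if n_items < MIN_ITEMS:
        return "too_few"
    if n_items <= ITEMS_PER_CLUSTER * expected_clusters:
        return "weak"
    return "ok"


status_small = check_item_count(5, 2)    # below the minimum of 7 items
status_weak = check_item_count(40, 5)    # 40 does not exceed 10 x 5
status_ok = check_item_count(120, 5)     # 120 exceeds 10 x 5
```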