The aim of this step is to configure the segmentation. It starts with the calculation of the price drivers. When the calculation is done, the step displays a tab composed of a left panel to set the segmentation parameters, and a right panel to analyze the price drivers.

Price Drivers Calculation

When you arrive at the Configuration step from the Analysis step, the model first runs a calculation to evaluate the relationships between the price drivers and the optimization targets. The calculation runs in two stages: preprocessing is done first, then a Python job is triggered. While the calculation runs, you will see these two different jobs being executed.

The second step is running

The calculation creates five tables, each of them prefixed with “PriceDrivers”. They are used as inputs to the price drivers dashboard.

Price Drivers Dashboard

The dashboard contains six portlets. The input values in the left panel do not affect the dashboard itself; they define the calculation for the next step.

Price Drivers - Feature importance

Feature Importance

This portlet shows the importance of the selected price drivers, in decreasing order. The importance measures how well a feature predicts the optimization target (defined in the Definition step). Simply put, the higher the importance, the better suited the feature is to producing a good segmentation. The importance is based on feature permutation, and enhanced for the purpose of segmentation:

by containing its natural randomness (the feature importance will not change if you recalculate with the same features selected),
by removing natural noise (features that have no real importance are put at 0), and
by adjusting the values for the purpose of the segmentation (features with lower cardinality are preferred for segmentation).
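The first two enhancements can be illustrated with a minimal sketch of plain permutation importance. This is not the model's actual implementation: the function names, the fixed `seed`, the `noise_floor` cutoff and the toy data are all assumptions made for illustration, and the cardinality adjustment is not reproduced.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the model score."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, rows, y, features,
                           n_repeats=5, seed=0, noise_floor=0.01):
    """Importance = average drop in score when one feature is shuffled.

    seed        : fixed so that a recalculation gives the same values (assumption)
    noise_floor : hypothetical cutoff below which importance is set to 0
    """
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(r) for r in rows])
    importances = {}
    for feat in features:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[feat] for r in rows]
            rng.shuffle(shuffled)
            permuted = [dict(r, **{feat: v}) for r, v in zip(rows, shuffled)]
            drops.append(baseline - r2_score(y, [predict(r) for r in permuted]))
        mean_drop = sum(drops) / n_repeats
        importances[feat] = mean_drop if mean_drop > noise_floor else 0.0
    return importances

# Toy data: the target depends on "region" only; "color" is pure noise.
pairs = [("EU", "red"), ("EU", "blue"), ("EU", "red"), ("EU", "blue"),
         ("US", "red"), ("US", "blue"), ("US", "red"), ("US", "blue")]
rows = [{"region": r, "color": c} for r, c in pairs]

def predict(row):
    return 10.0 if row["region"] == "EU" else 20.0

y = [predict(r) for r in rows]
importances = permutation_importance(predict, rows, y, ["region", "color"])
```

In this toy case, shuffling "color" never changes a prediction, so its importance is exactly 0, while "region" gets a positive value that is identical on every rerun because the seed is fixed.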

The importance measure is highly dependent on the features selected in the Analysis step. For example, if you selected only two features, both might show a high importance and still not be good candidates for the segmentation. A measure of the quality of the importance measure, called Explained Variance, is available in the Price Drivers - Feature Importance portlet (in the figure Price Drivers - Feature Importance portlet, Explained Variance is 0.89). Numerical features can be selected in the Analysis step but cannot be used for the segmentation, so they are displayed differently in the Price Drivers - Feature Importance portlet and are not displayed in the Price Drivers - Relative Importance portlet. The “Numerical feature” and “Other feature” bars are not usable for the segmentation.

Numerical features cannot be used directly because the segmentation is based on categories. One solution is to group a numerical feature into bins in the Data Source and then use this new feature. “Other feature” most often corresponds to features that are not set as a “dimension” in the Datamart or Data Source, which is required for the next step (for performance reasons).
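The binning logic can be sketched as follows. In practice the binned field would be created in the Data Source itself; this Python snippet only illustrates the idea, and the bin edges, labels and function name are hypothetical.

```python
from bisect import bisect_right

def to_bin_label(value, edges, labels):
    """Map a numeric value to a category label.

    edges  : sorted upper boundaries between bins (hypothetical values)
    labels : one label per bin, so len(labels) == len(edges) + 1
    """
    return labels[bisect_right(edges, value)]

# Hypothetical example: turn an order quantity into a size category.
edges = [100, 1000]
labels = ["small", "medium", "large"]

quantities = [50, 100, 999, 1000]
categories = [to_bin_label(q, edges, labels) for q in quantities]
```

Here a value equal to a boundary falls into the upper bin (100 is "medium", 1000 is "large"); whether boundaries are inclusive is a design choice to settle when defining the bins.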

The subtitle indicates the global explained variance for all the price drivers combined.
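As a reference point, explained variance is commonly defined as one minus the ratio of the variance of the prediction errors to the variance of the target. The sketch below uses that common definition; it is an assumption that the portlet uses the same formula.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def explained_variance(y_true, y_pred):
    """1.0 means the predictions capture all the variation of the target;
    0.0 means they do no better than a constant."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)

target = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(target, [1.0, 2.0, 3.0, 4.0])
constant = explained_variance(target, [2.5, 2.5, 2.5, 2.5])
```

Read this way, a value such as 0.89 would mean the selected price drivers account for most of the variation in the optimization target.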

Price Drivers - Relative importance

This pie chart shows how the importance is shared among the price drivers usable for the segmentation.

Segmentation Dimensions Recommendation

Leveraging feature importance, feature interaction, and hierarchies, the Segmentation Dimensions Recommendation portlet recommends an ordering and a selection of the dimensions. The ordering is based on the importance of the features and on the hierarchy. The features are first ordered from highest to lowest importance. Then, if a feature (called C) is in a hierarchy and there are features higher in that hierarchy (called A and B), those features are placed before it if they are not already (so the ordering would be A, then B, then C). This calculation also detects duplicated features (shown in the column “Duplicate with”) and gives them the same rank.
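The ordering rule described above can be sketched as follows. This is an illustration under stated assumptions, not the portlet's actual code: the function name, the input shapes and the example values are hypothetical, and the handling of duplicated features is left out.

```python
def recommend_order(importance, ancestors):
    """Order features by importance, then pull hierarchy ancestors forward.

    importance : dict feature -> importance value
    ancestors  : dict feature -> list of features above it in its
                 hierarchy, listed from top to bottom
    """
    by_importance = sorted(importance, key=importance.get, reverse=True)
    ordered = []
    for feature in by_importance:
        # Place any not-yet-ranked ancestor before the feature itself.
        for ancestor in ancestors.get(feature, []):
            if ancestor in importance and ancestor not in ordered:
                ordered.append(ancestor)
        if feature not in ordered:
            ordered.append(feature)
    return ordered

# Hypothetical example: City is the most important feature, but it sits
# below Region and Country in the geographic hierarchy.
importance = {"City": 0.50, "Channel": 0.30, "Country": 0.15, "Region": 0.05}
ancestors = {"City": ["Region", "Country"], "Country": ["Region"]}

order = recommend_order(importance, ancestors)
```

With these inputs, Region and Country are moved ahead of City despite their lower importance, and Channel follows.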

Feature Interactions

The Feature Interactions section shows how similar two features are. It represents how much you know about feature 2 if you know feature 1. The interaction value is between 0 and 1; if the interaction between feature 1 and feature 2 is 1, then knowing feature 1 tells you feature 2 entirely. So the higher the value, the more information feature 1 gives you about feature 2. Keep in mind this is not symmetric: knowing the customer city gives you the customer country, but not the other way around.

The raw data are displayed in the table Feature Interaction Data. The feature interactions are computed using:

for categorical-categorical feature interactions: Theil's U
for numerical-numerical feature interactions: Pearson's R
for categorical-numerical feature interactions: Correlation Ratio
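To make the asymmetry of the categorical case concrete, here is a minimal sketch of Theil's U (the uncertainty coefficient) using its standard entropy-based definition. The function names and the city/country sample are illustrative assumptions, not the product's code.

```python
from collections import Counter
from math import log

def entropy(values):
    """Shannon entropy H(X) of a list of categories."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(X | Y): uncertainty left in x once y is known."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    return -sum((c / n) * log(c / y_counts[y_val])
                for (_, y_val), c in xy_counts.items())

def theils_u(x, y):
    """Fraction of the uncertainty in x that is removed by knowing y.

    In the document's wording, the interaction of feature 1 with
    feature 2 corresponds to theils_u(feature_2, feature_1).
    """
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # x is constant, so it is always fully known
    return (h_x - conditional_entropy(x, y)) / h_x

city = ["Paris", "Lyon", "Berlin", "Munich"]
country = ["FR", "FR", "DE", "DE"]

city_to_country = theils_u(country, city)   # knowing the city
country_to_city = theils_u(city, country)   # knowing the country
```

On this sample, knowing the city fully determines the country (value 1), while knowing the country removes only half of the uncertainty about the city (value 0.5).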

Hierarchies

The interaction value is asymmetrical, so the interaction of feature 1 with feature 2 may not be the same as the interaction of feature 2 with feature 1. This property is used to detect hierarchical structure between the features, which is then displayed in the Hierarchies portlet.
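One way this asymmetry could be turned into a hierarchy is sketched below: if feature 1 fully determines feature 2 but not the reverse, feature 2 sits above feature 1; if each determines the other, they are duplicates. The threshold value, function name and sample numbers are assumptions for illustration, not the documented algorithm.

```python
def detect_hierarchy(interaction, threshold=0.99):
    """Split feature pairs into (child, parent) links and duplicates.

    interaction : dict (feature_1, feature_2) -> how much is known about
                  feature_2 when feature_1 is known (between 0 and 1)
    threshold   : hypothetical cutoff for "fully determines"
    """
    links, duplicates = [], []
    for (a, b), value in interaction.items():
        reverse = interaction.get((b, a), 0.0)
        if value >= threshold and reverse < threshold:
            links.append((a, b))          # a determines b: b is above a
        elif value >= threshold and reverse >= threshold and a < b:
            duplicates.append((a, b))     # report each duplicate pair once
    return links, duplicates

# Hypothetical interaction values.
interaction = {
    ("City", "Country"): 1.0,
    ("Country", "City"): 0.5,
    ("SKU", "ProductCode"): 1.0,
    ("ProductCode", "SKU"): 1.0,
}

links, duplicates = detect_hierarchy(interaction)
```

Here City is placed under Country, and SKU and ProductCode are flagged as duplicates of each other.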

Segmentation Parameters

In the left panel, you define the parameters to run your segmentation.

Segmentation Dimensions Selection

The first section, Select segmentation dimensions, is a table of the dimensions available for the segmentation. The rank column displays the result of the Segmentation Dimensions Recommendation portlet. Only the price drivers set in the Analysis step can be selected as segmentation dimensions.

The dimensions are ordered according to the table Segmentation Dimensions Recommendation in the right panel, and some of them are preselected (on the first run of a new model; otherwise the previous selection is kept). Based on your business knowledge, you can reorder them or change the selection. Check all the fields that you want to use for the segmentation. You can drag and drop the lines to define the order of the segmentation levels.

You must select at least one and up to twenty levels of segmentation. Note also that:

Null values in dimensions are replaced with the label defined in Replacement value in the Definition step, if one is set. Otherwise, the segmentation returns an error if a null value is found.
You should not use a field used to map the transactions source as a segmentation level (product ID, customer ID). If you need to use one of them, duplicate the field in the source Datamart and use one field in the Definition step, the other one here.
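The null-handling rule in the first note can be pictured with the following sketch. The function and field names are hypothetical; the actual replacement is done by the model using the Replacement value from the Definition step.

```python
def fill_nulls(rows, dimensions, replacement=None):
    """Return copies of the rows with nulls replaced in the given dimensions.

    If no replacement label is configured, a null value is an error,
    mirroring the behavior described above.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for dim in dimensions:
            if new_row.get(dim) is None:
                if replacement is None:
                    raise ValueError(f"Null value in segmentation dimension '{dim}'")
                new_row[dim] = replacement
        cleaned.append(new_row)
    return cleaned

# Hypothetical transactions with a missing country.
transactions = [{"Country": "FR", "Channel": "Web"},
                {"Country": None, "Channel": "Retail"}]

filled = fill_nulls(transactions, ["Country", "Channel"], replacement="Unknown")
```

Calling `fill_nulls(transactions, ["Country", "Channel"])` without a replacement label raises a `ValueError` instead.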

If you intend to use alignment in the next step, we advise placing the dimensions used for alignment among the first levels of the segmentation for better results, especially because of data sparsity.

Segmentation Thresholds

The second section of the left panel, Segmentation Thresholds, contains three minimum values. The segmentation tree only builds nodes that meet all three thresholds.

Elasticity

The third section, Elasticity, lets you choose the elasticity model, either Sigmoidal or Exponential. This defines the kind of elasticity function that is fitted to each segment’s data to get the elasticity parameters. There is also a checkbox, Calculate metrics based on elasticity. If it is checked, the next step calculates not only the elasticity function but also the projected quantity, revenue, and margin obtained if the optimal target metric value is used.

Two parameters allow you to cancel the elasticity calculations for uninteresting or too large segments:

Min depth of leaves for elasticity calculation – If the segment is deeper than this value to the leaf nodes, the elasticity is not calculated in it.
Max #Transactions in segment for elasticity calculation – If the segment represents more than this amount of transactions, the elasticity is not calculated in it.

Elasticity Models

You can choose your elasticity model. The equations behind each elasticity model are given in the table below, where q is the normalized quantity. The output of the elasticity function is a relative quantity: do not use the formula directly, but only to compare two different configurations.

Exponential model	Sigmoidal model
[Equation: Exponential model]	[Equation: Sigmoidal model]
where M is the optimization target value if it is higher than the optimization target average in the segment, and that segment average otherwise. A and q₀ are the elasticity parameters.	where x is the optimization target value and x₀ is its reference value. L, k and x₀ are the elasticity parameters.
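The equations themselves are not reproduced in this text. Purely as an illustration of what a sigmoidal curve with parameters L, k and x₀ can look like, the sketch below uses the textbook logistic function. This is an assumption and not necessarily the formula used by the model; the parameter values are made up. It also shows the recommended usage: comparing two configurations rather than reading a single output as an absolute quantity.

```python
from math import exp

def logistic_quantity(x, L, k, x0):
    """Textbook logistic curve (assumed form, for illustration only).

    L  : upper limit of the curve
    k  : steepness; a negative k makes the quantity fall as x rises
    x0 : reference value of x, where the curve reaches L / 2
    """
    return L / (1.0 + exp(-k * (x - x0)))

def relative_change(x_new, x_old, L, k, x0):
    """Compare two configurations as a ratio instead of using raw outputs."""
    return logistic_quantity(x_new, L, k, x0) / logistic_quantity(x_old, L, k, x0) - 1.0

# Hypothetical parameters for one segment.
L, k, x0 = 2.0, -0.5, 10.0

at_reference = logistic_quantity(10.0, L, k, x0)
change = relative_change(12.0, 10.0, L, k, x0)
```

With a negative k, moving the target value from 10 to 12 gives a negative relative change, meaning a lower relative quantity in the second configuration.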

Click the Continue button (top right) to go to the Segmentation step.
