Technical User Reference (Optimization - Product Similarity)

This section details the ModelClass and logics that the Product Similarity Accelerator deploys. For each step, its aim, outputs, and the main reasons to modify the logics are explained.

Product Similarity Model Class

The Product Similarity Model Class organizes a list of logics to create the model architecture. It is a JSON file that refers to these logics, and the Pricefx platform transforms it into an optimized UI.

The Product Similarity Model Class defines five steps:

  • Definition – Sets the scope of the product tables and of the transactions, and sets the parameters of the similarity exploration model.

  • Similarity Weighting – Runs the similarity measures and screens for the most similar products, then lets the user set the weights and the threshold for a finer comparison of products.

  • Product Similarity – Shows the outputs of the similarity analysis and the similar products of any product, then lets the user configure the grouping.

  • Product Grouping – Shows the groups and the products in each group.

  • Additional Products – Shows the groups and how new products are dispatched into these groups.
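
As a reading aid, the five steps and the logics this page attaches to them can be laid out as a plain Python structure. This is only a sketch: the logic names are the ones listed on this page, but the dictionary layout, the MODEL_STEPS name, and the calculation_sequence helper are assumptions made for illustration and do not reproduce the actual Model Class JSON schema.

```python
# Illustrative layout only: logic names come from this page, the structure
# itself is an assumption and NOT the Pricefx Model Class JSON schema.
MODEL_STEPS = [
    {
        "step": "Definition",
        "calculations": [],  # this step has no calculation logic
        "evaluations": [
            "PSim_1_definitions_eval_productData",
            "PSim_1_definitions_eval_transactionData",
            "PSim_1_definitions_eval_modelConfiguration",
        ],
    },
    {
        "step": "Similarity Weighting",
        "calculations": [
            "PSim_2_simWeights_calc_loadData",
            "PSim_2_simWeights_calc_textTransformers",
            "PSim_2_simWeights_calc_approxNearestNeighbors",
            "PSim_2_simWeights_calc_coProductMetaData",
        ],
        "evaluations": ["PSim_2_simWeights_eval_simWeights"],
    },
    {
        "step": "Product Similarity",
        "calculations": ["PSim_3_similarity_calc_productSimilarity"],
        "evaluations": [
            "PSim_3_prodSimilarity_eval_simOverview",
            "PSim_3_prodSimilarity_eval_simDashboard",
            "PSim_3_prodSimilarity_eval_simGrouping",
        ],
    },
    {
        "step": "Product Grouping",
        "calculations": [
            "PSim_4_community_calc_wavgCommunity",
            "PSim_4_community_calc_namingCommunity",
        ],
        "evaluations": [
            "PSim_4_prodGrouping_eval_simGrouping",
            "PSim_4_prodGrouping_eval_prodOverview",
            "PSim_4_prodGrouping_eval_additionalProduct",
        ],
    },
    {
        "step": "Additional Products",
        "calculations": [
            "PSim_5_newProducts_calc_loadNewData",
            "PSim_5_newProducts_calc_labelingNewProducts",
        ],
        "evaluations": [
            "PSim_5_moreProducts_eval_newProducts",
            "PSim_5_moreProducts_eval_updatedGroups",
        ],
    },
]


def calculation_sequence(step_name):
    """Return the calculation logics of a step, in execution order."""
    for step in MODEL_STEPS:
        if step["step"] == step_name:
            return list(step["calculations"])
    raise KeyError(step_name)
```

For example, calculation_sequence("Product Grouping") returns the two community logics in the order in which they are called.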

Library

The logic is PSim_Lib.

ProdSimMC_Lib is used in nearly all the other logics deployed by the accelerator. It defines a set of functions needed specifically for this accelerator, as well as constants that make it easy to change the user interface wording. It contains the following elements:

  • Parameters – Contains a function to check the type of the columns when exporting some tables.

  • Utils – Constants definition.

  • Labels – Groups static fields used for naming variables, tables, etc.

  • Definitions – Sets of tools dedicated to the Definition step.

  • DataDefinitions – Sets of tools dedicated to interaction with data and user settings.

  • Configurators – Groups the methods to deal with formatting user inputs.

  • ConfigurationUtils – Groups the methods to deal with initialization of user inputs.

  • TableUtils – Groups the methods to interact with model tables by centralizing their name, label and the step that creates them.

  • MixpanelUtils – Tools for tracking of model usage inside Mixpanel.

It is accessed in the code through calls of the form libs.PSimMC_Lib.XXX.

Reasons to modify this logic: to change the text visible to users (in LabelUtils) or the names of tables and table fields (in TableUtils), or to handle another kind of input.

Definition Step

There is no calculation logic in this step. It has three tabs whose dashboard and evaluation logics are PSim_1_definitions_eval_productData, PSim_1_definitions_eval_transactionData, and PSim_1_definitions_eval_modelConfiguration.

These logics provide the user inputs to define at least one source of product data and its mapping, optionally a source of transactional data (with its mapping), the kind of text transformer to use, and the maximum number of similar products to keep in the subsequent analysis.

The outputs are a table of the filtered product data and, optionally, a table of the filtered transactional data, both of which will be used for the similarity analysis.

The main reasons to modify these logics are:

  • Other mappings are needed or have to be retrieved.

  • Customized metrics are needed, which require specific developments.

  • Pre-set filters have to be defined.

  • A chart is to be added to better understand the data. (Caution: it can take a long time, as the data are not yet stored in the model.)

Similarity Weighting Step

Contains one calculation sequence that chains four logics, executed when accessing this step: PSim_2_simWeights_calc_loadData, PSim_2_simWeights_calc_textTransformers, PSim_2_simWeights_calc_approxNearestNeighbors, and PSim_2_simWeights_calc_coProductMetaData. The dashboard is split into two panels, one for user inputs and the other for evaluation.

Calculation: Data Aggregation

The logic is PSim_2_simWeights_calc_loadData.

Calculation: Text Transformation

The logic is PSim_2_simWeights_calc_textTransformers.

Calculation: Raw Similarity

The logic is PSim_2_simWeights_calc_approxNearestNeighbors.
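
To give an idea of what a raw similarity computation can look like, the sketch below ranks, for each product, its most similar products by cosine similarity between embedding vectors and keeps at most k of them, in the spirit of the "maximum number of similar products to keep" parameter of the Definition step. It is an exact brute-force search on toy vectors, not the approximate nearest neighbor method suggested by the logic name. The function names, the sample vectors, and the choice of cosine similarity are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def top_k_neighbors(embeddings, k):
    """For each product, return its k most similar other products.

    embeddings: dict mapping a product id to its vector (hypothetical input).
    Returns a dict mapping a product id to a list of (other_id, similarity),
    sorted by decreasing similarity.
    """
    neighbors = {}
    for sku, vector in embeddings.items():
        scored = [
            (other, cosine_similarity(vector, other_vector))
            for other, other_vector in embeddings.items()
            if other != sku
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        neighbors[sku] = scored[:k]
    return neighbors


# Toy vectors standing in for the output of a text transformer.
embeddings = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]}
neighbors = top_k_neighbors(embeddings, k=1)
```

Brute force compares every pair of products, so its cost grows quadratically with the number of products; approximate methods trade a little accuracy for speed on large catalogs.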

Calculation: CoProductMetaData

The logic is PSim_2_simWeights_calc_coProductMetaData.

Setup Panel

The logic is PSim_2_simWeights_eval_simWeights and uses PSim_2_simWeights_eval_inputSimWeights_Configurator.
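
The weights set by the user in this step can be thought of as the coefficients of a weighted average over several similarity measures. The sketch below shows that idea only; the measure names ("description", "attributes"), the function name, and the weighted-average formula itself are assumptions for illustration, not the formula used by the accelerator.

```python
def weighted_similarity(similarities, weights):
    """Combine per-measure similarities into one score by weighted average.

    similarities: dict mapping a measure name to a similarity in [0, 1].
    weights: dict mapping a measure name to a non-negative weight;
             measures without a weight are ignored.
    """
    total_weight = sum(weights.get(name, 0.0) for name in similarities)
    if total_weight == 0:
        return 0.0  # no weighted measure available for this pair
    weighted_sum = sum(
        score * weights.get(name, 0.0) for name, score in similarities.items()
    )
    return weighted_sum / total_weight


# Hypothetical similarities of one product pair, and hypothetical user weights.
pair_similarities = {"description": 0.9, "attributes": 0.5}
user_weights = {"description": 3.0, "attributes": 1.0}
score = weighted_similarity(pair_similarities, user_weights)
```

With these sample values the score is approximately 0.8, that is (0.9 × 3 + 0.5 × 1) / 4. A threshold can then be applied to such a score to decide which pairs are similar enough.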

Evaluation Panel

The logic is PSim_2_simWeights_eval_simWeights.

Product Similarity Step

Starts with one calculation logic, PSim_3_similarity_calc_productSimilarity, that is executed when accessing this step. The step is split into three tabs: Similarity Overview, Similarity Dashboard, and Similarity Grouping. First, the calculation selects the product pairs that fulfill the minimum similarity criterion and saves them in a model table. Then it prepares other model tables, particularly similarityTable, so that the data are ready for display in the dashboard’s histograms and tables.
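
The selection of product pairs can be pictured with the minimal sketch below: pairs whose similarity reaches a minimum value are kept, then reorganized per product for display. The function names, the tuple layout, and the sample values are assumptions for illustration; the actual logic works on model tables.

```python
from collections import defaultdict


def select_similar_pairs(pairs, min_similarity):
    """Keep only the (product_a, product_b, similarity) tuples at or above the minimum."""
    return [(a, b, s) for a, b, s in pairs if s >= min_similarity]


def similar_products_by_product(pairs):
    """List, for each product, its similar products by decreasing similarity."""
    table = defaultdict(list)
    for a, b, s in pairs:
        table[a].append((b, s))
        table[b].append((a, s))
    for sku in table:
        table[sku].sort(key=lambda item: item[1], reverse=True)
    return dict(table)


# Hypothetical raw similarities between three products.
raw_pairs = [("P1", "P2", 0.92), ("P1", "P3", 0.40), ("P2", "P3", 0.75)]
kept = select_similar_pairs(raw_pairs, min_similarity=0.7)
similar = similar_products_by_product(kept)
```

Here the pair (P1, P3) falls below the minimum and is dropped, so P1 keeps P2 as its only similar product.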

Similarity Overview

The logic is PSim_3_prodSimilarity_eval_simOverview.

Similarity Dashboard

The logic is PSim_3_prodSimilarity_eval_simDashboard, which provides an interactive dashboard for exploring the similarities of one product.

Similarity Grouping

The logic is PSim_3_prodSimilarity_eval_simGrouping.

Product Grouping Step

Starts with two calculation logics, PSim_4_community_calc_wavgCommunity and PSim_4_community_calc_namingCommunity, called in sequence.

Similarity Grouping Dashboard

The logic is PSim_4_prodGrouping_eval_simGrouping.

Product Overview

The logic is PSim_4_prodGrouping_eval_prodOverview.

Setting for Additional Products

The logic is PSim_4_prodGrouping_eval_additionalProduct.

More Products Step

Contains two calculation logics, PSim_5_newProducts_calc_loadNewData and PSim_5_newProducts_calc_labelingNewProducts, that are automatically triggered when accessing this step. The first one loads the data about new products using the filters defined in the previous step. The second one is more complex; it:

  • Prepares the data for comparison purposes.

  • Computes embeddings only for new products that are unknown (to reduce resource usage and computation time).

  • Labels new products using the parameters selected by the user (metric type, most similar or majority) and a nearest neighbor approach in which each new product finds its best neighbors in the similarity graph built from the original products. This process is multi-threaded: each new product explores the graph of already labelled products in an independent thread to find its place. The number of simultaneous threads equals the number of available CPUs. The graph search uses the function process_new_product, available in the Python Engine starting with version v9.

  • Saves the results in model tables, particularly newProductTable.
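
The labeling described above can be sketched as follows, with one task per new product and a thread pool sized to the number of available CPUs. This is not the process_new_product function of the Python Engine, whose signature is not given on this page. The function names, the strategy names "most_similar" and "majority", and the input layout (a precomputed list of neighbors per new product) are assumptions for illustration.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def label_new_product(neighbors, known_labels, strategy="most_similar"):
    """Pick a group for one new product from its already labelled neighbors.

    neighbors: list of (known_product_id, similarity) pairs.
    known_labels: dict mapping a known product id to its group.
    """
    ranked = sorted(neighbors, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return None  # no neighbor found, the product stays unlabelled
    if strategy == "most_similar":
        return known_labels[ranked[0][0]]
    if strategy == "majority":
        votes = Counter(known_labels[sku] for sku, _ in ranked)
        return votes.most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy}")


def label_all(new_products, known_labels, strategy="most_similar"):
    """Label every new product, each one in its own task of a thread pool."""
    workers = os.cpu_count() or 1  # one simultaneous thread per available CPU
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            sku: pool.submit(label_new_product, neighbors, known_labels, strategy)
            for sku, neighbors in new_products.items()
        }
        return {sku: future.result() for sku, future in futures.items()}


# Hypothetical groups of known products and neighbors of one new product.
known_labels = {"A": "G1", "B": "G2", "C": "G2"}
new_products = {"N1": [("A", 0.9), ("B", 0.6), ("C", 0.5)]}
by_most_similar = label_all(new_products, known_labels, "most_similar")
by_majority = label_all(new_products, known_labels, "majority")
```

With this sample, the most similar neighbor (A) puts N1 in G1, whereas the majority of its neighbors (B and C) puts it in G2, which shows why the choice of strategy matters. Each product is labelled independently of the others, which is what makes the per-product parallelism possible.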

This step can be re-run several times on different subsets of products by changing the settings in the last tab of the previous step. Each new run extends the table that stores the results with the new product assignments, including the timestamp of the run.
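
The way successive runs extend the results can be illustrated with the sketch below, in which each run appends its rows with a shared timestamp instead of overwriting earlier rows. The column names, the append_run function, and the list-of-dictionaries table are assumptions for illustration; the accelerator stores these results in model tables.

```python
from datetime import datetime, timezone


def append_run(results_table, assignments, run_time=None):
    """Append one run's product-to-group assignments, stamped with the run time.

    results_table: list of row dictionaries, extended in place.
    assignments: dict mapping a new product id to its assigned group.
    """
    if run_time is None:
        run_time = datetime.now(timezone.utc)
    stamp = run_time.isoformat()
    for sku, group in assignments.items():
        results_table.append({"sku": sku, "group": group, "run_timestamp": stamp})
    return results_table


# Two runs on different subsets of new products (hypothetical values).
results = []
append_run(results, {"N1": "G1"}, datetime(2024, 1, 10, tzinfo=timezone.utc))
append_run(results, {"N2": "G2", "N3": "G1"}, datetime(2024, 2, 5, tzinfo=timezone.utc))
```

After the second run, the table holds the rows of both runs, and the timestamp tells which run produced each assignment.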

New Products

The logic is PSim_5_moreProducts_eval_newProducts.

Updated Groups

The logic is PSim_5_moreProducts_eval_updatedGroups.

Evaluations

The model has one evaluation, PSim_ModelEvaluation_Eval, which lets you retrieve, for one product or a list of products, all the raw similarities that have been computed for it or them. For more details about model evaluations, see Query Optimization Engine Results | Using the Evaluator.