Plasma Technical Introduction

Data Flow Stages

Pricefx Plasma gathers, anonymizes and aggregates customers’ data into standardized metrics in three stages: Extractor, Distiller and Harvester.

1. Extractor

Extractor converts the customer’s data into standardized Plasma data. Extractor is deployed on the customer’s partition. It is configured using Price Parameter tables, which map the customer’s data to Plasma data, and it extracts the data from the customer’s partition to Plasma Data Sources and Datamarts via Data Loads.

2. Distiller

Distiller calculates previous prices and aggregates the Plasma data, rolled up by month, into standardized metrics. Distiller is deployed on the customer’s partition.

3. Harvester

Harvester gathers the standardized metrics from the various customers’ partitions using mappings from configuration tables. Harvester is deployed on the Plasma partition. It combines the standardized metrics with demographic entity details for each customer, anonymizes the complete data and writes it to the Harvester Datamart.

Plasma Anonymization Process

Plasma anonymizes data in four stages (steps 1 to 4 below) so that individual customer data cannot be traced back to a participant.

  1. As soon as participants sign up for Plasma, they are assigned a meaningless unique identifier (e.g. EFGxxx). The mapping between this key and the participant is accessible only to a few members of the Plasma team.

  2. While the participant’s data is being extracted, identity fields such as SKU and CustomerID are hashed using an irreversible algorithm. Columns that are not required for the analysis are not extracted.

  3. The data is aggregated by month/selling-from-region/selling-to-region/product-or-service. This removes the individual transaction lines, leaving only summarized data for each combination of month, selling-from region, selling-to region and product or service.

Each row of the aggregated data has a UniqueId in the following format:

<Year>-M<month>_<SellingFrom>_<SellingTo>_<product or service>

Examples:

  • 2019-M03_Oceania_Oceania_P

  • 2019-M04_Western Europe_Eastern Europe_S

  • 2019-M05_Northern America_South America_P

Steps 2 and 3 happen on the participant’s Pricefx partition – no data leaves the trusted environment.
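
Steps 2 and 3 can be illustrated with a minimal sketch. It is not the Plasma implementation: the source does not name the hash algorithm, so SHA-256 with a salt is an assumption, and the column names, the `revenue` metric and the functions `hash_identity`, `extract`, `unique_id` and `aggregate` are all hypothetical.

```python
import hashlib
from collections import defaultdict

# Assumption: a per-partition salt; the source does not say whether one is used.
SALT = b"hypothetical-partition-salt"

# Hypothetical list of columns needed for the analysis; everything else is dropped.
KEPT_COLUMNS = ("year", "month", "selling_from", "selling_to", "kind", "revenue")


def hash_identity(value):
    """One-way hash of an identity field (SHA-256 is an assumed choice)."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


def extract(row):
    """Step 2: keep only the required columns and hash the identity fields."""
    out = {name: row[name] for name in KEPT_COLUMNS}
    out["sku"] = hash_identity(row["sku"])
    out["customer_id"] = hash_identity(row["customer_id"])
    return out


def unique_id(year, month, selling_from, selling_to, kind):
    """Build a UniqueId such as 2019-M03_Oceania_Oceania_P."""
    return f"{year}-M{month:02d}_{selling_from}_{selling_to}_{kind}"


def aggregate(rows):
    """Step 3: roll transaction lines up into one summary row per UniqueId."""
    totals = defaultdict(lambda: {"revenue": 0.0, "lines": 0})
    for r in rows:
        key = unique_id(r["year"], r["month"], r["selling_from"],
                        r["selling_to"], r["kind"])
        totals[key]["revenue"] += r["revenue"]
        totals[key]["lines"] += 1
    return dict(totals)
```

Calling `aggregate([extract(r) for r in transactions])` on a list of transaction dictionaries returns one summary entry per UniqueId. The hashed `sku` and `customer_id` values exist only on the extracted lines and do not appear in the aggregated result.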

4. Finally, all participants’ aggregated data is harvested to a central Plasma server and linked with the participant’s unique identifier (e.g. EFGxxx).

Each row has a UniqueId in the following format:

<EFGxxx>_<Year>-M<month>_<SellingFrom>_<SellingTo>_<product or service>

Examples:

  • EFG001_2019-M01_Oceania_Oceania_P

  • EFG001_2019-M02_Western Europe_Eastern Europe_S

  • EFG002_2020-M03_Oceania_Oceania_S

  • EFG002_2020-M03_Northern America_South America_P
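
The harvesting step can be sketched as follows. This is only an illustration under assumptions: the function name `harvest` and the dictionary layout of the input are hypothetical, and the real Harvester additionally joins demographic entity details, which are omitted here.

```python
def harvest(per_participant):
    """Merge aggregated rows from several participants into one data set.

    per_participant maps a participant key (e.g. "EFG001") to that
    participant's aggregated rows, keyed by their UniqueId.
    Each output key is <participant key>_<UniqueId>.
    """
    combined = {}
    for participant_key, rows in per_participant.items():
        for row_id, metrics in rows.items():
            combined[f"{participant_key}_{row_id}"] = metrics
    return combined
```

Because the participant key is a meaningless identifier, the combined rows remain distinguishable per participant without revealing who the participant is.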

5. This final data is then sent to Bain&Co for KPI benchmark generation. The KPI benchmarks are generated from this aggregated data.

6. KPI benchmarks are generated only if a metric has 5 or more participants. The KPIs do not contain any identifiers.

7. The KPI values are distributed to all participants.
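
The rule that a benchmark requires at least 5 participants can be sketched as below. The source does not describe how the KPI values are calculated, so the median used here is purely an assumption, as are the function name `benchmarks` and the input layout.

```python
from collections import defaultdict
from statistics import median

MIN_PARTICIPANTS = 5  # threshold stated in the text


def benchmarks(rows):
    """Compute one benchmark value per segment.

    rows is an iterable of (participant_key, segment, value) tuples.
    A segment is reported only if at least MIN_PARTICIPANTS distinct
    participants contributed to it. The result is keyed by segment only,
    so it carries no participant identifiers.
    """
    by_segment = defaultdict(dict)
    for participant_key, segment, value in rows:
        # If a participant appears twice in a segment, the last value wins.
        by_segment[segment][participant_key] = value
    return {
        segment: median(values.values())  # median is an assumed statistic
        for segment, values in by_segment.items()
        if len(values) >= MIN_PARTICIPANTS
    }
```

Segments with fewer than 5 contributing participants are dropped entirely rather than reported with a partial value, which prevents a benchmark from exposing the data of a small, identifiable group.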