Plasma Technical Introduction
Data Flow Stages
Pricefx Plasma gathers customers' data, anonymizes it and aggregates it into standardized metrics in three stages: Extractor, Distiller and Harvester.
1. Extractor
Extractor gathers the customer's data and converts it into standardized Plasma data. Extractor is deployed on the customer's partition. It is configured through Price Parameter tables, which map the customer's data to Plasma data, and it extracts the data from the customer's partition into Plasma Data Sources and Datamarts via Data Loads.
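The column mapping can be pictured as a lookup from customer column names to Plasma field names. The sketch below is a minimal illustration under stated assumptions: `FIELD_MAPPING` stands in for the Price Parameter table, and every column name in it (`InvoiceDate`, `Material`, `sku` and so on) is hypothetical. Columns absent from the mapping are simply not carried over; writing the result to Data Sources and Datamarts via Data Loads is not modeled.

```python
# Hypothetical stand-in for a Price Parameter table that maps the customer's
# column names (keys) to Plasma field names (values).
FIELD_MAPPING = {
    "InvoiceDate": "date",
    "Material": "sku",
    "SoldTo": "customer_id",
    "NetAmount": "revenue",
    "Qty": "quantity",
}


def extract(rows, mapping):
    """Rename mapped columns to Plasma field names; unmapped columns are dropped."""
    return [
        {plasma_name: row[source_name] for source_name, plasma_name in mapping.items()}
        for row in rows
    ]


customer_rows = [
    {"InvoiceDate": "2019-03-14", "Material": "M-100", "SoldTo": "C-7",
     "NetAmount": 120.0, "Qty": 3, "SalesRep": "Alice"},
]
plasma_rows = extract(customer_rows, FIELD_MAPPING)
print(plasma_rows)
# [{'date': '2019-03-14', 'sku': 'M-100', 'customer_id': 'C-7', 'revenue': 120.0, 'quantity': 3}]
```

Driving the extraction from a table rather than from code means a new customer only needs a new mapping, not a new extraction routine.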
2. Distiller
Distiller calculates previous prices and aggregates the Plasma data, rolled up by month, into standardized metrics. Distiller is deployed on the customer's partition.
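The document does not define "previous prices" precisely. A minimal sketch, assuming it means the unit price of the same SKU's most recent earlier transaction, attaches that price to every line and then rolls the lines up by month. The field names and the `price_increases` metric are hypothetical examples, not Plasma's actual metric set.

```python
from collections import defaultdict


def add_previous_prices(rows):
    """Attach each row's unit price and the SKU's previous unit price.

    Assumption: "previous price" = unit price of the same SKU's most recent
    earlier transaction; a SKU's first transaction gets None.
    """
    last_price = {}
    result = []
    for row in sorted(rows, key=lambda r: r["date"]):  # ISO dates sort chronologically
        price = row["revenue"] / row["quantity"]
        result.append({**row, "price": price,
                       "previous_price": last_price.get(row["sku"])})
        last_price[row["sku"]] = price
    return result


def roll_up_by_month(rows):
    """Count lines and price increases per month ("YYYY-MM")."""
    metrics = defaultdict(lambda: {"lines": 0, "price_increases": 0})
    for row in rows:
        month = row["date"][:7]  # "2019-03-20" -> "2019-03"
        metrics[month]["lines"] += 1
        previous = row["previous_price"]
        if previous is not None and row["price"] > previous:
            metrics[month]["price_increases"] += 1
    return dict(metrics)


rows = [
    {"date": "2019-03-02", "sku": "A", "revenue": 100.0, "quantity": 10},
    {"date": "2019-03-20", "sku": "A", "revenue": 66.0, "quantity": 6},
    {"date": "2019-04-05", "sku": "A", "revenue": 55.0, "quantity": 5},
]
print(roll_up_by_month(add_previous_prices(rows)))
# {'2019-03': {'lines': 2, 'price_increases': 1}, '2019-04': {'lines': 1, 'price_increases': 0}}
```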
3. Harvester
Harvester gathers the standardized metrics from the various customers' partitions, using the mapping defined in configuration tables. Harvester is deployed on the Plasma partition. It combines the standardized metrics with demographic entity details for each customer, anonymizes the complete data set and writes it to the Harvester Datamart.
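A minimal sketch of the Harvester's combine step, assuming the configuration table maps each partition to a participant identifier plus demographic details. The partition names, the `industry` and `size` attributes and the `harvest` function are all hypothetical. What it illustrates is that the partition name serves only as a lookup key and is never copied into the harvested rows.

```python
# Hypothetical configuration table: partition name -> participant ID and
# demographic entity details. Only these values reach the harvested rows.
PARTICIPANTS = {
    "acme-prod": {"participant_id": "EFG001", "industry": "Manufacturing", "size": "Large"},
    "globex-prod": {"participant_id": "EFG002", "industry": "Distribution", "size": "Medium"},
}


def harvest(metrics_by_partition, participants):
    """Merge each partition's metric rows with that participant's details.

    The partition name is used only as a lookup key and is not written
    into any output row.
    """
    harvested = []
    for partition, metric_rows in metrics_by_partition.items():
        details = participants[partition]
        for row in metric_rows:
            harvested.append({**details, **row})
    return harvested


metrics = {
    "acme-prod": [{"month": "2019-03", "revenue": 166.0}],
    "globex-prod": [{"month": "2019-03", "revenue": 90.0}],
}
for row in harvest(metrics, PARTICIPANTS):
    print(row)
```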
Plasma Anonymization Process
Plasma anonymizes data in four stages, which ensures that individual customer data cannot be traced.
1. As soon as participants sign up for Plasma, each is assigned a meaningless unique identifier (e.g. EFGxxx). The mapping between identifier and participant is accessible only to a few members of the Plasma team.
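One way to obtain a meaningless identifier is to draw it at random rather than from a counter, so that it reveals nothing about sign-up order. The sketch below assumes the `EFG` prefix and three-digit format suggested by the examples; `assign_participant_id` is a hypothetical helper, and the restricted identifier-to-participant table is deliberately left out.

```python
import secrets


def assign_participant_id(existing_ids, prefix="EFG", digits=3):
    """Return a random, not-yet-used identifier such as 'EFG417'.

    Assumption: IDs are drawn at random instead of sequentially, so the
    number carries no information about the participant.
    """
    space = 10 ** digits
    taken = {i for i in existing_ids if i.startswith(prefix)}
    if len(taken) >= space:
        raise RuntimeError("identifier space exhausted")
    while True:
        candidate = f"{prefix}{secrets.randbelow(space):0{digits}d}"
        if candidate not in taken:
            return candidate


ids = set()
for _ in range(3):
    ids.add(assign_participant_id(ids))
print(sorted(ids))
```

The `secrets` module is used instead of `random` because its output is intended to be unpredictable.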
2. When the participant's data is extracted, identity fields such as SKU and CustomerID are hashed with an irreversible algorithm. Columns that are not required for the analysis are not extracted.
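The document states only that identity fields are hashed with an irreversible algorithm; it does not name the algorithm. The sketch below assumes HMAC-SHA-256 with a secret key that stays on the participant's partition. A plain, unkeyed hash of a short, guessable value such as an SKU could be reversed by hashing candidate values and comparing; the keyed variant cannot be recomputed without the key. `pseudonymize`, `IDENTITY_FIELDS` and the field names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical names of the identity fields to pseudonymize.
IDENTITY_FIELDS = ("sku", "customer_id")


def pseudonymize(row, key, identity_fields=IDENTITY_FIELDS):
    """Return a copy of row with identity fields replaced by HMAC-SHA-256 hex digests.

    For a given key the same value always maps to the same digest, so rows can
    still be grouped by SKU or customer without revealing the original value.
    """
    out = dict(row)
    for field in identity_fields:
        if field in out:
            mac = hmac.new(key, str(out[field]).encode("utf-8"), hashlib.sha256)
            out[field] = mac.hexdigest()
    return out


partition_key = secrets.token_bytes(32)  # assumption: generated and kept on the partition
row = {"sku": "M-100", "customer_id": "C-7", "revenue": 120.0}
print(pseudonymize(row, partition_key))
```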
3. The data is aggregated by month/selling-from-region/selling-to-region/product-or-service. This removes the individual transaction lines: only summarized data per month/selling-from-region/selling-to-region/product-or-service remains.
Each row of the aggregated data has a UniqueId of the following form:
<Year>-M<month>_<SellingFrom>_<SellingTo>_<product or service>
Examples:
2019-M03_Oceania_Oceania_P
2019-M04_Western Europe_Eastern Europe_S
2019-M05_Northern America_South America_P
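The aggregation and the UniqueId format can be sketched together: each transaction is mapped to its key, and only per-key totals are kept. This minimal sketch assumes transactions carry hypothetical `selling_from`, `selling_to` and `kind` (`"P"` or `"S"`) fields and sums only revenue and quantity; Plasma's actual metric set is not described here.

```python
from collections import defaultdict


def unique_id(year, month, selling_from, selling_to, kind):
    """Build '<Year>-M<month>_<SellingFrom>_<SellingTo>_<P|S>' with a two-digit month."""
    return f"{year}-M{month:02d}_{selling_from}_{selling_to}_{kind}"


def aggregate(transactions):
    """Sum revenue and quantity per UniqueId; individual transaction lines are discarded."""
    totals = defaultdict(lambda: {"revenue": 0.0, "quantity": 0})
    for t in transactions:
        year, month = int(t["date"][:4]), int(t["date"][5:7])  # "YYYY-MM-DD"
        key = unique_id(year, month, t["selling_from"], t["selling_to"], t["kind"])
        totals[key]["revenue"] += t["revenue"]
        totals[key]["quantity"] += t["quantity"]
    return dict(totals)


transactions = [
    {"date": "2019-03-02", "selling_from": "Oceania", "selling_to": "Oceania",
     "kind": "P", "revenue": 100.0, "quantity": 3},
    {"date": "2019-03-28", "selling_from": "Oceania", "selling_to": "Oceania",
     "kind": "P", "revenue": 50.0, "quantity": 2},
    {"date": "2019-04-11", "selling_from": "Western Europe", "selling_to": "Eastern Europe",
     "kind": "S", "revenue": 80.0, "quantity": 1},
]
print(aggregate(transactions))
# {'2019-M03_Oceania_Oceania_P': {'revenue': 150.0, 'quantity': 5}, '2019-M04_Western Europe_Eastern Europe_S': {'revenue': 80.0, 'quantity': 1}}
```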
Steps 2 and 3 happen on the participant's Pricefx partition; no data leaves the trusted environment before it has been hashed and aggregated.
4. Finally, the aggregated data of all participants is harvested to a central Plasma server and linked with each participant's unique identifier (e.g. EFGxxx).
Each row then has a UniqueId of the following form:
<EFGxxx>_<Year>-M<month>_<SellingFrom>_<SellingTo>_<product or service>
Examples:
EFG001_2019-M01_Oceania_Oceania_P
EFG001_2019-M02_Western Europe_Eastern Europe_S
EFG002_2020-M03_Oceania_Oceania_S
EFG002_2020-M03_Northern America_South America_P
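At harvest time, the participant's identifier is prefixed to each aggregated UniqueId. A minimal sketch with the hypothetical helpers `harvest_ids` and `parse_harvested_id`; the parser assumes, as in the examples, that region names may contain spaces but never underscores, so splitting on `_` yields exactly five parts.

```python
def harvest_ids(aggregated_by_participant):
    """Merge per-participant aggregates into one table keyed by '<EFGxxx>_<UniqueId>'."""
    central = {}
    for participant_id, rows in aggregated_by_participant.items():
        for uid, metrics in rows.items():
            central[f"{participant_id}_{uid}"] = metrics
    return central


def parse_harvested_id(harvested_id):
    """Split a harvested UniqueId back into its components.

    Assumes no component contains an underscore (region names may contain spaces).
    """
    participant, period, selling_from, selling_to, kind = harvested_id.split("_")
    year, month = period.split("-M")
    return participant, int(year), int(month), selling_from, selling_to, kind


central = harvest_ids({
    "EFG001": {"2019-M02_Western Europe_Eastern Europe_S": {"revenue": 80.0}},
    "EFG002": {"2020-M03_Oceania_Oceania_S": {"revenue": 42.0}},
})
print(sorted(central))
# ['EFG001_2019-M02_Western Europe_Eastern Europe_S', 'EFG002_2020-M03_Oceania_Oceania_S']
print(parse_harvested_id("EFG001_2019-M02_Western Europe_Eastern Europe_S"))
# ('EFG001', 2019, 2, 'Western Europe', 'Eastern Europe', 'S')
```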
5. The final data is then sent to Bain & Co, which generates the KPI benchmarks from this aggregated data.
6. A KPI benchmark is generated only if 5 or more participants contribute to the metric. The KPIs contain no identifiers at all.
7. The KPI values are distributed to all participants.
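The minimum-participant rule can be expressed as a guard in front of the benchmark calculation. How the benchmarks are actually calculated is not described in this document; the sketch below assumes quartiles via Python's `statistics.quantiles` purely for illustration, and `benchmark` is a hypothetical helper. Participant identifiers are used only for counting and do not appear in the output.

```python
import statistics

MIN_PARTICIPANTS = 5  # threshold stated above


def benchmark(values_by_participant, min_participants=MIN_PARTICIPANTS):
    """Return quartile benchmarks for one metric, or None below the participant threshold.

    The result contains only summary statistics, no participant identifiers.
    """
    if len(values_by_participant) < min_participants:
        return None
    values = list(values_by_participant.values())
    p25, median, p75 = statistics.quantiles(values, n=4)
    return {"p25": p25, "median": median, "p75": p75}


print(benchmark({"EFG001": 0.12, "EFG002": 0.08, "EFG003": 0.15, "EFG004": 0.10}))
# None
```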