Customers' data remain confidential throughout the entire process. We have made sure that it is impossible to identify customers' data.
PricefxPlasma benchmarking is based purely on anonymized and aggregated data.
Customer data is anonymized by Pricefx via double-blinding (see below how the process works).
No one outside of Pricefx has visibility into unique customer identifiers.
Benchmarks are compiled from aggregated metrics across a number of anonymized entities, so that users cannot identify specific companies.
Benchmarks are not shown when sufficient data is unavailable, e.g., in case of more detailed KPI filtering by a user (by industry, region, etc.). This rule also protects the identity of the underlying entities in addition to improving statistical validity.
Because the data is aggregated, users cannot run a competitive comparison on standalone items (e.g., a product price or even a product line) and therefore cannot use such a comparison to identify specific companies.
Data is extracted and anonymized on a monthly basis but made available with a three-month delay: fast enough to allow for relevant analysis, but sufficiently disconnected from the current market status to avoid any compliance issues.
PricefxPlasma works with the following data from customers: transaction data and quote data.
The anonymized data is transformed into standardized metrics that are then loaded into the PricefxPlasma platform, which aggregates and filters the metrics further to create industry-level benchmarks.
The resulting benchmarks are distributed to the customers’ environments as a set of standard Pricefx dashboards. Customers can also include this data in their own dashboards, allowing for a direct comparison between their company and the benchmark.
Basics of Anonymization
In double-blind anonymization, data is first anonymized by removing personally identifiable information such as names, addresses, and specific identifiers. This first layer of anonymization ensures that users cannot identify the data subjects.
The second layer of anonymization involves masking other identifiers that could potentially reveal the identity of the data subjects. For example, if data collection includes information about the geographic location of data subjects, this information will be masked or generalized to protect their privacy.
By using double-blind anonymization, benchmark comparisons can be done without compromising the privacy of data subjects. This technique helps to ensure that sensitive information remains protected and that research is done in an ethical manner.
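The two layers described above can be illustrated with a minimal Python sketch. It is not Plasma's implementation: the field names (`name`, `address`, `customer_id`, `country`) and the country-to-region mapping are hypothetical and exist only to show the idea of removing direct identifiers first and then generalizing an indirect one.

```python
# Hypothetical field names; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "address", "customer_id"}

# Hypothetical lookup used to generalize a precise location to a broad region.
COUNTRY_TO_REGION = {
    "Australia": "Oceania",
    "New Zealand": "Oceania",
    "Germany": "Europe",
}


def strip_direct_identifiers(record: dict) -> dict:
    """First layer: drop fields that identify the data subject directly."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def generalize_location(record: dict) -> dict:
    """Second layer: replace a precise location with a coarser region."""
    out = dict(record)
    country = out.pop("country", None)
    out["region"] = COUNTRY_TO_REGION.get(country, "Other")
    return out


def anonymize(record: dict) -> dict:
    """Apply both layers in order."""
    return generalize_location(strip_direct_identifiers(record))
```

For example, a record with `name`, `address`, `customer_id`, `country` and `revenue` keeps only `revenue` plus a generalized `region` after `anonymize` is applied.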
How it Works in Plasma
Plasma does data anonymization* in 4 stages to ensure that it is impossible to trace individual customer data.
*An automated process that removes customer identifiers by assigning a random key to the customer data immediately at the point of extraction, which disassociates the data from the customer.
Through the double-blinding process, no one, including Pricefx customers, can see unique customer identifiers. This means that a Plasma user can only see their own data against other players in the industry, without being able to identify them.
1. As soon as participants sign up for Plasma, they are assigned a meaningless unique identifier (e.g. EFGxxx). The “key – participant” relationship is inaccessible.
2. While extracting the participant’s data, identity fields such as SKU and CustomerID are hashed using an irreversible algorithm. Columns that are not required for the analysis are not extracted.
3. The data is aggregated by month/selling-from-region/selling-to-region/product-or-service. This removes the individual transaction lines, leaving only summarized data per month/selling-from-region/selling-to-region/product-or-service.
Each row of the aggregated data has a UniqueId in the following format:
<Year>-M<month>_<SellingFrom>_<SellingTo>_<product or service>
Example:
2019-M03_Oceania_Oceania_P
4. Steps 2 and 3 happen on the participant’s Pricefx partition – no data leaves the trusted environment.
5. Finally, all participants’ aggregated data is harvested to a central Plasma server and linked with the participant’s unique identifier (e.g. EFGxxx).
Each row has a UniqueId in the following format:
<EFGxxx>_<Year>-M<month>_<SellingFrom>_<SellingTo>_<product or service>
Example:
EFG001_2019-M01_Oceania_Oceania_P
6. This final data is then sent to Bain&Co for the KPI benchmark generation. KPI benchmarks are generated based on this aggregated data.
7. KPI benchmarks will be generated only if there are 5 or more participants for the metric. The KPIs do not contain any identifiers at all.
8. The KPI values are distributed to all participants.
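The hashing, aggregation, linking, and threshold steps above can be sketched in Python as follows. This is an illustrative sketch only, not the actual Plasma code: the use of salted SHA-256, the row field names (`year`, `month`, `selling_from`, `selling_to`, `kind`, `revenue`), and the choice of a simple mean as the KPI are all assumptions made for the example. Only the UniqueId formats and the minimum of 5 participants come from the description above.

```python
import hashlib
from collections import defaultdict

MIN_PARTICIPANTS = 5  # threshold stated in step 7


def hash_identity(value: str, salt: str) -> str:
    """Step 2: one-way hash of an identity field (salted SHA-256 is an assumption)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def aggregate(rows: list) -> dict:
    """Step 3: sum revenue per month / selling-from / selling-to / product-or-service."""
    totals = defaultdict(float)
    for r in rows:
        uid = (f"{r['year']}-M{r['month']:02d}_"
               f"{r['selling_from']}_{r['selling_to']}_{r['kind']}")
        totals[uid] += r["revenue"]
    return dict(totals)


def link_participant(participant_key: str, aggregated: dict) -> dict:
    """Step 5: prefix each aggregated UniqueId with the participant's key."""
    return {f"{participant_key}_{uid}": v for uid, v in aggregated.items()}


def benchmark(linked_rows: dict) -> dict:
    """Step 7: emit a KPI only when at least MIN_PARTICIPANTS contribute.

    The KPI here is a plain mean, chosen purely for illustration.
    The result carries no participant identifiers.
    """
    by_metric = defaultdict(dict)
    for uid, value in linked_rows.items():
        participant, metric = uid.split("_", 1)
        by_metric[metric][participant] = value
    return {
        metric: sum(values.values()) / len(values)
        for metric, values in by_metric.items()
        if len(values) >= MIN_PARTICIPANTS
    }
```

With four participants contributing to `2019-M03_Oceania_Oceania_P`, `benchmark` returns an empty result; once a fifth participant contributes, the metric appears without any participant key attached.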
Good to know: Plasma aggregates year-to-date (YTD) data, excluding the most recent 3 months. This exclusion is intentional: it accounts for events such as fairs and expos, or other factors that could affect the data (for example, increases in sales). By not collecting data for this period, we ensure maximum anonymity and mitigate the risk of making the business identifiable.
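As a rough illustration of the window described above, the following Python sketch lists the months that would fall into a YTD aggregation. It rests on one possible reading that the source does not confirm: the reference month itself counts as one of the 3 excluded months, and the window never reaches back into the previous year. The function name `ytd_window` and the month label format are hypothetical.

```python
def ytd_window(year: int, month: int, lag_months: int = 3) -> list:
    """Return month labels from January up to the last month outside the lag.

    Assumption: the reference month counts as one of the excluded months,
    so a reference month of August with a 3-month lag ends the window in May.
    """
    last_included = month - lag_months
    # range() is empty when last_included < 1, i.e. early in the year.
    return [f"{year}-M{m:02d}" for m in range(1, last_included + 1)]
```

Under this reading, a reference month of August 2019 yields January through May 2019, and a reference month of March yields an empty window.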