Distributed Calculation Data Load Logic
Since version 10.0
You will use a Distributed Calculation Data Load when you need to:
Enrich or process a large amount of data in a Datamart, Data Source or Data Feed, for example to:
Add new records – copy and transform data from another table into this table.
Modify existing records – pre-calculate values that were not imported from an external system but are needed for analysis, and whose calculation takes too long to perform on demand.
Note that a Distributed Calculation Data Load can also be configured to run in non-distributed mode. Typical use cases are:
The processing cannot be split into batches and executed concurrently because of dependencies between rows, for example when you need to iterate over the rows in a given order to track price changes.
You want to replace the legacy Calculation Data Load with a faster version.
You need to test the process and want to run everything in a single thread.
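The following Python sketch shows why row dependencies prevent batching. It is a conceptual model and not Pricefx logic code. The row layout (`sku`, `date`, `price`) and the function `track_price_changes` are hypothetical. Each row's price change depends on the row visited just before it, so the rows must be processed in order.

```python
from typing import Any, Dict, List, Optional

# Hypothetical row layout: "sku", "date" (ISO date string) and "price".
Row = Dict[str, Any]

def track_price_changes(rows: List[Row]) -> List[Row]:
    """Return copies of the rows, each with the price change versus the
    previous row of the same SKU.

    Rows are visited one by one in (sku, date) order, because each result
    depends on the row visited just before it.
    """
    result: List[Row] = []
    last_price: Dict[str, float] = {}
    for row in sorted(rows, key=lambda r: (r["sku"], r["date"])):
        previous: Optional[float] = last_price.get(row["sku"])
        price = float(row["price"])
        # The first row of each SKU has no predecessor, so its change is None.
        change = None if previous is None else price - previous
        result.append({**row, "priceChange": change})
        last_price[row["sku"]] = price
    return result

rows = [
    {"sku": "A", "date": "2024-01-02", "price": 11.0},
    {"sku": "A", "date": "2024-01-01", "price": 10.0},
    {"sku": "B", "date": "2024-01-01", "price": 5.0},
]
for r in track_price_changes(rows):
    print(r["sku"], r["date"], r["priceChange"])
# A 2024-01-01 None
# A 2024-01-02 1.0
# B 2024-01-01 None
```

Suppose the rows of one SKU were spread over several Calculation Items that are processed concurrently. An item could then not see the last price computed in the preceding item.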
Calculation Item
A Calculation Item represents a "batch" of records that are processed together in one execution.
The legacy Calculation Data Load executes the calculation elements for each row. The distributed calculation instead executes them once for each Calculation Item.
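The difference between per-row and per-item execution can be sketched in Python as follows. This is a conceptual model only. The helpers `make_calculation_items` and `process_item`, the item layout and the batch size are hypothetical. In Pricefx, your own init logic produces the Calculation Items.

```python
from typing import Any, Dict, List

def make_calculation_items(row_ids: List[int], batch_size: int) -> List[Dict[str, Any]]:
    """Split row IDs into consecutive batches; each dict models one Calculation Item."""
    return [
        {"itemId": n, "rowIds": row_ids[start:start + batch_size]}
        for n, start in enumerate(range(0, len(row_ids), batch_size))
    ]

def process_item(item: Dict[str, Any]) -> int:
    """Models the calculation elements: called once per Calculation Item.

    All rows of the batch are handled inside this single call; here the
    function just returns how many rows the item contained.
    """
    return len(item["rowIds"])

items = make_calculation_items(list(range(2_500)), batch_size=1_000)
rows_per_call = [process_item(item) for item in items]
print(len(rows_per_call), rows_per_call)  # 3 [1000, 1000, 500]
```

In this model, 2,500 rows lead to 3 executions of the calculation instead of 2,500.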
Logic API
Logic Nature: distPACalc
Logic Type: Calculation/Pricing
Execution Types – Each logic element belongs to one of the following element contexts and is executed in it. The contexts correspond to the three stages of the Distributed Calculation Data Load process:
calculation-init – Prepares a list of Calculation Items. The system then stores them in the Company Parameter DistributedCalculation [xxx], where xxx is the ID of the Data Load.
calculation – Processes one Calculation Item. The elements in this context are executed once for each Calculation Item. Several Calculation Items may be processed in parallel, possibly on different machines.
calculation-summary – Calculates summary statistics about the process.
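The three stages can be modeled in plain Python as below. This is a conceptual sketch, not the Pricefx API. Everything in it is hypothetical:

* the functions `init_stage`, `calculation_stage`, `summary_stage` and `run_data_load`,
* the item layout,
* the example computation (doubling a value).

A local thread pool stands in for the workers that Pricefx may run on different machines. Setting `distributed=False` processes every item in the calling thread, which mirrors the non-distributed mode described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def init_stage(source_rows: List[Row], batch_size: int) -> List[Dict[str, Any]]:
    """calculation-init: build the list of Calculation Items (here: batches of rows)."""
    return [
        {"itemId": n, "rows": source_rows[start:start + batch_size]}
        for n, start in enumerate(range(0, len(source_rows), batch_size))
    ]

def calculation_stage(item: Dict[str, Any]) -> List[Row]:
    """calculation: process one Calculation Item and return rows for the target table."""
    return [{"id": r["id"], "value": r["value"] * 2} for r in item["rows"]]

def summary_stage(target_rows: List[Row]) -> Dict[str, Any]:
    """calculation-summary: compute statistics about the whole run."""
    return {"rowCount": len(target_rows), "valueTotal": sum(r["value"] for r in target_rows)}

def run_data_load(source_rows: List[Row], batch_size: int = 2,
                  distributed: bool = True) -> Tuple[List[Row], Dict[str, Any]]:
    items = init_stage(source_rows, batch_size)
    if distributed:
        # Items are independent, so they can be processed concurrently;
        # pool.map still returns the results in item order.
        with ThreadPoolExecutor(max_workers=4) as pool:
            per_item = list(pool.map(calculation_stage, items))
    else:
        # Non-distributed mode: process the items one after another in this thread.
        per_item = [calculation_stage(item) for item in items]
    target_rows = [row for rows in per_item for row in rows]
    return target_rows, summary_stage(target_rows)

source = [{"id": i, "value": i} for i in range(5)]
target, summary = run_data_load(source)
print(summary)  # {'rowCount': 5, 'valueTotal': 20}
```

The model processes the items independently, so both modes produce the same target rows. Only the execution strategy differs.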
Element context | Init | Init | Calculation | Calculation | Summary | Summary |
---|---|---|---|---|---|---|
Execution Type | Input Generation | Normal | Input Generation | Normal | Input Generation | Normal |
dist : DistFormulaContext | yes | yes | yes | yes | yes | yes |
build input field definitions | yes | | yes | | yes | |
input : Map | | yes | | yes | | yes |
api.currentItem() | | yes | | yes | | yes |
generate rows for target table | | yes | | yes | | yes |
generate list of Calculation Items | | yes | | | | |
process a Calculation Item | | | | yes | | |
calculate summary of the process | | | | | | yes |
Information provided to the logic
Binding variable dist : DistFormulaContext
Binding variable input : Map – Contains the values of all input fields created by the logic and set by the user.
api.currentItem() : Map – Definition of the Data Load. Not available during testing of the logic.
Expected logic outcome
Input fields – During configuration of the Data Load, each element is executed in Input Generation mode so that it can build the input fields it needs.
In each execution stage, the logic can generate rows for the target table.
Generate a list of Calculation Items – The init stage can generate the Calculation Items. The system stores them in the Company Parameter DistributedCalculation [xxx], where xxx is the ID of the Data Load.
Process a Calculation Item – In the calculation stage, the system executes the calculation elements once for each Calculation Item.
Calculate a summary of the process – In the summary stage, the logic can calculate, for example, statistics about the generated data rows.
Summary values – Values returned by elements with Display Mode set to Everywhere are stored in the Data Load definition.
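An element runs in two modes: Input Generation mode and Normal mode. The Python sketch below models that behavior and is purely illustrative. `ElementContext`, `is_input_generation`, `declare_input`, `batch_size_element`, the input name `BatchSize` and its default value are all hypothetical stand-ins, not the Pricefx API.

* In Input Generation mode, the element only declares the input field it needs.
* In Normal mode, it reads the value the user set from the `input` map.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ElementContext:
    """Hypothetical model of what an element sees when it is executed."""
    is_input_generation: bool
    input: Dict[str, Any] = field(default_factory=dict)  # values set by the user
    declared_inputs: List[str] = field(default_factory=list)

    def declare_input(self, name: str) -> None:
        self.declared_inputs.append(name)

def batch_size_element(ctx: ElementContext) -> Optional[int]:
    if ctx.is_input_generation:
        # Data Load configuration: only describe the input field, compute nothing.
        ctx.declare_input("BatchSize")
        return None
    # Normal execution: use the user's value; 1000 is a hypothetical default.
    return int(ctx.input.get("BatchSize", 1000))

config_ctx = ElementContext(is_input_generation=True)
batch_size_element(config_ctx)
print(config_ctx.declared_inputs)  # ['BatchSize']

run_ctx = ElementContext(is_input_generation=False, input={"BatchSize": 500})
print(batch_size_element(run_ctx))  # 500
```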
Configuration
In addition to Pricefx Studio, the logic can be configured in Administration › Logics › Calculation Logic › Analytics Calculations.
The Data Load is configured in Analytics › Data Manager › Data Load. See also Distributed Calculations in Analytics in the documentation.
Code Samples
A code sample can be found in the article How to Run Distributed PA Calculation.