You will use a Distributed Calculation Dataload when you need to:

- Enrich/process a Datamart, Data Source or Data Feed with a large amount of data, for example to:
  - Add new records - copy and transform data from another table into this table.
  - Modify existing records - pre-calculate values which were not imported from an external system but are needed for analysis, and whose calculation takes too long to be done on demand.

Note that the Distributed Calculation Dataload can still be set NOT to be executed in the distributed mode. The use cases could be, for example:

- The processing cannot be split into batches and executed concurrently because of dependencies between the rows, e.g., you may need to iterate over the rows in a given order to track price changes.
- You want to replace the legacy Calculation Dataload with a considerably faster version.
- You need to test the process and would like everything to run in the same thread.
Calculation Item
A Calculation Item represents a "batch" of records which will be processed together in one execution. Unlike in the legacy Calculation Dataload, in the distributed calculation the calculation elements are executed once for each Calculation Item instead of once for each row.
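For illustration only (the structure of an item is entirely up to the logic, and the names below are made up), a Calculation Item can be as simple as a Map describing which slice of data a batch should cover:

```groovy
// Hypothetical Calculation Items: one batch per month of 2023.
// Each item is just a Map; the calculation stage later receives one
// such item at a time and processes only the rows belonging to it.
def calculationItems = (1..12).collect { month ->
    [
        batchId  : String.format("2023-%02d", month),   // label of the batch
        startDate: String.format("2023-%02d-01", month)
    ]
}
```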
Logic API
- Logic Nature: distPACalc
- Logic Type: Calculation/Pricing

Execution Types - each logic element can belong to (and be executed in) one of the following element contexts, which correspond to the 3 stages of the distributed calculation dataload process:

- calculation-init - prepares the list of Calculation Items. The system then stores them in the company parameter DistributedCalculation [xxx].
- calculation - processes one Calculation Item. These calculation elements will be executed once for each Calculation Item. More Calculation Items will likely be processed in parallel, possibly on different machines.
- calculation-summary - summarizes statistics about the process.
| element context | init | init | calculation | calculation | summary | summary |
|---|---|---|---|---|---|---|
| Execution Type | Input Generation | Normal | Input Generation | Normal | Input Generation | Normal |
| dist : DistFormulaContext | yes | yes | yes | yes | yes | yes |
| build input field definitions | yes | | yes | | yes | |
| input : Map | | yes | | yes | | yes |
| api.currentItem() | | yes | | yes | | yes |
| generate rows for target table | | yes | | yes | | yes |
| generate list of Calculation Items | | yes | | | | |
| process a Calculation Item | | | | yes | | |
| calculate summary of the process | | | | | | yes |
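Because the same element runs in both Execution Types, a common pattern is to declare the input fields first and skip the data processing while the element is only generating inputs. A minimal sketch, assuming the standard Pricefx logic API calls api.stringUserEntry() and api.isInputGenerationExecution() (the input field name is made up):

```groovy
// Declare the input field; in Input Generation mode this only registers
// the field definition, in Normal mode it returns the value entered by the user.
def targetTable = api.stringUserEntry("Target Table")

if (api.isInputGenerationExecution()) {
    // Input Generation mode: stop here, no data should be processed.
    return null
}

// Normal mode: the real work of this stage starts here.
api.logInfo("distCalc", "processing with target table: " + targetTable)
return targetTable
```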
Information provided to the logic:

- binding variable dist : DistFormulaContext
- binding variable input : Map - with values of all input fields created by the logic and set by the user
- api.currentItem() : Map - definition of the Dataload. Not available during testing of the logic.
- Allow object modification - true. This process can update data in tables (e.g., via api.update()).
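A minimal sketch of how a calculation element might use these bindings. The input field name, the PX table name and the attribute values are illustrative assumptions; api.update() is used here because Allow object modification is true:

```groovy
// api.currentItem() holds the Dataload definition (null while testing the logic).
def dataloadDef = api.currentItem()
api.logInfo("distCalc", "dataload definition: " + dataloadDef)

// Read a value the user entered into an input field (field name is an assumption).
def factor = (input.AdjustmentFactor ?: "1.0") as BigDecimal

// Allow object modification is true, so the logic may write back to tables,
// e.g. update/insert a row of a Product Extension table (names are assumptions).
api.update("PX", [
    name      : "ProductCost",      // PX table name
    sku       : "MB-0001",          // key of the record
    attribute1: 12.34 * factor      // pre-calculated value
])
```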
Expected logic outcome:

- Input fields - during the configuration of the Dataload, each element will be executed in Input Generation mode so that it can build the input fields it needs.
- In each execution stage, the logic can generate rows for the target table.
- Generate the list of Calculation Items - the init stage can generate the Calculation Items. The system will store them in the company parameter DistributedCalculation [xxx] (where xxx is the ID of the Dataload).
- Process a Calculation Item - in the calculation stage, the system will execute the calculation elements once for each Calculation Item found.
- Calculate a summary of the process (e.g., some statistics of the data rows generated):
  - Summary values are returned via elements with Display Mode Everywhere. Those will be stored in the Dataload definition.
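The sketches below illustrate these outcomes under stated assumptions: that the calculation-init element can return the list of Calculation Items directly, and that rows are added to the Dataload target via api.getDatamartRowSet("target"). Both assumptions should be verified against the How to Run Distributed PA Calculation sample, which is the authoritative reference.

```groovy
// calculation-init element (Normal mode): build the list of Calculation Items.
// Assumption: the list returned by the element is what the system stores in the
// company parameter DistributedCalculation [xxx].
return (2020..2023).collect { year -> [year: year] }
```

```groovy
// calculation element (Normal mode), executed once per Calculation Item.
// Assumption: api.getDatamartRowSet("target") returns the row set of the
// Dataload target table; how the current Calculation Item is read from the
// dist binding is not shown here (see the linked code sample).
def rowSet = api.getDatamartRowSet("target")
rowSet?.addRow([
    sku    : "MB-0001",   // illustrative key and values
    year   : 2023,
    AvgCost: 42.0
])
```

```groovy
// calculation-summary element (Normal mode), Display Mode "Everywhere":
// the returned value is stored in the Dataload definition as a summary.
return "distributed calculation finished at " + new Date()
```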
Configuration

Besides Studio, the logic can also be configured directly in the application. The Dataload itself is configured via Distributed Calculations in Analytics (see the documentation).

Code Samples

A code sample can be found in the article How to Run Distributed PA Calculation.