You will use a Distributed Calculation Dataload when you need to:

- Enrich/process a Datamart, Data Source or Data Feed with a large amount of data, for example to:
  - Add new records - copy and transform data from another table into this table.
  - Modify existing records - pre-calculate values which were not imported from an external system but are needed for analysis, and whose calculation takes too long to be done on demand.

Note that the Distributed Calculation Dataload can still be set NOT to be executed in the distributed mode. The use cases could be, for example:

- The processing cannot be split into batches and executed concurrently because of dependencies between the rows, e.g., you may need to iterate over the rows in a given order to track price changes.
- You want to replace the legacy Calculation Dataload with a considerably faster version.
- You need to test the process and would like everything to run in the same thread.
Calculation Item
A Calculation Item represents a "batch" of records which will be processed together in one execution. Unlike in the legacy Calculation Dataload, in the distributed calculation the calculation elements are executed once for each Calculation Item instead of once for each row.
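For illustration only (the structure of an item is entirely up to the logic, and the names below are made up), a Calculation Item can be as simple as a Map describing which slice of data a batch should cover:

```groovy
// Hypothetical Calculation Items: one batch per month of 2023.
// Each item is just a Map; the calculation stage later receives one
// such item at a time and processes only the rows belonging to it.
def calculationItems = (1..12).collect { month ->
    [
        batchId  : String.format("2023-%02d", month),   // label of the batch
        startDate: String.format("2023-%02d-01", month)
    ]
}
```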
Logic API
- Logic Nature: distPACalc
- Logic Type: Calculation/Pricing

Execution Types - each logic element can belong to (and be executed in) one of the following element contexts, which correspond to the 3 stages of the distributed calculation dataload process:

- calculation-init - prepares the list of Calculation Items. The system then stores them in the company parameter DistributedCalculation [xxx].
- calculation - processes one Calculation Item. These calculation elements will be executed once for each Calculation Item. More Calculation Items will likely be processed in parallel, possibly on different machines.
- calculation-summary - summarizes statistics about the process.
| element context | init | init | calculation | calculation | summary | summary |
|---|---|---|---|---|---|---|
| Execution Type | Input Generation | Normal | Input Generation | Normal | Input Generation | Normal |
| dist : DistFormulaContext | yes | yes | yes | yes | yes | yes |
| build input field definitions | yes | | yes | | yes | |
| input : Map | | yes | | yes | | yes |
| api.currentItem() | | yes | | yes | | yes |
| generate rows for target table | | yes | | yes | | yes |
| generate list of Calculation Items | | yes | | | | |
| process a Calculation Item | | | | yes | | |
| calculate summary of the process | | | | | | yes |
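Because the same element runs in both Execution Types, a common pattern is to declare the input fields first and skip the data processing while the element is only generating inputs. A minimal sketch, assuming the standard Pricefx logic API calls api.stringUserEntry() and api.isInputGenerationExecution() (the input field name is made up):

```groovy
// Declare the input field; in Input Generation mode this only registers
// the field definition, in Normal mode it returns the value entered by the user.
def targetTable = api.stringUserEntry("Target Table")

if (api.isInputGenerationExecution()) {
    // Input Generation mode: stop here, no data should be processed.
    return null
}

// Normal mode: the real work of this stage starts here.
api.logInfo("distCalc", "processing with target table: " + targetTable)
return targetTable
```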
Information provided to the logic:

- binding variable dist : DistFormulaContext
- binding variable input : Map - with values of all input fields created by the logic and set by the user
- api.currentItem() : Map - definition of the Dataload. Not available during testing of the logic.
- Allow object modification - true. This process can update data in tables (e.g., via api.update()).
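A minimal sketch of how a calculation element might use these bindings. The input field name, the PX table name and the attribute values are illustrative assumptions; api.update() is used here because Allow object modification is true:

```groovy
// api.currentItem() holds the Dataload definition (null while testing the logic).
def dataloadDef = api.currentItem()
api.logInfo("distCalc", "dataload definition: " + dataloadDef)

// Read a value the user entered into an input field (field name is an assumption).
def factor = (input.AdjustmentFactor ?: "1.0") as BigDecimal

// Allow object modification is true, so the logic may write back to tables,
// e.g. update/insert a row of a Product Extension table (names are assumptions).
api.update("PX", [
    name      : "ProductCost",      // PX table name
    sku       : "MB-0001",          // key of the record
    attribute1: 12.34 * factor      // pre-calculated value
])
```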
Expected logic outcome:

- Input fields - during the configuration of the Dataload, each element will be executed in Input Generation mode so that it can build the input fields it needs.
- In each execution stage, the logic can generate rows for the target table.
- Generate the list of Calculation Items - the init stage can generate the Calculation Items. The system will store them in the company parameter DistributedCalculation [xxx] (where xxx is the ID of the Dataload).
- Process a Calculation Item - in the calculation stage, the system will execute the calculation elements once for each Calculation Item found.
- Calculate a summary of the process (e.g., some statistics of the data rows generated):
  - Summary values are returned via elements with Display Mode Everywhere. Those will be stored in the Dataload definition.
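The sketches below illustrate these outcomes under stated assumptions: that the calculation-init element can return the list of Calculation Items directly, and that rows are added to the Dataload target via api.getDatamartRowSet("target"). Both assumptions should be verified against the How to Run Distributed PA Calculation sample, which is the authoritative reference.

```groovy
// calculation-init element (Normal mode): build the list of Calculation Items.
// Assumption: the list returned by the element is what the system stores in the
// company parameter DistributedCalculation [xxx].
return (2020..2023).collect { year -> [year: year] }
```

```groovy
// calculation element (Normal mode), executed once per Calculation Item.
// Assumption: api.getDatamartRowSet("target") returns the row set of the
// Dataload target table; how the current Calculation Item is read from the
// dist binding is not shown here (see the linked code sample).
def rowSet = api.getDatamartRowSet("target")
rowSet?.addRow([
    sku    : "MB-0001",   // illustrative key and values
    year   : 2023,
    AvgCost: 42.0
])
```

```groovy
// calculation-summary element (Normal mode), Display Mode "Everywhere":
// the returned value is stored in the Dataload definition as a summary.
return "distributed calculation finished at " + new Date()
```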
Configuration

Besides Studio, the logic can also be configured directly in the application. The Dataload itself is configured via Distributed Calculations in Analytics (see the documentation).

Code Samples

A code sample can be found in the article How to Run Distributed PA Calculation.