Distributed calculation is something that you can select in context of PL/LPG/CFS, which could speed up your processing time drastically, but how does it work?
The idea is pretty simple, as non-distributed calculation uses only one NODE and only one THREAD will be created, while the distributed one dynamically select number of NODES which has configurable numbers of THREADS that are going to be created per NODE to perform calculation.
Please look closer on provided examples of distributed types of calculations (enabled/disabled), as it was created without accessing the pricefx-core code, the idea should be kept as how does it work in real life.
Non-distributed Calculation
This flow show example of calculation process for non distributed calculation.
Example:
Distributed Calculation
This flow show example of calculation process for distributed calculation.
It was built based on the non distributed calculation flow chart, but the idea is to show the processes only for the one NODE (which in fact might have a lot of more THREADS), but the process is the same on all NODES, the number of NODES for calculation is dynamically created, so it is hard to predict how many NODES you are going to have in the game and how many THREADS in total.
Example:
Based on this flow, imagine that you will have 6000 items to be calculated, those are going to be split for all of the nodes.
For example if you have 3 nodes in the game, let's say that that this number will be divided by 3, having said NODE1 will have 2000, NODE2 will have 2000, NODE3 will have 2000.
Next step is to split those items for the batches and jobs to be calculated in (each node and thread will work in parallel), having said that there is going to be 5 threads per node, we will end up having:
On NODE1 400 items to be calculated (2 batches) per THREAD, and same for the other ones, meaning that the api.global will be shared only between THREAD (not NODE) meaning that you cannot pass items between nodes using api.global, in that case api.getSharedCache()/set.../remove... methods should be used, which then can be accessed between nodes and it's threads.
In the pricefx-config.xml file, you can restrict the number of parallel slave threads for a given job by its size. The definition MUST be sorted from low to high. The key defines the maximum job size for this slave count (="up to" semantics).
<maxSlaveThreadsPerJobMap>
<upto_5000>1</upto_5000>
<upto_20000>2</upto_20000>
<upto_50000>3</upto_50000>
<upto_10000000>4</upto_10000000>
</maxSlaveThreadsPerJobMap>
Q&A
- Is there any min limit that will consider if the job will create batches or not?
- Yes there is, up to the environment it might be vary, please ask support for your specific example.
- For shared production it was tested well that if there is more than 2000 items, then calculation will create batches and then calculate them, if the limit is not met, batches will always return null and calculation will be only performed in the row calculation level.
- This limit (5000 by default) is set in the price fx configuration file (server/config/pricefx-config.xml) as follows:
<!-- Configuration values applicable if the node is a calculation slave --><calculationSlave><!-- Minimum number of items to process to allow/use distributed mode --><itemThreshold>5000</itemThreshold></calculationSlave>
- Is it worth using it in all cases?
- I would say yes, if you are worried about performance, this is the first point that you should refer to.
- I am unable to use api.addOrUpdate() method in distributed calculation, is there any workaround for this function?
- Yes there is a api.boundCall() which can be called including batches, but it is not recommended, however if you do it safety then it will make sense (addOrUpdate only specific sort of keys, not the same).
- Please refer to this link as example of single usage, or this link as example of batching usage.