This article explains how timeouts are handled when logic elements and library elements call each other, and how timeouts work in general.
Example
In the example below, running lib.ElementA.getAverageMargin() for a large set of SKUs will time out after 2 seconds. Each individual call to getProductMargin takes less than 2 seconds, but because the method is called inside a loop, the cumulative time spent in it exceeds 2 seconds.
lib.ElementA: (timeout 300 seconds)
```groovy
Number getAverageMargin(List<String> skus) {
    List<Number> margins = []
    for (sku in skus) {
        // Each iteration calls into lib.ElementB, which has its own 2-second timeout
        margins.add(lib.ElementB.getProductMargin(sku))
    }
    return margins.sum() / margins.size()
}
```
lib.ElementB: (timeout 2 seconds)
```groovy
Number getProductMargin(String sku) {
    def costPrice = api.find("PX30")... // get cost from some data source
    def sellPrice = api.find("PLI")... // get sell price from another data source
    return sellPrice - costPrice
}
```
Explanation and Technical Background
Timeouts are essentially a safety catch against bad logic (such as accidental infinite loops). The JVM has no setting like "run this thread, but only for x seconds". Instead, we inject code at compile time that throws an exception once the system time has passed the start time plus the timeout.
The logic elements are essentially Groovy script classes, so there is no easy way to share a script-global "start time" variable across all conceivable cases. We therefore use one start time variable per element (normal or lib element): a private member variable of the script class, set to the system time in the constructor of that class.
The engine "resets" that start time for lib elements by re-instantiating them before executing every normal formula element. Normal elements are instantiated only once (hence the behavior that "the timer starts running when the element is executed for the first time" when you call previous elements further down).
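The mechanism described above can be illustrated with a small, self-contained sketch. It is written in plain Java and is not the engine's actual code: the class names `TimeoutSketch`, `ElementScript`, and `ElementTimeoutException`, the method `completedCalls`, and the injected fake clock are all hypothetical and exist only to make the behavior deterministic. It assumes the injected check simply compares the elapsed time since construction against the timeout.

```java
import java.util.function.LongSupplier;

final class TimeoutSketch {

    /** Hypothetical exception type; the article does not name the real one. */
    static final class ElementTimeoutException extends RuntimeException {
        ElementTimeoutException(String message) {
            super(message);
        }
    }

    /** Stand-in for one compiled element (normal or lib) script class. */
    static final class ElementScript {
        private final LongSupplier clockMillis; // injected so the sketch is deterministic
        private final long timeoutMillis;
        private final long startMillis;

        ElementScript(LongSupplier clockMillis, long timeoutMillis) {
            this.clockMillis = clockMillis;
            this.timeoutMillis = timeoutMillis;
            // The start time is captured once, when the instance is created.
            this.startMillis = clockMillis.getAsLong();
        }

        /** The kind of check that would be injected into the script at compile time. */
        void checkTimeout() {
            long elapsed = clockMillis.getAsLong() - startMillis;
            if (elapsed > timeoutMillis) {
                throw new ElementTimeoutException(
                        "elapsed " + elapsed + " ms exceeds timeout of " + timeoutMillis + " ms");
            }
        }
    }

    /**
     * Simulates repeated method calls on the same instance, each taking perCallMillis
     * of fake time, and returns how many calls complete before the check throws.
     */
    static int completedCalls(ElementScript script, long[] now, long perCallMillis, int maxCalls) {
        int completed = 0;
        try {
            for (int i = 0; i < maxCalls; i++) {
                now[0] += perCallMillis; // the "work" done inside the method
                script.checkTimeout();   // the injected check
                completed++;
            }
        } catch (ElementTimeoutException e) {
            // Timeout reached: stop counting.
        }
        return completed;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in milliseconds

        // Same instance, 2 s timeout, 500 ms per call: the fifth call trips the check.
        ElementScript lib = new ElementScript(() -> now[0], 2_000L);
        System.out.println(completedCalls(lib, now, 500L, 10)); // prints 4

        // A fresh instance captures a new start time, so the timer starts over.
        ElementScript fresh = new ElementScript(() -> now[0], 2_000L);
        System.out.println(completedCalls(fresh, now, 500L, 3)); // prints 3
    }
}
```

In this sketch every call is far shorter than the timeout, yet the shared instance still times out because the start time is fixed at construction. Creating a new instance is what resets the timer, which mirrors the re-instantiation of lib elements described above.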
In Collins 5.0, there will be improvements in reporting exactly which element timed out, and a timeout change will trigger recompilation (so that it takes effect), but the fundamentals will stay the same. Unless someone comes up with a way to implement this safety latch with a single timeout value at the element level without compromising its effectiveness, it will probably stay that way.
Hint: The suggestion "simply do not compile in timeouts for lib elements" does not work, because an infinite loop in a lib would then not be caught. From a system perspective, we cannot trust any Groovy code to be "safe".
The timeout of the executed element is always honored.
Library element timeouts are ignored. What matters is how long the logic element runs, including its library calls.
Element timeouts are always capped by the maximum timeout configured at the cluster level.
This setting defaults to 900s (15 min).
It can be increased by Pricefx Support via the setting formulaEngine.script.maxTimeoutInSec.
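The capping rule can be read as taking the smaller of the two values. The following Java sketch assumes exactly that (a plain minimum of the element timeout and the cluster maximum); the class and method names are hypothetical, and only the 900-second default comes from this article.

```java
final class TimeoutCapSketch {

    /** Default cluster-wide maximum stated in the article: 900 s (15 min). */
    static final int DEFAULT_MAX_TIMEOUT_SEC = 900;

    /** Returns the timeout that would apply, assuming a simple min() cap. */
    static int effectiveTimeoutSec(int elementTimeoutSec, int clusterMaxTimeoutSec) {
        return Math.min(elementTimeoutSec, clusterMaxTimeoutSec);
    }

    public static void main(String[] args) {
        // An element timeout below the maximum is used as configured.
        System.out.println(effectiveTimeoutSec(300, DEFAULT_MAX_TIMEOUT_SEC));   // prints 300
        // An element timeout above the maximum is cut down to the maximum.
        System.out.println(effectiveTimeoutSec(1_200, DEFAULT_MAX_TIMEOUT_SEC)); // prints 900
    }
}
```

Under this reading, setting an element timeout of 1200 seconds has no effect beyond 900 seconds unless the cluster maximum is raised first.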
Special cases:
Since 10.0: PA Data Loads, PA Distributed Calculation, DMModel Calculations and Model Calculations completely ignore the timeouts in their logics but not in their libraries.
Since 10.2: PA Data Loads, PA Distributed Calculation, DMModel Calculations and Model Calculations completely ignore the timeouts in their logics and in the libraries they use.
Before 10.2: The element timeout is set when the logic is pushed or saved to the partition. So if you increase the maximum timeout in your cluster, you need to re-save your logic.
If an element just calls a library and the library has a higher timeout than the element, then the library element timeout is honored.