Whenever you need to improve the performance of repetitive reading of the same data, caching helps a lot. Note: We speak here about application server caching, not DB caching.
In Pricefx this applies to all tasks that process a long list of lines/rows, for example:
- Pricelist calculation
- Live Price Grid calculation
- Data Enrichment Tasks
- Calculated Field Set
- Calculation Dataload
- Quotes with many lines
Generally you can cache any kind of data, for example:
- result of a DS/DM query
- list of rows from a Price Parameter table
- list of rows from a Product Extension
Principles of speed optimization
- plan the code and measure performance on the EXPECTED volume of data
  - always ask the customer about the current and expected (future) number of lines to process
  - generate simple mock data lines to get realistic performance measurements
  - testing with 5 lines will NOT give you a good idea
    - each process has a certain time "overhead", which can look quite big when compared to the time spent on 5 lines, but can be minor when compared to the time spent on 10,000 rows
- start with the biggest issues, based on measurement
  - measure the current performance
    - use the Performance Log/Trace
    - consider also measuring the time spent in blocks of code, not only in whole elements
  - start with optimization of the code which takes the longest time
    - the usual suspect is repetitive data reading, but the cause can also be inefficient code
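The Performance Trace (described later in this document) reports time per element; to measure a specific block of code inside an element, you can take timestamps around the block and write the result to the logs. A minimal sketch; the element content and the Product Extension name "CostData" are illustrative, only `api.logInfo()` and `api.productExtension()` are standard Pricefx calls:

```groovy
def start = System.currentTimeMillis()

// the block of code you want to measure,
// e.g. reading rows from a (hypothetical) Product Extension "CostData"
def rows = api.productExtension("CostData")

def elapsed = System.currentTimeMillis() - start
api.logInfo("Timing", "reading CostData took ${elapsed} ms")
```

Logging the elapsed time per block quickly shows whether the element's total time is dominated by data reading or by the calculation itself.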
Principles of caching
Scenario: Your line/row logic is executed many times, and each time it needs to read data from the same table, and
- the query always returns exactly the same result,
- or you're reading from the same table using different filters, but the table is so small that it's OK to read it completely into memory.
In such a case you can cache the data:
- ensure you read the data only once (the first time you need them),
- store the data in a List or in a Map in the cache,
- the next time you need the data, read them from the cache.
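For the second scenario (a small table queried with different filters per line), you can read the whole table once and cache it as a Map keyed by the lookup field. A hedged sketch; the Price Parameter name "Discounts" and the field names `name`/`attribute1` are illustrative assumptions, not from the original samples:

```groovy
def getCachedDiscountMap() {
  final key = "DiscountsByCustomerGroupCacheKey"
  if (!api.global.containsKey(key)) {
    // read the whole (small) table once and index it by the lookup key
    api.global[key] = api.findLookupTableValues("Discounts")
        .collectEntries { row -> [(row.name): row.attribute1] }
  }
  return api.global[key]
}

// later, in the line logic, instead of one filtered query per line:
def discount = getCachedDiscountMap()[customerGroup]
```

This turns N filtered queries (one per line) into a single full read plus N in-memory Map lookups.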
| Feature \ Cache | api.local | api.global | Shared Cache |
|---|---|---|---|
| Accessible via | binding/variable | binding/variable | function |
| Stored Data Types | any object | any object | String only |
| Content Scope | only during execution of the Logic for 1 item | shared across executions of the Logic for more items on one Node; additionally also across certain Header and Line logics | shared across Nodes and processes |
| Storage | memory | memory | noSQL in-memory database |
| TTL (Time To Live) | no limit, but the cache survives only until the end of a single logic execution | no limit, but the cache survives only until the end of the process (e.g. end of pricelist items calculation) | 15 mins |
| Speed | fast | fast | slower |
In-memory cache api.global
The content of the api.global binding variable "survives" between subsequent calls of your row/line logic during the execution of the process, so you can use it for caching in, for example:
- pricelist line item logic
- live price grid line item logic
- quote line item logic
- CFS row logic
- DL "row" logic
During distributed calculations (Pricelist, LPG, CFS), the process is executed across several server Nodes, each sub-process calculating a small part of the lines/rows. Each sub-process has its own api.global content, i.e. it is not shared across the sub-processes. So if you're running a PL calculation distributed across e.g. 3 Nodes, the system will hold 3 instances of api.global in memory in total.
api.global
- This binding/variable is available in every type of Logic.
- It behaves as a Map, so you can use the usual ways to access the values via keys.
- It keeps all values during the whole process, i.e. in between all the calls to the same line/row Logic.
- In former Pricefx versions, the "global" nature of the variable had to be switched on by the statement api.retainGlobal = true, otherwise it actually behaved the same as the binding/variable api.local. You can find this statement in many older projects.
- In recent versions of Pricefx, there is a configuration setting which causes api.global to work in the "global" way even without this special statement.
  - Also, all newly created partitions have this setting ON by default.
Code Samples using api.global
```groovy
def getAllFreightSurcharges() {
  return api.namedEntities(
    api.findLookupTableValues("FreightSurcharge") //(3)
  )
}

def getCachedAllFreightSurcharges() {
  final key = "FreightSurchargeDataCacheKey"
  if (api.global.containsKey(key)) { //(1)
    return api.global[key] //(2)
  }
  def data = getAllFreightSurcharges()
  api.global[key] = data //(4)
  return data //(5)
}
```
1. check if the data were already stored in the cache under the given key
2. if they are already in the cache, immediately return a reference to them
3. read the data from the table
4. store the data into the cache
5. return the data
Shared Cache
The Shared Cache is a distributed key-value store (distributed cache). Its main benefit is that it is shared across logics and across Nodes.
- For example, if a Pricelist is running in distributed mode, the logics running on different Nodes can share data.
- Another use case: one process writes data into it, and later a scheduled task picks up the value and processes it.
Features:
- can store only Strings - if you need to store something else, you need to convert it to JSON
- shared across Nodes and processes
- data are stored in a DB (usually powered by Redis)
- has a TTL (Time To Live) - data are not stored there forever; the usual limit is 15 mins
```groovy
def getAllMarginAdjustments() {
  return api.namedEntities(
    api.findLookupTableValues("MarginAdjustment") //(3)
  )
}

def getCachedAllMarginAdjustments() {
  final key = "MarginAdjustmentsDataCacheKey"
  def stringData = api.getSharedCache(key) //(1)
  if (stringData != null) {
    return api.jsonDecodeList(stringData) //(2)
  }
  def data = getAllMarginAdjustments()
  api.setSharedCache(key, api.jsonEncode(data)) //(4)
  return data //(5)
}
```
1. check if the data are available in the cache
2. if they are already in the cache, decode the data rows from JSON and return them. As the JSON would be decoded again on every call, if this mechanism is used frequently inside a process on one Node, it is good to consider additionally caching the decoded value in api.global.
3. read the data from the table
4. store the data into the cache
5. return the data
For details, see: api.setSharedCache(), api.getSharedCache().
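The advice in callout (2) above, avoiding repeated JSON decoding by putting an api.global layer in front of the Shared Cache, can be sketched as a two-level cache. A hedged illustration reusing the names from the sample above:

```groovy
def getTwoLevelCachedMarginAdjustments() {
  final key = "MarginAdjustmentsDataCacheKey"
  // level 1: node-local cache, already decoded, no JSON work needed
  if (api.global.containsKey(key)) {
    return api.global[key]
  }
  // level 2: Shared Cache, shared across Nodes, stored as a JSON String
  def stringData = api.getSharedCache(key)
  def data
  if (stringData != null) {
    data = api.jsonDecodeList(stringData)
  } else {
    // not cached anywhere yet: read from the table and publish to the Shared Cache
    data = api.namedEntities(api.findLookupTableValues("MarginAdjustment"))
    api.setSharedCache(key, api.jsonEncode(data))
  }
  api.global[key] = data // remember the decoded value on this Node
  return data
}
```

Each Node then pays the read/decode cost at most once per process, while the Shared Cache still lets all Nodes reuse the single table read.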
Monitoring Performance
Each process (PL, CFS, DL, …) has a Performance Trace, where you can see:
- the total amount of time spent in each Element of your logic
- how many times each Element was executed
If you run your logic for only e.g. 5 lines, the summarized time of those 5 executions can be highly influenced by the caching which happened during the 1st execution of the Logic. So we recommend testing the time on more lines, e.g. 100 or more.
To see the Performance Tracking:
- navigate to Administration → Logs → Jobs&Tasks
- find the process you're interested in, e.g. a Pricelist. You can find it by filtering by Type, Target Object, Name, etc.
- then click the "eye" symbol of the process
You will notice that:
- the "ReadData" element
  - was executed 58 times,
  - altogether took 76.44 ms,
  - which was 35% of the time of the Pricelist calculation process,
- the function api.findLookupTableValues() was called 58x.
In the cached version of the same pricelist (we used api.global for caching) you can see 2 improvements:
- the element's total execution time is only 8.12 ms
- the function api.findLookupTableValues() was called only 1x!