...

This is just to give you an idea of what it means to optimize the results for three simple queries, calculated over 2000 and 100 000 items.

Notes:

  • The real (measured) numbers are greyed out; they were obtained at the 2000-item level.

  • The 100 000-item figures are a projection, and the actual results will be much better than shown, because the distributed calculation will send the work to more than the two nodes that were used for the 2000-item run.

  • Optimized = querying with api.getBatchInfo() together with the api.global cache, following this guide.

  • Not optimized = querying at the item level (one query per item), as in most projects.

  • Distributed = calculation jobs distributed to multiple nodes; Allowed Distributed Calculation = Checked.

  • Not distributed = single-node and single-thread work; Allowed Distributed Calculation = Unchecked.

Implementation (LIB Example)

...

Methods of the presented library (as of today):

  • libs.CommonLib.BatchUtils.prepareBatch(sku) – Initializes the cache variables that can be used later on (it should be called right after api.retainGlobal = true in the logic, with the current SKU passed in).

    • Creates a local reference api.local.isNewBatch – a boolean value (true/false) that shows whether you are working with a new batch.

    • Creates/overrides a global reference api.global.iterationNumber – shows whether this is the 2nd, 3rd, etc. pass of the calculation.

    • Creates/overrides a global reference api.global.currentBatch – a Set of the keys in the current batch.

  • libs.CommonLib.BatchUtils.isNewBatch() – Returns a boolean value showing whether data should be fetched from the sources and cached.

  • libs.CommonLib.BatchUtils.getCurrentBatchSku() – Returns the Set of elements that should be fetched and cached later on.

Implementation:

  1. You need a LIB, which is located at: https://gitlab.pricefx.eu/accelerators/pricefx-logic/-/blob/master/

  2. You need to apply it as required. Here is an example of fetching PX data and caching it for the batches using api.stream():

Code Block
languagegroovy
api.retainGlobal = true
api.local.currentSku = api.product("sku")
libs.CommonLib.BatchUtils.prepareBatch(api.local.currentSku)

if (libs.CommonLib.BatchUtils.isNewBatch()) {
    def skus = libs.CommonLib.BatchUtils.getCurrentBatchSku()
    def filter = Filter.and(
            Filter.equal("name", "Costs"),
            Filter.in("sku", skus)
    )
    def costsStream = api.stream("PX", "-lastUpdateDate", ["sku", "attribute1"], filter)
    // collectEntries requires a closure that returns a key/value pair; map each PX row to [sku: cost]
    def costs = costsStream.collectEntries {
        [(it.sku): it.attribute1 ?: 0.0]
    }
    costsStream.close()
    api.global.costs = [:]

    // cache a value for every SKU of the batch so the same batch is not fetched again
    skus.each {
        api.global.costs[it] = costs[it] ?: 0.0
    }
}

Note: The presented LIB is under development; currently it does not support CFS and 2nd key logics (MATRIX).

Implementation (NON-LIB example)

...

Code Block
languagegroovy
if (!api.global.batch) {
    api.global.batch = [:]
}

In the next step, check whether the SKU is already cached. More details are in the comments in the code.

Code Block
languagegroovy
api.local.pid = api.product("sku")
if (!api.global.batch[api.local.pid]) {
    // clear() the current batch cache; there is no need to keep the previous batch in memory, it only makes the map oversized
    // using clear() is also more efficient than re-initializing the map with [:] again and again
    api.global.batch.clear()

    // get all SKUs from the batch; the NULL result (which happens during debugging) must be handled
    // from batchInfo, collect all SKUs (the first element) – the second element is either null or the 2nd key value
    // if the 2nd key is not null, you can even cache values per SKU + 2nd key, but we want to keep this example simple
    def batch = (api.getBatchInfo()?.collect { it[0] } ?: [api.local.pid]) as Set

    // in this example api.find() is used; it gets the costs for the whole batch (it also works if the batch is a single item!)
    // the PX "Costs" attribute2 is already set up with the "Real" format and the "Required" flag, so we do not need to validate
    // the output (although it is good practice to do so); the business key is set up on the SKU level, so there will be no duplicates
    def cost = api.find("PX", 0, "lastUpdateDate",
            Filter.and(
                    Filter.equal("name", "Costs"),
                    Filter.in("sku", batch) // make sure you use Filter.in, not Filter.equal, so you check against the Set of SKUs and fetch data for the whole batch
            )
    ).inject([:]) { result, row ->
        result[row.sku] = row.attribute2 ?: 0.0
        result
    }

    // then save the result in the cache for all of the items; make sure every item gets some cached value, so even
    // if we have cost data for only 180 out of 200 items, all 200 items get an entry and the same batch is not fetched again
    batch.each {
        api.global.batch[it] = [
                "cost": cost[it] ?: 0.0,
                "tx"  : false // will be assigned in the 2nd example
        ]
    }
}

...

The given examples were for api.find and Datamart queries; however, you can use the same approach in any other strategy, including pre-calculation of data, fetching a logic for a given SKU, working with the 2nd key, etc. For example, the cache key can combine the SKU with the 2nd key, as shown in the sketch below.
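
Here is a minimal sketch of caching per SKU + 2nd key, building on the LIB example above. The PX table name "Discounts", the use of attribute1 as the 2nd key and attribute2 as the value, and the lookup via api.getSecondaryKey() are assumptions made for this illustration, not part of the guide:

Code Block
languagegroovy
// hypothetical PX "Discounts": attribute1 = 2nd key, attribute2 = value
if (libs.CommonLib.BatchUtils.isNewBatch()) {
    def skus = libs.CommonLib.BatchUtils.getCurrentBatchSku()
    def stream = api.stream("PX", "-lastUpdateDate", ["sku", "attribute1", "attribute2"],
            Filter.and(
                    Filter.equal("name", "Discounts"),
                    Filter.in("sku", skus)
            ))
    api.global.discounts = [:]
    stream.each { row ->
        // compose the cache key from SKU and the 2nd key so each combination gets its own entry
        // (plain String concatenation avoids the GString-as-map-key pitfall)
        api.global.discounts[row.sku + "#" + row.attribute1] = row.attribute2 ?: 0.0
    }
    stream.close()
}

// look up the cached value for the current item; in a matrix logic the 2nd key
// would typically come from api.getSecondaryKey()
def sku = api.product("sku")
def discount = api.global.discounts[sku + "#" + api.getSecondaryKey()] ?: 0.0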

Warning

  • When you work with a small amount of data, api.getBatchInfo() returns NULL. Using the shared library solves this issue, as it always takes at least one item. (The pricefx-server settings define when a batch is created; I tested with 2000+ items and there were no issues, but fewer items could return NULL.)

  • api.find() can fetch only a limited number of rows (use api.getMaxFindResultsLimit() to get the limit for the given environment and partition). Make sure the data is not cut off in the middle of the query, or handle it with startRow/maxRows (see the paging sketch after this list). As an alternative, you can use api.stream to cache the data and then work on it, especially when working with the 2nd key.

  • Datamart queries have a limit on how much data can be loaded (it is not exposed in the UI and it is a cluster-wide pricefx-server setting). As an alternative, use streams.

  • The 2nd, 3rd, and subsequent passes create new batches, so the data will be loaded and processed again.

  • Make sure you set the element timeout to a sufficiently long period: the logic will be fetching data for up to 200 SKUs, so the query will take longer than for a single SKU.

  • api.find() and other functions in PFX have their own limits on the maximum number of rows returned to the user. Make sure your logic and data do not exceed those limits; if they do, use streams or ask Support to extend the limits. However, a good implementation of the code and good configuration (limits, required fields, PX business keys, filters in the different contexts) will almost always solve the issue.

  • Once there was the following issue with Allowed Distributed Calculation: NODE1 was fetching values correctly from PA, but NODE3 was not. The issue was resolved later on, but it was hard to troubleshoot, so it is good to have a few log outputs for such cases.

  • If you need to store data for all nodes in the calculation flow (in Distributed Mode), do not use api.global, as it is separate for each node and thread; use the shared cache instead (see the shared-cache sketch after this list).
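
As mentioned in the warnings above, here is a minimal paging sketch for api.find(); it assumes the PX "Costs" setup and the batch Set from the NON-LIB example:

Code Block
languagegroovy
def startRow = 0
def maxRows = api.getMaxFindResultsLimit() // per-call row limit of the environment/partition
def rows = []
while (true) {
    def page = api.find("PX", startRow, maxRows, "lastUpdateDate",
            Filter.and(
                    Filter.equal("name", "Costs"),
                    Filter.in("sku", batch)
            ))
    rows.addAll(page)
    if (page.size() < maxRows) break // last (possibly partial) page fetched
    startRow += maxRows
}

And a minimal sketch of the shared cache for Distributed Mode; the key name "avgCost" and its value are just examples (note that the shared cache stores String values):

Code Block
languagegroovy
// api.global is separate per node/thread in Distributed Mode; the shared cache is visible to all nodes
def avgCost = 12.34 // example value computed on one node
api.setSharedCache("avgCost", avgCost.toString()) // write from one node
def cached = api.getSharedCache("avgCost")?.toBigDecimal() // read from any node; null if absent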