Batching

When you need to improve the performance of repeatedly reading the same dataset (e.g., Cost data), you can use caching (see the other lesson for details) - but what if:

  • The dataset is too big to fetch into memory all at once.

  • It is not useful to pre-fetch all the data into memory because you do not know which parts of it you will need (remember, the Price List logic does not know which SKUs, or how many, are going to be calculated).

  • The process can run in "distributed" mode, where it does not make sense to pre-fetch all the data because some of it will be used only by another thread, not by the current one.

In such scenarios, batching can help: you cache only the data for the portion of the calculation that you know you will work with.

In Pricefx, this applies natively to the following tasks:

  • Price List and Matrix PL calculation.

  • Live Price Grid and Matrix LPG calculation.

  • Calculated Field Set (CFS) when used for tables Products or Product Extension.

Generally, you can use batching+cache for any kind of data, for example:

  • Query result of a DS/DM query

  • List of rows from Company Parameter

  • List of rows from Product Extension

You can "simulate" the batching in any process in Groovy code, even if not natively supported by the system (i.e., even where the api.getBatchInfo() is not returning values).

Principle of Batching

Let’s say you’re calculating a Price List with hundreds of thousands of lines and for every line (i.e., for each SKU) you need to read the Cost of the product from the Product Extension table.

The system calculates the Price List in batches of a certain size. The batch size is configured on the backend server, usually as 200 lines per batch.

Batching is enabled and performed by the engine by default; you do not need to configure anything to make it happen.

As a developer, you can get information about the current batch in your logic via the Groovy API function api.getBatchInfo(), which returns the list of SKUs in the current batch.

Sample Code

def sku = api.product("sku")

/* if the batch is not yet available, or the SKU is not in the current batch,
   treat it as the beginning of a new batch */
def isNewBatch = api.global.currentBatch == null ||                                   //❶
        !api.global.currentBatch.contains(sku)

/* when a new batch starts, pre-load the list of SKUs of the batch into memory */
if (isNewBatch) {
    api.global.currentBatch = api.getBatchInfo()?.collect { it.first() }?.unique() ?: ([sku] as Set)  //❷
}

/* when a new batch starts, pre-load the product costs (for all SKUs of the batch) into memory */
if (isNewBatch) {
    //TODO remove the logging in Production environment
    api.logInfo("NewBatchOfSKUs: ", api.jsonEncode(api.global.currentBatch))

    def rowIterator = api.stream(
            "PX3",
            "sku",
            ["sku", "attribute1"],
            Filter.equal("name", "ProductCost"),
            Filter.in("sku", api.global.currentBatch)                                 //❸
    )
    api.global.productCosts = rowIterator?.collectEntries { [(it.sku): (it.attribute1 as BigDecimal)] }  //❹
    rowIterator.close()
}

return api.global.productCosts[sku]                                                   //❺

❶ The logic needs to find out whether it is being executed in a new batch (i.e., the data are not yet cached) or in an existing batch (i.e., the data are already cached). The variable api.global.currentBatch is used to store the SKUs of the current batch.
❷ If the SKU is not in the previous batch (or if this is the first batch), we read the list of SKUs of the current batch.
❸ Pre-fetch only the data of the SKUs in the batch.
❹ Store the Cost data into cache.
❺ Return the data from cache.

Monitoring

When you recalculate a Price List or Live Price Grid, you can easily see that the system uses batching - simply review the Log file and you will find something like this:

AbstractPriceGridCalculationTask - Skip Auto-Approve dirty items: false
AbstractCalculableObjectItemCalculationTask - Processing matrix formula for batch of 52 skus
DefaultFormulaEngine - Retain Global Always ON: true
PriceGridCalculationTask - Found 2444 items to process
PriceGridCalculationTask - Starting to calculate PG 365 LOCALLY ONLY
BackgroundCalculationTask - Job status batch size is: 200                                  #❶
AbstractCalculableObjectItemCalculationTask - Preloaded request batch items in 21 ms
DefaultFormulaEngine - Retain Global Always ON: true
                                                                                           #❷
AbstractProducer - StreamingSearchExecutor[PX3-RmsqjQKdyS].iterator iterator end of queue
AbstractProducer - StreamingSearchExecutor[PX3-RmsqjQKdyS].iterator iterator.close
PriceGridCalculationTask - Processed: 200 items....2244 remaining                          #❸
AbstractCalculableObjectItemCalculationTask - Preloaded request batch items in 21 ms
                                                                                           #❷
AbstractProducer - StreamingSearchExecutor[PX3-LQ9hpHECN8].iterator iterator end of queue
AbstractProducer - StreamingSearchExecutor[PX3-LQ9hpHECN8].iterator iterator.close
PriceGridCalculationTask - Processed: 400 items....2044 remaining                          #❸
AbstractCalculableObjectItemCalculationTask - Preloaded request batch items in 21 ms
                                                                                           #❷
AbstractProducer - StreamingSearchExecutor[PX3-Kihz2flYxH].iterator iterator end of queue
AbstractProducer - StreamingSearchExecutor[PX3-Kihz2flYxH].iterator iterator.close
PriceGridCalculationTask - Processed: 600 items....1844 remaining                          #❸
AbstractCalculableObjectItemCalculationTask - Preloaded request batch items in 22 ms

❶ This is the size of the batch used for calculation.
❷ We inserted an empty line here to show you at which point in time your "line item" logic is executed.
❸ Status showing how many items have been processed and how many still remain.

The log shows that the system uses batching by default anyway; it is up to you whether you use that information for clever caching or not.

Performance Log/Trace

When you’re testing the effect of batching, always use the Performance Log/Trace. It shows you how long it took to execute the elements of your logic, so you can compare the performance "before" and "after".

Always test the performance on a larger number of line items - ideally on the same amount that will be used on the production system - as this will give you the best idea of how much time you can save by using batching & caching.

To find the Performance log of your finished calculation job:

  1. Navigate to Administration > Logs > Jobs&Tasks.

  2. Find the process you want to review and click on the "eye" symbol.

  3. Review the time performance of the critical elements. See the sample of Performance Log/Trace:

__TRACE__---------------------------------------------------------
Duration (ms)               Count  Execution element
---------------------------------------------------------
5116.20 100% ██████████        1  CalcPG-sce-solutions-365-2978868-2EfdK
...
5022.99  98% █████████▓        1  ├── calculateLocally
...
3807.97  74% ███████░░░     2444  │   ├── PG_Batching_Line
1899.58  37% ███▓░░░░░░     2444  │   │   ├── POS.flush
...
 104.55   2% ░░░░░░░░░░     2444  │   │   ├── ProductCost
   0.45   0% ░░░░░░░░░░     2444  │   │   │   ├── productLookup
   0.71   0% ░░░░░░░░░░        1  │   │   │   ├── search.ApplicationProperties
   3.32   0% ░░░░░░░░░░        8  │   │   │   ├── search.ProductExtensionAttributeMeta
   8.21   0% ░░░░░░░░░░        4  │   │   │   └── stream(PX3)
  40.44   1% ░░░░░░░░░░     2444  │   │   ├── BasePrice
...
  34.06   1% ░░░░░░░░░░     2444  │   │   ├── Cost
 185.97   4% ░░░░░░░░░░     2444  │   │   └── GrossMarginPct
...

Summary

Certain processes (e.g., Price List calculation, …​) process the line items in batches. In your "line item" logic you can read the IDs of the lines being processed and use them to implement clever caching - i.e., to pre-fetch data (from PX, DM, …​) only for the SKUs which are in the current batch.

Such clever caching will dramatically improve:

  • The number of DB queries - instead of running a query for each logic execution, you will run only one query per batch. Remember that if you, for example, run too many queries against the Datamart, you can cause query throttling, which would further slow down your logic.

  • The speed of logic execution - since DB queries take time, running one query per batch (even though it returns somewhat more data) instead of 200 queries makes the total time per 200 executions of the logic far shorter.

Even though the principle and the code required to handle it are simple, you can also use the functions of the Shared Library to handle the batching.
