Data Querying using api.find() and api.stream()
Always Use a maxRows
Limit with api.find()
Why
The api.find()
function without the maxRows
parameter specified returns only the first 200 records by default. That often causes a problem because if you do not analyze how many records you can potentially get, you may miss some data from your result.
When to Use
Always using a maxRows
parameter is strictly required. The maximum value you can set this to is api.getMaxFindResultsLimit()
which by default gives 2000 records (the value can be increased for customer dedicated instances). A typical api.find()
should look like this:
def sku = api.product("sku")
def filters = [
Filter.equal("name", "Competition"),
Filter.equal("sku", sku),
]
// sort by competitor's price ASC
def competitors = api.find("PX", 0, api.getMaxFindResultsLimit(), "-attribute2", ["attribute2", "attribute5", "attribute6"], *filters)
competitors.each {
...
}
But think twice if you really need to collect so many records, since they will eat the server memory and potentially cause OutOfMemoryException
.
When Not to Use
If you still need to retrieve more data than api.getMaxFindResultsLimit()
, you will need to use api.stream
. See also “Prefer api.stream
Over Using Paging” below.
Using a maxRows limit is mandatory.
From 4.0 Manhattan, you get a warning if the number of rows returned equals 200 or formulaEngine.script.findMaxResults
and the maxRows
parameter was not specified (=0).
Prefer api.stream
Over Using Paging with api.find()
Why
When you use api.find
and need to retrieve more than api.getMaxFindResultsLimit()
records, you could be tempted to call it repeatedly with different values for startRow
. This is called “paging”.
However, this approach has many performance and determinism issues because such use of paging relies on a consistent ordering of the results. Until version 13.0 (or with the feature flag enableDefaultSortInFindWithPaging
), ensuring a correct sortBy
falls under logic's responsibility, which is error-prone and susceptible to mistakes. Check the following overview table with the Configuration Option Names and their default values per each release.
Release | Configuration Option Name | Default Value | Description |
---|---|---|---|
12.x |
|
| Sets the default sorting order when paging with the |
13 |
|
| - |
13 |
|
| Sets the default sorting order when paging fetched rows and ensures consistent ordering of the results. |
When to Use
As mentioned in “Always Use a maxRows
Limit”, if you need more than maxRows
results, then you should use api.stream
. Make sure that you always call close()
on the iterator or use withCloseable {}
!
def sku = api.product("sku")
def filters = [
Filter.equal("name", "Competition"),
Filter.equal("sku", sku),
]
// sort by competitor's price ASC
api.stream("PX", "-attribute2", ["attribute2", "attribute5", "attribute6"], *filters).withCloseable {
it.each { px ->
// do something with px
// avoid just storing it into a list, the point of using stream
// is not to store everything in memory!
}
}
When Not to Use
As mentioned in “Always Use a maxRows
Limit”, think twice before you collect too many records in memory: they can potentially cause OutOfMemoryException
.
Specify The Subset of Fields To Be Retrieved
Why
The call of api.find()
and api.stream()
, without the fields
parameter specified, retrieves all the fields of the given domain object from the database and transfers that data over the network to a given calculation node. But very often you only need only a few fields to work with and retrieving all the fields causes unnecessary data transfer.
When to Use
Whenever you don't need all of the fields, is is required to specify which you do need. For example, if you are querying e.g. a product master that has 43 fields and you use only 4 fields in the logic, the data transfer over the network would be 10 times higher than needed. In the code sample above, you can see the fields being specified by the ["attribute2", "attribute5", "attribute6"]
parameter.
When Not to Use
The only time you aren't required to specify your fields is when you are using all of them. This is still not recommended because it may form bad code habits.
Specifying the subset of fields is mandatory when applicable.
TypeCode | Number of fields |
---|---|
P | 43 |
PX20 | 28 |
PX30 | 38 |
CX30 | 38 |
PGI | 146 |
PR | 51 |
Avoid Multiple Occurrences of the Same api.find()
Why
Data lookups are among the most costly operations from a performance standpoint, and so it is irresponsible to do more than necessary. Even with many other elements, combining two api.finds() into one can double the speed of the logic.
When to Use
Any time there is two lookups to the same table in a logic, it is likely that there is a way to combine them into one, and doing so is strictly required. To easily combine these, you should call api.find()
only once in one of the very first elements of the logic and mark it as Display: None (not to have the records persisted in the database) and then refer to the results using api.getElement()
. Alternatively, you place the common api.find()
searches to the Groovy library as a separate function (e.g. like Queries.findStandardCostRecord(...)
).
When Not to Use
There are incredibly few cases where keeping separate lookups would be more efficient. Even if it means that there is some extra processing that needs to the retrieved data, it will most likely be faster than another lookup.
Avoiding multiple api.find() calls to the same table is mandatory.
Use Aggregation Functions
Why
Aggregation functions allow you to fetch less data in the lookup, and also save you some time on processing.
When to Use
When you need only the aggregated values, consider using the aggregation functions SUM, AVG, MIN, MAX in api.find()
. Instead of iterating all the records, use the following special Map construct to define the aggregation:
api.find("PX", 0, 1, null, ["attribute1": "SUM"], true, filters)
When Not to Use
It is difficult to re-use lookups when an aggregation is applied, as the underlying data isn't visible. If you find yourself creating another api.find() lookup to get data from the same table without an aggregation, it is recommended to just combine them into one api.find() and do the aggregation manually.
It is recommended to use aggregation functions when applicable, for code clarity.
Use Filter for name
Field When Querying PX or CX
Why
Because the PX or CX tables are virtual tables and are stored in one physical table (depending on a type, e.g. PX20 or CX20), it is mandatory that you specify the filter for the name
field because two virtual PX tables can be stored in different physical tables. This will instruct the engine which physical table to look at (e.g. PX10 or PX20). In the first code sample on this page, you can see the PX table name is included in the filters:
References:
Found an issue in documentation? Write to us.