Data Querying using api.find()

Always Use a maxRows Limit

Why

The api.find() function without the maxRows parameter specified returns only the first 200 records by default. That often causes a problem because if you do not analyze how many records you can potentially get, you may miss some data from your result.

When to Use

Always using a maxRows parameter is strictly required. The maximum value you can set this to is api.getMaxFindResultsLimit() which by default gives 2000 records (the value can be increased for customer dedicated instances). A typical api.find() should look like this:

def sku = api.product("sku") def filters = [     Filter.equal("name", "Competition"),     Filter.equal("sku", sku), ] // sort by competitor's price ASC def competitors = api.find("PX", 0, api.getMaxFindResultsLimit(), "-attribute2", ["attribute2", "attribute5", "attribute6"], *filters) competitors.each {     ... }

But think twice if you really need to collect so many records, since they will eat the server memory and potentially cause OutOfMemoryException.

When Not to Use

If you still need to retrieve more data than api.getMaxFindResultsLimit(), you will need to use api.stream. See also “Prefer api.stream Over Using Paging” below.

 

Using a maxRows limit is mandatory.

 From 4.0 Manhattan, you get a warning if the number of rows returned equals 200 or formulaEngine.script.findMaxResults and the maxRows parameter was not specified (=0).

Prefer api.stream Over Using Paging

Why

When you use api.find and need to retrieve more than api.getMaxFindResultsLimit() records, you could be tempted to call it repeatedly with different values for startRow. This is called “paging”.

However, this approach has many performance and determinism issues because such use of paging relies on a consistent ordering of the results. Until version 13.0 (or with the feature flag enableDefaultSortInFindWithPaging), ensuring a correct sortBy falls under logic's responsibility, which is error-prone and susceptible to mistakes.

When to Use

As mentioned in “Always Use a maxRows Limit”, if you need more than maxRows results, then you should use api.stream. Make sure that you always call close() on the iterator or use withCloseable {}!

def sku = api.product("sku") def filters = [     Filter.equal("name", "Competition"),     Filter.equal("sku", sku), ] // sort by competitor's price ASC api.stream("PX", "-attribute2", ["attribute2", "attribute5", "attribute6"], *filters).withCloseable { it.each { px -> // do something with px // avoid just storing it into a list, the point of using stream // is not to store everything in memory! } }

When Not to Use

As mentioned in “Always Use a maxRows Limit”, think twice before you collect too many records in memory: they can potentially cause OutOfMemoryException.

Specify The Subset of Fields To Be Retrieved

Why

The call of api.find(), without the fields parameter specified, retrieves all the fields of the given domain object from the database and transfers that data over the network to a given calculation node. But very often you only need only a few fields to work with and retrieving all the fields causes unnecessary data transfer.

When to Use

Whenever you don't need all of the fields, is is required to specify which you do need. For example, if you are querying e.g. a product master that has 43 fields and you use only 4 fields in the logic, the data transfer over the network would be 10 times higher than needed. In the code sample above, you can see the fields being specified by the ["attribute2", "attribute5", "attribute6"] parameter.

When Not to Use

The only time you aren't required to specify your fields is when you are using all of them. This is still not recommended because it may form bad code habits.

Specifying the subset of fields is mandatory when applicable.

 

TypeCode

Number of fields

TypeCode

Number of fields

P

43

PX20

28

PX30

38

CX30

38

PGI

146

PR

51

Avoid Multiple Occurrences of the Same api.find()

Why

Data lookups are among the most costly operations from a performance standpoint, and so it is irresponsible to do more than necessary. Even with many other elements, combining two api.finds() into one can double the speed of the logic.

When to Use

Any time there is two lookups to the same table in a logic, it is likely that there is a way to combine them into one, and doing so is strictly required. To easily combine these, you should call api.find() only once in one of the very first elements of the logic and mark it as Display: None (not to have the records persisted in the database) and then refer to the results using api.getElement(). Alternatively, you place the common api.find() searches to the Groovy library as a separate function (e.g. like Queries.findStandardCostRecord(...)).

When Not to Use

There are incredibly few cases where keeping separate lookups would be more efficient. Even if it means that there is some extra processing that needs to the retrieved data, it will most likely be faster than another lookup.

 

Avoiding multiple api.find() calls to the same table is mandatory.

Use Aggregation Functions

Why

Aggregation functions allow you to fetch less data in the lookup, and also save you some time on processing.

When to Use

When you need only the aggregated values, consider using the aggregation functions SUM, AVG, MIN, MAX in api.find(). Instead of iterating all the records, use the following special Map construct to define the aggregation:

api.find("PX", 0, 1, null, ["attribute1": "SUM"], true, filters)

When Not to Use

It is difficult to re-use lookups when an aggregation is applied, as the underlying data isn't visible. If you find yourself creating another api.find() lookup to get data from the same table without an aggregation, it is recommended to just combine them into one api.find() and do the aggregation manually.

It is recommended to use aggregation functions when applicable, for code clarity.

Use Filter for name Field When Querying PX or CX

Why

Because the PX or CX tables are virtual tables and are stored in one physical table (depending on a type, e.g. PX20 or CX20), it is mandatory that you specify the filter for the name field because two virtual PX tables can be stored in different physical tables. This will instruct the engine which physical table to look at (e.g. PX10 or PX20). In the first code sample on this page, you can see the PX table name is included in the filters:

 

References:

Found an issue in documentation? Write to us.