Performance
When to Use api.find and api.stream
As a rule, whenever you need to retrieve a known, limited number of records, use api.find and pass that limit as a parameter.
A single call to api.find with either a hard-coded limit, such as api.find("P", 0, 200, ...), or with api.find("P", 0, api.getMaxFindResultsLimit(), ...) is bad practice, because it assumes that the table never holds more than that number of rows. The exceptions are when you really want to load only a certain number of rows, or when you call api.find in a loop.
When you do not know the number of records, you have two options: use api.stream (first example below) or call api.find in a loop (second example below). In most cases, api.stream is the preferred way.
    def iter = api.stream("P", "sku", ["sku", "attribute1"], *filters)
    try {
        while (iter.hasNext()) {
            def row = iter.next()
            // process the row...
            // if performance-intensive work is done here,
            // such as another access to the DB or a datamart query,
            // then use api.find instead
        }
    } finally {
        // close the iterator even if the processing throws
        iter.close()
    }
    def start = 0
    def data = null
    // api.find returns one page per call; an empty page ends the loop
    while (data = api.find("P", start, api.getMaxFindResultsLimit(), "sku", ["sku", "attribute1"], *filters)) {
        start += data.size()
        for (row in data) {
            // process the row
        }
    }
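The paging loop above stops because an empty list evaluates to false in Groovy. The following minimal sketch illustrates only that mechanism outside the platform. Here `fakeFind` and `loadAllSkus` are hypothetical names, and `fakeFind` is an in-memory stand-in that merely imitates a paged query; it is not the Pricefx api.find.

```groovy
// Hypothetical in-memory stand-in for a paged query (NOT the Pricefx api.find):
// returns at most `max` rows starting at offset `start`, or an empty list.
List<Map> fakeFind(List<Map> table, int start, int max) {
    if (start >= table.size()) {
        return []
    }
    return table.subList(start, Math.min(start + max, table.size()))
}

// Hypothetical helper: collects the "sku" field of every row, page by page.
List<String> loadAllSkus(List<Map> table, int pageSize) {
    def skus = []
    int start = 0
    def page = null
    // An empty page is falsy in Groovy, so the loop ends after the last page.
    while (page = fakeFind(table, start, pageSize)) {
        start += page.size()
        for (row in page) {
            skus << row.sku
        }
    }
    return skus
}

// Seven rows read in pages of three: pages of 3, 3, and 1 rows, then an empty page.
def table = (1..7).collect { [sku: "SKU-${it}".toString()] }
println loadAllSkus(table, 3)
```

Because the offset advances by the size of the page actually returned, the loop also handles a last page that is shorter than the page size.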
The preferred way to load an unknown amount of data from the database is api.stream, with these exceptions:
- The code within the loop takes significant time. In that case, use api.find instead. The reason is that api.stream maintains an open connection to the database during the processing, which can have a negative impact on performance, whereas api.find fetches the data at once and no connection is maintained.
- The input generation (syntax check) mode is enabled.
See also Data Querying using api.find() and api.stream() and General Queries (Quick Reference).
Beware of Groovy Closures Performance
Groovy closures carry overhead, so be very careful when iterating over a large amount of data. To demonstrate this, here are four implementations of a simple piece of logic that just sums up the numbers in a list.
collect + sum:

    (1..n).collect { it }.sum()

while:

    long sum = 0
    long i = 1
    while (i <= n) {
        sum += i
        ++i
    }
    return sum

each:

    long sum = 0
    (1..n).each { sum += it }
    return sum

for:

    long sum = 0
    for (long i = 1; i <= n; ++i) {
        sum += i
    }
    return sum
Here are the measured results for a list of size n. The duration is in milliseconds.
Duration [ms] for list of size n | n = 1 000 | n = 10 000 | n = 100 000 | n = 1 000 000 | n = 10 000 000 |
---|---|---|---|---|---|
collect + sum | 12 | 130 | 904 | 9 014 | 90 242 |
each | 12 | 93 | 881 | 8 747 | 88 896 |
while | 2 | 12 | 111 | 708 | 6 925 |
for | 1 | 11 | 110 | 705 | 6 820 |
Here is a different example with slightly more complex logic: https://dzone.com/articles/loops-performance-in-groovy
For small lists, the overhead does not play a significant role in the total calculation time. For larger lists, it is much better to stick to the classic while-loop or for-loop: in the measurements above, the closure-based variants are roughly 13 times slower at 10 000 000 elements.
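To reproduce a comparison like this yourself, a minimal timing sketch could look as follows. The function names and the `elapsedMs` helper are hypothetical and chosen only for this illustration. The range uses `1L..n` so that the elements are of type long and large sums do not overflow. A single run without JIT warm-up gives only indicative numbers, which will differ from the table above.

```groovy
// Four ways to sum 1..n, mirroring the variants compared above.
long sumCollect(long n) {
    return (1L..n).collect { it }.sum() as long
}

long sumEach(long n) {
    long sum = 0
    (1L..n).each { sum += it }
    return sum
}

long sumWhile(long n) {
    long sum = 0
    long i = 1
    while (i <= n) {
        sum += i
        ++i
    }
    return sum
}

long sumFor(long n) {
    long sum = 0
    for (long i = 1; i <= n; ++i) {
        sum += i
    }
    return sum
}

// Hypothetical helper: runs the closure once and returns the elapsed wall-clock time in ms.
long elapsedMs(Closure work) {
    long start = System.currentTimeMillis()
    work()
    return System.currentTimeMillis() - start
}

long n = 100_000
def variants = [
    'collect + sum': this.&sumCollect,
    'each'         : this.&sumEach,
    'while'        : this.&sumWhile,
    'for'          : this.&sumFor
]
variants.each { name, fn ->
    long ms = elapsedMs { fn(n) }
    println "${name}: ${ms} ms"
}
```

All four functions return the same result, so any difference in the printed durations comes from the iteration style alone.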