Performance

When to Use api.find and api.stream

As a rule, whenever you need to retrieve only a limited, known number of records, use api.find with the limit passed as a parameter.

Using api.find with a hard-coded limit such as api.find("P", 0, 200, ...), or with api.find("P", 0, api.getMaxFindResultsLimit(), ...), is a bad practice because it silently assumes that the table contains no more than that number of rows. The exceptions are when you really want to load only a certain number of rows, or when you use api.find in a loop (see below).
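For illustration, a case where a fixed limit is legitimate might look like this (a sketch; the filter attribute and value are made up for the example):

```groovy
// Load at most the first 10 matching products, sorted by sku.
// The attribute name and value in the filter are illustrative.
def filter = Filter.equal("attribute2", "Drill")
def rows = api.find("P", 0, 10, "sku", ["sku", "attribute1"], filter)
```

Here the limit of 10 is part of the requirement ("show the top 10"), not a guess about the table size, so hard-coding it is acceptable.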

When you do not know the number of records, you have two options: use api.stream or call api.find in a loop. In most cases, api.stream is the preferred way.


Example api.stream
def iter = api.stream("P", "sku", ["sku", "attribute1"], *filters)
try {
  while (iter.hasNext()) {
    def row = iter.next()
    // process the row...
    // If performance-intensive work is done here,
    // such as another DB access or a Datamart query,
    // use api.find instead.
  }
} finally {
  iter.close()  // always release the database connection
}


Example api.find in a loop
def start = 0
def data = null
while (data = api.find("P", start, api.getMaxFindResultsLimit(), 
                        "sku", ["sku", "attribute1"], *filters)) {
  start += data.size()
  for (row in data) {
    // process the row
  }
}

The preferred way of loading an unknown amount of data from the database is api.stream, with these exceptions:

  • If the code within the loop takes significant time, use api.find instead. api.stream keeps a database connection open for the whole duration of the processing, which can hurt performance, whereas api.find fetches the data at once and holds no connection afterwards.
  • The input generation (syntax check) mode is enabled. 

See also Data Querying using api.find() and api.stream() and General Queries (Quick Reference).

Beware of Groovy Closures Performance

Groovy closures have a measurable overhead, so you should be careful when iterating over large amounts of data. To demonstrate this, here is a simple logic that just sums up the numbers in a list.

collect + sum
(1..n).collect { it }.sum()
while
long sum = 0
long i = 1
while (i <= n) {
  sum += i
  ++i
}
return sum
each
long sum = 0
(1..n).each { sum += it }
return sum
for
long sum = 0
for (long i = 1; i <= n; ++i) {
  sum += i
}
return sum
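Such timings can be reproduced with a simple stopwatch around each variant; a minimal sketch (the exact measurement setup used for the results in the source is not specified, so this is illustrative only):

```groovy
// Time one variant; repeat with the other loop styles for comparison.
def n = 1_000_000
def start = System.currentTimeMillis()
long sum = 0
for (long i = 1; i <= n; ++i) {
  sum += i
}
def elapsed = System.currentTimeMillis() - start
api.logInfo("for-loop", "sum=${sum} took ${elapsed} ms")
```

For stable numbers, run each variant several times and discard the first runs, since JVM warm-up can dominate short measurements.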

Here are the measured results for a list of size n. The duration is in milliseconds.

Duration for list of size n [ms]

| Approach      | 1 000 | 10 000 | 100 000 | 1 000 000 | 10 000 000 |
|---------------|-------|--------|---------|-----------|------------|
| collect + sum |    12 |    130 |     904 |     9 014 |     90 242 |
| each          |    12 |     93 |     881 |     8 747 |     88 896 |
| while         |     2 |     12 |     111 |       708 |      6 925 |
| for           |     1 |     11 |     110 |       705 |      6 820 |

A different example with slightly more complex logic can be found here: https://dzone.com/articles/loops-performance-in-groovy

It is clear that for small lists the overhead does not play a significant role in the total calculation time, but for larger lists it is much better to stick to the classic while loop or for loop.
