When to Use api.find and api.stream

The rule is: whenever you need to retrieve a known, limited number of records, use api.find and pass that limit as a parameter.

Using api.find with a hard-coded limit, such as api.find("P", 0, 200, ...), or with api.find("P", 0, api.getMaxFindResultsLimit(), ...) is bad practice, because it assumes that the table never holds more rows than that limit. Once the table grows beyond it, the extra rows are silently skipped. The exceptions are when you really want to load only a certain number of rows, or when you call api.find in a loop to page through the results.
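To make the risk concrete, here is a small self-contained sketch. It uses a hypothetical in-memory function, findPage, that only imitates the start/max paging parameters of api.find; it is not part of the Pricefx API. With 250 rows in the simulated table, a single call limited to 200 returns only 200 rows, while a paging loop reads all 250.

```groovy
// Hypothetical stand-in for api.find: returns at most `max` rows starting at `start`.
// It only imitates the paging parameters and is NOT the Pricefx API.
List findPage(List rows, int start, int max) {
    if (start >= rows.size()) return []
    return rows.subList(start, Math.min(start + max, rows.size()))
}

// Simulated table that has grown beyond the 200 rows the code "expects"
def table = (1..250).collect { [sku: "SKU-" + it] }

// Bad practice: a single call with a hard-coded limit silently misses rows 201-250
def truncated = findPage(table, 0, 200)

// Paging loop: request pages until an empty page is returned
def allRows = []
int start = 0
while (true) {
    def page = findPage(table, start, 200)
    if (!page) break
    allRows.addAll(page)
    start += page.size()
}

println "hard-coded limit: ${truncated.size()} rows, paging loop: ${allRows.size()} rows"
```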

When you do not know the number of records, you have two options: use api.stream, or use api.find in a loop. In most cases, api.stream is preferred.

Example api.stream

def iter = api.stream("P", "sku", ["sku", "attribute1"], *filters)
try {
  while (iter.hasNext()) {
    def row = iter.next()
    // process the row...
    // if the processing is performance-intensive (for example, another
    // database access or a datamart query), use api.find instead
  }
} finally {
  // always close the stream, even if processing fails, to release the database connection
  iter.close()
}

Example api.find in a loop

def start = 0
def data = null
while (data = api.find("P", start, api.getMaxFindResultsLimit(), 
                        "sku", ["sku", "attribute1"], *filters)) {
  start += data.size()
  for (row in data) {
    // process the row
  }
}

The preferred way to load an unknown amount of data from the database is api.stream, with these exceptions:

- The code within the loop takes significant time. In that case, use api.find instead: api.stream keeps a connection to the database open for the whole time the rows are being processed, which can hurt performance, whereas api.find fetches the data at once and does not keep a connection open.
- The input generation (syntax check) mode is enabled.

Beware of Groovy Closures Performance

Groovy closures carry an overhead, so be careful when iterating over large amounts of data. To demonstrate this, here is a simple logic that sums the numbers from 1 to n, implemented in four different ways.

collect + sum

(1..n).collect { it }.sum()

while

long sum = 0
long i = 1
while (i <= n) {
  sum += i
  ++i
}
return sum

each

long sum = 0
(1..n).each { sum += it }
return sum

for

long sum = 0
for (long i = 1; i <= n; ++i) {
  sum += i
}
return sum

Here are the measured results for a list of size n. The duration is in milliseconds.

Variant          n = 1 000   10 000   100 000   1 000 000   10 000 000
collect + sum           12      130       904       9 014       90 242
each                    12       93       881       8 747       88 896
while                    2       12       111         708        6 925
for                      1       11       110         705        6 820
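If you want to repeat a measurement like this on your own system, a minimal Groovy timing sketch could look like the following. It runs each of the four variants once, timed with System.nanoTime, and prints the elapsed time. It does not warm up the JVM or average several runs, so its numbers are only rough and will not match the table exactly. Note also that (1..n).collect { it }.sum() adds Integer values, which overflow once the sum exceeds Integer.MAX_VALUE (from n = 65 536 on), so the sketch uses n = 10 000 to keep all four results comparable.

```groovy
// The four summing variants from above, each computing 1 + 2 + ... + n.
// Wrapping them in closures adds one extra closure call per variant, which is negligible.
def variants = [
    'collect + sum': { int n -> (1..n).collect { it }.sum() },
    'each'         : { int n -> long sum = 0; (1..n).each { sum += it }; sum },
    'while'        : { int n -> long sum = 0; long i = 1; while (i <= n) { sum += i; ++i }; sum },
    'for'          : { int n -> long sum = 0; for (long i = 1; i <= n; ++i) { sum += i }; sum }
]

int count = 10_000
def results = [:]

variants.each { name, body ->
    long t0 = System.nanoTime()
    def value = body(count)
    // Single run without warm-up: the timing is only a rough indication
    long ms = (System.nanoTime() - t0).intdiv(1_000_000)
    results[name] = value
    println "${name}: ${ms} ms (sum = ${value})"
}
```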

For a different benchmark with slightly more complex logic, see: https://dzone.com/articles/loops-performance-in-groovy

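For small lists, the overhead does not play a significant role in the total calculation time. For larger lists, however, it is much better to stick to the classic while loop or for loop: in the measurements above, both closure-based variants are roughly 8 to 13 times slower than the loops from n = 100 000 upward.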
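As an illustration of applying this advice to everyday code, the sketch below computes the same total twice over a list of hypothetical rows (the sku and price fields are made up for the example): once with collect and sum, and once with a classic for loop that avoids both the intermediate list and the per-element closure call. Both produce the same value; only the loop version avoids the closure overhead described above.

```groovy
// Hypothetical rows, e.g. data already loaded via api.find; field names are illustrative only
def rows = (1..1000).collect { [sku: "SKU-" + it, price: it * 0.5] }

// Closure-based: builds an intermediate list of prices, then sums it
def closureTotal = rows.collect { it.price }.sum()

// Classic for loop: no intermediate list and no closure call per row
BigDecimal loopTotal = 0.0
for (row in rows) {
    loopTotal += row.price
}

println "closure: ${closureTotal}, loop: ${loopTotal}"
```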