When to Use api.find and api.stream

As a rule, when you need to retrieve a known, limited number of records, use api.find and pass the limit as a parameter.

Calling api.find once with either a hard-coded limit, such as api.find("P", 0, 200, ...), or with api.find("P", 0, api.getMaxFindResultsLimit(), ...) is bad practice, because it assumes the table never holds more rows than the limit; any additional rows are silently left out. The exceptions are when you really want to load only that many rows, or when the api.find call is part of a paging loop.

When you do not know the number of records, you have two options: use api.stream or call api.find in a loop. In most cases, api.stream is the preferred way.

Example api.stream:

def iter = api.stream("P", "sku", ["sku", "attribute1"], *filters)
try {
  while (iter.hasNext()) {
    def row = iter.next()
    // process the row...
    // if performance-intensive work is done here,
    // such as another DB access or a Datamart query,
    // then use api.find instead
  }
} finally {
  // always close the iterator, even if the processing throws
  iter.close()
}
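The following is a minimal sketch of why closing the iterator reliably matters. FakeStream is a hypothetical in-memory stand-in written only for this illustration (it is not the class returned by api.stream), and the sketch assumes the real iterator can be closed in the same way. It uses Groovy's withCloseable, which calls close() when the closure exits, whether normally or through an exception.

```groovy
// Hypothetical stand-in for the iterator returned by api.stream (NOT the real class).
class FakeStream implements Iterator<Map>, Closeable {
    private final Iterator<Map> rows
    boolean closed = false

    FakeStream(List<Map> data) { rows = data.iterator() }

    boolean hasNext() { rows.hasNext() }
    Map next() { rows.next() }
    void remove() { throw new UnsupportedOperationException() }
    void close() { closed = true }   // a real stream would release its DB connection here
}

def stream = new FakeStream([[sku: "A"], [sku: "B"], [sku: "C"]])
def seen = []
def failed = false

try {
    // withCloseable closes the stream when the closure exits, normally or by exception
    stream.withCloseable { iter ->
        while (iter.hasNext()) {
            def row = iter.next()
            seen << row.sku
            // simulate a failure in the middle of the processing
            if (row.sku == "B") throw new IllegalStateException("processing failed")
        }
    }
} catch (IllegalStateException e) {
    failed = true
}

println "seen=${seen}, failed=${failed}, closed=${stream.closed}"
```

In this sketch the stream ends up closed although the processing failed on the second row. A bare iter.close() placed after the loop would have been skipped in that situation.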

Example api.find in a loop:

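def start = 0
def data = null
// an empty result is falsy in Groovy, so the loop ends after the last page
while (data = api.find("P", start, api.getMaxFindResultsLimit(),
                        "sku", ["sku", "attribute1"], *filters)) {
  start += data.size()   // move the offset to the next page
  for (row in data) {
    // process the row
  }
}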
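To make the difference between a single api.find call and the paging loop concrete, here is a small sketch that runs without a Pricefx environment. FakeApi is a hypothetical in-memory stand-in: the 450-row table and the page-size cap of 200 are assumptions chosen for the illustration, and filters are left out.

```groovy
// Hypothetical in-memory stand-in for the api object (NOT the real Pricefx API).
// It mimics only the paging behaviour of find(); filters are omitted.
class FakeApi {
    List<Map> table = (1..450).collect { [sku: "SKU-" + it, attribute1: it] }

    int getMaxFindResultsLimit() { 200 }   // assumed cap, for this sketch only

    List<Map> find(String typeCode, int start, int max, String sortBy, List<String> fields) {
        if (start >= table.size()) return []
        int end = Math.min(start + max, table.size())
        return table.subList(start, end)
    }
}

def api = new FakeApi()

// Single call with a fixed limit: rows beyond the limit are silently missing.
def truncated = api.find("P", 0, api.getMaxFindResultsLimit(), "sku", ["sku", "attribute1"])

// Paging loop: keeps requesting the next page until an empty page comes back.
def all = []
def start = 0
while (true) {
    def page = api.find("P", start, api.getMaxFindResultsLimit(), "sku", ["sku", "attribute1"])
    if (!page) break
    start += page.size()
    all.addAll(page)
}

println "single call: ${truncated.size()} rows, paging loop: ${all.size()} rows"
```

With these assumed numbers the single call returns only the first page, while the loop collects every row of the stand-in table.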

api.stream is the preferred way to load an unknown amount of data from the database, with these exceptions:

If the code within the loop takes significant time, use api.find instead. api.stream keeps a database connection open for the whole duration of the processing, which can have a negative impact on performance, whereas api.find fetches the data at once and does not hold a connection.
If the input generation (syntax check) mode is enabled, use api.find instead.

Beware of Groovy Closures Performance

Groovy closures carry overhead, so be very careful when iterating over large amounts of data. To demonstrate this, here is simple logic that just sums up the numbers in a list, written in four different ways.

collect + sum:

(1..n).collect { it }.sum()

while:

long sum = 0
long i = 1
while (i <= n) {
  sum += i
  ++i
}
return sum

each:

long sum = 0
(1..n).each { sum += it }
return sum

for:

long sum = 0
for (long i = 1; i <= n; ++i) {
  sum += i
}
return sum

Here are the measured results for a list of size n. The duration is in milliseconds.

Duration for list of size n [ms]	1 000 x	10 000 x	100 000 x	1 000 000 x	10 000 000 x
collect + sum	12	130	904	9 014	90 242
each	12	93	881	8 747	88 896
while	2	12	111	708	6 925
for	1	11	110	705	6 820

Here is a different example with slightly more complex logic: https://dzone.com/articles/loops-performance-in-groovy

Note
For small lists, the closure overhead does not play a significant role in the total calculation time, but for larger lists it is much better to stick to the classic while loop or for loop.
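If you want to check the difference in your own environment, a rough timing harness along the following lines can help. This is only a sketch and not the setup used for the measurements above: the helper names (sumWithEach, sumWithFor, timeMs), the list size and the warm-up count are all assumptions, and absolute numbers will vary with the JVM and the Groovy version.

```groovy
// Closure-based variant: sums 1..n with each
def sumWithEach = { int n ->
    long sum = 0
    (1..n).each { sum += it }
    sum
}

// Classic loop variant: sums 1..n with a for loop
def sumWithFor = { int n ->
    long sum = 0
    for (long i = 1; i <= n; ++i) {
        sum += i
    }
    sum
}

// Returns the elapsed wall-clock time of one call, in whole milliseconds
def timeMs = { Closure work ->
    long t0 = System.nanoTime()
    work()
    (System.nanoTime() - t0).intdiv(1_000_000)
}

int size = 100_000   // assumed list size for this sketch

// Warm up both variants a few times before timing them
3.times {
    sumWithEach(size)
    sumWithFor(size)
}

def eachMs = timeMs { sumWithEach(size) }
def forMs = timeMs { sumWithFor(size) }
println "n=${size}: each=${eachMs} ms, for=${forMs} ms"

// Both variants must produce the same sum
assert sumWithEach(size) == sumWithFor(size)
```

A single timed run is noisy, so repeat the measurement several times and compare the trend, not individual values.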
