Performance Practices in Analytics

Data Volumes and Growth

Data Sources and Datamarts usually contain huge volumes of data, so it is essential to ensure that the data does not grow indefinitely. It is recommended to set up truncating Data Loads, especially for Data Sources that have a key field of the type “Date”. A typical setup is to delete records older than 2 years.
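
For illustration only, the two-year cut-off corresponds to a simple date filter like the sketch below; the truncation itself is configured on the Data Load, and the field name “InvoiceDate” is just an assumed example of a Date-typed key field.

```groovy
// Illustrative sketch only: the cut-off criterion for "older than 2 years",
// expressed as a Filter. "InvoiceDate" is an assumed Date-typed key field;
// "Filter" is provided by the logic execution context.
def cutoff = new Date() - (2 * 365)            // roughly two years back
return Filter.lessThan("InvoiceDate", cutoff)  // matches the records to be deleted
```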

Calculation Data Loads

Data Loads can calculate data in two ways, depending on the context of the logic element: at header level, using the “Init” context, or as row-by-row enrichment, using the “Row” context.

Upsert Data Loads (Header Level)

Header-level Data Loads typically have one main query that populates the target rowset. Their performance therefore depends primarily on the performance of that query and secondarily on the number of records already in the target rowset: the more records the target rowset contains, the slower the adding of new records.
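
As a hedged sketch of such an element (the Data Source name “Transactions”, its fields, and the target column names are assumptions, not a prescribed setup), the “Init” context typically looks like this:

```groovy
// Hedged sketch of a header-level ("Init" context) Data Load element:
// one rollup query over the source, results written into the target rowset.
// "Transactions", its fields and the target column names are assumed.
def target = api.getDatamartRowSet("target")   // target rowset of the Data Load
def dmCtx  = api.getDatamartContext()
def src    = dmCtx.getDataSource("Transactions")

def q = dmCtx.newQuery(src, true)              // rollup = true for the aggregation
q.select("ProductId")
q.select("SUM(Quantity)", "totalQty")

dmCtx.executeQuery(q)?.getData()?.each { row ->
    // Every new record is matched against the existing target rows (upsert),
    // which is why a large target rowset slows down adding new records.
    target.addRow([ProductId: row.ProductId, TotalQuantity: row.totalQty])
}
```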

Enrichment Data Loads (Row Level)

The performance of row-level calculations depends on the number of records to be calculated and on the complexity of the logic. Row-level Data Loads are therefore not suitable for huge numbers of records (hundreds of thousands or millions) and are practical only for smaller volumes (tens of thousands of rows). For example, a daily feed and enrichment of sales transactions is typically a better solution than enriching the sales transactions of a whole month at once.

Throttling

The Analytics module has a safety mechanism called “throttling” which helps avoid overloading the analytical database with too many query requests.

The limit is a maximum of 600 requests per minute on PROD clusters and 60 requests per minute on QA clusters for a given job. Each time a logic exceeds this limit, the server makes it wait for 1 second.

This can have a significant impact on the overall performance of the logic. For example, a logic that issues one query per SKU for tens of thousands of SKUs will quickly exceed the limit and then spend a large part of its run time just waiting.

The limits are configurable on the server side and they apply to these methods:

When querying Data Sources or Datamarts from line item logics, the solution is to query only the records relevant to the current batch of SKUs being processed, instead of running a separate query for each SKU.

For more details see Performance Improvements (Batching, api.getBatchInfo).
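
As a rough sketch of that batching pattern (the Datamart name “TransactionsDM”, its fields, and the cache keys in api.global are illustrative assumptions), a line item logic can run one query per batch and cache the result:

```groovy
// Hedged sketch of batched Datamart access in a line item logic.
// One query per batch of SKUs, instead of one query per SKU, keeps the
// request rate well below the throttling threshold.
if (api.isSyntaxCheck()) return null

def sku = api.product("sku")

// Reload the cache only when the current SKU is not part of the cached batch.
if (api.global.batchSkus == null || !api.global.batchSkus.contains(sku)) {
    // api.getBatchInfo() returns the sku/key pairs of the current calculation batch,
    // or null if batching is not active.
    def batch = api.getBatchInfo()
    def skus  = batch ? batch.collect { it[0] }.unique() : [sku]
    api.global.batchSkus = skus as Set

    def dmCtx = api.getDatamartContext()
    def dm    = dmCtx.getDatamart("TransactionsDM")
    def q     = dmCtx.newQuery(dm, true)          // rollup query for the aggregation
    q.select("ProductId")
    q.select("SUM(InvoicePrice)", "revenue")
    q.where(Filter.in("ProductId", skus))

    api.global.revenueBySku = [:]
    dmCtx.executeQuery(q)?.getData()?.each { row ->
        api.global.revenueBySku[row.ProductId] = row.revenue
    }
}

return api.global.revenueBySku[sku]
```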

Messages found in the log file related to throttling:

  • Info: Current PA query executions count={}, 1min-rate={} – informational message only

  • Warning: THROTTLING PA query executions as current count={} exceeds the throttle threshold of {} and the 1min-rate is hitting the max of {}/min – logged when throttling has been applied
