Data Volumes and Growth

Data Sources and Datamarts usually contain a huge volume of data, so it is essential to ensure that the data does not grow indefinitely. It is therefore recommended to set up truncating Data Loads, especially for Data Sources that have any key fields of the type “Date”. A typical setup is to delete records older than 2 years.
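The retention rule above can be illustrated with a small Python sketch. It is independent of any Pricefx API. The record layout, the `date` field name and the helper names are hypothetical. The sketch only shows which records a 2-year retention rule would select for deletion, not how a truncating Data Load is configured.

```python
from datetime import date


def retention_cutoff(today: date, years: int = 2) -> date:
    """Return the date `years` calendar years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def split_by_retention(records, today, years=2, date_field="date"):
    """Split hypothetical records into (keep, delete) lists by a date-based retention rule."""
    cutoff = retention_cutoff(today, years)
    keep = [r for r in records if r[date_field] >= cutoff]
    delete = [r for r in records if r[date_field] < cutoff]
    return keep, delete


if __name__ == "__main__":
    records = [
        {"sku": "A", "date": date(2021, 5, 1)},
        {"sku": "B", "date": date(2023, 7, 15)},
    ]
    keep, delete = split_by_retention(records, today=date(2024, 3, 1))
    print([r["sku"] for r in keep], [r["sku"] for r in delete])
```

With a run date of 2024-03-01 the cutoff is 2022-03-01, so record "A" falls outside the 2-year window and record "B" is kept.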

Calculation

...

Data Loads

Data Loads can calculate data in two ways, depending on the element context: either at header level, using the “Init” context, or as enrichment, using the “Row” context of the element.

Upsert

...

Data Loads (Header Level)

Header-level Data Loads typically have one main query that populates the data in the target rowset. Their performance therefore depends primarily on the performance of that query and secondarily on the number of records in the target rowset: the more records the target rowset already contains, the slower new records are added.

Enrichment

...

Data Loads (Row Level)

The performance of row-level calculations depends on the number of records to be calculated and on the complexity of the logic. Row-level Data Loads are therefore not suitable for huge numbers of records (hundreds of thousands or millions) and are usable only for small numbers of records (tens of thousands of lines). For example, a daily feed and enrichment of sales transactions is typically a better solution than enriching the sales transactions for the whole month at once.
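The difference in volume per run can be sketched in Python. The `enrich` function is a hypothetical stand-in for row-level logic. The transaction fields (`date`, `gross`, `discount`) and the figure of 1,000 transactions per day are assumptions used only to make the comparison concrete.

```python
from datetime import date, timedelta


def enrich(row):
    # hypothetical per-row enrichment standing in for the row-level logic
    return {**row, "net": row["gross"] * (1 - row["discount"])}


def daily_enrichment(transactions, feed_day):
    # enrich only the rows delivered by one day of the feed
    return [enrich(t) for t in transactions if t["date"] == feed_day]


def monthly_enrichment(transactions, year, month):
    # enrich every row of the month in a single run
    return [
        enrich(t)
        for t in transactions
        if (t["date"].year, t["date"].month) == (year, month)
    ]


if __name__ == "__main__":
    start = date(2024, 6, 1)
    # assumed volume: 30 days x 1,000 transactions per day
    transactions = [
        {"date": start + timedelta(days=d), "gross": 100.0, "discount": 0.1}
        for d in range(30)
        for _ in range(1000)
    ]
    print(len(daily_enrichment(transactions, date(2024, 6, 15))))  # rows per daily run
    print(len(monthly_enrichment(transactions, 2024, 6)))          # rows per monthly run
```

Under these assumptions each daily run enriches 1,000 rows, while the monthly run must enrich 30,000 rows in one go. That single large run is where row-level Data Loads start to struggle.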

Throttling

The Analytics module has a safety mechanism called “throttling”, which helps avoid having too many query requests on the analytical database.

...

The limits are configurable on the server side and they apply to these methods:

When querying Data Sources or Datamarts from line item logics, the solution is to query only the records relevant to the current batch of SKUs being processed, instead of performing a query for a single SKU at a time.

For more details see Performance Improvements (Batching, api.getBatchInfo).
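The effect of batching on the number of queries can be sketched in Python. `FakeDatasource` is a hypothetical stand-in that only counts how many queries it receives. It is not a Pricefx API, and the sketch does not call `api.getBatchInfo`. The batch size of 200 is an arbitrary assumption.

```python
from typing import Dict, Iterable, List


class FakeDatasource:
    """Hypothetical stand-in for a Data Source; counts the queries issued against it."""

    def __init__(self, rows: Dict[str, float]):
        self.rows = rows
        self.query_count = 0

    def query(self, skus: Iterable[str]) -> Dict[str, float]:
        self.query_count += 1
        wanted = set(skus)
        return {sku: v for sku, v in self.rows.items() if sku in wanted}


def per_sku_lookup(ds: FakeDatasource, skus: List[str]) -> Dict[str, float]:
    # one query per SKU: N SKUs -> N queries
    result: Dict[str, float] = {}
    for sku in skus:
        result.update(ds.query([sku]))
    return result


def batched_lookup(ds: FakeDatasource, skus: List[str], batch_size: int) -> Dict[str, float]:
    # one query per batch of SKUs: N SKUs -> ceil(N / batch_size) queries
    result: Dict[str, float] = {}
    for start in range(0, len(skus), batch_size):
        result.update(ds.query(skus[start:start + batch_size]))
    return result


if __name__ == "__main__":
    rows = {f"SKU-{i}": float(i) for i in range(1000)}
    skus = list(rows)

    single = FakeDatasource(rows)
    per_sku_lookup(single, skus)

    batched = FakeDatasource(rows)
    batched_lookup(batched, skus, batch_size=200)

    print(single.query_count, batched.query_count)
```

Both approaches return the same data. For 1,000 SKUs, however, the per-SKU approach issues 1,000 queries against the analytical database, whereas the batched approach issues 5. This makes it far less likely to reach the throttling limits.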

Messages related to throttling found in the log file:

  • Info: Current PA query executions count={}, 1min-rate={} - informational message only

  • Warning: THROTTLING PA query executions as current count={} exceeds the throttle threshold of {} and the 1min-rate is hitting the max of {}/min - message logged when throttling was applied
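The condition in the Warning message can be modelled with a small Python sketch. `QueryThrottle` is hypothetical and is not the server's implementation. It assumes that "count" means the number of queries currently executing. It also approximates the "1min-rate" with a simple 60-second sliding window, which may differ from the rate estimator the server actually uses. Throttling applies only when both parts of the message hold.

```python
from collections import deque


class QueryThrottle:
    """Hypothetical model of the rule described by the THROTTLING warning."""

    def __init__(self, threshold: int, max_per_minute: int):
        self.threshold = threshold
        self.max_per_minute = max_per_minute
        self.current = 0        # assumed: queries currently executing
        self.started = deque()  # start timestamps (seconds) of recent queries

    def one_minute_rate(self, now: float) -> int:
        # drop timestamps older than 60 s; what remains approximates the 1-minute rate
        while self.started and now - self.started[0] >= 60:
            self.started.popleft()
        return len(self.started)

    def should_throttle(self, now: float) -> bool:
        # mirrors the warning: count exceeds threshold AND 1-min rate hits the max
        return self.current > self.threshold and self.one_minute_rate(now) >= self.max_per_minute

    def start(self, now: float) -> bool:
        if self.should_throttle(now):
            return False  # query is throttled; the caller has to wait or retry
        self.current += 1
        self.started.append(now)
        return True

    def finish(self) -> None:
        self.current -= 1


if __name__ == "__main__":
    throttle = QueryThrottle(threshold=2, max_per_minute=3)
    print([throttle.start(t) for t in (0, 1, 2, 3)])
```

With a threshold of 2 and a maximum of 3 per minute, the first three queries start. The fourth, at second 3, is throttled because 3 queries are running and 3 have started within the last minute.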