...

  • DMs now store all fields in their own tables, and the normalization setting is dropped.

  • The Refresh DL picks up new and updated rows from the constituent data sources, but this 'refresh data' is not visible to queries until the new DM Publish DL is run.

  • The published data table is designed to be read-only, taking advantage of column-store performance benefits, while the source data continues to be stored in regular row-oriented tables.

  • The impact on the Groovy and REST/JSON APIs is minimal, with a new method added to the DatamartContext interface to allow accessing the refresh data for enrichment jobs.

  • After upgrading to 13.0, the Publish DL is not automatically run or scheduled, and queries will continue to see the unpublished data until the Publish DL is first run.

Data Mart Issue Example

...

The typical schema for a Data Mart (DM) follows a snowflake design. In the example provided, the DM sources fields from the Invoice, Product, Product Hierarchy, and Customer Data Sources (DSs). The Product Hierarchy levels are indirectly linked to a transaction via the Product Group field.

...

  • When the Data Mart (DM) contains fields that are calculated by a PA Calculation Data Load (DL), there exists a time window in which newly loaded rows have not yet been enriched by the calculation(s). During this window, the DM can be considered to be in an inconsistent state.

Data Mart Issue Resolution

To address these issues, Rampur 13.0 takes a slightly different approach:

...

With the new approach in Rampur 13.0, a specific cutover point can be chosen to make the new data available to DM clients. Additionally, PA calculations can enrich the refreshed data without impacting the results of DM queries from these clients.

...

Data Mart Performance

With Citus, the self-contained published data table is created as a compressed columnar table, which provides performance benefits for analytical workloads. The source data continues to be stored in regular row-oriented tables, which are far better suited to (regular) updates.

Transition from Pre-Rampur 13.0 Approach

When upgrading to version 13.0, the DM Publish Data Load (DL) is neither run nor scheduled automatically, because no migration approach can predict the desired behavior for each individual customer.

...

From that point forward, a DM query may return a result that deviates from the loaded data, if the data has not yet been published.
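The relationship between loaded and published data can be illustrated with a small in-memory model. This is only a sketch of the behavior described in this document, not of the actual implementation: the class and method names (DatamartPublishSketch, runRefresh, runPublish, query, queryRefreshData) are hypothetical, and rows are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a Data Mart with separate refresh and published data (hypothetical names). */
class DatamartPublishSketch {
    // Working copy written by the simulated Refresh DL (and by enrichment jobs).
    private final List<String> refreshRows = new ArrayList<>();
    // Read-only snapshot served to queries; stays null until the Publish DL has run once.
    private List<String> publishedRows = null;

    /** Simulates the Refresh DL: new rows land in the refresh data only. */
    void runRefresh(List<String> newRows) {
        refreshRows.addAll(newRows);
    }

    /** Simulates the Publish DL: the cutover that makes the current refresh data visible. */
    void runPublish() {
        publishedRows = List.copyOf(refreshRows);
    }

    /** What a regular DM client sees: published data, or the refresh data if never published. */
    List<String> query() {
        return publishedRows != null ? publishedRows : List.copyOf(refreshRows);
    }

    /** What an enrichment job or data manager sees: always the refresh data. */
    List<String> queryRefreshData() {
        return List.copyOf(refreshRows);
    }
}
```

In this model, rows loaded before the first publish are visible to query() immediately. Once runPublish() has been called, rows loaded afterwards appear only in queryRefreshData() until the next publish.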

Data Mart Enrichment

Enriching a Data Mart (DM) means populating placeholder fields, that is, persisted fields defined in the DM but not sourced from any DS, by means of a PA Calculation DL. Such fields are used when their values cannot be (easily) calculated using a forward expression (e.g. NetProfitMargin = 1 - InvoicePrice/Cost).

...

NOTE: For these reasons, the Rampur 13.0 release will not support this scenario.

Data Mart Loading

Another unfortunate design is one in which the rows in a Data Mart (DM) are generated by a PA Calculation job. Again, this is ill-advised, since a DM is meant to be populated by its DM Refresh DL.

...

NOTE: This is not an approach we want to support in Rampur 13.0 and beyond. The correct approach is to populate the DM’s main DS instead.

Data Mart Publishing and Groovy API

The impact on the Pricefx Groovy API is minimal, since it is assumed that all clients of the Data Mart (DM) intend to use the published data only. The one exception is PA Calculation/Enrichment jobs, which need access to the newly loaded ('refresh') data.

...

Code Block
/**
 * Gets a table object representing a Datamart with the given name, using either its refresh or published data,
 * depending on the value of the useRefreshData argument.
 * This method is intended to be used in jobs that enrich/transform DM data, requiring access to data that
 * has been loaded (through the DM's Refresh DL), but has not yet been published (by the DM's Publish DL).
 * A reference to this table can be used when building a {@link Query} on that Datamart.
 * @param name sourceName, uniqueName or label of the DM.
 * @param useRefreshData true to use the DM's refresh data, false to use its published data.
 * @return Table representing the DM in DataContext.
 */
Table getDatamart(final String name, Boolean useRefreshData);
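
As an illustration of how the two kinds of callers might use this method, the following sketch uses stand-in types. TableStub and DatamartContextStub are hypothetical and exist only to make the example self-contained; they are not the real Pricefx Table or DatamartContext types, and the helper method names are assumptions as well.

```java
/** Hypothetical stand-in for the Table returned by getDatamart; not the real Pricefx class. */
class TableStub {
    final String name;
    final boolean refreshData;

    TableStub(String name, boolean refreshData) {
        this.name = name;
        this.refreshData = refreshData;
    }
}

/** Hypothetical stand-in exposing only the method documented above. */
interface DatamartContextStub {
    TableStub getDatamart(String name, Boolean useRefreshData);
}

class EnrichmentJobSketch {
    /** An enrichment job asks for the refresh data, so it can see rows not yet published. */
    static TableStub tableForEnrichment(DatamartContextStub ctx, String dmName) {
        return ctx.getDatamart(dmName, Boolean.TRUE);
    }

    /** Any other client asks for the published data. */
    static TableStub tableForReporting(DatamartContextStub ctx, String dmName) {
        return ctx.getDatamart(dmName, Boolean.FALSE);
    }
}
```

The flag is passed explicitly in both helpers because the documented signature takes a boxed Boolean, and how a null value is treated is not specified here.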

Data Mart Publishing and REST/JSON API

While we do not expect external clients to want to access Data Mart (DM) data that has not yet been published, there is one exception: a data manager user will want to see the refresh data after it is loaded and before it is (optionally) fully enriched and published.

...

Code Block
pricefx/mb1/datamart.fetch/107.DM?refreshData=true
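
A client could assemble such a request URL as sketched below. This is only an illustration: the class and method names are hypothetical, the base URL is a placeholder, and reading "mb1" as a partition name and "107.DM" as the DM's identifier is an assumption based on the example path above. The HTTP method, authentication and request body are not covered.

```java
import java.net.URI;

class DatamartFetchUrlSketch {
    /**
     * Builds a datamart.fetch URI following the path shape shown above.
     * All parameter meanings are assumptions, not confirmed API semantics.
     */
    static URI fetchUri(String baseUrl, String partition, String dmId, boolean refreshData) {
        String path = "/pricefx/" + partition + "/datamart.fetch/" + dmId;
        // Only append the query parameter when the refresh data is requested.
        String query = refreshData ? "?refreshData=true" : "";
        return URI.create(baseUrl + path + query);
    }
}
```

For example, fetchUri("https://example.invalid", "mb1", "107.DM", true) yields a URI ending in pricefx/mb1/datamart.fetch/107.DM?refreshData=true, while passing false omits the query parameter.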

Data Mart Publishing and Rampur Related Upgrades

Running or scheduling the Publish DL after upgrading to Rampur 13.0 is not mandatory. However, as mentioned previously, until the Publish DL is first executed, the system-generated Publish DL remains in the DRAFT state, and Data Mart (DM) queries continue to read the unpublished data (now also referred to as the refresh or staging data), as they did before.

...