Datamarts

Datamart (DM) is a data table, with data coming generally from one or more Data Sources (it is possible to make a Datamart without a Data Source). This data set can then be enriched:

  • with data from other joined data sources (through related matching columns). Datamart is then a combination of joined data sources

  • or with data from fields calculated on the fly (not stored in DB)

  • or with columns, whose values are calculated by a logic (via a Calculation Data Load; these are stored in DB)

A Datamart can have two states: staging and published.

  • Staging – At this state, the Datamart has picked up all new and updated rows from all the Data Sources (Refresh Data Load has been run). This data can then be enriched with calculated fields (Calculation Data Load). This data is not yet accessible by Datamart queries.

  • Published – Only after the Datamart is published (Publishing Data Load is run), the data is exposed to queries from Dashboards, Pricing Logic, etc. This allows you to choose a specific cutover point when you make the new data available to Datamart clients. This allows Analytics calculations to enrich the data without impacting the results of Datamart queries from these clients.

Distributed Datamart

A distributed datamart is an advanced database architecture that optimizes data storage and retrieval by distributing data across multiple tables. This setup is particularly beneficial for handling large datasets and improving query performance in application clusters.

Implementation with Citus and Postgres

In application clusters where Citus is installed in the PostgreSQL Analytics database, distributed tables can be created to hold Data Sources (DS) and Datamarts (DM). A distributed table in Citus requires a single column to be designated as the distribution key.

Distribution Key Requirements

  • Key Field Designation – The distribution key should be a key field of the DS/DM. Typically, either the product ID or customer ID field is chosen as the distribution key.

  • Consistency Across DS and DM – To maximize performance, the field used as the distribution key in the DM must also be defined as the distribution key in the DS from which it is sourced.

  • Recommendation for Product DS – Unless the Product DS is very small, it is recommended to define the productId key field as a distribution key. This ensures optimal performance during data loading and querying.

Performance Optimization

By carefully selecting and consistently applying the distribution key across DS and DM, significant performance improvements can be achieved. This schema ensures efficient data distribution, minimizing query times and enhancing overall system responsiveness.

Datamart Overview and Options

Click on a Datamart's label to display its details and the fields it contains.

The available options are:

  • Label – Allows you to set a Datamart label.

  • Source – Allows you to choose a Data Source from which the data will be pulled.

  • Base Currency – Allows you to set the base currency for this Datamart. (This currency is then shown as the first one in currency drop-down menus across the application.) It is also used as the default currency of calculated persisted fields (also called placeholders) of the Money type when they are converted to the query currency at query time.

  • Base UOM – Allows you to set the base unit of measure for this Datamart.

The options on the Data tab are the same as in Datamart Viewer. By default, the table displays up to 12 columns, with the following priority: fields that have a function (key, alt-key, per unit basis, etc.), fields that are dimensions and then all other fields up to the limit. If you want to see more fields, add them using the Select Fields to Display option or create and save a view preference. Admins can also increase the default limit using the maxPATablesFetchLimit advanced configuration option.

You can add new fields to the Datamart. You can either import the structure definition in JSON format or you can do it manually from a data sample or add individual fields.

There is no Datamart size limit – it can have billions of rows. There are only technical limits, i.e., storage and data processing times. In practice, importing and processing 1 billion rows is already quite challenging.

 It is not possible to modify field names (of fields coming from a Data Source) in a Datamart. The change has to be done at the Data Source level.
You can, however, add a translation for the field's label in the Translation Editor. The translated label will be displayed in the corresponding language version of the user interface, e.g., in data tables, filters, etc. In the Locale column of the editor, use the language code that you see in the locale URL parameter, e.g., 'fr' for French.

Calculated Fields (On the Fly)

Calculated fields are added directly in the Datamart definition, in the Fields tab (as opposed to fields sourced from Data Sources). In the expression, you can combine the existing Datamart fields with the simple arithmetic operators (+, -, *, /) and make use of the functions mentioned in the table below. 

Calculated fields are evaluated at the query time, i.e., are not persisted in the Analytics DB. For this reason, no reloading or refreshing of Datamart data is required after adding or modifying calculated field expressions (Datamart deployment is still required).

 Since Godfather 8.1, there is an internal change related to calculated fields during a Datamart Refresh: they are cleared to NULL instead of being persisted. For details see the Upgrade to Godfather 8.x Troubleshooting section.

Mainly for null-handling or defaulting, a small set of functions can be used as well:

Datamart Calculated Field Function

Comment

SQL Equivalent

Datamart Calculated Field Function

Comment

SQL Equivalent

TOSTRING(arg)



CAST(arg AS VARCHAR)

TODATE(arg)

 Pricefx implementation only supports the CAST from a DateTime field to a Date. In other words, you can only convert a date with a timestamp to a date without a timestamp. 

CAST(arg AS DATE)

ISNULL(arg)



(arg IS NULL)

IF(condition, valueIfTrue, valueIfFalse)

Examples:

  • IF(SomeDiscount>0,100,0)

  • IF(SomeDiscount=0,0,100)

  • IF(Country="Germany",100,0)

  • IF(Country!="Germany",0,100)

  • IF(Country!="Germany","A","B") (only text fields will accept that the expression returns Strings)

CASE condition WHEN TRUE THEN valueIfTrue ELSE valueIfFalse END

IFNULL(arg1,arg2)

IFNULL(arg1, arg2) is equivalent to IF(ISNULL(arg1), arg2, arg1), i.e. if arg1 is NULL, arg2 is returned, otherwise it returns arg1.

COALESCE(arg1,arg2)

NULLIF(arg1,arg2)

The NULLIF(arg1, arg2) function compares arg1 and arg2 and returns NULL if they are equal. Otherwise, arg1 is returned.

NULLIF(arg1,arg2)

SQLn

An SQL query with formula: “sqln(“sql expression with n inputs”, field1, …, field n). See also Datamart Query Expressions.



Some more functions are also in Datamart Query Expressions (only the first table General Functions applies to calculated fields – on row level values, there are no aggregations, hence no statistical functions apply).

More complex use cases can be handled with the help of some "data enrichment" task – which is typically Calculation Data Load (then the values will be stored in the Datamart).

Per Unit Basis & Percent Basis

Note the difference in behavior between Datamart and Data Source. In Datamart, all money columns that are defined as extended are already multiplied by Quantity. The purpose of 'auto-extending' in a Datamart query is that you do not need to care whether a money amount in the source DS is per-unit or not.

Comments:

  •  Cost

    • In DS – Is marked as a 'per unit' measure. This means that the amount of money (number) coming from Data Feed is meant as a 'per unit' amount.

    • In DM – Is already multiplied by Quantity (so by default it shows the total amount).

      • If the user wants to see the 'per unit' cost (e.g. in a chart), the system will divide the cost (stored in Datamart) by the Quantity column and return it.

      • If the user wants to see the cost as a percentage, the system will divide the Cost by the Invoice Price column (because it is marked as the 'percent' basis).

  • Invoice Price

    • In DS – Is marked as an 'extended' measure. This means that the amount of money (number) coming from Data Feed is meant as the total amount (for all quantity).

Found an issue in documentation? Write to us.

 
Pricefx version 12.0