Datamarts

Datamart (DM) is a data table, with data coming generally from one or more Data Sources (it is possible to make a Datamart without a Data Source). This data set can then be enriched:

  • with data from other joined data sources (through related matching columns). Datamart is then a combination of joined data sources

  • or with data from fields calculated on the fly (not stored in DB)

  • or with columns, whose values are calculated by a logic (via a Calculation Data Load; these are stored in DB)

  • The data in a Datamart is not considered source data. In a well configured system it should always be possible to reload all the rows in a Datamart from Data Sources (in reasonable time). This includes any calculation (enrichment) jobs, in addition to the Datamart Refresh Data Load.

  • Datamart is a physical table but it does not store a copy of all attribute values (for the default Normalization settings with value High). Only keys and calculated columns are stored, the rest of the data is joined dynamically when a query is run.

  • An expression cannot be used as a key for joining tables; for details see this article.

Click on a Datamart's label to display its details and the fields it contains.

The available options are:

  • Label – Allows you to set a Datamart label.

  • Source – Allows you to choose a Data Source from which the data will be pulled.

  • Normalization – You can decide whether you prefer quicker Datamart queries or refreshes. This option allows you to copy more fields to the DM table to speed up queries (SELECTs) at the expense of DM refresh performance or vice versa. When you reduce the normalization level (e.g., from high to medium), refresh the Datamart.

    • None – All primary and secondary Data Source fields are copied. This means fast queries but (very) slow refreshes.

    • Low – All primary Data Source fields and all numeric and dimension secondary DS fields are copied to the DM. This may result in quicker queries but (potentially much) slower DM refreshes, but it highly depends on the specifics of the schema. For example, where a secondary DS has a large number of rows, this option will likely speed up query time. But if all secondary Data Sources are small, it may even slow down queries. In other words, some experimentation may be needed to properly evaluate the benefits. This is the default option for solutions with Greenplum database.

    • Medium – All primary DS fields are copied to the DM. This way we avoid the join, at query time, of the DM and primary DS tables, which typical have the same (large) number of rows. This option provides the best balance between query and load performance. This is the default option for solutions with PostgreSQL database.

    • High – All DM fields sourced from Data Sources are joined together at query time. This yields the fastest DM refresh times, but in larger DMs (depending on the DB sizing, from a few million to over 10 million rows), query times will start to suffer.

    • View – Query time joins on the primary and secondary DS fields; the query accesses only DS tables. Refresh does not do anything in this normalization level.

  • Base Currency – Allows you to set the base currency for this Datamart. (This currency is then shown as the first one in currency drop-down menus across the application.) It is also used as the default currency of calculated persisted fields (also called placeholders) of the Money type when they are converted to the query currency at query time.

  • Base UOM – Allows you to set the base unit of measure for this Datamart.

The options on the Data tab are the same as in Datamart Viewer. By default, the table displays up to 12 columns, with the following priority: fields that have a function (key, alt-key, per unit basis, etc.), fields that are dimensions and then all other fields up to the limit. If you want to see more fields, add them using the Select Fields to Display option or create and save a view preference. Admins can also increase the default limit using the maxPATablesFetchLimit advanced configuration option.

You can add new fields to the Datamart. You can either import the structure definition in JSON format or you can do it manually from a data sample or add individual fields.

There is no Datamart size limit – it can have billions of rows. There are only technical limits, i.e., storage and data processing times. In practice, importing and processing 1 billion rows is already quite challenging.

 It is not possible to modify field names (of fields coming from a Data Source) in a Datamart. The change has to be done at the Data Source level.
You can, however, add a translation for the field's label in the Translation Editor. The translated label will be displayed in the corresponding language version of the user interface, e.g., in data tables, filters, etc. In the Locale column of the editor, use the language code that you see in the locale URL parameter, e.g., 'fr' for French.

Calculated Fields (On the Fly)

Calculated fields are added directly in the Datamart definition, in the Fields tab (as opposed to fields sourced from Data Sources). In the expression, you can combine the existing Datamart fields with the simple arithmetic operators (+, -, *, /) and make use of the functions mentioned in the table below. 

Calculated fields are evaluated at the query time, i.e., are not persisted in the Analytics DB. For this reason, no reloading or refreshing of Datamart data is required after adding or modifying calculated field expressions (Datamart deployment is still required).

 Since Godfather 8.1, there is an internal change related to calculated fields during a Datamart Refresh: they are cleared to NULL instead of being persisted. For details see the Upgrade to Godfather 8.x Troubleshooting section.

Mainly for null-handling or defaulting, a small set of functions can be used as well:

Datamart Calculated Field Function

Comment

SQL Equivalent

Datamart Calculated Field Function

Comment

SQL Equivalent

TOSTRING(arg)



CAST(arg AS VARCHAR)

TODATE(arg)

 Pricefx implementation only supports the CAST from a DateTime field to a Date. In other words, you can only convert a date with a timestamp to a date without a timestamp. 

CAST(arg AS DATE)

ISNULL(arg)



(arg IS NULL)

IF(condition, valueIfTrue, valueIfFalse)

Examples:

  • IF(SomeDiscount>0,100,0)

  • IF(SomeDiscount=0,0,100)

  • IF(Country="Germany",100,0)

  • IF(Country!="Germany",0,100)

  • IF(Country!="Germany","A","B") (only text fields will accept that the expression returns Strings)

CASE condition WHEN TRUE THEN valueIfTrue ELSE valueIfFalse END

IFNULL(arg1,arg2)

IFNULL(arg1, arg2) is equivalent to IF(ISNULL(arg1), arg2, arg1), i.e. if arg1 is NULL, arg2 is returned, otherwise it returns arg1.

COALESCE(arg1,arg2)

NULLIF(arg1,arg2)

The NULLIF(arg1, arg2) function compares arg1 and arg2 and returns NULL if they are equal. Otherwise, arg1 is returned.

NULLIF(arg1,arg2)

SQLn

An SQL query with formula: “sqln(“sql expression with n inputs”, field1, …, field n). See also Datamart Query Expressions.



Some more functions are also in Datamart Query Expressions (only the first table General Functions applies to calculated fields – on row level values, there are no aggregations, hence no statistical functions apply).

More complex use cases can be handled with the help of some "data enrichment" task – which is typically Calculation Data Load (then the values will be stored in the Datamart).

Per Unit Basis & Percent Basis

Note the difference in behavior between Datamart and Data Source. In Datamart, all money columns that are defined as extended are already multiplied by Quantity. The purpose of 'auto-extending' in a Datamart query is that you do not need to care whether a money amount in the source DS is per-unit or not.

Comments:

  •  Cost

    • In DS – Is marked as a 'per unit' measure. This means that the amount of money (number) coming from Data Feed is meant as a 'per unit' amount.

    • In DM – Is already multiplied by Quantity (so by default it shows the total amount).

      • If the user wants to see the 'per unit' cost (e.g. in a chart), the system will divide the cost (stored in Datamart) by the Quantity column and return it.

      • If the user wants to see the cost as a percentage, the system will divide the Cost by the Invoice Price column (because it is marked as the 'percent' basis).

  • Invoice Price

    • In DS – Is marked as an 'extended' measure. This means that the amount of money (number) coming from Data Feed is meant as the total amount (for all quantity).

Found an issue in documentation? Write to us.

 
Pricefx version 12.0