Version 13.0 introduces distributed Data Sources and Datamarts. This article describes the configuration settings required to switch to the distributed database.
To configure a Datamart or Data Source as distributed, follow these steps:
1. In the Unity user interface, navigate to Analytics > Data Manager > Data Sources or Datamarts.
2. Open a Data Source or Datamart and click the Import & Export button.
3. In the JSON definition, locate the field you want to distribute on and make sure it is defined as a key, i.e., "key": true.
4. On the same field, set the "distributionKey" property to true.
5. Click Apply.
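For illustration, the relevant part of a field definition in the exported JSON could then look like this (the field name is a placeholder, and any other properties of the field are left out):

```json
{
  "name": "CustomerId",
  "key": true,
  "distributionKey": true
}
```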
When a Datamart source, i.e., a Data Source, is distributed, the Datamart is automatically distributed on the same key on deployment (and on rebuild, if it was previously already deployed).
New Data Sources / Datamarts
When a Data Source or Datamart has not previously been deployed, i.e., there is no corresponding table in the database, a distributed table is created for it on the first deployment. In new implementations, it is recommended to choose the distribution key and configure it in the relevant Data Sources and Datamarts from the beginning.
Existing Data Sources / Datamarts
When a table already exists in the database, it is not automatically converted to a distributed table on deployment after the Data Source / Datamart is reconfigured with a distribution key. Distributed tables are normally used to hold large amounts of data, and altering such a table on the fly in a UI transaction would very quickly time out.
Instead, the table is rebuilt as a new and distributed table when the IndexMaintenance job for the Data Source / Datamart is run in non-incremental mode.
Upgrade to 13.0
The Publishing Data Load is not automatically created when upgrading to 13.0. Instead, it is created when the reconfigured Datamart is redeployed.
Before the Publishing Data Load is run, a query on the DM returns exactly the same result as before the upgrade. Once the Publishing Data Load has run for the first time, queries use the published data only. From that moment on, a Datamart query can return a result that deviates from the loaded data, if that data has not yet been published.
It is not required to run or schedule the Publishing Data Load after upgrading to 13.0. As previously mentioned, as long as it is not run, i.e., the system-generated Publishing Data Load remains in the DRAFT state, Datamart queries will find the unpublished data (now also called refreshed or staging data), as before. When a Datamart is refreshed, any new and modified data is immediately reflected in query results. The same applies when the Datamart is truncated.
After the Publishing Data Load has been run, this behaviour changes. From this moment on, the only way to expose changes to the data is to re-publish the data, including when truncating the Datamart.
After upgrading, there is an incentive to publish the Datamart data, as with Citus the query performance should be much improved. The main reason for not automatically running or scheduling the Publishing Data Load on upgrade is that this additional step needs to be fitted appropriately into your data load flow.
Datamart Enrichment
Enriching a Datamart means populating placeholder fields, i.e., fields not sourced from any Data Source, using a Calculation Data Load. This is necessary for fields that cannot easily be calculated with forward expressions. In versions 12.x and earlier, fields sourced from a Data Source could also be changed using the DatamartRowSet API, but this led to inconsistent query results and user confusion. Therefore, version 13.0 no longer supports modifying Datamart fields this way.
Datamart Loading
Generating Datamart rows using a Calculation job is not a recommended configuration, as Datamarts should be populated by their Refresh Data Load. Though this might work with 'None' normalization if the refresh is never run, this approach won't be supported in version 13 and beyond. The correct method is to populate the DM's main DS.
Groovy API
The impact on the Groovy API is minimal, as it is assumed that all DM clients intend to use only the published data, with one exception: the Calculation/Enrichment job, which requires access to the newly loaded ('refresh') data. Therefore, a single method is added to the DatamartContext interface to accommodate this need:
getDatamart(String name, Boolean useRefreshData)
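As a sketch, a Calculation/Enrichment logic could use this method as follows (the Datamart name "TransactionsDM" and the field names are illustrative placeholders, not part of the product):

```groovy
// Sketch only: "TransactionsDM", "CustomerId" and "InvoicePrice" are placeholder names.
def ctx = api.getDatamartContext()

// true = query the newly loaded (refresh/staging) data instead of the published data
def dm = ctx.getDatamart("TransactionsDM", true)

def query = ctx.newQuery(dm)
        .select("CustomerId")
        .select("SUM(InvoicePrice)", "totalPrice")

def result = ctx.executeQuery(query)
```

With useRefreshData set to false (or when querying via other existing methods), the same query would see only the published data.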
REST/JSON API
While external clients are not expected to require access to unpublished Datamart data, there is one exception: a data manager user may want to see the Refresh data after it is loaded and before it is (optionally) fully enriched and published. We allow this with a URL parameter in the datamart.fetchdata endpoint, for example:
pricefx/customer/datamart.fetch/107.DM?refreshData=true
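As an illustration, such a request could be issued with curl (the host, partition name, credentials, and Datamart typed ID below are placeholders to be replaced with your own values):

```shell
# Placeholders: replace the host, partition ("customer"), credentials, and Datamart ID.
curl -u "user:password" \
  "https://example.pricefx.com/pricefx/customer/datamart.fetch/107.DM?refreshData=true"
```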