Data Loads

A Data Load represents a task or process that manipulates data between Analytics objects, such as uploading data from a Data Feed to a Data Source, deleting rows from a Datamart, or calculating new field values.

Most Data Loads are created automatically (when you deploy a Data Source or Datamart), but you can also create them manually (e.g., a Calculation Data Load to manipulate data).

Data Loads provide the following actions:

Type / Description / Available for

Truncate

Deletes all rows in the target, or only the rows matching a filter.

Note: When a Data Source is deployed, the Truncate Data Load of the linked Data Feed is updated with a filter that includes only rows that were already successfully flushed to the Data Source, and it is scheduled to run once a week. This applies only if no other filter or schedule is already defined.

Incremental mode is no longer available for Truncate jobs. For older jobs (created before the upgrade to the Collins 5.0 release) where this option was enabled, it stays enabled. If you disable the Incremental option, the checkbox becomes non-editable and you cannot enable the option again. For Data Loads saved with the Incremental option off, the checkbox is hidden completely.

Available for: Data Feed, Data Source, Datamart, Sim Datamart

Flush

Copies data from the Data Feed into the Data Source. It can also convert values from strings to the data types defined in the Data Source.

It can copy all data or only new data (an incremental Data Load).

Available for: Data Feed, Data Source

Refresh

Copies data from the Data Sources (configured in the Datamart fields) into the target Datamart.

It can copy all data or only new/modified data (an incremental Data Load).

If you want to run a non-incremental refresh but avoid the costly merging of almost identical data, you can truncate the Datamart first by setting the advanced configuration option 'truncateDatamartOnNonIncRefresh' to true.

Notes:

  • The Source section of this type of Data Load is intentionally empty.

  • Since Godfather 8.1, rows updated during Refresh behave differently: their calculated fields are cleared to NULL instead of being persisted. For details, see the release notes.

Available for: Datamart, Sim Datamart

Calculation

Applies a logic (defined in Configuration) to create new rows or to change values in existing rows of the target Data Source or Datamart. The calculation can take data from anywhere, e.g., Master Data tables.

Example usage:

  • Datamart / Data Source column calculations

  • Rebate allocations

  • Copying data from PX / Company Parameters /... into the Data Source

Available for: Data Source, Datamart

Calendar

Generates the rows of the built-in Data Source "cal", producing a Gregorian calendar. (If you need a different business calendar, upload the data into the "cal" Data Source from a file or via integration instead of using this Data Load.)



Customer

Special out-of-the-box Data Load which copies data from the Master Data table "Customer" into the Data Source "Customer".



Product

Special out-of-the-box Data Load which copies data from the Master Data table "Product" into the Data Source "Product".



Simulation

Applies a logic to the data as defined by the simulation for which the Data Load was created.

Available for: Sim Datamart

Internal Copy

Copies data from a source into the Data Source table.

The source here can be:

  • Master Data table (P, PX, C, CX, Company Parameters)

  • PA Rollup query (intended for Rollups materialization)

  • PO Model table

  • Price Records table (Quoting or Agreements & Promotions – select the preferred option on the Source tab)

  • Rebate Records table

The easiest way to create this type of Data Load is to create a new Data Source from Template and deploy it; this automatically creates the Data Load and pre-fills the columns.

The incremental mode in Internal Copy tasks differs from the incremental mode of the Refresh and Calculation types. Here, incremental means that the Data Source is not truncated before the copy, i.e., it keeps old records instead of being a true copy of the source.

Available for: Data Source

Index Maintenance

This task repairs the indexes associated with the target Data Source or Datamart, typically after a backend DB migration. Run it only in such special circumstances, not on a regular or scheduled basis. We also strongly recommend consulting Pricefx support before you run this task.

Available for: Data Source, Datamart

Distributed Calculation

Allows you to split a large set of data into batches and process them independently. See Distributed Calculation in Analytics for details.

Available for: Data Source
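
The difference between a full and an incremental Internal Copy, as described above, can be pictured with a small model. This is only a sketch in plain Groovy, not Pricefx code: the function name, the 'sku' key and the map-based rows are hypothetical, and it assumes that in incremental mode a source row replaces an existing record with the same key.

```groovy
// Conceptual model only: a map keyed by record key stands in for the Data Source table.
Map<String, Map> internalCopy(Map<String, Map> dataSource, List<Map> sourceRows,
                              String keyField, boolean incremental) {
    // Non-incremental: start from an empty table (truncate), so the result is a true copy.
    // Incremental: start from the existing records, so old records are kept.
    Map<String, Map> result = incremental ? new LinkedHashMap<String, Map>(dataSource) : [:]
    // Assumption: a source row with an existing key overwrites the old record.
    sourceRows.each { row -> result[row[keyField] as String] = row }
    return result
}

def existing = [A: [sku: 'A', price: 10], B: [sku: 'B', price: 20]]
def source   = [[sku: 'B', price: 25], [sku: 'C', price: 30]]

def fullCopy    = internalCopy(existing, source, 'sku', false)
def incremental = internalCopy(existing, source, 'sku', true)

println fullCopy.keySet()     // [B, C]
println incremental.keySet()  // [A, B, C]
```

In the full copy, record A disappears because it no longer exists in the source; in the incremental copy it survives, which is why the result is not a true copy of the source.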

On the Data Loads page, Data Loads are by default grouped by Data Load type on the first level and target type on the second level. The Target column indicates for which object the Data Load is used.

In this list, the Delete button is shown only if all selected Data Loads can be deleted, i.e., each of them is either invalid (e.g., its target object has been deleted) or is not a default Data Load created by the system. (A user can delete system-generated Data Loads only if they are invalid.)

Click on a Data Load's label to display the job details. At the top there are buttons to run the Data Load manually or cancel the load. (Data Loads have a default timeout of 48 hours to accommodate even large and complex jobs. If needed, the jobs can be cancelled here.)

Depending on the Data Load type, you will find some of the following sections here:

  • Options – Lets you configure the Data Load:

    • Target Datamart – Specifies the target of the data coming from the Data Load operation.

      • For the Flush operation, the target is one of the Data Sources.

      • For Refresh, the target is one of the Datamarts.

      • For Calculation, the target is the Data Source or Datamart that you want to update or enrich with columns.

    • Data Source – Specifies the source of data.

      • For the Flush operation, it will be a Data Feed.

      • For Refresh, it is not specified because one target Datamart can have multiple sources. The source Data Sources are specified in the column definitions of the Datamart.

    • Allow batching – Large amounts of data (more than 5 million rows) can be processed in batches. By default, batch processing is enabled for Flush operations and disabled for Calculation. You can override this setting here. (The batch size is an instance parameter and the default value is 2 million rows. There must not be any dependencies between rows belonging to different batches.)

    • Validation Logic – You can select a validation logic that will validate the target data after refresh. The following rules apply:

      • The target Datamart name is available in the validation logic through the "dmName" binding variable:

        def dmName = api.getBinding("dmName")
      • Use the name to query for the Datamart's data and apply custom validation rules.

      • Raise a warning with a custom validation message when a validation rule is not satisfied:

        api.addWarning("Missing value in field1")
      • When api.addWarning() is invoked, the data validation is considered failed and the Data Load's status is set to Error. Note that the validation result does not affect the Data Load process itself, because the validation logic runs after the Data Load has completed.

      • Validation messages passed from validation logic are present among the Data Load's calculation messages and can be viewed in the UI.

  • Overview – Summarizes the basic information on the Data Load.

  • Schedule – Here you can also schedule the Data Load manually (described below). The Job/Task Tracking section at the bottom shows the status for each task of the Data Load.

  • Target – Displays the complete target data set. The available options are the same as for Data Sources. View preferences are not available in this table. Instead, quick and advanced filters and the sorting are saved automatically when you save the Data Load.

  • Source – Displays the complete source data set. The available options are the same as for Data Sources. View preferences are not available in this table. Instead, quick and advanced filters and the sorting are saved automatically when you save the Data Load.

  • Calculation – Specifies the logic to be executed. Such logic is set up in Administration > Logics > Analytics. It can manipulate the data coming from the Source in many ways, e.g., filter the incoming rows from Source, create new lines for Target, modify/enrich/transform the data being copied from the Source into the Target. Click Default Formula to open the currently active Analytics default logic in an editor.
    If you leave the Target Date field empty, the calculation will use "today" as the target date.
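
The Validation Logic rules listed under Options can be sketched as follows. This is a minimal illustration, not the official implementation: the helper findMissingValues, the sample rows and the field names are hypothetical, and the Pricefx wiring appears only in comments because it runs solely inside a logic where api is bound. Only api.getBinding("dmName") and api.addWarning() are taken from the rules above; how the Datamart rows are queried is left open.

```groovy
// Hypothetical helper: returns one message per required field that is empty in any row.
List<String> findMissingValues(List<Map> rows, List<String> requiredFields) {
    def missing = requiredFields.findAll { field ->
        rows.any { row -> row[field] == null || row[field] == '' }
    }
    return missing.collect { field -> "Missing value in ${field}".toString() }
}

// Sample rows standing in for the data of the Datamart named by the "dmName" binding.
def rows = [
    [sku: 'A', field1: 'x'],
    [sku: 'B', field1: null],
]
def messages = findMissingValues(rows, ['sku', 'field1'])
messages.each { println it }   // Missing value in field1

// Inside a Pricefx validation logic, the wiring would look roughly like this:
//   def dmName = api.getBinding("dmName")   // name of the target Datamart
//   ... query the Datamart's rows by that name ...
//   findMissingValues(rows, ["field1"]).each { api.addWarning(it) }
```

Keeping the rule itself in a helper that takes plain rows makes it easy to try out with sample data before wiring it to a real Datamart.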

Schedule a Data Load

To schedule a Data Load:

  1. On the Schedule tab, click the Add Task button.

  2. Enter a Start Date.

  3. Enter a Period (the load will run every X minutes, hours, or days).

  4. Enter an Interval representing the number of repetitions in the selected period (e.g., if you set Period = day and Interval = 1 and Start Date = 27/11/2015 10:00, the Data Load will run every day at 10:00 AM, starting on 27/11/2015). You can also enter "0" for jobs that you want to run just once.
    If the Incremental option is checked, only new/changed data is loaded.

  5. Load Date indicates the last time this task ran successfully. Only rows loaded/updated since this time are considered in the Data Load. You can edit the Load Date to force the loading of data older than the last successful run.

  6. Enter the name of the task.

  7. For Calculation Data Load type, you can also enable the option With target snapshot, meaning that the target rowset will be pre-populated with the target rows in the scope of the Data Load. Otherwise this rowset is empty. For details see the note below.

  8. Rowset holds the rows to be loaded in the target Field Collection. Initially this rowset is empty.

  9. Target and Source are automatically generated. You need to specify these only if you set up the Data Load manually.
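
How the Incremental option and the Load Date from the steps above work together can be pictured with a small model. This is a plain Groovy sketch, not Pricefx code: the function rowsInScope and the lastUpdate column are hypothetical, and it assumes that "since this time" means strictly after the Load Date.

```groovy
import java.time.LocalDateTime

// Conceptual model: decides which rows an (incremental) Data Load would consider.
List<Map> rowsInScope(List<Map> rows, LocalDateTime loadDate, boolean incremental) {
    if (!incremental || loadDate == null) {
        return rows   // non-incremental load: every row is considered
    }
    // Incremental load: only rows loaded/updated after the Load Date (assumed exclusive).
    return rows.findAll { it.lastUpdate.isAfter(loadDate) }
}

def rows = [
    [id: 1, lastUpdate: LocalDateTime.of(2015, 11, 26, 9, 0)],
    [id: 2, lastUpdate: LocalDateTime.of(2015, 11, 27, 11, 30)],
]
def loadDate = LocalDateTime.of(2015, 11, 27, 10, 0)

def incrementalIds = rowsInScope(rows, loadDate, true)*.id
def forcedIds      = rowsInScope(rows, loadDate.minusDays(7), true)*.id

println incrementalIds   // [2]
println forcedIds        // [1, 2]
```

The second call mirrors editing the Load Date to an earlier value: the older row comes back into scope without switching the Incremental option off.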

Target Snapshot Option

The target rowset represents the updates you want to apply to the actual target data once the Data Load execution has finished. An empty target rowset therefore means that no change is applied.

If the With target snapshot option is:

  • Enabled – The Data Load starts by copying the whole target into the rowset (in the scope defined by the DL.filter).

  • Disabled – The target rowset is empty; no change is applied.

Note: To delete data, you add a row with the correct key values, and set its isDeleted field to true. See also a Knowledge Base article on this topic.

Example of use: The With target snapshot option can be used in the following advanced use case. When a Calculation job is started, two DatamartRowSets are available in the formula context: source and target (api.getDatamartRowSet("source")...). The target rowset is initially empty unless With target snapshot was enabled. The typical use case would be a Flush, where the data to be loaded into the Data Source depends on which data is already there, with Groovy code using the DatamartRowSet API to find and inspect existing rows, etc.
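
The effect of the target rowset, including deletion via isDeleted, can be pictured with a small model. This is a plain Groovy sketch under stated assumptions, not the DatamartRowSet API: the function applyRowset and the 'sku' key are hypothetical, and it assumes that rows are matched by key and that rows not flagged as deleted are inserted or updated. In a real Calculation logic the rowset would be the one returned by api.getDatamartRowSet("target").

```groovy
// Conceptual model: applies the rows collected in the target rowset to the target data.
Map<String, Map> applyRowset(Map<String, Map> target, List<Map> rowset, String keyField) {
    Map<String, Map> result = new LinkedHashMap<String, Map>(target)
    rowset.each { row ->
        String key = row[keyField] as String
        if (row['isDeleted']) {
            result.remove(key)   // a row flagged isDeleted removes the matching target row
        } else {
            result[key] = row    // assumption: any other row is inserted or updated
        }
    }
    return result
}

def target = [A: [sku: 'A', price: 10], B: [sku: 'B', price: 20]]

// An empty rowset applies no change.
def unchanged = applyRowset(target, [], 'sku')

// One deletion (key values plus isDeleted) and one new row.
def changed = applyRowset(target, [[sku: 'A', isDeleted: true], [sku: 'C', price: 30]], 'sku')

println unchanged.keySet()   // [A, B]
println changed.keySet()     // [B, C]
```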

 


 
Pricefx version 12.0