...

By transitioning to the Citus solution in NextGen, customers can benefit from a more streamlined and reliable database infrastructure capable of meeting the complex and dynamic requirements inherent to Pricefx's PA ecosystem.

Rampur Upgrade Flowchart

The following illustration depicts the flowchart for the upgrade to Rampur version 13 and the subsequent steps, based on a variety of conditions.

...

Rampur Upgrade Flowchart Steps

Here is a detailed breakdown of the flowchart:

  1. Start

  2. Upgrade to 13

  3. Using Greenplum?

    • Yes: Move to the next decision point.

    • No: End of the process.

  4. Migrate to NextGen?

    • Yes: Create Citus DB Cluster, then Identify large DS/DMs, followed by Configure Distribution Keys, and finally Done.

    • No: Move to the next decision point.

  5. Use DM Publishing?

    • Yes: Schedule DM Publishing DL(s), then Done.

    • No: Rebuild distributed DS/DMs, then Done.
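The decision flow above can be sketched as a small function. This is purely illustrative; the function and step names mirror the flowchart labels and are not part of any Pricefx API.

```python
# Hypothetical encoding of the Rampur upgrade flowchart described above.
# Step names follow the flowchart labels; nothing here is a real API.

def rampur_upgrade_path(uses_greenplum, migrate_to_nextgen, uses_dm_publishing):
    """Return the ordered list of steps for one pass through the flowchart."""
    steps = ["Upgrade to 13"]
    if not uses_greenplum:
        return steps + ["Done"]          # not on Greenplum: end of the process
    if migrate_to_nextgen:
        steps += ["Create Citus DB Cluster",
                  "Identify large DS/DMs",
                  "Configure Distribution Keys"]
    elif uses_dm_publishing:
        steps += ["Schedule DM Publishing DL(s)"]
    else:
        steps += ["Rebuild distributed DS/DMs"]
    return steps + ["Done"]
```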

Rampur Upgrade Process Insights

  • Upgrade Path: The process starts with an upgrade to version 13.

  • Greenplum Usage: The first decision point checks whether Greenplum is in use; if not, the process ends.

  • NextGen Migration: If Greenplum is used, the next decision point is whether to migrate to NextGen. If migrating, then a Citus DB Cluster is created, large DS/DMs are identified, and distribution keys are configured.

  • DM Publishing: If not migrating to NextGen, the next decision checks for DM Publishing usage.

    • If using DM Publishing, the DM Publishing DL(s) are scheduled.

    • If not using DM Publishing, distributed DS/DMs are rebuilt.

This flowchart provides a clear and structured approach to handle database upgrades and migrations based on specific conditions and requirements.

Additional PA Considerations for Rampur

Migrate to NextGen

The rationale for migration is that the Citus database solution employed in the NextGen environment is considered a more robust and capable option compared to the legacy Greenplum deployment. Greenplum, while functional, represents a more complex database system that requires extensive configuration and tuning efforts to ensure optimal performance across the wide-ranging and often dynamic requirements of Pricefx customers.

These customer-specific demands can encompass varied PA data schemas, significant data volumes, diverse reporting and dashboard queries, as well as the intricate pricing logic governing quotes, agreements, and batch processing workflows. Migrating to the NextGen platform with its Citus-based architecture provides a more streamlined and reliable database solution capable of meeting these complex operational needs.

Create Citus DB Cluster

The initial Citus cluster configuration for the migration involves a single Coordinator node paired with two Worker nodes. In this setup, all of the existing PA data is first migrated to the Coordinator node, which must be provisioned with sufficient computing resources to accommodate this data payload.
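As a hedged sketch of the cluster setup described above: in Citus, worker nodes are registered on the coordinator via the `citus_add_node` UDF. The worker hostnames and port below are assumptions for illustration, not actual Pricefx infrastructure.

```python
# Illustrative sketch: the SQL a DBA might run on the Citus coordinator to
# register the two worker nodes described above. Hostnames and port are
# assumptions; citus_add_node() is the Citus UDF that adds a worker node.

WORKERS = [("citus-worker-1", 5432), ("citus-worker-2", 5432)]

def worker_registration_sql(workers):
    """Build one citus_add_node() statement per worker node."""
    return ["SELECT citus_add_node('%s', %d);" % (host, port)
            for host, port in workers]

for stmt in worker_registration_sql(WORKERS):
    print(stmt)
```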

...


LEARN MORE: To learn more about this process, click here.

Identify Large Data Sources (DS) and Data Marts (DM)

Unlike the legacy Greenplum deployment, the Citus-based migration does not automatically distribute the data across all Data Sources (DSs) and Data Marts (DMs). Instead, a more selective approach is adopted, as most tables, judged by their total number of rows, do not stand to benefit from distributed data storage.

...

When choosing which tables to distribute, we also need to consider the dependency between a DM’s distribution key and that of its constituent DS(s). See the next step.
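The selective approach can be sketched as a simple size filter: only DS/DM tables whose row counts exceed some threshold become candidates for distribution. The table names and the threshold below are hypothetical.

```python
# Illustrative sketch of the selective approach: only tables large enough
# (by row count) are candidates for distribution. Names and the threshold
# are hypothetical, not taken from any real deployment.

ROW_THRESHOLD = 50_000_000  # assumption; tune per customer deployment

def distribution_candidates(table_rows, threshold=ROW_THRESHOLD):
    """Return DS/DM names large enough to benefit from distribution."""
    return sorted(name for name, rows in table_rows.items() if rows >= threshold)

sizes = {
    "Transactions_DM": 900_000_000,
    "Transactions_DS": 850_000_000,
    "ProductMaster_DS": 120_000,   # small: stays local on the coordinator
    "Calendar_DS": 4_000,
}
print(distribution_candidates(sizes))  # ['Transactions_DM', 'Transactions_DS']
```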

Configure Distribution Keys

This concept can be best illustrated through an example. When starting with a large Transactions Data Mart (DM), the obvious choice for the distribution key would typically be the sku or productId field. This is because the out-of-the-box functionality in Pricefx's PA solution is often oriented around product-centric data and workflows.

...

Tip

NOTE: This is not a universal rule, however; the optimal distribution key can vary based on the specific nuances of each customer's data landscape.

Distribution Key Examples

For example, in the case of the Company X deployment, the reverse scenario was true, with the customer data (reflected in the customerId field) comprising the more suitable distribution key for the Transactions DM.

...

These examples illustrate the importance of carefully evaluating the unique data characteristics and relationships within each customer's PA environment to identify the most suitable distribution key for the DMs.
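To make the colocation intuition concrete, here is a small, hypothetical simulation of hash distribution: rows sharing a distribution key value always land on the same worker, which is what keeps per-key joins and aggregations local. Real Citus hashes rows into shards rather than directly into workers; the worker count and sample rows are assumptions.

```python
# Illustrative simulation of hash distribution by key. Rows with the same
# distribution key value co-locate on the same worker. This is a stand-in
# for Citus's hash partitioning, not its actual algorithm.
import zlib

N_WORKERS = 2

def worker_for(key, n_workers=N_WORKERS):
    # zlib.crc32 gives a cheap, deterministic hash for the demo.
    return zlib.crc32(str(key).encode()) % n_workers

transactions = [
    {"sku": "MB-0042", "customerId": "C-1", "amount": 10.0},
    {"sku": "MB-0042", "customerId": "C-2", "amount": 12.5},
    {"sku": "MB-0099", "customerId": "C-1", "amount": 7.0},
]

# Distributing by sku: both MB-0042 rows land on the same worker, so an
# aggregation grouped by sku never crosses workers. Distributing by
# customerId would instead co-locate the two C-1 rows.
placement_by_sku = {row["sku"]: worker_for(row["sku"]) for row in transactions}
```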

Distribution Key Configuration Summary

Once the optimal distribution key has been identified, it is crucial that this configuration is applied consistently across both the Data Mart (DM) and its corresponding primary Data Source (DS).

...

Careful coordination of the distribution key settings across the DMs and their primary DSs is essential to maintain the integrity and operational reliability of the Pricefx PA solution.
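A simple guard for the coordination requirement above might look like the following. The function and names are hypothetical, shown only to make the consistency rule explicit.

```python
# Hypothetical sanity check: the distribution key configured on a DM must
# match the one on its primary DS. Names are illustrative only.

def check_distribution_keys(dm_key, primary_ds_key):
    """Raise if the DM and its primary DS disagree on the distribution key."""
    if dm_key != primary_ds_key:
        raise ValueError(
            f"Distribution key mismatch: DM uses {dm_key!r}, "
            f"primary DS uses {primary_ds_key!r}")
    return dm_key

check_distribution_keys("sku", "sku")  # consistent configuration: passes
```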

Rebuild distributed DS/DMs

It is important to note that when an existing Data Source (DS) or Data Mart (DM) is configured to be distributed, simply deploying the new configuration does not automatically convert the underlying database table structure. Instead, an additional step is required to physically rebuild the table to align with the distributed architecture.

...

This same method can be used when removing or changing the distribution key of a DS/DM.
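One common pattern for the physical rebuild described above is to create a fresh table, distribute it, copy the data across, and swap names. The sketch below generates that SQL; `create_distributed_table` is the real Citus UDF, but the table/column names and the surrounding steps are an assumed, simplified sequence (a production rebuild would also handle indexes, locks, and downtime).

```python
# Illustrative sketch of physically rebuilding a DS/DM table as distributed.
# create_distributed_table() is a real Citus UDF; table and column names
# are placeholders, and the step sequence is a simplified assumption.

def rebuild_statements(table, dist_key):
    tmp = f"{table}__rebuild"
    return [
        f"CREATE TABLE {tmp} (LIKE {table} INCLUDING ALL);",
        f"SELECT create_distributed_table('{tmp}', '{dist_key}');",
        f"INSERT INTO {tmp} SELECT * FROM {table};",
        f"DROP TABLE {table};",
        f"ALTER TABLE {tmp} RENAME TO {table};",
    ]

for stmt in rebuild_statements("transactions_dm", "sku"):
    print(stmt)
```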

Use DM Publishing

Why use this? There is a functional case and a performance-based one. For a detailed explanation of what DM publishing entails,

...

Tip

NOTE: When using Citus, there are significant performance gains from the fact that the published DM DB table is column-oriented and its data is compressed.

Scheduling DM Publishing DL(s)

Once a DM’s Publishing DL has run for the first time, any client or logic querying the DM data will see only this published data. New or modified data loaded into its DSs will not be visible until after the next Publishing DL run. This new DL therefore needs to be appropriately scheduled into the overall PA data load sequence.
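The visibility semantics above can be modeled in a few lines. This is a conceptual sketch only; the class and method names are hypothetical, not Pricefx API.

```python
# Illustrative model of DM publishing visibility: queries see only the last
# published snapshot; rows loaded from the DSs afterwards stay invisible
# until the next Publishing DL run. All names here are hypothetical.

class PublishedDM:
    def __init__(self):
        self._staged = []     # data loaded from the DSs, not yet published
        self._published = []  # snapshot visible to clients and logics

    def load_from_ds(self, rows):
        self._staged.extend(rows)

    def run_publishing_dl(self):
        # The Publishing DL replaces the visible snapshot.
        self._published = list(self._staged)

    def query(self):
        return list(self._published)

dm = PublishedDM()
dm.load_from_ds([{"sku": "A", "amount": 1}])
print(dm.query())        # [] -- nothing published yet
dm.run_publishing_dl()
dm.load_from_ds([{"sku": "B", "amount": 2}])
print(len(dm.query()))   # 1 -- new row invisible until the next DL run
```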

...