/
Key-Value Database Storage

Key-Value Database Storage

If you want to use this feature, please contact your administrator, as this feature is disabled by default and has to be configured for your cluster.

The key-value database storage is a useful solution if you have to store a very large amount of data. For example, if you have thousands of customers and products and for each combination there is potentially a different price and the requirement is to get this price via API very quickly. Pre-calculation is a way to meet this “quick API response” requirement but there is a problem with storing these pre-calculated prices. Neither PostgreSQL nor MySQL work well when it comes to billions of records. Therefore, this solution can use a key-value database as a “super charged cache” with a simple to use Groovy API and REST API.

The current implementation uses “Yugabyte DB” as backing storage/database. It is designed to be utilized for multiple clusters concurrently (by means of “keyspaces”). Generally, the design should also be usable for other similar KV databases (like CockroachDB or others).

Note: In any case this feature is only available for environments on next-gen infrastructure!

Groovy API Methods

The methods that can be used in Groovy API are as follows:

 All of these calls assume/require that the specified table exists. Tables can only be created (or dropped) via REST API.

All methods are also available in distributed calculations.

REST API Endpoints

The complete REST API for the Key Value store, including examples, is documented at https://api.pricefx.com/openapi/reference/pricefx/tag/Key-Value-Store/.

The following REST API endpoints are available:

Examples

The following examples explain the basic usage of the API endpoints.

Example 1

POST /pricefx/martin/kvservice.createtable/prices

Payload:

{ "data" : { "keys" : [ { "columnType" : "TEXT", "columnName" : "sku", "isPrimaryKey" : true }, { "columnType" : "TEXT", "columnName" : "customer", "isPrimaryKey" : true }, { "columnType" : "INT", "columnName" : "record", "isIndexed" : true } ] } }

Creates a table with four columns: sku, customer, record and payload (TEXT). The “payload” column is always added automatically and must not be present in the creation definition.

Valid column types are: UUID, TEXT, INT, TIMESTAMP, DECIMAL, DATE.

columnName is a unique name (minus some reserved names) that can contain only lower case letters or underscores.

There needs to be at least one column with isPrimaryKey = true. These keys are then used to look up records. If the Primary Key consists of multiple columns, all values need to be set for lookup. Additional non-Primary Key columns can be added for further filtering (you cannot filter on “payload”). If also fast lookup on these secondary columns is required they should be indexed (see also the section Performance Considerations below).

In case you DO NOT have indexed secondary columns you can also specify the parameter ttl in parallel to {{keys }} with a number as value. This number specifies the number of seconds the records will expire/disappear after insert. This can be useful for cache tables.

Example 2

POST /pricefx/martin/kvservice.fetchkey/prices

Payload:

{ "data" : { "keys" : { "sku" : "A1", "customer" : "B2" } } }

Retrieves a single entry.

Example 3

POST /pricefx/martin/kvservice.loaddata/prices

Payload:

{ "data" : { "header" : ["sku","customer","record","payload"], "data": [ ["A1","83-3126689",1,"{ ... JSON ... }"], ["A1","45-0245039",1,"{ ... JSON ... }"], ["A2","02-0995855",1,"{ ... JSON ... }"], ... ... ] } }

Performs a bulk upload of data. Existing Primary Key combos are overwritten automatically. This is the only command where you explicitly have to specify the automatic “payload” column. While technically you could put any text into that column, other endpoints and groovy perform JSON conversions (e.g. return a proper map) when you perform key lookups. So inserting valid JSON into payload is highly recommended!

Performance Considerations

The fastest option is a plain table with only a Primary Key or multiple columns forming a Primary Key and with no secondary columns. This results in the fastest insert speeds as well as fast lookups using Primary Key only. Use this approach wherever possible.

As soon as secondary columns with indexes are defined, the insert speed drops dramatically (though still fast enough for most use cases). However, you can then efficiently query also for other values than Primary Key. Primary Key columns cannot be part of an index. So if you need to query for a partial Primary Key, you need to duplicate the value and assign an index on one column and make the other column part of the Primary Key, or you use artificial Primary Key values. TTL is also not supported in this case.

Found an issue in documentation? Write to us.