CSVLint

This module validates CSV inputs. For the validation purposes, a validation schema is needed. The schema provides validation rules and other configuration how to validate the schema.

Schema Definition

The schema is in the JSON format and looks like this (added descriptions of elements):

{ "title": "Pricefx data schema for set 1 - cars", "name": "My Pricefx Schema 1", "description": "Lorem ipsum", "options": { "failFast": true, "rowCountOnly": false, "ignoreEmptyLines": false, "limitLines": 10, "charset": "UTF-8", "delimiter": ",", "quoteChar": "\"", "quoteMode": "NONE", "escapeChar": "~", "headers": [ "Transaction Id", "Invoice Date", "Customer Id", "Product Id", "Quantity", "List Price", "Discount", "Cost", "Currency", "UOM", "PricingDate" ] }, "fields": [ { "name": "sku", "title": "Stock-keeping unit", "description": "A stock-keeping unit (SKU) is a scannable bar code, most often seen printed on product labels in a retail store. The label allows vendors to automatically track the movement of inventory. The SKU is composed of an alphanumeric combination of eight-or-so characters.", "constraints": { "required": true, "unique": false, "type": "POSITIVE_INTEGER", "minLength": 1, "maxLength": 5, "pattern": "/d+", "minimum": 1, "maximum": 1000, "datePattern": "dd-MM-yyyy", "locale": "cs" } }, { "name": "label", "title": "Car label", "description": "Label of the car. Could be anything.", "constraints": { "required": false, "type": "STRING", "maxLength": 255 } }, { "name": "price", "title": "Price of the car", "description": "The price is important.", "constraints": { "required": true, "type": "DOUBLE", "maximum": 100000 } } ] }

Here is the JSON schema of the schema:

Element

Type

Description

Default

Example

Element

Type

Description

Default

Example

title

 

 

String

Custom text – title of the schema.

 

My schema 1

name

 

 

String

Reference name

 

MySchema1

description

 

 

String

Custom text – description of the schema.

 

This is my schema ….

options

 

 

Object

Validation/parsing options. Might be empty.

 

 

 

failFast

 

Boolean

Determines whether to stop the validation after occurrence of the first error. Setting this to true will make the result rows invalid (processing of file is stopped).

false

true

 

rowCountOnly

 

Boolean

Determines whether to only count number of rows in the input dataset. Setting this to true will not perform any validation.

false

false

 

ignoreEmptyLines

 

Boolean

Determines whether to ignore empty lines in the input dataset. By default, empty lines will not trigger a validation error. Setting this to false will throw an error if there are empty lines in the input dataset.
Note: Empty line is nulland not ,, (delimiters only).

true

true

 

limitLines

 

Long

If not set, validation will be performed on the whole dataset. If set to a number, only rows up to that number will be validated.

 

5000

 

charset

 

String

Charset of the dataset

UTF-8

UTF-8

 

delimiter

 

Char

Delimiter of the columns

,

,

 

quoteChar

 

Char

Quote character of the field

 

quoteMode

 

String

Quote mode

 

 

 

escapeChar

 

Char

Escape character

 

 

 

headers

 

Array

Definition of the headers of the dataset. You have two options:

  • Define headers and provide dataset without the header row.

  • Do not define headers and provide the header row in the dataset.

Headers must be known before the file is processed.

 

[“Transaction Id”, “Invoice Date”, “Customer Id”, “Product Id”]

fields

 

 

Array

Fields to be validated.

 

 

 

name

 

String

Name of the field – must match the header name.

 

Transaction Id

 

title

 

String

Custom text – title of the field.

 

My transaction field

 

description

 

String

Custom text – description of the field.

 

This is transaction….

 

constraints

 

Object

Validation constrains of the field. Details of each validation constraint is in the Validators section below.

 

 

 

 

required

Boolean

Determines whether the field is required or not. Validator: RequiredValidator

 

true

 

 

unique

Boolean

Determines whether the field should be unique.

 

false

 

 

type

DataTypes

Required type of the field. Possible values: [STRING, INTEGER, FLOAT, DOUBLE, URL, BOOLEAN, NON_POSITIVE_INTEGER, POSITIVE_INTEGER, NON_NEGATIVE_INTEGER, NEGATIVE_INTEGER, DATE, DATE_TIME, YEAR, YEAR_MONTH, TIME]

 

STRING

 

 

minLength

Integer

Minimal length (chars) of the field

 

10

 

 

maxLength

Integer

Maximal length (chars) of the field

 

20

 

 

pattern

String

Pattern of the field

 

\d+

 

 

minimum

Object

For Number and Date formats, the minimum value that the field must have f(x >= minimum).

 

05-11-2020

 

 

maximum

Object

For Number and Date formats, the maximum value that the field must have f(x <= maximum).

 

158

 

 

datePattern

String

For Date formats, sets the pattern that the date(time) must follow.

 

dd-MM-yyyy'T'HH:mm:ss.SSS

 

 

locale

String

For Number formats, sets the locale that the field must match.

 

de

Process

There are two phases of the validation – preValidation and validation. The file must succeed the preValidation phase without errors in order to perform the validation phase. The reason is that for the validation phase, we must be sure that the input is valid and readable CSV file.

Validators

There are two types of validators – PreValidators and FieldValidators.

PreValidators

PreValidators check that whole data structure meets the requirements. E.g. there is no empty line, the file is â correct CSV, the lines match a certain pattern.

PreValidators can be only used in the preValidation phase.

Supported validators:

Name

Parameters

Description

Name

Parameters

Description

IsCSVValidator

CSVFormat – the schema which the CSV should meet

Determines whether the input dataset is a valid CSV or not.

NotEmptyValidator

 

Checks that there is no empty line in the input dataset.

RegexValidator

String regularExpression – the pattern which the line of the input datasets must match

Checks that each line of the input matches the given regular expression.

FieldValidators

FieldValidators are used for validation of columns in the dataset. FieldValidators can be only used in the validationphase.

Supported validators:

Name

Trigger rule

Reference in constraints

Description

Name

Trigger rule

Reference in constraints

Description

MaxLengthValidator

maxLength != null

maxLength

Checks the max length of the field.

MinLengthValidator

minLength != null

minLength

Checks the min length of the field.

PatternValidator

pattern != null

pattern

Checks the pattern of the field.

RequiredValidator

required == true

required

Checks that the field is empty (i.e. null or empty string).

BooleanValidator

dataType == BOOLEAN

dataType

Checks that the type is Boolean.

DateValidator

dataType in (DATE, DATE_TIME, YEAR, YEAR_MONTH, TIME)

dataType

Checks that the type is Date. Uses minimum and maximum properties to determine the min and max date. Also uses the datePattern property to match the pattern.

DoubleValidator

dataType == DOUBLE

dataType

Checks that type is Double. Uses minimum and maximum properties to determine the min and max values.

FloatValidator

dataType == FLOAT

dataType

Checks that type is Float. Uses minimum and maximum properties to determine the min and max values.

IntegerValidator

dataType in (INTEGER, POSITIVE_INTEGER, NON_POSITIVE_INTEGER, NEGATIVE_INTEGER, NON_NEGATIVE_INTEGER)

dataType

Checks that type is Integer. Can further check the positive/negative. Uses minimum and maximum properties to determine the min and max values.

URLValidator

dataType == URL

dataType

Checks that type is URL.

Examples

Example with Route

<route id="my-route"> <from uri="file:..."/> <setHeader name="PfxValidatorSchema"> <constant>{"name":"mySchemaName", "...here goes the json schema.."}</constant> </setHeader> <... there are data in body as file ...> <to uri="pfx-validator:csv?validationSchemaName=mySchemaName"/> <to uri="pfx-datamart:import"/> </route>

Other Examples

The following examples show different aspects of validators.

Each example uses the same code to validate:

def schemaAsJson = getClass().getClassLoader().getResourceAsStream("schema.json").text when: def result = CSVValidator.preValidate(getClass().getClassLoader().getResourceAsStream("data.csv"), schemaAsJson, [new NotEmptyValidator(), new IsCSVValidator(CSVFormat.DEFAULT)]) if (result.isValid()) { result = CSVValidator.validate(getClass().getClassLoader().getResourceAsStream("data.csv"), schemaAsJson) }

Browse the examples:

IntegrationManager version 5.8.0