CSVLint
This module validates CSV inputs. For the validation purposes, a validation schema is needed. The schema provides validation rules and other configuration how to validate the schema.
Schema Definition
The schema is in the JSON format and looks like this (added descriptions of elements):
{
"title": "Pricefx data schema for set 1 - cars",
"name": "My Pricefx Schema 1",
"description": "Lorem ipsum",
"options": {
"failFast": true,
"rowCountOnly": false,
"ignoreEmptyLines": false,
"limitLines": 10,
"charset": "UTF-8",
"delimiter": ",",
"quoteChar": "\"",
"quoteMode": "NONE",
"escapeChar": "~",
"headers": [
"Transaction Id",
"Invoice Date",
"Customer Id",
"Product Id",
"Quantity",
"List Price",
"Discount",
"Cost",
"Currency",
"UOM",
"PricingDate"
]
},
"fields": [
{
"name": "sku",
"title": "Stock-keeping unit",
"description": "A stock-keeping unit (SKU) is a scannable bar code, most often seen printed on product labels in a retail store. The label allows vendors to automatically track the movement of inventory. The SKU is composed of an alphanumeric combination of eight-or-so characters.",
"constraints": {
"required": true,
"unique": false,
"type": "POSITIVE_INTEGER",
"minLength": 1,
"maxLength": 5,
"pattern": "/d+",
"minimum": 1,
"maximum": 1000,
"datePattern": "dd-MM-yyyy",
"locale": "cs"
}
},
{
"name": "label",
"title": "Car label",
"description": "Label of the car. Could be anything.",
"constraints": {
"required": false,
"type": "STRING",
"maxLength": 255
}
},
{
"name": "price",
"title": "Price of the car",
"description": "The price is important.",
"constraints": {
"required": true,
"type": "DOUBLE",
"maximum": 100000
}
}
]
}
Here is the JSON schema of the schema:
Element | Type | Description | Default | Example | ||
---|---|---|---|---|---|---|
title |
|
| String | Custom text – title of the schema. |
| My schema 1 |
name |
|
| String | Reference name |
| MySchema1 |
description |
|
| String | Custom text – description of the schema. |
| This is my schema …. |
options |
|
| Object | Validation/parsing options. Might be empty. |
|
|
| failFast |
| Boolean | Determines whether to stop the validation after occurrence of the first error. Setting this to | false | true |
| rowCountOnly |
| Boolean | Determines whether to only count number of rows in the input dataset. Setting this to | false | false |
| ignoreEmptyLines |
| Boolean | Determines whether to ignore empty lines in the input dataset. By default, empty lines will not trigger a validation error. Setting this to | true | true |
| limitLines |
| Long | If not set, validation will be performed on the whole dataset. If set to a number, only rows up to that number will be validated. |
| 5000 |
| charset |
| String | Charset of the dataset | UTF-8 | UTF-8 |
| delimiter |
| Char | Delimiter of the columns | , | , |
| quoteChar |
| Char | Quote character of the field | “ | “ |
| quoteMode |
| String | Quote mode |
|
|
| escapeChar |
| Char | Escape character |
|
|
| headers |
| Array | Definition of the headers of the dataset. You have two options:
Headers must be known before the file is processed. |
| [“Transaction Id”, “Invoice Date”, “Customer Id”, “Product Id”] |
fields |
|
| Array | Fields to be validated. |
|
|
| name |
| String | Name of the field – must match the header name. |
| Transaction Id |
| title |
| String | Custom text – title of the field. |
| My transaction field |
| description |
| String | Custom text – description of the field. |
| This is transaction…. |
| constraints |
| Object | Validation constrains of the field. Details of each validation constraint is in the Validators section below. |
|
|
|
| required | Boolean | Determines whether the field is required or not. Validator: RequiredValidator |
| true |
|
| unique | Boolean | Determines whether the field should be unique. |
| false |
|
| type | DataTypes | Required type of the field. Possible values: [STRING, INTEGER, FLOAT, DOUBLE, URL, BOOLEAN, NON_POSITIVE_INTEGER, POSITIVE_INTEGER, NON_NEGATIVE_INTEGER, NEGATIVE_INTEGER, DATE, DATE_TIME, YEAR, YEAR_MONTH, TIME] |
| STRING |
|
| minLength | Integer | Minimal length (chars) of the field |
| 10 |
|
| maxLength | Integer | Maximal length (chars) of the field |
| 20 |
|
| pattern | String | Pattern of the field |
| \d+ |
|
| minimum | Object | For Number and Date formats, the minimum value that the field must have |
| 05-11-2020 |
|
| maximum | Object | For Number and Date formats, the maximum value that the field must have |
| 158 |
|
| datePattern | String | For Date formats, sets the pattern that the date(time) must follow. |
| dd-MM-yyyy'T'HH:mm:ss.SSS |
|
| locale | String | For Number formats, sets the locale that the field must match. |
| de |
Process
There are two phases of the validation – preValidation
and validation
. The file must succeed the preValidation phase without errors in order to perform the validation phase. The reason is that for the validation phase, we must be sure that the input is valid and readable CSV file.
Validators
There are two types of validators – PreValidators and FieldValidators.
PreValidators
PreValidators check that whole data structure meets the requirements. E.g. there is no empty line, the file is â correct CSV, the lines match a certain pattern.
PreValidators can be only used in the preValidation
phase.
Supported validators:
Name | Parameters | Description |
---|---|---|
IsCSVValidator | CSVFormat – the schema which the CSV should meet | Determines whether the input dataset is a valid CSV or not. |
NotEmptyValidator |
| Checks that there is no empty line in the input dataset. |
RegexValidator | String regularExpression – the pattern which the line of the input datasets must match | Checks that each line of the input matches the given regular expression. |
FieldValidators
FieldValidators are used for validation of columns in the dataset. FieldValidators can be only used in the validation
phase.
Supported validators:
Name | Trigger rule | Reference in constraints | Description |
---|---|---|---|
MaxLengthValidator | maxLength != null | maxLength | Checks the max length of the field. |
MinLengthValidator | minLength != null | minLength | Checks the min length of the field. |
PatternValidator | pattern != null | pattern | Checks the pattern of the field. |
RequiredValidator | required == true | required | Checks that the field is empty (i.e. |
BooleanValidator | dataType == BOOLEAN | dataType | Checks that the type is Boolean. |
DateValidator | dataType in (DATE, DATE_TIME, YEAR, YEAR_MONTH, TIME) | dataType | Checks that the type is Date. Uses |
DoubleValidator | dataType == DOUBLE | dataType | Checks that type is Double. Uses |
FloatValidator | dataType == FLOAT | dataType | Checks that type is Float. Uses |
IntegerValidator | dataType in (INTEGER, POSITIVE_INTEGER, NON_POSITIVE_INTEGER, NEGATIVE_INTEGER, NON_NEGATIVE_INTEGER) | dataType | Checks that type is Integer. Can further check the positive/negative. Uses |
URLValidator | dataType == URL | dataType | Checks that type is URL. |
Examples
Example with Route
<route id="my-route">
<from uri="file:..."/>
<setHeader name="PfxValidatorSchema">
<constant>{"name":"mySchemaName", "...here goes the json schema.."}</constant>
</setHeader>
<... there are data in body as file ...>
<to uri="pfx-validator:csv?validationSchemaName=mySchemaName"/>
<to uri="pfx-datamart:import"/>
</route>
Other Examples
The following examples show different aspects of validators.
Each example uses the same code to validate:
def schemaAsJson = getClass().getClassLoader().getResourceAsStream("schema.json").text
when:
def result = CSVValidator.preValidate(getClass().getClassLoader().getResourceAsStream("data.csv"), schemaAsJson, [new NotEmptyValidator(), new IsCSVValidator(CSVFormat.DEFAULT)])
if (result.isValid()) {
result = CSVValidator.validate(getClass().getClassLoader().getResourceAsStream("data.csv"), schemaAsJson)
}
Browse the examples:
IntegrationManager version 5.8.0