Checks


Checks are test conditions that users define to validate one or more attributes in a source or target dataset. They are primarily used to ensure data correctness and to compare two values. Checks can validate formats, nulls, transformations, calculations, comparisons, and various other conditions. Within the tool, checks are available in every rule template except Script and Pushdown rules.

Use Cases

  • Reconciliation: Compare any two attributes from different datasets.
  • Validation: Apply standard validations like NULL checks, pattern matching, and duplicate detection, or implement complex business logic through custom expressions.

Behavior

A check functions as a stateless operation, meaning it evaluates each row independently with no influence from the preceding or succeeding rows. Behavior varies based on the rule template:

  • Checksum Rule: Evaluates only one row.
  • Validation Rule: Evaluates all rows in the dataset.
  • Reconciliation Rule: Evaluates only the rows that are common to both the source and target datasets.
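
The difference in row scope can be sketched in Java. This is only an illustration under stated assumptions: the class `RowScopeSketch`, the methods `validateAll` and `reconcile`, and the idea of matching source and target rows on a shared key are all hypothetical and do not describe the tool's internals.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from the tool itself.
class RowScopeSketch {

    // Validation-style scope: the check runs once per row and sees only that row,
    // so the result for one row never depends on its neighbours.
    static List<Boolean> validateAll(List<String> values, Predicate<String> check) {
        return values.stream().map(check::test).collect(Collectors.toList());
    }

    // Reconciliation-style scope: only keys present in BOTH datasets are evaluated.
    // Matching rows on a key is an assumption made for this sketch.
    static Map<String, Boolean> reconcile(Map<String, String> sourceByKey,
                                          Map<String, String> targetByKey,
                                          BiPredicate<String, String> check) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, String> source : sourceByKey.entrySet()) {
            String key = source.getKey();
            if (targetByKey.containsKey(key)) {
                results.put(key, check.test(source.getValue(), targetByKey.get(key)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Every value is evaluated independently.
        System.out.println(validateAll(List.of("a", ""), v -> !v.isEmpty()));

        // Keys "1" and "3" exist on one side only, so they are not evaluated.
        Map<String, String> source = Map.of("1", "x", "2", "y");
        Map<String, String> target = Map.of("2", "y", "3", "z");
        System.out.println(reconcile(source, target, String::equals));
    }
}
```

Because each check is a pure function of the row it receives, rows can be evaluated in any order without changing the result.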

Output

Each check evaluates to one of three states:

  • Success - The test condition evaluated as true.
  • Failure - The test condition evaluated as false.
  • Error - The test condition could not be evaluated due to incorrect input, a semantic error, or a syntactic error.
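
The three states can be modeled as a small enum. The sketch below is a minimal illustration, not the tool's implementation: the names `CheckOutcomeSketch`, `Outcome`, and `evaluate` are hypothetical, and treating any runtime exception as Error is an assumption.

```java
import java.util.function.Predicate;

// Hypothetical sketch: the names and the exception handling are assumptions.
class CheckOutcomeSketch {

    enum Outcome { SUCCESS, FAILURE, ERROR }

    // Maps a boolean test condition onto the three states.
    static <T> Outcome evaluate(Predicate<T> condition, T row) {
        try {
            return condition.test(row) ? Outcome.SUCCESS : Outcome.FAILURE;
        } catch (RuntimeException e) {
            // The condition could not be evaluated, for example because of bad input.
            return Outcome.ERROR;
        }
    }

    public static void main(String[] args) {
        Predicate<String> isPositive = s -> Integer.parseInt(s) > 0;
        System.out.println(evaluate(isPositive, "5"));   // condition is true
        System.out.println(evaluate(isPositive, "-1"));  // condition is false
        System.out.println(evaluate(isPositive, "abc")); // parsing throws
    }
}
```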

Categories

Checks are broadly categorized as follows:

1. Out of the box

These checks are built into the tool and can be configured directly without writing any expressions.

  • Completeness: Validates for NULLs, spaces, or empty values.
  • Length: Checks the length of each attribute value.
  • Pattern: Matches values against a defined regular expression.
  • Contains: Verifies that the attribute contains only values from a specified list.
  • Datatype: Checks if the string value can be cast to a specific data type.
  • Range: Ensures values fall within a specified range.
  • Date: Validates string values against selected date formats.
  • Duplicate: Detects duplicates based on a combination of one or more attributes.
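
To make the semantics of these checks concrete, the sketch below expresses each one as a plain Java method. It is an approximation under assumptions: the class `BuiltInCheckSketch` and every method name are hypothetical, the tool configures these checks without any code, and its exact rules (for example, how NULLs or partial regex matches are treated) may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: approximates the built-in checks; not the tool's code.
class BuiltInCheckSketch {

    // Completeness: the value is not null, empty, or whitespace only.
    static boolean isComplete(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // Length: the value's length lies within [min, max].
    static boolean hasLengthBetween(String value, int min, int max) {
        return value != null && value.length() >= min && value.length() <= max;
    }

    // Pattern: the whole value matches the regular expression.
    static boolean matchesPattern(String value, String regex) {
        return value != null && Pattern.matches(regex, value);
    }

    // Contains: the value is one of the allowed values.
    static boolean isOneOf(String value, Set<String> allowed) {
        return value != null && allowed.contains(value);
    }

    // Datatype: the string can be cast to an integer.
    static boolean isInteger(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Range: the value lies within [min, max].
    static boolean isInRange(double value, double min, double max) {
        return value >= min && value <= max;
    }

    // Date: the string parses with the given date format.
    static boolean isDate(String value, String format) {
        if (value == null) {
            return false;
        }
        try {
            LocalDate.parse(value, DateTimeFormatter.ofPattern(format));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Duplicate: returns the keys that occur more than once. A composite key
    // could be built by joining several attribute values into one string.
    static Set<String> findDuplicates(List<String> keys) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (String key : keys) {
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(isComplete("   "));
        System.out.println(matchesPattern("AB-123", "[A-Z]{2}-\\d{3}"));
        System.out.println(isDate("2024-01-15", "yyyy-MM-dd"));
        System.out.println(findDuplicates(List.of("a", "b", "a")));
    }
}
```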

How To: Add Out-of-the-Box Checks

This video shows how to add different types of out-of-the-box checks to a validation rule.


2. Custom

Custom checks are used to evaluate test conditions that cannot be addressed using out-of-the-box checks. They provide flexibility for users to define more complex expressions that meet specific data validation requirements.
Example: A user can convert a source string into an MD5 hash and compare it with a target hashed value.

To create a custom check:

  • Users must write a Groovy or Java expression that returns a boolean value.
  • The expression can be simple or highly complex.
  • Once defined, the expression can be saved and tested as a custom check.
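
The MD5 example mentioned above could look roughly like the following in Java. This is a sketch under assumptions: the class `Md5CheckSketch` and its methods are hypothetical, the target hash is assumed to be a hexadecimal string of the UTF-8 encoded source value, and how the tool exposes source and target attributes to a custom expression is not shown here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a custom check; names and encoding are assumptions.
class Md5CheckSketch {

    // Returns the MD5 digest of the input as a 32-character lowercase hex string.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so leading zero bytes are preserved.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // The boolean a custom check must return: true when the hashed source
    // value equals the target's stored hash, ignoring hex letter case.
    static boolean matchesTargetHash(String sourceValue, String targetHash) {
        return md5Hex(sourceValue).equalsIgnoreCase(targetHash);
    }

    public static void main(String[] args) {
        System.out.println(matchesTargetHash("hello", "5d41402abc4b2a76b9719d911017c592"));
    }
}
```

Comparing case-insensitively is a design choice for this sketch, since hex digests are often stored in either upper or lower case.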

3. Other

These checks are non-functional and are used solely to provide additional context in the exception report:

  • Source: Includes attributes from the source dataset as-is.
    Example: Adding customerID to identify the customer associated with faulty data.
  • Target: Includes attributes from the target dataset as-is.
    Example: Adding a unique identifier to help locate the affected data record.
  • Calculated: Adds derived values that enhance exception report clarity.
    Example: Concatenating first and last names to display the full name of the individual.

Data Quality Dimensions

Each check can be classified under one of the following data quality dimensions:

  • Completeness: Ensures that all required data is present and nothing is missing.
  • Accuracy: Verifies that data correctly reflects the real-world values it represents.
  • Uniqueness: Confirms that each record or data element appears only once where appropriate.
  • Validity: Checks that data conforms to defined formats, rules, or standards.
  • Timeliness: Assesses whether data is up to date and available when needed.
  • Consistency: Ensures data is uniform across different systems or datasets.

Important

Checks are not applicable to Pushdown or Script rules.