Validation
A validation template enables users to test data formats, lengths, value ranges, pattern matching, and complex business logic within a given dataset. It can be applied at multiple points within the data pipeline, including incoming feed files, staging tables, dimension tables, APIs, and other data sources. It allows users to thoroughly validate a single data source and can be applied to the following use cases:
Use Cases
- Data Ingestion Validation: Validation templates can be leveraged at the point of data ingestion into an ETL process to ensure the incoming data adheres to predefined formats and business rules. This prevents dirty data from propagating through the pipeline.
- Data Warehouse Validation: These templates can be applied to data residing in the data warehouse to validate data completeness, enforce referential integrity, check for duplicates, validate conformance to standard patterns (e.g., email, phone numbers), and ensure historical consistency.
- Data Quality Monitoring: Validation templates play a key role in ongoing data quality validation, allowing automated checks across all the dimensions of data quality. This helps in continuously monitoring the health of the data assets and identifying quality issues proactively.
Components
There are five (5) different components in a Validation Rule:
- Overview: To provide rule-related metadata.
- Source Dataset: To provide source connection details required to fetch the dataset.
- Checks: To provide the conditions for testing the datasets.
- Exception Report: To provide the conditions for viewing and evaluating data mismatches.
- Summary: To provide variables that are attached to the rule execution instance event.
Overview
This component allows users to define rules and configure advanced options for parameterization. Below are the details for the fields included in the rule definition:
Field | Description | Purpose |
---|---|---|
Rule Name (Mandatory) | The unique name of the rule within a folder. It should be descriptive enough to help users quickly identify and understand the purpose of the rule. | Serves as the primary identifier for the rule. Ensures clarity and uniqueness within the folder structure. |
Rule Code (Optional) | An alphanumeric shorthand or technical identifier for the rule. This field is optional but highly useful for efficient management and organization of rules. | Helps users search, filter, or differentiate between similar rules. Provides a compact reference for integrations, automation, or workflows. |
Description (Optional) | A concise explanation of the rule. This field offers additional technical or business context to enhance understanding. | Clarifies the rule's intent or scope for users. Assists in onboarding new team members or providing documentation for audits. |
Purpose (Optional) | A short summary describing the purpose or intended outcome of the rule. This field provides context for why the rule was created. | Highlights the business or technical objective of the rule. Helps users understand its significance and expected impact. |
Criticality (Mandatory) | The Criticality field indicates the severity level of an issue. This is a mandatory field that helps categorize and prioritize errors based on their impact. The field provides the following options: Warning (Default): The issue should be reviewed but does not require immediate action. Critical: The issue needs to be resolved promptly; however, testing can continue despite its presence. Blocker: The issue must be resolved before testing can proceed, as it prevents further progress. | Ensures a clear understanding of the issue's severity, enabling teams to prioritize effectively and maintain workflow efficiency. |
Advanced Settings
Field | Description | Purpose |
---|---|---|
Parameter (Optional) | This field specifies the default parameter file associated with the rule. | Enables parameterization of the rule by referencing keys from the parameter file in SQL, checks, and other components. |
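To illustrate the idea behind parameterization, the sketch below reads a simple key=value parameter file and uses one of its keys as a check threshold. The file name, keys, and the way the value is used here are hypothetical; iceDQ resolves parameter keys through its own substitution mechanism, not this code.

```groovy
// Conceptual illustration only: the file name, keys, and usage are hypothetical
// and do not show iceDQ's actual parameter substitution syntax.
def params = new Properties()
def paramFile = new File('validation_params.properties')   // e.g. MAX_CUSTOMER_AGE=120
if (paramFile.exists()) {
    paramFile.withInputStream { stream -> params.load(stream) }
}

int maxAge = (params.getProperty('MAX_CUSTOMER_AGE') ?: '120') as int

// A check can then compare against the parameter value instead of a hard-coded
// literal, so the threshold can change without editing the rule itself.
def ageWithinRange = { age -> age >= 0 && age <= maxAge }
assert ageWithinRange(34)
```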
Source Dataset
This section allows users to configure the source settings to extract data for processing. The following fields and options are available:
- Connection Type: This field allows users to select the type of data source (application, database, file, etc.) they want to connect to. Selecting the appropriate connection type ensures that only relevant connections are displayed for further selection.
- Connection: This field displays the list of connections available for the selected connection type. Once a connection is selected, the user can:
  - Test the connection using the ✔️ Checkmark Icon.
  - Edit the connection details using the ✏️ Edit Icon.
  - Refresh the list of connections using the 🔄 Refresh Icon.
- Methods for reading data from the data source:
  - Default: This option allows users to browse and select the required Schema and Table/View from the chosen connection. It is ideal for users who prefer navigating schemas and tables visually without writing SQL.
  - SQL: This option allows users to manually input custom SQL queries to retrieve specific data from the selected connection. It provides flexibility for advanced users to query data directly, offering precise control over the data extraction process (a brief sketch of this approach follows the list).
- Preview Data: Clicking the Preview Data button allows users to view a sample of the data retrieved based on the selected table (Default) or SQL query.
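The snippet below is a minimal, standalone sketch of what the SQL read method amounts to conceptually: run a user-supplied query against the selected connection and preview a few rows. The JDBC URL, credentials, and table names are assumptions made for illustration; inside iceDQ the engine manages the connection and query execution itself.

```groovy
// Standalone sketch only: the connection details and table are hypothetical;
// within iceDQ the engine handles the connection and query execution for you.
@GrabConfig(systemClassLoader = true)
@Grab('org.postgresql:postgresql:42.7.3')
import groovy.sql.Sql

def sql = Sql.newInstance(
        'jdbc:postgresql://localhost:5432/sales',   // hypothetical connection
        'reader', 'secret', 'org.postgresql.Driver')

def query = '''
    SELECT customer_id, email, phone, created_dt
    FROM   staging.customer
    WHERE  created_dt >= CURRENT_DATE - 7
'''

// Roughly what Preview Data does: show a small sample of the query result.
sql.rows(query).take(10).each { row -> println row }
sql.close()
```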
Checks
Checks are used to validate formats, patterns, ranges, or any other test conditions involving attributes from the source dataset. Refer to the list of Out Of Box Checks and the steps to add a Custom Check.
Currently, all rules support the Groovy expression language for writing test conditions in a Custom Check. You can use any Groovy, Java, or user-defined functions when writing these expressions.
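Below is a minimal, standalone sketch of the kinds of Groovy test conditions a Custom Check can hold. The row map and the column names (EMAIL, AGE, PHONE) are assumptions made for illustration; in the product, the engine supplies the column values from the source dataset to the expression for every row.

```groovy
// Illustrative only: the row map stands in for a source record and the column
// names are hypothetical. Each closure is a test condition that evaluates to
// true (pass) or false (data issue).
def row = [EMAIL: 'jane.doe@example.com', AGE: 34, PHONE: '(555) 123-4567']

def checks = [
    'email format': { r -> r.EMAIL ==~ /^[\w.+-]+@[\w-]+\.[\w.-]+$/ },    // pattern check
    'age in range': { r -> r.AGE in 0..120 },                             // range check
    'phone length': { r -> r.PHONE.replaceAll(/\D/, '').length() == 10 }  // length check
]

checks.each { name, check ->
    println "${name}: ${check(row) ? 'pass' : 'fail'}"
}
```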
Exception Report
This component configures how the exception report is displayed, stored, and exported. Below are the different properties available in this component.
Property | Description | Note |
---|---|---|
Export File Format | Specifies the default export format. Options include Excel or CSV. | |
Store Failure & Errors Only | When enabled, only failed and error records are stored in the report. | This property is enabled by default. |
Store Exported File Only | Deletes the raw exception report and retains only the exported file. | |
Show Download Exception URL | Displays a download link for the exception report. | |
Store Only # Rows | Limits the number of records stored in the report to the specified count. | |
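As a rough illustration of what these properties control, the sketch below keeps only failed records, caps them at a row limit, and writes them to a CSV file. It is a conceptual stand-in, not the product's implementation, and the column names and data are made up.

```groovy
// Conceptual sketch only: iceDQ builds and exports the exception report itself;
// this just mirrors the effect of "Store Failure & Errors Only",
// "Store Only # Rows", and a CSV export format. Data is hypothetical.
def results = [
    [ID: 1, EMAIL: 'a@example.com', passed: true ],
    [ID: 2, EMAIL: 'not-an-email',  passed: false],
    [ID: 3, EMAIL: '',              passed: false],
]

int storeOnlyRows = 100                                              // "Store Only # Rows"
def failures = results.findAll { !it.passed }.take(storeOnlyRows)    // "Store Failure & Errors Only"

def csv = new File('exception_report.csv')                           // "Export File Format: CSV"
csv.text = 'ID,EMAIL\n' + failures.collect { "${it.ID},${it.EMAIL}" }.join('\n')
println "Exported ${failures.size()} failed records to ${csv.name}"
```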
Summary
Users can configure a result override as well as attach custom variables to the Rule execution event.
Exit Code
The exit code of a Rule execution indicates whether the Rule succeeded, failed, or encountered an error. It is derived from the count of rows for which one or more checks failed (evaluated to false).
Exit Code | Status | Note |
---|---|---|
== 0 | Success | The rule executed successfully and all records passed the checks. |
> 0 | Failure (Warning/Critical/Blocker) | The rule executed but failed for some or all records. |
< 0 | Error | The rule itself encountered an error during execution. Every negative Exit Code has its own significance. In case of an error, review the rule, the input records, or the product health to identify the issue, or contact iceDQ Support. |
The Exit Code is the total number of rows for which one or more checks evaluated to false or identified a data issue.
Scenarios
The following scenarios will help you understand how the Exit Code of a Validation Rule is calculated.
Scenario #1
In this example, the Exit Code will be 2 because two rows have data issues, each failing a different check.
Scenario #2
In this example, the Exit Code will be 1 because only one row has data issues, even though it fails two different checks.
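The sketch below mirrors Scenario #1 with made-up data and checks: a row counts once toward the Exit Code no matter how many checks it fails, so two problem rows give an Exit Code of 2, while a single row failing two checks (Scenario #2) would still count as 1.

```groovy
// Standalone illustration of the counting logic; data and checks are made up.
def rows = [
    [ID: 1, EMAIL: 'a@example.com', AGE: 30 ],   // passes both checks
    [ID: 2, EMAIL: 'not-an-email',  AGE: 45 ],   // fails the email check
    [ID: 3, EMAIL: 'c@example.com', AGE: 180],   // fails the age check
]

def checks = [
    { r -> r.EMAIL ==~ /^[\w.+-]+@[\w-]+\.[\w.-]+$/ },   // format check
    { r -> r.AGE in 0..120 },                            // range check
]

// A row counts once toward the Exit Code regardless of how many checks it fails.
def exitCode = rows.count { r -> checks.any { check -> !check(r) } }
println "Exit Code: ${exitCode}"   // prints "Exit Code: 2"
```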
How To: Create a Validation Rule
This video shows you how to create a validation rule against customer data.
Understanding the Engine Behaviour
Below is the behaviour of the engine when executing a Validation Rule.
1. Connect to the source and read the data to be validated into memory in micro-batches (e.g., 10k rows per batch).
2. Evaluate each check (test condition) created in the Rule.
3. Calculate the number of rows processed and issues identified.
4. Write the data issues to the exception report.
5. Repeat steps 2-4 until all the data has been evaluated.
6. Complete the execution and submit the results.
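A simplified, standalone sketch of the flow above is shown here. It is not the actual engine: the dataset, batch size, and check are placeholders, and the real engine streams the data from the configured source connection.

```groovy
// Simplified stand-in for the engine loop described above; not product code.
def source = (1..10).collect { [ID: it, AMOUNT: it * 10] }   // placeholder dataset
def checks = [{ r -> r.AMOUNT <= 70 }]                       // one test condition
int batchSize = 3                                            // real engine uses e.g. 10k rows

def exceptions = []
int rowsProcessed = 0
int issues = 0

source.collate(batchSize).each { batch ->                    // 1. read a micro-batch
    batch.each { row ->
        rowsProcessed++
        if (checks.any { check -> !check(row) }) {           // 2. evaluate each check
            issues++                                         // 3. count issues identified
            exceptions << row                                // 4. add to the exception report
        }
    }
}                                                            // 5. repeat until all data is read

// 6. complete the execution and submit the results
println "Rows processed: ${rowsProcessed}, issues: ${issues}, Exit Code: ${issues}"
```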
Considerations
- Always use column aliases when writing complex SQL, and avoid spaces and special characters in the alias names.
- Add a column as a Source Check if it is not part of any expression but you would like to see it in the report.
- Only the underscore (_) and hyphen (-) special characters are allowed in the Rule Name.
- Always use the Stop at Error and Store Failure/Error properties when validating high-volume data.
- Update the column names in the Checks if they change in the source dataset.