Rules
In iceDQ, a Rule is the smallest unit of work that can be executed. It can be used to identify data issues by validating a single dataset or comparing multiple datasets. Rules are written in a combination of SQL and expressions (Groovy/Java), and the in-memory engine uses them to identify and report any data issues.
There are five types of out-of-box Rule templates available in iceDQ: Checksum, Validation, Pushdown, Recon, and Script.
Checksum Template
It is the most basic type of Rule template that allows a user to compare numeric (aggregated) values between any two data sources. The most common use case for this is Row Count Comparison. Below is the list of few more use cases.
- Compare counts between a file and table.
- Ensure the reference table has specific number of records.
- Identity the sudden spike or dip in the rows processed daily.
Validation Template
This Rule template helps identify data issues in a single dataset with multiple columns and rows of any data source. The following is a list of use cases, but it is not exhaustive.
- Identify NULL values
- Ensure value conforms to specified Datatype
- Identify any value that does not match the specified Pattern
Pushdown Template
As the name suggests this template allows users to perform push-down validations for identifying data issues in any data source.
- Identify Duplicate data in a table
- Ensure only 1 record is active at any given time (Type II Dimension Test)
- Find any Referential integrity data issues
Recon Template
This template can be used by users to identify data issues across any two data sources, regardless of the volume or type of data. It does this by comparing the data row by row and column by column.
- Identify all the records that did not load in to the target as data issues.
- Ensure the schema between UAT and PROD environment is same.
- Identify any values between source and target that are not transforming correctly for specified attributes.
Script Template
Script rule enables users write and execute any kind of groovy script.
- Execute DDL/ DML statements to generate test data.
- Create a properly formatted delimited file for incoming feed files.
- Pick up values from the database to be passed as parameters to other rules.