Skip to main content

Checksum


A checksum template allows users to define rules for comparing numeric values across two data sources. It evaluates whether the values are identical, differ, or fall within a specified threshold, based on the test conditions configured by the user.

Use Cases

  • Verifying that the row count between a file and a table is identical.
  • Ensuring that the sum of amounts across different databases falls within a defined threshold.
  • Comparing the record count of a reference table against a static number to confirm consistency over time.

Components

There are five (5) different components in a Checksum Rule:

  • Overview: To provide rule related metadata.
  • Source Dataset: To provide source connection details required to fetch the dataset.
  • Target Dataset: To provide target connection details required to fetch the dataset.
  • Checks: To provide the conditions for testing the datasets .
  • Summary: To provide variables that are attached to rule execution instance event.

Checksum Rule Component

Following section describes the various fields of each component in detail.

Overview

This component allows users to define rules and configure advanced options for parameterization. Below are the details for the fields included in the rule definition:

FieldDescriptionPurpose
Rule Name (Mandatory)The unique name of the rule within a folder. It should be descriptive enough to help users quickly identify and understand the purpose of the rule.Serves as the primary identifier for the rule. Ensures clarity and uniqueness within the folder structure.
Rule Code (Optional)An alphanumeric shorthand or technical identifier for the rule. This field is optional but highly useful for efficient management and organization of rules.Helps users search, filter, or differentiate between similar rules. Provides a compact reference for integrations, automation, or workflows.
Description (Optional)A concise explanation of the rule. This field offers additional technical or business context to enhance understanding.Clarifies the rule's intent or scope for users. Assists in onboarding new team members or providing documentation for audits.
Purpose (Optional)A short summary describing the purpose or intended outcome of the rule. This field provides context for why the rule was created.Highlights the business or technical objective of the rule. Helps users understand its significance and expected impact.
Criticality (Mandatory)The Criticality field indicates the severity level of an issue. This is a mandatory field that helps categorize and prioritize errors based on their impact. The field provides the following options:

Warning (Default): The issue should be reviewed but does not require immediate action.
Critical: The issue needs to be resolved promptly; however, testing can continue despite its presence.
Blocker: The issue must be resolved before testing can proceed, as it prevents further progress.
Ensures a clear understanding of the issue's severity, enabling teams to prioritize effectively and maintain workflow efficiency.

Advance Settings

FieldDescriptionPurpose
Parameter (Optional)This field specifies the default parameter file associated with the rule.Enables parameterization of the rule by referencing keys in the parameter file in SQL, checks, etc.

Source Dataset

This section allows users to configure the source settings to extract data for processing. The following fields and options are available:

  1. Connection Type: This field allows users to select the type of data source (application, database, file ...) they want to connect to. Selecting the appropriate connection type ensures that only relevant connections are displayed for further selection.
  2. Connection: This field displays a list of connections available for the selected connection type. Once selected a user can:
    • Test the connection using the ✔️ Checkmark Icon.
    • Edit the connection details using the ✏️ Edit Icon.
    • Refresh the list of connections using the 🔄 Refresh Icon.
  3. Methods for reading data from the data source
    • Default: This option allows users to browse and select the required Schema and Table/View from the chosen connection. It is Ideal for users who prefer navigating schemas and tables visually without writing SQL.
    • SQL: This option allows users to manually input custom SQL queries to retrieve specific data from the selected connection.It provides flexibility for advanced users to query data directly, offering precise control over the data extraction process.
  4. Preview Data: Clicking the Preview Data button allows users to view a sample of the data retrieved based on the selected table (Default) or SQL query.

Target Dataset

This section allows users to configure the target settings to extract data for processing. The following fields and options are available:

  1. Connection Type: This field allows users to select the type of data source (application, database, file ...) they want to connect to. Selecting the appropriate connection type ensures that only relevant connections are displayed for further selection.
  2. Connection: This field displays a list of connections available for the selected connection type. Once selected a user can:
    • Test the connection using the ✔️ Checkmark Icon.
    • Edit the connection details using the ✏️ Edit Icon.
    • Refresh the list of connections using the 🔄 Refresh Icon.
  3. Methods for reading data from the data source
    • Default: This option allows users to browse and select the required Schema and Table/View from the chosen connection. It is Ideal for users who prefer navigating schemas and tables visually without writing SQL.
    • SQL: This option allows users to manually input custom SQL queries to retrieve specific data from the selected connection.It provides flexibility for advanced users to query data directly, offering precise control over the data extraction process.
  4. Preview Data: Clicking the Preview Data button allows users to view a sample of the data retrieved based on the selected table (Default) or SQL query.

Checks

If the user has selected a schema and table but has not provided their own SQL, the checksum rule will automatically create the following default check:

S.[source_count] - T.[target_count] == 0

This default condition calculates the difference between the source and target counts and checks if the result equals zero.

  • If true: The rule passes, indicating success.
  • If false: The rule fails, signaling a mismatch.

Customizing the Check: Users can edit and modify the default check to include any test condition based on their specific requirements using the two variables:

  • S.[column_name]: Refers to the source colum name.
  • T.[column_name]: Refers to the target column name.

Examples

ExampleDescriptionCode
Tolerance for Row CountAllowing a small tolerance of 5 rows between source and target count.
S.[source_count] - T.[target_count] <= 5
Percentage DifferenceAllowing a difference within 1% of the claim amount.
S.[claim_amount] - T.[claim_amount] <= (S.[claim_amount] * 0.01)
important
  • The Check component activates only when both source and target details are correctly filled.
  • Only one check is allowed in the checksum rule, and this check is mandatory. It cannot be deleted or deactivated.

Summary

Users can configure result override as well as attach custom variables to the Rule execution event.

How To: Create a Checksum Rule

To create this rule user needs at least contributor or Owner permissions. User can follow the following steps:

  1. Select Home > Data Testing >Rules
  2. Select the workspace on right hand top corner
  3. Click on New Rule
  4. Select the folder from workspace
  5. Enter Rule Name - Only alphanumeric, hyphen, underscore, period, at sign and spaces characters are allowed.
  6. Engine Type – High Throughput or Standard
  7. Choose Template – Checksum
  8. Fill all the values mentioned in the components section
  9. Publish the rule

This video shows you how to create a Checksum Rule.

Exit Code

The exit code of a Rule execution indicates whether the Rule succeeded, failed, or encountered an error.

Exit CodeStatusNote
== 0SuccessThis indicates that the rules executed were successful with all records. In this case, the source and target count match.
> 0Failure (Warning/ Critical/ Blocker)This indicates that the rules executed failed with few or all records. In this case, the source and target count do not match. The positive number is an absolute difference between the count of source and target tables.
< 0ErrorThis indicates that the rule itself failed during execution. Every code has its own significance. In case of error, revisit the rule, input records or the product health to see if you can figure out the mistake. If not, contact iceDQ’s support team.
-38ErrorThe Check evaluated to FALSE but the difference of Source & Target is 0.
Note

When the Check evaluates to FALSE and difference is 0 the engine sets the Exit Code to -38 to prevent a False Positive result.


Scenarios

Following scenarios will help you understand how the Exit Code of a Checksum Rule is calculated.

Scenario #1

In this case, the rule fails with an exit code of 50 because the source returns 50 records more than the target.

Checksum Rule Scenario #1

Scenario #2

In this case, the rule passes with an exit code of 0 because the source and target return the same number of records and the check evaluates to True.

Checksum Rule Scenario #2


Other Actions

Once the configuration is complete, users have following options:

Field NameDescription
Discard RuleThis option is available throughout the process until the Publish option is hit. During the initial creation, if discard is pressed, the entire rule will be discarded. Once the rule is published and then any changes are made, discard will roll back only the changes made during the last edit.
PublishOnce the rule is configuration is complete, the publish option commits the rule created. Please Note
RunOnce the Rule is published, the Run option will activate. This option will actually execute the rule and show the result.
Recent RunsWill highlight the box that displays the recent runs.
DeleteDeletes the rule. This option is activated once the rule is published.
More >DuplicateDuplicates the rule
More>ShareYou can share the link to other users
More>Deactivate/ActivateThis option toggles between activation and deactivation of rules
SubscriptionsThis option is used to send the email notification to selects users on test Success, test Failures and Rule execution Errors

Consideration

  1. In a Checksum Rule the SQL must return one numeric value with one column only.
  2. Always use an alias when writing an aggregate function and avoid using spaces and special characters in the name.
  3. A Rule Name is unique inside a folder but can be changed anytime.
  4. In Checksum Rule write the groovy expression such that it evaluates to either True or False.
note

Checksum Rule does not generate an Exception Report since it evalutes only two numeric values and no other data.