How to Perform Data Validation for Numeric Patterns?
In this use case, we will focus on validating ZIP codes and phone numbers in our data to ensure they conform to predefined standard formats. Our objective is to verify that both ZIP codes and phone numbers adhere to expected patterns. Let's go through the step-by-step process of creating and executing a validation rule for this purpose.
Steps
Here are the steps outlined in the video:
Create Rule
- Begin by creating a new validation rule with an appropriate name.
Set Up Source Data Connection
- Configure a source data connection to the source database, "Adventureworks." The selected schema is "person," and the table used for validation is "address phone."
Preview Data
- Preview the data from the source table, focusing on the columns containing postal codes and phone numbers. These are the columns we aim to validate.
Define Validation Criteria
-
To validate postal codes, establish the expected format: a 5-digit number followed by an optional hyphen and a 4-digit number.
-
For phone numbers, set the expected format: an optional international code (e.g., +1 for USA), followed by a three-digit area code, a three-digit number, and a four-digit number. Allow optional hyphens, spaces, or dots between the numbers. Optionally, enclose the area code in brackets.
Create Validation Checks
-
Create validation checks for each column using pattern-based checks.
-
For postal codes, select "Zip Code" as the pattern and verify that the postal code consists of a 5-digit number followed by an optional hyphen and a 4-digit number.
-
For phone numbers, select "Phone" as the pattern and verify that the phone number consists of a three-digit area code, a three-digit number, and a four-digit number. The numbers can be separated by hyphens, dots, or spaces.
-
Save, Publish, and Execute
- Save the validation checks and proceed to publish and execute the rule.
Review Results
-
After execution, review the results. In the example provided, you may notice a warning and a few failures. Investigate further by clicking on the instance ID.
-
For postal codes, failed records may not conform to the expected 5-digit and 4-digit format.
-
Regarding phone numbers, you might find discrepancies such as missing or incorrect international codes.
-
Video: How to Perform Data Validation for Numeric Patterns?
Conclusion
By employing pattern-based checks using regex pattern matching, you can successfully validate ZIP codes and phone numbers in your data. This ensures that the data adheres to predefined format standards, enhancing data quality and accuracy.