
How to Perform Validations for Data Patterns in a File


In this use case, we validate the data in an incoming source file to ensure that it adheres to standard predefined patterns. The goal is to check data columns such as names, Social Security numbers (SSNs), emails, phone numbers, dates of birth, credit card numbers, state codes, zip codes, and modified dates against their expected formats. The step-by-step process is described below.

Steps

Here are the steps outlined in the video:

Create Rule

  1. Start by creating a new validation rule. Provide an appropriate name for the rule.

Set Up Source Data Connection

  1. Configure a source data connection with the connection type set to "File". Select "Flat File Native" as the connection, and choose the file "customers.csv". Preview the data, excluding the first row, which contains the headers.
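Outside the tool, the same "skip the header row and preview the data" step can be sketched in plain Python. This is only an illustration: the sample text stands in for customers.csv, and the column names (`customer_id`, `name`, `email`) and the `preview_rows` helper are assumptions, not the file's actual schema or part of the product.

```python
import csv
import io

# Hypothetical sample standing in for customers.csv; real column names may differ.
SAMPLE = """customer_id,name,email
1,Ada Lovelace,ada@example.com
2,Alan Turing,alan@example.com
"""


def preview_rows(text_stream, limit=5):
    """Return the header and up to `limit` data rows, skipping the header row."""
    reader = csv.reader(text_stream)
    header = next(reader, None)  # consume the first row, which holds the headers
    rows = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append(row)
    return header, rows


header, rows = preview_rows(io.StringIO(SAMPLE))
```

To preview a real file, pass `open("customers.csv", newline="")` instead of the in-memory sample.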

Define Data Columns

  1. Examine the schema and data of the source file, which includes columns such as names, email addresses, phone numbers, dates of birth, credit card numbers, addresses, and more.

Create Individual Checks

  1. Create individual checks for each column to verify whether the data conforms to the expected format. For example, for the SSN (Social Security Number) column, use a predefined pattern that ensures the SSN does not start with "000" or "666," follows a specific format, and has a valid structure. Repeat this process for other columns, such as email addresses, phone numbers, date of birth, credit card numbers, state codes, zip codes, and modified dates.
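To make the idea of a per-column pattern check concrete, here is a minimal sketch using regular expressions. The tool's actual predefined patterns are not shown in this article, so every pattern below is an assumption: the SSN pattern assumes the hyphenated AAA-GG-SSSS layout and rejects area numbers 000, 666, and 900 to 999; the email and zip code patterns are deliberately simplified. The `PATTERNS` dictionary and the `conforms` helper are hypothetical names.

```python
import re

# Assumed patterns for illustration only; the product's predefined patterns may differ.
PATTERNS = {
    # Hyphenated SSN: area not 000, 666, or 9xx; group not 00; serial not 0000.
    "ssn": re.compile(r"(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}"),
    # Simplified email: something@something.something, no whitespace.
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    # US zip code: five digits with an optional four-digit extension.
    "zip_code": re.compile(r"\d{5}(-\d{4})?"),
}


def conforms(column, value):
    """Return True if the whole value matches the pattern assumed for the column."""
    return PATTERNS[column].fullmatch(value) is not None
```

Using `fullmatch` means the entire value must fit the pattern, so trailing characters or a wrong length cause the check to fail.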

Add Source Columns

  1. Include the "Customer ID" column from the source data. This addition helps identify specific customer records that do not pass the validation checks.
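The value of carrying an identifier alongside each check can be shown with a short sketch: when a value fails, the customer ID is reported with it, so the offending record can be located. The records, the column names `customer_id` and `zip_code`, the zip code pattern, and the `failing_records` helper are all hypothetical.

```python
import re

# Assumed zip code pattern: five digits with an optional four-digit extension.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

# Hypothetical records; the real file's columns and values may differ.
records = [
    {"customer_id": "C001", "zip_code": "30301"},
    {"customer_id": "C002", "zip_code": "3030"},
    {"customer_id": "C003", "zip_code": "30301-12"},
]


def failing_records(rows, column, pattern, id_column="customer_id"):
    """Return (identifier, offending value) pairs for rows that fail the pattern."""
    return [
        (row[id_column], row[column])
        for row in rows
        if not pattern.fullmatch(row[column])
    ]


failures = failing_records(records, "zip_code", ZIP_PATTERN)
```

Without the identifier column, the output would list only the bad values, with no way to trace them back to a customer.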

Publish & Run

  1. Review the list of checks that will be performed on the source data file. Verify that the columns adhere to the expected predefined formats. Publish the rule and execute it.

Review Results

  1. After execution, review the results. In the example provided, 51 of the 100 records in the file have issues. Investigate further by clicking on the instance ID. For instance, review the failures for the SSN column, which flag issues such as SSNs that start with "9" or "00" or that do not have the correct length of 9 digits.

  2. Continue checking failures for other columns, including email addresses, phone numbers, date of birth, credit card numbers, state codes, zip codes, and modified dates.
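When reviewing results, it helps to distinguish the number of records with at least one issue from the number of failed checks per column, because one record can fail several checks. The sketch below illustrates that distinction; the failure list, the record total, and the `summarize` helper are made-up examples and do not reproduce the results shown in the video.

```python
from collections import Counter

# Hypothetical (customer_id, column) pairs, one per failed check.
failures = [
    ("C002", "ssn"),
    ("C002", "email"),
    ("C005", "zip_code"),
    ("C007", "ssn"),
]


def summarize(failed_checks, total_records):
    """Count distinct failing records and failed checks per column."""
    failed_ids = {customer_id for customer_id, _ in failed_checks}
    per_column = Counter(column for _, column in failed_checks)
    return {
        "total_records": total_records,
        "failed_records": len(failed_ids),
        "failures_by_column": dict(per_column),
    }


summary = summarize(failures, total_records=10)
```

Here four checks fail but only three distinct records are affected, since customer C002 fails both the SSN and the email check.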

Video: How to Perform Validations for Data Patterns in File?

Conclusion

By following this use case, you have learned how to validate data in an incoming source file using predefined patterns. Ensuring that data conforms to standard formats enhances data quality and reliability. Identifying and addressing validation failures is essential for data accuracy in your projects.