Skip to main content

How to Validate Flat File with its Control File?


In this use case, we will demonstrate how to compare the data in our control file with the data in our actual file. The control file contains crucial information about the actual file, including record count, file size, delimiter used, and more. By comparing these two datasets, we can ensure their accuracy and identify any discrepancies. Let's dive in and create a checksum rule for this purpose.

Steps

Here are the steps outlined in the video:

Create Rule

  1. Start by creating a new checksum rule with an appropriate name for clarity.

Set Up Source Data Connection

  1. Configure a source data connection with the connection type as "File." Use "Flat File SQL" as the connection, and select "Delimited" as the file type. Choose the source file, "customer.csv." Preview the data, and assign a suitable table name, such as "customers." Click "Get Schema," save the schema, and refresh the data to view the total record count, which in this case is 100.

Set Up Target Data Connection

  1. Configure a target data connection with the same settings as the source: connection type "File," "Flat File SQL" connection, and "Delimited" file type. The target file, "customer_stats.csv," serves as a control file and contains information such as the expected record count. Preview the data, assign a table name (e.g., "Control File"), and click "Get Schema." Save the schema and refresh the data to observe the control file's record count, which should match the expected source file count (100).

Define Check

  1. Name the "record count" column in the control file appropriately and use an aggregate function like "SUM" to compare the control file's record count with the actual source file count.

Verify Checks

  1. Verify the auto-populated checks by comparing the record counts in the control file and the actual source file.

Publish & Execute

  1. Publish the rule and execute it.

Review Results

  1. After execution, review the results. In the example provided, the rule executes successfully with a "success" status, and both the source count and target count display 100.

Video: How to Validate Flat File with its Control File?

Conclusion

By following this use case, you have learned how to compare data in a control file with data in an actual file using iceDQ. This process helps ensure data accuracy and identify discrepancies between the two datasets. Validating data in this way enhances data quality and reliability in your projects.