Read Delimited File
Users can read data from delimited files using our Flat File Native Connector. To access the specific customer data mentioned in this article, download the following flat files customer_12182023.txt , customer_12202023.txt.
Currently supported extensions are .txt
, .csv
, .dat
. In future release .tar
, .tar.gz
will be supported.
Understanding Settings (Source/ Target)
Flat File Native connector has specific configurations. Please read the following properties carefully to understand the behaviour of the connector.
Property | Description | Note |
---|---|---|
File Type | Specify the file format: delimited or fixed width. | |
Single File | Enable, if reading data from one file. | If disabled, the driver reads and unifies all flat files in the folder, requiring identical structure. |
Parameterised Path | Allows users to parameterize file path and name using parameter key | Rule must have a default parameter file. |
Column Delimiter | String used to separate the columns. | Default is comma (,). Only applicable for Delimited files. |
Row Delimiter | String used to separate the rows. | Default is newline character (\n) |
Text Qualifier | Character used to enclose the column names and it's data. | Ensures accurate field interpretation by handling default delimiters in delimited files. |
Skip Row From Top | Number of rows to SKIP before reading the data. | Default value is zero (0) |
Header Row | Row number where the header row exists. | Default value is zero (0) |
Skip Row From Bottom | Number of rows to SKIP from reading from bottom. | Default value is zero (0) |
Trim Spaces | If enabled, TRIMS the leading and trailing spaces. | |
Read Empty Value as NULL | If enabled, treats an empty value as NULL | |
Read White Space as NULL | If enabled, treats a white space as NULL | |
Treat as NULL value | Specify the fixed string that should be treated as NULL | For example, specify "NA", if you want the string in the file should be treated as NULL. |
Filter Condition
To filter data within the flat file before validation or comparison, provide a Groovy expression that evaluates to either TRUE or FALSE using the file's columns.
Records for which the expression evaluates to TRUE
will be included, while those with FALSE
will be retained.
For instance, to read only records with "USA" in the country column, use the expression S.[country] == 'USA'.
Generate Schema/Refresh Schema
When setting up the file for the first time always click on Generate Schema
for the connector to read the data from file based on the specific configurations. You must also click Refresh Schema
whenever they add or change a property.
Advance Schema
Users can change the default column names and datatypes identified by the connector in the Advance tab
- Update the column name, datatype and format attributes in the JSON
- Click
Preview Data
to read the data with applied changes
Do not click Refresh Schema
, as it will revert manual schema modifications.
Below is a sample JSON structure for defining column sequence, name, datatype, and format within a "columns" array.
{
"columns": [
{
"index": 1,
"name": "customerid",
"datatype": "numeric"
},
{
"index": 2,
"name": "acctnumber",
"datatype": "Text"
},
{
"index": 3,
"name": "birthdate",
"datatype": "date",
"format": "MM/dd/yy"
}
]
}
How To: Read from a Delimited File
Following video shows how to read a delimited flat file using the native file connector (no sql).
How To: Parameterize Filename, Read from Multiple Files
Following video shows parameterize file name or read from multiple files in a folder using the native file connector (no sql).
Considerations
- Following datatypes are supported: text, numeric, date, time, timestamp and boolean.
- Structure of all the files in a folder should be same when reading multiple files at once.
- You can only use one parameterization method at a time: either specify a key or use a wildcard.