Query Delimited File
Users can access delimited flat files using SQL through the configured Flat File SQL Connector. To access the specific customer data mentioned in this article, download the following flat files customer_12182023.txt , customer_12202023.txt.
Currently supported extensions are .txt
, .csv
, .dat
. In future release .tar
, .tar.gz
will be supported.
Understanding Settings (Source/ Target)
Flat File SQL connector has specific configurations. Please read the following properties carefully to understand the behaviour of the connector.
Single File
This property specifies whether the driver should read data from a single flat file or a folder of flat files. If this property is enabled, the driver will read data from a single flat file with supported extensions. If this property is disabled, the driver will read data from all the flat files in the folder and union them together. The structure of all the flat files in the folder must be exactly the same.
Parameterised Path
This property allows users to parameterize the file path and name using the parameter key. For this to work, the rule must have a default parameter file associated with it.
Column Delimiter
Set the character used to divide columns in the file with this property. The default is a comma (,).
Row Delimiter
This property sets the row delimiter, which separates rows in the file. It defaults to a newline character (\n).
Text Qualifier
Specify the special character user to enclose the fields that might contain the delimiter character itself. This prevents misinterpretation of those fields during data parsing. See the following example.
- Without qualifier:
John Doe,New York,NY,USA
(incorrectly parsed if a city name contains a comma) - With qualifier:
"John Doe","New York, NY",USA
(correctly parsed as three fields)
Skip Row From Top
To bypass unnecessary content at the file's start, set the number of rows to SKIP using this property. The default value is zero (0).
Fixed String as NULL
Specify the TEXT/ STRING you would like the connector to process as a NULL value. For example, "NA" string in the file should be treated as NULL.
Trim Data
Enable this property if you want the connector to TRIM leading and trailing spaces when reading the file. By default it is enabled.
Header Row
Enable this property if column names should be picked from the Header Row. This is dependent on Skip Row From Top
property. See the example below.
- If
Skip Row From Top
is0
then connector will consider the first row as Header Row. - If
Skip Row From Top
is1
then connector will consider the second row as Header Row.
Detect Datatype
Enable this property if you want the driver to identify datatype of all the columns by reading first 100 records. This might change how the value is displayed or read for DATE, DATETIME and BOOLEAN datatype.
Additional Properties
The connector has some default properties that users can modify or add to, which changes how the connector processes flat file data.
Property | Value | Note |
---|---|---|
Table Name | It is auto generated which the user can modify. |
Schema
Schema defines the structure of a flat file by identifying its columns and data types. Schemas are stored as separate objects in a repository, each with a unique schema ID.
Users must click Get Schema
and Save Schema
whenever they add or change a property. Schema and Flat file together allow SQL to generate accurate output.
How To: Read Delimited File
This video shows how to read data from a single delimited flat file.
How To: Parameterize File, Read Multiple Files
This video shows how to parameterize filenames and paths for versatile file handling, and for reading data from multiple identically structured files stored within a folder.
SQL Functions
STRING
List of all the String functions.
CONCAT(str_1, str_2 [, str_n])
Returns concatenated string.
SELECT CONCAT('Hello, ', 'world!');
Output: 'Hello, world!'
MATH
List of all the Numeric functions.
ABS (numeric)
Returns the absolute (positive) value of the specified numeric expression.
SELECT ABS(-15);
Output: 15
DATE
CURRENT_DATE()
Returns the current date value.
SELECT CURRENT_DATE();
Output: 2018-02-01