Parquet


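Parquet is a columnar storage file format designed for big data workloads. Its column-oriented layout makes it efficient to store and query in analytical environments. The Parquet connector reads and writes Parquet files in cloud object storage (Amazon S3, Azure Blob Storage, Azure Data Lake, Google Cloud Storage), on FTP and SFTP servers, and in local storage.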


Prerequisites

  • Verify that the storage location or folder is accessible from the cluster.
  • Gather valid user credentials and ensure the user has appropriate permissions to read the files.
  • Ensure that the folder contains at least one Parquet file.
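For a local or network-mounted folder, the checks above can be scripted as a quick preflight before you create the connection. This is a minimal sketch that assumes the folder is visible as a regular file system path on the machine running it. `preflight_folder` is a hypothetical helper, not part of the connector, and credential checks for remote stores are out of scope.

```python
import os
from pathlib import Path


def preflight_folder(folder: str) -> list[str]:
    """Return the problems found; an empty list means every check passed.

    Hypothetical helper mirroring the prerequisites for a locally visible folder.
    """
    path = Path(folder)
    problems: list[str] = []
    if not path.is_dir():
        problems.append(f"{folder} does not exist or is not a directory")
        return problems
    # Listing a directory's entries needs both read and execute permission.
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{folder} is not readable by the current user")
        return problems
    files = [p for p in path.iterdir() if p.is_file()]
    if not files:
        problems.append(f"{folder} contains no files")
    # Each file must also be readable by the user the connection runs as.
    unreadable = sorted(p.name for p in files if not os.access(p, os.R_OK))
    if unreadable:
        problems.append(f"unreadable files: {', '.join(unreadable)}")
    return problems


if __name__ == "__main__":
    print(preflight_folder("/data/parquet-files/") or "OK")
```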

Connecting to Amazon S3

AWS EC2 Role Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Container * | The name of the S3 bucket that contains the Parquet files. | acme-prod-bucket
Folder Path * | The path within the bucket to the folder that contains the Parquet files. | /data/parquet-files/
Region * | The AWS Region where the S3 bucket is located. | us-east-1
Type * | The connection type. EC2 role authentication supports only System Connection. See Connections for more details. | System
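To see how Container, Region, and Folder Path fit together, the sketch below composes them into the standard `s3://` URI and the virtual-hosted-style HTTPS URL of the folder. Whether the connector builds these exact strings internally is an assumption, and `s3_locations` is a hypothetical helper.

```python
def s3_locations(bucket: str, region: str, folder_path: str) -> tuple[str, str]:
    """Compose an S3 URI and a virtual-hosted-style URL for a folder (hypothetical helper)."""
    # S3 object keys have no leading slash: "/data/parquet-files/" -> "data/parquet-files/".
    prefix = folder_path.strip("/")
    prefix = f"{prefix}/" if prefix else ""
    uri = f"s3://{bucket}/{prefix}"
    # Virtual-hosted style: https://<bucket>.s3.<region>.amazonaws.com/<key>
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{prefix}"
    return uri, url


uri, url = s3_locations("acme-prod-bucket", "us-east-1", "/data/parquet-files/")
print(uri)  # s3://acme-prod-bucket/data/parquet-files/
print(url)  # https://acme-prod-bucket.s3.us-east-1.amazonaws.com/data/parquet-files/
```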

AWS IAM Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Container * | The name of the S3 bucket that contains the Parquet files. | prod-bucket
Folder Path * | The path within the bucket to the folder that contains the Parquet files. | /data/parquet-files/
Region * | The AWS Region where the S3 bucket is located. | us-east-1
Type * | The connection type: either System Connection or User Connection. See Connections for more details. | System
Access Key * | The AWS access key ID used to authenticate the connection. | AKIAIOSFODNN7EXAMPLE
Secret Key * | The AWS secret access key paired with the access key ID. | wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
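Because the Access Key and Secret Key grant access to the bucket, avoid pasting them into scripts or version control. The minimal sketch below assumes you assemble connection properties programmatically before entering them. It reads the keys from the standard AWS environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The dictionary keys mirror the table above, but `s3_iam_properties` and the dictionary shape are hypothetical.

```python
import os
from collections.abc import Mapping


def s3_iam_properties(bucket: str, folder_path: str, region: str,
                      env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build IAM connection properties, taking the keys from the environment (hypothetical helper)."""
    missing = [name for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
               if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "Container": bucket,
        "Folder Path": folder_path,
        "Region": region,
        "Access Key": env["AWS_ACCESS_KEY_ID"],
        "Secret Key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

Passing `env` explicitly makes the helper easy to test without touching the real environment.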

Connecting to Azure Blob Storage

Azure Access Key Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Container * | The name of the storage container where the Parquet files are stored. | sales-data-container
Folder Path * | The path within the container to the folder that contains the Parquet files. | /data/parquet-files/
Type * | The connection type: either System Connection or User Connection. See Connections for more details. | System
Account Name * | The name of the storage account used for authentication. | salesstoreadmin
Access Key * | The storage account access key used to authenticate the connection. | werghaqehglisfgeshzdfhuotyrg
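Azure tools commonly express an account name and access key as a storage connection string. If you want to test the same credentials outside the connector, for example with an Azure SDK or Storage Explorer, the sketch below builds that string and the HTTPS URL of the folder. It assumes the public Azure cloud endpoint suffix `core.windows.net`. `azure_blob_settings` is a hypothetical helper and not part of the connector.

```python
def azure_blob_settings(account: str, key: str, container: str,
                        folder_path: str) -> tuple[str, str]:
    """Return (connection_string, folder_url) for Azure Blob Storage (hypothetical helper)."""
    # Standard storage connection-string layout for the public Azure cloud.
    conn_str = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account};"
        f"AccountKey={key};"
        "EndpointSuffix=core.windows.net"
    )
    # Blob endpoint layout: https://<account>.blob.core.windows.net/<container>/<path>
    prefix = folder_path.strip("/")
    folder_url = f"https://{account}.blob.core.windows.net/{container}/"
    if prefix:
        folder_url += f"{prefix}/"
    return conn_str, folder_url
```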

Connecting to Azure Data Lake

Azure Access Key Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Container * | The name of the storage container where the Parquet files are stored. | sales-data-container
Folder Path * | The path within the container to the folder that contains the Parquet files. | /data/parquet-files/
Type * | The connection type: either System Connection or User Connection. See Connections for more details. | System
Account Name * | The name of the storage account used for authentication. | salesstorageadmin
Access Key * | The storage account access key used to authenticate the connection. | asgtd482tyujkmkiuyg456798xfbdzst
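Azure Data Lake Storage Gen2 locations are usually written as `abfss://` URIs, as used by Hadoop and Spark. These place the container and account name in the authority and point at the `dfs` endpoint. The short sketch below assumes a Gen2 account (hierarchical namespace enabled) on the public `core.windows.net` suffix. `adls_uri` is a hypothetical helper for confirming that the Container, Account Name, and Folder Path values describe the folder you expect.

```python
def adls_uri(account: str, container: str, folder_path: str) -> str:
    """Compose an abfss:// URI for a Data Lake Storage Gen2 folder (hypothetical helper)."""
    prefix = folder_path.strip("/")
    # Layout: abfss://<container>@<account>.dfs.core.windows.net/<path>
    uri = f"abfss://{container}@{account}.dfs.core.windows.net/"
    return uri + (f"{prefix}/" if prefix else "")


print(adls_uri("salesstorageadmin", "sales-data-container", "/data/parquet-files/"))
# abfss://sales-data-container@salesstorageadmin.dfs.core.windows.net/data/parquet-files/
```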

Connecting to FTP

Anonymous Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Host * | The IP address or hostname of the FTP server. | 192.168.44.31
Port * | The port on which the FTP server listens. | 21
Folder Path * | The path on the server to the folder that contains the Parquet files. | /data/parquet-files/
Type * | The connection type. Anonymous authentication supports only System Connection. See Connections for more details. | System

Username and Password Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Host * | The IP address or hostname of the FTP server. | 192.168.44.31
Port * | The port on which the FTP server listens. | 21
Folder Path * | The path on the server to the folder that contains the Parquet files. | /data/parquet-files/
Type * | The connection type: either System Connection or User Connection. See Connections for more details. | System
Username * | An FTP username with permission to read the files in the folder. | app_service_account
Password * | The password for the specified username. | App$erv1ceP@ss2025
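Before saving an FTP connection, it can help to confirm that the host, port, credentials, and folder path actually expose Parquet files. The minimal sketch below uses Python's standard `ftplib`. It calls `login()` with no arguments for anonymous access, in which case ftplib sends the user name `anonymous`, and `login(user, password)` otherwise. The helper names are hypothetical. Servers differ in whether `NLST` returns bare names or full paths, so the filter normalizes both. Plain FTP sends the password unencrypted.

```python
import posixpath
from ftplib import FTP
from typing import Optional


def parquet_names(names: list[str]) -> list[str]:
    """Keep entries ending in .parquet, reduced to bare file names (hypothetical helper)."""
    return sorted(posixpath.basename(n) for n in names if n.lower().endswith(".parquet"))


def list_parquet_files(host: str, port: int, folder: str,
                       user: Optional[str] = None, password: str = "") -> list[str]:
    """List Parquet files in an FTP folder; logs in anonymously when user is None."""
    with FTP() as ftp:  # the context manager sends QUIT and closes the socket on exit
        ftp.connect(host, port, timeout=30)
        if user is None:
            ftp.login()  # anonymous login
        else:
            ftp.login(user, password)
        return parquet_names(ftp.nlst(folder))
```

For example, `list_parquet_files("192.168.44.31", 21, "/data/parquet-files/")` checks the anonymous configuration from the first table.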

Connecting to Google Cloud Storage

Service Account Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Project ID * | The unique identifier of the Google Cloud project that contains the storage bucket. | my-gcp-project-123456
Container * | The name of the Cloud Storage bucket that contains the Parquet files. | parquet-data-container
Folder Path * | The path within the bucket to the folder that contains the Parquet files. | data/parquet-files/
Type * | The connection type. Service account authentication supports only System Connection. See Connections for more details. | System
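Cloud Storage has a flat namespace. A "folder" is only a shared prefix on object names, and object names do not start with a slash, which explains why the example Folder Path above has no leading `/`. The sketch below is a hypothetical helper that works on a plain list of object names, such as the output of a bucket listing. It selects the Parquet objects directly under a folder path and ignores deeper "subfolders". This page does not say whether the connector reads nested prefixes, so treat the non-recursive behavior as an assumption.

```python
def parquet_objects_in_folder(object_names: list[str], folder_path: str) -> list[str]:
    """Return Parquet object names directly under folder_path (hypothetical helper)."""
    # Normalize "/data/parquet-files" or "data/parquet-files/" to "data/parquet-files/".
    prefix = folder_path.strip("/")
    prefix = f"{prefix}/" if prefix else ""
    matches = []
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # Skip objects under deeper prefixes and objects that are not Parquet files.
        if "/" not in rest and rest.lower().endswith(".parquet"):
            matches.append(name)
    return sorted(matches)
```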

Connecting to Local Storage

The connector can read files from local storage: either a directory on the server where files have been uploaded or a network-mounted directory that the cluster can access. Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Folder Path * | The path to the folder that contains the Parquet files. | /data/parquet-files/

Warning: The connector does not support authentication when accessing files in local storage.
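A file copied or uploaded into the folder can be truncated or mislabeled. Every Parquet file begins and ends with the 4-byte magic `PAR1`, so checking those bytes is a cheap sanity test. The sketch below inspects only those bytes and does not validate the footer metadata. Files with encrypted footers, which use `PARE` instead, would be reported as non-Parquet. The function names are hypothetical.

```python
import os
from pathlib import Path

MAGIC = b"PAR1"  # Parquet files start and end with these four bytes


def looks_like_parquet(path: Path) -> bool:
    """Cheap structural check: leading and trailing PAR1 magic bytes."""
    # Smallest possible layout: magic + 4-byte footer length + magic = 12 bytes.
    if path.stat().st_size < 12:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == MAGIC and tail == MAGIC


def non_parquet_files(folder: str) -> list[str]:
    """List regular files in folder that fail the magic-byte check (hypothetical helper)."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.is_file() and not looks_like_parquet(p))
```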


Connecting to SFTP

Username and Password Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

Name | Description | Example Value
Connection Name * | A unique name that identifies the connection. | Prod_file_conn
Driver * | The driver used to establish the connection. By default, one driver is available. | Parquet Custom JDBC
Host * | The IP address or hostname of the SFTP server. | 186.209.54.87
Port * | The port on which the SFTP server listens. | 22
Folder Path * | The path on the server to the folder that contains the Parquet files. | /data/parquet-files/
Type * | The connection type: either System Connection or User Connection. See Connections for more details. | System
Username * | An SFTP username with permission to read the files in the folder. | dev_ops_user
Password * | The password for the specified username. | D0v0ps#Secure2025!
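A mistyped host is a common cause of connection failures. For example, an address such as `186.269.54.87` looks plausible but is not valid, because each IPv4 octet must be between 0 and 255. The sketch below is a hypothetical pre-save check for FTP and SFTP connections built on Python's standard `ipaddress` module. It accepts an IPv4 or IPv6 address or a syntactically valid hostname, plus a port in the range 1-65535. It checks syntax only. It neither resolves the name nor tests whether a server is listening.

```python
import ipaddress
import re

# Hostname label: letters, digits, and hyphens, 1-63 characters, no leading/trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def validate_host_port(host: str, port: int) -> None:
    """Raise ValueError if host is not a valid IP address or hostname, or port is out of range."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1-65535")
    try:
        ipaddress.ip_address(host)
        return  # a valid IPv4 or IPv6 address
    except ValueError:
        pass
    # All-numeric dotted strings that failed IP parsing (e.g. an octet above 255) are rejected.
    if re.fullmatch(r"[0-9.]+", host):
        raise ValueError(f"{host!r} is not a valid IP address")
    labels = host.rstrip(".").split(".")
    if len(host) > 253 or not all(_LABEL.match(label) for label in labels):
        raise ValueError(f"{host!r} is not a valid hostname")
```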

Custom Properties

The following optional connection properties can be configured as needed:

Property | Default Value | Possible Values | Description
BatchSize | 0 | Integer | The maximum number of rows in each batch operation. Set to 0 to submit the entire batch as a single request.
MaxRow | -1 | Integer | Limits the number of rows returned when the query does not use aggregation or GROUP BY.
RowScanDepth | 100 | Integer | The number of rows scanned when dynamically determining the table's columns.
Pagesize | 1000 | Integer | The number of rows returned per page when reading from the file.
Timeout | 60 | Integer | The number of seconds before the operation is canceled with a timeout error. Set to 0 for no timeout.
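To estimate how Pagesize and BatchSize scale with data volume, use ceiling division. For example, reading 25,000 rows with the default Pagesize of 1000 takes 25 page requests. Writing 2,500 rows with a BatchSize of 1000 takes 3 batches, while a BatchSize of 0 sends them in one request. The sketch assumes each page or batch is a separate round trip, which the table implies but does not guarantee. The helper names are hypothetical.

```python
def page_requests(total_rows: int, page_size: int = 1000) -> int:
    """Pages needed to read total_rows at the given Pagesize (hypothetical helper)."""
    if page_size <= 0:
        raise ValueError("Pagesize must be positive")
    return (total_rows + page_size - 1) // page_size  # integer ceiling division


def batch_requests(total_rows: int, batch_size: int = 0) -> int:
    """Batch requests needed for total_rows; BatchSize 0 means a single request."""
    if batch_size < 0:
        raise ValueError("BatchSize cannot be negative")
    if total_rows == 0:
        return 0
    if batch_size == 0:
        return 1
    return (total_rows + batch_size - 1) // batch_size
```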

Supported Datatypes

The following data types are supported:

  • NUMBER
  • INT32
  • INT64
  • DECIMAL
  • FLOAT
  • DOUBLE
  • VARCHAR (BINARY UTF8)
  • BOOLEAN
  • DATE
  • TIME
  • TIMESTAMP
  • MAP
  • STRUCT
  • GROUP (LIST)

Unsupported Datatypes

The following data types are not supported:

  • TIMESTAMP_NANOS
  • TIME_NANOS
  • TIMESTAMP_TZ
  • DATETIME WITH TIMEZONES
  • LISTS (outside group-annotated arrays)
  • DICTS
  • SETS
  • BINARY or IMAGE data
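Before pointing the connector at a new dataset, you can screen its schema against the lists above. The sketch below assumes you already have each column's type as a label using the same names as this page, for example exported from your pipeline's schema metadata. It is a hypothetical helper that matches names exactly, so it cannot detect context-dependent cases such as lists outside group-annotated arrays.

```python
# Type names taken from the Unsupported Datatypes list above.
UNSUPPORTED = {
    "TIMESTAMP_NANOS", "TIME_NANOS", "TIMESTAMP_TZ",
    "DATETIME WITH TIMEZONES", "DICTS", "SETS", "BINARY", "IMAGE",
}


def unsupported_columns(schema: dict[str, str]) -> dict[str, str]:
    """Return {column: type} for columns whose declared type is on the unsupported list."""
    return {col: typ for col, typ in schema.items()
            if typ.strip().upper() in UNSUPPORTED}
```

Note that `VARCHAR (BINARY UTF8)` is not flagged, because only the exact label `BINARY` appears on the unsupported list.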