Impala


Apache Impala is a massively parallel processing (MPP) SQL query engine for Hadoop that enables low-latency, interactive analysis directly on data stored in HDFS and Apache Hive. It provides high performance for BI and analytics by avoiding MapReduce and using native execution.


Prerequisites

The following prerequisites must be met for a user to create and test a successful connection.

  • The Impala server must be accessible from the iceDQ server.
  • Valid credentials to access the database.
  • Impala JDBC driver compatible with JDBC 4.2 (the JDBC42 driver) or above.
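
The first prerequisite can be checked before creating a connection. The following is a minimal sketch, not part of iceDQ, that tests whether a TCP connection to the Impala host and port can be opened from the machine where it runs. The function name `can_reach` and the timeout value are assumptions made for illustration.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the hostname and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (hypothetical host; 21050 is the default Impala port):
# can_reach("impala.acme.com", 21050)
```

A successful TCP connection only shows that the port is reachable. It does not validate credentials, Kerberos settings, or SSL.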

Authentication Mechanisms

The following authentication mechanisms are supported.

  • Username & Password
  • Kerberos
  • Kerberos Ticket Cache

Connection Properties

Username and Password Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

| Name | Description | Example Values |
|---|---|---|
| Connection Name * | Name that uniquely identifies the connection. | Impala_Prod_conn |
| Driver * | Driver used to establish the connection. One driver is available by default. | Cloudera Impala Native JDBC |
| Custom JDBC URL | Standardized string used to define the connection details. Use the format supported by the driver: jdbc:impala://[host]:[port]/[database] | jdbc:impala://192.168.61.90:21050/dev_db |
| Use SSL | Enables Secure Sockets Layer (SSL) encrypted communication from iceDQ to Impala. Refer to the SSL Configuration section below for setup instructions. | |
| Host * | IP address or hostname of the Impala server. | impala.cloudera.acme.com or 192.168.26.45 |
| Port * | Port on which the Impala server listens. Default is 21050. | 21050 |
| Database * | Name of the Impala database. | dev_db |
| Type * | The connection type: either System Connection or User Connection. Refer to Connections for more details. | System or User |
| Username * | Impala login username with the necessary privileges. | john_doe |
| Password * | Password associated with the specified username. | impala_password |
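
The Custom JDBC URL format above can be assembled from the Host, Port, and Database values. The sketch below is illustrative only: the helper name `build_impala_jdbc_url` is hypothetical, and iceDQ may build the URL differently internally.

```python
def build_impala_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a URL in the form jdbc:impala://[host]:[port]/[database]."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"jdbc:impala://{host}:{int(port)}/{database}"


# Values taken from the example column of the table above.
url = build_impala_jdbc_url("192.168.61.90", 21050, "dev_db")
print(url)
```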

Kerberos Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

| Name | Description | Example Values |
|---|---|---|
| Connection Name * | Name that uniquely identifies the connection. | Impala_Dev_Conn |
| Driver * | Driver used to establish the connection. By default, one driver is available. | Cloudera Impala Native JDBC |
| Custom JDBC URL | Full JDBC URL for the connection. Optional if Host, Port, and Database are provided separately. Example format: jdbc:impala://[host]:[port]/[database];KrbHostFQDN=[host];sslTrustStore=${ssl_trust_store_path};SSLTrustStorePwd=${trust_store_password} | jdbc:impala://impala.acme.com:21050/dev_db; |
| Kerberos Config * | Path to the Kerberos configuration file (typically krb5.conf). | User needs to upload the config file. |
| Use SSL | Enables Secure Sockets Layer (SSL) encrypted communication from iceDQ to Impala. Refer to the SSL Configuration section below for setup instructions. | |
| Service Principal | The unique Kerberos identity of the Impala service (usually in the format service/host@REALM). Used to request service tickets. | impala/impala.[email protected] |
| Host * | IP address or hostname of the Impala server. | impala.acme.com |
| Port * | Port on which the Impala daemon listens. Default is 21050. | 21050 |
| Database * | Name of the Impala database to connect to. | dev_db |
| Type * | The connection type: either System Connection or User Connection. Refer to Connections for more details. | System or User |
| User Principal * | Kerberos principal name of the user (e.g., user@REALM). | [email protected] |
| User Keytab * | Path to the keytab file that contains the encrypted credentials of the user principal. Used for authentication. | User needs to upload the keytab file. |
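
The Kerberos URL format in the table appends semicolon-separated properties to the base URL. The following sketch shows one way to compose such a string. It uses only the property names that appear in the format above (`KrbHostFQDN`, `sslTrustStore`, `SSLTrustStorePwd`); the function name and the file path are hypothetical, and the `${...}` placeholders in the documented format are assumed to be substituted by iceDQ itself.

```python
from typing import Optional


def build_kerberos_jdbc_url(
    host: str,
    port: int,
    database: str,
    trust_store_path: Optional[str] = None,
    trust_store_password: Optional[str] = None,
) -> str:
    """Compose jdbc:impala://host:port/db;KrbHostFQDN=host[;sslTrustStore=...]."""
    # Property order follows the example format in the table above.
    props = {"KrbHostFQDN": host}
    if trust_store_path:
        props["sslTrustStore"] = trust_store_path
    if trust_store_password:
        props["SSLTrustStorePwd"] = trust_store_password
    suffix = ";".join(f"{key}={value}" for key, value in props.items())
    return f"jdbc:impala://{host}:{port}/{database};{suffix}"


# Hypothetical truststore path, for illustration only.
print(build_kerberos_jdbc_url("impala.acme.com", 21050, "dev_db",
                              "/opt/certs/truststore.jks", "my_password"))
```

Embedding a password in a URL exposes it wherever the URL is logged, so a placeholder or secret store is usually preferable when one is available.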

Kerberos Ticket Cache Authentication

Use the following properties to create a valid connection. Properties marked with an asterisk (*) are required.

| Name | Description | Example Values |
|---|---|---|
| Connection Name * | Name that uniquely identifies the connection. | Impala_Demo_Conn |
| Driver * | Driver used to establish the connection. By default, one driver is available. | Cloudera Impala Native JDBC |
| Custom JDBC URL | Full JDBC URL for the connection. Optional if Host, Port, and Database are provided separately. Example format: jdbc:impala://[host]:[port]/[database];KrbHostFQDN=[host];sslTrustStore=${ssl_trust_store_path};SSLTrustStorePwd=${trust_store_password} | jdbc:impala://impala.acme.com:21050/default; |
| Kerberos Config * | Path to the Kerberos configuration file (typically krb5.conf). | User needs to upload the config file. |
| Use SSL | Enables Secure Sockets Layer (SSL) encrypted communication from iceDQ to Impala. Refer to the SSL Configuration section below for setup instructions. | |
| Service Principal | The unique Kerberos identity of the Impala service (usually in the format service/host@REALM). Used to request service tickets. | impala/impala.[email protected] |
| Host * | IP address or hostname of the Impala server. | impala.acme.com |
| Port * | Port on which the Impala daemon listens. Default is 21050. | 21050 |
| Database * | Name of the Impala database to connect to. | default |
| Type * | The connection type: either System Connection or User Connection. Refer to Connections for more details. | System or User |
| User Principal * | Kerberos principal name of the user (e.g., user@REALM). | [email protected] |
| Ticket Cache File Name * | Path to the Kerberos ticket cache. | User needs to upload the ticket cache file. |
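
The required (*) properties differ by authentication mechanism. As a rough illustration, the sketch below lists the required property names from the three tables above and reports which ones are missing from a set of values. The mechanism keys (`username_password`, `kerberos`, `kerberos_ticket_cache`) and the function name are hypothetical and are not iceDQ identifiers.

```python
# Properties marked with * in all three tables.
COMMON = ["Connection Name", "Driver", "Host", "Port", "Database", "Type"]

# Mechanism keys are illustrative names, not iceDQ settings.
REQUIRED_PROPERTIES = {
    "username_password": COMMON + ["Username", "Password"],
    "kerberos": COMMON + ["Kerberos Config", "User Principal", "User Keytab"],
    "kerberos_ticket_cache": COMMON
    + ["Kerberos Config", "User Principal", "Ticket Cache File Name"],
}


def missing_properties(mechanism: str, props: dict) -> list:
    """Return the required property names that are absent or blank in props."""
    if mechanism not in REQUIRED_PROPERTIES:
        raise ValueError(f"Unknown authentication mechanism: {mechanism!r}")
    missing = []
    for name in REQUIRED_PROPERTIES[mechanism]:
        value = props.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

For example, calling `missing_properties("kerberos", values)` with no `User Keytab` entry in `values` returns a list containing `"User Keytab"`.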

SSL Configuration

The Secure Sockets Layer (SSL) option enables encrypted communication between iceDQ and the Impala server. iceDQ supports SSL through a Java Truststore. Use the following properties to configure SSL. Properties marked with an asterisk (*) are required.

| Name | Description | Example Values |
|---|---|---|
| Java Truststore | A valid .jks file (Java Keystore) containing trusted SSL certificates. | User needs to upload the Truststore file. |
| Java Truststore Password | Password used to access the Java Truststore. Optional if the truststore does not require one. | my_password |

JAAS Properties

With respect to Kerberos authentication, JAAS properties refer to configuration settings used by the Java Authentication and Authorization Service (JAAS) to authenticate a user or application using Kerberos. For a comprehensive list of supported properties, refer to the JAAS Reference section of the Cloudera documentation.
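
For orientation, the sketch below renders a JAAS login entry of the kind the standard Java `Krb5LoginModule` reads, for either a keytab or a ticket cache. The option names (`useKeyTab`, `keyTab`, `useTicketCache`, `ticketCache`, `principal`, `doNotPrompt`) are standard `Krb5LoginModule` options. The entry name `Client`, the file paths, and the principal are assumptions, and which of these options iceDQ accepts as JAAS properties is not confirmed by this page.

```python
from typing import Optional


def render_jaas_entry(
    principal: str,
    keytab_path: Optional[str] = None,
    ticket_cache_path: Optional[str] = None,
    entry_name: str = "Client",  # assumed entry name, may differ per setup
) -> str:
    """Render a JAAS entry that uses exactly one of keytab or ticket cache."""
    if (keytab_path is None) == (ticket_cache_path is None):
        raise ValueError("Provide exactly one of keytab_path or ticket_cache_path")
    if keytab_path is not None:
        options = [("useKeyTab", "true"), ("keyTab", f'"{keytab_path}"')]
    else:
        options = [("useTicketCache", "true"),
                   ("ticketCache", f'"{ticket_cache_path}"')]
    options += [("principal", f'"{principal}"'), ("doNotPrompt", "true")]
    body = "\n".join(f"  {key}={value}" for key, value in options)
    module = "com.sun.security.auth.module.Krb5LoginModule required"
    # The option list ends with ';' and the entry block ends with '};'.
    return f"{entry_name} {{\n  {module}\n{body};\n}};"


# Hypothetical principal and keytab path, for illustration only.
print(render_jaas_entry("john_doe@EXAMPLE.COM",
                        keytab_path="/etc/security/john_doe.keytab"))
```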


Custom Properties

Custom properties are optional connection parameters in the Impala driver that allow customization of settings such as timeouts and proxy configurations. A list of supported properties is available in the driver documentation. The availability and behavior of custom connection properties may vary depending on the Impala JDBC driver version in use.


Supported Datatypes

The following datatypes are supported:

  • ARRAY
  • BIGINT
  • BINARY
  • BOOLEAN
  • CHAR
  • DATE
  • DECIMAL
  • DOUBLE
  • FLOAT
  • INT
  • MAP
  • SMALLINT
  • STRUCT
  • TIMESTAMP
  • TINYINT
  • VARCHAR
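
A column type string often carries parameters, such as DECIMAL(10,2) or ARRAY<INT>. The sketch below shows one way to check a declared column type against the list above. It is illustrative only: the function names are hypothetical, and only the outermost type name is compared, so element types nested inside ARRAY, MAP, or STRUCT are not inspected.

```python
import re

# Type names copied from the list above.
SUPPORTED_TYPES = frozenset({
    "ARRAY", "BIGINT", "BINARY", "BOOLEAN", "CHAR", "DATE", "DECIMAL",
    "DOUBLE", "FLOAT", "INT", "MAP", "SMALLINT", "STRUCT", "TIMESTAMP",
    "TINYINT", "VARCHAR",
})


def base_type(type_str: str) -> str:
    """Extract the leading type name, e.g. 'decimal(10,2)' -> 'DECIMAL'."""
    match = re.match(r"\s*([A-Za-z_]+)", type_str)
    if match is None:
        raise ValueError(f"Cannot parse type: {type_str!r}")
    return match.group(1).upper()


def is_supported(type_str: str) -> bool:
    """Return True if the outermost type name appears in the supported list."""
    return base_type(type_str) in SUPPORTED_TYPES
```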