Working with JSON
JSON is a lightweight, human-readable data format stored as text files with the .json extension. It can be validated and compared using rules in iceDQ. Users can access the JSON Files using SQL through the configured JSON Connector.
Understanding JSON Structure
JSON data is structured as key-value pairs, where each key is a string and each value can be any of the following data types: String, Number, Boolean, Array, Object
{
"Organization": "Acme Inc",
"OrganizationID": 100,
"isActive": true,
"Location": ["US", "IND", "AUS"],
"Customers": [
{
"CustomerID": 11012,
"Name": "Lauren Walker",
"BirthDate": "1972-01-18"
},
{
"CustomerID": 11013,
"Name": "Ian Jenkins",
"BirthDate": "1972-08-06"
}
]
}
Below is the breakdown of the above json.
- Organization (String), OrganizationID (Number), isActive (Boolean) are key, value pairs
- Location is an array of values (String)
- Customers is an array of objects (Customer)
{"CustomerID":11012,"Name":"Lauren Walker","BirthDate":"1972-01-18"}
is a Customer object
Understanding JSONPath
JSONPath is a query language for JSON, similar to XPath for XML. It allows you to select and extract data from a JSON document.
JSONPath expressions are used to traverse the path to an element in the JSON structure. It can be written in either dot notation or bracket notation. Dot notation is used to traverse the JSON structure by following the path to the element you want to extract. Bracket notation is used to extract elements from arrays.
Since a JSON structure is usually anonymous and doesn't necessarily have a "root member object" JSONPath assumes the abstract name $ assigned to the outer level object.
Following table shows different expressions and their output for above JSON example.
Expression | Output | Note |
---|---|---|
$ | Complete JSON document is return | |
$.Organization | "Acme Inc" | Organization name is extracted from JSON |
$.Location[0] | "US" | Returns zero index of the Location array |
$.Customers[1] | { "CustomerID": 11013, "Name": "Ian Jenkins", "BirthDate": "1972-08-06" } | Returns customer object on first index in Customers array. |
Below are reference links to learn more about JSONPath.
Additional JSON Formats
Below are two additional formats which the connector can read from a JSON file.
JSON Lines
In this format each object is stored as a new line in the JSON file. The objects are not wrapped in an array.
{"CustomerID":11012,"Name":"Lauren Walker","BirthDate":"1989-01-18"}
{"CustomerID":11013,"Name":"Ian Jenkins","BirthDate":"2000-08-06"}
JSON Rows
In this format the keys are stored in a separate array and data is stored in a separate array. Sequence of data values matches that of the sequence of keys.
{
"report": {
"dimensions": [
"CustomerID",
"Name",
"BirthDate"
],
"data": [
[
[
11012,
"Lauren Walker",
"1989-01-18"
],
[
11013,
"Ian Jenkins",
"2000-08-06"
]
]
]
}
}
Parsing JSON Data
For this example following test data customer-order.json has been used.
This connector can convert object arrays and nested object arrays into a single table using an implicit join. It can automatically identify all object arrays in the JSON payload by parsing JSON paths, or users can specify which object arrays to convert by defining their own JSON paths.
When we run the following SQL against the customer-order.json
file, the connector will:
- Flatten and implicitly join the
customers
object array and its nestedaddress
object array. - Flatten and implicitly join the
orders
object array and its nestedproducts
object array. - Cross join the results of steps 1 and 2 to generate the final table.
Example
SELECT id, name, zip, coordinates, customer_id, order_id, product_name, [metrics.price]
FROM customer_order
WHERE zip IN (12345, 94105) OR order_id in (1, 3)
Output
id | name | zip | coordinates | customer_id | order_id | product_name | metrics.price |
---|---|---|---|---|---|---|---|
1 | Ian Walker | 12345 | [-98.808,48.11] | null | null | null | null |
2 | Jenny Blob | 94105 | null | null | null | null | null |
null | null | null | null | 1 | 1 | Shirt | 50 |
null | null | null | null | 1 | 1 | Shoes | 50 |
null | null | null | null | 2 | 3 | Dress | 150 |
To understand how the connector behaves, you need to understand some additional key properties.
RowScanDepth
It determines how many object arrays the connector scans to find rows and columns. It follows nested objects, counting each object array as a row. The default value is 100.
FlattenObjects
This key property determines the columns of the output table by flattening nested objects into a series of columns when set to TRUE (the default).
Let's consider following document.
{
"id": 1,
"name": "Ian Walker",
"email": "[email protected]",
"address": { "state": "CA", "zip": "12345"},
"coordinates": [-98.808, 48.11]
}
When this property is enabled, the output table will have the following structure:
COLUMN | DATATYPE |
---|---|
id | integer |
name | string |
string | |
address.state | string |
address.zip | string |
coordinates | string |
and the data will look like this.
id | name | address.state | address.zip | coordinates | |
---|---|---|---|---|---|
1 | Ian Walker | [email protected] | CA | 12345 | [-98.808, 48.11] |
FlattenArrays
It is another key property that can be used to flatten array values into columns of their own. However, It is best to leave other unbounded arrays as they are and piece out the data for them as needed using JSON Functions.
Let's consider following document
{"coordinates": [-98.808, 48.11]}
When this property value is set to 2, the output table will have the following structure:
COLUMN | DATATYPE |
---|---|
coordinates.0 | float |
coordinates.1 | float |
Other Properties
- FlattenRowLimit: The maximum number of rows that can result from a single flattened element. Default is 250000.
- QualifyColumns: Controls whether the provider will use relative column names. Possible values None (default), Parent, Full
Understanding Settings (Source/ Target)
JSON file connectors have specific configurations. Please read the following properties carefully to understand the behaviour of the connector.
Single File
This property specifies whether the driver should read data from a single JSON file or a folder of JSON files. If this property is enabled, the driver will read data from a single JSON file with the .json
extension. If this property is disabled, the driver will read data from all the .json
files in the folder and union them together. The structure of all the JSON files in the folder must be exactly the same.
Parameterised Path This property allows users to parameterize the file path and name using the parameter key. For this to work, the rule must have a default parameter file associated with it.
JSON Path Users can explicitly provide a JSON Path for the connector to identify the correct structure (nodes and attributes) in the JSON data. By default, the connector will identify the JSON Path by reading the first 200 records.
Additional Properties
The connector has some default properties that users can modify or add to, which changes how the connector processes JSON data.
Property | Value | Note |
---|---|---|
Table Name | It is auto generated which the user can modify. | |
JSON Format | JSON (Default), JSONRows, JSONLines | Input a value based on the JSON File Format. |
DataModel | FlattenedDocuments (Default), Documents | |
FlattenObject | True (Default), False | View description |
FlattenArrays | Any numeric value. 0 is default. | View description |
QualifyColumns | None (Default), Parent, Full | Qualify column names based on nested level. |
FlattenRowLimit | Any numeric value. 250000 is default. | Limits the numbers of output rows. |
RowScanDepth | Any numeric value. 100 is default. | Rows (objects) to scan when dynamically determining the schema. |
Schema
Schema defines the structure of a JSON file by identifying its nodes, attributes, and data types. Schemas are stored as separate objects in a repository, each with a unique schema ID.
Users must click Get Schema
and Save Schema
whenever they add or change a property. Schema and JSON file together allow SQL to generate accurate output.
How To: Read Standard JSON
This video shows you how to read data from the customer-order.json JSON file.
How To: Read JSONLines
This video shows you how to read data from this customer-ldl.json Line Delimited JSON file or also known as JSONL.
JSON Functions
The connector can return JSON data as column values. You can utilize the following JSON functions to extract the desired output. The examples below utilize the following JSON structure
[
{"id": 1, "name": "Ian Walker", "email": "[email protected]"},
{"id": 2, "name": "Jenny Blob", "email": "[email protected]"},
{"id": 3, "name": "Kelly Price", "email": "[email protected]"},
{"id": 4, "name": "Raj Mehra", "email": "[email protected]"}
]
JSON_EXTRACT
Use this function to extract individual values from an JSON object using JSON Path dot notation.
SELECT JSON_EXTRACT(name,'[0].name') as customer_name, JSON_EXTRACT(email, '[0].email') as customer_email FROM customer
Output will be
customer_name | customer_email |
---|---|
Ian Walker | [email protected] |
JSON_COUNT
Use this function to get the count of the array within a JSON object.
SELECT JSON_COUNT(id,'[x]') as no_of_customers FROM customer
Output will be
no_of_customers |
---|
4 |
JSON_MIX / JSON_MAX
Use these functions to get lowest or highest numeric value of an array.
SELECT JSON_MIN(id,'[x].id') as lowest_id, JSON_MAX(id,'[x].id') as highest_id FROM customer
Output will be
lowest_id | highest_id |
---|---|
1 | 4 |