Skip to main content

Working with JSON


JSON is a lightweight, human-readable data format stored as text files with the .json extension. It can be validated and compared using rules in iceDQ. Users can access the JSON Files using SQL through the configured JSON Connector.

Understanding JSON Structure

JSON data is structured as key-value pairs, where each key is a string and each value can be any of the following data types: String, Number, Boolean, Array, Object

{
"Organization": "Acme Inc",
"OrganizationID": 100,
"isActive": true,
"Location": ["US", "IND", "AUS"],
"Customers": [
{
"CustomerID": 11012,
"Name": "Lauren Walker",
"BirthDate": "1972-01-18"
},
{
"CustomerID": 11013,
"Name": "Ian Jenkins",
"BirthDate": "1972-08-06"
}
]
}

Below is the breakdown of the above json.

  • Organization (String), OrganizationID (Number), isActive (Boolean) are key, value pairs
  • Location is an array of values (String)
  • Customers is an array of objects (Customer)
  • {"CustomerID":11012,"Name":"Lauren Walker","BirthDate":"1972-01-18"} is a Customer object

Understanding JSONPath

JSONPath is a query language for JSON, similar to XPath for XML. It allows you to select and extract data from a JSON document.

JSONPath expressions are used to traverse the path to an element in the JSON structure. It can be written in either dot notation or bracket notation. Dot notation is used to traverse the JSON structure by following the path to the element you want to extract. Bracket notation is used to extract elements from arrays.

Since a JSON structure is usually anonymous and doesn't necessarily have a "root member object" JSONPath assumes the abstract name $ assigned to the outer level object.

Following table shows different expressions and their output for above JSON example.

ExpressionOutputNote
$Complete JSON document is return
$.Organization"Acme Inc"Organization name is extracted from JSON
$.Location[0]"US"Returns zero index of the Location array
$.Customers[1]{ "CustomerID": 11013, "Name": "Ian Jenkins", "BirthDate": "1972-08-06" }Returns customer object on first index in Customers array.

Below are reference links to learn more about JSONPath.


Additional JSON Formats

Below are two additional formats which the connector can read from a JSON file.

JSON Lines

In this format each object is stored as a new line in the JSON file. The objects are not wrapped in an array.

{"CustomerID":11012,"Name":"Lauren Walker","BirthDate":"1989-01-18"}
{"CustomerID":11013,"Name":"Ian Jenkins","BirthDate":"2000-08-06"}

JSON Rows

In this format the keys are stored in a separate array and data is stored in a separate array. Sequence of data values matches that of the sequence of keys.

{
"report": {
"dimensions": [
"CustomerID",
"Name",
"BirthDate"
],
"data": [
[
[
11012,
"Lauren Walker",
"1989-01-18"
],
[
11013,
"Ian Jenkins",
"2000-08-06"
]
]
]
}
}

Parsing JSON Data

For this example following test data customer-order.json has been used.

This connector can convert object arrays and nested object arrays into a single table using an implicit join. It can automatically identify all object arrays in the JSON payload by parsing JSON paths, or users can specify which object arrays to convert by defining their own JSON paths.

When we run the following SQL against the customer-order.json file, the connector will:

  1. Flatten and implicitly join the customers object array and its nested address object array.
  2. Flatten and implicitly join the orders object array and its nested products object array.
  3. Cross join the results of steps 1 and 2 to generate the final table.

Example

SELECT id, name, zip, coordinates, customer_id, order_id, product_name, [metrics.price] 
FROM customer_order
WHERE zip IN (12345, 94105) OR order_id in (1, 3)

Output

idnamezipcoordinatescustomer_idorder_idproduct_namemetrics.price
1Ian Walker12345[-98.808,48.11]nullnullnullnull
2Jenny Blob94105nullnullnullnullnull
nullnullnullnull11Shirt50
nullnullnullnull11Shoes50
nullnullnullnull23Dress150

To understand how the connector behaves, you need to understand some additional key properties.

RowScanDepth

It determines how many object arrays the connector scans to find rows and columns. It follows nested objects, counting each object array as a row. The default value is 100.

FlattenObjects

This key property determines the columns of the output table by flattening nested objects into a series of columns when set to TRUE (the default).

Let's consider following document.

{
"id": 1,
"name": "Ian Walker",
"email": "[email protected]",
"address": { "state": "CA", "zip": "12345"},
"coordinates": [-98.808, 48.11]
}

When this property is enabled, the output table will have the following structure:

COLUMNDATATYPE
idinteger
namestring
emailstring
address.statestring
address.zipstring
coordinatesstring

and the data will look like this.

idnameemailaddress.stateaddress.zipcoordinates
1Ian Walker[email protected]CA12345[-98.808, 48.11]

FlattenArrays

It is another key property that can be used to flatten array values into columns of their own. However, It is best to leave other unbounded arrays as they are and piece out the data for them as needed using JSON Functions.

Let's consider following document

{"coordinates": [-98.808, 48.11]}

When this property value is set to 2, the output table will have the following structure:

COLUMNDATATYPE
coordinates.0float
coordinates.1float

Other Properties

  • FlattenRowLimit: The maximum number of rows that can result from a single flattened element. Default is 250000.
  • QualifyColumns: Controls whether the provider will use relative column names. Possible values None (default), Parent, Full

Understanding Settings (Source/ Target)

JSON file connectors have specific configurations. Please read the following properties carefully to understand the behaviour of the connector.

Single File This property specifies whether the driver should read data from a single JSON file or a folder of JSON files. If this property is enabled, the driver will read data from a single JSON file with the .json extension. If this property is disabled, the driver will read data from all the .json files in the folder and union them together. The structure of all the JSON files in the folder must be exactly the same.

Parameterised Path This property allows users to parameterize the file path and name using the parameter key. For this to work, the rule must have a default parameter file associated with it.

JSON Path Users can explicitly provide a JSON Path for the connector to identify the correct structure (nodes and attributes) in the JSON data. By default, the connector will identify the JSON Path by reading the first 200 records.

Additional Properties

The connector has some default properties that users can modify or add to, which changes how the connector processes JSON data.

PropertyValueNote
Table NameIt is auto generated which the user can modify.
JSON FormatJSON (Default), JSONRows, JSONLinesInput a value based on the JSON File Format.
DataModelFlattenedDocuments (Default), Documents
FlattenObjectTrue (Default), FalseView description
FlattenArraysAny numeric value. 0 is default.View description
QualifyColumnsNone (Default), Parent, FullQualify column names based on nested level.
FlattenRowLimitAny numeric value. 250000 is default.Limits the numbers of output rows.
RowScanDepthAny numeric value. 100 is default.Rows (objects) to scan when dynamically determining the schema.

Schema

Schema defines the structure of a JSON file by identifying its nodes, attributes, and data types. Schemas are stored as separate objects in a repository, each with a unique schema ID.

Users must click Get Schema and Save Schema whenever they add or change a property. Schema and JSON file together allow SQL to generate accurate output.


How To: Read Standard JSON

This video shows you how to read data from the customer-order.json JSON file.


How To: Read JSONLines

This video shows you how to read data from this customer-ldl.json Line Delimited JSON file or also known as JSONL.


JSON Functions

The connector can return JSON data as column values. You can utilize the following JSON functions to extract the desired output. The examples below utilize the following JSON structure

[
{"id": 1, "name": "Ian Walker", "email": "[email protected]"},
{"id": 2, "name": "Jenny Blob", "email": "[email protected]"},
{"id": 3, "name": "Kelly Price", "email": "[email protected]"},
{"id": 4, "name": "Raj Mehra", "email": "[email protected]"}
]

JSON_EXTRACT

Use this function to extract individual values from an JSON object using JSON Path dot notation.

SELECT JSON_EXTRACT(name,'[0].name') as customer_name, JSON_EXTRACT(email, '[0].email') as customer_email FROM customer

Output will be

customer_namecustomer_email
Ian Walker[email protected]

JSON_COUNT

Use this function to get the count of the array within a JSON object.

SELECT JSON_COUNT(id,'[x]') as no_of_customers FROM customer

Output will be

no_of_customers
4

JSON_MIX / JSON_MAX

Use these functions to get lowest or highest numeric value of an array.

SELECT JSON_MIN(id,'[x].id') as lowest_id, JSON_MAX(id,'[x].id') as highest_id FROM customer

Output will be

lowest_idhighest_id
14