Working with JSON

JSON is a lightweight, human-readable data format stored as text files with the .json extension. It can be validated and compared using rules in iceDQ. Users can access the JSON Files using SQL through the configured JSON Connector.

Understanding JSON Structure

JSON data is structured as key-value pairs, where each key is a string and each value can be any of the following data types: String, Number, Boolean, Array, Object

{
  "Organization": "Acme Inc",
  "OrganizationID": 100,
  "isActive": true,
  "Location": ["US", "IND", "AUS"],
  "Customers": [
    {
      "CustomerID": 11012,
      "Name": "Lauren Walker",
      "BirthDate": "1972-01-18"
    },
    {
      "CustomerID": 11013,
      "Name": "Ian Jenkins",
      "BirthDate": "1972-08-06"
    }
  ]
}

Below is the breakdown of the above json.

Organization (String), OrganizationID (Number), isActive (Boolean) are key, value pairs
Location is an array of values (String)
Customers is an array of objects (Customer)
{"CustomerID":11012,"Name":"Lauren Walker","BirthDate":"1972-01-18"} is a Customer object

Understanding JSONPath

JSONPath is a query language for JSON, similar to XPath for XML. It allows you to select and extract data from a JSON document.

JSONPath expressions are used to traverse the path to an element in the JSON structure. It can be written in either dot notation or bracket notation. Dot notation is used to traverse the JSON structure by following the path to the element you want to extract. Bracket notation is used to extract elements from arrays.

Since a JSON structure is usually anonymous and doesn't necessarily have a "root member object" JSONPath assumes the abstract name $ assigned to the outer level object.

Following table shows different expressions and their output for above JSON example.

Expression	Output	Note
$	Complete JSON document is return
$.Organization	"Acme Inc"	Organization name is extracted from JSON
$.Location[0]	"US"	Returns zero index of the Location array
$.Customers[1]	`{ "CustomerID": 11013, "Name": "Ian Jenkins", "BirthDate": "1972-08-06" }`	Returns customer object on first index in Customers array.

Below are reference links to learn more about JSONPath.

Additional JSON Formats

Below are two additional formats which the connector can read from a JSON file.

JSON Lines

In this format each object is stored as a new line in the JSON file. The objects are not wrapped in an array.

{"CustomerID":11012,"Name":"Lauren Walker","BirthDate":"1989-01-18"}
{"CustomerID":11013,"Name":"Ian Jenkins","BirthDate":"2000-08-06"}

JSON Rows

In this format the keys are stored in a separate array and data is stored in a separate array. Sequence of data values matches that of the sequence of keys.

{
  "report": {
    "dimensions": [
      "CustomerID",
      "Name",
      "BirthDate"
    ],
    "data": [
      [
        [
          11012,
          "Lauren Walker",
          "1989-01-18"
        ],
        [
          11013,
          "Ian Jenkins",
          "2000-08-06"
        ]
      ]
    ]
  }
}

Parsing JSON Data

For this example following test data customer-order.json has been used.

This connector can convert object arrays and nested object arrays into a single table using an implicit join. It can automatically identify all object arrays in the JSON payload by parsing JSON paths, or users can specify which object arrays to convert by defining their own JSON paths.

When we run the following SQL against the customer-order.json file, the connector will:

Flatten and implicitly join the customers object array and its nested address object array.
Flatten and implicitly join the orders object array and its nested products object array.
Cross join the results of steps 1 and 2 to generate the final table.

Example

SELECT id, name, zip, coordinates, customer_id, order_id, product_name, [metrics.price] 
    FROM customer_order 
        WHERE zip IN (12345, 94105) OR order_id in (1, 3)

Output

id	name	zip	coordinates	customer_id	order_id	product_name	metrics.price
1	Ian Walker	12345	[-98.808,48.11]	null	null	null	null
2	Jenny Blob	94105	null	null	null	null	null
null	null	null	null	1	1	Shirt	50
null	null	null	null	1	1	Shoes	50
null	null	null	null	2	3	Dress	150

To understand how the connector behaves, you need to understand some additional key properties.

RowScanDepth

It determines how many object arrays the connector scans to find rows and columns. It follows nested objects, counting each object array as a row. The default value is 100.

FlattenObjects

This key property determines the columns of the output table by flattening nested objects into a series of columns when set to TRUE (the default).

Let's consider following document.

{
  "id": 1,
  "name": "Ian Walker",
  "email": "[email protected]",
  "address": { "state": "CA", "zip": "12345"},
  "coordinates": [-98.808, 48.11]
}

When this property is enabled, the output table will have the following structure:

COLUMN	DATATYPE
id	integer
name	string
email	string
address.state	string
address.zip	string
coordinates	string

and the data will look like this.

id	name	email	address.state	address.zip	coordinates
1	Ian Walker	[email protected]	CA	12345	[-98.808, 48.11]

FlattenArrays

It is another key property that can be used to flatten array values into columns of their own. However, It is best to leave other unbounded arrays as they are and piece out the data for them as needed using JSON Functions.

Let's consider following document

{"coordinates": [-98.808, 48.11]}

When this property value is set to 2, the output table will have the following structure:

COLUMN	DATATYPE
coordinates.0	float
coordinates.1	float

Other Properties

FlattenRowLimit: The maximum number of rows that can result from a single flattened element. Default is 250000.
QualifyColumns: Controls whether the provider will use relative column names. Possible values None (default), Parent, Full

Understanding Settings (Source/ Target)

JSON file connectors have specific configurations. Please read the following properties carefully to understand the behaviour of the connector.

Single File This property specifies whether the driver should read data from a single JSON file or a folder of JSON files. If this property is enabled, the driver will read data from a single JSON file with the .json extension. If this property is disabled, the driver will read data from all the .json files in the folder and union them together. The structure of all the JSON files in the folder must be exactly the same.

Parameterised Path This property allows users to parameterize the file path and name using the parameter key. For this to work, the rule must have a default parameter file associated with it.

JSON Path Users can explicitly provide a JSON Path for the connector to identify the correct structure (nodes and attributes) in the JSON data. By default, the connector will identify the JSON Path by reading the first 200 records.

Additional Properties

The connector has some default properties that users can modify or add to, which changes how the connector processes JSON data.

Property	Value	Note
Table Name		It is auto generated which the user can modify.
JSON Format	JSON (Default), JSONRows, JSONLines	Input a value based on the JSON File Format.
DataModel	FlattenedDocuments (Default), Documents
FlattenObject	True (Default), False	View description
FlattenArrays	Any numeric value. 0 is default.	View description
QualifyColumns	None (Default), Parent, Full	Qualify column names based on nested level.
FlattenRowLimit	Any numeric value. 250000 is default.	Limits the numbers of output rows.
RowScanDepth	Any numeric value. 100 is default.	Rows (objects) to scan when dynamically determining the schema.

Schema

Schema defines the structure of a JSON file by identifying its nodes, attributes, and data types. Schemas are stored as separate objects in a repository, each with a unique schema ID.

Users must click Get Schema and Save Schema whenever they add or change a property. Schema and JSON file together allow SQL to generate accurate output.

How To: Read Standard JSON

This video shows you how to read data from the customer-order.json JSON file.

How To: Read JSONLines

This video shows you how to read data from this customer-ldl.json Line Delimited JSON file or also known as JSONL.

JSON Functions

The connector can return JSON data as column values. You can utilize the following JSON functions to extract the desired output. The examples below utilize the following JSON structure

[
{"id": 1, "name": "Ian Walker", "email": "[email protected]"},
{"id": 2, "name": "Jenny Blob", "email": "[email protected]"},
{"id": 3, "name": "Kelly Price", "email": "[email protected]"},
{"id": 4, "name": "Raj Mehra", "email": "[email protected]"}
]

JSON_EXTRACT

Use this function to extract individual values from an JSON object using JSON Path dot notation.

SELECT JSON_EXTRACT(name,'[0].name') as customer_name, JSON_EXTRACT(email, '[0].email') as customer_email FROM customer

Output will be

customer_name	customer_email
Ian Walker	[email protected]

JSON_COUNT

Use this function to get the count of the array within a JSON object.

SELECT JSON_COUNT(id,'[x]') as no_of_customers FROM customer

Output will be

no_of_customers
4

JSON_MIX / JSON_MAX

Use these functions to get lowest or highest numeric value of an array.

SELECT JSON_MIN(id,'[x].id') as lowest_id, JSON_MAX(id,'[x].id') as highest_id FROM customer

Output will be

lowest_id	highest_id
1	4

Understanding JSON Structure​

Understanding JSONPath​

Additional JSON Formats​

JSON Lines​

JSON Rows​

Parsing JSON Data​

Example​

Output​

RowScanDepth​

FlattenObjects​

FlattenArrays​

Other Properties​

Understanding Settings (Source/ Target)​

Additional Properties​

Schema​

How To: Read Standard JSON​

How To: Read JSONLines​

JSON Functions​

JSON_EXTRACT​

JSON_COUNT​

JSON_MIX / JSON_MAX​

Understanding JSON Structure

Understanding JSONPath

Additional JSON Formats

JSON Lines

JSON Rows

Parsing JSON Data

Example

Output

RowScanDepth

FlattenObjects

FlattenArrays

Other Properties

Understanding Settings (Source/ Target)

Additional Properties

Schema

How To: Read Standard JSON

How To: Read JSONLines

JSON Functions

JSON_EXTRACT

JSON_COUNT

JSON_MIX / JSON_MAX