JSON Parser Transformation
Plugin version: 2.11.0
Parses an input JSON event into a record. The input JSON event could be either a map of string fields to values or it could be a complex nested JSON structure. The plugin allows you to express JSON paths for extracting fields from complex nested input JSON.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Input Field | No | Required. Input field to be parsed as JSON. |
Json Path Mappings | No | Optional. Maps complex JSON to output fields using JSON path expressions. First field defines the output field and the second field specifies the JSON path expression, such as '$.employee.name.first'. |
Output Schema | No | Required. The output schema for the data. |
Parsing Simple JSON
Simple JSON (which is defined as a mapping from key to value) parsing is achieved by specifying just the output schema fields. The field name in the output schema should be the same as the key in the input JSON. The type of the output field should also be the same type as the input value. No implicit conversions are performed on the JSON values.
Here is an example of an event that is to be mapped to the output schema:
{
"id" : 1000,
"first" : "Joltie",
"last" : "Neutrino",
"email_id" : "joltie.neutrino@gmail.com",
"address" : "666 Mars Street",
"city" : "Marshfield",
"state" : "MR",
"country" : "Marshyland",
"zip" : 34553423,
"planet" : "Earth"
}
The output schema should be specified as:
Field | Type | Null |
---|---|---|
id | int |
|
first | string |
|
last | string |
|
email_id | string |
|
address | string |
|
city | string |
|
state | string |
|
country | string |
|
zip | long |
|
Note: The field "planet" has not been included in the output schema, meaning that the field would be ignored and not processed when the JSON event is mapped.
Parsing Nested JSON
Parser also allows the extracting of fields from a complex nested JSON. In order to extract fields, it uses JSON paths similar to the XPath expressions for XML. To extract fields using an expression in JSON, this plugin uses the 'JsonPath' library. The plugin allows you to define a mapping from the output fieldname to the JSON path expression that is to be applied on input to extract the value from the JSON event. For example, if you have this nested JSON:
{
"employee" : {
"name" : {
"first" : "Joltie",
"last" : "Neutrino"
},
"email" : "joltie.neutrino@gmail.com",
"address" : {
"street1" : "666, Mars Street",
"street2" : "",
"apt" : "",
"city" : "Marshfield",
"state" : "MR",
"zip" : 34553423,
"country" : "Marshyland"
}
}
}
You could specify the mapping for extracting these fields from the input JSON event as:
first
last
email
street1
city
state
zip
country
The mappings in the plugin will be:
Output Field Name | Input JSON Path Expression |
---|---|
first | $.employee.name.first |
last | $.employee.name.last |
$.employee.email | |
street | $.employee.address.street1 |
city | $.employee.address.city |
state | $.employee.address.state |
zip | $.employee.address.zip |
country | $.employee.address.country |
Expression
The "root member object" for parsing any JSON is referred to as $
, regardless of whether it's an array or an object. It also uses either dot notation or bracket notation for defining the levels of parsing. For example: $.employee.name
or $['employee']['name']
.
Supported Operators
These operators are supported:
Operator | Description |
---|---|
$ | The root element of the query |
* | Wildcard |
.. | Deep scan |
.<name> | Dot notation representing child |
[?(<expression>)] | Filter expression, should be boolean result always |
Supported Functions
The functions perform aggregations at the tail end of the path. The functions take the output of the expression as input for performing aggregations. The aggregation function can be only be applied where the expression results in an array.
Function | Type | Description |
---|---|---|
min | double | Minimum value of array of numbers |
max | double | Maximum value of array of numbers |
avg | double | Average value of array of numbers |
stddev | double | Standard deviation value of array of numbers |
length | int | Length of the array |
Configuration
Configuration | Description |
---|---|
field | Specifies the input field that should be parsed as a CSV Record |
mapping | Mapping specifying output field name to input JSON path for extracting the field |
schema | Specifies the output schema for the JSON Record |
Created in 2020 by Google Inc.