Parse XML to JSON directive

The PARSE-XML-TO-JSON directive parses an XML document into a JSON structure. It operates on an input column of type string. Applying the directive transforms the XML into a JSON document, which simplifies further parsing with the PARSE-AS-JSON directive.

Syntax

parse-xml-to-json :column [depth] :keepStrings [boolean]
  • column is the name of the column in the record that contains the XML document.

  • depth is the depth at which parsing of the XML document should terminate.

  • keepStrings is an optional boolean value. If true, values are not coerced into boolean or numeric values and are left as strings. The default value is false.

Note: The keepStrings option was introduced in CDAP 6.10.1.
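
To make the effect of keepStrings concrete, the following Python sketch mimics the kind of value coercion described above. It is an illustration only, not the directive's implementation: the function name coerce and the exact rules (what counts as a boolean or a number) are assumptions made for this example, and the directive's own rules may differ in detail.

```python
def coerce(value: str, keep_strings: bool = False):
    """Hypothetical helper: turn text into a bool/int/float unless keep_strings is set."""
    if keep_strings:
        # Mirrors keepStrings = true: leave every value as a string.
        return value
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Try the narrowest numeric type first, then fall back to float.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    # Not a boolean and not a number: keep the original text.
    return value


# A zip-code-like value loses its leading zeros when coerced in this sketch,
# which is one situation where keeping strings may be preferable.
print(coerce("00123"))                     # numeric result
print(coerce("00123", keep_strings=True))  # original text preserved
```

In this sketch, values such as identifiers or codes that only look numeric are changed by coercion, so setting the flag to true preserves them exactly as they appear in the XML.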

Usage Notes

The PARSE-XML-TO-JSON directive parses an XML document and presents it as a JSON object for further transformation.

The XML document contains elements, attributes, and content text. A sequence of similar elements is turned into a JSON array, which can then be further parsed using the PARSE-AS-JSON directive.
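
The mapping of elements, attributes, and repeated elements can be sketched in Python as follows. This is a simplified illustration built on the standard library, not the directive's code: the function names, the "content" key used for text that sits alongside attributes or child elements, and the exact output shape are assumptions for this example. Value coercion is left out, so all values stay strings.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem):
    """Hypothetical helper: convert one element into a dict or a plain string."""
    node = dict(elem.attrib)  # attributes become keys of the JSON object
    for child in elem:
        value = element_to_json(child)
        if child.tag in node:
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)  # third and later repeats
            else:
                node[child.tag] = [existing, value]  # second repeat starts an array
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element that holds only text
    if text:
        node["content"] = text  # text next to attributes or children
    return node


def xml_to_json(xml: str) -> dict:
    """Parse an XML string and wrap the result under the root element's name."""
    root = ET.fromstring(xml)
    return {root.tag: element_to_json(root)}


doc = '<order id="7"><item>pen</item><item>ink</item><note>rush</note></order>'
print(json.dumps(xml_to_json(doc)))
# prints {"order": {"id": "7", "item": ["pen", "ink"], "note": "rush"}}
```

The two item elements collapse into a single array under the item key, while the single note element stays a scalar. A consumer of the JSON should therefore be prepared for the same element name to appear as either a value or an array, depending on how many times it occurs.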

During parsing, comments, prologs, DTDs, and <[[ ]]> notations are ignored.

Created in 2020 by Google Inc.