Wrangler User Defined Directives

CDAP provides extensive support for user defined directives (UDDs) as a way to specify custom processing for wrangling. CDAP UDDs can currently be implemented in Java.

The most extensive support is provided for Java functions. Java functions are also more efficient because they are implemented in the same language as CDAP and Wrangler and because additional interfaces and integrations with other CDAP subsystems are supported.

User Defined Directives (UDDs) allow you to create custom functions to transform records within Wrangler. CDAP comes with a comprehensive library of functions. There are, however, some omissions and some specific cases for which a UDD is the solution.

UDDs, like user-defined functions (UDFs), have a long history of usefulness in SQL-derived languages and other data processing and query systems. However rich a framework's built-in functions are, they cannot anticipate everything a developer wants to do. Thus, the custom UDF has become commonplace in the data manipulation toolbox. To support this kind of customization and extension, CDAP now lets you build your own functions for manipulating data through UDDs.

Developing Wrangler UDDs is by no means rocket science, and it is an effective way of solving problems that would otherwise be impossible, very awkward to solve, or solvable only in ways that do not meet your requirements.

User Defined Directives (UDDs), or custom directives, are an easier and simpler way for users to build and integrate custom directives with Wrangler. The UDD framework allows users to develop, deploy, and use data processing directives within the data preparation tool.

Building a custom directive involves implementing four simple methods:

  • D -- define() -- Define how the framework should interpret the arguments.

  • I -- initialize() -- Invoked by the framework to initialize the custom directive with arguments parsed.

  • E -- execute() -- Execute and apply your business logic for transforming the Row.

  • D -- destroy() -- Invoked by the framework to release any resources held by the directive.
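
The four-step lifecycle above can be sketched in plain Java. This is a minimal illustration only: the `SketchDirective` interface, the `UpperCaseColumn` directive, and the `UddLifecycleSketch` driver are hypothetical stand-ins invented for this example, and rows are modeled as plain maps. They are not the real Wrangler interfaces, whose types and signatures differ; consult the linked documentation for the actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for the directive contract.
// NOT the real Wrangler API; it only mirrors the four lifecycle methods.
interface SketchDirective {
  // D: declare the names of the expected arguments, in order.
  List<String> define();

  // I: receive the arguments after the framework has parsed them.
  void initialize(Map<String, String> args);

  // E: apply the transformation and return the resulting rows.
  List<Map<String, Object>> execute(List<Map<String, Object>> rows);

  // D: release anything the directive is holding on to.
  void destroy();
}

// Example directive: upper-cases the string value of one column.
class UpperCaseColumn implements SketchDirective {
  private String column;

  @Override
  public List<String> define() {
    return Collections.singletonList("column");
  }

  @Override
  public void initialize(Map<String, String> args) {
    column = args.get("column");
    if (column == null) {
      throw new IllegalArgumentException("missing argument: column");
    }
  }

  @Override
  public List<Map<String, Object>> execute(List<Map<String, Object>> rows) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      // Copy each row so the caller's input is left unchanged.
      Map<String, Object> copy = new LinkedHashMap<>(row);
      Object value = copy.get(column);
      if (value instanceof String) {
        copy.put(column, ((String) value).toUpperCase(Locale.ROOT));
      }
      out.add(copy);
    }
    return out;
  }

  @Override
  public void destroy() {
    column = null;
  }
}

// Hypothetical driver that plays the role of the framework.
class UddLifecycleSketch {
  static List<Map<String, Object>> run(SketchDirective directive,
                                       List<String> rawArgs,
                                       List<Map<String, Object>> rows) {
    List<String> names = directive.define();
    if (names.size() != rawArgs.size()) {
      throw new IllegalArgumentException("expected " + names.size() + " argument(s)");
    }
    // Pair each declared argument name with the value supplied by the user.
    Map<String, String> parsed = new LinkedHashMap<>();
    for (int i = 0; i < names.size(); i++) {
      parsed.put(names.get(i), rawArgs.get(i));
    }
    directive.initialize(parsed);
    try {
      return directive.execute(rows);
    } finally {
      // destroy() runs even if execute() throws.
      directive.destroy();
    }
  }
}
```

In this sketch the driver calls `define()` once to learn which arguments to expect, hands the parsed values to `initialize()`, and wraps `execute()` in `try`/`finally` so that `destroy()` always runs. Whether the real framework orders or guards these calls in the same way is an assumption of the example.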

Related documentation

  • Information about Grammar here

  • Custom Directive Implementation Internals here

  • Migrating directives from version 1.0 to version 2.0 here

  • Various TokenType supported by system here

Created in 2020 by Google Inc.