Plugins
A Plugin is a Java class that extends an application class by implementing an interface expected by the application class. A plugin can be packaged in a separate artifact from the application class that uses it.
Plugin Usage
You tell CDAP that a class is a Plugin by annotating the class with the type and name of the plugin. For example:
@Plugin(type = "runnable")
@Name("noop")
public class NoOpRunnable implements Runnable {
  @Override
  public void run() {
    // do nothing
  }
}
A program can register a plugin at configure time (application creation time) by specifying the plugin type, name, properties, and assigning an id to the plugin:
public class ExampleWorker extends AbstractWorker {
  @Override
  public void configure() {
    // Register plugin type "runnable", name "noop", under the plugin id "id".
    usePlugin("runnable", "noop", "id", PluginProperties.builder().build());
  }
}
Once registered, the plugin can be instantiated and used at runtime using the plugin id it was registered with:
public class ExampleWorker extends AbstractWorker {
  private Runnable runnable;

  @Override
  public void configure() {
    usePlugin("runnable", "noop", "id", PluginProperties.builder().build());
  }

  @Override
  public void initialize(WorkerContext context) throws Exception {
    // Instantiate the plugin using the id it was registered with in configure().
    runnable = context.newPluginInstance("id");
  }

  @Override
  public void run() {
    runnable.run();
  }
}
Plugin Config
A Plugin can also make use of the PluginConfig class to configure itself. Suppose we want to modify our no-op runnable to print a configurable message. We can do this by adding a PluginConfig, passing it into the constructor, and setting it as a field.
Your extension to PluginConfig must contain only primitive, boxed primitive, or String types. The PluginConfig passed in to the Plugin has its fields populated using the PluginProperties specified when the Plugin was registered. In this example, if we want the message to be "Hello CDAP!", the PluginProperties given at registration must include a message property with that value.
The @Nullable annotation tells CDAP that the field is not required. Without that annotation, CDAP will complain if no plugin property for that field is given.
Configuration fields can be annotated with an @Description that will be returned by the Artifact Microservices Plugin Detail.
The @Macro annotation makes the field message macro-enabled; this allows the value of the field message to be a "macro key" whose value will be set at runtime.
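As a rough illustration of the pieces described above, here is a minimal sketch of a runnable plugin with a nested PluginConfig. To keep the sketch compilable on its own, the annotations and the PluginConfig base class are declared locally as stand-ins; in a real plugin they would be imported from the CDAP API (and @Nullable from javax.annotation) rather than declared. The nested class name Conf, the "Hello World!" default, and the convenience constructor are assumptions made for the example.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Local stand-ins so this sketch compiles without the CDAP jars (assumption:
// a real plugin imports these types from the CDAP API instead).
@Retention(RetentionPolicy.RUNTIME) @interface Plugin { String type(); }
@Retention(RetentionPolicy.RUNTIME) @interface Name { String value(); }
@Retention(RetentionPolicy.RUNTIME) @interface Description { String value(); }
@Retention(RetentionPolicy.RUNTIME) @interface Macro { }
@Retention(RetentionPolicy.RUNTIME) @interface Nullable { }
abstract class PluginConfig { }

@Plugin(type = "runnable")
@Name("noop")
class NoOpRunnable implements Runnable {

  // Only primitive, boxed primitive, or String fields are allowed here.
  static class Conf extends PluginConfig {
    @Name("message")
    @Description("Message to print")
    @Macro
    @Nullable
    private String message;

    Conf() {
      // Hypothetical default, used when no "message" property is supplied.
      this.message = "Hello World!";
    }

    // Convenience constructor for local experiments only; in CDAP the field
    // is populated from the PluginProperties given at registration.
    Conf(String message) {
      this.message = message;
    }
  }

  private final Conf conf;

  NoOpRunnable(Conf conf) {
    this.conf = conf;
  }

  @Override
  public void run() {
    System.out.println(conf.message);
  }
}

class NoOpRunnableDemo {
  public static void main(String[] args) {
    new NoOpRunnable(new NoOpRunnable.Conf()).run();              // prints "Hello World!"
    new NoOpRunnable(new NoOpRunnable.Conf("Hello CDAP!")).run(); // prints "Hello CDAP!"
  }
}
```

In an application, the equivalent of the second call would be to add a message property to the PluginProperties builder passed to usePlugin, instead of constructing Conf by hand.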
Third-Party Plugins
Sometimes there is a need to use classes in a third-party JAR as plugins. For example, you may want to be able to use a JDBC driver as a plugin. In these situations, you have no control over the code, which means you cannot annotate the relevant class with the @Plugin annotation. If this is the case, you can explicitly specify the plugins when deploying the artifact. For example, if you are using the Microservices, you set the Artifact-Plugins, Artifact-Version, and Artifact-Extends headers when deploying the artifact:
Or, using the CDAP CLI:
where config.json contains:
Using the CLI:
Plugin Deployment
To make plugins available to another artifact (and thus available to any application created from one of the artifacts), the plugins must first be packaged in a JAR file. After that, the JAR file must be deployed either as a system artifact or a user artifact.
A system artifact is available to users across any namespace. A user artifact is available only to users in the namespace to which it is deployed. By design, deploying as a user artifact requires only access to the Artifact Microservices, while deploying as a system artifact requires access to the filesystem of the CDAP Master, and therefore administrator access and permission.
Plugin Packaging
A Plugin is packaged as a JAR file, which contains the plugin classes and their dependencies. CDAP uses the "Export-Package" attribute in the JAR file manifest to determine which classes are visible. A visible class is one that can be used by another class that is not from the plugin JAR itself. This means the Java package which the plugin class is in must be listed in "Export-Package", otherwise the plugin class will not be visible, and hence no one will be able to use it. This can be done in Maven by editing your pom.xml. For example, if your plugins are in the com.example.runnable and com.example.callable packages, you would edit the bundler plugin in your pom.xml:
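Separately from the pom.xml change, it can be useful to confirm that the built JAR really lists the expected packages in its manifest. The following is a minimal sketch of such a check using only the JDK. The class ExportPackageSketch and its method are hypothetical helpers, not part of CDAP, and the parsing assumes each comma-separated clause names a single package followed by optional attributes or directives.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.jar.Manifest;

// Hypothetical helper for inspecting a built JAR; this is not how CDAP
// itself reads the manifest.
final class ExportPackageSketch {

  // Returns the package names listed in Export-Package, dropping any
  // attributes or directives such as ;version="1.0.0" or ;uses:="...".
  static Set<String> exportedPackages(Manifest manifest) {
    Set<String> packages = new LinkedHashSet<>();
    String header = manifest.getMainAttributes().getValue("Export-Package");
    if (header == null) {
      return packages;
    }
    StringBuilder clause = new StringBuilder();
    boolean inQuotes = false;
    for (char c : header.toCharArray()) {
      if (c == '"') {
        inQuotes = !inQuotes;
      }
      // Commas inside quoted values (for example in uses:="a,b") do not
      // separate clauses, so only split on commas outside quotes.
      if (c == ',' && !inQuotes) {
        addPackage(clause.toString(), packages);
        clause.setLength(0);
      } else {
        clause.append(c);
      }
    }
    addPackage(clause.toString(), packages);
    return packages;
  }

  private static void addPackage(String clause, Set<String> packages) {
    // Keep only the part before the first ';' (the package name).
    String name = clause.split(";", 2)[0].trim();
    if (!name.isEmpty()) {
      packages.add(name);
    }
  }
}
```

The Manifest can be obtained from a JAR on disk with java.util.jar.JarFile and its getManifest method. For the example above, the returned set should contain both com.example.runnable and com.example.callable.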
Deploying as a System Artifact
To deploy the artifact as a system artifact, both the JAR file and a matching configuration file must be placed in the appropriate directory.
CDAP Sandbox: $CDAP_INSTALL_DIR/artifacts
Distributed CDAP: The plugin JARs should be placed in the local file system and the path can be provided to CDAP by setting the property app.artifact.dir in cdap-site.xml. Multiple directories can be defined by separating them with a semicolon. The default path is /opt/cdap/master/artifacts.
For each plugin JAR, there must also be a corresponding configuration file to specify which artifacts can use the plugins. The file name must match the name of the JAR, except it must have the .json extension instead of the .jar extension. For example, if your JAR file is named custom-transforms-1.0.0.jar, there must be a corresponding custom-transforms-1.0.0.json file. If your custom-transforms-1.0.0.jar contains transforms that can be used by both the cdap-data-pipeline and cdap-data-streams artifacts, custom-transforms-1.0.0.json would contain:
This file specifies that the plugins in custom-transforms-1.0.0.jar can be used by version 6.2.0 of the cdap-data-pipeline and cdap-data-streams artifacts. You can also specify a wider range of versions that can use the plugins, with square brackets [ ] indicating an inclusive version and parentheses ( ) indicating an exclusive version. For example:
A range written as [3.5.0,4.0.0) specifies that these plugins can be used by versions 3.5.0 (inclusive) to 4.0.0 (exclusive) of the cdap-data-pipeline and cdap-data-streams artifacts.
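To make the inclusive and exclusive bounds concrete, here is a small sketch that checks whether a version falls inside a range written in this bracket notation. It is an illustration of the notation only: VersionRangeSketch is a hypothetical helper, not CDAP's own parser, and it assumes purely numeric dotted versions (qualifiers such as -SNAPSHOT are not handled).

```java
import java.util.Arrays;

// Illustration only: a hypothetical helper, not CDAP's version-range parser.
final class VersionRangeSketch {

  // Compares dotted numeric versions such as "3.5.0" and "4.0.0";
  // missing trailing components are treated as 0.
  static int compare(String a, String b) {
    int[] x = parse(a);
    int[] y = parse(b);
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? x[i] : 0;
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  private static int[] parse(String version) {
    return Arrays.stream(version.trim().split("\\."))
        .mapToInt(Integer::parseInt)
        .toArray();
  }

  // range looks like "[3.5.0,4.0.0)": '[' or ']' means the bound is
  // included, '(' or ')' means it is excluded.
  static boolean contains(String range, String version) {
    String r = range.trim();
    boolean lowerInclusive = r.charAt(0) == '[';
    boolean upperInclusive = r.charAt(r.length() - 1) == ']';
    String[] bounds = r.substring(1, r.length() - 1).split(",");
    int vsLower = compare(version, bounds[0]);
    int vsUpper = compare(version, bounds[1]);
    boolean aboveLower = lowerInclusive ? vsLower >= 0 : vsLower > 0;
    boolean belowUpper = upperInclusive ? vsUpper <= 0 : vsUpper < 0;
    return aboveLower && belowUpper;
  }

  public static void main(String[] args) {
    System.out.println(contains("[3.5.0,4.0.0)", "3.5.0")); // true
    System.out.println(contains("[3.5.0,4.0.0)", "4.0.0")); // false
  }
}
```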
If the artifact contains third-party plugins, you can explicitly list them in the config file. For example, you may want to deploy a JDBC driver contained in a third-party JAR. In these cases, you have no control over the code to annotate the classes that should be plugins, so you need to list them in the configuration:
Once your JARs and matching configuration files are in place, a CDAP CLI command (load artifact) or a Microservices call to load system artifacts can be made to load the artifacts. As described in the documentation on Artifacts, only snapshot artifacts can be re-deployed without requiring that they first be deleted.
Alternatively, restart the CDAP Sandbox (in local sandbox mode) or the cdap-master services (in Distributed mode) for the change to take effect.
Deploying as a User Artifact
To deploy the artifact as a user artifact, use the Artifact Microservices API Add Artifact or the CLI.
When using the Microservices, you will need to specify the Artifact-Extends header. Unless the artifact's version is defined in the manifest file of the JAR file you upload, you will also need to specify the Artifact-Version header.
When using the CLI, a configuration file exactly like the one described in the “Deploying as a System Artifact” section must be used.
For example, to deploy custom-transforms-1.0.0.jar using the Microservices:
Using the CLI:
where config.json contains:
Note that when deploying a user artifact that extends a system artifact, you must prefix the parent artifact name with 'system:'. The prefix disambiguates the parent in case there is a user artifact with the same name as the system artifact. If you are extending a user artifact, no prefix is required.
You can deploy third-party JARs in the same way except the plugin information needs to be explicitly listed. As described in the documentation on Artifacts, only snapshot artifacts can be re-deployed without requiring that they first be deleted.
Using the Microservices (note that if the artifact version is not in the JAR manifest file, it needs to be set explicitly, as the JAR contents are uploaded without the filename):
Using the CLI (note that the artifact version, if not explicitly set, is derived from the JAR filename):
where config.json contains:
Deployment Verification
You can verify that a plugin artifact was added successfully by using the Artifact Microservices to retrieve artifact details. For example, to retrieve detail about our custom-transforms artifact:
Using the CLI:
If you deployed the custom-transforms artifact as a system artifact, the scope is system. If you deployed the custom-transforms artifact as a user artifact, the scope is user.
You can verify that the plugins in your newly-added artifact are available to its parent by using the Artifact Microservices to list plugins of a specific type. For example, to check if cdap-data-pipeline can access the plugins in the custom-transforms artifact:
Using the CLI:
You can then check the list returned to see if your transforms are in the list. Note that the scope here refers to the scope of the parent artifact. In this example it is the system scope because cdap-data-pipeline is a system artifact. This is true even if you deployed custom-transforms as a user artifact because the parent is still a system artifact.
Example Use Case
When writing an application class, it is often useful to create interfaces or abstract classes that define a logical contract in your code, but do not provide an implementation of that contract. This lets you plug in different implementations to fit your needs.
Classic WordCount Example
For example, consider the classic word count example for MapReduce. The program reads files, tokenizes lines in those files into words, and then counts how many times each word appears. The code consists of several classes:
We package our code into a JAR file named wordcount-1.0.0.jar and add it to CDAP:
We then create an application from that artifact:
This program runs just fine. It counts all words in the input. However, what if we want to count phrases instead of words? Or what if we want to filter out common words such as 'the' and 'a'? We would not want to copy and paste our application class and then make just small tweaks.
A Configurable Application
Instead, we would like to be able to create applications that are configured to tokenize the line in different ways. That is, if we want an application that filters stopwords, we want to be able to create it through a configuration:
Similarly, we want to be able to create an application that counts phrases through a configuration:
This is possible by changing our code to use the Plugin framework. The first thing we need to do is introduce a Tokenizer interface:
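A minimal sketch of what such an interface could look like is shown below. The method name tokenize and its signature are assumptions made for illustration; the only requirement stated here is that the mapper can hand a line to the plugin and get tokens back. In the example project the interface would live in a package such as com.example.api; the package declaration is omitted so the sketch stands alone.

```java
// Assumed shape of the contract; the method name and signature are illustrative.
interface Tokenizer {

  // Splits one line of input into the tokens that should be counted.
  Iterable<String> tokenize(String line);
}
```

Because the contract has a single method, any class (or lambda) that turns a line into tokens can serve as an implementation, which is what lets different tokenizers be plugged in later.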
Now we change our WordCountMapper to use the plugin framework to instantiate and use a Tokenizer:
The key method we added was the initialize method. In it, we are using CDAP's plugin framework to instantiate a plugin of type Tokenizer, identified by tokenizerId. This code runs when the MapReduce program runs. In order for CDAP to know which plugin tokenizerId refers to, we will need to register the plugin in our application's configure method. We change our application code to use a configuration object that will specify the name of the Tokenizer to use, and register that plugin:
CDAP will take whatever is specified in the config section of the application creation request and convert it into the Config object expected by the application class. If it receives this request:
the TokenizerConfig will have its tokenizer field set to phrase.
This allows us to configure which tokenizer should be used when creating an application. Since we want other artifacts to implement the Tokenizer interface, we need to make sure the class is exposed to other artifacts. We do this by including the Tokenizer's package in the Export-Package manifest attribute of our JAR file. For example, if the full class name of our Tokenizer is com.example.api.Tokenizer, we need to expose the com.example.api package in our pom.xml:
We then package the code in a new version of the artifact wordcount-1.1.0.jar and deploy it:
Implementing Tokenizer Plugins
Finally, we need to implement some tokenizer plugins. Plugins are just Java classes that have been annotated with a plugin type and name:
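The sketch below shows what two such tokenizers might look like. Several things here are assumptions made for the example: the plugin type string "tokenizer", the plugin names "default" and "stopword", the Tokenizer method signature, and the stopword list. The @Plugin and @Name annotations are declared locally as stand-ins so the sketch compiles on its own; real plugins would import them from the CDAP API instead.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Local stand-ins so this sketch compiles without the CDAP jars (assumption:
// real plugins import these annotations from the CDAP API).
@Retention(RetentionPolicy.RUNTIME) @interface Plugin { String type(); }
@Retention(RetentionPolicy.RUNTIME) @interface Name { String value(); }

// Assumed shape of the contract exported by the wordcount artifact.
interface Tokenizer {
  Iterable<String> tokenize(String line);
}

// Splits a line into words on whitespace.
@Plugin(type = "tokenizer")
@Name("default")
class DefaultTokenizer implements Tokenizer {
  @Override
  public Iterable<String> tokenize(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty()) {
      return new ArrayList<>();
    }
    return Arrays.asList(trimmed.split("\\s+"));
  }
}

// Splits a line into words on whitespace and drops common words.
@Plugin(type = "tokenizer")
@Name("stopword")
class StopwordTokenizer implements Tokenizer {
  // Hypothetical stopword list, kept tiny for the example.
  private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a"));

  @Override
  public Iterable<String> tokenize(String line) {
    List<String> words = new ArrayList<>();
    for (String word : line.trim().split("\\s+")) {
      if (!word.isEmpty() && !STOPWORDS.contains(word.toLowerCase())) {
        words.add(word);
      }
    }
    return words;
  }
}

class TokenizerDemo {
  public static void main(String[] args) {
    System.out.println(new DefaultTokenizer().tokenize("the quick fox"));    // [the, quick, fox]
    System.out.println(new StopwordTokenizer().tokenize("the quick a fox")); // [quick, fox]
  }
}
```

A phrase tokenizer would follow the same pattern with a different plugin name and splitting rule.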
We package these tokenizers in a separate artifact named tokenizers-1.0.0.jar. In order to make these plugins visible to programs using them, we need to include their packages in the Export-Package manifest attribute. For example, if our classes are all in the com.example.tokenizer package, we need to expose the com.example.tokenizer package in our pom.xml:
When deploying this artifact, we tell CDAP that the artifact extends the wordcount artifact, versions 1.1.0 inclusive to 2.0.0 exclusive:
This will make the plugins available to those versions of the wordcount artifact. We can now create applications that use the tokenizer we want:
Adding a Plugin Configuration to the Application
After a while, we find that we need to support reading files where words are delimited by a character other than a space. We decide to modify our DefaultTokenizer to use a PluginConfig that contains a property for the delimiter:
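One possible shape for the modified tokenizer is sketched below. As before, the annotations and the PluginConfig base class are local stand-ins so the sketch compiles without CDAP, and the plugin type "tokenizer", the nested class name Conf, and the Tokenizer signature are assumptions. The delimiter field carries no @Nullable annotation, so the property would be required at registration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Pattern;

// Local stand-ins so this sketch compiles without the CDAP jars (assumption:
// a real plugin imports these types from the CDAP API).
@Retention(RetentionPolicy.RUNTIME) @interface Plugin { String type(); }
@Retention(RetentionPolicy.RUNTIME) @interface Name { String value(); }
@Retention(RetentionPolicy.RUNTIME) @interface Description { String value(); }
abstract class PluginConfig { }

// Assumed shape of the contract exported by the wordcount artifact.
interface Tokenizer {
  Iterable<String> tokenize(String line);
}

@Plugin(type = "tokenizer")
@Name("default")
class DefaultTokenizer implements Tokenizer {

  static class Conf extends PluginConfig {
    @Name("delimiter")
    @Description("Delimiter that separates words in a line")
    private String delimiter;

    // Convenience constructor for local experiments only; in CDAP the field
    // is populated from the PluginProperties given at registration.
    Conf(String delimiter) {
      this.delimiter = delimiter;
    }
  }

  private final Conf conf;

  DefaultTokenizer(Conf conf) {
    this.conf = conf;
  }

  @Override
  public Iterable<String> tokenize(String line) {
    if (line.isEmpty()) {
      return new ArrayList<>();
    }
    // Quote the delimiter so characters such as '|' or '.' are taken
    // literally rather than as regular-expression syntax.
    return Arrays.asList(line.split(Pattern.quote(conf.delimiter)));
  }
}

class DelimiterDemo {
  public static void main(String[] args) {
    Tokenizer commas = new DefaultTokenizer(new DefaultTokenizer.Conf(","));
    System.out.println(commas.tokenize("a,b,c")); // [a, b, c]
  }
}
```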
When we register the plugin, we need to pass in the properties that will be used to populate the PluginConfig passed to the DefaultTokenizer. In this example, that means the delimiter property must be given when registering the plugin:
Now we can create an application that uses a comma instead of a space to split text (re-formatted for display):
Created in 2020 by Google Inc.