Spark Program in Scala Action

The Spark Program in Scala action plugin is available in the Hub.

Executes user-provided Spark code in Scala.

Use this plugin when you want to run arbitrary Spark code as an action in a pipeline.

Configuration

Property: Main Class Name
Macro Enabled: Yes
Description: Required. The fully qualified class name of the Spark application. It must be either an object with a main method defined inside, with the method signature def main(args: Array[String]): Unit, or a class that extends the CDAP co.cask.cdap.api.spark.SparkMain trait and implements the run method, with the method signature def run(implicit sec: SparkExecutionContext): Unit.
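For illustration, a minimal object-style entry point might look like the sketch below; the object name WordCountMain and its body are placeholders, not part of the plugin (the trait-based form is shown in the Scala property example that follows):

import org.apache.spark.SparkContext

// Illustrative entry point: an object whose main method matches the
// exact signature the plugin requires.
object WordCountMain {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext()
    // ... arbitrary Spark logic, e.g. building and transforming RDDs ...
    sc.stop()
  }
}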

Property: Scala
Macro Enabled: Yes
Description: Required. The self-contained Spark application written in Scala. For example, an application that reads from a CDAP stream named streamName, performs a simple word count, and logs the result can be written as:

import co.cask.cdap.api.spark._
import org.apache.spark._
import org.slf4j._

class SparkProgram extends SparkMain {
  import SparkProgram._

  override def run(implicit sec: SparkExecutionContext): Unit = {
    val sc = new SparkContext

    // Read the stream, split lines into words, and count occurrences.
    val result = sc.fromStream[String]("streamName")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap

    LOG.info("Result is: {}", result)
  }
}

object SparkProgram {
  val LOG = LoggerFactory.getLogger(getClass())
}
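Note that the logger is declared on the companion object rather than on the class itself; this is a common Spark idiom that keeps the non-serializable Logger out of the closures that Spark serializes and ships to executors.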

Property: Dependencies
Macro Enabled: Yes
Description: Optional. Extra dependencies for the Spark program, given as a comma-separated list of URIs pointing to the dependency JARs. A path can end with an asterisk (*) as a wildcard, in which case all files with the .jar extension under the parent path are included.
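For example, a value such as the following (the paths are purely illustrative) includes one specific JAR plus every JAR under a second directory:

/opt/libs/my-udfs.jar,/opt/spark/extra-jars/*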

Property: Compile at Deployment Time
Macro Enabled: No
Description: Optional. Specifies whether the code is validated when the pipeline is created. Setting this to false skips the validation. Default is true.

Refer to the CDAP documentation for details on the enhancements that CDAP brings to Spark.

Created in 2020 by Google Inc.