The Spark Program in Scala action plugin is available in the Hub.
Executes user-provided Spark code in Scala.
Use this plugin when you want to run arbitrary Spark code.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Main Class Name | Yes | Required. The fully qualified class name of the Spark application. The class must extend `SparkMain`, as in the example below. |
Scala | Yes | Required. The self-contained Spark application written in Scala. See the example below. |
Dependencies | Yes | Optional. Extra dependencies for the Spark program, given as a comma-separated list of URIs pointing to dependency jars. A path may end with an asterisk `*` as a wildcard (for example, `/path/to/libs/*`), in which case all files with the `.jar` extension under the parent path are included. |
Compile at Deployment Time | No | Optional. Specifies whether the code is validated at pipeline creation time. Default is true. |

For example, an application that reads from a CDAP stream named `streamName`, counts the words in it, and logs the result:

```scala
import co.cask.cdap.api.spark._
import org.apache.spark._
import org.slf4j._

class SparkProgram extends SparkMain {
  import SparkProgram._

  override def run(implicit sec: SparkExecutionContext): Unit = {
    val sc = new SparkContext
    val result = sc.fromStream[String]("streamName")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap
    LOG.info("Result is: {}", result)
  }
}

object SparkProgram {
  val LOG = LoggerFactory.getLogger(getClass())
}
```
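The word-count transformation used in the example above can be sketched on a plain Scala collection, without a SparkContext; the input lines here are made up for illustration, and `groupBy` plus a size count stands in for Spark's `reduceByKey(_ + _)`:

```scala
// Standalone sketch of the word-count logic from the example above,
// applied to an in-memory Seq instead of a Spark RDD.
object WordCountSketch {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))             // split each line into words
      .groupBy(identity)                    // group equal words together
      .map { case (w, ws) => (w, ws.size) } // count occurrences per word

  def main(args: Array[String]): Unit = {
    val result = count(Seq("to be or not to be", "to see"))
    println(result("to")) // 3
    println(result("be")) // 2
  }
}
```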
Refer to the CDAP documentation for details on the enhancements that CDAP brings to Spark.