Spark Program in Scala Action
The Spark Program in Scala action plugin is available in the Hub.
Executes user-provided Spark code in Scala.
Use this plugin when you want to run arbitrary Spark code as part of a pipeline.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Main Class Name | Yes | Required. The fully qualified class name for the Spark application. It must either be an object with a `main` method or a class that extends the CDAP `SparkMain` trait, as in the example below. |
Scala | Yes | Required. The self-contained Spark application written in Scala. See the example below. |
Dependencies | Yes | Optional. Extra dependencies for the Spark program, specified as a comma-separated list of URIs pointing to dependency JARs. A path may end with an asterisk (`*`) as a wildcard, in which case all files with the `.jar` extension under the parent path are included. |
Compile at Deployment Time | No | Optional. Specifies whether the code is validated at pipeline creation time. Defaults to true. |

For example, the following application reads from the CDAP stream named `streamName` and performs a word count:

```scala
import co.cask.cdap.api.spark._
import org.apache.spark._
import org.slf4j._

class SparkProgram extends SparkMain {
  import SparkProgram._

  override def run(implicit sec: SparkExecutionContext): Unit = {
    val sc = new SparkContext
    val result = sc.fromStream[String]("streamName")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap
    LOG.info("Result is: {}", result)
  }
}

object SparkProgram {
  val LOG = LoggerFactory.getLogger(getClass())
}
```
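The word-count transformation in the example above can be checked without a CDAP or Spark runtime. The following is a minimal plain-Scala sketch (not part of the plugin; the `WordCountSketch` object and `countWords` helper are illustrative names) that applies the same split/count logic to an in-memory sequence instead of an RDD:

```scala
object WordCountSketch {
  // Same split-and-count logic as the Spark example, but on plain Scala
  // collections: groupBy + sum stands in for reduceByKey on an RDD.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val result = countWords(Seq("to be or not to be"))
    println(result)
  }
}
```

This can be a quick way to sanity-check the logic you intend to paste into the Scala property before deploying the pipeline.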
Refer to the CDAP documentation for details on the enhancements that CDAP brings to Spark.
Created in 2020 by Google Inc.