...

Property: Main Class Name
Macro Enabled: Yes
Description: Required. The fully qualified class name of the Spark application. It must be either an object with a main method defined inside, with the signature def main(args: Array[String]): Unit, or a class that extends the CDAP co.cask.cdap.api.spark.SparkMain trait and implements the run method, with the signature def run(implicit sec: SparkExecutionContext): Unit.
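For illustration, a minimal sketch of the object-with-main form (the object name and the word-count logic are assumptions for this example, not part of the plugin contract):

import org.apache.spark._

// A minimal sketch of the object-with-main form; the object name and
// the word-count logic are illustrative only.
object WordCountProgram {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext
    // Count the words in a small in-memory dataset and collect the result.
    val counts = sc.parallelize(Seq("to be or not to be"))
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap
    println(s"Result is: $counts")
  }
}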

Property: Scala
Macro Enabled: Yes
Description: Required. The self-contained Spark application written in Scala. For example, an application that reads from the CDAP stream named streamName, performs a simple word count, and logs the result can be written as:

import co.cask.cdap.api.spark._
import org.apache.spark._
import org.slf4j._

class SparkProgram extends SparkMain {
  import SparkProgram._

  override def run(implicit sec: SparkExecutionContext): Unit = {
    val sc = new SparkContext
    // Read the stream events as text, split each event into words,
    // count the occurrences of each word, and collect the counts
    // to the driver as a Map.
    val result = sc.fromStream[String]("streamName")
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap

    LOG.info("Result is: {}", result)
  }
}

object SparkProgram {
  // Shared logger for the program.
  val LOG = LoggerFactory.getLogger(getClass)
}

Property: Dependencies
Macro Enabled: Yes
Description: Optional. Extra dependencies for the Spark program, given as a comma-separated list of URIs pointing to dependency JARs. A path may end with an asterisk (*) as a wildcard, in which case all files with the .jar extension under the parent path are included.
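For example, a hypothetical value combining an explicit JAR with a wildcard (the paths shown are assumptions for this example):

hdfs://namenode/cdap/libs/commons-text-1.10.0.jar,file:///opt/cdap/ext/*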

Property: Compile at Deployment Time
Macro Enabled: No
Description: Optional. Specifies whether the code is validated (compiled) at pipeline creation time. Setting this to false skips the validation.

Default is true.

...