Spark Program in Python Action
The Spark Program in Python action plugin is available in the Hub.
Executes user-provided Spark code in Python. You can use this plugin when you want to run arbitrary Spark code.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Python | Yes | Required. The self-contained Spark application written in Python. An example follows this table. |
Extra python libraries | Yes | Optional. Extra libraries for the PySpark program, given as a comma-separated list of URIs for the locations of extra .egg, .zip, and .py libraries. |

For example, the Naive Bayes machine learning example from the official Spark documentation can be written as:

```python
# Import libraries
from pyspark import SparkContext
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint


def parseLine(line):
    # Each input line has the form "label,f1 f2 f3 ..."
    parts = line.split(',')
    label = float(parts[0])
    features = Vectors.dense([float(x) for x in parts[1].split(' ')])
    return LabeledPoint(label, features)


sc = SparkContext()
data = sc.textFile("${input.path}").map(parseLine)

# Split data approximately into training (60%) and test (40%)
training, test = data.randomSplit([0.6, 0.4], seed=0)

# Train a naive Bayes model
model = NaiveBayes.train(training, 1.0)

# Make predictions and compute the accuracy on the test set.
# Tuple parameters in lambdas are Python 2 only, so index the pair instead.
predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()

# Save the model, then load it back
model.save(sc, "${output.path}")
sameModel = NaiveBayesModel.load(sc, "${output.path}")
```

With the
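The Naive Bayes example relies on two small pieces of logic that do not need Spark at all: parsing one input line of the form `label,f1 f2 f3` (as `parseLine` does) and computing accuracy as the fraction of (prediction, label) pairs that match. The plain-Python sketch below mirrors that logic so it can be checked locally before the program is pasted into the Python property. The function names `parse_line` and `accuracy` are hypothetical and not part of the plugin or of PySpark.

```python
# Plain-Python sketch (no Spark required) of the parsing and accuracy logic
# used in the Naive Bayes example. Function names are hypothetical.
from typing import List, Sequence, Tuple


def parse_line(line: str) -> Tuple[float, List[float]]:
    """Parse a line of the form "label,f1 f2 f3" into (label, features)."""
    parts = line.split(',')
    label = float(parts[0])
    features = [float(x) for x in parts[1].split(' ')]
    return label, features


def accuracy(prediction_and_label: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (prediction, label) pairs where the prediction is correct."""
    if not prediction_and_label:
        raise ValueError("no test examples")
    correct = sum(1 for prediction, label in prediction_and_label if prediction == label)
    return correct / len(prediction_and_label)


if __name__ == "__main__":
    print(parse_line("1,0 2 0"))  # (1.0, [0.0, 2.0, 0.0])
    print(accuracy([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0)]))  # 0.75
```

In Python 3, `/` already performs true division, so the `1.0 *` factor in the Spark example is only needed for Python 2 compatibility.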
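The Extra python libraries property takes a comma-separated list of URIs, each pointing to a .egg, .zip, or .py file. As an illustration of that format only, the sketch below splits such a value and checks each entry's extension. The helper `split_extra_libraries` and the example URIs (`gs://my-bucket/...`, `hdfs:///libs/...`) are hypothetical; this is not how the plugin itself processes the property.

```python
# Hypothetical helper illustrating the expected format of the
# "Extra python libraries" value. Not the plugin's own implementation.
from typing import List
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".egg", ".zip", ".py")


def split_extra_libraries(value: str) -> List[str]:
    """Split a comma-separated list of library URIs, dropping empty entries,
    and reject entries whose path is not a .egg, .zip, or .py file."""
    uris = [part.strip() for part in value.split(',') if part.strip()]
    for uri in uris:
        path = urlparse(uri).path
        if not path.lower().endswith(ALLOWED_EXTENSIONS):
            raise ValueError(f"unsupported library type: {uri}")
    return uris


if __name__ == "__main__":
    print(split_extra_libraries("gs://my-bucket/libs/utils.zip, hdfs:///libs/helpers.py"))
```

Whitespace around commas is stripped here as a convenience; whether the plugin tolerates such whitespace is an assumption worth checking.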
Created in 2020 by Google Inc.