...
Story 2: Multiple Sources
Option 1: Introduce different types of connections: one for data flow and one for control flow.
...
You could also say that connections into a source imply control flow, or connections into an action imply control flow.
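As a sketch of Option 1, each connection could carry an explicit type. The stage names and the "type" values below are illustrative assumptions, not a settled format:

{
  "connections": [
    { "from": "source1", "to": "transform1", "type": "data" },
    { "from": "sink1", "to": "cleanupAction", "type": "control" }
  ]
}

Under the alternative, no explicit type is needed; control flow would be inferred from the type of the stage on each end of the connection.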
Story 2: Spark ML in a pipeline
Add a plugin type "sparkML" that is treated like a transform, except that instead of being a stage inside a mapper, it runs as a program in a workflow. The application will create a transient dataset to act as the input to the program, or an explicit source can be given.
{
  "stages": [
    {
      "name": "customersTable",
      "plugin": {
        "name": "Database",
        "type": "batchsource", ...
      }
    },
    {
      "name": "categorizer",
      "plugin": {
        "name": "SVM",
        "type": "sparkML", ...
      }
    },
    {
      "name": "models",
      "plugin": {
        "name": "Table",
        "type": "batchsink", ...
      }
    }
  ],
  "connections": [
    { "from": "customersTable", "to": "categorizer" },
    { "from": "categorizer", "to": "models" }
  ]
}
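For the case where no explicit source is given, the application would create the transient input dataset itself and feed it to the program. A sketch of what that config might look like, assuming the only change is that the sparkML stage has no incoming connection; this is illustrative, not a confirmed format:

{
  "stages": [
    {
      "name": "categorizer",
      "plugin": {
        "name": "SVM",
        "type": "sparkML", ...
      }
    },
    {
      "name": "models",
      "plugin": {
        "name": "Table",
        "type": "batchsink", ...
      }
    }
  ],
  "connections": [
    { "from": "categorizer", "to": "models" }
  ]
}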