...
You could also say that connections into a source imply control flow, or connections into an action imply control flow.
...
Story 3: Spark ML in a pipeline
Add a plugin type, "sparkML", that is treated like a transform. Instead of being a stage inside a mapper, however, it runs as its own program in the workflow. The application creates a transient dataset to act as the input to that program, unless an explicit source is given.
Code Block |
---|
{
  "stages": [
    {
      "name": "customersTable",
      "plugin": {
        "name": "Database",
        "type": "batchsource", ...
      }
    },
    {
      "name": "categorizer",
      "plugin": {
        "name": "SVM",
        "type": "sparkML", ...
      }
    },
    {
      "name": "models",
      "plugin": {
        "name": "Table",
        "type": "batchsink", ...
      }
    }
  ],
  "connections": [
    { "from": "customersTable", "to": "categorizer" },
    { "from": "categorizer", "to": "models" }
  ]
} |
...
Story 6: Join
Add a join plugin type. Different implementations could be inner join, left outer join, etc.
Code Block |
---|
{
  "stages": [
    {
      "name": "customers",
      "plugin": {
        "name": "Table",
        "type": "batchsource", ...
      }
    },
    {
      "name": "purchases",
      "plugin": {
        "name": "Table",
        "type": "batchsource", ...
      }
    },
    {
      "name": "customerPurchaseJoin",
      "plugin": {
        "name": "inner",
        "type": "join",
        "properties": {
          "left": "customers.id",
          "right": "purchases.id",
          "rename": "customers.name:customername,purchases.name:itemname"
        }
      }
    },
    ...
  ],
  "connections": [
    { "from": "customers", "to": "customerPurchaseJoin" },
    { "from": "purchases", "to": "customerPurchaseJoin" },
    { "from": "customerPurchaseJoin", "to": "sink" }
  ]
} |
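One way to read the "left", "right", and "rename" properties in the configuration above is sketched below. This is an illustration only, not the plugin API: the class InnerJoinSketch and its methods are hypothetical, records are modelled as plain maps rather than real pipeline records, and it assumes that any field not listed in "rename" keeps its stage-qualified name (for example customers.id).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any real plugin API.
class InnerJoinSketch {

    // Parses "stage.field:newName,stage.field:newName" into a lookup map.
    static Map<String, String> parseRenames(String rename) {
        Map<String, String> renames = new HashMap<>();
        if (rename == null || rename.trim().isEmpty()) {
            return renames;
        }
        for (String pair : rename.split(",")) {
            String[] parts = pair.trim().split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad rename entry: " + pair);
            }
            renames.put(parts[0], parts[1]);
        }
        return renames;
    }

    // Copies one record into the output, qualifying each field with its stage
    // name and applying a rename when one is configured (assumed behavior).
    static void copyFields(String stage, Map<String, Object> record,
                           Map<String, String> renames, Map<String, Object> out) {
        for (Map.Entry<String, Object> field : record.entrySet()) {
            String qualified = stage + "." + field.getKey();
            out.put(renames.getOrDefault(qualified, qualified), field.getValue());
        }
    }

    // Inner join: emits one output record per pair of records whose keys match.
    // "left" and "right" use the "stage.field" form from the configuration.
    static List<Map<String, Object>> innerJoin(
            Map<String, List<Map<String, Object>>> inputsByStage,
            String left, String right, String rename) {
        String[] leftRef = left.split("\\.", 2);
        String[] rightRef = right.split("\\.", 2);
        Map<String, String> renames = parseRenames(rename);

        // Index the right-hand records by join key; null keys never match.
        Map<Object, List<Map<String, Object>>> rightByKey = new HashMap<>();
        for (Map<String, Object> record : inputsByStage.get(rightRef[0])) {
            Object key = record.get(rightRef[1]);
            if (key != null) {
                rightByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
            }
        }

        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> leftRecord : inputsByStage.get(leftRef[0])) {
            Object key = leftRecord.get(leftRef[1]);
            if (key == null) {
                continue;
            }
            for (Map<String, Object> rightRecord
                    : rightByKey.getOrDefault(key, Collections.emptyList())) {
                Map<String, Object> out = new LinkedHashMap<>();
                copyFields(leftRef[0], leftRecord, renames, out);
                copyFields(rightRef[0], rightRecord, renames, out);
                joined.add(out);
            }
        }
        return joined;
    }
}
```

Under these assumptions, a customer {id: 1, name: "alice"} joined with a purchase {id: 1, name: "donut"} would produce a single record with the fields customers.id, customername, purchases.id, and itemname. A left outer join implementation would differ only in also emitting left records that have no match.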
Java API for join plugin type: