...

You could also say that connections into a source, or into an action, imply control flow.
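The rule above can be sketched as a small classifier over a pipeline's connections. This is only an illustration, not part of the application: the class `ConnectionKinds`, the record `Connection`, and the type name "action" are assumptions made for this example ("batchsource" and "batchsink" are taken from the configs on this page).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the application API: labels each connection
// as control flow or data flow based on the plugin type of its target stage.
class ConnectionKinds {
    // Assumption: a connection into a stage of one of these types implies control flow.
    // "action" is a guessed type name; "batchsource" appears in the configs on this page.
    static final Set<String> CONTROL_TARGETS = Set.of("batchsource", "action");

    record Connection(String from, String to) {}

    // Returns true when the target stage of the connection is a source or an action.
    static boolean isControlFlow(Connection c, Map<String, String> stageTypes) {
        String targetType = stageTypes.get(c.to());
        return targetType != null && CONTROL_TARGETS.contains(targetType);
    }

    public static void main(String[] args) {
        // Stage name -> plugin type, as it would be read from the "stages" array.
        Map<String, String> stageTypes = Map.of(
            "cleanup", "action",
            "customers", "batchsource",
            "sink", "batchsink");
        List<Connection> connections = List.of(
            new Connection("cleanup", "customers"),
            new Connection("customers", "sink"));
        for (Connection c : connections) {
            String kind = isControlFlow(c, stageTypes) ? "control flow" : "data flow";
            System.out.println(c.from() + " -> " + c.to() + ": " + kind);
        }
    }
}
```

Under these assumptions, the connection from "cleanup" into the source "customers" is classified as control flow, while the connection from "customers" into "sink" carries data.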

...

Story 3: Spark ML in a pipeline

Add a plugin type "sparkML" that is treated like a transform. However, instead of running as a stage inside a mapper, it runs as a program in a workflow. The application will create a transient dataset to act as the input to the program; alternatively, an explicit source can be given.

Code Block
{
  "stages": [
    {
      "name": "customersTable",
      "plugin": {
        "name": "Database",
        "type": "batchsource", ...
      }
    },
    {
      "name": "categorizer",
      "plugin": {
        "name": "SVM",
        "type": "sparkML", ...
      }
    },
    {
      "name": "models",
      "plugin": {
        "name": "Table",
        "type": "batchsink", ...
      }
    }
  ],
  "connections": [
    { "from": "customersTable", "to": "categorizer" },
    { "from": "categorizer", "to": "models" }
  ]
}

...

Story 6: Join

Add a join plugin type. Different implementations could provide inner join, left outer join, and so on.

Code Block
{
  "stages": [
    {
      "name": "customers",
      "plugin": {
        "name": "Table",
        "type": "batchsource", ...
      }
    },
    {
      "name": "purchases",
      "plugin": {
        "name": "Table",
        "type": "batchsource", ...
      }
    },
    {
      "name": "customerPurchaseJoin",
      "plugin": {
        "name": "inner",
        "type": "join",
        "properties": {
          "left": "customers.id",
          "right": "purchases.id",
          "rename": "customers.name:customername,purchases.name:itemname"
        }
      }
    },
    ...
  ],
  "connections": [
    { "from": "customers", "to": "customerPurchaseJoin" },
    { "from": "purchases", "to": "customerPurchaseJoin" },
    { "from": "customerPurchaseJoin", "to": "sink" }
  ]
}
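
To make the "left", "right", and "rename" properties in the config above concrete, here is a minimal in-memory sketch of an inner join. It is not the plugin API: the class `InnerJoinSketch` and its methods are hypothetical, records are modeled as plain maps, and the choice to keep fields that are not renamed under their qualified "stage.field" name is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of inner-join semantics; not the join plugin API.
class InnerJoinSketch {
    // Parses "stage.field:newName,stage.field:newName" into a lookup map.
    static Map<String, String> parseRename(String spec) {
        Map<String, String> renames = new HashMap<>();
        if (spec == null || spec.isBlank()) {
            return renames;
        }
        for (String pair : spec.split(",")) {
            String[] parts = pair.trim().split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad rename entry: " + pair);
            }
            renames.put(parts[0].trim(), parts[1].trim());
        }
        return renames;
    }

    // Copies a record into the output, applying renames keyed by "stage.field".
    // Assumption: fields without a rename keep their qualified name.
    static void copyFields(String stage, Map<String, Object> record,
                           Map<String, String> renames, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String qualified = stage + "." + e.getKey();
            out.put(renames.getOrDefault(qualified, qualified), e.getValue());
        }
    }

    // Hash join: index the right side by key, then probe it with each left record.
    static List<Map<String, Object>> innerJoin(
            String leftStage, List<Map<String, Object>> left, String leftKey,
            String rightStage, List<Map<String, Object>> right, String rightKey,
            Map<String, String> renames) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            Object key = r.get(rightKey);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
            }
        }
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object key = l.get(leftKey);
            if (key == null) {
                continue; // records without a key never match in an inner join
            }
            for (Map<String, Object> r : index.getOrDefault(key, List.of())) {
                Map<String, Object> out = new LinkedHashMap<>();
                copyFields(leftStage, l, renames, out);
                copyFields(rightStage, r, renames, out);
                joined.add(out);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> customers = List.of(
            Map.<String, Object>of("id", 1, "name", "alice"),
            Map.<String, Object>of("id", 2, "name", "bob"));
        List<Map<String, Object>> purchases = List.of(
            Map.<String, Object>of("id", 1, "name", "donut"));
        Map<String, String> renames =
            parseRename("customers.name:customername,purchases.name:itemname");
        // Only customer 1 has a matching purchase, so one joined record is printed.
        System.out.println(innerJoin("customers", customers, "id",
                                     "purchases", purchases, "id", renames));
    }
}
```

The rename step matters because both inputs have a "name" field; without it the two values would collide in the joined record. A left outer join would differ only in emitting left records that find no match in the index.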

Java API for join plugin type: