Dynamic plugin property filters for CDAP Pipelines


Problem 

Today we have plugins with a lot of properties, where the user of the plugin doesn't necessarily have to go through all of the properties to configure it. There is also a compound use case where some properties within the plugin, say propertyX and propertyY, determine whether other properties, say propertyW and propertyZ, need to be configured. There is no generic way to specify these conditions in the widget JSON, which puts the burden on the user to know which properties to configure and which they needn't.

Use cases

Some of the use cases we have seen so far are,


  1. HTTP source – HTTP Batch Source#Pagination – based on a pagination type (which is a select widget), different sets of properties need to be shown.
  2. SQL server plugin – Microsoft SQL Server database plugin – based on an authentication type (also a select widget), different properties need to be shown.
  3. File source – based on the format, different properties need to be shown (delimiter should only show up for 'delimited', compression should show up for avro/parquet, etc.).
  4. DB source – if numSplits is 1, the bounding query does not need to be shown.
  5. S3 source – if 'Authentication Method' is set to IAM, accessId and accessKey do not need to be shown.
  6. Projection – if 'keep' is specified, 'drop' should not be specified and vice versa.
  7. Salesforce source – if an SObject is given, a SOQL query should not be given and vice versa.

Based on these use cases, we can narrow the problem statement down to the following,

  1. We need to show properties based on the value of a property
  2. Or, we need to disable configuring properties based on the value of a property

Proposed solution(s)

Before we propose a solution, it is easier to first define an abstract syntax for what a condition will look like and how we want to branch on it to show or hide plugin properties.

Syntax

The task at hand is to translate a condition such as,

condition: propertyX == "Yes" && propertyY == "internal"
show/hide: ["propertyW", "propertyZ"]

into a valid widget JSON spec.

A few things to consider,

  • We will most likely have simple conditions where, based on the value of a single property, we show/hide a set of properties

  • We will most likely only need equality checks, as all plugin properties are always strings (for now)

For these simpler use cases the representation mentioned above holds. If in the future we need to add new operators (greater than, less than, etc.), the syntax should allow it. Based on the above assumptions we could represent a condition like,


if (propertyX == "Yes") && (propertyY == "true") then
  show/hide "propertyZ" and "propertyW"

Would translate to the widget JSON as,

{
  "conditionAND": [
    {
      "property": "propertyX",
      "operator": "equals",
      "value": "Yes"
    },
    {
      "property": "propertyY",
      "operator": "equals",
      "value": "true"
    }
  ],
  "show": [
    {
      "type": "property",
      "name": "propertyW"
    },
    {
      "type": "property",
      "name": "propertyZ"
    }
  ]
}

Things to note:

  • We could default condition to conditionAND to keep things simple for users to start with

  • We could make operator optional, defaulting to equals

  • The type of each entry under show is either "property" or "group"
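
The translation above can be read as a small evaluator. The following is a minimal Python sketch, not the CDAP UI implementation: the helper name matches_all is hypothetical, and it assumes the two defaults noted above (conditions are ANDed, and a missing operator means equals).

```python
def matches_all(conditions, values):
    """Return True when every condition holds for the given property values.

    `conditions` is the list under "conditionAND"; `values` maps property
    names to their current string values. Hypothetical helper, for
    illustration only.
    """
    for cond in conditions:
        operator = cond.get("operator", "equals")  # assumed default operator
        if operator != "equals":
            raise ValueError(f"unsupported operator: {operator}")
        if values.get(cond["property"]) != cond["value"]:
            return False
    return True


spec = {
    "conditionAND": [
        {"property": "propertyX", "operator": "equals", "value": "Yes"},
        {"property": "propertyY", "value": "true"},  # operator omitted on purpose
    ],
    "show": [
        {"type": "property", "name": "propertyW"},
        {"type": "property", "name": "propertyZ"},
    ],
}

values = {"propertyX": "Yes", "propertyY": "true"}
shown = []
if matches_all(spec["conditionAND"], values):
    shown = [item["name"] for item in spec["show"]]
print(shown)  # ['propertyW', 'propertyZ']
```

Because all plugin property values are strings for now, the comparison is a plain string equality check.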

Proposal I

Specify the condition as part of the individual plugin properties. An example of how this would look,

{
  "outputs": [],
  "jump-config": {},
  "metadata": {},
  "configuration-groups": [
    {
      "label": "Basic",
      "properties": [
        {
          "widget-type": "textbox",
          "name": "propertyZ",
          "label": "Property Z",
          "widget-attributes": {
            "placeholder": "Name used to identify this source for lineage"
          },
          "show-condition": [
            {
              "AND": [
                {
                  "property": "propertyX",
                  "operator": "equals",
                  "value": "Yes"
                },
                {
                  "property": "propertyY",
                  "operator": "equals",
                  "value": "true"
                }
              ]
            }
          ]  
        },
        {
          "widget-type": "textbox",
          "name": "propertyW",
          "label": "Property W",
          "widget-attributes": {
            "placeholder": "Name used to identify this source for lineage"
          },
          "show-condition": [
            {
              "AND": [
                {
                  "property": "propertyX",
                  "operator": "equals",
                  "value": "Yes"
                },
                {
                  "property": "propertyY",
                  "operator": "equals",
                  "value": "true"
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "label": "Group 2",
      "properties": [
        {
          "widget-type": "textbox",
          "name": "propertyJ",
          "label": "Property J",
          "widget-attributes": {
            "default": "auto-detect"
          },
          "hide-condition": [
            {
              "AND": [
                {
                  "property": "propertyX",
                  "operator": "equals",
                  "value": "Yes"
                },
                {
                  "property": "propertyY",
                  "operator": "equals",
                  "value": "true"
                }
              ]
            }
          ]  
        },
        {
          "widget-type": "textbox",
          "name": "propertyK",
          "label": "Property K",
          "widget-attributes": {
            "default": "auto-detect"
          },
          "hide-condition": [
            {
              "AND": [
                {
                  "property": "propertyX",
                  "operator": "equals",
                  "value": "Yes"
                },
                {
                  "property": "propertyY",
                  "operator": "equals",
                  "value": "true"
                }
              ]
            }
          ]  
        }
      ]
    }
  ],
  "display-name": "BigQuery",
  "icon": {...}
}


Things to note,

  • The show-condition/hide-condition in the above widget JSON could be replaced by a disable-condition, which would disable the field instead of showing/hiding it. For now we will always be showing/hiding plugin properties
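
To make the per-property conditions concrete, here is a minimal Python sketch of how a UI could walk the configuration-groups and decide what to display. The function names are hypothetical. It only handles the equals operator, and it assumes that multiple entries in a show-condition/hide-condition list are alternatives (any one of them may match), which the proposal does not specify.

```python
def holds(clauses, values):
    """True if any {"AND": [...]} clause has all of its conditions satisfied.

    Treating the outer list as alternatives is an assumption of this sketch.
    Only equality is checked; the "operator" key is ignored.
    """
    return any(
        all(values.get(c["property"]) == c["value"] for c in clause["AND"])
        for clause in clauses
    )


def visible_names(widget_json, values):
    """Collect the names of the properties that should be displayed."""
    names = []
    for group in widget_json["configuration-groups"]:
        for prop in group["properties"]:
            if "show-condition" in prop and not holds(prop["show-condition"], values):
                continue  # show-condition present but not met
            if "hide-condition" in prop and holds(prop["hide-condition"], values):
                continue  # hide-condition present and met
            names.append(prop["name"])
    return names


cond = [{"AND": [
    {"property": "propertyX", "operator": "equals", "value": "Yes"},
    {"property": "propertyY", "operator": "equals", "value": "true"},
]}]
widget = {"configuration-groups": [
    {"label": "Basic", "properties": [
        {"name": "propertyZ", "show-condition": cond},
        {"name": "propertyW", "show-condition": cond},
    ]},
    {"label": "Group 2", "properties": [
        {"name": "propertyJ", "hide-condition": cond},
        {"name": "propertyK", "hide-condition": cond},
    ]},
]}

print(visible_names(widget, {"propertyX": "Yes", "propertyY": "true"}))
# ['propertyZ', 'propertyW']
print(visible_names(widget, {"propertyX": "No", "propertyY": "true"}))
# ['propertyJ', 'propertyK']
```

Note that the same condition object has to be attached to all four properties, which is the redundancy this proposal suffers from.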

Pros

  • All the conditions required to show the widget lie within the property widget JSON. It has the property name, widget-type, widget-attributes and the filter conditions that we need to apply for it to show or hide

Cons

  • The same filtering conditions have to be repeated across multiple widget properties, which is redundant


Proposal II

Have a separate section for filters at the same level as configuration-groups. An example of how this would look,

INITIAL PROPOSAL
================
{
  "outputs": [],
  "display-name": "",
  "configuration-groups": [
    {
      "label": "Basic",
      "properties": [
        { 
          "widget-type": "textbox",
          "name": "propertyZ",
          "label": "Property Z",
          "widget-attributes": {
            "placeholder": "Name used to identify this source for lineage"
          }
        },
        {
          "widget-type": "textbox",
          "name": "propertyW",
          "label": "Property W",
          "widget-attributes": {
            "placeholder": "Name used to identify this source for lineage"
          }
        }
      ]
    },
    {
      "label": "Group 2",
      "properties": [
        {
          "name": "propertyJ",
          "label": "Property J",
          "widget-type": "textbox",
          "widget-attributes": {
            "default": "auto-detect"
          }
        },
        {
          "name": "propertyK",
          "label": "Property K",
          "widget-type": "textbox",
          "widget-attributes": {
            "default": "auto-detect"
          }
        }
      ]
    }
  ],
  "filters": [
    {
      "condition": {
        "property": "propertyX",
        "operator": "equal",
        "value": "Yes"
      },
      "show": [
        {
          "type": "property",
          "name": "propertyW"
        },
        {
          "type": "property",
          "name": "propertyZ"
        }
      ]
    }
  ]
}

Things to note,

  • A condition must always be accompanied by a show/hide property, which acts as the directive for the filter (indicating whether the filter will show or hide properties)

  • The elseShow clause from the earlier draft assumes that the condition is binary. If a plugin property can have three different values, the filter would need to be broken down into three different conditions, which makes elseShow optional at best. An example widget JSON would look something like this,

    Based on the review I am removing the elseShow clause and switching the representation to show & hide to make it simpler. 


"filters": [
  {
    "name": "Filter1",
    "condition": {
      "property": "propertyX",
      "operator": "equal",
      "value": "10"
    },
    "show": [
      {
        "type": "property",
        "name": "propertyW"
      },
      {
        "type": "property",
        "name": "propertyZ"
      }
    ]
  },
  {
    "name": "Filter2",
    "condition": {
      "expression": "propertyX == 20"
    },
    "show": [
      {
        "type": "property",
        "name": "propertyA"
      },
      {
        "type": "property",
        "name": "propertyB"
      }
    ]
  },
  {
    "name": "Filter3",
    "condition": {
      "expression": "propertyX == 30"
    },
    "show": [
      {
        "type": "property",
        "name": "propertyC"
      },
      {
        "type": "property",
        "name": "propertyD"
      }
    ]
  }
]


Things to note: 

  • Each condition object under a filter specifies the condition based on which the properties under "show" get shown or hidden.
  • Expression:
    • A plugin developer can instead choose to define, under condition, a JavaScript expression that evaluates to a boolean.
    • The JavaScript expression will only consider numbers, strings and booleans while evaluating conditions. Since the context for evaluating the expression is the plugin properties, we expect the condition to be composed of numbers, booleans and strings. This requires that we convert the values of the plugin properties to the appropriate types instead of treating everything as a string.
    • The expression is written in the JavaScript expression language (Jexl). This is not to be confused with JEXL, the Java expression language from the Apache Software Foundation.
    • More information about the JavaScript expression language can be found in the Jexl GitHub repo.
    • Even though the Jexl documentation talks about expressions that evaluate to arbitrary values, the CDAP UI will only look for boolean results to show/hide properties under a filter. For instance, if the plugin developer uses a ternary that evaluates to a string (splitBy > 1 ? "value1" : "value2"), both outcomes are truthy and the properties will be shown all the time. Expressions must evaluate to a boolean for the UI to correctly show/hide plugin properties as part of the filter.
    • The context for the expression is the map of plugin properties. As an example,
    Database plugin:
    {
      "jdbcPluginType": "jdbc",
      "numSplits": "2",
      "enableAutoCommit": "false",
      "columnNameCase": "No change",
      "transactionIsolationLevel": "TRANSACTION_SERIALIZABLE",
      "referenceName": "ref1",
      "jdbcPluginName": "mysql",
      "connectionString": "jdbc:mysql://localhost/mydb?useLegacyDatetimeCode=false",
      "importQuery": "select * from query_table where $CONDITION",
      "boundingQuery": "select * from query_table",
      "splitBy": "date",
      "user": "username",
      "password": "password"
    }

    The plugin developer can then choose to show the splitBy property only if numSplits is greater than 1.
    The following JSON shows the usage of plugin properties inside the expression.

    Example with expression under condition:
    "filters": [
      {
        "name": "showBoundingQuery",
        "condition": {
          "expression": "numSplits > 1"
        },
        "show": [
          {
            "type": "property",
            "name": "boundingQuery"
          },
          {
            "type": "property",
            "name": "splitBy"
          }
        ]
      }
    ]


    Since we cannot prevent a plugin developer from specifying both forms, if a condition in the plugin widget spec has both a property-operator-value combination and an expression, precedence is given to the expression and the rest of the properties in the condition object are ignored.
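
The type conversion and the precedence rule can be sketched as follows. This is a minimal Python sketch under loud assumptions: it uses Python's ast module as a stand-in for Jexl, it only understands a single comparison between property names and literals (for example numSplits > 1), and the helper names coerce, evaluate_expression and condition_holds are hypothetical.

```python
import ast
import operator

# Comparison operators supported by this sketch (assumption: a small subset).
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
}


def coerce(raw):
    """Convert a plugin property string to a boolean, number, or string."""
    if raw in ("true", "false"):
        return raw == "true"
    try:
        return float(raw)
    except ValueError:
        return raw


def evaluate_expression(expression, context):
    """Evaluate one comparison such as 'numSplits > 1' against the context."""
    node = ast.parse(expression, mode="eval").body
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        raise ValueError("only a single comparison is supported in this sketch")
    compare = _COMPARE.get(type(node.ops[0]))
    if compare is None:
        raise ValueError("unsupported comparison operator")

    def operand(n):
        if isinstance(n, ast.Name):
            return context.get(n.id)  # property lookup
        if isinstance(n, ast.Constant):
            return n.value            # literal number or string
        raise ValueError("unsupported operand")

    try:
        return compare(operand(node.left), operand(node.comparators[0]))
    except TypeError:
        return False  # e.g. ordering a string against a number


def condition_holds(condition, values):
    """Apply a filter condition; an expression wins over property/operator/value."""
    if "expression" in condition:
        context = {name: coerce(raw) for name, raw in values.items()}
        # Only a strict boolean True counts, mirroring the boolean-only rule.
        return evaluate_expression(condition["expression"], context) is True
    return values.get(condition["property"]) == condition["value"]


values = {"numSplits": "2", "splitBy": "date"}
condition = {"property": "numSplits", "value": "1", "expression": "numSplits > 1"}
print(condition_holds(condition, values))  # True
```

In the usage example the property/value pair alone would not match ("2" is not "1"), so the result shows the expression taking precedence.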

  • At a high level, this format is a list of named filters, each with a condition to show or hide certain properties.
  • The name is used purely for readability by the plugin developer. It will not be surfaced in the UI or anywhere else.
  • Default operator: In the condition object above, the operator property under each filter is optional. If it is not specified, it defaults to equal. We need to decide which other operators would make sense in the future, depending on the types of data we allow for a plugin property.

  • Operator list: As we have restricted filters to only show properties when a condition is satisfied, we need to provide operators that go beyond an equality check. Currently we should support,

    • equal to

    • not equal to

    • exists

    • does not exist

  • Handling required properties: If a required property is set to be hidden, the UI should by default never hide it. A plugin developer could otherwise configure their widget JSON to hide a required property, which would cause deploying the pipeline to fail.

  • Default case: We will only "show" properties when the conditions are matched for a specific filter. This means the default case for the UI is to hide all the properties across all the filters: the UI will look through all the filters and hide all those properties to start with, unless the condition is true. In the above example, properties propertyW, propertyZ, propertyA, propertyB, propertyC and propertyD will be hidden when the user opens the plugin configuration for the first time. Once propertyX is set to an appropriate value, only those properties whose condition evaluates to true get shown. We will not have a "hide" property under individual filters.

  • When the plugin user resets the toggle property (propertyX in the above example), the values of properties that become hidden as a result will be reset. The user will have to provide the values again after every reset.

  • Macro usage: When the user provides a macro (${macroValue}), the UI should invalidate all the filters concerning that specific property, because the UI cannot determine whether to show or hide anything based on the value of the toggle property. For example, if propertyX from the above example is given a macro, then all of propertyW, propertyZ, propertyA, propertyB, propertyC and propertyD will become visible, as it is then up to the user to determine which ones to configure.
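
The rules in this list (operator list, default operator, hidden by default, required properties never hidden, macros showing everything) can be combined into one small function. This is a minimal Python sketch, not the CDAP UI code: the function name hidden_properties is hypothetical, the exact operator strings are assumptions (the examples use "equal" while the list says "equal to", so both are accepted), and entries of type "group" as well as expressions are left out. For simplicity all three filters below use the property/value form.

```python
import re

MACRO = re.compile(r"\$\{[^}]+\}")  # matches ${macroValue}

# Operator names are assumptions; an empty or missing value counts as "does not exist".
OPERATORS = {
    "equal": lambda actual, expected: actual == expected,
    "equal to": lambda actual, expected: actual == expected,
    "not equal to": lambda actual, expected: actual != expected,
    "exists": lambda actual, expected: bool(actual),
    "does not exist": lambda actual, expected: not actual,
}


def hidden_properties(filters, values, required=()):
    """Return the set of property names the UI should hide."""
    controlled, shown = set(), set()
    for flt in filters:
        names = {item["name"] for item in flt["show"]}
        controlled |= names  # every filtered property starts out hidden
        cond = flt["condition"]
        actual = values.get(cond["property"], "")
        if MACRO.search(actual):
            shown |= names  # macro: the UI cannot decide, so show everything
            continue
        check = OPERATORS[cond.get("operator", "equal")]  # default operator
        if check(actual, cond.get("value")):
            shown |= names
    return (controlled - shown) - set(required)  # required is never hidden


def show(*names):
    return [{"type": "property", "name": n} for n in names]


filters = [
    {"name": "Filter1", "condition": {"property": "propertyX", "value": "10"},
     "show": show("propertyW", "propertyZ")},
    {"name": "Filter2", "condition": {"property": "propertyX", "value": "20"},
     "show": show("propertyA", "propertyB")},
    {"name": "Filter3", "condition": {"property": "propertyX", "value": "30"},
     "show": show("propertyC", "propertyD")},
]

print(sorted(hidden_properties(filters, {})))
# ['propertyA', 'propertyB', 'propertyC', 'propertyD', 'propertyW', 'propertyZ']
print(sorted(hidden_properties(filters, {"propertyX": "20"})))
# ['propertyC', 'propertyD', 'propertyW', 'propertyZ']
print(sorted(hidden_properties(filters, {"propertyX": "${px}"})))
# []
```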


Error Validation:

This opens up a lot of corner cases during stage level error validation. One of the possible scenarios would be, 

Intersecting condition

    1. Plugin developer specifies PropertyC to be shown only when PropertyX is 30

    2. Plugin developer also expects (implicitly requires) PropertyC when both PropertyX and PropertyL have some value.

    3. Plugin user specifies a value of 20 for PropertyX and provides a value for PropertyL.

    4. Now during stage level validation, the backend will error out saying that the value for PropertyC is missing.

    5. At this point the user would be confused, because they cannot even see PropertyC as it is hidden.

In this situation the UI should show the hidden property, as it is required by some logic set by the plugin developer, during stage level validation (once the user clicks on the validate button in the plugin modal).

Already set hidden property

If a hidden property already has a value, the UI will be forced to show the property even if it should be hidden based on the filter conditions.

A hidden property could have a value,

    • if the user either imports the pipeline JSON or,

    • if the user clones an older pipeline or,

    • if it gets populated by the "browse" functionality that we recently introduced.


However, this override no longer applies once the plugin user has changed the toggle property. After the change, all the properties that are hidden will be reset, upon which the UI can fall back to the filter logic in the plugin widget spec.
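
The two overrides described in this section can be layered on top of the filter result. The following is a minimal Python sketch with a hypothetical function name and hypothetical parameters (hidden is the set of names the filters would hide, validation_errors the names reported by stage level validation); it is an illustration of the described behavior, not the actual UI logic.

```python
def effective_hidden(hidden, values, validation_errors=(), toggle_changed=False):
    """Apply the validation and already-set-value overrides to `hidden`.

    Returns a tuple (still_hidden, values_after_reset).
    """
    values = dict(values)  # do not mutate the caller's map
    if toggle_changed:
        # Changing the toggle property resets every hidden property.
        for name in hidden:
            values.pop(name, None)
    still_hidden = {
        name for name in hidden
        if not values.get(name)            # an already-set value forces display
        and name not in validation_errors  # so does a validation error
    }
    return still_hidden, values


hidden = {"propertyC", "propertyD"}
imported = {"propertyX": "20", "propertyC": "abc"}  # e.g. from an imported pipeline

still_hidden, _ = effective_hidden(hidden, imported)
print(sorted(still_hidden))  # ['propertyD']

still_hidden, after = effective_hidden(hidden, imported, toggle_changed=True)
print(sorted(still_hidden), after)  # ['propertyC', 'propertyD'] {'propertyX': '20'}
```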


Pros

  • Is concise in representing the filters to apply for various conditions

Cons

  • Introduces a new section at the same level as configuration-groups. Specifying conditions outside the scope of the widget for a plugin property might be a problem for plugin developers.


Proposal III

Provide the ability to subgroup related fields under a condition. This way the user experience becomes,

if propertyX is value1 then
  all the properties that get shown form subgroup1
if propertyX is value2 then
  all the properties that get shown form subgroup2
default
  show none


Based on this assumption, the example widget JSON would look like this,

{
  "outputs": [],
  "display-name": "",
  "configuration-groups": [{
      "label": "Basic",
      "properties": [{
          "widget-type": "condition-branch",
          "name": "propertyZ",
          "label": "Property Z",
          "widget-attributes": {
            "placeholder": "Name used to identify this source for lineage",
            "type": "select",
            "options": ["yes", "no"]
          },
          "sub-groups": [
            {
              "branch": "yes",
              "properties": [
                {
                  "name": "propertyW",
                  "label": "Property W",
                  "widget-type": "textbox",
                  "widget-attributes": {
                    "placeholder": "auto-detect"
                  }
                },
                {
                  "name": "propertyZ",
                  "label": "Property Z",
                  "widget-type": "textbox",
                  "widget-attributes": {
                    "placeholder": "auto-detect"
                  }
                }
              ]
            },
            {
              "branch": "no",
              "properties": [
                {
                  "name": "propertyJ",
                  "label": "Property J",
                  "widget-type": "textbox",
                  "widget-attributes": {
                    "placeholder": "auto-detect"
                  }
                },
                {
                  "name": "propertyK",
                  "label": "Property K",
                  "widget-type": "textbox",
                  "widget-attributes": {
                    "placeholder": "auto-detect"
                  }
                }
              ]
            }
          ]
        },
        {
          "widget-type": "textbox",
          "name": "propertyW",
          "label": "Property W",
          "widget-attributes": {
            "placeholder": "Name used to identify this source for lineage"
          }
        }
      ]
    },
    {
      "label": "Group 2",
      "properties": [{
          "name": "propertyJ",
          "label": "Property J",
          "widget-type": "textbox",
          "widget-attributes": {
            "default": "auto-detect"
          }
        },
        {
          "name": "propertyK",
          "label": "Property K",
          "widget-type": "textbox",
          "widget-attributes": {
            "default": "auto-detect"
          }
        }
      ]
    }
  ]
}
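
The branch selection in this proposal amounts to a lookup on the value of the branching property. Below is a minimal Python sketch with a hypothetical helper name; it follows the pseudocode above (including "default: show none") and uses propertyX as the name of the branching property, which is an assumption made for the example.

```python
def branch_properties(prop, value):
    """Return the names of the sub-group properties for the selected branch."""
    for sub_group in prop.get("sub-groups", []):
        if sub_group["branch"] == value:
            return [p["name"] for p in sub_group["properties"]]
    return []  # default: show none


toggle = {
    "widget-type": "condition-branch",
    "name": "propertyX",
    "widget-attributes": {"type": "select", "options": ["yes", "no"]},
    "sub-groups": [
        {"branch": "yes", "properties": [
            {"name": "propertyW", "widget-type": "textbox"},
            {"name": "propertyZ", "widget-type": "textbox"},
        ]},
        {"branch": "no", "properties": [
            {"name": "propertyJ", "widget-type": "textbox"},
            {"name": "propertyK", "widget-type": "textbox"},
        ]},
    ],
}

print(branch_properties(toggle, "yes"))    # ['propertyW', 'propertyZ']
print(branch_properties(toggle, "no"))     # ['propertyJ', 'propertyK']
print(branch_properties(toggle, "maybe"))  # []
```

Adding a third branch only requires another entry under sub-groups, but the condition can never involve more than the one branching property.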

Pros

  • Helps plugin developers contain the properties to show/hide in one section. Might be easier for users of the plugin to follow what changed based on the value of a plugin property

  • Is self contained, in the sense the property responsible for showing/hiding other plugin properties has the references to all the properties it needs to show/hide in one place.

  • Is extensible, in the sense that it implicitly supports a “switch” representation by having more “branches”

Cons

  • Introduces a new concept of "sub-groups". We might need to think about whether it is necessary to show them under one parent group.

  • Limits the conditions plugin developers can have to a single plugin property value. This proposal can’t express use cases where the plugin developer has a slightly more complex condition.

Conclusion

Based on discussion, we decided to proceed with Proposal II. A plugin developer who wishes to hide certain properties based on the value of a specific plugin property will add "filters" to the plugin widget JSON spec and specify the condition to show or hide certain properties as described in Proposal II.

We are proceeding with this approach because,

  • It is simpler to represent. Filters occupy the same level in the hierarchy as groups, which helps us show/hide properties or entire groups based on specific conditions
  • It helps with backward compatibility. Each version of the plugin spec parses the filters and adds the show/hide behavior. A UI that only supports an older version of the plugin widget JSON spec will still parse the file but will ignore the filters, so the behavior won't be available in older versions of the CDAP UI
  • It has a more extensible architecture, allowing more types of filters to be added if need be (new condition structure, operators such as greater than or less than, etc.)

To give an example for the DB source use case that was mentioned in the beginning,

{
  "metadata": {
    "spec-version": "1.5"
  },
  "display-name": "Database",
  "configuration-groups": [
    ...
  ],
  "outputs": [
    ...
  ],
  "jump-config": {
    ...
  },
  "filters": [
    {
      "name": "showBoundingQuery",
      "condition": {
        "expression": "numSplits > 1"
      },
      "show": [
        {
          "type": "property",
          "name": "boundingQuery"
        },
        {
          "type": "property",
          "name": "splitBy"
        }
      ]
    }
  ]
}

Another example, for the Salesforce plugin, would look like,


{
  "metadata": {
    "spec-version": "1.5"
  },
  "configuration-groups": [
    ...
  ],
  "outputs": [
    ...
  ],
  "jump-config": {
    ...
  },
  "filters": [
    {
      "name": "showSOObject",
      "condition": {
        "property": "query",
        "operator": "does not exist"
      },
      "show": [
        {
          "type": "property",
          "name": "sobject"
        }
      ]
    },
    {
      "name": "showSOQuery",
      "condition": {
        "property": "sobject",
        "operator": "does not exist"
      },
      "show": [
        {
          "type": "property",
          "name": "query"
        }
      ]
    }
  ]
}


In the above case, both the SObject and the SOQL query properties will initially be displayed. Once the user enters a value in either one of the properties, the other one gets hidden. If the value is reset (cleared out by the user), then both get shown again.
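
The behavior described for the Salesforce filters can be traced with a few lines of code. This is a minimal Python sketch with a hypothetical function name; it only handles the "does not exist" operator and assumes that an empty or missing value counts as not existing.

```python
def shown_by_filters(filters, values):
    """Names shown by filters that only use "does not exist" conditions."""
    shown = []
    for flt in filters:
        cond = flt["condition"]
        if cond["operator"] != "does not exist":
            raise ValueError("this sketch only handles 'does not exist'")
        if not values.get(cond["property"]):  # missing or empty value
            shown.extend(item["name"] for item in flt["show"])
    return shown


filters = [
    {"name": "showSOObject",
     "condition": {"property": "query", "operator": "does not exist"},
     "show": [{"type": "property", "name": "sobject"}]},
    {"name": "showSOQuery",
     "condition": {"property": "sobject", "operator": "does not exist"},
     "show": [{"type": "property", "name": "query"}]},
]

print(shown_by_filters(filters, {}))                                   # ['sobject', 'query']
print(shown_by_filters(filters, {"sobject": "Account"}))               # ['sobject']
print(shown_by_filters(filters, {"query": "SELECT Id FROM Account"}))  # ['query']
```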

Created in 2020 by Google Inc.