Microsoft SQL Server database plugin
Introduction
A separate database plugin to support MSSQL-specific features and configurations.
Use-Case
Users can choose and install MSSQL source and sink plugins.
Users should see the MSSQL logo on the plugin configuration page for a better experience.
Users should get relevant information from the tool tip:
The tool tip for the connection string should be customized specifically to the MSSQL database,
The tool tip should describe accurately what each field is used for.
Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin).
Users should get field level lineage for the source and sink that is being used.
Reference documentation should be updated to account for the changes.
The source code for the MSSQL database plugin should be placed in a repo under the data-integrations org.
Integration tests for MSSQL database plugin should be added in the test repo.
A data pipeline using the source and sink plugins should run on both the MapReduce and Spark engines.
User Stories
Users should be able to install MSSQL-specific database source and sink plugins from the Hub
Users should see tool tips that accurately describe what each field does
Users should get field level lineage information for the MSSQL source and sink
Users should be able to set up a pipeline without specifying redundant information
Users should get updated reference document for MSSQL source and sink
Users should be able to read all the MSSQL data types
Plugin Type
Design Tips
MSSQL supports connections using Azure Active Directory (AD). To connect with AD, the library https://github.com/AzureAD/azure-activedirectory-library-for-java needs to be on the classpath.
Information about types of AD connections: https://docs.microsoft.com/en-us/sql/connect/jdbc/connecting-using-azure-active-directory-authentication?view=sql-server-2017
MSSQL connector reference: https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-2017
Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins
MSSQL datatypes mappings and conversions: https://docs.microsoft.com/en-us/sql/connect/jdbc/using-basic-data-types?view=sql-server-2017
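Because Azure AD authentication only works when the ADAL library is on the classpath, the plugin could check for it before attempting that kind of connection. The snippet below is a minimal sketch of such a check, not the plugin's actual code. The class name `AdClasspathCheck`, the method `isOnClasspath`, and the choice of `com.microsoft.aad.adal4j.AuthenticationContext` as the probe class are assumptions made for illustration.

```java
class AdClasspathCheck {
  // Assumption: a class shipped by azure-activedirectory-library-for-java (adal4j).
  static final String ADAL_PROBE_CLASS = "com.microsoft.aad.adal4j.AuthenticationContext";

  // Returns true if the named class can be located without initializing it.
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, AdClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    if (isOnClasspath(ADAL_PROBE_CLASS)) {
      System.out.println("ADAL found: 'Active Directory Password' can be used.");
    } else {
      System.out.println("ADAL missing: only 'SQL Login' is usable.");
    }
  }
}
```

A check like this would let the plugin fail early with a clear message instead of surfacing a driver-level error at connection time.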
Design
The suggestion is to create a Maven submodule named MSSQL under the database-plugins repo.
Sink Properties
User Facing Name | Type | Description | Constraints |
|---|---|---|---|
Label | String | Label for UI | |
Reference Name | String | Name that uniquely identifies this sink for lineage | |
Host | String | MSSQL host (serverName) | Required (defaults to localhost on UI) |
Port | Number | The port where SQL Server is listening. If the port number is specified in the connection string, no request to SQLbrowser is made. When the port and instanceName are both specified, the connection is made to the specified port. However, the instanceName is validated and an error is thrown if it does not match the port. | Optional (default 1433) |
Database | String | Name of the database to connect to | Required |
Authentication Type | Select | Indicates which SQL authentication method will be used for the connection. Use 'SQL Login' to connect to a SQL Server using username and password properties. 'Active Directory Password' can be used to connect to an Azure SQL Database/Data Warehouse using an Azure AD principal name and password. | |
Username | String | DB username | Required |
Password | Password | User password | Required |
Transaction Isolation Level | Select | Transaction isolation level for queries run by this sink | |
Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed as connection arguments. List of properties: https://docs.microsoft.com/en-us/sql/connect/jdbc/setting-the-connection-properties?view=sql-server-2017 | |
Table Name | String | Name of the database table to write to | |
Instance Name | String | The SQL Server instance name to connect to. When it is not specified, a connection is made to the default instance. For the case where both the instanceName and port are specified, see the notes for port. | Optional |
Query Timeout | Number | The number of seconds to wait before a query times out. The default value is -1, which means infinite timeout. A value of 0 also means wait indefinitely. | Optional |
Connect Timeout | Number | Time in seconds to wait for a connection to the server before terminating the attempt and generating an error. | Optional |
Column Encryption | Select | Default column encryption setting for all the commands on the connection. When enabled the JDBC driver will transparently encrypt and decrypt sensitive data stored in encrypted database columns in the SQL Server. | Possible values are: 'Enabled' and 'Disabled'. Default: 'Disabled'. |
Encrypt | Select | When set to 'Yes', SQL Server uses SSL encryption for all data sent between the client and server if the server has a certificate installed. | Possible values are: 'Yes' and 'No'. Default: 'No'. |
Trust Server Certificate | Select | When set to 'Yes' (and encryption enabled), SQL Server uses SSL encryption for all data sent between the client and server without validating the server certificate. | Possible values are: 'Yes' and 'No'. Default: 'No'. |
Workstation ID | String | Used to identify the specific workstation in various SQL Server profiling and logging tools. | Optional |
Failover Partner | String | The name or network address of the instance of SQL Server that acts as failover partner. | Optional |
Packet Size | Number | The network packet size used to communicate with SQL Server, specified in bytes. It's not recommended to specify packet size property when the encryption is enabled. Otherwise, the driver might raise a connection error. | Optional |
Current Language | String | Must correspond to the SQL Server language record name and specifies the language environment for the session. The session language determines the datetime formats and system messages. | Optional |
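To make the relationship between these fields and the driver concrete, the following is a minimal sketch of how Host, Port, Instance Name, Database, and extra connection arguments could be assembled into a Microsoft JDBC connection URL of the form `jdbc:sqlserver://host[\instance][:port];property=value`. The class `MssqlConnectionString` and its `build` method are hypothetical names, and the mapping from UI fields to driver properties (`databaseName`, `encrypt`, `trustServerCertificate`, `loginTimeout`) is an assumption rather than the plugin's confirmed behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MssqlConnectionString {

  // Assembles jdbc:sqlserver://host[\instance][:port];databaseName=...;key=value...
  static String build(String host, Integer port, String instanceName,
                      String database, Map<String, String> properties) {
    StringBuilder url = new StringBuilder("jdbc:sqlserver://").append(host);
    if (instanceName != null && !instanceName.isEmpty()) {
      url.append('\\').append(instanceName);
    }
    if (port != null) {
      url.append(':').append(port);
    }
    url.append(";databaseName=").append(database);
    // Remaining properties are appended in insertion order.
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      url.append(';').append(entry.getKey()).append('=').append(entry.getValue());
    }
    return url.toString();
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("encrypt", "true");                 // Encrypt = Yes
    props.put("trustServerCertificate", "false"); // Trust Server Certificate = No
    props.put("loginTimeout", "30");              // Connect Timeout, in seconds
    System.out.println(build("localhost", 1433, null, "sales", props));
  }
}
```

The sketch does not escape special characters in property values; a real implementation would need to handle values containing `;` or braces.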
Source Properties
User Facing Name | Type | Description | Constraints |
|---|---|---|---|
Label | String | Label for UI | |
Reference Name | String | Name that uniquely identifies this source for lineage | |
Host | String | MSSQL host (serverName) | Required (defaults to localhost on UI) |
Port | Number | The port where SQL Server is listening. If the port number is specified in the connection string, no request to SQLbrowser is made. When the port and instanceName are both specified, the connection is made to the specified port. However, the instanceName is validated and an error is thrown if it does not match the port. | Optional (default 1433) |
Database | String | Name of the database to connect to | Required |
Import Query | String | Query used to import data | Valid SQL query |
Authentication Type | Select | Indicates which SQL authentication method will be used for the connection. Use 'SQL Login' to connect to a SQL Server using username and password properties. 'Active Directory Password' can be used to connect to an Azure SQL Database/Data Warehouse using an Azure AD principal name and password. | |
Username | String | DB username | Required |
Password | String | User password | Required |
Bounding Query | String | Returns the max and min of the Split-By Field | Valid SQL query |
Split-By Field Name | String | Field name which will be used to generate splits | |
Number of Splits to Generate | Number | Number of splits to generate | |
Transaction Isolation Level | Select | Transaction isolation level for queries run by this source | |
Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed as connection arguments. List of properties: https://docs.microsoft.com/en-us/sql/connect/jdbc/setting-the-connection-properties?view=sql-server-2017 | |
Instance Name | String | The SQL Server instance name to connect to. When it is not specified, a connection is made to the default instance. For the case where both the instanceName and port are specified, see the notes for port. | Optional |
Query Timeout | Number | The number of seconds to wait before a query times out. The default value is -1, which means infinite timeout. A value of 0 also means wait indefinitely. | Optional |
Connect Timeout | Number | Time in seconds to wait for a connection to the server before terminating the attempt and generating an error. | Optional |
Column Encryption | Select | Default column encryption setting for all the commands on the connection. When enabled the JDBC driver will transparently encrypt and decrypt sensitive data stored in encrypted database columns in the SQL Server. | Possible values are: 'Enabled' and 'Disabled'. Default: 'Disabled'. |
Encrypt | Select | When set to 'Yes', SQL Server uses SSL encryption for all data sent between the client and server if the server has a certificate installed. | Possible values are: 'Yes' and 'No'. Default: 'No'. |
Trust Server Certificate | Select | When set to 'Yes' (and encryption enabled), SQL Server uses SSL encryption for all data sent between the client and server without validating the server certificate. | Possible values are: 'Yes' and 'No'. Default: 'No'. |
Workstation ID | String | Used to identify the specific workstation in various SQL Server profiling and logging tools. | Optional |
Failover Partner | String | The name or network address of the instance of SQL Server that acts as failover partner. | Optional |
Packet Size | Number | The network packet size used to communicate with SQL Server, specified in bytes. It's not recommended to specify packet size property when the encryption is enabled. Otherwise, the driver might raise a connection error. | Optional |
Current Language | String | Must correspond to the SQL Server language record name and specifies the language environment for the session. The session language determines the datetime formats and system messages. | Optional |
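The Bounding Query, Split-By Field Name, and Number of Splits to Generate properties work together: the bounding query returns the minimum and maximum of the split field, and that range is divided into roughly equal intervals, one per split. The sketch below illustrates the idea for an integer field. It is a simplified illustration under stated assumptions, not the plugin's implementation, and the class `SplitSketch` and method `splitConditions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

  // Divides the inclusive range [min, max] into at most numSplits intervals
  // and returns one WHERE-style condition per interval.
  // Assumes an integer split field, numSplits >= 1, and no arithmetic overflow.
  static List<String> splitConditions(String field, long min, long max, int numSplits) {
    List<String> conditions = new ArrayList<>();
    long span = max - min + 1;
    long step = Math.max(1, (span + numSplits - 1) / numSplits); // ceiling division
    for (long lo = min; lo <= max; lo += step) {
      long hi = Math.min(lo + step - 1, max);
      conditions.add(field + " >= " + lo + " AND " + field + " <= " + hi);
    }
    return conditions;
  }

  public static void main(String[] args) {
    // Suppose the bounding query "SELECT MIN(id), MAX(id) FROM orders" returned 1 and 10.
    for (String condition : splitConditions("id", 1, 10, 3)) {
      System.out.println(condition);
    }
  }
}
```

With a range of 1 to 10 and three splits, the sketch yields the intervals 1 to 4, 5 to 8, and 9 to 10. Each condition would restrict the import query for one split, so the splits can be read in parallel.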
Action Properties
User Facing Name | Type | Description | Constraints |
|---|---|---|---|
Label | String | Label for UI | |
Host | String | MSSQL host (serverName) | Required (defaults to localhost on UI) |
Port | Number | The port where SQL Server is listening. If the port number is specified in the connection string, no request to SQLbrowser is made. When the port and instanceName are both specified, the connection is made to the specified port. However, the instanceName is validated and an error is thrown if it does not match the port. | Optional (default 1433) |
Database | String | Name of the database to connect to | Required |
Authentication Type | Select | Indicates which SQL authentication method will be used for the connection. Use 'SQL Login' to connect to a SQL Server using username and password properties. 'Active Directory Password' can be used to connect to an Azure SQL Database/Data Warehouse using an Azure AD principal name and password. | |
Username | String | DB username | Required |
Password | String | User password | Required |
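Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs passed as connection arguments. List of properties: https://docs.microsoft.com/en-us/sql/connect/jdbc/setting-the-connection-properties?view=sql-server-2017 | |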
Database Command | String | Database command to run | Valid SQL query |
Instance Name | String | The SQL Server instance name to connect to. When it is not specified, a connection is made to the default instance. For the case where both the instanceName and port are specified, see the notes for port. | Optional |
Query Timeout | Number | The number of seconds to wait before a query times out. The default value is -1, which means infinite timeout. A value of 0 also means wait indefinitely. | Optional |
Application Intent | Select | Declares the application workload type when connecting to a server. | Possible values: 'ReadWrite' and 'ReadOnly'. Default: 'ReadWrite'. |
Connect Timeout | Number | Time in seconds to wait for a connection to the server before terminating the attempt and generating an error. | Optional |
Column Encryption | Select | Default column encryption setting for all the commands on the connection. When enabled the JDBC driver will transparently encrypt and decrypt sensitive data stored in encrypted database columns in the SQL Server. | Possible values are: 'Enabled' and 'Disabled'. Default: 'Disabled'. |
Encrypt | Select |