Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
This plugin allows an SAP HANA database to be used as both a source and a sink.
Goals
- Users can choose and install the SAP HANA source and sink plugins.
- Users should see the SAP HANA logo on the plugin configuration page, for a better experience.
- Users should get relevant information from the tool tips: each tool tip should accurately describe what its field is used for.
- Users should not have to specify any redundant configuration.
- Users should get field-level lineage for the source and sink being used.
- The reference documentation should be updated to account for the changes.
- The source code for the SAP HANA database plugin should be placed in a repository under the data-integrations organization.
- Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.
- Integration tests for the SAP HANA database plugin should be added to the test repository.
User Stories
- Users should be able to install the SAP HANA-specific database source and sink plugins from the Hub.
- Users should have each tool tip accurately describe what its field does.
- Users should get field-level lineage information for the SAP HANA source and sink.
- Users should be able to set up a pipeline without specifying redundant information.
- Users should get updated reference documentation for the SAP HANA source and sink.
- Users should be able to read all SAP HANA data types.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Design reference
https://github.com/SAP/hana-native-adapters/tree/master/jdbcadapter/src/jdbcadapter
https://help.sap.com/viewer/1efad1691c1f496b8b580064a6536c2d/Cloud/en-US/109397c2206a4ab2a5386d494f4cf75e.html
Design
The proposal is to create a Maven submodule, saphana, in the database-plugins repository, as was done for the other database plugins.
Only SAP HANA, express edition can be tested, because we do not have access to the full version.
The compatibility matrix is only available to paying customers: https://launchpad.support.sap.com/#/notes/1906576 [TODO: how to get this?]
The documentation describes the following SAP HANA database versions [TODO: identify the differences between them]:
- 2.0 SPS 04
- 2.0 SPS 03
- 2.0 SPS 02
- 2.0 SPS 01
- 2.0 SPS 00
- 1.0 SPS 12
We also need to decide which versions we want to support [TODO: identify this].
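Once the supported versions are decided, the plugin could check the server version when it connects. The sketch below assumes that the string returned by the standard JDBC call `connection.getMetaData().getDatabaseProductVersion()` has the form `<major>.00.<revision>.<patch>.<build>` (for example `2.00.045.00.1575639312`), and that the SPS number equals the revision divided by 10 (revision 045 belongs to 2.0 SPS 04, revision 122 to 1.0 SPS 12). Both are assumptions to verify against the actual driver; the class and method names are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical helper for checking the SAP HANA server version.
 * Assumes version strings such as "2.00.045.00.1575639312"
 * (major.minor.revision.patch.build) and that SPS = revision / 10.
 */
class SapHanaVersion {
  private final int major;
  private final int sps;

  private SapHanaVersion(int major, int sps) {
    this.major = major;
    this.sps = sps;
  }

  /** Parses a value like the one DatabaseMetaData.getDatabaseProductVersion() is assumed to return. */
  static SapHanaVersion parse(String productVersion) {
    String[] parts = productVersion.trim().split("\\.");
    if (parts.length < 3) {
      throw new IllegalArgumentException("Unexpected SAP HANA version: " + productVersion);
    }
    int major = Integer.parseInt(parts[0]);
    int revision = Integer.parseInt(parts[2]); // leading zeros are fine for parseInt
    return new SapHanaVersion(major, revision / 10);
  }

  /** True if this version is the same as or newer than "otherMajor.0 SPS otherSps". */
  boolean isAtLeast(int otherMajor, int otherSps) {
    return major > otherMajor || (major == otherMajor && sps >= otherSps);
  }

  @Override
  public String toString() {
    // Same notation as the version list above, e.g. "2.0 SPS 04".
    return String.format(Locale.ROOT, "%d.0 SPS %02d", major, sps);
  }

  public static void main(String[] args) {
    SapHanaVersion version = parse("2.00.045.00.1575639312");
    System.out.println(version + ", at least 1.0 SPS 12: " + version.isAtLeast(1, 12));
  }
}
```

Running `main` prints `2.0 SPS 04, at least 1.0 SPS 12: true`. A minimum-version check like `isAtLeast` would let the plugin fail early with a clear message instead of hitting unsupported features later.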
The plugin design can be derived from the generic JDBC classes, modified to account for the custom properties of SAP HANA.
Common Properties (1.0 SPS12)
Properties specific to the source, sink, or action will be listed separately.
Section | User Configuration Label | Variable | Type | Options | Label Description | Default | User widget |
---|---|---|---|---|---|---|---|
Standard | Database | databaseName | string | | The name of the database to connect to in multi-tenant database container systems. | | Text Box |
Standard | User | user | string | | The user name. Optional, depending on the authentication method used. | | Text Box |
Standard | Password | password | password | | The user password. Optional, depending on the authentication method used. | | Text Box |
Standard | Schema | currentschema | string | | Sets the current schema, which is used for identifiers without a schema. | Current user name | Text Box |
Standard | Read Only | readOnly | boolean | | When enabled, only read-only statements are permitted. Attempting to execute DDL or DML causes an exception. | false | Toggle |
Advanced | Autocommit | autocommit | boolean | | When in autocommit mode, every statement is automatically committed. Otherwise, commits and rollbacks must be done manually. | true | Toggle |
Advanced | Close handles on finalize | closeHandlesOnFinalize | boolean | | When enabled, connections, statements, and result sets are automatically closed when their Java finalizers run. | true | Toggle |
Advanced | Connection timeout | communicationTimeout | int | | Connection timeout in milliseconds. Setting this option to 0 disables the timeout. | 0 | Text Box |
Advanced | Distribution | distribution | enum | OFF, CONNECTION, STATEMENT, ALL | The distribution mode. Specifying STATEMENT does not include CONNECTION distribution. | STATEMENT | Radio Button |
Advanced | Empty Timestamp is NULL | emptyTimestampIsNull | boolean | | When enabled, DAYDATE, SECONDTIME, SECONDDATE, and LONGDATE values inserted as empty strings are returned as NULL. When disabled, these values are returned as out-of-band values. | false | Toggle |
Advanced | Encryption | encrypt | boolean | | When enabled, all communication is encrypted via SSL. | false | Toggle |
Advanced | Ignore topology | ignoreTopology | boolean | | false = use the topology unless port forwarding is detected; true = always ignore the topology. | false | Toggle |
Advanced | Transaction isolation | isolation | enum | READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE | Sets the isolation level for the connection. | READ_COMMITTED | Radio Button |
Advanced | HDB User Key | key | string | | The key for the HdbUserStore. | | Text Box |
Advanced | Locale | locale | string | ISO locale code | The client locale. | | Text Box |
Advanced | Packet size | packetsize | int | | The maximum size, in bytes, of a request packet sent from the client to the server. The minimum is 130,000 bytes. | 130000 | Text Box |
Advanced | Reconnect | reconnect | boolean | | When enabled, the client automatically reconnects to the database instance after a command timeout or a broken connection, provided that reconnecting restores the old state (for example, if no transaction was open). | true | Toggle |
Advanced | Split Batch Commands | splitBatchCommands | boolean | | Allows split and parallel execution of batch commands on partitioned tables. | false | Toggle |
Advanced | Virtual Host Name | virtualHostName | string | | The virtual host name. This value is ignored if no HdbUserStore key is specified. | | Text Box |
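The named properties in the table eventually have to reach the SAP HANA JDBC driver, which accepts connection properties as query parameters on a URL of the form `jdbc:sap://<host>:<port>/?<property>=<value>&...`. The sketch below assumes that route. The class name, the host and port arguments (not part of this table), and the choice to drop unset fields so the driver's own defaults apply are illustrative assumptions, not a decided design.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper that turns the named SAP HANA properties from the plugin
 * configuration into a JDBC URL of the form jdbc:sap://host:port/?key=value&key=value.
 */
class SapHanaConnectionString {

  public static String build(String host, int port, Map<String, String> properties) {
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      String value = entry.getValue();
      // Leave unset fields out so the driver falls back to its own default.
      if (value == null || value.isEmpty()) {
        continue;
      }
      // Values are appended verbatim; this sketch assumes they contain no '&' or '='.
      query.add(entry.getKey() + "=" + value);
    }
    String base = "jdbc:sap://" + host + ":" + port + "/";
    return query.length() == 0 ? base : base + "?" + query;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the URL is predictable.
    Map<String, String> properties = new LinkedHashMap<>();
    properties.put("databaseName", "HXE");
    properties.put("currentschema", "SALES");
    properties.put("encrypt", "true");
    properties.put("locale", null); // not set by the user
    System.out.println(build("hana.example.com", 39015, properties));
  }
}
```

With the example values, `main` prints `jdbc:sap://hana.example.com:39015/?databaseName=HXE&currentschema=SALES&encrypt=true`. Keeping user and password out of the URL and passing them separately to `DriverManager.getConnection(url, user, password)` avoids leaking credentials into logged connection strings.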
Common Properties (2.0)
[TODO ]
Data Types Mapping
SAP HANA Data Type | CDAP Schema Data Type | Comment |
---|---|---|
BOOLEAN | Schema.Type.BOOLEAN | |
TINYINT | Schema.Type.INT | |
SMALLINT | Schema.Type.INT | |
INTEGER | Schema.Type.INT | |
BIGINT | Schema.Type.LONG | |
SMALLDECIMAL | Schema.LogicalType.DECIMAL | |
DECIMAL | Schema.LogicalType.DECIMAL | |
REAL | Schema.Type.FLOAT | |
DOUBLE | Schema.Type.DOUBLE | |
VARCHAR | Schema.Type.STRING | |
NVARCHAR | Schema.Type.STRING | |
ALPHANUM | Schema.Type.STRING | |
SHORTTEXT | Schema.Type.STRING | |
DATE | Schema.LogicalType.DATE | |
TIME | Schema.LogicalType.DATE | |
SECONDDATE | Schema.LogicalType.DATE | |
TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | |
VARBINARY | Schema.Type.BYTES | |
BLOB | Schema.Type.BYTES | |
CLOB | Schema.Type.BYTES | Open question: BYTES or STRING |
NCLOB | Schema.Type.BYTES | |
TEXT | Schema.Type.STRING | |
ARRAY | Schema.Type.ARRAY | |
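The mapping above could be implemented as a lookup keyed by the SAP HANA type name, for example the value the source reads from `ResultSetMetaData.getColumnTypeName()`. In this sketch `CdapType` is a hypothetical local stand-in for the CDAP schema types, not the real CDAP `Schema` API, and the exact type names the driver reports (particularly for ARRAY columns) are an assumption to verify. Unmapped types fail loudly rather than being silently dropped, in line with the goal of reading all SAP HANA data types.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch of the SAP HANA to CDAP type mapping from the table above. */
class SapHanaTypeMapping {

  /** Hypothetical stand-in for the CDAP schema types; not the real CDAP Schema API. */
  public enum CdapType {
    BOOLEAN, INT, LONG, DECIMAL, FLOAT, DOUBLE, STRING, DATE, TIMESTAMP_MICROS, BYTES, ARRAY
  }

  private static final Map<String, CdapType> MAPPING = new HashMap<>();

  static {
    MAPPING.put("BOOLEAN", CdapType.BOOLEAN);
    for (String t : new String[] {"TINYINT", "SMALLINT", "INTEGER"}) {
      MAPPING.put(t, CdapType.INT);
    }
    MAPPING.put("BIGINT", CdapType.LONG);
    MAPPING.put("SMALLDECIMAL", CdapType.DECIMAL);
    MAPPING.put("DECIMAL", CdapType.DECIMAL);
    MAPPING.put("REAL", CdapType.FLOAT);
    MAPPING.put("DOUBLE", CdapType.DOUBLE);
    for (String t : new String[] {"VARCHAR", "NVARCHAR", "ALPHANUM", "SHORTTEXT", "TEXT"}) {
      MAPPING.put(t, CdapType.STRING);
    }
    // TIME and SECONDDATE follow the table as written (mapped to DATE).
    for (String t : new String[] {"DATE", "TIME", "SECONDDATE"}) {
      MAPPING.put(t, CdapType.DATE);
    }
    MAPPING.put("TIMESTAMP", CdapType.TIMESTAMP_MICROS);
    // CLOB follows the table (BYTES); BYTES vs STRING is still an open question.
    for (String t : new String[] {"VARBINARY", "BLOB", "CLOB", "NCLOB"}) {
      MAPPING.put(t, CdapType.BYTES);
    }
    MAPPING.put("ARRAY", CdapType.ARRAY);
  }

  /** Returns the CDAP type for an SAP HANA type name; throws on unmapped types. */
  public static CdapType toCdapType(String hanaTypeName) {
    CdapType type = MAPPING.get(hanaTypeName.toUpperCase(Locale.ROOT));
    if (type == null) {
      throw new IllegalArgumentException("Unsupported SAP HANA type: " + hanaTypeName);
    }
    return type;
  }

  public static void main(String[] args) {
    System.out.println("NVARCHAR -> " + toCdapType("NVARCHAR"));
    System.out.println("TIMESTAMP -> " + toCdapType("TIMESTAMP"));
  }
}
```

A map-based lookup keeps the table and the code easy to compare line by line; if the CLOB question is settled in favor of STRING, only one entry changes.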
Pipeline Samples
[TODO: attach sample pipeline]
UI Impact or Changes
Configurable database properties are presented as named fields (text boxes, toggles, and radio buttons) instead of arbitrary key-value pairs. The SAP HANA source and sink appear as separate entries, with the SAP HANA logo, in the source and sink lists.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|