SAP HANA Database Plugin

Checklist

User Stories Documented
User Stories Reviewed
Design Reviewed
APIs reviewed
Release priorities assigned
Test cases reviewed
Blog post

Introduction 

This plugin allows an SAP HANA database to be used as both a source and a sink.

Goals

  • Users can choose and install SAP HANA source and sink plugins.

  • Users should see the SAP HANA logo on the plugin configuration page for a better experience.

  • Users should get relevant information from the tool tip:

    • The tool tip should describe accurately what each field is used for.

  • Users should not have to specify any redundant configuration

  • Users should get field level lineage for the source and sink that is being used.

  • Reference documentation should be updated to account for the changes.

  • The source code for the SAP HANA database plugin should be placed in a repository under the data-integrations organization.

  • Data pipelines that use the source and sink plugins should run on both the MapReduce and Spark engines.

  • Integration tests for SAP HANA database plugin should be added in the test repo.

User Stories 

  • Users should be able to install SAP HANA specific database source and sink plugins from the Hub

  • Users should have each tool tip accurately describe what each field does

  • Users should get field level lineage information for the SAP HANA source and sink 

  • Users should be able to set up a pipeline without specifying redundant information

  • Users should get an updated reference document for the SAP HANA source and sink

  • Users should be able to read all the DB data types

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design reference

Design

The proposal is to create a Maven submodule, saphana, in the database-plugins repository, as was done for the other plugins.

Only SAP HANA, express edition can be tested, because we do not have access to a full version.

The compatibility matrix is available only to paying customers: https://launchpad.support.sap.com/#/notes/1906576 [TODO: how to get this?]

The documentation describes the following versions of the SAP HANA database [TODO: identify the differences between them]:

  • 2.0 SPS 04

  • 2.0 SPS 03

  • 2.0 SPS 02

  • 2.0 SPS 01

  • 2.0 SPS 00

  • 1.0 SPS 12

We also need to decide which versions we want to support [TODO: identify this]

The design for the plugin can be derived from the generic JDBC classes, modified to account for the custom properties that SAP HANA has.



Common Properties (1.0 SPS12)

Properties that are specific to the source, sink, or action will be listed separately.

| Section | User Configuration Label | Variable | Type | Options | Label Description | Default | User widget |
|---|---|---|---|---|---|---|---|
| Basic | Database | databaseName | string | | The name of the database to connect to in multi-tenant database container systems. | | Text Box |
| Basic | User | user | string | | The user name. Optional, depending on the authentication method used. | | Text Box |
| Basic | Password | password | password | | The user password. Optional, depending on the authentication method used. | | Text Box |
| Basic | Schema | currentschema | string | | Sets the current schema, which is used for identifiers without a schema. | Defaults to current user name | Text Box |
| Basic | Read Only | readOnly | boolean | | When enabled, only read-only statements are permitted. Attempting to execute DDL or DML causes an exception. | false | Toggle |
| Advanced | Autocommit | autocommit | boolean | | When in autocommit mode, every statement is automatically committed. Otherwise, commits and/or rollbacks must be done manually. | true | Toggle |
| Advanced | Close handles on finalize | closeHandlesOnFinalize | boolean | | When enabled, connections, statements, and result sets are automatically closed when their Java finalizers are run. | true | Toggle |
| Advanced | Connection timeout | communicationTimeout | int | | Connection timeout in milliseconds. Setting this option to 0 disables the timeout. | 0 | Text Box |
| Advanced | Distribution | distribution | enum | OFF, CONNECTION, STATEMENT, ALL | Choose the distribution mode. Specifying STATEMENT does not include CONNECTION distribution. | STATEMENT | Radio Button |
| Advanced | Empty Timestamp is NULL | emptyTimestampIsNull | boolean | | When enabled, DAYDATE, SECONDTIME, SECONDDATE, and LONGDATE values inserted as empty strings are returned as NULLs. When disabled, these values are returned as out-of-band values. | false | Toggle |
| Advanced | Encryption | encrypt | boolean | | When enabled, all communication is encrypted via SSL. | false | Toggle |
| Advanced | Ignore topology | ignoreTopology | boolean | | false = Use the topology unless port-forwarding is detected; true = Always ignore the topology | false | Toggle |
| Advanced | Transaction isolation | isolation | enum | READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE | Sets the isolation level for the connection. | READ_COMMITTED | Radio Button |
| Advanced | HDB User Key | key | string | | The key for the HdbUserStore. | | Text Box |
| Advanced | Locale | locale | string | | ISO locale code | The client locale | Text Box |
| Advanced | Packet size | packetsize | int | | Sets the maximum size of a request packet sent from the client to the server in bytes. The minimum is 130,000 bytes. | 130000 | Text Box |
| Advanced | Reconnect | reconnect | boolean | | When enabled, the system automatically reconnects to the database instance after a command timeout or when the connection was broken, provided that reconnecting restores the old state (for example, if no transaction was open). | true | Toggle |
| Advanced | Split Batch Commands | splitBatchCommands | boolean | | Allow split and parallel execution of batch commands on partitioned tables. | false | Toggle |
| Advanced | Virtual Host Name | virtualHostName | string | | The virtual host name. This value is ignored if no HdbUserStore key is specified. | | Text Box |
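
The properties listed above could be passed to the SAP HANA JDBC driver as options on the connection URL, which has the form jdbc:sap://host:port/?option=value. The following is a minimal sketch of that idea, not the plugin's actual implementation. The class name SapHanaConnectionString and its build method are hypothetical, the sketch assumes option values need no URL escaping, and credentials are left out on the assumption that they would be supplied separately.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of any existing plugin code.
final class SapHanaConnectionString {

    // Minimum request packet size in bytes, taken from the "Packet size" property.
    static final int MIN_PACKET_SIZE = 130000;

    private SapHanaConnectionString() { }

    /**
     * Builds "jdbc:sap://host:port/?name=value&..." from the given options.
     * Null or empty values are skipped so that the driver default applies.
     * Assumption: option values need no URL escaping.
     */
    static String build(String host, int port, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> option : options.entrySet()) {
            String value = option.getValue();
            if (value == null || value.isEmpty()) {
                continue;
            }
            // Reject packet sizes below the documented minimum.
            if ("packetsize".equals(option.getKey())
                    && Integer.parseInt(value) < MIN_PACKET_SIZE) {
                throw new IllegalArgumentException(
                        "packetsize must be at least " + MIN_PACKET_SIZE + " bytes");
            }
            query.add(option.getKey() + "=" + value);
        }
        String base = "jdbc:sap://" + host + ":" + port + "/";
        return query.length() == 0 ? base : base + "?" + query;
    }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap) keeps the options in the order they were configured. The host, port, and database name used when calling build are placeholders to be replaced with the values of the target system.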

Common Properties (2.0)

[TODO]

Data Types Mapping

| SAP HANA Data Type | CDAP Schema Data Type | Comment |
|---|---|---|
| BOOLEAN | Schema.Type.BOOLEAN | |
| TINYINT | Schema.Type.INT | |
| SMALLINT | Schema.Type.INT | |
| INTEGER | Schema.Type.INT | |
| BIGINT | Schema.Type.LONG | |
| SMALLDECIMAL | Schema.Type.DECIMAL | |
| DECIMAL | Schema.Type.DECIMAL | |
| REAL | Schema.Type.FLOAT | |
| DOUBLE | Schema.Type.DOUBLE | |
| VARCHAR | Schema.Type.STRING | |
| NVARCHAR | Schema.Type.STRING | |
| ALPHANUM | Schema.Type.STRING | |
| SHORTTEXT | Schema.Type.STRING | |
| DATE | Schema.Type.DATE | |
| TIME | Schema.Type.DATE | |
| SECONDDATE | Schema.Type.DATE | |
| TIMESTAMP | Schema.Type.TIMESTAMP_MICRO | |
| VARBINARY | Schema.Type.BYTES | |
| BLOB | Schema.Type.BYTES | |
| CLOB | Schema.Type.BYTES | Not sure about BYTES or STRING |
| NCLOB | Schema.Type.BYTES | |
| TEXT | Schema.Type.STRING | |
| ARRAY | Schema.Type.ARRAY | |
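
The mapping above can be expressed as a simple lookup. The sketch below is illustrative only: the class SapHanaTypeMapping and the method toCdapType are hypothetical names, and the CDAP types are represented as plain strings copied from this section rather than as values from the CDAP schema classes, so that the example has no dependencies. The CLOB and NCLOB entries follow the current proposal and remain open.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup for illustration; type names are copied from this section.
final class SapHanaTypeMapping {

    private static final Map<String, String> TYPES = new LinkedHashMap<>();

    static {
        TYPES.put("BOOLEAN", "BOOLEAN");
        TYPES.put("TINYINT", "INT");
        TYPES.put("SMALLINT", "INT");
        TYPES.put("INTEGER", "INT");
        TYPES.put("BIGINT", "LONG");
        TYPES.put("SMALLDECIMAL", "DECIMAL");
        TYPES.put("DECIMAL", "DECIMAL");
        TYPES.put("REAL", "FLOAT");
        TYPES.put("DOUBLE", "DOUBLE");
        TYPES.put("VARCHAR", "STRING");
        TYPES.put("NVARCHAR", "STRING");
        TYPES.put("ALPHANUM", "STRING");
        TYPES.put("SHORTTEXT", "STRING");
        TYPES.put("DATE", "DATE");
        TYPES.put("TIME", "DATE");
        TYPES.put("SECONDDATE", "DATE");
        TYPES.put("TIMESTAMP", "TIMESTAMP_MICRO");
        TYPES.put("VARBINARY", "BYTES");
        TYPES.put("BLOB", "BYTES");
        // Open question in this design: BYTES or STRING for character LOBs.
        TYPES.put("CLOB", "BYTES");
        TYPES.put("NCLOB", "BYTES");
        TYPES.put("TEXT", "STRING");
        TYPES.put("ARRAY", "ARRAY");
    }

    private SapHanaTypeMapping() { }

    /** Returns the proposed CDAP type name for a SAP HANA type name (case-insensitive). */
    static String toCdapType(String hanaType) {
        String cdapType = TYPES.get(hanaType.trim().toUpperCase(Locale.ROOT));
        if (cdapType == null) {
            throw new IllegalArgumentException("Unsupported SAP HANA type: " + hanaType);
        }
        return cdapType;
    }
}
```

Failing on an unknown type, rather than falling back to STRING, is a design choice of this sketch: it makes gaps in the mapping visible while the list of supported types is still being settled.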



Pipeline Samples

[TODO: attach sample pipeline]

UI Impact or Changes

  • Configurable database properties are presented as named fields instead of arbitrary key-value pairs. The SAP HANA source and sink appear as separate entries, each with the SAP HANA logo, in the source and sink lists.

Test Scenarios

| Test ID | Test Description | Expected Results |
|---|---|---|

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

Future work

Created in 2020 by Google Inc.