Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

This plugin allows an SAP HANA database to be used as both a source and a sink.

Goals

  • Users can choose and install SAP HANA source and sink plugins.
  • Users should see the SAP HANA logo on the plugin configuration page for a better experience.

  • Users should get relevant information from the tool tip:

    • The tool tip should describe accurately what each field is used for.

  • Users should not have to specify any redundant configuration.

  • Users should get field level lineage for the source and sink that is being used.

  • Reference documentation should be updated to account for the changes.

  • The source code for the SAP HANA database plugin should be placed in a repo under the data-integrations org.

  • A data pipeline using the source and sink plugins should run on both the MapReduce and Spark engines.

  • Integration tests for the SAP HANA database plugin should be added to the test repo.

User Stories 

  • Users should be able to install SAP HANA specific database source and sink plugins from the Hub.
  • Users should have each tool tip accurately describe what each field does.

  • Users should get field level lineage information for the SAP HANA source and sink.

  • Users should be able to set up a pipeline without specifying redundant information.

  • Users should get an updated reference document for the SAP HANA source and sink.

  • Users should be able to read all SAP HANA database data types.

Plugin Type

  •  Batch Source
  •  Batch Sink 
  •  Real-time Source
  •  Real-time Sink
  •  Action
  •  Post-Run Action
  •  Aggregate
  •  Join
  •  Spark Model
  •  Spark Compute

Design reference

Design

The suggestion is to create a Maven submodule named saphana under the database-plugins repo, as was done for the other plugins.

Only SAP HANA express edition can be tested, because we do not have the full version at hand.

The compatibility matrix is only available to paid customers: https://launchpad.support.sap.com/#/notes/1906576 [TODO: how to get this?]

The documentation describes the following versions of the SAP HANA database [TODO: identify the differences between them]:

  • 2.0 SPS 04
  • 2.0 SPS 03
  • 2.0 SPS 02
  • 2.0 SPS 01
  • 2.0 SPS 00
  • 1.0 SPS 12

We also need to decide which versions we want to support [TODO: identify this]

The design for the plugin can be derived from the generic JDBC classes, modified to account for the custom properties that SAP HANA has.


Common Properties (1.0 SPS 12)

Properties that are specific to the source, sink, or action will be listed separately.

User-facing name | Type | Description | Default value
--- | --- | --- | ---
autocommit | boolean | When in autocommit mode, every statement is automatically committed. Otherwise, commits and/or rollbacks must be done manually. | true
closeHandlesOnFinalize | boolean | When enabled, connections, statements, and result sets are automatically closed when their Java finalizers are run. | true
communicationTimeout | <number> (milliseconds) | Aborts communication after the specified timeout. Setting this option to 0 disables the timeout. | 0
currentschema | string | Sets the current schema, which is used for identifiers without a schema. | CURRENT_USER
databaseName | string | The name of the database to connect to in multi-tenant database container systems. |
distribution | OFF, CONNECTION, STATEMENT, ALL | Chooses the distribution mode. Specifying STATEMENT does not include CONNECTION distribution. | STATEMENT
emptyTimestampIsNull | boolean | When enabled, DAYDATE, SECONDTIME, SECONDDATE, and LONGDATE values inserted as empty strings are returned as NULLs. When disabled, these values are returned as out-of-band values. | false
encrypt | boolean | When enabled, all communication is encrypted via SSL. | false
ignoreTopology | boolean | false = Use the topology unless port forwarding is detected. true = Always ignore the topology. | false
isolation | TRANSACTION_READ_UNCOMMITTED, TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, TRANSACTION_SERIALIZABLE | Sets the isolation level for the connection. | TRANSACTION_READ_COMMITTED
key | <key> | The key for the HdbUserStore. |
locale | ISO locale code | The client locale. | The client locale
packetsize | <number> bytes | Sets the maximum size of a request packet sent from the client to the server in bytes. The minimum is 130,000 bytes. | 130000
password | string | The user password. Optional, depending on the authentication method used. |
readOnly | boolean | When enabled, only read-only statements are permitted. Attempting to execute DDL or DML causes an exception. | false
reconnect | boolean | When enabled, the system automatically reconnects to the database instance after a command timeout or when the connection was broken, provided that reconnecting restores the old state (for example, if no transaction was open). | true
trace | JDBC trace file name | If set, enables tracing using the current trace options and the given trace file name. | Tracing is controlled by the GUI-based tracing configuration tool or the command-line tool.
splitBatchCommands | boolean | Allows split and parallel execution of batch commands on partitioned tables. | false
user | string | The user name. Optional, depending on the authentication method used. |
virtualHostName | <virtual-host-name> | The virtual host name. This value is ignored if no HdbUserStore key is specified. |
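
To illustrate how the connection properties listed above could reach the driver, here is a minimal sketch that appends them as options to a SAP HANA JDBC URL of the form jdbc:sap://<host>:<port>/?<option>=<value>. The class name SapHanaConnectionString, the host, and the port are hypothetical and chosen only for illustration; the plugin itself may assemble its connection arguments differently. Values are appended verbatim, so the sketch does not handle values that contain & or =.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of the plugin code base.
final class SapHanaConnectionString {
  private SapHanaConnectionString() { }

  // Builds jdbc:sap://host:port/ and appends every non-empty option as key=value.
  // Assumption: credentials are passed separately (for example via java.util.Properties),
  // so user and password are deliberately not handled here.
  static String build(String host, int port, Map<String, String> options) {
    StringBuilder url = new StringBuilder("jdbc:sap://")
        .append(host).append(':').append(port).append('/');
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> option : options.entrySet()) {
      String value = option.getValue();
      if (value == null || value.isEmpty()) {
        continue; // unset properties fall back to the driver defaults
      }
      query.add(option.getKey() + "=" + value);
    }
    if (query.length() > 0) {
      url.append('?').append(query);
    }
    return url.toString();
  }
}
```

With an insertion-ordered map holding databaseName=HXE, encrypt=true, and an empty currentschema, the helper produces jdbc:sap://hana.example.com:39015/?databaseName=HXE&encrypt=true. Skipping empty values keeps the URL short and leaves unset properties at the defaults given in the table.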

Common Properties (2.0)

[TODO ]

Data Types Mapping

SAP HANA Data Type | CDAP Schema Data Type | Comment
--- | --- | ---
BOOLEAN | Schema.Type.BOOLEAN |
TINYINT | Schema.Type.INT |
SMALLINT | Schema.Type.INT |
INTEGER | Schema.Type.INT |
BIGINT | Schema.Type.LONG |
SMALLDECIMAL | Schema.Type.DECIMAL |
DECIMAL | Schema.Type.DECIMAL |
REAL | Schema.Type.FLOAT |
DOUBLE | Schema.Type.DOUBLE |
VARCHAR | Schema.Type.STRING |
NVARCHAR | Schema.Type.STRING |
ALPHANUM | Schema.Type.STRING |
SHORTTEXT | Schema.Type.STRING |
DATE | Schema.Type.DATE |
TIME | Schema.Type.DATE |
SECONDDATE | Schema.Type.DATE |
TIMESTAMP | Schema.Type.TIMESTAMP_MICRO |
VARBINARY | Schema.Type.BYTES |
BLOB | Schema.Type.BYTES |
CLOB | Schema.Type.BYTES | Not sure about BYTES or STRING
NCLOB | Schema.Type.BYTES |
TEXT | Schema.Type.STRING |
ARRAY | Schema.Type.ARRAY |
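
The mapping above can be sketched as a simple lookup. This is an illustration under stated assumptions, not the plugin's implementation: the class SapHanaTypeMapping and the method cdapTypeFor are hypothetical names, and the CDAP types are represented as plain strings (the part after Schema.Type. in the table) so that the example compiles without the CDAP API on the classpath. The entries mirror the table as written, including the open question for CLOB.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup mirroring the Data Types Mapping table; names are illustrative.
final class SapHanaTypeMapping {
  private static final Map<String, String> MAPPING;

  static {
    Map<String, String> m = new HashMap<>();
    m.put("BOOLEAN", "BOOLEAN");
    m.put("TINYINT", "INT");
    m.put("SMALLINT", "INT");
    m.put("INTEGER", "INT");
    m.put("BIGINT", "LONG");
    m.put("SMALLDECIMAL", "DECIMAL");
    m.put("DECIMAL", "DECIMAL");
    m.put("REAL", "FLOAT");
    m.put("DOUBLE", "DOUBLE");
    m.put("VARCHAR", "STRING");
    m.put("NVARCHAR", "STRING");
    m.put("ALPHANUM", "STRING");
    m.put("SHORTTEXT", "STRING");
    m.put("DATE", "DATE");
    m.put("TIME", "DATE");
    m.put("SECONDDATE", "DATE");
    m.put("TIMESTAMP", "TIMESTAMP_MICRO");
    m.put("VARBINARY", "BYTES");
    m.put("BLOB", "BYTES");
    m.put("CLOB", "BYTES"); // open question in the table: BYTES or STRING
    m.put("NCLOB", "BYTES");
    m.put("TEXT", "STRING");
    m.put("ARRAY", "ARRAY");
    MAPPING = Collections.unmodifiableMap(m);
  }

  private SapHanaTypeMapping() { }

  // Returns the CDAP schema type name for a SAP HANA type name (case-insensitive).
  static String cdapTypeFor(String hanaType) {
    String cdapType = MAPPING.get(hanaType.trim().toUpperCase(Locale.ROOT));
    if (cdapType == null) {
      throw new IllegalArgumentException("Unsupported SAP HANA type: " + hanaType);
    }
    return cdapType;
  }
}
```

Failing fast on an unknown type name is a design choice of this sketch; a real source plugin might instead report the unsupported column during schema validation.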


Pipeline Samples

[TODO: attach sample pipeline]

UI Impact or Changes

  • Configurable database properties are presented as named text fields instead of arbitrary key-value pairs. The SAP HANA source and sink appear as separate entries, each with the SAP HANA logo, in the source and sink lists.
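
One benefit of named fields over free-form key-value pairs is that each value can be checked before the pipeline runs. The sketch below shows what such a check might look like for three of the common properties. The class SapHanaPropertyValidator and its method are hypothetical; the allowed distribution values and the 130000-byte packet size minimum come from the Common Properties table, while rejecting a negative communicationTimeout is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical validator for illustration; not taken from the plugin code base.
final class SapHanaPropertyValidator {
  static final List<String> DISTRIBUTION_MODES =
      Arrays.asList("OFF", "CONNECTION", "STATEMENT", "ALL");
  static final int MIN_PACKET_SIZE = 130000; // bytes, per the properties table

  private SapHanaPropertyValidator() { }

  // Returns one message per invalid field; null means "not set, use the default".
  static List<String> validate(String distribution, Integer packetSize,
                               Integer communicationTimeout) {
    List<String> errors = new ArrayList<>();
    if (distribution != null && !DISTRIBUTION_MODES.contains(distribution)) {
      errors.add("distribution must be one of " + DISTRIBUTION_MODES);
    }
    if (packetSize != null && packetSize < MIN_PACKET_SIZE) {
      errors.add("packetsize must be at least " + MIN_PACKET_SIZE + " bytes");
    }
    // Assumption: a negative timeout is treated as invalid; 0 disables the timeout.
    if (communicationTimeout != null && communicationTimeout < 0) {
      errors.add("communicationTimeout must be 0 or a positive number of milliseconds");
    }
    return errors;
  }
}
```

Collecting all messages rather than stopping at the first one lets a configuration page flag every invalid field in a single pass.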

Test Scenarios

Test ID | Test Description | Expected Results
--- | --- | ---

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

Future work