Oracle Batch Source

Plugin version: 1.10.0

Reads from an Oracle table using a configurable SQL query. Outputs one record for each row returned by the query. For example, you might want to create daily snapshots of a database table by using this source and writing to Amazon S3.

Configuration

Property

Macro Enabled?

Version Introduced

Description

Use connection

No

6.7.0/1.8.0

Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials.

Connection

Yes

6.7.0/1.8.0

Required. Name of the connection to use. The connection supplies the connection details and credentials. You can also use the macro function ${conn(connection-name)}.

JDBC Driver Name

No

 

Required. Select the JDBC driver to use.

Default is oracle.

Host

Yes

 

Required. Host that Oracle is running on.

Default is localhost.

Port

Yes

 

Required. Port that Oracle is running on.

Default is 1521.

Username

Yes

 

Optional. User identity for connecting to the specified database.

Password

Yes

 

Optional. Password to use to connect to the specified database.

Role

No

 

Optional. Login role of the user when connecting to the database.

Default is Normal.

Transaction Isolation Level

Yes

6.6.0/1.7.1

The transaction isolation level of the database connection.

  • TRANSACTION_READ_COMMITTED: No dirty reads. Non-repeatable reads and phantom reads are possible.

  • TRANSACTION_SERIALIZABLE (default): No dirty reads. Non-repeatable and phantom reads are prevented.

  • Note: If the user role selected is SYSDBA or SYSOPER, the plugin defaults to TRANSACTION_READ_COMMITTED to prevent ORA-08178 errors.

Connection Type

No

 

Required. Whether to use an SID, Service Name, or TNS Connect Descriptor when connecting to the database.

SID/Service Name/TNS Connect Descriptor

Yes

 

Required. Oracle connection point (Database name, Service name, or a TNS Connect Descriptor). When using TNS, place the full TNS Connect Descriptor in the text field. For example: (DESCRIPTION =(ADDRESS = (PROTOCOL = TCP)(HOST = 123.123.123.123)(PORT = 1521))(CONNECT_DATA =(SERVER = DEDICATED) (SERVICE_NAME = XE)))

Connection Arguments

Yes

 

Optional. A list of arbitrary string key/value pairs as connection arguments. These arguments will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.

Reference Name

No

 

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

Import Query

Yes

 

Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The query must contain the '$CONDITIONS' string unless Number of Splits to Generate is set to 1. For example, 'SELECT * FROM table WHERE $CONDITIONS'. At run time, '$CONDITIONS' is replaced with the range limits on the Split-By Field Name field, which are derived from the bounding query.

Bounding Query

Yes

 

Optional. Query that returns the minimum and maximum values of the Split-By Field Name field. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if Number of Splits to Generate is set to 1.

Split-By Field Name

Yes

 

Optional. Name of the field used to generate splits. Not required if Number of Splits to Generate is set to 1.

Number of Splits to Generate

Yes

 

Optional. Number of splits to generate.

Default is 1.

Fetch Size

Yes

6.6.0/1.7.0

Optional. The number of rows to fetch at a time per split. A larger fetch size can speed up the import at the cost of higher memory usage.

Default is 1000.

Default Batch Value

No

 

Optional. The default batch value that triggers an execution request.

Default is 10.

Default Row Prefetch

No

 

Optional. The default number of rows to prefetch from the server.

Default is 40.
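
The way Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate fit together can be illustrated with a minimal Python sketch. It assumes an integer split field and evenly sized ranges. The helper names (split_conditions, build_queries) are hypothetical, the "1 = 1" placeholder for a single split is an assumption, and the plugin's actual split logic may handle boundaries differently.

```python
def split_conditions(field, lower, upper, num_splits):
    """Return one WHERE-clause fragment per split, covering [lower, upper]."""
    if num_splits <= 1:
        # Assumption: a single split needs no range filter.
        return ["1 = 1"]
    total = upper - lower + 1
    size, extra = divmod(total, num_splits)
    conditions = []
    start = lower
    for i in range(num_splits):
        # Spread any remainder over the first `extra` splits.
        width = size + (1 if i < extra else 0)
        if width == 0:
            continue  # more splits requested than distinct values
        end = start + width - 1
        conditions.append(f"{field} >= {start} AND {field} <= {end}")
        start = end + 1
    return conditions


def build_queries(import_query, field, lower, upper, num_splits):
    """Substitute each range condition for the $CONDITIONS placeholder."""
    return [
        import_query.replace("$CONDITIONS", cond)
        for cond in split_conditions(field, lower, upper, num_splits)
    ]


if __name__ == "__main__":
    # lower/upper stand in for the result of 'SELECT MIN(id),MAX(id) FROM users'.
    for query in build_queries(
        "SELECT * FROM users WHERE $CONDITIONS", "id", 1, 100, 4
    ):
        print(query)
```

Each generated query reads a disjoint range of the split field, which is why the bounding query and the split field are only needed when more than one split is requested.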

Example

Suppose you want to read data from an Oracle database named “XE” that is running on “localhost”, port 1521, as the “system” user with the “oracle” password. Ensure that the driver for Oracle is installed. You can provide a driver name for a specific driver; otherwise “oracle” is used. Then configure the plugin with:

Property

Value

Reference Name

src1

Driver Name

oracle

Host

localhost

Port

1521

Connection Type

Service Name

SID/Service Name/TNS Connect Descriptor

XE

Import Query

select id, name, email, phone from users;

Number of Splits to Generate

1

Username

system

Password

oracle

Default Batch Value

10

Default Row Prefetch

40
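
As a rough illustration of what the Connection Type setting means, the sketch below builds the standard Oracle thin-driver JDBC URL for each option. The URL forms are the usual Oracle thin-driver conventions; the function name oracle_jdbc_url is hypothetical, and whether the plugin assembles the URL in exactly this way is an assumption.

```python
def oracle_jdbc_url(connection_type, host, port, identifier):
    """Build an Oracle thin-driver JDBC URL for the given connection type."""
    if connection_type == "SID":
        return f"jdbc:oracle:thin:@{host}:{port}:{identifier}"
    if connection_type == "Service Name":
        return f"jdbc:oracle:thin:@//{host}:{port}/{identifier}"
    if connection_type == "TNS Connect Descriptor":
        # The descriptor already carries host and port, so they are ignored.
        return f"jdbc:oracle:thin:@{identifier}"
    raise ValueError(f"Unknown connection type: {connection_type}")


if __name__ == "__main__":
    # Values taken from the example configuration above.
    print(oracle_jdbc_url("Service Name", "localhost", 1521, "XE"))
```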

For example, if the ‘id’ column is a primary key of type int and the other columns are non-nullable varchars, output records will have this schema:

field name

type

id

int

name

string

email

string

phone

string
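
The same schema can be written out as an Avro-style JSON record, which is a common way to express pipeline schemas. The following is a minimal sketch; the record name "output" and the helper record_schema are assumptions made for illustration.

```python
import json


def record_schema(name, fields):
    """Build an Avro-style record schema from (field name, type) pairs."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": n, "type": t} for n, t in fields],
    }


# Field names and types come from the table above; "output" is a placeholder.
schema = record_schema(
    "output",
    [("id", "int"), ("name", "string"), ("email", "string"), ("phone", "string")],
)

if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
```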

Data Type Mapping

Oracle Data Type

CDAP Schema Data Type

Comments

VARCHAR2

string

 

NVARCHAR2

string

 

VARCHAR

string

 

NUMBER

decimal

For NUMBER types with a defined precision and scale.

NUMBER

string

For NUMBER types defined without a precision and scale.
You can manually set the output schema to map these columns to decimal.

FLOAT

double

 

LONG

string

 

DATE

timestamp

 

BINARY_FLOAT

float

 

BINARY_DOUBLE

double

 

TIMESTAMP

timestamp

 

TIMESTAMP WITH TIME ZONE

string

 

TIMESTAMP WITH LOCAL TIME ZONE

timestamp

 

INTERVAL YEAR TO MONTH

string

 

INTERVAL DAY TO SECOND

string

 

RAW

bytes

 

LONG RAW

bytes

 

ROWID

string

 

UROWID

string

 

CHAR

string

 

NCHAR

string

 

CLOB

string

 

NCLOB

string

 

BLOB

bytes

 

BFILE

bytes

BFILE is a data type used to store a locator (link) to an external file, which is stored outside of the database. Only the locator is read from the Oracle table, not the content of the external file.
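
The mapping table above can be expressed as a small lookup, which may be handy when checking an expected output schema by hand. This is only a sketch of the table, not the plugin's code; the names ORACLE_TO_CDAP and cdap_type are hypothetical, and the has_precision_and_scale flag is an assumed way of modeling the two NUMBER rows.

```python
# Transcription of the mapping table; NUMBER is handled separately below.
ORACLE_TO_CDAP = {
    "VARCHAR2": "string", "NVARCHAR2": "string", "VARCHAR": "string",
    "FLOAT": "double", "LONG": "string", "DATE": "timestamp",
    "BINARY_FLOAT": "float", "BINARY_DOUBLE": "double",
    "TIMESTAMP": "timestamp",
    "TIMESTAMP WITH TIME ZONE": "string",
    "TIMESTAMP WITH LOCAL TIME ZONE": "timestamp",
    "INTERVAL YEAR TO MONTH": "string", "INTERVAL DAY TO SECOND": "string",
    "RAW": "bytes", "LONG RAW": "bytes",
    "ROWID": "string", "UROWID": "string",
    "CHAR": "string", "NCHAR": "string",
    "CLOB": "string", "NCLOB": "string",
    "BLOB": "bytes", "BFILE": "bytes",
}


def cdap_type(oracle_type, has_precision_and_scale=False):
    """Look up the schema type for an Oracle column type name."""
    # Normalize case and collapse repeated whitespace.
    key = " ".join(oracle_type.upper().split())
    if key == "NUMBER":
        # NUMBER maps to decimal only when precision and scale are defined.
        return "decimal" if has_precision_and_scale else "string"
    return ORACLE_TO_CDAP[key]


if __name__ == "__main__":
    print(cdap_type("NUMBER", has_precision_and_scale=True))
    print(cdap_type("timestamp with time zone"))
```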

 

Created in 2020 by Google Inc.