SQL Server Batch Source

Plugin version: 1.10.0

Reads from a SQL Server database using a configurable SQL query. Outputs one record for each row returned by the query.

Use this source when you need to read from a SQL Server database. For example, you can create daily snapshots of a database table by reading it with this source and writing to a partitioned table in BigQuery.

For information about using this plugin in pipelines, see Microsoft SQL Server Best Practices.

Configuration

| Property | Macro Enabled? | Version Introduced | Description |
| --- | --- | --- | --- |
| Use connection | No | 6.7.0/1.8.0 | Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials. |
| Connection | Yes | 6.7.0/1.8.0 | Required. Name of the connection to use. Project and service account information will be provided by the connection. You can also use the macro function ${conn(connection-name)}. |
| JDBC Driver Name | No | | Required. Select the JDBC driver to use. |
| Host | Yes | | Required. Host that SQL Server is running on. Default is localhost. |
| Port | Yes | | Required. Port that SQL Server is listening on. If the port number is specified in the connection string, no request to SQL Server Browser is made. When both Port and Instance Name are specified, the connection is made to the specified port. However, the Instance Name is still validated, and an error is thrown if it does not match the port. Default is 1433. |
| Authentication Type | No | | Optional. Indicates which SQL authentication method will be used for the connection. Use 'SQL Login' to connect to a SQL Server using username and password properties. Use 'Active Directory Password' to connect to an Azure SQL Database/Data Warehouse using an Azure AD principal name and password. Default is SQL Login. |
| Username | Yes | | Optional. User identity for connecting to the specified database. |
| Password | Yes | | Optional. Password to use to connect to the specified database. |
| Connection Arguments | No | | Optional. A list of arbitrary string key/value pairs to pass to the JDBC driver as connection arguments, for drivers that need additional configuration. |
| Reference Name | No | | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. |
| Database | Yes | | Required. SQL Server database name. |
| Import Query | Yes | | Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The query should contain the '$CONDITIONS' string, for example 'SELECT * FROM table WHERE $CONDITIONS'. The '$CONDITIONS' string is replaced by the limits on the Split-By Field Name field that the bounding query specifies. The '$CONDITIONS' string is not required if Number of Splits to Generate is set to 1. |
| Bounding Query | Yes | | Required. The bounding query should return the minimum and maximum values of the Split-By Field Name field. For example, SELECT MIN(id),MAX(id) FROM table. Not required if Number of Splits to Generate is set to 1. |
| Split-By Field Name | Yes | | Name of the field used to generate splits. Not required if Number of Splits to Generate is set to 1. |
| Number of Splits to Generate | Yes | | Number of splits to generate. Default is 1. |
| Fetch Size | Yes | 6.6.0/1.7.0 | Optional. The number of rows to fetch at a time per split. A larger Fetch Size can result in a faster import, with the trade-off of higher memory usage. Default is 1000. |
| Instance Name | No | | Optional. SQL Server instance name to connect to. When it is not specified, a connection is made to the default instance. If both Instance Name and Port are specified, see the notes for Port. If you specify a Virtual Network Name in the Server connection property, you cannot use the instanceName connection property. |
| Query Timeout | No | | Optional. Number of seconds to wait before a query times out. A value of -1 means no timeout. A value of 0 also means to wait indefinitely. Default is -1. |
| Connect Timeout | No | | Optional. Time in seconds to wait for a connection to the server before terminating the attempt and generating an error. Default is 0. |
| Column Encryption | No | | Optional. Whether to encrypt data sent between the client and server for encrypted database columns in SQL Server. Default is Disabled. |
| Encrypt | No | | Optional. Whether to encrypt all data sent between the client and server. This requires that SQL Server has a certificate installed. Default is No. |
| Trust Server Certificate | No | | Optional. Whether to trust the SQL Server certificate without validating it when using SSL encryption for data sent between the client and server. Default is No. |
| Workstation ID | No | | Optional. Used to identify the specific workstation in various SQL Server profiling and logging tools. |
| Failover Partner | No | | Optional. Name or network address of the SQL Server instance that acts as a failover partner. |
| Packet Size | No | | Optional. Network packet size in bytes to use when communicating with SQL Server. Default is -1. |
| Current Language | No | | Optional. Language to use for SQL sessions. The language determines datetime formats and system messages. For the list of installed languages, see sys.syslanguages. |
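
To make the relationship between Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate more concrete, here is a minimal Python sketch. It is not the plugin's implementation, whose exact splitting logic is not described here. It assumes an integer split field, evenly sized inclusive ranges, and hypothetical bounds of 1 and 10 standing in for the bounding query result. The helper names split_conditions and expand_import_query are made up for illustration.

```python
def split_conditions(field, lo, hi, num_splits):
    """Build one range predicate per split, together covering [lo, hi]."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    # Never create more splits than there are distinct integer values.
    num_splits = min(num_splits, hi - lo + 1)
    step, extra = divmod(hi - lo + 1, num_splits)
    conditions = []
    start = lo
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        size = step + (1 if i < extra else 0)
        end = start + size - 1
        conditions.append(f"{field} >= {start} AND {field} <= {end}")
        start = end + 1
    return conditions


def expand_import_query(import_query, conditions):
    """Substitute each predicate for the $CONDITIONS placeholder."""
    return [import_query.replace("$CONDITIONS", c) for c in conditions]


# Hypothetical bounds, as if returned by: SELECT MIN(id), MAX(id) FROM users
queries = expand_import_query(
    "SELECT id, name FROM users WHERE $CONDITIONS",
    split_conditions("id", 1, 10, 3),
)
for q in queries:
    print(q)
```

Under these assumptions, each split runs its own copy of the import query over a disjoint range of the split field, which is why the placeholder is only needed when more than one split is requested.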

Example

Suppose you want to read data from a SQL Server database named "prod" that is running on "localhost", port 1433, as the "sa" user with the password "Test11". Ensure that the driver for SQL Server is installed. You can also provide the name of a specific driver; otherwise, "sqlserver42" is used. Then configure the plugin with:

| Property | Value |
| --- | --- |
| Reference Name | src1 |
| Driver Name | sqlserver42 |
| Host | localhost |
| Port | 1433 |
| Database | prod |
| Import Query | select id, name, email, phone from users; |
| Number of Splits to Generate | 1 |
| Username | sa |
| Password | Test11 |
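
For orientation, the host, port, and database values above correspond to the kind of URL the Microsoft JDBC driver accepts (jdbc:sqlserver://host:port;property=value). The Python sketch below assembles such a URL. How the plugin itself builds its connection string is not shown in this document, so the function name build_sqlserver_jdbc_url and the mapping of the Encrypt and Trust Server Certificate fields to the driver properties encrypt and trustServerCertificate are assumptions made for illustration.

```python
def build_sqlserver_jdbc_url(host="localhost", port=1433, database=None, **properties):
    """Assemble a jdbc:sqlserver:// URL from host, port, and driver properties."""
    parts = [f"jdbc:sqlserver://{host}:{port}"]
    if database:
        parts.append(f"databaseName={database}")
    # Append the remaining properties in the order given.
    # Note: this sketch does not escape special characters in values.
    for key, value in properties.items():
        if value is not None:
            parts.append(f"{key}={value}")
    return ";".join(parts)


# Values taken from the example configuration; credentials are deliberately
# left out of the URL and would be supplied separately.
url = build_sqlserver_jdbc_url(
    host="localhost",
    port=1433,
    database="prod",
    encrypt="false",
    trustServerCertificate="false",
)
print(url)
```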

For example, if the 'id' column is a primary key of type int and the other columns are non-nullable varchars, output records will have this schema:

| field name | type |
| --- | --- |
| id | int |
| name | string |
| email | string |
| phone | string |
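
The same schema can be written out as JSON. The Python sketch below assumes an Avro-style record layout, in which a nullable field is expressed as a union with "null". The record name "output" and the helper name field are hypothetical, and the exact JSON the plugin emits may differ.

```python
import json


def field(name, type_name, nullable=False):
    """Describe one record field; nullable fields become a union with null."""
    return {"name": name, "type": [type_name, "null"] if nullable else type_name}


schema = {
    "type": "record",
    "name": "output",  # hypothetical record name
    "fields": [
        field("id", "int"),
        field("name", "string"),
        field("email", "string"),
        field("phone", "string"),
    ],
}
print(json.dumps(schema, indent=2))
```

If one of the varchar columns allowed NULL values, calling field with nullable=True would produce the union form for that column instead.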

Data Type Mapping

| SQL Server Data Type | CDAP Schema Data Type | Comments |
| --- | --- | --- |
| BIGINT | long | |
| BINARY | bytes | |
| BIT | boolean | |
| CHAR | string | |
| DATE | datetime | |
| DATETIME | datetime (version 1.5.5 and later); timestamp (version 1.5.4 and earlier) | Users can manually set output schema to map it to timestamp. |
| DATETIME2 | datetime (version 1.5.5 and later); timestamp (version 1.5.4 and earlier) | Users can manually set output schema to map it to timestamp. |
| DATETIMEOFFSET | datetime (version 1.5.5 and later); string (version 1.5.4 and earlier) | Users can manually set output schema to map it to string. |
| DECIMAL | decimal | |
| FLOAT | double | |
| IMAGE | bytes | |
| INT | int | |
| MONEY | decimal | |
| NCHAR | string | |
| NTEXT | string | |
| NUMERIC | decimal | |
| NVARCHAR | string | |
| NVARCHAR(MAX) | string | |
| REAL | float | |
| SMALLDATETIME | timestamp | |
| SMALLINT | int | |
| SMALLMONEY | decimal | |
| TEXT | string | |
| TIME | time | The TIME data type has a precision of 100 nanoseconds, which is not currently supported. Values of this type are rounded to the microsecond. |
| TINYINT | int | |
| UDT | bytes | UDT types are mapped according to the type they are an alias of. For example, if there is an 'SSN' type that was created as 'CREATE TYPE SSN FROM varchar(11);', that type is mapped to a CDAP string. Common Language Runtime UDTs are mapped to CDAP bytes. |
| UNIQUEIDENTIFIER | string | |
| VARBINARY | bytes | |
| VARBINARY(MAX) | bytes | |
| VARCHAR | string | |
| VARCHAR(MAX) | string | |
| XML | string | |
| SQL_VARIANT | string | |
| GEOMETRY | bytes | |
| GEOGRAPHY | bytes | |
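
When planning a downstream schema, it can help to treat the mapping as a lookup. The Python sketch below encodes a subset of the table above and illustrates the TIME comment by converting 100-nanosecond units to microseconds. The names cdap_type and ticks_to_micros are hypothetical, and the round-half-up rule is an assumption, because the rounding mode the plugin uses is not stated here.

```python
# Subset of the mapping table above; keys are SQL Server type names.
SQLSERVER_TO_CDAP = {
    "BIGINT": "long",
    "BIT": "boolean",
    "DECIMAL": "decimal",
    "FLOAT": "double",
    "INT": "int",
    "NVARCHAR": "string",
    "REAL": "float",
    "SMALLINT": "int",
    "TINYINT": "int",
    "UNIQUEIDENTIFIER": "string",
    "VARBINARY": "bytes",
    "VARCHAR": "string",
}


def cdap_type(sql_type):
    """Look up the CDAP type for a declared column type such as 'varchar(50)'."""
    # Drop any length or precision suffix, e.g. "(50)" or "(MAX)".
    base = sql_type.split("(", 1)[0].strip().upper()
    return SQLSERVER_TO_CDAP[base]


def ticks_to_micros(ticks):
    """Convert 100-nanosecond ticks to microseconds, rounding half up (assumed)."""
    return (ticks + 5) // 10


print(cdap_type("nvarchar(max)"))
print(ticks_to_micros(1234567))
```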

Created in 2020 by Google Inc.