Salesforce Multi Objects Batch Source

The Salesforce Multi Objects batch source plugin is available in the Hub.

Plugin version: 1.5.0

This source reads multiple SObjects from Salesforce. The data to read is specified using a list of SObjects and incremental or range date filters. The source outputs one record for each row in the SObjects it reads, and each record contains an additional field that holds the name of the SObject the record came from. In addition, for each SObject that is read, the plugin sets a pipeline argument whose key is multisink.[SObjectName] and whose value is the schema of the SObject.

Configuration

Property

Macro Enabled?

Version Introduced

Description

 

 

Reference Name

No

 

Required. Used to uniquely identify this source for lineage, annotating metadata, etc.

 

 

Use Connection (yes/no toggle)

No

1.5.0

Optional. Use an existing connection. If a connection is used, you do not need to provide the credentials.

 

 

Browse Connections

Yes

1.5.0

Optional. Name of the connection to use.

 

 

Username

Yes

 

Required. Salesforce username.

 

 

Password

Yes

 

Required. Salesforce password. To enable API access, the Salesforce API requires a security token in addition to the password. Append the token to the password in this field, or provide it in the Security Token property.

You can also call the Salesforce API directly to verify that the query and token work as expected.

If you do not include the security token, the following error occurs when you click Get Schema: "Connection to salesforce with plugin configurations failed".

 

 

Security Token

Yes

 

Optional. Salesforce security token. If the password does not contain the security token, the plugin will append the token before authenticating with Salesforce.

 

 

Consumer Key

Yes

 

Required. Application Consumer Key. This is also known as the OAuth client ID. A Salesforce connected application must be created in order to get a consumer key.

 

 

Consumer Secret

Yes

 

Required. Application Consumer Secret. This is also known as the OAuth client secret. A Salesforce connected application must be created in order to get a consumer secret.

 

 

Login Url

Yes

 

Required. Salesforce OAuth2 login URL.

Default is https://login.salesforce.com/services/oauth2/token

 

 

Connection Timeout

Yes

1.4.4

Optional. Maximum time in milliseconds to wait for connection initialization before it times out.

Default is 30000 milliseconds.

 

 

Proxy URL

Yes

1.4.5

Optional. Proxy URL. Must contain a protocol, address and port.

 

 

Whitelist

Yes

 

Optional. List of SObjects to read from. By default, all SObjects are whitelisted. For each whitelisted SObject, a SOQL query of the following form is generated: select <FIELD_1, FIELD_2, ..., FIELD_N> from ${sObjectName}.

 

 

Blacklist

Yes

 

Optional. List of SObjects to exclude from reading. By default, no SObjects are blacklisted.

 

 

Last Modified After

Yes

 

Optional. Filter data to only include records where the system field LastModifiedDate is greater than or equal to the specified date. The date must be provided in the Salesforce date format. If no value is provided, no lower bound for LastModifiedDate is applied.

See below for Salesforce date format examples.

 

 

Last Modified Before

Yes

 

Optional. Filter data to only include records where the system field LastModifiedDate is less than the specified date. The date must be provided in the Salesforce date format. Specifying this along with Last Modified After allows reading data modified within a specific time window. If no value is provided, no upper bound for LastModifiedDate is applied.

See below for Salesforce date format examples.

 

 

Duration

Yes

 

Optional. Filter data to only include records that were last modified within a time window of the specified size. For example, if the duration is ‘6 hours’ and the pipeline runs at 9am, it will read data that was last updated from 3am (inclusive) to 9am (exclusive). The duration is specified using numbers and time units:

  • Seconds

  • Minutes

  • Hours

  • Days

  • Months

  • Years

Several units can be specified, but each unit can only be used once. For example, 2 days, 1 hours, 30 minutes. The duration is ignored if a value is already specified for Last Modified After or Last Modified Before.

 

 

Offset

Yes

 

Optional. Filter data to only read records where the system field LastModifiedDate is less than the logical start time of the pipeline minus the given offset. For example, if the duration is ‘6 hours’, the offset is ‘1 hours’, and the pipeline runs at 9am, data last modified between 2am (inclusive) and 8am (exclusive) will be read. The offset is specified using numbers and time units:

  • Seconds

  • Minutes

  • Hours

  • Days

  • Months

  • Years

Several units can be specified, but each unit can only be used once. For example, 2 days, 1 hours, 30 minutes. The offset is ignored if a value is already specified for Last Modified After or Last Modified Before.

 

 

SOQL Operation Type

No

 

Optional. Specify the query operation to run on the table. If query is selected, only current records will be returned. If queryAll is selected, all current and deleted records will be returned.

Default operation is query.

 

 

SObject Name Field

No

 

Optional. The name of the field that holds the SObject name. Must not be the name of any SObject column that will be read. Defaults to tablename.
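The Duration and Offset properties described above combine into a single LastModifiedDate window. The following is a minimal sketch of that arithmetic, not the plugin's implementation. The function names parse_span and modified_window are hypothetical, and the sketch only handles the fixed-length units (Seconds, Minutes, Hours, Days); Months and Years are left out because their length depends on the calendar.

```python
import re
from datetime import datetime, timedelta

# Assumption: only fixed-length units are handled in this sketch.
_UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def parse_span(text):
    """Parse a value such as '2 days, 1 hours, 30 minutes' into a timedelta."""
    total = 0
    seen = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # an empty string means "no duration/offset"
        match = re.fullmatch(r"(\d+)\s+([A-Za-z]+)", part)
        if match is None:
            raise ValueError(f"cannot parse {part!r}")
        amount, unit = int(match.group(1)), match.group(2).lower()
        if unit not in _UNIT_SECONDS:
            raise ValueError(f"unsupported unit {unit!r}")
        if unit in seen:
            # Each unit can only be used once.
            raise ValueError(f"unit {unit!r} used more than once")
        seen.add(unit)
        total += amount * _UNIT_SECONDS[unit]
    return timedelta(seconds=total)


def modified_window(logical_start, duration, offset=""):
    """Return (start inclusive, end exclusive) of the LastModifiedDate window."""
    end = logical_start - parse_span(offset)
    start = end - parse_span(duration)
    return start, end


# The example from the Offset description: a 9am run with a 6 hour
# duration and a 1 hour offset reads from 2am (inclusive) to 8am (exclusive).
start, end = modified_window(datetime(2024, 1, 1, 9, 0), "6 hours", "1 hours")
print(start.time(), end.time())
```

Computing the exclusive upper bound first and subtracting the duration from it keeps the offset and duration independent of each other, which matches the two worked examples in the property descriptions.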

 

 

Salesforce Date Format Examples

Format                             Format Syntax               Example
Date, time, and time zone offset   YYYY-MM-DDThh:mm:ss+hh:mm   1999-01-01T23:01:01+01:00
                                   YYYY-MM-DDThh:mm:ss-hh:mm   1999-01-01T23:01:01-08:00
                                   YYYY-MM-DDThh:mm:ssZ        1999-01-01T23:01:01Z
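To show how these date formats fit into the generated queries, here is a minimal sketch that formats a timestamp in the Salesforce date format and adds the Last Modified After and Last Modified Before bounds to a query of the form described for the Whitelist property. The helper names to_salesforce_datetime and build_soql are hypothetical, and the exact query text the plugin generates may differ. SOQL datetime literals are written without quotes.

```python
from datetime import datetime, timedelta, timezone


def to_salesforce_datetime(value):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ or with a +/-hh:mm offset."""
    if value.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    text = value.replace(microsecond=0).isoformat()
    # isoformat() renders UTC as +00:00; use the shorter Z form instead.
    return text[:-6] + "Z" if text.endswith("+00:00") else text


def build_soql(sobject, fields, modified_after=None, modified_before=None):
    """Build a SELECT for one SObject with optional LastModifiedDate bounds."""
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    conditions = []
    if modified_after is not None:
        # Lower bound is inclusive (greater than or equal to).
        conditions.append(f"LastModifiedDate >= {to_salesforce_datetime(modified_after)}")
    if modified_before is not None:
        # Upper bound is exclusive (less than).
        conditions.append(f"LastModifiedDate < {to_salesforce_datetime(modified_before)}")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


after = datetime(1999, 1, 1, 23, 1, 1, tzinfo=timezone.utc)
before = datetime(1999, 1, 2, 23, 1, 1, tzinfo=timezone(timedelta(hours=-8)))
print(build_soql("Account", ["Id", "Name"], after, before))
```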

Example

There are two SObjects of interest in Salesforce. The first SObject is named ‘Account’ and contains:

id   name     email
0    Samuel   sjax@example.net
1    Alice    a@example.net

The second is named ‘Activity’ and contains:

id   item       action
0    shirt123   view
0    carxyz     view
0    shirt123   buy
0    coffee     view
1    cola       buy

To read data from these two SObjects, both of them must be listed in the Whitelist configuration property.

The output of the source will be the following records:

id   name     email              tablename
0    Samuel   sjax@example.net   Account
1    Alice    a@example.net      Account

id   item       action   tablename
0    shirt123   view     Activity
0    carxyz     view     Activity
0    shirt123   buy      Activity
0    coffee     view     Activity
1    cola       buy      Activity

The plugin will emit two pipeline arguments to provide the multisink plugin with the schema of the output records:

multisink.Account = { "type": "record", "name": "output", "fields": [ { "name": "Id", "type": "long" }, { "name": "Name", "type": "string" }, { "name": "Email", "type": [ "string", "null" ] } ] }

multisink.Activity = { "type": "record", "name": "output", "fields": [ { "name": "Id", "type": "long" }, { "name": "Item", "type": "string" }, { "name": "Action", "type": "string" } ] }
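The behavior in this example can be sketched in a few lines: each row is tagged with the name of its SObject, and one multisink.[SObjectName] argument is produced per SObject. This is an illustration under stated assumptions rather than the plugin's code. The function names tag_records and multisink_arguments are hypothetical, and the schema is shortened for brevity.

```python
import json


def tag_records(sobject, rows, name_field="tablename"):
    """Add the SObject name to every row under name_field."""
    tagged = []
    for row in rows:
        if name_field in row:
            # The SObject Name Field must not clash with a column being read.
            raise ValueError(f"{name_field!r} is already a column of {sobject}")
        tagged.append({**row, name_field: sobject})
    return tagged


def multisink_arguments(schemas):
    """Map each SObject name to a 'multisink.<name>' argument holding its schema as JSON."""
    return {f"multisink.{name}": json.dumps(schema) for name, schema in schemas.items()}


accounts = [
    {"id": 0, "name": "Samuel", "email": "sjax@example.net"},
    {"id": 1, "name": "Alice", "email": "a@example.net"},
]
for record in tag_records("Account", accounts):
    print(record)

account_schema = {
    "type": "record",
    "name": "output",
    "fields": [{"name": "Id", "type": "long"}, {"name": "Name", "type": "string"}],
}
print(multisink_arguments({"Account": account_schema}))
```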

Data Type Mapping

Salesforce Data Type          CDAP Schema Data Type
_bool                         bool
_int                          int
_long                         long
_double, currency, percent    double
date                          date
datetime                      timestamp (microseconds)
time                          time (microseconds)
picklist                      string
multipicklist                 string
combobox                      string
reference                     string
base64                        string
textarea                      string
phone                         string
id                            string
url                           string
email                         string
encryptedstring               string
datacategorygroupreference    string
location                      string
address                       string
anyType                       string
json                          string
complexvalue                  string
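For reference, the mapping above can be expressed as a lookup table. This is a sketch that transcribes the table as written; the names SALESFORCE_TO_CDAP and schema_fields are hypothetical, and how the plugin treats a type that is not listed is not covered here, so the sketch simply raises an error.

```python
# Transcription of the data type mapping table above.
SALESFORCE_TO_CDAP = {
    "_bool": "bool",
    "_int": "int",
    "_long": "long",
    "_double": "double",
    "currency": "double",
    "percent": "double",
    "date": "date",
    "datetime": "timestamp (microseconds)",
    "time": "time (microseconds)",
}
# Every remaining type in the table maps to string.
for _name in (
    "picklist", "multipicklist", "combobox", "reference", "base64",
    "textarea", "phone", "id", "url", "email", "encryptedstring",
    "datacategorygroupreference", "location", "address", "anyType",
    "json", "complexvalue",
):
    SALESFORCE_TO_CDAP[_name] = "string"


def schema_fields(salesforce_fields):
    """Turn {field name: Salesforce type} into a list of {name, type} entries."""
    fields = []
    for name, sf_type in salesforce_fields.items():
        if sf_type not in SALESFORCE_TO_CDAP:
            raise KeyError(f"no mapping listed for Salesforce type {sf_type!r}")
        fields.append({"name": name, "type": SALESFORCE_TO_CDAP[sf_type]})
    return fields


print(schema_fields({"Id": "id", "AnnualRevenue": "currency", "CreatedDate": "datetime"}))
```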

 

Created in 2020 by Google Inc.