Salesforce Batch Source

The Salesforce batch source plugin is available in the Hub.

Plugin version: 1.5.0

This source reads sObjects from Salesforce. Examples of sObjects are opportunities, contacts, accounts, leads, any custom object, etc.

The data to read is specified either with a SOQL (Salesforce Object Query Language) query or with an sObject name combined with optional incremental or range date filters.

Configuration

Property

Macro Enabled?

Version Introduced

Description

Reference Name

No

 

Required. Used to uniquely identify this source for lineage, annotating metadata, etc.

Use Connection (yes/no toggle)

No

1.5.0

Optional. Use an existing connection. If a connection is used, you do not need to provide the credentials.

Browse Connections

Yes

1.5.0

Optional. Name of the connection to use.

Username

Yes

 

Required. Salesforce username.

Password

Yes

 

Required. Salesforce password.

Security Token

Yes

 

Optional. Salesforce security token. If the password does not already contain the security token, the plugin appends the token before authenticating with Salesforce.

Consumer Key

Yes

 

Required. Application Consumer Key. This is also known as the OAuth client ID. A Salesforce connected application must be created in order to get a consumer key.

Consumer Secret

Yes

 

Required. Application Consumer Secret. This is also known as the OAuth client secret. A Salesforce connected application must be created in order to get a consumer secret.

Login Url

Yes

 

Required. Salesforce OAuth2 login URL.

Default is https://login.salesforce.com/services/oauth2/token
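
To show how the credential properties above relate to one another, here is a minimal sketch that assembles (but does not send) a token request for the Salesforce OAuth 2.0 username-password flow. That the plugin uses exactly this flow is an assumption, and the function name `build_token_request` and the sample values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

DEFAULT_LOGIN_URL = "https://login.salesforce.com/services/oauth2/token"


def build_token_request(username, password, security_token,
                        consumer_key, consumer_secret,
                        login_url=DEFAULT_LOGIN_URL):
    """Build (without sending) a POST request for an OAuth2 access token."""
    form = {
        "grant_type": "password",
        "client_id": consumer_key,         # Consumer Key = OAuth client ID
        "client_secret": consumer_secret,  # Consumer Secret = OAuth client secret
        "username": username,
        # Assumption: the token is always concatenated to the password here;
        # the plugin only appends it when the password does not contain it.
        "password": password + (security_token or ""),
    }
    return Request(login_url, data=urlencode(form).encode("utf-8"), method="POST")


request = build_token_request("user@example.com", "pw", "TOKEN", "key", "secret")
print(request.full_url)
print(sorted(parse_qs(request.data.decode("utf-8"))))
```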

Connection Timeout

Yes

1.4.4

Optional. Maximum time in milliseconds to wait for connection initialization before it times out.

Default is 30000 milliseconds.

Proxy URL

Yes

1.4.5

Optional. Proxy URL. Must contain a protocol, address and port.
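
The requirement that a proxy URL contain a protocol, address and port can be checked before a pipeline is deployed. The following is a minimal sketch using the Python standard library. The helper name `is_valid_proxy_url` and the example host are hypothetical, and the plugin's own validation may differ.

```python
from urllib.parse import urlsplit


def is_valid_proxy_url(proxy_url):
    """Return True if the URL has a protocol, an address and a port."""
    try:
        parts = urlsplit(proxy_url)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.hostname) and port is not None


print(is_valid_proxy_url("http://proxy.example.com:8080"))
print(is_valid_proxy_url("http://proxy.example.com"))
```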

SOQL

Yes

 

Optional. An SOQL query to fetch data into source.

Examples:
SELECT Id, Name, BillingCity FROM Account
SELECT Id FROM Contact WHERE Name LIKE 'A%' AND MailingCity = 'California'

SObject Name

Yes

 

Optional. Salesforce object name to read. If a value is provided, the plugin gets all fields for this object from Salesforce and generates a SOQL query (select <FIELD_1, FIELD_2, ..., FIELD_N> from ${sObjectName}). Ignored if a SOQL query is provided.
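
As a rough illustration of the generated query, the sketch below builds a SOQL statement from an sObject name and a list of field names. The field list is hard-coded here, whereas the plugin retrieves it from Salesforce, and the function name `build_soql` is hypothetical.

```python
def build_soql(sobject_name, field_names):
    """Build 'SELECT f1, f2, ... FROM <sObject>' from a list of field names."""
    if not field_names:
        raise ValueError("at least one field is required")
    return "SELECT {} FROM {}".format(", ".join(field_names), sobject_name)


print(build_soql("Account", ["Id", "Name", "BillingCity"]))
```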

Some sObjects are not supported by the Salesforce Bulk API. When a job is created for such an object, the error “Entity is not supported by the Bulk API” is thrown. These objects are also not supported by Einstein Analytics, which uses the Bulk API to query data.

Below is a non-comprehensive list of sObjects that are not currently available in the Bulk API:

  • *Feed (e.g. AccountFeed, AssetFeed, …)

  • *Share (e.g. AccountBrandShare, ChannelProgramLevelShare, …)

  • *History (e.g. AccountHistory, ActivityHistory, …)

  • *EventRelation (e.g. AcceptedEventRelation, DeclinedEventRelation, …)

  • AggregateResult

  • AttachedContentDocument

  • CaseStatus

  • CaseTeamMember

  • CaseTeamRole

  • CaseTeamTemplate

  • CaseTeamTemplateMember

  • CaseTeamTemplateRecord

  • CombinedAttachment

  • ContentFolderItem

  • ContractStatus

  • EventWhoRelation

  • FolderedContentDocument

  • KnowledgeArticleViewStat

  • KnowledgeArticleVoteStat

  • LookedUpFromActivity

  • Name

  • NoteAndAttachment

  • OpenActivity

  • OwnedContentDocument

  • PartnerRole

  • RecentlyViewed

  • ServiceAppointmentStatus

  • SolutionStatus

  • TaskPriority

  • TaskStatus

  • TaskWhoRelation

  • UserRecordAccess

  • WorkOrderLineItemStatus

  • WorkOrderStatus

Cases when the Bulk API is not used: The plugin normally sends the query to Salesforce through the Bulk API and receives an array of batch info. There is one exception, which depends on the query length. If the query is within the length limit, the original query is executed. Otherwise, the plugin switches to wide object logic: it generates a query that selects only Ids, retrieves batch info for those Ids, and later uses the Ids to retrieve the data through the SOAP API.

Last Modified After

Yes

 

Optional. Filter data to only include records where the system field LastModifiedDate is greater than or equal to the specified date. The date must be provided in the Salesforce date format. See below for Salesforce date format examples.

If no value is provided, no lower bound for LastModifiedDate is applied.

Last Modified Before

Yes

 

Optional. Filter data to only include records where the system field LastModifiedDate is less than the specified date. The date must be provided in the Salesforce date format. See below for Salesforce date format examples.

Specifying this along with Last Modified After allows reading data modified within a specific time window. If no value is provided, no upper bound for LastModifiedDate is applied.
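
The two bounds combine into a single condition on LastModifiedDate, inclusive at the lower end and exclusive at the upper end. A minimal sketch of the resulting WHERE clause follows. The function name `build_date_filter` is hypothetical and the exact text the plugin generates is an assumption.

```python
def build_date_filter(last_modified_after=None, last_modified_before=None):
    """Return a WHERE clause for the LastModifiedDate bounds, or '' if none."""
    conditions = []
    if last_modified_after:
        # Lower bound is inclusive (greater than or equal to).
        conditions.append("LastModifiedDate >= " + last_modified_after)
    if last_modified_before:
        # Upper bound is exclusive (strictly less than).
        conditions.append("LastModifiedDate < " + last_modified_before)
    return ("WHERE " + " AND ".join(conditions)) if conditions else ""


print("SELECT Id FROM Account "
      + build_date_filter("2020-01-01T00:00:00Z", "2020-02-01T00:00:00Z"))
```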

Duration

Yes

 

Optional. Filter data read to only include records that were last modified within a time window of the specified size. For example, if the duration is ‘6 hours’ and the pipeline runs at 9am, it will read data that was last updated from 3am (inclusive) to 9am (exclusive). The duration is specified using numbers and time units:

  • Seconds

  • Minutes

  • Hours

  • Days

  • Months

  • Years

Several units can be specified, but each unit can only be used once. For example, 2 days, 1 hours, 30 minutes. The duration is ignored if a value is already specified for Last Modified After or Last Modified Before.

Offset

Yes

 

Optional. Filter data to only read records where the system field LastModifiedDate is less than the logical start time of the pipeline minus the given offset. For example, if the duration is ‘6 hours’ and the offset is ‘1 hours’, and the pipeline runs at 9am, data last modified between 2am (inclusive) and 8am (exclusive) will be read. The offset is specified using numbers and time units:

  • Seconds

  • Minutes

  • Hours

  • Days

  • Months

  • Years

Several units can be specified, but each unit can only be used once. For example, 2 days, 1 hours, 30 minutes. The offset is ignored if a value is already specified for Last Modified After or Last Modified Before.
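
The interaction between Duration and Offset can be worked through in code. The sketch below parses values such as `2 days, 1 hours, 30 minutes` and computes the time window relative to the logical start time. It is only an illustration: the names `parse_span` and `read_window` are hypothetical, the comma-separated syntax is inferred from the example above, and months and years are left out because they need calendar arithmetic.

```python
import re
from datetime import datetime, timedelta

# Months and years are omitted from this sketch (they need calendar arithmetic).
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def parse_span(text):
    """Parse a string such as '2 days, 1 hours, 30 minutes' into a timedelta."""
    total = timedelta()
    seen = set()
    for part in filter(None, (p.strip() for p in text.split(","))):
        match = re.fullmatch(r"(\d+)\s+([a-z]+)", part.lower())
        if not match or match.group(2) not in UNIT_SECONDS:
            raise ValueError("unsupported duration part: " + part)
        unit = match.group(2)
        if unit in seen:
            raise ValueError("unit used more than once: " + unit)
        seen.add(unit)
        total += timedelta(seconds=int(match.group(1)) * UNIT_SECONDS[unit])
    return total


def read_window(logical_start, duration, offset=""):
    """Return (inclusive lower bound, exclusive upper bound) of the window."""
    upper = logical_start - parse_span(offset)
    return upper - parse_span(duration), upper


start, end = read_window(datetime(2020, 1, 1, 9, 0), "6 hours", "1 hours")
print(start, end)
```

With a 9am logical start, a duration of 6 hours and an offset of 1 hours, the window runs from 2am to 8am, matching the example in the Offset description.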

SOQL Operation Type

No

 

Optional. Specify the query operation to run on the table. If query is selected, only current records will be returned. If queryAll is selected, all current and deleted records will be returned.

Default operation is query.

Enable PK Chunking

Yes

 

Optional. Primary key (PK) chunking splits a query on a large table into chunks based on the record IDs, or primary keys, of the queried records.

Salesforce recommends that you enable PK chunking when querying tables with more than 10 million records or when a bulk query consistently times out. However, the effectiveness of PK chunking depends on the specifics of the query and the queried data. We do not recommend enabling PK chunking when querying a large table and filtering out most of the data. A separate query is created for each chunk, so enabling PK chunking on a large table can end up counting as thousands of queries against the quota.

For example, let’s say you enable PK chunking for the following query on an Account table with 10,000,000 records.

SELECT Name FROM Account

Assuming a chunk size of 250,000, the query is split into the following 40 queries. The queries are processed in parallel.

Queries:

SELECT Name FROM Account WHERE Id >= 001300000000000 AND Id < 00130000000132G

SELECT Name FROM Account WHERE Id >= 00130000000132G AND Id < 00130000000264W

SELECT Name FROM Account WHERE Id >= 00130000000264W AND Id < 00130000000396m

...

SELECT Name FROM Account WHERE Id >= 00130000000euQ4 AND Id < 00130000000fxSK
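
The number of chunk queries in this example follows from dividing the record count by the chunk size and rounding up. A minimal sketch of that arithmetic is shown below. The function name `chunk_query_count` is hypothetical, and the calculation assumes record IDs are contiguous, which real tables may not satisfy.

```python
def chunk_query_count(record_count, chunk_size=100_000):
    """Number of chunk queries for a table, assuming contiguous record IDs."""
    if not 0 < chunk_size <= 250_000:
        raise ValueError("chunk size must be between 1 and 250,000")
    # Ceiling division using integers only.
    return -(-record_count // chunk_size)


print(chunk_query_count(10_000_000, 250_000))
print(chunk_query_count(10_000_000))
```

Each chunk counts as a separate query against the quota, so a smaller chunk size on the same table produces proportionally more queries.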

PK chunking works only with queries that don’t include SELECT clauses or conditions other than WHERE.

PK chunking only works with the following objects:

  • Account

  • AccountContactRelation

  • AccountTeamMember

  • AiVisitSummary

  • Asset

  • B2BMktActivity

  • B2BMktProspect

  • Campaign

  • CampaignMember

  • CandidateAnswer

  • Case

  • CaseArticle

  • CaseComment

  • Claim

  • ClaimParticipant

  • Contact

  • ContractLineItem

  • ConversationEntry

  • CustomerProperty

  • EinsteinAnswerFeedback

  • EmailMessage

  • EngagementScore

  • Event

  • EventRelation

  • FeedItem

  • Individual

  • InsurancePolicy

  • InsurancePolicyAsset

  • InsurancePolicyParticipant

  • Lead

  • LeadInsight

  • LiveChatTranscript

  • LoginHistory

  • LoyaltyLedger

  • LoyaltyMemberCurrency

  • LoyaltyMemberTier

  • LoyaltyPartnerProduct

  • LoyaltyProgramMember

  • LoyaltyProgramPartner

  • Note

  • ObjectTerritory2Association

  • Opportunity

  • OpportunityContactRole

  • OpportunityHistory

  • OpportunityLineItem

  • OpportunitySplit

  • OpportunityTeamMember

  • Pricebook2

  • PricebookEntry

  • Product2

  • ProductConsumed

  • ProductRequired

  • QuickText

  • Quote

  • QuoteLineItem

  • ReplyText

  • ScoreIntelligence

  • ServiceContract

  • Task

  • TermDocumentFrequency

  • TransactionJournal

  • User

  • UserRole

  • VoiceCall

  • WorkOrder

  • WorkOrderLineItem

Support also includes custom objects, and any Sharing and History tables that support standard objects.

Chunk Size

Yes

 

Optional. Size of each chunk. The maximum size is 250,000 and the default size is 100,000.

SObject Parent Name

Yes

 

Optional. Parent of the Salesforce Object. This is used to enable chunking for history tables or shared objects.

Salesforce Date Format Examples

Format

Format Syntax

Example

Date, time, and time zone offset

YYYY-MM-DDThh:mm:ss+hh:mm

1999-01-01T23:01:01+01:00

 

YYYY-MM-DDThh:mm:ss-hh:mm

1999-01-01T23:01:01-08:00

 

YYYY-MM-DDThh:mm:ssZ

1999-01-01T23:01:01Z
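
Values in these formats can be produced from a timezone-aware timestamp. The following sketch uses only the Python standard library. The helper names `with_offset` and `as_utc_z` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def with_offset(moment):
    """Format an aware datetime as YYYY-MM-DDThh:mm:ss+hh:mm (or -hh:mm)."""
    return moment.replace(microsecond=0).isoformat()


def as_utc_z(moment):
    """Convert an aware datetime to UTC and format it as YYYY-MM-DDThh:mm:ssZ."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


moment = datetime(1999, 1, 1, 23, 1, 1, tzinfo=timezone(timedelta(hours=1)))
print(with_offset(moment))
print(as_utc_z(moment))
```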

Data Type Mapping

Salesforce Data Type

CDAP Schema Data Type

_bool

boolean

_int

int

_long

long

_double, currency, percent, geolocation (latitude), geolocation (longitude)

double

date

date

datetime

timestamp (microseconds)

time

time (microseconds)

picklist

string

multipicklist

string

combobox

string

reference

string

base64

string

textarea

string

phone

string

id

string

url

string

email

string

encryptedstring

string

datacategorygroupreference

string

location

string

address

string

anyType

string

json

string

complexvalue

string
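
The mapping above can be expressed as a lookup table. In this sketch only the non-string types are listed explicitly and everything else falls back to string, as in the table. Applying the same fallback to types that the table does not list is an assumption, and the name `cdap_type` is hypothetical.

```python
# Salesforce types that do not map to a CDAP string, taken from the table above.
NON_STRING_TYPES = {
    "_bool": "boolean",
    "_int": "int",
    "_long": "long",
    "_double": "double",
    "currency": "double",
    "percent": "double",
    "geolocation (latitude)": "double",
    "geolocation (longitude)": "double",
    "date": "date",
    "datetime": "timestamp (microseconds)",
    "time": "time (microseconds)",
}


def cdap_type(salesforce_type):
    """Return the CDAP schema type for a Salesforce type (string by default)."""
    return NON_STRING_TYPES.get(salesforce_type, "string")


for name in ("currency", "datetime", "picklist"):
    print(name, "->", cdap_type(name))
```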

 

Created in 2020 by Google Inc.