Supporting wide Salesforce objects
Introduction
When objects have lots of (e.g. 800) attributes, the SOQL for these objects can hit URL length limits. This page documents an approach to handle this limit, by splitting attributes into multiple batches, without the knowledge of users
Use case(s)
I have a Salesforce object with over 800 attributes. When I send a SOQL query for this object with all attributes in it to the Salesforce APIs, I hit URL length limits. To get around this situation, I have wrapped my code for querying Salesforce into a script that retrieves data for an object in batches, and then joins the batches together. This process is extremely manual, custom, one-off and error prone, and therefore extremely hard to manage and maintain. I would like the ability to for the Salesforce plugins in CDAP to batch the attributes before submitting the SOQL to Salesforce, and handle this grunt work automatically.
User Storie(s)
Plugin Type
This is not a new plugin, but enhancements to existing plugins.
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
|---|---|---|---|
|
|
|
|
|
|
|
|
Design / Implementation Tips
Tip #1
Tip #2
Design
Approach(s)
[1] https://developer.salesforce.com/docs/atlas.en-us.salesforce_app_limits_cheatsheet.meta/salesforce_app_limits_cheatsheet/salesforce_app_limits_platform_soslsoql.htm
[2] https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/system_fields.htm
[3] https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_calls_retrieve.htm
Detailed flow
Initial wide object SOQL: `SELECT Id, Name, Type ...., Column_800__c from sObject`
Extract query fields from wide SOQL and replace with field `Id`.
Extracted fieldList: `Id, Name, Type ...., Column_800__c`
Generated SOQL: `select id from sObject`
Execute SOQL using standard flow (Bulk API)
Execution result:
Total count of records: 6005
Number of batches: 1
Result CSV
Each resulting batch will be processed using standard flow: for each batch new RecordReader will be created
RecordReader will retrieve list of Ids from assigned batchId. This Ids will be split into sub-lists by 2000 records (in our case to 4):
SubList_A: "1", ..., "2000"
SubList_B: "2001", ..., "4000"
SubList_C: "4001", ..., 6000
SubList_D: "6001", ..., 6005
Each sub-list will be executed in parallel through SOAP API.
`connection.retrieve(fieldList, sObject, SubList_A)`
`connection.retrieve(fieldList, sObject, SubList_B)`
`connection.retrieve(fieldList, sObject, SubList_C)`
`connection.retrieve(fieldList, sObject, SubList_D)`
All responses will be transformed to CDAP Records and exposed to further CDAP transformations.
Properties
Security
Limitation(s)
To add wide sObject query support SOAP API will be used but it has some limitations described below.
Single SOAP API retrieve() call can contain a maximum of 2000 records [1] while Bulk API batch can contain a maximum of 10,000 records. [2]
Since first we will fetch all IDs via Bulk API and then process them by 2000 via SOAP API, this may lead to performance degradation compared to the queries executed via Bulk API only.
[1] https://developer.salesforce.com/docs/atlas.en-us.218.0.api.meta/api/sforce_api_calls_retrieve.htm
[2] https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/asynch_api_concepts_limits.htm
Future Work
Some future work – HYDRATOR-99999
Another future work – HYDRATOR-99999
Test Case(s)
Test case #1
Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Table of Contents
- 1 Use case(s)
- 2 User Storie(s)
- 3 Plugin Type
- 4 Configurables
- 5 Design / Implementation Tips
- 6 Design
- 6.1 Approach(s)
- 6.1.1 Detailed flow
- 6.2 Properties
- 6.3 Security
- 6.4 Limitation(s)
- 6.5 Future Work
- 6.1 Approach(s)
- 7 Test Case(s)
- 8 Sample Pipeline
- 8.1 Pipeline #1
- 8.2 Pipeline #2
Checklist