...

Apache HttpComponents HttpClient, the successor of Commons HttpClient, is used as the HTTP framework.

It appears to be the framework most widely used and supported by the community. Solutions and workarounds for most problems are easy to find, which makes plugin development and maintenance easier. The framework has built-in support for compression, HTTPS tunneling, digest auth and many other functions.

Properties:

Section	Name	Description	Default	Widget	Validations
General	URL	The url we will request. "{pagination.index}" can be included in the url to represent a changing part needed for some pagination strategies. E.g.: https://my.api.com/api/v1/user?maxResults=10&name=John&pageNumber={pagination.index}		Text Box	Validate it contains protocol.
General	HTTP Method	Possible values: GET, PUT, POST, DELETE, HEAD	GET	Radio group
General	Headers	Key-value map of headers		KeyValue
General	Request Body			Text Area	No validation [1]

General	Connect Timeout	Maximum seconds to connect to server. 0 - wait forever	120	Text Box	If is_number and >=0
General	Read Timeout	Maximum seconds to wait for data. 0 - wait forever	120	Text Box	If is_number and >=0
Error Handling Per Status

Error Handling

HTTP Errors Handling

This is a map in which the user can define which error status codes produce which results. Possible values are: RETRY, FAIL, SKIP, SEND_TO_ERROR, ALERT.

Example:

500: RETRY

404: SEND_TO_ERROR

*: FAIL

Wildcard (*) means "otherwise" or "for all other codes do ...".

If the field is empty, any status code >= 400 will yield a pipeline failure.

KeyValue Dropdown

If SEND_TO_ERROR, SKIP or SEND_TO_ALERTS is used and the current pagination type does not support it, throw a validation error. [2]

Non-HTTP Error Handling

Handling of type casting and any other unhandled exceptions thrown during transformation of a record:

Possible values are:

"Skip on error" - ignores any errors
"Stop on error" - fails pipeline
"Send to error" - send to error handler

Stop on error

Dropdown list

If "Send to error" or "Skip on error" is used and the current pagination type does not support it, throw a validation error. [2]

Retry	Interval	The interval between retries (seconds)	30	Text Box	If is_number and >=0
Retry	Retry Count	Total number of retries to make before failing	5	Text Box	If is_number and >=0
Basic authentication	Username	Used for basic authentication.		Text Box
Basic authentication	Password	Used for basic authentication.

Policy	Possible values are: Exponential, Linear	Exponential	Radio group
Linear Retry Interval	The interval between retries (seconds)	30	Number	if not set and retryPolicy is linear, fail.
Max retry duration	Max seconds it takes to do retries	600	Number
Connect Timeout	Maximum seconds to connect to server. (seconds) 0 - wait forever	120	Number
Read Timeout	Maximum seconds to wait for data. (seconds) 0 - wait forever	120	Number
Basic authentication	Username	Used for basic authentication.		Text Box
Basic authentication	Password	Used for basic authentication.		Password
HTTP Proxy:	Proxy

URI

URL

Example: http://proxy.com:8080

Note for me: test this with https proxies.

Text Box

Username

Text Box

Password

[1] Unfortunately we cannot do validation here. Even though the body of an API request is most commonly JSON (for JSON APIs) or XML (for XML SOAP APIs), theoretically it can be anything.

[2] Pagination types where the next page url is taken from the previous page are the ones which do not support SEND_TO_ERROR or IGNORE.
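
As an illustration of how the HTTP error handling map and the retry properties above could be interpreted, here is a minimal Python sketch. It is not the plugin's implementation. The function names, the "SUCCESS" marker, the fallback to FAIL when no wildcard is present, and the exponential schedule (1 second, doubling per attempt) are all assumptions made for the example.

```python
def resolve_action(status_code, handling_map):
    """Pick the configured action for an HTTP status code."""
    if status_code < 400:
        return "SUCCESS"  # hypothetical marker, not a value of the property
    if not handling_map:
        # Empty field: any status code >= 400 fails the pipeline.
        return "FAIL"
    exact = handling_map.get(str(status_code))
    if exact is not None:
        return exact
    # Wildcard means "for all other codes"; FAIL without one is an assumption.
    return handling_map.get("*", "FAIL")


def retry_delays(policy, linear_interval=30, max_duration=600):
    """Yield wait times in seconds until max_duration would be exceeded."""
    if policy == "Linear" and linear_interval <= 0:
        raise ValueError("Linear Retry Interval must be set and positive")
    elapsed, attempt = 0, 0
    while True:
        # Exponential schedule of 1, 2, 4, ... seconds is an assumption.
        delay = linear_interval if policy == "Linear" else 2 ** attempt
        if elapsed + delay > max_duration:
            return
        yield delay
        elapsed += delay
        attempt += 1


error_map = {"500": "RETRY", "404": "SEND_TO_ERROR", "*": "FAIL"}
print(resolve_action(500, error_map))
print(list(retry_delays("Linear", 30, 90)))
```

With the example map from the table, a 403 falls through to the wildcard entry and fails the pipeline.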

...

Parallelization

There are two reasons why we should not parallelize the requests:

...

Name	Description	Default	Widget	Validations
Pagination type	Possible values are: None, Link in response header, Link in response body, Increment an index, Token in Response Body, Custom	None	Dropdown list	"Link in response body": Next Page field is set. "Token in Response Body": "Next Page Token Path" and "Next Page Url Parameter" are set. "Increment an index": {pagination.index} is in url, start index, increment are set. Custom: python code is set.
Start Index	Initial value for index which replaces {pagination.index} in url. See example here		Text Box	If set and pagination type is not "Increment an index", fail. If set and no {pagination.index} in url, fail. Assert if is_number
Max Index	Max value for index which replaces {pagination.index} in url. If this is empty, plugin will load pages until no results or 404 is returned. Also plugin may stop iteration before max index is reached, if no more records.		Text Box	If set and pagination type is not "Increment an index", fail. If set and no {pagination.index} in url, fail. Assert if is_number
Index Increment	Increment value for index which replaces {pagination.index} in url.		Text Box	If set and pagination type is not "Increment an index", fail. If set and no {pagination.index} in url, fail. Assert if is_number
Next Page JSON/XML Field Path	Path to a field in JSON or XML which contains the next page url. See an example here		Text Box	If set and pagination type is not "Link in response body", fail. If the content type is not XML or JSON, fail.
Custom Pagination Python Code	A code fragment which determines how next page url is generated and also when to finish iteration. For more info see Custom Pagination		Python code
Next Page Token Path	Path to a field in JSON or XML containing the next page token.			If set and pagination type is not "Token in Response Body", fail.
Wait time between pages	The number of milliseconds to wait before requesting the next page.	1000	Text Box	Assert if is_number and > 0.

...

If the content type is not XML or JSON, fail. Validate to have at least one element
Next Page Url Parameter	For type "Token in Response Body" this is the name of the url parameter in which the next page token is added to the url		Text Box	If set and pagination type is not "Token in Response Body", fail.
Custom Pagination Python Code	A code fragment which determines how next page url is generated and also when to finish iteration. For more info see Custom Pagination		Python code	If set and pagination type is not "Custom", fail.
Wait time between pages	The number of milliseconds to wait before requesting the next page.	1000	Number	Assert if is_number and > 0. If not set and pagination type is not "None", fail.

The above is a bit messy because we cannot dynamically change the content of the widget depending on pagination type, which makes it a mix of properties for different pagination types. This is not very user-friendly for the end user. For now I will add a placeholder which says which pagination type each property corresponds to.

Pagination type is none

Plugin will request a single page.

Pagination via url from response header

When accessing the page the response header contains a link to next page:

Code Block

Link: <http://helloworld.voog.co/admin/api/pages?page=1&q.language.id=1>; rel="first",
<http://helloworld.voog.co/admin/api/pages?page=2&q.language.id=1>; rel="next", # <-------- HERE IT IS
<http://helloworld.voog.co/admin/api/pages?page=2&q.language.id=1>; rel="last"
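
Picking the next page url out of such a header could look roughly like the Python sketch below. The function name and the regular expressions are assumptions for illustration only. The parsing is deliberately simple and expects urls without commas.

```python
import re

# Matches "<url>" followed by its parameters up to the next comma.
LINK_RE = re.compile(r'<([^>]*)>([^,]*)')
REL_RE = re.compile(r'\brel\s*=\s*"?([^";]*)"?')


def next_page_url(link_header):
    """Return the url whose rel contains "next", or None if there is none."""
    for url, params in LINK_RE.findall(link_header):
        rel = REL_RE.search(params)
        if rel and "next" in rel.group(1).split():
            return url
    return None


header = (
    '<http://helloworld.voog.co/admin/api/pages?page=1&q.language.id=1>; rel="first", '
    '<http://helloworld.voog.co/admin/api/pages?page=2&q.language.id=1>; rel="next", '
    '<http://helloworld.voog.co/admin/api/pages?page=2&q.language.id=1>; rel="last"'
)
print(next_page_url(header))
```

When no entry carries rel="next" the function returns None, which a caller could treat as the end of pagination.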

...

The plugin stops reading when a page returns no records or 404, or when Max Index is reached (if it is set).
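
The stop conditions for "Increment an index" can be sketched in Python as below. This is only an illustration: `iterate_pages` and the `fetch` callable (returning a status code and a list of records) are hypothetical names, and treating Max Index as inclusive is an assumption.

```python
def iterate_pages(url_template, start_index, increment, max_index, fetch):
    """Yield (url, records) per page until one of the stop conditions is hit."""
    index = start_index
    # max_index=None stands for an empty "Max Index" property.
    while max_index is None or index <= max_index:
        url = url_template.replace("{pagination.index}", str(index))
        status, records = fetch(url)
        if status == 404 or not records:
            break  # no more pages
        yield url, records
        index += increment


# Fake API with three pages, used instead of a real HTTP call.
PAGES = {1: ["a"], 2: ["b"], 3: ["c"]}


def fake_fetch(url):
    number = int(url.rsplit("=", 1)[1])
    return (200, PAGES[number]) if number in PAGES else (404, [])


template = "https://my.api.com/api/v1/user?maxResults=10&pageNumber={pagination.index}"
for page_url, page_records in iterate_pages(template, 1, 1, None, fake_fetch):
    print(page_url, page_records)
```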

...


Pagination by next page token

Here's an example of pagination from the YouTube API. The nextPageToken field contains a token which should be included in the url to get the next page, e.g. "&page_token=CAEQAA".

...

Code Block
${url}
${url}&nextPageToken=${nextPageToken1}
${url}&nextPageToken=${nextPageToken2}
...
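
Building each of these urls can be sketched in Python as follows. The helper name `url_with_token` and the example.com url are made up for the illustration. The parameter name corresponds to the "Next Page Url Parameter" property, and replacing a token left over from the previous page (rather than appending a second one) is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_with_token(url, param_name, token):
    """Return url with param_name set to token, replacing any earlier value."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param_name
    ]
    query.append((param_name, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://example.com/api?maxResults=10"
second = url_with_token(base, "nextPageToken", "CAEQAA")
third = url_with_token(second, "nextPageToken", "CAIQAA")
print(second)
print(third)
```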

...


Custom pagination

Different APIs use very different styles of pagination. In the simple cases they return link in header or some field of response JSON.

For example, consider an API where the user wants to paginate by time in the following way: &start_time={something}&end_time={something+10000}. Two dependent variables are involved here. It would be very problematic to make something like this configurable via a widget.
Let's imagine another case. The user wants to download a webserver directory, so the "pages" in this case are files on the webserver. Let's say they analyze or back up a whole site. So we need to paginate based on results from parsing HTML.
Let's assume another example. The user wants to skip certain pages in an API. Let's say the API pagination is time based, meaning something like "&start_time=1389075585" is appended to the url, but they only want to get pages for the weekends.

...

Code Block

context.start_time = 1389075585

def get_next_page_url(url, page, headers):
  # Move the time window forward; "context" keeps state between calls.
  context.start_time += 10000
  end_time = context.start_time + 10000
  # Append the current window to the base url.
  context.next_page = url + '&start_time=' + str(context.start_time) + '&end_time=' + str(end_time)

Jython is used for this, so the user does not need to have Python installed. The "context" object is a Java object exposed to Python.

Transforming API responses into Records

No automatic schema generation is implemented, since we don't know the value types.

Properties:

...

Possible values:

JSON
XML
Delimited
Text

...

For JSON a simple slash separated path is used e.g. /library/books/items.

For XML an XPath is used.

...


Properties:

Section	Name	Description	Default	Widget	Validations
Format	Format	Possible values: JSON, XML, TSV, CSV, Text, Blob		Dropdown list
Format	JSON/XML Result Path	For JSON a simple slash separated path is used e.g. /library/books/items. For XML an XPath is used.		Text Box	Fail if used with non JSON/XML format

JSON/XML Fields Mapping

Mapping of schema field name to jsonPath (past the result path).

Example (Jira API):

FieldName	FieldPath
name	/key
type	/fields/issuetype/name
description	/fields/description
projectCategory	/fields/project/projectCategory/name
isSubtask	/fields/issuetype/subtask
fixVersions	/fields/fixVersions

Schema fields which are not in the map will use the fieldName:/fieldName mapping.

If key is not present in schema, fail.
If used for non JSON/XML, fail.
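
To make the result path and the fields mapping concrete, here is a small Python sketch for the JSON case. The function names and the sample issue are invented for the example. It assumes that paths only traverse nested objects and that a missing path yields no value.

```python
def get_by_path(node, path):
    """Follow a slash separated path such as /fields/issuetype/name."""
    for key in path.strip("/").split("/"):
        if key == "":
            continue  # an empty path refers to the node itself
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def map_record(item, schema_fields, mapping):
    """Fields missing from the mapping default to fieldName:/fieldName."""
    return {
        name: get_by_path(item, mapping.get(name, "/" + name))
        for name in schema_fields
    }


issue = {
    "key": "PROJ-1",
    "fields": {"issuetype": {"name": "Bug", "subtask": False}, "description": "d"},
}
mapping = {"name": "/key", "type": "/fields/issuetype/name"}
print(map_record(issue, ["name", "type", "key"], mapping))
```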

1 JSON format

JSON entries are converted into StructuredRecord using StructuredRecordStringConverter.java

...

Fields which are not present in the schema are skipped.
Schema fields which are not present in the JSON response are set to null.
If no fields from the schema are found, an exception is thrown, which is handled according to the "Non-HTTP Error Handling" property value.
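
The three rules above can be sketched in Python on plain dictionaries. This only illustrates the rules and is not the StructuredRecordStringConverter logic; `to_record` and the choice of ValueError are assumptions.

```python
def to_record(entry, schema_fields):
    """Apply the three rules to one JSON entry (a dict)."""
    if not any(name in entry for name in schema_fields):
        # Would be handled according to "Non-HTTP Error Handling".
        raise ValueError("no schema fields found in entry")
    # Extra fields are dropped, missing schema fields become None (null).
    return {name: entry.get(name) for name in schema_fields}


print(to_record({"name": "PROJ-1", "extra": 1}, ["name", "type"]))
```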

2 XML format

...


We may move the XML parsing functionality into a separate project so that other projects can reuse it.

XML below will be used as basis for examples in this section.

...

2.1 STEP 1 - Get XML by XPath

XML parsing is done by the default Java DOM parser, which is able to get items by a specified XPath. XPath is very flexible: it allows the user to get nodes by attribute value, to group nodes from different parents into a single result, to choose nodes conditionally, etc.

Some XPath examples:

Code Block
/bookstores/bookstore/book[position()<3]
//title[@lang]
//title[@lang='en']
/bookstores/bookstore/book/price[text()] # convert all subelements to string
/bookstores/bookstore/book[price>35.00]/title
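
For readers who want to try a few of these selections quickly, here is a rough Python equivalent using the standard library's ElementTree. It is a stand-in for the Java DOM parser, not the plugin's code, and the sample XML is invented for the example. ElementTree supports only a subset of XPath, so the price comparison is done in Python and paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the XML used elsewhere on this page.
SAMPLE = """<bookstores><bookstore>
  <book category="cooking"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>
  <book category="misc"><title>Untitled</title><price>10.00</price></book>
</bookstore></bookstores>"""

root = ET.fromstring(SAMPLE)

# //title[@lang]: titles that carry a lang attribute.
titles_with_lang = [t.text for t in root.findall(".//title[@lang]")]

# //title[@lang='en']: titles whose lang attribute equals "en".
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

# /bookstores/bookstore/book[price>35.00]/title, with the filter in Python.
expensive_titles = [
    book.findtext("title")
    for book in root.findall("./bookstore/book")
    if float(book.findtext("price")) > 35.00
]

print(titles_with_lang, english_titles, expensive_titles)
```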

...

Code Block
year: string
author: unionarray
price: record
 - value:double
 - policy:string
category: string
title: record
 - lang:string
 - content:string

...

Name	Description	Default	Widget	Validations
OAuth2 Enabled	True or false.	false	Radio group
Auth URL	A page where the user is directed to enter their credentials. Example: https://www.facebook.com/dialog/oauth		Text Box	Assert to be empty if OAuth2 is disabled and not empty if enabled.
Token URL	A page where CDAP can exchange authCode for accessToken and refreshToken, or refresh the accessToken. Example: https://graph.facebook.com/v3.3/oauth/access_token		Text Box	Assert to be empty if OAuth2 is disabled and not empty if enabled.
Client ID	User should obtain this when registering the OAuth2 application in the service (e.g. Twitter).		Text Box	Assert to be empty if OAuth2 is disabled and not empty if enabled.
Client Secret	User should obtain this when registering the OAuth2 application in the service (e.g. Twitter).		Password	Assert to be empty if OAuth2 is disabled and not empty if enabled.
Scope	This is optional. Scope is a mechanism in OAuth 2.0 to limit an application's access to a user's account. An application can request one or more scopes. This information is then presented to the user in the consent screen, and the access token issued to the application will be limited to the scopes granted.		Text Box	Assert to be empty if OAuth2 is disabled.
Refresh Token	This is populated by the button "Login via OAuth 2.0". Since we save the Refresh Token (not an access token, which is short lived), this should be done only once, during initial pipeline deployment. For more information click here. UI should put the actual value into secure store and put the macro function ${secure(key)} as the value for extra safety.			Fail if empty and OAuth2 is enabled.
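
As a sketch of how the saved refresh token could be exchanged at the Token URL, the Python snippet below builds the request without sending it. The form fields follow the standard OAuth 2.0 refresh grant (grant_type=refresh_token). Sending the client credentials in the body is an assumption, since some providers expect HTTP Basic authentication instead, and the function name and credential values are made up.

```python
from urllib.parse import urlencode


def build_refresh_request(token_url, client_id, client_secret, refresh_token):
    """Return (url, headers, body) for a POST that refreshes the access token."""
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        # Credentials in the body are an assumption; providers differ.
        "client_id": client_id,
        "client_secret": client_secret,
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return token_url, headers, body


request = build_refresh_request(
    "https://graph.facebook.com/v3.3/oauth/access_token",
    "my-client-id", "my-client-secret", "my-refresh-token",
)
print(request[2])
```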

SSL/TLS

...

Some general definitions for more context:
...
Should we provide an option for the user to skip the identity check during an HTTPS connection? This is not recommended anywhere you read about it, but it might be useful when the user is testing an API which is still in development.
If the url starts with "https", the plugin will try to use TLS by default.
Name	Description	Default	Widget	Validations
Verify HTTPs Trust Certificates	If false, connections to untrusted HTTPS sources are allowed.	true
Keystore File	Path to a keystore file		Text Box	Check if file exists
Keystore Type	According to Oracle docs there are 3 supported keystore types. Possible values: Java KeyStore (JKS), Java Cryptography Extension KeyStore (JCEKS), PKCS #12	JKS	Radio Group
Keystore Password	Leave empty if keystore is not password protected		Password	Try to load keystore with given password
Keystore Key Algorithm	SunX509 is default in Java.	SunX509	Text Box
TrustStore File	Path to a truststore file. If empty use default Java truststores.		Text Box	Check if file exists
TrustStore Type	According to Oracle docs there are 3 supported truststore types. Possible values: Java KeyStore (JKS), Java Cryptography Extension KeyStore (JCEKS), PKCS #12	JKS	Radio Group
TrustStore Password	Leave empty if truststore is not password protected		Password	Try to load truststore with given password
Truststore Trust Algorithm		SunX509	Text Box
Transport Protocols	User can add multiple protocols, which will be offered by the client during handshake.	TLSv1.2	Array	Validate if names are correct
Cipher Suites	User can add multiple cipher suites. They will be offered by the client during handshake. If empty, use default cipher suites. This is a text box with a comma separated list of ciphers. Since there can sometimes be 20, 30 or more ciphers, it is not practical for the user to add every one of them manually into an array.		Text Box	Validate if supported by current java implementation
...


