The HTTP Streaming source plugin is available in the Hub.
This plugin periodically reads data from HTTP/HTTPS pages. Paginated APIs are supported: the plugin reads all available pages and then waits for new pages to appear. Supported data formats are JSON, XML, CSV, TSV, TEXT, and BLOB.
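The read-then-wait behavior can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the plugin's implementation: `fetch_page` is a hypothetical callable that returns a page body, or `None` when that page does not exist yet, and `max_pages` stands in for the Max Pages Per Fetch property.

```python
from typing import Callable, List, Optional, Tuple


def read_batch(
    fetch_page: Callable[[int], Optional[str]],
    next_page: int,
    max_pages: Optional[int] = None,
) -> Tuple[List[str], int]:
    """Read consecutive pages starting at next_page.

    Stops when a page is not available yet or when max_pages pages have
    been read. Returns the pages read and the index to resume from.
    """
    pages: List[str] = []
    while max_pages is None or len(pages) < max_pages:
        body = fetch_page(next_page)
        if body is None:
            # Hypothetical convention: None means "no such page yet".
            break
        pages.append(body)
        next_page += 1
    return pages, next_page


# Simulated API with three pages currently available.
available = {0: "page-a", 1: "page-b", 2: "page-c"}

first_batch, cursor = read_batch(available.get, 0, max_pages=2)
second_batch, cursor = read_batch(available.get, cursor)
empty_batch, cursor = read_batch(available.get, cursor)
```

A caller would invoke `read_batch` on a timer and keep the returned cursor, so that an empty batch simply means "nothing new yet" and the next poll resumes from the same page.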
Configuration
Property | Macro Enabled? | Description |
---|---|---|
General | ||
Reference Name | No | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. |
URL | Yes | Required. URL of the first page to fetch. The URL must start with a protocol (e.g. http://). |
HTTP Method | Yes | Required. HTTP request method. |
Headers | Yes | Optional. Headers to send with each HTTP request. |
Request Body | Yes | Optional. Body to send with each HTTP request. |
Max Pages Per Fetch | Yes | Optional. Maximum number of pages added to an RDD in a single blocking read. If empty, no maximum is enforced. |
Format | ||
Format | Yes | Required. Format of the HTTP response. This determines how the response is converted into output records. Possible values are json, xml, csv, tsv, text, and blob. Default is json. |
JSON/XML Result Path | Yes | Optional. Path to the results. When the format is XML, this is an XPath. When the format is JSON, this is a JSON path. For examples, see below. |
JSON/XML Fields Mapping | Yes | Optional. Mapping of fields in a record to fields in a retrieved element. The left column contains the name of the schema field. The right column contains the path to that field, relative to the element. The path can be either an XPath or a JSON path. For an example, see below. |
CSV Skip First Row | Yes | Optional. Whether to skip the first row of the HTTP response. This is usually set if the first row is a header row. Default is false. |
OAuth2 | ||
OAuth2 Enabled | No | Required. If true, the plugin performs OAuth2 authentication. Default is false. |
Auth URL | Yes | Optional. Endpoint for the authorization server used to retrieve the authorization code. |
Token URL | Yes | Optional. Endpoint for the resource server, which exchanges the authorization code for an access token. |
Client ID | Yes | Optional. Client identifier obtained during the Application registration process. |
Client Secret | Yes | Optional. Client secret obtained during the Application registration process. |
Scopes | Yes | Optional. Scope of the access request, which might have multiple space-separated values. |
Refresh Token | Yes | Optional. Token used to obtain an accessToken, which is the end product of OAuth2. |
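To show how the OAuth2 properties relate to one another, here is a minimal Python sketch of a standard refresh-token request as described in RFC 6749. It only builds the request and does not send it. Whether the plugin passes the client credentials in the form body (as assumed here) or in an Authorization header is not stated in this document, and the URL and credential values are placeholders.

```python
import urllib.parse
import urllib.request


def build_refresh_request(token_url, client_id, client_secret,
                          refresh_token, scopes=None):
    """Build (but do not send) a POST that trades a refresh token
    for an access token at the Token URL."""
    form = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        # Assumption: credentials are sent as form fields.
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scopes:
        # Multiple scopes are space-separated, as in the Scopes property.
        form["scope"] = scopes
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_refresh_request(
    "https://auth.example.com/token",
    "my-client-id",
    "my-client-secret",
    "my-refresh-token",
    scopes="read write",
)
```

The JSON response to such a request normally carries an `access_token` field, which is then sent with each data request.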
JSON/XML Result Path Examples
JSON path example:
    {
      "errors": [],
      "response": {
        "books": [
          {
            "id": "1159142",
            "title": "Agile Web Development with Rails",
            "author": "Sam Ruby, Dave Thomas, David Heinemeier Hansson",
            "printInfo": {
              "page": 488,
              "coverType": "hard",
              "publisher": "Pragmatic Bookshelf"
            }
          },
          {
            "id": "2375753",
            "title": "Flask Web Development",
            "author": "Miguel Grinberg",
            "printInfo": {
              "page": 543,
              "coverType": "hard",
              "publisher": "O'Reilly Media, Inc"
            }
          },
          {
            "id": "547307",
            "title": "Alex Homer, ASP.NET 2.0 Visual Web Developer 2005",
            "author": "David Sussman",
            "printInfo": {
              "page": 543,
              "coverType": "hard",
              "publisher": "unknown"
            }
          }
        ]
      }
    }
The JSON path to fetch books is /response/books. To fetch only printInfo, specify /response/books/printInfo instead.
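The two paths can be traced with a short Python sketch. It assumes that a path step landing on an array is applied to every element of that array, which is one reading of the /response/books/printInfo example; the plugin's actual resolution logic is not described here. `resolve_path` is a hypothetical helper, and the sample is a trimmed copy of the response above.

```python
import json


def resolve_path(document, path):
    """Follow a slash-separated path through parsed JSON.

    When a step lands on a list, the remaining steps are applied to each
    item, and a list reached at the end yields one result per item.
    """
    current = [document]
    for key in (k for k in path.split("/") if k):
        found = []
        for node in current:
            candidates = node if isinstance(node, list) else [node]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    found.append(candidate[key])
        current = found
    results = []
    for node in current:
        if isinstance(node, list):
            results.extend(node)
        else:
            results.append(node)
    return results


# Trimmed copy of the sample response (two books, fewer fields).
sample = json.loads("""
{
  "errors": [],
  "response": {
    "books": [
      {"id": "1159142", "title": "Agile Web Development with Rails",
       "printInfo": {"page": 488, "publisher": "Pragmatic Bookshelf"}},
      {"id": "2375753", "title": "Flask Web Development",
       "printInfo": {"page": 543, "publisher": "O'Reilly Media, Inc"}}
    ]
  }
}
""")

books = resolve_path(sample, "/response/books")
print_infos = resolve_path(sample, "/response/books/printInfo")
titles = [book["title"] for book in books]
publishers = [info["publisher"] for info in print_infos]
```

Under this assumption, the first path yields one result per book, and the second yields one printInfo object per book.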
XPath example:
Giada De Laurentiis 2005 15.0 Discount up to 50% James McGovern Per Bothner 2003 49.99 No discount ... ...
The XPath to fetch all books is /bookstores/bookstore/book. More precise selections are also possible, e.g. /bookstores/bookstore/book[@category='web'].
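The two XPath expressions can be tried out with Python's standard `xml.etree.ElementTree` module, as in the sketch below. The sample document is hypothetical: only the bookstores/bookstore/book nesting and the category attribute come from the expressions above, and the child element names are assumed for illustration. ElementTree supports a subset of XPath and evaluates paths relative to the root element, so /bookstores/bookstore/book is written here as ./bookstore/book.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; child element names (author, year) are assumptions.
SAMPLE_XML = """
<bookstores>
  <bookstore>
    <book category="cooking">
      <author>Giada De Laurentiis</author>
      <year>2005</year>
    </book>
    <book category="web">
      <author>James McGovern</author>
      <author>Per Bothner</author>
      <year>2003</year>
    </book>
  </bookstore>
</bookstores>
"""

# The parsed root is the <bookstores> element itself.
root = ET.fromstring(SAMPLE_XML)

# Equivalent of /bookstores/bookstore/book
all_books = root.findall("./bookstore/book")

# Equivalent of /bookstores/bookstore/book[@category='web']
web_books = root.findall("./bookstore/book[@category='web']")

web_authors = [author.text for author in web_books[0].findall("author")]
```

The attribute predicate narrows the selection from every book to only those whose category is web.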
JSON/XML Fields Mapping Example
Example response:
    {
      "startAt": 1,
      "maxResults": 5,
      "total": 15599,
      "issues": [
        {
          "id": "20276",
          "key": "NETTY-14",
          "fields": {
            "issuetype": {
              "name": "Bug",
              "subtask": false
            },
            "fixVersions": [
              "4.1.37"
            ],
            "description": "Test description for NETTY-14",
            "project": {
              "id": "10301",
              "key": "NETTY",
              "name": "Netty-HTTP",
              "projectCategory": {
                "id": "10002",
                "name": "Infrastructure"
              }
            }
          }
        },
        {
          "id": "19124",
          "key": "NETTY-13",
          "fields": {
            "issuetype": {
              "self": "https://issues.cask.co/rest/api/2/issuetype/4",
              "name": "Improvement",
              "subtask": false
            },
            "fixVersions": [],
            "description": "Test description for NETTY-13",
            "project": {
              "id": "10301",
              "key": "NETTY",
              "name": "Netty-HTTP",
              "projectCategory": {
                "id": "10002",
                "name": "Infrastructure"
              }
            }
          }
        }
      ]
    }
Assume the result path is /issues.
The mapping is:
Field Name | Field Path |
---|---|
type | /fields/issuetype/name |
description | /fields/description |
projectCategory | /fields/project/projectCategory/name |
isSubtask | /fields/issuetype/subtask |
fixVersions | /fields/fixVersions |
The result records are:
key | type | isSubtask | description | projectCategory | fixVersions |
---|---|---|---|---|---|
NETTY-14 | Bug | false | Test description for NETTY-14 | Infrastructure | ["4.1.37"] |
NETTY-13 | Improvement | false | Test description for NETTY-13 | Infrastructure | [] |
Note that the field key was mapped without being included in the mapping. Mapping entries like key: /key can be omitted as long as the field is present in the schema.
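The mapping rules in this example can be reproduced with a short Python sketch. It is an illustration only, not the plugin's code: `to_record` and `get_by_path` are hypothetical helpers, the issues are trimmed copies of the example response, and the fallback of an unmapped schema field to a path of the same name is inferred from the note about key.

```python
def get_by_path(element, path):
    """Walk a slash-separated path relative to one result element."""
    value = element
    for key in (k for k in path.split("/") if k):
        value = value[key]
    return value


def to_record(element, schema_fields, mapping):
    """Build one output record. Fields missing from the mapping fall back
    to a top-level path of the same name (e.g. key -> /key)."""
    return {
        field: get_by_path(element, mapping.get(field, "/" + field))
        for field in schema_fields
    }


# Elements found under the result path /issues (trimmed).
issues = [
    {"key": "NETTY-14", "fields": {
        "issuetype": {"name": "Bug", "subtask": False},
        "fixVersions": ["4.1.37"],
        "description": "Test description for NETTY-14",
        "project": {"projectCategory": {"name": "Infrastructure"}}}},
    {"key": "NETTY-13", "fields": {
        "issuetype": {"name": "Improvement", "subtask": False},
        "fixVersions": [],
        "description": "Test description for NETTY-13",
        "project": {"projectCategory": {"name": "Infrastructure"}}}},
]

mapping = {
    "type": "/fields/issuetype/name",
    "description": "/fields/description",
    "projectCategory": "/fields/project/projectCategory/name",
    "isSubtask": "/fields/issuetype/subtask",
    "fixVersions": "/fields/fixVersions",
}

schema_fields = ["key", "type", "isSubtask", "description",
                 "projectCategory", "fixVersions"]

records = [to_record(issue, schema_fields, mapping) for issue in issues]
```

Each entry in `records` corresponds to one row of the result table above, with key filled in through the same-name fallback.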