The HTTP Streaming source plugin is available in the Hub.

This plugin periodically reads data from HTTP/HTTPS pages. Paginated APIs are supported: the plugin reads the pages that are currently available and then waits for new pages to appear. Data in JSON, XML, CSV, TSV, TEXT and BLOB formats is supported.
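The read-then-wait behavior on paginated APIs can be pictured with a small sketch. This is only an illustration, not the plugin's implementation: `fetch_batch`, `fetch_page`, and the in-memory `published` dictionary are hypothetical stand-ins for real HTTP calls, and `max_pages` mirrors the Max Pages Per Fetch property described in the configuration table.

```python
from typing import Callable, List, Optional


def fetch_batch(fetch_page: Callable[[int], Optional[str]],
                next_page: int,
                max_pages: Optional[int] = None) -> List[str]:
    """Read consecutive pages starting at next_page.

    Stops when a page is not available yet or when max_pages pages
    have been read (None means no limit is enforced).
    """
    batch: List[str] = []
    while max_pages is None or len(batch) < max_pages:
        body = fetch_page(next_page + len(batch))
        if body is None:  # page not published yet: stop and retry later
            break
        batch.append(body)
    return batch


# Hypothetical stand-in for an HTTP call: pages that already exist.
published = {1: "page-1", 2: "page-2", 3: "page-3"}

first = fetch_batch(published.get, next_page=1, max_pages=2)
second = fetch_batch(published.get, next_page=1 + len(first))
third = fetch_batch(published.get, next_page=4)  # nothing new yet
```

In this sketch a caller would invoke `fetch_batch` again on the next polling interval, resuming from the first page it has not yet read.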

Configuration

Property

Macro Enabled?

Description

General

Reference Name

No

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

URL

Yes

Required. URL of the first page to fetch. The URL must start with a protocol (e.g. http://).

HTTP Method

Yes

Required. HTTP request method.

Headers

Yes

Optional. Headers to send with each HTTP request.

Request Body

Yes

Optional. Body to send with each HTTP request.

Max Pages Per Fetch

Yes

Optional. Maximum number of pages put into the RDD in one blocking read. An empty value means that no maximum is enforced.

Format

Format

Yes

Required. Format of the HTTP response. This determines how the response is converted into output records. Possible values are:

  • JSON. Retrieves all records from the given JSON path and transforms them into records according to the mapping.

  • XML. Retrieves all records from the given XPath and transforms them into records according to the mapping.

  • TSV. Tab-separated values. Columns are mapped to record fields in the order they are listed in the schema.

  • CSV. Comma-separated values. Columns are mapped to record fields in the order they are listed in the schema.

  • TEXT. Transforms a single line of text into a single record with a string field body containing the result.

  • BLOB. Transforms the entire response into a single record with a byte array field body containing the result.

Default is JSON.

JSON/XML Result Path

Yes

Optional. Path to the results. When the format is XML, this is an XPath. When the format is JSON, this is a JSON path.

For examples, see below.

JSON/XML Fields Mapping

Yes

Optional. Mapping of fields in a record to fields in the retrieved element. The left column contains the name of the schema field. The right column contains the path to the value, relative to the element. The path can be either an XPath or a JSON path.

For an example, see below.

CSV Skip First Row

Yes

Optional. Whether to skip the first row of the HTTP response. This is usually set if the first row is a header row.

Default is false.

OAuth2

OAuth2 Enabled

No

Required. If true, the plugin performs OAuth2 authentication.

Default is false.

Auth URL

Yes

Optional. Endpoint for the authorization server used to retrieve the authorization code.

Token URL

Yes

Optional. Endpoint for the resource server, which exchanges the authorization code for an access token.

Client ID

Yes

Optional. Client identifier obtained during the Application registration process.

Client Secret

Yes

Optional. Client secret obtained during the Application registration process.

Scopes

Yes

Optional. Scope of the access request, which might have multiple space-separated values.

Refresh Token

Yes

Optional. Token used to obtain the access token, which is the end product of the OAuth2 flow.
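To show how the OAuth2 properties fit together, here is a minimal sketch of the form-encoded body that a standard OAuth2 refresh-token grant sends to the Token URL. It assumes the plugin follows the usual refresh flow, which this page does not confirm. The function name and the sample credentials are hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode


def build_refresh_request_body(client_id, client_secret, refresh_token, scopes=None):
    """Build the form-encoded body of a refresh-token grant request.

    Assumption: client credentials are passed in the body; some servers
    expect them in an HTTP Basic Authorization header instead.
    """
    params = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scopes:
        # Scopes are sent as a single space-separated value.
        params["scope"] = " ".join(scopes)
    return urlencode(params)


# Hypothetical values, for illustration only.
body = build_refresh_request_body(
    "my-client", "my-secret", "my-refresh-token", ["read", "write"]
)
decoded = parse_qs(body)  # decode again to inspect what would be sent
```

The server's JSON response to such a request normally carries the access token, which is then attached to each data request.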

JSON/XML Result Path Examples

JSON path example:

{
     "errors": [],
     "response": {
       "books": [
         {
           "id": "1159142",
           "title": "Agile Web Development with Rails",
           "author": "Sam Ruby, Dave Thomas, David Heinemeier Hansson",
           "printInfo": {
             "page": 488,
             "coverType": "hard",
             "publisher": "Pragmatic Bookshelf"
           }
         },
         {
           "id": "2375753",
           "title": "Flask Web Development",
           "author": "Miguel Grinberg",
           "printInfo": {
             "page": 543,
             "coverType": "hard",
             "publisher": "O'Reilly Media, Inc"
           }
         },
         {
           "id": "547307",
           "title": "Alex Homer, ASP.NET 2.0 Visual Web Developer 2005",
           "author": "David Sussman",
           "printInfo": {
             "page": 543,
             "coverType": "hard",
             "publisher": "unknown"
           }
         }
       ]
     }
}

The JSON path to fetch books is /response/books. However, if we need to fetch only printInfo, we can specify /response/books/printInfo as well.
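A rough sketch of how such slash-separated paths could be resolved is shown here. It is an assumption about the semantics, inferred from the fact that /response/books/printInfo is valid even though books is an array. It is not the plugin's actual code, `select` is a hypothetical helper, and the sample document is trimmed.

```python
import json


def select(document, path):
    """Resolve a slash-separated path; arrays fan out to each element."""
    nodes = [document]
    for key in path.strip("/").split("/"):
        next_nodes = []
        for node in nodes:
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    # A final array is flattened so that each element becomes one record.
    records = []
    for node in nodes:
        records.extend(node if isinstance(node, list) else [node])
    return records


# Trimmed version of the response above.
doc = json.loads("""
{"errors": [],
 "response": {"books": [
   {"id": "1159142", "printInfo": {"page": 488, "coverType": "hard"}},
   {"id": "2375753", "printInfo": {"page": 543, "coverType": "hard"}}
 ]}}
""")

books = select(doc, "/response/books")
print_infos = select(doc, "/response/books/printInfo")
```

Under these assumptions, `books` holds the two book objects and `print_infos` holds only their nested printInfo objects.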

XPath example:

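<bookstores>
  <bookstore>
     <book category="cooking">
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>
         <value>15.0</value>
         <policy>Discount up to 50%</policy>
        </price>
     </book>
     <book category="web">
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <year>2003</year>
        <price>
         <value>49.99</value>
         <policy>No discount</policy>
        </price>
     </book>
     ...
  </bookstore>
  <bookstore>
     ...
  </bookstore>
</bookstores>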

The XPath to fetch all books is /bookstores/bookstore/book. More precise selections are also possible, e.g. /bookstores/bookstore/book[@category='web'].
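The two selections can be tried out with Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. This is only a sketch: the element names and the category values in the sample are assumptions for illustration. ElementTree evaluates paths relative to the root element, so /bookstores/bookstore/book is written as ./bookstore/book once the root `<bookstores>` is parsed.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; element names and categories are assumed.
xml_text = """
<bookstores>
  <bookstore>
    <book category="cooking">
      <author>Giada De Laurentiis</author>
      <year>2005</year>
    </book>
    <book category="web">
      <author>James McGovern</author>
      <author>Per Bothner</author>
      <year>2003</year>
    </book>
  </bookstore>
</bookstores>
"""

root = ET.fromstring(xml_text)  # root is the <bookstores> element

# Equivalent of /bookstores/bookstore/book
all_books = root.findall("./bookstore/book")

# Equivalent of /bookstores/bookstore/book[@category='web']
web_books = root.findall("./bookstore/book[@category='web']")
web_authors = [author.text for author in web_books[0].findall("author")]
```

Each matched `<book>` element would then be turned into one record, with its fields taken from the fields mapping.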

JSON/XML Fields Mapping Example

Example response:

{
   "startAt":1,
   "maxResults":5,
   "total":15599,
   "issues":[
      {
         "id":"20276",
         "key":"NETTY-14",
         "fields":{
            "issuetype":{
               "name":"Bug",
               "subtask":false
            },
            "fixVersions":[
               "4.1.37"
            ],
            "description":"Test description for NETTY-14",
            "project":{
               "id":"10301",
               "key":"NETTY",
               "name":"Netty-HTTP",
               "projectCategory":{
                  "id":"10002",
                  "name":"Infrastructure"
               }
            }
         }
      },
      {
         "id":"19124",
         "key":"NETTY-13",
         "fields":{
            "issuetype":{
               "self":"https://issues.cask.co/rest/api/2/issuetype/4",
               "name":"Improvement",
               "subtask":false
            },
            "fixVersions":[

            ],
            "description":"Test description for NETTY-13",
            "project":{
               "id":"10301",
               "key":"NETTY",
               "name":"Netty-HTTP",
               "projectCategory":{
                  "id":"10002",
                  "name":"Infrastructure"
               }
            }
         }
      }
   ]
}

Assume the result path is /issues.

The mapping is:

| Field Name      | Field Path                           |
| type            | /fields/issuetype/name               |
| description     | /fields/description                  |
| projectCategory | /fields/project/projectCategory/name |
| isSubtask       | /fields/issuetype/subtask            |
| fixVersions     | /fields/fixVersions                  |

The result records are:

| key      | type        | isSubtask | description                   | projectCategory | fixVersions |
| NETTY-14 | Bug         | false     | Test description for NETTY-14 | Infrastructure  | ["4.1.37"]  |
| NETTY-13 | Improvement | false     | Test description for NETTY-13 | Infrastructure  | []          |

Note that the field key was populated even though it is not listed in the mapping. Mapping entries like key: /key can be omitted as long as the field is present in the schema.
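The mapping rules, including the implicit key: /key entry, can be sketched as follows. This is a simplified illustration under the assumption that an unmapped schema field falls back to a top-level path of the same name. `get_by_path` and `to_record` are hypothetical helpers, and the element is a trimmed copy of the first issue above.

```python
def get_by_path(element, path):
    """Follow a slash-separated path through nested dictionaries."""
    value = element
    for key in path.strip("/").split("/"):
        value = value[key]
    return value


def to_record(element, schema_fields, mapping):
    """Build one record; unmapped schema fields default to /<field name>."""
    record = {}
    for field in schema_fields:
        path = mapping.get(field, "/" + field)  # implicit mapping, e.g. key: /key
        record[field] = get_by_path(element, path)
    return record


# Trimmed copy of the first element under /issues.
issue = {
    "id": "20276",
    "key": "NETTY-14",
    "fields": {
        "issuetype": {"name": "Bug", "subtask": False},
        "fixVersions": ["4.1.37"],
        "description": "Test description for NETTY-14",
        "project": {"projectCategory": {"id": "10002", "name": "Infrastructure"}},
    },
}

schema_fields = ["key", "type", "isSubtask", "description",
                 "projectCategory", "fixVersions"]
mapping = {
    "type": "/fields/issuetype/name",
    "description": "/fields/description",
    "projectCategory": "/fields/project/projectCategory/name",
    "isSubtask": "/fields/issuetype/subtask",
    "fixVersions": "/fields/fixVersions",
}

record = to_record(issue, schema_fields, mapping)
```

Here `record` reproduces the NETTY-14 row of the result table, with key resolved through the implicit /key path.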
