HTTP To HDFS Action

The HTTP to HDFS action plugin is available in the Hub.

Plugin version: 1.3.0

An action that fetches data from an external HTTP endpoint and writes it to a file in HDFS.

Configuration

Property

Macro Enabled?

Description

URL

Yes

Required. The URL to fetch data from.

HDFS File Path

Yes

Required. The location to write the data in HDFS. If the file already exists, it will be overwritten.

HTTP Method

No

Required. The HTTP request method. GET and POST are the allowed methods.

Default is GET.

Request Body

Yes

Optional. The body to send with the request.

Request Headers

Yes

Optional. Header values to send with each request. Each key and its value are separated by a colon (“:”), and each key-value pair is separated by a newline (“\n”).

Output File Format

No

Required. The format in which to write the output data: Text (for example, JSON, XML, or plain-text files) or Binary (for example, zip or gzip archives, or images).

Default is Text.

Charset for Text

No

Required. If Text is selected as the output file format, the charset of the text returned by the endpoint.

Default is UTF-8.

Should Follow Redirects?

No

Required. Whether to automatically follow redirects. 

Default is true.

Disable SSL Validation

No

Required. Whether to skip SSL certificate validation. If SSL validation is enabled (this property is set to false), the certificate must be added to the trustStore on each machine.

Default is true.

Number of Retries

No

Required. The number of times to retry the request if it fails.

Default is 3.

Connection Timeout (milliseconds)

Yes

Optional. The time in milliseconds to wait for a connection to be established. Set to 0 to wait indefinitely.

Default is 60000 (1 minute).

Read Timeout (milliseconds)

Yes

Optional. The time in milliseconds to wait when reading data from the connection. Set to 0 to wait indefinitely.

Default is 60000 (1 minute).

Token Key for HDFS File Path

Yes

Optional. The key under which the path of the written file is stored, so that a file source can read from it. Plugins that run at later stages in the pipeline can retrieve the file path through macro substitution with ${filePath}, where “filePath” is the key specified.

Default is filePath.

Token Key for Response Headers

Yes

Optional. The key under which the response headers are stored, so that they are available to other plugins. Plugins that run at later stages in the pipeline can retrieve the response headers through macro substitution with ${responseHeaders}, where “responseHeaders” is the key specified.

Default is responseHeaders.
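
The request-related settings above map closely onto the options of an ordinary HTTP client. The following is a minimal sketch using Python's standard urllib. It is not the plugin's own implementation, and the helper names (parse_headers, build_request, make_ssl_context, build_opener, to_timeout_seconds, with_retries) are hypothetical. The sketch also makes several assumptions that the plugin may not share:

- Whitespace is trimmed from header keys and values.
- The request body is sent as UTF-8.
- "Number of Retries" counts attempts after the first one.
- Retries use a short linear backoff.

```python
import ssl
import time
import urllib.request
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse the Request Headers format: "key:value" pairs, one per line."""
    headers: Dict[str, str] = {}
    for line in raw.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as a trailing newline
        # Split on the first colon only, so a value such as a URL may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"header line has no ':' delimiter: {line!r}")
        # Assumption: surrounding whitespace is trimmed from keys and values.
        headers[key.strip()] = value.strip()
    return headers


def build_request(url: str, method: str = "GET", body: str = "",
                  raw_headers: str = "") -> urllib.request.Request:
    """Map HTTP Method, Request Body, and Request Headers onto a urllib Request."""
    method = method.upper()
    if method not in ("GET", "POST"):
        raise ValueError("only GET and POST are allowed")
    data = body.encode("utf-8") if body else None  # assumption: body sent as UTF-8
    return urllib.request.Request(url, data=data,
                                  headers=parse_headers(raw_headers),
                                  method=method)


def make_ssl_context(disable_ssl_validation: bool) -> ssl.SSLContext:
    """Disable SSL Validation: skip certificate and hostname checks when True."""
    ctx = ssl.create_default_context()  # validates against the system CA store
    if disable_ssl_validation:
        ctx.check_hostname = False  # must be cleared before CERT_NONE is accepted
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse redirects; urllib then raises HTTPError for the 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def build_opener(follow_redirects: bool = True,
                 disable_ssl_validation: bool = False) -> urllib.request.OpenerDirector:
    """Map Should Follow Redirects? and Disable SSL Validation onto an opener."""
    handlers: List[urllib.request.BaseHandler] = [
        urllib.request.HTTPSHandler(context=make_ssl_context(disable_ssl_validation))
    ]
    if not follow_redirects:
        handlers.append(NoRedirectHandler())  # replaces the default redirect handler
    return urllib.request.build_opener(*handlers)


def to_timeout_seconds(milliseconds: int) -> Optional[float]:
    """Convert a millisecond timeout to seconds; 0 means wait indefinitely (None)."""
    if milliseconds < 0:
        raise ValueError("timeout must be >= 0")
    return None if milliseconds == 0 else milliseconds / 1000.0


def with_retries(fetch: Callable[[], T], retries: int = 3,
                 backoff_seconds: float = 1.0) -> T:
    """Call fetch(); on failure, retry up to `retries` more times."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fetch()
        # URLError, HTTPError (including refused redirects) and socket timeouts
        # are all OSError subclasses, so every one of them triggers a retry here.
        except OSError:
            if attempt == retries:
                raise  # no retries left: surface the last error
            time.sleep(backoff_seconds * (attempt + 1))  # assumption: linear backoff
    raise AssertionError("unreachable")
```

For example, wrapping `build_opener().open(build_request("http://example.com/data"), timeout=to_timeout_seconds(60000)).read()` in a function and passing it to `with_retries(..., retries=3)` makes up to four attempts under this sketch's assumptions.

One difference from the plugin is worth noting. urllib applies a single timeout to both connecting and each blocking read. Java's `HttpURLConnection` instead has separate `setConnectTimeout` and `setReadTimeout` settings, where 0 means infinite, which matches the two timeout properties more directly.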

Example

This example performs an HTTP GET request to http://example.com/data and writes the downloaded CSV data to /tmp/data.csv in HDFS.

Property

Value

URL

http://example.com/data

HDFS File Path

/tmp/data.csv

HTTP Method

GET

Output File Format

Text

Charset for Text

UTF-8

Should Follow Redirects?

true

Disable SSL Validation

true

Number of Retries

0

Connection Timeout (milliseconds)

60000

Read Timeout (milliseconds)

60000
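
The example's end-to-end flow (fetch, decode as text, write the file, then expose its path to later stages as ${filePath}) can be sketched in Python as follows. This sketch rests on three assumptions:

- A local file path stands in for HDFS.
- A plain dictionary stands in for the pipeline's token store.
- The decoded text is written back out as UTF-8.

download_to_file and resolve_macros are hypothetical helpers, not CDAP APIs. Because Number of Retries is 0 in the example, the sketch makes a single attempt. For an http:// URL the SSL setting has no effect, and urllib follows redirects by default.

```python
import re
import urllib.request
from pathlib import Path
from typing import Dict, Optional

# Matches ${key}; nested macros and macro functions are not supported in this sketch.
_MACRO = re.compile(r"\$\{([^}]+)\}")


def download_to_file(url: str, path: str, charset: str = "UTF-8",
                     timeout_ms: int = 60000,
                     token_key: str = "filePath") -> Dict[str, str]:
    """GET `url` once, decode it as text, write it to `path`, and return the token."""
    # 0 means wait indefinitely, which urllib expresses as timeout=None.
    timeout: Optional[float] = None if timeout_ms == 0 else timeout_ms / 1000.0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode(charset)  # Output File Format = Text
    # Hypothetical stand-in for HDFS: a local file, overwritten if it exists.
    # Assumption: the decoded text is written back out as UTF-8.
    Path(path).write_bytes(text.encode("utf-8"))
    # Hypothetical stand-in for the pipeline token: a plain dict.
    return {token_key: path}


def resolve_macros(template: str, tokens: Dict[str, str]) -> str:
    """Replace each ${key} in `template` with tokens[key]."""
    def lookup(match: re.Match) -> str:
        key = match.group(1)
        if key not in tokens:
            raise KeyError(f"no value stored for macro key {key!r}")
        return tokens[key]
    return _MACRO.sub(lookup, template)
```

If the download succeeds, `download_to_file("http://example.com/data", "/tmp/data.csv")` returns `{"filePath": "/tmp/data.csv"}`. Passing that dictionary to `resolve_macros("${filePath}", ...)` then yields `/tmp/data.csv`, which mirrors how a later file source would pick up the path.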

Created in 2020 by Google Inc.