HTTP Poller Streaming Source

The HTTP Poller Streaming source is available in the Hub.

Plugin version: 2.11.0

This streaming source fetches data from a specified URL at a fixed interval and passes the results to the next plugin. It emits one record per request to the URL. Each record contains a timestamp, the URL that was requested, the response code, the response headers as a map<string, string>, and the body of the response.
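The behavior described above can be sketched in a few lines of Python. This is only an illustration, not the plugin's implementation: the function names `urllib_fetch`, `poll_once`, and `poll` are hypothetical, the timestamp is assumed to be in epoch milliseconds, and emitting a record for 4xx/5xx responses is also an assumption.

```python
import time
import urllib.error
import urllib.request


def urllib_fetch(url, timeout=60.0):
    """Fetch url and return (status, headers, body) using the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, dict(resp.headers.items()), body
    except urllib.error.HTTPError as err:
        # Assumption: error responses are still turned into a record.
        body = err.read().decode("utf-8", errors="replace")
        return err.code, dict(err.headers.items()), body


def poll_once(url, fetch=urllib_fetch, clock=time.time):
    """Make one request and build one record with the five documented fields."""
    status, headers, body = fetch(url)
    return {
        "ts": int(clock() * 1000),  # assumption: epoch milliseconds
        "url": url,
        "responseCode": status,
        "headers": dict(headers),
        "body": body,
    }


def poll(url, interval_seconds=60, fetch=urllib_fetch, sleep=time.sleep, max_polls=None):
    """Yield one record per request, waiting interval_seconds between requests."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if polls:
            sleep(interval_seconds)
        yield poll_once(url, fetch)
        polls += 1
```

The `fetch`, `clock`, and `sleep` parameters are injectable so the loop can be exercised without network access or real delays.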

Use this source whenever you need to fetch data from a URL at a regular interval, for example to poll Atom or RSS feeds or to check the status of an external system.

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Reference Name | No | Required. Uniquely identifies this source for lineage, annotating metadata, etc. |
| URL | Yes | Required. The URL to fetch data from. |
| Interval | No | Required. The time in seconds to wait between fetches from the URL. Default is 60. |
| Request Headers | Yes | Optional. A string of header values to send in each request. Keys and values are delimited by a colon (“:”), and each pair is delimited by a newline (“\n”). |
| Charset | No | Optional. The charset of the content returned by the URL. Default is UTF-8. |
| Should Follow Redirects | Yes | Optional. Whether to automatically follow redirects. Default is true. |
| Connect Timeout | Yes | Optional. The time in milliseconds to wait for a connection. Set to 0 for infinite. Default is 60000 (1 minute). |
| Read Timeout | No | Optional. The time in milliseconds to wait for a read. Set to 0 for infinite. Default is 60000 (1 minute). |
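The Request Headers format and the timeout convention described above can be illustrated with a short Python sketch. The helper names `parse_request_headers` and `timeout_seconds` are hypothetical. Splitting on the first colon only, trimming whitespace, and accepting the two-character sequence `\n` as well as a real newline are assumptions, not documented plugin behavior.

```python
def parse_request_headers(raw):
    """Parse 'Key:Value' pairs separated by newlines into a dict."""
    headers = {}
    if not raw:
        return headers
    # Assumption: accept a real newline or the literal characters backslash + n.
    for line in raw.replace("\\n", "\n").split("\n"):
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values such as URLs keep theirs.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"malformed header pair: {line!r}")
        headers[key.strip()] = value.strip()
    return headers


def timeout_seconds(millis):
    """Convert a millisecond timeout to seconds; 0 means wait indefinitely (None)."""
    if millis < 0:
        raise ValueError("timeout must not be negative")
    return None if millis == 0 else millis / 1000.0
```

Returning `None` for a timeout of 0 matches how many Python HTTP clients express "no timeout"; other clients may use a different convention.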

Example

This example fetches data from a URL every 60 seconds using a custom user agent:

| Property | Value |
| --- | --- |
| Reference Name | poller |
| URL | http://example.com/sampleEndpoint |
| Interval | 60 |
| Request Headers | User-Agent:HydratorPipeline\nAccept:application/json |
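For readers who prefer code to tables, the example configuration can be mirrored as a small Python structure. This is a sketch only: the class `HttpPollerConfig` and its field names are hypothetical and do not correspond to the plugin's actual property keys. The defaults are taken from the Configuration section, and the validation rules are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpPollerConfig:
    reference_name: str              # required
    url: str                         # required
    interval: int = 60               # seconds between requests
    request_headers: str = ""        # "Key:Value" pairs separated by newlines
    charset: str = "UTF-8"
    follow_redirects: bool = True
    connect_timeout: int = 60000     # milliseconds, 0 for infinite
    read_timeout: int = 60000        # milliseconds, 0 for infinite

    def __post_init__(self):
        # Assumed validation rules, for illustration only.
        if not self.reference_name:
            raise ValueError("reference_name is required")
        if not self.url:
            raise ValueError("url is required")
        if self.interval <= 0:
            raise ValueError("interval must be positive")
        if self.connect_timeout < 0 or self.read_timeout < 0:
            raise ValueError("timeouts must not be negative")


# The values from the example; unspecified properties fall back to defaults.
example = HttpPollerConfig(
    reference_name="poller",
    url="http://example.com/sampleEndpoint",
    interval=60,
    request_headers=r"User-Agent:HydratorPipeline\nAccept:application/json",
)
```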

The source outputs records with this schema:

| Field name | Type |
| --- | --- |
| ts | long |
| url | string |
| responseCode | int |
| headers | map<string, string> |
| body | string |

All fields will always be included, but the body might be an empty string.
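A downstream consumer might check incoming records against this schema before using them. The following Python sketch shows one way to do that. The names `OUTPUT_SCHEMA` and `validate_record` are hypothetical, and mapping `long` and `int` to Python's `int` is an assumption made for illustration.

```python
# Field names from the schema above, mapped to the closest Python types.
OUTPUT_SCHEMA = {
    "ts": int,            # long
    "url": str,
    "responseCode": int,
    "headers": dict,      # map<string, string>
    "body": str,          # may be an empty string
}


def validate_record(record):
    """Return record unchanged if it matches the schema, otherwise raise."""
    missing = OUTPUT_SCHEMA.keys() - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected in OUTPUT_SCHEMA.items():
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"field {name!r} should be {expected.__name__}")
    for key, value in record["headers"].items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("headers must map strings to strings")
    return record
```

An empty `body` passes validation, since every field is always present but the body might be empty.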



Created in 2020 by Google Inc.