Elasticsearch Batch Source

The Elasticsearch Batch source plugin is available in the Hub.

Plugin version: 1.10.1

The plugin pulls documents from Elasticsearch according to a user-specified query and converts each document to a Structured Record with the fields and schema that the user specifies. The Elasticsearch server must be running before the application is created.
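The conversion described above can be pictured with a minimal Python sketch. It is not the plugin's implementation: the `to_record` helper, the schema representation (field name mapped to a coercion function), and the sample document are all hypothetical, and the choice to set missing fields to `None` is an assumption, since this page does not say how the plugin treats them.

```python
from typing import Any, Callable, Dict

# Hypothetical schema shape: field name -> function that coerces the raw value.
Schema = Dict[str, Callable[[Any], Any]]


def to_record(document: Dict[str, Any], schema: Schema) -> Dict[str, Any]:
    """Project a document onto the fields named in the schema."""
    record = {}
    for field, coerce in schema.items():
        raw = document.get(field)
        # Assumption: a field absent from the document becomes None.
        record[field] = None if raw is None else coerce(raw)
    return record


if __name__ == "__main__":
    # Made-up document; only the fields listed in the schema are kept.
    doc = {"first_name": "John", "age": "25", "about": "rock climbing"}
    print(to_record(doc, {"first_name": str, "age": int}))
```

Fields that are present in the document but absent from the schema (here `about`) are dropped, which mirrors the idea that the user-specified schema decides what the record contains.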

Use this source whenever you need to read data from Elasticsearch. For example, you might read an index and type from Elasticsearch and store the data in an HBase table.

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Reference Name | No | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. |
| Elasticsearch Host | Yes | Required. The hostname and port for the Elasticsearch instance. |
| Index | Yes | Required. The name of the index to query. |
| Type | Yes | Required. The name of the type where the data is stored. |
| Query | Yes | Required. The query to use to import data from the specified index and type; see Elasticsearch for additional query examples. |
| Additional Properties | Yes | Optional. Additional properties to use with the es-hadoop client when reading the data, documented at elastic.co. |
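Because the Elasticsearch Host property expects a hostname and a port in one value, it can help to check the value before deploying a pipeline. The following is a minimal sketch of such a check; the `parse_host` helper is hypothetical, is not part of the plugin, and does not handle IPv6 literals.

```python
from typing import Tuple


def parse_host(value: str) -> Tuple[str, int]:
    """Split a 'hostname:port' string into (hostname, port)."""
    host, sep, port = value.strip().rpartition(":")
    # Require both parts, since the property asks for hostname and port.
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"expected hostname:port, got {value!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


if __name__ == "__main__":
    print(parse_host("localhost:9200"))
```

A value without a port, such as `localhost`, raises `ValueError` in this sketch; whether the plugin itself rejects or defaults such a value is not stated on this page.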

Example

This example connects to an Elasticsearch instance running locally and reads the records in the specified index (megacorp) and type (employee) that match the query, which in this case selects all records. All data from the index is read on each run:

| Property | Value |
| --- | --- |
| Reference Name | elasticsearch |
| Elasticsearch Host | localhost:9200 |
| Index | megacorp |
| Type | employee |
| Query | ?q=* |
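To see what the example values select, the same host, index, type, and query can be combined into an Elasticsearch URI search URL and opened with any HTTP client. The sketch below only builds the URL. It assumes plain HTTP and an Elasticsearch version that still supports mapping types in the request path; the `search_url` helper is hypothetical, and this is not necessarily how the plugin itself issues its requests.

```python
from urllib.parse import quote


def search_url(host: str, index: str, doc_type: str, query: str) -> str:
    """Build a URI-search URL from the example's property values."""
    # Percent-encode the index and type so they are safe as path segments.
    path = "/".join(quote(part, safe="") for part in (index, doc_type))
    # The query value already starts with '?', so it is appended as-is.
    return f"http://{host}/{path}/_search{query}"


print(search_url("localhost:9200", "megacorp", "employee", "?q=*"))
# prints http://localhost:9200/megacorp/employee/_search?q=*
```

Requesting that URL against the local instance is a quick way to confirm that the query matches the documents you expect before running the pipeline.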



Created in 2020 by Google Inc.