Webserver Log Parser Transformation

Plugin version: 2.11.0

Parses logs from any input source for relevant information such as URI, IP, browser, device, HTTP status code, and timestamp.

Use this transform when you need to parse log entries. For example, you might read log files from S3 using the Amazon S3 source, parse them with the Webserver Log Parser transformation, and then store the IP and URI information in a Cube dataset.

Configuration

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Log Format | No | Required. Log format to parse. Currently supports the S3, CLF, and CloudFront formats. Default is CLF. |
| Input Name | No | Optional. Name of the field in the input schema that encodes the log information. The field must be of type String or Bytes. |
| Output Schema | No | Required. The output schema for the data. |

Conditions

If an error dataset is configured, all erroneous rows in the input are committed to that error dataset. If no error dataset is configured, the pipeline still completes, but with warnings in the logs.
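The routing behavior described above can be sketched in a few lines of Python. This is an illustration only, not the plugin's implementation: `route_rows`, `parse_status`, and the list used as an error sink are hypothetical names, and the stand-in parser simply converts a string to an integer so that a bad row is easy to produce.

```python
import logging

logger = logging.getLogger("log_parser_sketch")


def route_rows(rows, parse, error_sink=None):
    """Parse each row; failed rows go to error_sink if one is given,
    otherwise they are skipped and a warning is logged."""
    parsed = []
    for row in rows:
        try:
            parsed.append(parse(row))
        except ValueError as exc:
            if error_sink is not None:
                # Error dataset configured: keep the bad row for inspection.
                error_sink.append(row)
            else:
                # No error dataset: the run continues, with a warning.
                logger.warning("Skipping unparseable row %r: %s", row, exc)
    return parsed


def parse_status(row):
    # Hypothetical stand-in parser: expects the row to be a status code.
    return int(row)


rows = ["200", "oops", "404"]
errors = []
ok = route_rows(rows, parse_status, error_sink=errors)
```

With the error sink supplied, `ok` holds the two parsed rows and `errors` holds the rejected one. Calling `route_rows(rows, parse_status)` without a sink returns the same parsed rows and only logs a warning for the bad row.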

Example

This example looks for an input schema field named ‘body’ and attempts to parse the Combined Log Format entries in that field for the URI, IP, browser, device, HTTP status code, and timestamp:

| Property | Value |
| --- | --- |
| Log Format | CLF |
| Input Name | body |

The Webserver Log Parser transformation will emit records with this schema:

| field name | type |
| --- | --- |
| uri | string |
| ip | string |
| browser | string |
| device | string |
| httpStatus | int |
| ts | long |
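To make the mapping from a log line to this schema concrete, here is a minimal Python sketch that parses one Combined Log Format entry. It is not the plugin's code, and it rests on several assumptions: the regular expression covers only well-formed entries, `ts` is taken to be milliseconds since the Unix epoch, and `guess_browser_device` is a deliberately crude, hypothetical stand-in for real user-agent parsing.

```python
import re
from datetime import datetime

# Combined Log Format:
# ip ident user [time] "method uri protocol" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def guess_browser_device(agent):
    # Hypothetical heuristic: substring checks, not real user-agent parsing.
    browser = next(
        (b for b in ("Firefox", "Chrome", "Safari") if b in agent), "Other"
    )
    device = next((d for d in ("iPhone", "Android") if d in agent), "Other")
    return browser, device


def parse_line(line):
    """Return a dict with the fields of the output schema shown above."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("not a Combined Log Format entry")
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    browser, device = guess_browser_device(match.group("agent"))
    return {
        "uri": match.group("uri"),
        "ip": match.group("ip"),
        "browser": browser,
        "device": device,
        "httpStatus": int(match.group("status")),
        # Assumption: timestamp expressed as epoch milliseconds.
        "ts": int(when.timestamp()) * 1000,
    }


line = (
    '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X) '
    'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 '
    'Mobile/15E148 Safari/604.1"'
)
record = parse_line(line)
```

For the sample line, `record` carries the URI `/index.html`, the client IP `203.0.113.7`, and the status code 200 as an integer. A line that does not match raises `ValueError`, which corresponds to the erroneous rows described under Conditions.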



Created in 2020 by Google Inc.