Stream Views
Note: Moved to Views.
Requirements
- CDAP exposes an API that lets developers build their own plugins for parsing data in a Stream.
- A developer should be able to build their own parser for events in the stream using the CDAP-provided API.
- Developers/Operations should then be able to deploy the implemented parser into a directory, together with a configuration.
- The user should specify, at minimum, a name and description for the plugin in that configuration.
- User should have the ability to list the available plugins using REST API / CLI
- User should have the ability to view the plugin's pre-defined schema using REST API / CLI, if the plugin defines one.
- User should have the ability to list the views associated with a Stream using REST API / CLI / UI
- User should have the ability to apply the plugin to a Stream and create a view
- The user-specified view name should be registered in a catalog so that the view can be queried (SQL) by that name.
- User should have the ability to apply different plugins to the same Stream, creating different views
- User should have the ability to change the plugin associated with a view
- CDAP should provide a text wrangler plugin that allows one to create rules for parsing mostly text files.
Overview
- Pluggable stream record formats (the format in which data is read from a stream, which is different from the format in which files are written to a stream)
- Expose cdap-spi module that contains StreamEventRecordFormat abstract class
- Each StreamEventRecordFormat will be associated with a simple name (e.g. grok, clf, avro)
- "system" record formats will come from within the CDAP codebase (grok, clf, avro)
- "user" record formats will be loaded from jars in a certain directory containing SPI jars
- In a later revision, this may be namespaced and/or managed via an HTTP API
- Stream views
- A stream view is an explorable view (Hive table) of a stream, with a particular record format
- A stream may have multiple views
- Upon creating a stream, the stream will have a default view
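The idea of looking up record formats by a simple name can be sketched as follows. This is a language-neutral illustration written in Python, not the cdap-spi API: `RecordFormat` is a hypothetical stand-in for the `StreamEventRecordFormat` abstract class, and `FormatRegistry`, its methods, and the `scope` labels are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class RecordFormat(ABC):
    """Hypothetical stand-in for the StreamEventRecordFormat abstract class."""

    @abstractmethod
    def read(self, body: str) -> dict:
        """Turn one stream event body into a record."""


class CsvFormat(RecordFormat):
    """Minimal CSV format: maps comma-separated fields onto column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def read(self, body: str) -> dict:
        return dict(zip(self.columns, body.split(",")))


class FormatRegistry:
    """Maps a simple name (e.g. "csv") to a format class and its scope."""

    def __init__(self):
        self._formats = {}  # name -> (scope, factory)

    def register(self, name, factory, scope):
        # scope is "system" (bundled) or "user" (loaded from a plugin directory)
        self._formats[name] = (scope, factory)

    def create(self, name, *args):
        _scope, factory = self._formats[name]
        return factory(*args)

    def names(self):
        return sorted(self._formats)


registry = FormatRegistry()
registry.register("csv", CsvFormat, scope="system")
record = registry.create("csv", ["ticker", "num_traded", "price"]).read("a,b,c")
```

In the actual design, "user" formats would be discovered from SPI jars rather than registered by hand; the sketch only shows the name-to-format lookup.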
Stream View HTTP API
Changes to existing APIs
Path | Request | Response | Notes |
---|---|---|---|
PUT /v3/namespaces/<namespace>/streams/<stream> | | | Instead of creating a Hive table with a default record format, this will create a "default" view with a default record format. |
DELETE /v3/namespaces/<namespace>/streams/<stream> | | | This will delete all associated views for the stream. |
POST /v3/namespaces/<namespace>/streams/properties | The "format" field will be considered deprecated; if "format" is given, it modifies the default view for backwards compatibility. | | Open question: should the user be notified somehow that the "format" field is deprecated? |
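The three behavior changes above can be modeled with a small in-memory catalog. This is a sketch under stated assumptions, not CDAP code: the `ViewCatalog` class, its method names, the `<stream>_default` naming scheme for the default view, and the `"text"` default format are all hypothetical.

```python
class ViewCatalog:
    """In-memory model of streams and their views (illustration only)."""

    def __init__(self, default_format):
        self.default_format = default_format
        self.views = {}  # view id -> {"stream": ..., "format": ...}

    @staticmethod
    def default_view_id(stream):
        # Hypothetical naming scheme; the design does not fix one.
        return f"{stream}_default"

    def create_stream(self, stream):
        # Creating a stream creates its "default" view (kept if it already exists).
        self.views.setdefault(
            self.default_view_id(stream),
            {"stream": stream, "format": self.default_format},
        )

    def add_view(self, view_id, stream, fmt):
        self.views[view_id] = {"stream": stream, "format": fmt}

    def delete_stream(self, stream):
        # Deleting a stream removes every view associated with it.
        self.views = {
            vid: v for vid, v in self.views.items() if v["stream"] != stream
        }

    def set_properties(self, stream, properties):
        # A deprecated "format" property still works: it updates the default view.
        notices = []
        if "format" in properties:
            self.views[self.default_view_id(stream)]["format"] = properties["format"]
            notices.append('"format" is deprecated; it now modifies the default view')
        return notices


catalog = ViewCatalog(default_format="text")  # "text" is an assumed default
catalog.create_stream("stream1")
catalog.add_view("view1", "stream1", "csv")
```

Returning the deprecation notice as data is one possible answer to the open question in the table; how the real API would surface it is not decided here.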
New APIs
Path | Request | Response | Notes |
---|---|---|---|
PUT /v3/namespaces/<namespace>/views/stream/<view> | { "stream": "stream1", "format": <same as before> } | | Creates or modifies a view. |
GET /v3/namespaces/<namespace>/views/stream/<view> | | {"id":"someView", "stream": "stream1", "format": ..} | Get details of an individual view. |
GET /v3/namespaces/<namespace>/views/stream | | | Lists all views. |
DELETE /v3/namespaces/<namespace>/views/stream/<view> | | | Deletes a view. |
GET /v3/namespaces/<namespace>/stream/<stream>/views | | [ {"id":"someView", "stream": "stream1", "format": ..}, {"id":"otherView", "stream": "stream2", "format": ..} ] | Lists all views for a stream. |
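As a rough illustration of how a client might address the proposed view endpoints, the sketch below builds (but does not send) the HTTP requests with Python's standard library. The base URL, the helper names, and the example format value are assumptions; the format payload is whatever the existing stream properties API accepts and is passed through unchanged.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:11015"  # assumed router address; adjust as needed


def view_url(namespace, view=None):
    """URL of the view collection, or of a single view when `view` is given."""
    url = f"{BASE_URL}/v3/namespaces/{quote(namespace, safe='')}/views/stream"
    return url if view is None else f"{url}/{quote(view, safe='')}"


def put_view(namespace, view, stream, fmt):
    """Request that creates or modifies a view."""
    body = json.dumps({"stream": stream, "format": fmt}).encode("utf-8")
    return Request(
        view_url(namespace, view),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_view(namespace, view):
    """Request that fetches the details of one view."""
    return Request(view_url(namespace, view), method="GET")


def list_views(namespace):
    """Request that lists all views in a namespace."""
    return Request(view_url(namespace), method="GET")


def delete_view(namespace, view):
    """Request that deletes a view."""
    return Request(view_url(namespace, view), method="DELETE")


# The format value here is a placeholder, not a documented payload.
req = put_view("default", "view1", "stream1", {"name": "csv"})
```

Passing one of these objects to `urllib.request.urlopen` would send it; that step is left out so the example runs without a server.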
Notes
- If Explore is disabled, then all stream view APIs will be disabled
- Existing Hive queries must not be affected by the deletion or modification of any stream views they may be using
Sample CLI Flow
- User wants to create a stream "stream1" that contains CSV data and read it via Explore through two views, "view1" and "view2".
- create stream stream1
- send stream stream1 "a,b,c"
- send stream stream1 "d,e,f"
- execute "select * from stream_stream1" // this is the default table, will be deprecated and later removed

    body
    a,b,c
    d,e,f

- create stream-view stream1 view1 format csv "ticker string, num_traded int, price double"
- execute "select * from view_view1"

    ticker  num_traded  price
    a       b           c
    d       e           f

- create stream-view stream1 view2 format csv "drop=num_traded"
- execute "select * from view_view2"

    ticker  price
    a       c
    d       f
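The effect of the two views in the flow above can be reproduced with a few lines of Python. This is only a sketch of the projection: `csv_view` is a hypothetical helper, column types are ignored, and it assumes that "view2" reuses the column names of "view1" and simply drops `num_traded`.

```python
def csv_view(events, columns, drop=()):
    """Project each CSV event body onto named columns, minus any dropped ones."""
    kept = [c for c in columns if c not in drop]
    rows = []
    for body in events:
        record = dict(zip(columns, body.split(",")))
        rows.append([record[c] for c in kept])
    return kept, rows


events = ["a,b,c", "d,e,f"]  # the two events sent to stream1
columns = ["ticker", "num_traded", "price"]

view1 = csv_view(events, columns)
view2 = csv_view(events, columns, drop={"num_traded"})
```

Both views read the same underlying events; only the format settings differ, which is why a stream can carry several views at once.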
Created in 2020 by Google Inc.