Views.
Requirements
CDAP exposes an API that lets developers build their own plugins for parsing data in a Stream.
- Developer should have the ability to build their own parser, using the CDAP-provided API, for parsing events in the stream.
- Developer/Operations should then have the ability to deploy the implemented parser into a directory along with a configuration
- User should specify, at minimum, a name and description for the plugin in the configuration
- User should have the ability to list the available plugins using REST API / CLI
- User should have the ability to view, using REST API / CLI, the pre-defined schema of a plugin if the plugin defines one.
- User should have the ability to list the views associated with a Stream using REST API / CLI / UI
- User should have the ability to apply a plugin to a Stream to create a view
- A user-specified view name should be registered in a catalog, allowing the view to be queried (SQL) by that name.
- User should have the ability to apply different plugins to the same Stream, creating different views
- User should have the ability to change the plugin associated with a view
- CDAP should provide a text wrangler plugin that allows one to create rules for parsing mostly text files.
Overview
- A view is another source from which data can be read, like streams and datasets.
- Therefore, views are readable anywhere a stream or dataset is readable (MapReduce/Spark program, flows, ETL)
- A view is a read-only view of a stream or dataset, with a specific read format (schema + format (csv, avro))
- If explore is enabled, then a Hive table will be created for each view
3.2 Plan
- view HTTP API, client, CLI
- a view can be defined over a stream (not over a dataset yet)
- Hive tables will be created for views when explore is enabled
view HTTP API
Path | Request | Response | Notes |
---|---|---|---|
PUT /v3/namespaces/<namespace>/streams/<stream>/views/<view> | ViewSpecification { "format" : <same as before> } | 201 Created if a new stream view was created; 200 OK if an existing stream view was modified | Creates or modifies a view. |
GET /v3/namespaces/<namespace>/streams/<stream>/views/<view> | | ViewDetail (ViewSpecification with an "id" field) | Gets details of an individual view. |
DELETE /v3/namespaces/<namespace>/streams/<stream>/views/<view> | | | Deletes a view. |
GET /v3/namespaces/<namespace>/streams/<stream>/views | | [ { "id" : "someview" , "stream" : "stream1" , "format" : ..}, { "id" : "otherview" , "stream" : "stream2" , "format" : ..} ] | Lists all views associated with a stream. |
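
As a rough illustration of the view HTTP API, the following Python sketch builds (but does not send) one request per operation using only the standard library. Several things here are assumptions made for the example rather than facts from this page: the router address `http://localhost:11015`, the helper names, the consistent use of the plural path segments `namespaces`, `streams`, and `views`, and the placeholder `format` object (this page only says the format is "same as before").

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: address of the CDAP router; adjust for your deployment.
BASE_URL = "http://localhost:11015"


def views_url(namespace, stream, view=None):
    """Return the view collection URL, or a single-view URL if `view` is given."""
    url = (f"{BASE_URL}/v3/namespaces/{quote(namespace, safe='')}"
           f"/streams/{quote(stream, safe='')}/views")
    return url if view is None else f"{url}/{quote(view, safe='')}"


def put_view(namespace, stream, view, format_spec):
    """Create or modify a view; the table lists 201 (created) or 200 (modified)."""
    body = json.dumps({"format": format_spec}).encode("utf-8")
    return urllib.request.Request(
        views_url(namespace, stream, view),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_view(namespace, stream, view):
    """Fetch the ViewDetail of a single view."""
    return urllib.request.Request(views_url(namespace, stream, view), method="GET")


def delete_view(namespace, stream, view):
    """Delete a single view."""
    return urllib.request.Request(views_url(namespace, stream, view), method="DELETE")


def list_views(namespace, stream):
    """List all views associated with a stream."""
    return urllib.request.Request(views_url(namespace, stream), method="GET")


# Hypothetical format object; the real shape is whatever streams already accept.
request = put_view("default", "stream1", "view1", {"name": "csv"})
```

Passing any of these objects to `urllib.request.urlopen` would send the request. Building the request separately from sending it keeps the sketch usable without a running CDAP instance.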
Notes
- If Explore is disabled, then Hive tables will not be created for views
Sample CLI Flow
- User wants to create a stream "stream1" that contains CSV data and read it using two views, "view1" and "view2".
- create stream stream1
- send stream stream1 "a,b,c"
- send stream stream1 "d,e,f"
- execute "select * from stream_stream1" // may be removed later, as views already cover this

body |
---|
a,b,c |
d,e,f |

- create view view1 stream1 format csv "ticker string, num_traded int, price double"
- execute "select * from view_view1"

ticker | num_traded | price |
---|---|---|
a | b | c |
d | e | f |

- create view view2 stream1 format csv "ticker string, price double" "drop=$2" <-- drop $2 indicates "drop the 2nd field"
- execute "select * from view_view2"

ticker | price |
---|---|
a | c |
d | f |
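
To make the effect of the two views in this flow concrete, here is a minimal Python sketch of the projection each view performs on the same raw events. It is only an illustration of the idea, not CDAP code: the function `read_view` and its parameters are hypothetical, `drop` stands in for the `drop=$2` setting, and conversion to the declared column types (int, double) is left out.

```python
import csv


def read_view(events, columns, drop=None):
    """Parse raw CSV events and map the fields onto the view's columns.

    `drop` is the 1-based position of a field to discard before the
    remaining fields are matched to `columns` (mirrors "drop=$2").
    Values are kept as strings; type coercion is omitted in this sketch.
    """
    rows = []
    for fields in csv.reader(events):
        if drop is not None:
            fields = fields[: drop - 1] + fields[drop:]
        rows.append(dict(zip(columns, fields)))
    return rows


# The two events sent to stream1 in the flow above.
stream1 = ["a,b,c", "d,e,f"]

# view1 keeps all three fields; view2 drops the 2nd field.
view1 = read_view(stream1, ["ticker", "num_traded", "price"])
view2 = read_view(stream1, ["ticker", "price"], drop=2)

print(view1)
print(view2)
```

Both views read the same underlying events; only the read format differs, which is why the stream itself never has to be rewritten.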
Created in 2020 by Google Inc.