Views.

Requirements

  • CDAP exposes an API that developers can use to build their own plugins for parsing data in a Stream.

  • Developers should be able to build their own parser, using the CDAP-provided API, for parsing events in the stream.

  • Developers/Operations should then be able to deploy the implemented parser into a directory, together with a configuration.

  • Users should specify, at minimum, a name and a description for the plugin in the configuration.

  • Users should be able to list the available plugins using the REST API / CLI.

  • Users should be able to view a plugin's predefined schema, if the plugin defines one, using the REST API / CLI.

  • Users should be able to list the views associated with a Stream using the REST API / CLI / UI.

  • Users should be able to apply a plugin to a Stream to create a view.

  • The user-specified view name should be registered in a catalog, so that the view can be queried (SQL) by its name.

  • Users should be able to apply different plugins to the same Stream, creating different views.

  • Users should be able to change the plugin associated with a view.

  • CDAP should provide a text wrangler plugin that allows users to create rules for parsing data that is mostly text.

Overview

  • A view is another source from which data can be read, like streams and datasets.

    • Therefore, views are readable anywhere a stream or dataset is readable (MapReduce/Spark programs, flows, ETL).

  • A view is a read-only view of a stream or dataset, with a specific read format (a schema plus a format such as csv or avro).

  • If Explore is enabled, a Hive table will be created for each view.
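The idea of a view as a read-only read format applied on top of a stream can be sketched in a few lines of Python. This is only an illustration: the `Stream` and `View` classes and the parser callables are hypothetical names, not CDAP classes.

```python
from typing import Callable, Iterator, List


class Stream:
    """Hypothetical stand-in for a stream: an append-only list of raw events."""

    def __init__(self) -> None:
        self._events: List[str] = []

    def send(self, body: str) -> None:
        self._events.append(body)

    def read(self) -> Iterator[str]:
        return iter(self._events)


class View:
    """Hypothetical read-only view: a stream plus a parser (the read format)."""

    def __init__(self, stream: Stream, parse: Callable[[str], object]) -> None:
        self._stream = stream
        self._parse = parse

    def read(self) -> Iterator[object]:
        # Events are parsed at read time; the underlying stream is never modified.
        return (self._parse(body) for body in self._stream.read())


stream1 = Stream()
stream1.send("a,b,c")
stream1.send("d,e,f")

# Two views over the same stream, each applying a different read format.
as_fields = View(stream1, lambda body: body.split(","))
as_upper = View(stream1, str.upper)

print(list(as_fields.read()))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(list(as_upper.read()))   # ['A,B,C', 'D,E,F']
```

Because a view in this sketch holds only a reference to the stream and a parser, several views can coexist over the same stream without copying or changing its events.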

3.2 Plan

  • View HTTP API, client, and CLI.

  • A view can be a view of a stream (views of datasets are not supported yet).

  • Hive tables will be created for views when Explore is enabled.

View HTTP API

PUT /v3/namespaces/<namespace>/streams/<stream>/views/<view>

  Request: ViewSpecification

    {
      "format": <same as before>
    }

  Response: 201 Created if a new stream view was created; 200 OK if an existing stream view was modified.

  Notes: Creates or modifies a view.

GET /v3/namespaces/<namespace>/streams/<stream>/views/<view>

  Request: (none specified)

  Response: ViewDetail (ViewSpecification with an "id" field)

    {"id": "view1", "format": ..}

  Notes: Gets the details of an individual view.

DELETE /v3/namespaces/<namespace>/streams/<stream>/views/<view>

  Request: (none specified)

  Response: (none specified)

  Notes: Deletes a view.

GET /v3/namespaces/<namespace>/streams/<stream>/views

  Request: (none specified)

  Response:

    [
      {"id": "someview", "stream": "stream1", "format": ..},
      {"id": "otherview", "stream": "stream2", "format": ..}
    ]

  Notes: Lists all views associated with a stream.

Notes

  • If Explore is disabled, Hive tables will not be created for views.
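As a rough illustration of how a client might call the view endpoints listed above, the following Python sketch builds (but does not send) the four requests using only the standard library. The base URL, the helper names, and the contents of the example format body are assumptions made for this sketch; the actual "format" payload is whatever the stream format specification defines.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: address of the CDAP router; host and port are placeholders.
BASE_URL = "http://localhost:11015"


def views_url(namespace: str, stream: str, view: str = "") -> str:
    """Build the URL of a stream's view collection, or of a single view."""
    url = (
        f"{BASE_URL}/v3/namespaces/{quote(namespace, safe='')}"
        f"/streams/{quote(stream, safe='')}/views"
    )
    return f"{url}/{quote(view, safe='')}" if view else url


def put_view(namespace: str, stream: str, view: str, spec: dict) -> urllib.request.Request:
    """Create (201 Created) or modify (200 OK) a view from a ViewSpecification."""
    return urllib.request.Request(
        views_url(namespace, stream, view),
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_view(namespace: str, stream: str, view: str) -> urllib.request.Request:
    """Get the details of an individual view."""
    return urllib.request.Request(views_url(namespace, stream, view), method="GET")


def delete_view(namespace: str, stream: str, view: str) -> urllib.request.Request:
    """Delete a view."""
    return urllib.request.Request(views_url(namespace, stream, view), method="DELETE")


def list_views(namespace: str, stream: str) -> urllib.request.Request:
    """List all views associated with a stream."""
    return urllib.request.Request(views_url(namespace, stream), method="GET")


# Hypothetical placeholder body; the real "format" structure is not given here.
example_spec = {"format": {"name": "csv"}}

for request in (
    put_view("default", "stream1", "view1", example_spec),
    get_view("default", "stream1", "view1"),
    list_views("default", "stream1"),
    delete_view("default", "stream1", "view1"),
):
    print(request.get_method(), request.full_url)
```

Each request can be sent with `urllib.request.urlopen(request)` against a running instance; the sketch stops at construction so that it runs without a server.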

Sample CLI Flow

  1. A user wants to create a stream "stream1" that contains CSV data and read it using two views, "view1" and "view2".

    1. create stream stream1

    2. send stream stream1 "a,b,c"
       send stream stream1 "d,e,f"

    3. execute "select * from stream_stream1" (this step may be removed later, as views already cover it)

    4. create view view1 stream1 format csv "ticker string, num_traded int, price double"

    5. execute "select * from view_view1"

    6. create view view2 stream1 format csv "ticker string, price double" "drop=$2" (the "drop=$2" setting means "drop the 2nd field")

    7. execute "select * from view_view2"
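The effect of the two view definitions in the flow above can be approximated with a short Python sketch. It assumes that the schema string is a comma-separated list of "name type" pairs, that the drop setting is a 1-based field index applied before the schema is matched, and that the types map to Python converters as shown. The function names and the sample event body are hypothetical; a numeric sample is used because the "a,b,c" events would not convert to int or double.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed mapping from the type names in the schema strings to Python converters.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "string": str,
    "int": int,
    "double": float,
}


def parse_schema(schema: str) -> List[Tuple[str, Callable[[str], object]]]:
    """Turn 'ticker string, price double' into [(name, converter), ...]."""
    fields = []
    for column in schema.split(","):
        name, type_name = column.split()
        fields.append((name, CONVERTERS[type_name]))
    return fields


def read_csv_event(body: str, schema: str, drop: Optional[int] = None) -> Dict[str, object]:
    """Parse one CSV event body; `drop` is a 1-based index of a field to discard."""
    values = body.split(",")
    if drop is not None:
        del values[drop - 1]
    fields = parse_schema(schema)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return {name: convert(value) for (name, convert), value in zip(fields, values)}


event = "AAPL,100,120.5"  # hypothetical event body that matches the schemas

# Roughly what view1 and view2 would expose for the same underlying event.
view1_row = read_csv_event(event, "ticker string, num_traded int, price double")
view2_row = read_csv_event(event, "ticker string, price double", drop=2)

print(view1_row)  # {'ticker': 'AAPL', 'num_traded': 100, 'price': 120.5}
print(view2_row)  # {'ticker': 'AAPL', 'price': 120.5}
```

Both rows come from the same event body, which mirrors how view1 and view2 read the same stream through different read formats.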

Created in 2020 by Google Inc.