The developer wants to load webpage click and view data (customer id, timestamp, action, url) into a partitioned fileset. After loading the data, the developer wants to de-duplicate records and calculate how many times each customer clicked and viewed over the past hour, past day, and past month.
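
The de-duplication and windowed counting described above can be sketched in plain Python. This is only an illustration of the intended logic, not the pipeline implementation: the record layout as a tuple, the helper names `dedupe` and `count_actions`, the treatment of a "month" as 30 days, and the rule that a duplicate is an exactly identical record are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed window sizes; "month" is approximated as 30 days for this sketch.
WINDOWS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "month": timedelta(days=30),
}


def dedupe(records):
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def count_actions(records, now):
    """Return {window name: Counter keyed by (customer_id, action)}."""
    counts = {name: Counter() for name in WINDOWS}
    for customer_id, ts, action, _url in dedupe(records):
        age = now - ts
        for name, span in WINDOWS.items():
            # Count the record in every window that still contains it.
            if timedelta(0) <= age <= span:
                counts[name][(customer_id, action)] += 1
    return counts


# Hypothetical sample data: (customer_id, timestamp, action, url)
now = datetime(2016, 1, 31, 12, 0)
records = [
    ("c1", now - timedelta(minutes=10), "click", "/home"),
    ("c1", now - timedelta(minutes=10), "click", "/home"),  # exact duplicate
    ("c1", now - timedelta(hours=5), "view", "/home"),
    ("c2", now - timedelta(days=10), "click", "/cart"),
]
counts = count_actions(records, now)
```

In a real pipeline the de-duplication and the GROUP BY aggregation would be separate stages; they are combined here only to keep the sketch short.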

User Stories:

(3.4) A developer should be able to create pipelines that contain aggregations (GROUP BY -> count/sum/unique)
(3.5) A developer should be able to control some parts of the pipeline running before others. For example, one source -> sink branch running before another source -> sink branch.
(3.5) A developer should be able to use a Spark ML job as a pipeline stage
(3.4) A developer should be able to rerun failed pipeline runs without reconfiguring the pipeline
(3.4) A developer should be able to de-duplicate records in a pipeline
(3.5) A developer should be able to join multiple branches of a pipeline
(3.5) A developer should be able to use an Explore action as a pipeline stage
(3.5) A developer should be able to create pipelines that contain Spark Streaming jobs
(3.5) A developer should be able to create pipelines that run based on various conditions, including input data availability and Kafka events

...

Versions Compared

Old Version 47

New Version 48
