GitHub Batch Source

GitHub provides hosting for software development and version control using Git. This plugin lets users select a dataset associated with a specified repository and collect its raw data.

User Expectations

  • Users would like to collect raw datasets associated with a specific repository so that they can monitor and report on it.

  • Users would like to perform aggregations on GitHub datasets so that they can better understand repository usage.

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

User Configurations

| User Configuration Label | Label Description | Variable | User Widget | Notes |
| --- | --- | --- | --- | --- |
| Access Token | Authorization token used to authenticate to the GitHub API | authorizationToken | Text Box | https://developer.github.com/v3/#authentication |
| Repository name | Name of the repository from which the data is retrieved | repoName | Text Box | |
| Repository owner name | GitHub username of the owner of the repository from which the data is retrieved | repoOwner | Text Box | |
| GitHub API hostname | GitHub API hostname from which the data is retrieved | hostname | Text Box | Optional, for GitHub Enterprise only. Defaults to api.github.com |
| Dataset* | Name of the dataset that you would like to retrieve** | dataset_name | Drop down | Valid values include all the objects listed at https://developer.github.com/v3/repos/ |

* Dataset name can be one of the following: Branches, Collaborators, Comments, Commits, Contents, Deploy Keys, Deployments, Forks, Invitations, Pages, Releases, Traffic:Referrers, Webhooks.

** Retrieving GitHub data always calls the list API for the associated object. For instance, if the Collaborators dataset is selected, the plugin gets the list of all collaborators on the specified repository (along with the other fields returned by the List Collaborators API).
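
The dataset-to-list-API rule above can be sketched as a small lookup. This is only an illustration under assumptions: the function name build_list_url, the DATASET_PATHS mapping, and the path suffixes are not part of this design and should be checked against the GitHub API documentation. A GitHub Enterprise host may also need an additional path prefix, which is not handled here.

```python
from urllib.parse import quote

# Hypothetical mapping from the Dataset drop-down values to list-endpoint
# path suffixes; an assumption to verify against the GitHub API documentation.
DATASET_PATHS = {
    "Branches": "branches",
    "Collaborators": "collaborators",
    "Comments": "comments",
    "Commits": "commits",
    "Contents": "contents",
    "Deploy Keys": "keys",
    "Deployments": "deployments",
    "Forks": "forks",
    "Invitations": "invitations",
    "Pages": "pages",
    "Releases": "releases",
    "Traffic:Referrers": "traffic/popular/referrers",
    "Webhooks": "hooks",
}

# Default taken from the configuration table (hostname is optional).
DEFAULT_HOSTNAME = "api.github.com"


def build_list_url(repo_owner, repo_name, dataset_name, hostname=None):
    """Return the list-API URL for the selected dataset of one repository."""
    if dataset_name not in DATASET_PATHS:
        raise ValueError(f"Unsupported dataset: {dataset_name!r}")
    host = hostname or DEFAULT_HOSTNAME
    # Percent-encode owner and repository so they stay single path segments.
    owner = quote(repo_owner, safe="")
    repo = quote(repo_name, safe="")
    return f"https://{host}/repos/{owner}/{repo}/{DATASET_PATHS[dataset_name]}"
```

Rejecting unknown dataset names up front mirrors the drop-down widget, which only offers the values listed above.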

Design / Implementation Tips

Authentication will be performed using the access token.
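
A minimal sketch of token authentication, assuming the token is sent in the Authorization header using the "token" scheme described in the GitHub v3 authentication documentation. The function name build_auth_headers and the choice of Accept header are illustrative assumptions, not part of this design.

```python
def build_auth_headers(authorization_token):
    """Build HTTP request headers from the authorizationToken setting."""
    if not authorization_token or not authorization_token.strip():
        raise ValueError("authorizationToken must not be empty")
    return {
        # Assumed header format: "Authorization: token <access token>".
        "Authorization": f"token {authorization_token.strip()}",
        # Assumed: request the v3 JSON media type explicitly.
        "Accept": "application/vnd.github.v3+json",
    }
```

Failing fast on an empty token surfaces a configuration error before any request is made.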

The output schema must be generated automatically from the selected dataset.
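
One way to derive a schema automatically is to inspect sample records returned by the list API. The sketch below is an assumption about how that could work, not the plugin's actual method: the type names, the infer_schema helper, and the fallback to string on conflicting types are all hypothetical, and nullability is not tracked.

```python
def infer_type(value):
    """Map one decoded JSON value to a simple, hypothetical schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "record"
    raise TypeError(f"Unsupported JSON value: {value!r}")


def infer_schema(records):
    """Infer a field-name to type mapping from a list of JSON objects."""
    schema = {}
    for record in records:
        for name, value in record.items():
            field_type = infer_type(value)
            seen = schema.get(name)
            if seen is None or seen == "null":
                # First sighting, or only nulls so far: take the new type.
                schema[name] = field_type
            elif field_type not in ("null", seen):
                # Conflicting types across records fall back to string.
                schema[name] = "string"
    return schema
```

For example, passing two collaborator-like records such as {"login": "octocat", "id": 1} and {"login": "hubot", "id": 2} yields a schema with login as string and id as long.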

Checklist

User stories documented 
User stories reviewed 
Design documented 
Design reviewed 
Feature merged 
Examples and guides 
Integration tests 
Documentation for feature 
Short video demonstrating the feature

Created in 2020 by Google Inc.