Jira Source
- Andrew Onischuk
Introduction
The plugin is used to fetch issues from Jira using JQL, filtering properties, or a filter id. The plugin works in a parallel fashion.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
Section | Name | Description | Default | Widget | Validations |
---|---|---|---|---|---|
Basic | Jira URL | URL of the Jira instance. Example: https://issues.cask.co | | Text Box | |
| Track Updates | If true, the source will track updates of issues, not only their creation. | False | Text Box | |
| Filter Mode | Possible values: Basic, JQL, Jira Filter Id. | Basic | Select | |
| Projects (for mode Basic) | List of project names. | | List | |
| Issue Types (for mode Basic) | List of issue types, e.g. Improvement, Bug, Task. | | List | |
| Statuses (for mode Basic) | List of issue statuses, e.g. Open, In Progress, Reopened, Resolved. | | List | |
| Priorities (for mode Basic) | List of issue priorities, e.g. Critical. | | List | |
| Reporters (for mode Basic) | List of reporter name ids, e.g. aonishuk. | | List | |
| Assignees (for mode Basic) | List of assignee name ids, e.g. aonishuk. | | List | |
| Fix Versions (for mode Basic) | List of fix versions, e.g. 6.1.0. | | List | |
| Affected Versions (for mode Basic) | List of affected versions, e.g. 6.1.0. | | List | |
| Labels (for mode Basic) | List of labels on issues, e.g. urgent. | | List | |
| updateDateFrom (for mode Basic) | Start of the update date range. Can be used without an end date. | | Text Box | Validate that the value is a valid date. |
| updateDateTo (for mode Basic) | End of the update date range. Can be used without a start date. | | Text Box | Validate that the value is a valid date. |
| JQL query (for mode JQL) | A query in Jira Query Language (JQL), which is used to fetch issues. Example: project = CDAP AND priority >= Critical AND (fixVersion = 6.0.0 OR fixVersion = 6.1.0) | | Text Box | Check if it is a valid URI. |
| Jira Filter Id (for mode Jira Filter Id) | Id of a Jira filter which will be used to fetch issues. | | Number | |
Authentication | Username | Used for basic authentication. If not set along with the password, log in as an anonymous user. | | Text Box | |
| Password | Used for basic authentication. If not set along with the username, log in as an anonymous user. | | Password | |
Advanced | Max Split Size (only for batch source) | Maximum number of issues which will be processed with a single request in a single split. If set to 0, everything will be processed in a single split. | 50 | Number | |
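As an illustration of how Basic-mode properties could be combined into a single filter, here is a minimal sketch that builds a JQL clause from the configured lists. The class and method names are hypothetical and do not reflect the plugin's actual implementation; JQL's `field in ("a", "b")` syntax itself is standard.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BasicModeJql {

    // Render one "field in (...)" clause, or null when the user left the list empty.
    static String inClause(String field, List<String> values) {
        if (values == null || values.isEmpty()) {
            return null;
        }
        return field + " in ("
            + values.stream().map(v -> "\"" + v + "\"").collect(Collectors.joining(", "))
            + ")";
    }

    // AND together only the clauses the user actually configured.
    public static String build(List<String> projects, List<String> statuses, List<String> priorities) {
        return Stream.of(
                inClause("project", projects),
                inClause("status", statuses),
                inClause("priority", priorities))
            .filter(c -> c != null)
            .collect(Collectors.joining(" AND "));
    }
}
```

For example, `BasicModeJql.build(List.of("CDAP"), List.of("Open", "Reopened"), List.of())` yields `project in ("CDAP") AND status in ("Open", "Reopened")`.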
Design/Implementation
Structured Record Schema Structure
Note:
- By default the schema contains all possible fields.
- If the user wants to exclude some fields, it is enough to simply remove them from the output schema.
We would query only the fields present in the schema to increase efficiency. Unfortunately, this option does not work correctly in the Jira API, which omits some fields even when they are queried, so we will have to query all the fields every time.
Schema field | Type | Example | Notes | Nullable |
---|---|---|---|---|
key | String | NETTY-15 | | |
summary | String | Netty caches race condition | | |
id | Long | 21371 | API id of the issue in Jira | |
project | String | Netty-HTTP | | |
status | String | Open | | |
description | String | ... description of issue ... | | true |
resolution | String | Fixed | | true |
reporter | Record | { name=aonishuk, emailAddress=null, # nullable active=true, avatarUris={48x48=https://www.gravatar.com/avatar/...}, groups=null, # nullable timezone=America/Los_Angeles # nullable } | | true |
assignee | Record | { name=aonishuk, emailAddress=null, active=true, avatarUris={48x48=https://www.gravatar.com/avatar/...}, groups=null, timezone=America/Los_Angeles } | | |
fields | array&lt;record&gt; | [{ 'id': 'customfield_10005', ... }, ...] | Custom fields | |
affectedVersions | array&lt;string&gt; | ['NETTY-1.0'] | | true |
fixVersions | array&lt;string&gt; | ['NETTY-1.0-maint', 'NETTY-1.1'] | | true |
components | array&lt;string&gt; | ['NETTY-SERVER', 'NETTY-DOCS'] | | |
priority | string | Critical | | |
issueType | string | Improvement | | |
isSubtask | boolean | false | | |
creationDate | LogicalType timestamp | 2016-12-21T23:21:42.000+02:00 | | |
updateDate | LogicalType timestamp | 2016-12-21T23:21:42.000+02:00 | | |
dueDate | LogicalType timestamp | 2016-12-30T23:21:42.000+02:00 | | |
attachments | array&lt;record&gt; | [{ 'filename': 'image.png', 'author': 'aonishuk', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'size': 21454, 'mimeType': 'image/png', 'contentUri': 'http://.../image.png' }, ...] | | |
comments | array&lt;record&gt; | [{ 'author': 'aonishuk', 'updateAuthor': 'aonishuk', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'updateDate': '2016-12-30T23:21:42.000+02:00', 'body': 'actual comment contents' }, ...] | | |
issueLinks | array&lt;record&gt; | [{ 'type': 'is blocked by', # inward 'link': 'https://issues.cask.co/rest/api/2/issueLink/97018' }, ...] | | true |
votes | int | 3 | | |
worklog | array&lt;record&gt; | [{ 'author': 'aonishuk', 'updateAuthor': 'aonishuk', 'startDate': '2016-12-30T23:21:42.000+02:00', 'creationDate': '2016-12-30T23:21:42.000+02:00', 'updateDate': '2016-12-30T23:21:42.000+02:00', 'comment': 'actual comment contents', 'minutesSpent': 3600 }, ...] | | |
watchers | int | 0 | | true |
isWatching | boolean | false | | true |
timeTracking | record | { 'originalEstimateMinutes': 3600 # nullable } | | true |
subtasks | array&lt;record&gt; | [{ 'key': 'NETTY-44' }, ...] | | true |
labels | array&lt;string&gt; | ['urgent', 'ready_for_review'] | | |
Why no OAuth2 Authentication?
Jira does not support creating OAuth2 applications for its users. (This is not to be confused with OpenID access set up by some people via services like Google, which accepts Google's OAuth2, not Jira's own.)
Jira supports OAuth2 only for applications which are published to the Atlassian Marketplace (i.e. plug-ins for Jira), which is not our case. Link: https://developer.atlassian.com/cloud/jira/platform/oauth-2-authorization-code-grants-3lo-for-apps/
Implementation and Parallelization For Batch Source
For the implementation, the JIRA API framework will be used. Here is a generic example of code using it:
https://ecosystem.atlassian.net/wiki/spaces/JRJC/pages/27164680/Tutorial
The framework allows getting the count of records fetched by a JQL query, and also fetching only records from a certain point, say from the 100th issue to the 150th issue.
This makes it perfect for parallelization.
A single MapReduce split will process at most maxSplitSize issues, and the transform method will transform them from Issue objects to structured records.
STEP 1. Generating splits:
- Execute the JQL query asking for a minimal set of fields to get the count of issues (this is skipped when maxSplitSize is 0).
- Create splits according to maxSplitSize.
STEP 2. RecordReader routine:
- Return issues one by one, from the startingPosition to the endPosition defined by the current split.
STEP 3. Transform method:
- Transform an Issue object into a StructuredRecord.
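STEP 1 above can be sketched as pure range arithmetic: given the total issue count returned by the initial JQL query and maxSplitSize, compute the [start, end) range each split will process. The class and record names here are illustrative, not the plugin's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {

    /** A split covering issues [startIndex, endIndex). */
    public record IssueRange(int startIndex, int endIndex) {}

    public static List<IssueRange> plan(int totalIssues, int maxSplitSize) {
        List<IssueRange> splits = new ArrayList<>();
        if (maxSplitSize <= 0) {
            // maxSplitSize == 0 means: process everything in a single split,
            // so the initial count query is not even needed.
            splits.add(new IssueRange(0, totalIssues));
            return splits;
        }
        // One split per maxSplitSize-sized chunk; the last split may be shorter.
        for (int start = 0; start < totalIssues; start += maxSplitSize) {
            splits.add(new IssueRange(start, Math.min(start + maxSplitSize, totalIssues)));
        }
        return splits;
    }
}
```

For example, 120 issues with maxSplitSize = 50 yield three splits: [0, 50), [50, 100), [100, 120).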
Realtime Source Specifics
When the plugin is run for the first time, it will load all the issues. After that, every X seconds (configurable via batchInterval), the plugin will fetch only newly created/updated issues.
To do this we will simply add the date of the last found issue as a new condition to the next filter (this avoids the race conditions which would happen when using the current date).
We can also make the plugin continue from the place where it was stopped. This can be done using Spark checkpointing: we save the creation date of the newest fetched issue and then add it as a filter condition on pipeline restart.
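The incremental condition described above can be sketched as simple query rewriting: append the last seen update timestamp to the user's filter so each micro-batch only sees newly created/updated issues. The class and method names are assumptions; the `updated >= "yyyy-MM-dd HH:mm"` form is standard JQL.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class IncrementalJql {

    // JQL accepts timestamps in "yyyy-MM-dd HH:mm" format.
    private static final DateTimeFormatter JQL_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    public static String withUpdatedSince(String baseJql, LocalDateTime lastSeen) {
        String condition = "updated >= \"" + lastSeen.format(JQL_FORMAT) + "\"";
        if (baseJql == null || baseJql.isEmpty()) {
            return condition;
        }
        // Parenthesize the user query so the added condition applies to all of it.
        return "(" + baseJql + ") AND " + condition;
    }
}
```

On restart, the checkpointed timestamp would simply be passed back in as `lastSeen`.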
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature
Created in 2020 by Google Inc.