CDAP Development Workflow

Recommended guidelines for the development workflow

  • Create feature branches for new features
  • Push code to remote branches early and often (ideally at least once a day)
  • Have a clear and concise commit message
  • Write unit tests that cover both positive and negative cases of the functionality
  • Build the code locally using Maven without -DskipTests and check for any errors
  • Check the builds on Bamboo for your feature branch
    • Make sure you fix errors early 
  • Ensure there are no checkstyle errors
    • mvn package -DskipTests will run checkstyle
  • Create a pull request to merge the code into the develop branch
  • The pull request description should briefly summarize why the PR was created
  • Keep pull requests small and focused
  • You can keep the same feature branch and send a pull request each time a part of the feature is complete
    • Example: If you are working on implementing a metrics framework, the following steps would be logical points to send code reviews:
      • Java API  (without implementation)
      • REST API (without implementation)
      • Metrics processor implementation 
      • Java API implementation review
      • REST API implementation review
      • Kafka integration review
  • Fix all code review comments
  • Merge code after there is an approval in a pull request


Sample development workflow
$ # Pull the latest develop
$ git fetch origin
$ git checkout -B develop origin/develop

$ # Create new feature branch
$ git checkout -b feature/CDAP-xxxx-description
 
$ # Make your changes
 
$ git add <files>
 
$ git commit -m "Clear and concise commit messages"

$ # Rebase on latest changes from develop if needed 
$ git fetch origin 
$ git rebase origin/develop 
 
$ git push origin feature/CDAP-xxxx-description

Checklist before sending pull requests
  • The unit test build for the feature branch is successful in Bamboo
  • No checkstyle errors in the feature branch
  • Adequate unit tests are written for the feature


Continuous Delivery

The idea of Continuous Delivery (CD) is to keep the software releasable at all times, so that we can release as often as we want.

[From http://www.collab.net/sites/default/files/uploads/CollabNet-S-curve_2.png]

Development Best Practices

This section discusses some design and development best practices that make CD easier.

Develop as libraries / components

Always try to break down the new feature you are working on into self-contained libraries or components, so that the abstraction boundaries for integration (e.g. interfaces, classes, REST endpoints) can be defined properly.

Design for incremental changes

Break the task down into multiple deliverables, each accompanied by appropriate unit tests and integration points to the rest of the system.

e.g. Stream on HDFS

  1. Writer and Reader for a single stream file and index.
    • Safe to merge, as no production code path touches these classes.
  2. Partition stream writer that knows how to write to multiple stream files based on time partition.
    • Define classes/interfaces as integration points for the Gateway in the later integration stage.
    • Needed for the next stage. Easy to test separately.
  3. InputFormat for consumption by MapReduce.
    • Can be tested separately in a unit test with LocalJobRunner.
  4. StreamConsumer for real-time consumption of events, used by Flow.
    • This should implement the existing interface so that it is ready for integration with the Flow system in a later stage.
    • If changes need to be made to existing interfaces, mark TODOs at the places that need to change.
  5. Modify the Gateway to write to the new stream using the stream writer defined in step 2. Also rewire the Flow system to use the new StreamConsumer from step 4.
    • Adjustments to the interfaces defined in step 2 might be needed.
    • All TODOs from step 4 need to be cleared.
    • This is the final integration stage. Since things are already well tested in steps 1-4, integration and testing should go more smoothly.

In this example, all changes made in steps 1-4 are safe to merge and release, as no production code is affected (the new code is dead code as far as the release is concerned).
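
As a rough illustration of step 1, the sketch below shows a writer and reader pair that can be merged and tested on its own, because nothing else calls it yet. The class names (SimpleStreamWriter, SimpleStreamReader, StreamRoundTrip) and the length-prefixed encoding are assumptions made for this example, not the actual CDAP stream file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical step-1 writer: appends events to a single stream.
final class SimpleStreamWriter {
  private final DataOutputStream out;

  SimpleStreamWriter(OutputStream out) {
    this.out = new DataOutputStream(out);
  }

  void append(String event) throws IOException {
    out.writeUTF(event); // length-prefixed encoding, chosen only for this sketch
  }
}

// Hypothetical step-1 reader: reads back everything the writer appended.
final class SimpleStreamReader {
  private final DataInputStream in;

  SimpleStreamReader(InputStream in) {
    this.in = new DataInputStream(in);
  }

  List<String> readAll() throws IOException {
    List<String> events = new ArrayList<>();
    while (true) {
      try {
        events.add(in.readUTF());
      } catch (EOFException e) {
        return events; // no more events to read
      }
    }
  }
}

// Stands in for a unit test: no production code path touches these classes.
final class StreamRoundTrip {
  static List<String> roundTrip(List<String> events) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      SimpleStreamWriter writer = new SimpleStreamWriter(buffer);
      for (String event : events) {
        writer.append(event);
      }
      return new SimpleStreamReader(new ByteArrayInputStream(buffer.toByteArray())).readAll();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(Arrays.asList("event-1", "event-2")));
  }
}
```

Because the pair is exercised only through the round-trip check, it can be reviewed and merged before any Gateway or Flow code depends on it.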

Feature toggle

In some cases, when the set of changes for a new feature is big and touches a lot of existing code, it is tempting to keep working on a branch without merging back to the main trunk. However, this is not a good practice and defeats the purpose of CI. To be able to merge early, a technique called feature toggle can be used: configuration turns usage of the new feature on or off. The configuration may or may not stay after the feature is fully completed, depending on whether the old behavior needs to be kept.
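
A minimal sketch of a feature toggle, assuming the configuration is available as a java.util.Properties object. The class StreamIngest and the key "stream.new.writer.enabled" are hypothetical names used only for illustration. The toggle defaults to off, so the new code path can be merged without changing released behavior.

```java
import java.util.Properties;

// Hypothetical component whose new code path is guarded by a feature toggle.
final class StreamIngest {
  // Hypothetical configuration key; off by default so merging is safe.
  static final String NEW_WRITER_ENABLED = "stream.new.writer.enabled";

  private final boolean useNewWriter;

  StreamIngest(Properties conf) {
    this.useNewWriter = Boolean.parseBoolean(conf.getProperty(NEW_WRITER_ENABLED, "false"));
  }

  String ingest(String event) {
    return useNewWriter ? writeNew(event) : writeOld(event);
  }

  private String writeOld(String event) {
    return "old:" + event; // existing behavior, unchanged
  }

  private String writeNew(String event) {
    return "new:" + event; // new behavior, still under development
  }
}

final class FeatureToggleDemo {
  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(new StreamIngest(conf).ingest("e1")); // prints old:e1

    conf.setProperty(StreamIngest.NEW_WRITER_ENABLED, "true");
    System.out.println(new StreamIngest(conf).ingest("e1")); // prints new:e1
  }
}
```

Once the feature is complete and the old behavior is no longer needed, the key and the old branch can be deleted together.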

Branch by Abstraction

New abstractions (e.g. new interfaces) are built first, together with an implementation that preserves the existing behavior. New features can then be built by implementing the same interfaces, potentially using a feature toggle to select which implementation is used.
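
One possible shape of this technique is sketched below. All names (EventStore, QueueEventStore, FileEventStore, EventGateway) are hypothetical and do not refer to actual CDAP classes. The existing behavior is first moved behind the new interface, the new implementation is added next to it, and the caller depends only on the interface.

```java
// New abstraction introduced in front of the existing behavior (hypothetical name).
interface EventStore {
  String store(String event);
}

// Existing behavior moved behind the abstraction, functionally unchanged.
final class QueueEventStore implements EventStore {
  @Override
  public String store(String event) {
    return "queue/" + event;
  }
}

// New feature built against the same abstraction.
final class FileEventStore implements EventStore {
  @Override
  public String store(String event) {
    return "file/" + event;
  }
}

// The caller depends only on the interface, so implementations can be swapped.
final class EventGateway {
  private final EventStore store;

  EventGateway(EventStore store) {
    this.store = store;
  }

  String accept(String event) {
    return store.store(event);
  }
}

final class BranchByAbstractionDemo {
  // The flag plays the role of a feature toggle.
  static EventStore select(boolean useFileStore) {
    return useFileStore ? new FileEventStore() : new QueueEventStore();
  }

  public static void main(String[] args) {
    System.out.println(new EventGateway(select(false)).accept("e1")); // prints queue/e1
    System.out.println(new EventGateway(select(true)).accept("e1"));  // prints file/e1
  }
}
```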


Created in 2020 by Google Inc.