Checklist

User Stories Documented
User Stories Reviewed
Design Reviewed
APIs reviewed
Release priorities assigned
Test cases reviewed
Blog post

Introduction

CDAP 6.0.0 metadata search allows users to search for multiple tags at a time, with the results being entities that have at least one of the requested tags.

The new feature presented in this design document introduces a search syntax that lets the user indicate tags that every returned entity is required to have.

Currently CDAP runs search using Elasticsearch and an internally built NoSQL search system, and this feature is designed to be implemented in both.

Goals

The goal is a search syntax that lets the user mark tags as required, so that every entity in the results has those tags. The feature is to be implemented in both search backends: Elasticsearch and the internally built NoSQL search system.

User Stories

As a pipeline developer, I want to search for all datasets that contain both tag X and tag Y.

Design

This design will introduce a new internal API for handling special queries, those containing required terms, for use by both the NoSQL and Elasticsearch implementations of metadata storage.

A high-level overview of how the two systems will talk to the new API to access the parsed data:

Approach

Approach #1

Create a helper class to parse the user’s request. The information extracted will include the individual terms and, based on their syntax, whether each is required or optional in the results. Abstracting this functionality allows both the Elasticsearch and NoSQL implementations to use methods from the same class, and future extensions to the search syntax can be made in a single file.

Approach #2

Continue to parse user requests in each implementation separately, looking through the query string for the required term notation. While similar between implementations, the process of extracting and storing information regarding priority level would be left to the individual implementations to handle. This approach would be more straightforward to achieve, but ultimately harder to maintain and augment in the future.

Primary design considerations

Scalability—what if we want to add new features for search later on?

Complexity—how can we implement the feature in a way that is conceptually straightforward and effective?

Implementation

Create a QueryTerm class containing two fields:

String term;

Qualifier qualifier;

Create a QueryParser class for splitting queries into organized QueryTerm objects.

The class exposes a public parse() method that takes a query string as a parameter, splits that string into individual terms on whitespace, checks each term for search operators, and returns the terms as a list of QueryTerm objects.
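The parsing step described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name QueryParserSketch and the nested Term stand-in are hypothetical, and the treatment of a lone "+" as an ordinary optional term is an assumption that the design does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; the real QueryParser body is not given in this document.
final class QueryParserSketch {
  enum Qualifier { OPTIONAL, REQUIRED }

  // Hypothetical minimal stand-in for the QueryTerm class.
  static final class Term {
    final String term;
    final Qualifier qualifier;

    Term(String term, Qualifier qualifier) {
      this.term = term;
      this.qualifier = qualifier;
    }
  }

  private static final Pattern SPACE_SEPARATOR_PATTERN = Pattern.compile("\\s+");
  private static final char REQUIRED_OPERATOR = '+';

  private QueryParserSketch() { }

  static List<Term> parse(String query) {
    List<Term> terms = new ArrayList<>();
    for (String token : SPACE_SEPARATOR_PATTERN.split(query.trim())) {
      if (token.isEmpty()) {
        continue; // splitting an empty query yields a single empty token
      }
      // Assumption: a lone "+" has no term to qualify, so it is kept as an optional term.
      if (token.length() > 1 && token.charAt(0) == REQUIRED_OPERATOR) {
        terms.add(new Term(token.substring(1), Qualifier.REQUIRED)); // operator stripped
      } else {
        terms.add(new Term(token, Qualifier.OPTIONAL)); // original case preserved
      }
    }
    return terms;
  }
}
```

Under these assumptions, parsing "+tag1 tag2" gives a REQUIRED term "tag1" followed by an OPTIONAL term "tag2".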

Elasticsearch Implementation

In ElasticsearchMetadataStorage.java’s createMainQuery() method, delegate query parsing and string formatting to the new QueryParser class, and use the resultant QueryTerm object information to make proper calls to Elasticsearch’s API.
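One plausible way to use the parsed terms on the Elasticsearch side is to group them by the clause of a bool query they would feed: `must` for required terms and `should` for optional ones. The sketch below shows only that grouping, using the standard library. The class name BoolClauseSketch, the Term stand-in, and the choice of a bool query are assumptions; how createMainQuery() actually builds its request is not specified here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; it does not call the Elasticsearch client.
final class BoolClauseSketch {
  enum Qualifier { OPTIONAL, REQUIRED }

  // Hypothetical minimal stand-in for the QueryTerm class.
  static final class Term {
    final String term;
    final Qualifier qualifier;

    Term(String term, Qualifier qualifier) {
      this.term = term;
      this.qualifier = qualifier;
    }
  }

  private BoolClauseSketch() { }

  // Groups term strings under the bool-query clause name they would be added to.
  static Map<String, List<String>> groupByClause(List<Term> terms) {
    Map<String, List<String>> clauses = new LinkedHashMap<>();
    clauses.put("must", new ArrayList<>());
    clauses.put("should", new ArrayList<>());
    for (Term t : terms) {
      String clause = t.qualifier == Qualifier.REQUIRED ? "must" : "should";
      clauses.get(clause).add(t.term);
    }
    return clauses;
  }
}
```

Each group could then be translated into the corresponding clauses of the request, keeping the operator handling itself inside QueryParser.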

NoSQL Implementation

Keep the original search method the same but utilize the QueryParser class to retrieve search terms that are stripped of the new syntax.

Create a class, MetadataResultEntry, which contains an instance of MetadataEntry. This new object will replace the current corresponding instances of MetadataEntry. It will also contain a string representing the term that was used to search for the MetadataEntry.

Alter the existing SearchTerm class so that it contains a QueryTerm field. It will use this field to construct MetadataResultEntry objects.

Before sorting the results, filter out the entities that do not have the tags specified as required by the user’s search query.
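The filtering step can be sketched as below. This is a simplified illustration under stated assumptions: ResultEntry is a hypothetical stand-in for MetadataResultEntry that keeps only an entity name and the term that matched it, and an entity is kept only if, across all of its entries, every required term was matched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; the real classes carry more information than this.
final class RequiredTermFilterSketch {
  // Hypothetical stand-in for MetadataResultEntry: an entity plus the term that found it.
  static final class ResultEntry {
    final String entity;
    final String matchedTerm;

    ResultEntry(String entity, String matchedTerm) {
      this.entity = entity;
      this.matchedTerm = matchedTerm;
    }
  }

  private RequiredTermFilterSketch() { }

  static List<ResultEntry> filterRequired(List<ResultEntry> results, Set<String> requiredTerms) {
    // Collect, per entity, the set of search terms that matched it.
    Map<String, Set<String>> matchedByEntity = new HashMap<>();
    for (ResultEntry r : results) {
      matchedByEntity.computeIfAbsent(r.entity, k -> new HashSet<>()).add(r.matchedTerm);
    }
    // Keep only entries whose entity matched every required term.
    List<ResultEntry> kept = new ArrayList<>();
    for (ResultEntry r : results) {
      if (matchedByEntity.get(r.entity).containsAll(requiredTerms)) {
        kept.add(r);
      }
    }
    return kept;
  }
}
```

With no required terms the filter keeps every entry, which matches the existing behaviour of returning entities that have at least one of the requested tags.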

API changes

New Programmatic APIs

QueryParser.java

import java.util.List;
import java.util.regex.Pattern;

/**
* A thread-safe class that provides helper methods for metadata search string interpretation,
* and defines search syntax for qualifying information (e.g. required terms) {@link QueryTerm.Qualifier}.
*/
public final class QueryParser {
	private static final Pattern SPACE_SEPARATOR_PATTERN = Pattern.compile("\\s+");
	private static final char REQUIRED_OPERATOR = '+';

	private QueryParser() {}
	/**
	* Organizes and separates a raw, space-separated search string
	* into multiple {@link QueryTerm} objects. Spaces are defined by the {@link QueryParser#SPACE_SEPARATOR_PATTERN}
	* field, the semantics of which are documented in Java's {@link Pattern} class.
	* Certain typical separations of terms, such as hyphens and commas, are not considered spaces.
	* This method preserves the original case of the query.
	*
	* This method supports the use of certain search operators that, when placed before a search term,
	* denote qualifying information about that search term. When translated into a QueryTerm object, search terms
	* containing an operator have the operator removed from the string representation.
	* The {@link QueryParser#REQUIRED_OPERATOR} character signifies a search term that must receive a match.
	* By default, this method considers search items without an operator to be optional.
	*
	* @param query the raw search string
	* @return a list of QueryTerms
	*/
	public static List<QueryTerm> parse(String query) {
		//...
	}
}

QueryTerm.java

import java.util.Objects;

/**
* Represents a single item in a search query in terms of its content (i.e. the value being searched for)
* and its qualifying information (e.g. whether a match for it is optional or required).
* Is typically constructed in a list via {@link QueryParser#parse(String)}
*/
public class QueryTerm {
	private final String term;
	private final Qualifier qualifier;
	/**
	* Defines the different types of search terms that can be input.
	* A qualifier determines how the search implementation should handle the given term, e.g.
	* prioritizing required terms over optional ones.
	*/
	public enum Qualifier {
		OPTIONAL, REQUIRED
	}

	/**
	* Constructs a QueryTerm using the search term and its qualifying information.
	*
	* @param term the search term
	* @param qualifier the qualifying information {@link Qualifier}
	*/
	public QueryTerm(String term, Qualifier qualifier) {
		this.term = term;
		this.qualifier = qualifier;
	}

	public String getTerm() {
		return term;
	}

	public Qualifier getQualifier() {
		return qualifier;
	}

	@Override
	public boolean equals(Object o) {
		if (o == this) {
			return true;
		}
		if (o == null || getClass() != o.getClass()) {
			return false;
		}
		QueryTerm that = (QueryTerm) o;
		return Objects.equals(term, that.getTerm()) && Objects.equals(qualifier, that.getQualifier());
	}

	@Override
	public int hashCode() {
		return Objects.hash(term, qualifier);
	}

}

UI Impact or Changes

Users can now mark a term as required in the metadata search bar by placing the “+” symbol directly in front of the desired term.
- e.g.: query: “+tag1 tag2”
  - tag1 is a ‘required’ term
  - tag2 is an ‘optional’ term

Future work

New search syntax symbols can be added more easily to the current implementation.

A possible example of such syntax is “!tag1”, meaning: do not include results that have ‘tag1’ as a metadata tag.
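To illustrate why new symbols should be cheap to add, the sketch below keeps the operators in a single lookup table, so supporting “!” would mean adding one table entry and one enum value. Everything here is hypothetical: the EXCLUDED qualifier, the OperatorTableSketch class, and its method names are not part of the current design.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; shows one possible shape for future operator support.
final class OperatorTableSketch {
  // EXCLUDED is hypothetical; the current design defines only OPTIONAL and REQUIRED.
  enum Qualifier { OPTIONAL, REQUIRED, EXCLUDED }

  private static final Map<Character, Qualifier> OPERATORS = new HashMap<>();

  static {
    OPERATORS.put('+', Qualifier.REQUIRED);
    OPERATORS.put('!', Qualifier.EXCLUDED); // assumed future operator
  }

  private OperatorTableSketch() { }

  // Returns the qualifier implied by the token's leading character, or OPTIONAL if none.
  static Qualifier qualifierOf(String token) {
    if (token.length() > 1) {
      Qualifier q = OPERATORS.get(token.charAt(0));
      if (q != null) {
        return q;
      }
    }
    return Qualifier.OPTIONAL;
  }

  // Returns the token without its leading operator, if it has one.
  static String stripOperator(String token) {
    return qualifierOf(token) == Qualifier.OPTIONAL ? token : token.substring(1);
  }
}
```

Each search implementation would still need to decide how to act on a new qualifier; only the syntax handling is centralized.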

Required Search Fields