MarkLogic plugins

Introduction

MarkLogic Server is a software solution for harnessing all your digital content in a single database. MarkLogic enables you to build complex applications that interact with large volumes of JSON, XML, SGML, HTML, RDF triples, binary files, and other popular content formats. The architecture of MarkLogic is designed so that your applications are both scalable and high-performance, delivering query results at search-engine speeds while providing transactional integrity over the underlying database. These plugins allow you to integrate data in MarkLogic with the rest of your data using CDAP.

User Stories

  • As a pipeline developer, I would like to read data in MarkLogic in batch using CDAP, so that I can integrate it easily with the rest of my data.

  • As a pipeline developer, I would like to write complex structures (XML, JSON, SGML, HTML, RDF triples, binary data, etc.) to MarkLogic in batch using CDAP, so that I do not have to develop custom code to load my data into MarkLogic and can take advantage of the standardization that CDAP offers.

  • As a pipeline developer, I would like CDAP to support ELT in MarkLogic, so that I can take advantage of MarkLogic's powerful search and analytics features after loading the data, while still maintaining standardization and lineage in CDAP.

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configurables

MarkLogic batch source.

| Category | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL |
| Basic | Port | number | The port that the MarkLogic REST Server listens on | |
| Basic | Database | text | The MarkLogic database to read from | |
| Basic | Input method | radio button | Method to get files: QUERY or PATH | |
| Basic | Path | text | Path to read documents from | |
| Basic | Input Query | text | Query for data search | |
| Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges | |
| Credentials | Password | password | The password for the user | |
| Connection | Authentication Type | radio button | The type of authentication to use - Digest or | |
| Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway | |
| Advanced | Format | select | Type of document (AUTO/JSON/XML/TEXT/BLOB/DELIMITED), default: BLOB | |
| Advanced | Delimiter | text | Delimiter to use if the format is 'delimited' | |
| Advanced | Bounding Query | text | Query for splits generation | |
| Advanced | Max Splits | number | Maximum number of splits | |
| Advanced | File Name Field | text | Field to store information about the file | |
| Advanced | Payload Field | text | Field to store data from Binary and Text files | |
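The dependencies between the source properties (Input method selects between Path and Input Query, and Delimiter only applies to the delimited format) could be checked at configuration time. The following is a minimal sketch only: the `SourceConfig` class, its field names, and every rule other than the QUERY/PATH choice and the format list are assumptions made for illustration, not the plugin's actual validation logic.

```python
from dataclasses import dataclass
from typing import List, Optional

# Format values taken from the Format row of the table above.
FORMATS = {"AUTO", "JSON", "XML", "TEXT", "BLOB", "DELIMITED"}


@dataclass
class SourceConfig:
    """Hypothetical mirror of a subset of the batch source properties."""
    host: str
    port: int
    database: str
    input_method: str                  # "QUERY" or "PATH"
    path: Optional[str] = None
    input_query: Optional[str] = None
    format: str = "BLOB"               # default stated in the table
    delimiter: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        if not self.host.strip():
            errors.append("Host must not be empty")
        if not 1 <= self.port <= 65535:  # assumed rule: valid TCP port range
            errors.append("Port must be between 1 and 65535")

        method = self.input_method.upper()
        if method == "QUERY" and not self.input_query:
            errors.append("Input Query is required when Input method is QUERY")
        elif method == "PATH" and not self.path:
            errors.append("Path is required when Input method is PATH")
        elif method not in ("QUERY", "PATH"):
            errors.append("Input method must be QUERY or PATH")

        fmt = self.format.upper()
        if fmt not in FORMATS:
            errors.append(f"Unknown format: {self.format}")
        if fmt == "DELIMITED" and not self.delimiter:
            errors.append("Delimiter is required when Format is DELIMITED")
        return errors
```

Returning all problems at once, rather than raising on the first one, lets a pipeline developer fix every misconfigured property in a single pass.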

 

MarkLogic batch sink.

| Category | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL |
| Basic | Port | number | The port that the MarkLogic REST Server listens on | |
| Basic | Database | text | The MarkLogic database to write to | |
| Basic | Path | text | Path to document folder | |
| Basic | File Name Field | text | The input field used to generate the file name. If this field is not set, then a UUID will be generated | |
| Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges | |
| Credentials | Password | password | The password for the user | |
| Connection | Authentication Type | radio button | The type of authentication to use - Digest or | |
| Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway | |
| Advanced | Batch size | number | The batch size for writing to MarkLogic | |
| Advanced | Max retries | number | The maximum number of retries for requests to MarkLogic | |
| Advanced | Format | select | Type of document, default: JSON | |
| Advanced | Delimiter | text | Delimiter to use if the format is 'delimited' | |
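To illustrate how File Name Field, Batch size, and Max retries might interact, here is a minimal sketch. The function names, the way the document name is joined to Path, and the choice to count retries separately from the first attempt are all assumptions for illustration; the `write` callable stands in for whatever client call the sink actually makes.

```python
import uuid
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def document_uri(record: Dict[str, object], path: str,
                 file_name_field: Optional[str]) -> str:
    """Name a document from a record field, or fall back to a random UUID."""
    name = record.get(file_name_field) if file_name_field else None
    if name is None or str(name) == "":
        name = uuid.uuid4()
    return f"{path.rstrip('/')}/{name}"


def batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `batch_size` records, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def write_with_retries(write: Callable[[list], int], batch: list,
                       max_retries: int) -> int:
    """Call `write(batch)`, retrying up to `max_retries` times on IOError."""
    attempt = 0
    while True:
        try:
            return write(batch)
        except IOError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last failure
            attempt += 1
```

With this reading, a Max retries value of 2 allows up to three write attempts per batch before the failure is propagated to the pipeline.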

 

MarkLogic query executor action.

| Category | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Host | text | The host running the MarkLogic REST Server | Should validate URL |
| Basic | Port | number | The port that the MarkLogic REST Server listens on | |
| Basic | Database | text | The MarkLogic database to run the query against | |
| Basic | Query | textarea | The query to execute in MarkLogic | |
| Credentials | User | text | The user to perform operations as. The user should have appropriate read privileges | |
| Credentials | Password | password | The password for the user | |
| Connection | Authentication Type | radio button | The type of authentication to use - Digest or | |
| Connection | Connection Type | radio button | The type of connection to use - Direct or Gateway | |
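One way the action's Host, Port, Database, and Query properties could map onto a request is through the MarkLogic REST API's `/v1/eval` endpoint. The sketch below only builds the request and does not send it. It assumes that the action uses this endpoint, that the query is XQuery rather than server-side JavaScript, and that plain HTTP is used; the helper name is hypothetical, and authentication is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_eval_request(host: str, port: int, database: str, query: str) -> Request:
    """Build (but do not send) a POST to /v1/eval for an XQuery string."""
    # The target database is passed as a URL parameter.
    url = f"http://{host}:{port}/v1/eval?{urlencode({'database': database})}"
    # The query itself travels as a form-encoded body field.
    body = urlencode({"xquery": query}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example: inspect the request that would be issued for a trivial query.
request = build_eval_request("localhost", 8000, "Documents", "1 + 1")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it straightforward to unit test the property mapping without a running MarkLogic instance.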

 

Design / Implementation Tips

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

  • Some future work – HYDRATOR-99999

  • Another future work – HYDRATOR-99999

Test Case(s)

  • Test case #1

  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2

Checklist

User stories documented 
User stories reviewed 
Design documented 
Design reviewed 
Feature merged 
Examples and guides 
Integration tests 
Documentation for feature 
Short video demonstrating the feature

Created in 2020 by Google Inc.