
General 


To support migrating files from on-premises systems to Google Cloud, CDAP needs comprehensive file-handling capabilities, including FileList, FileCompression, FileDecompression, FileEncryption, and FileDecryption. CDAP currently offers only a few file-level plugins, such as FileMove and FileDelete, and this set needs to be expanded.


UseCase 1



Proposed Design 




  1. FileList Plugin (BatchSource plugin) - Implement a new FileList plugin with capabilities similar to the current FileSource plugin. Instead of reading the file contents, it passes the file names, as full URIs, to the subsequent stages of the pipeline for processing.
  2. FileCompressEncrypt Sink Plugin (SparkSink plugin) - Implement a new plugin of type SparkSink that reads the file names from the FileList plugin, compresses each file using gzip or Snappy, encrypts it using a PGP public key, and persists it to Google Cloud Storage, HDFS, or a local file system.
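The FileList behavior described above (emit file names as full URIs without reading any file contents) can be sketched roughly as follows. This is only an illustration in Python against the local file system; the actual plugin would be a CDAP BatchSource, and the function name `list_file_uris` and its `pattern` parameter are hypothetical, not part of the proposed design.

```python
from pathlib import Path
from typing import Iterator


def list_file_uris(root: str, pattern: str = "*") -> Iterator[str]:
    """Yield the full URI of every regular file under `root`.

    Assumption: a local directory stands in for the real source
    (GCS, HDFS, ...). No file is opened; only names are emitted.
    """
    base = Path(root).resolve()  # as_uri() requires an absolute path
    for path in sorted(base.rglob(pattern)):
        if path.is_file():  # skip directories
            yield path.as_uri()
```

Downstream stages would then receive one record per URI (for example `file:///data/in/a.csv`) and decide how to process the file it points to.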

...

| Section | Field | Type | Description |
|---|---|---|---|
| Basic Configuration | Input FileName | String | Full name of the file, including path (URI) |
| | Compress File | Boolean | True / False |
| | Compression Algorithm | String (List) | Gzip / Snappy. Applicable only if Compress File is set to true |
| | Encrypt File | Boolean | True |
| | PGP Public Key Path | String | Location of the PGP public key (path to file) |
| | PGP Public Key Access Userid | String | User ID to access the public key in case security is enabled |
| | PGP Public Key Access password | String | Password to access the key file |
| | OutFilePath | String | Path where the sink stores the output file. The output file name follows the format <InputfileName Suffix>.gz.pgp. The path URI can point to a local file system, HDFS, or GCS (Google Cloud Storage). |

...

| Field | Type | Description |
|---|---|---|
| MoveInput | Boolean | True / False. Move the source input file to a different path so that the next run of the pipeline does not process the same file again. |
| MoveFilePath | String | Path to move the input file to after it has been processed successfully. |
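As a rough illustration of how the sink properties interact, the sketch below compresses a file with gzip, names the output, and moves the input on success. It is written in Python against the local file system purely for brevity; the real plugin would be a CDAP SparkSink. The helper names are hypothetical, the naming rule assumes the `.gz.pgp` suffixes are appended to the input file name, and the PGP encryption step is deliberately left out because it needs a PGP library.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def output_name(input_path: str, compress: bool = True, encrypt: bool = True) -> str:
    """Input file name plus .gz (if compressed) plus .pgp (if encrypted).

    Assumption: gzip is the chosen algorithm; Snappy would need another suffix.
    """
    name = Path(input_path).name
    if compress:
        name += ".gz"
    if encrypt:
        name += ".pgp"
    return name


def compress_and_move(input_path: str, out_dir: str,
                      move_dir: Optional[str] = None) -> Path:
    """Gzip `input_path` into `out_dir`, then optionally move the input away."""
    src = Path(input_path)
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large files fit in memory
    # PGP encryption of `target` would happen here (not shown).
    if move_dir is not None:  # corresponds to MoveInput = True
        done_dir = Path(move_dir)
        done_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(done_dir / src.name))
    return target
```

Moving the input only after the output has been written means a failed run leaves the source file in place, so the next run can pick it up again.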



UseCase 2


FileDecompressDecrypt Plugin

The plugin supports decrypting files that were encrypted with a PGP public key and then decompressing them. The private key can be loaded as a CDAP secret or supplied as an input file location.


Plugin Properties

| Section | Field | Type | Description |
|---|---|---|---|
| Basic Configuration | FileName | String | Full name of the file, including path (URI) |
| | Location of PGP Public Key | String | |
| | Key Access (userid) | String | |
| | Password | String | |
| Output Schema | Decrypt FileName (TBD) | String | Record with FileName with full URI |
| | Decrypt File contents | | Content (TBD) |
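The decompression half of this plugin and the file-name record it would emit might look roughly like the sketch below. It is a Python illustration against the local file system, not the CDAP implementation. The function names and the `fileName` record key are hypothetical (the output schema is still TBD), and PGP decryption is assumed to have already produced the `.gz` file.

```python
import gzip
import shutil
from pathlib import Path


def restored_name(file_name: str) -> str:
    """Strip a trailing .pgp and then a trailing .gz, if present."""
    name = Path(file_name).name
    for suffix in (".pgp", ".gz"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def decompress(gz_path: str, out_dir: str) -> dict:
    """Gunzip `gz_path` into `out_dir` and return a record with the full URI."""
    src = Path(gz_path)
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / restored_name(src.name)
    with gzip.open(src, "rb") as fin, target.open("wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream the decompressed bytes
    # Hypothetical record shape; the real schema is marked TBD above.
    return {"fileName": target.resolve().as_uri()}
```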