

General 


To support migrating files from on-premises systems to Google Cloud, CDAP needs comprehensive file-handling capabilities, including FileList, FileCompression, FileDecompression, FileEncryption, and FileDecryption. A few file-level plugins, such as FileMove and FileDelete, are already available in CDAP, and this set needs to be expanded.


UseCase



Proposed Design 




  1. FileList Plugin - BatchSource Plugin - Implement a new FileList plugin with capabilities similar to the current FileSource plugin. Instead of reading the file contents, it only passes the file names with full URIs to the subsequent actions in the pipeline.
  2. FileCompress Plugin - Transform Plugin - Implement a compression plugin similar to the Field Compression plugin. It accepts an input file URI, reads the file, compresses it, stores the result temporarily on the same node, and emits the URI of the compressed file for use by the next processing action.
  3. Invoke the current Google Cloud Storage plugin to persist the file to Cloud Storage.
  4. FileEncrypt Plugin - Transform Plugin - Plugin supporting PGP encryption of files using a public key.
  5. FileDecrypt Plugin - Transform Plugin - Plugin supporting PGP decryption using a private key stored in CDAP Secrets.


FileList Plugin 

This is a BatchSource plugin similar to the current FileSource plugin, but it only lists the file names with full URIs and does not read the contents of the files.

Plugin Properties

| Section | Field | Type | Description |
| --- | --- | --- | --- |
| Basic Configuration | Path | String | Path of the file or directory (text field) |
| | Recursive Processing | Boolean | List files recursively (True / False) |
| Output Schema | FileName | String | Record with FileName with full URI |
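
The listing behaviour described above can be sketched with the standard Java NIO API. This is a minimal illustration and not the plugin implementation: the class name `FileListSketch` and method `listFiles` are hypothetical, and it assumes a local filesystem path (a real plugin would likely go through the filesystem abstraction used by the existing FileSource plugin).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a CDAP API): lists files as full URIs without reading them.
final class FileListSketch {

    // Returns the URI of every regular file under `root`.
    // With recursive == false, only the direct children of a directory are listed.
    // If `root` is itself a file, the result contains just that file.
    static List<URI> listFiles(Path root, boolean recursive) {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths
                .filter(Files::isRegularFile)          // skip directories; file contents are never opened
                .map(p -> p.toAbsolutePath().toUri())  // full URI, e.g. file:///data/in/a.csv
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listFiles(root, true).forEach(System.out::println);
    }
}
```

The `recursive` flag maps to the Recursive Processing property, and each returned URI corresponds to one FileName output record.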

FileCompress Plugin


This plugin takes the input file name passed from the FileList plugin, obtains a file input stream from the URI, compresses the file using the Gzip or Snappy libraries, and stores the compressed file locally on the node.
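
As a rough sketch of the Gzip case, the following uses only the JDK's `java.util.zip` classes. The class name `FileCompressSketch` and method `gzipToTempFile` are hypothetical, the input is assumed to be a `file:` URI on the local node, and Snappy is left out because it needs a third-party library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not a CDAP API): gzip a file into a local temp file.
final class FileCompressSketch {

    // Compresses the file at `source` (assumed to be a file: URI) and
    // returns the URI of the compressed temp file on the same node.
    static URI gzipToTempFile(URI source) {
        Path input = Paths.get(source);
        try {
            Path output = Files.createTempFile(input.getFileName().toString() + "-", ".gz");
            try (InputStream in = Files.newInputStream(input);
                 OutputStream out = new GZIPOutputStream(Files.newOutputStream(output))) {
                // Copy through a fixed-size buffer so memory use does not grow with file size.
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            return output.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the copy is streamed in 8 KB chunks, the file is never held fully in memory, which matters for large inputs.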


Plugin Properties

| Section | Field | Type | Description |
| --- | --- | --- | --- |
| Basic Configuration | FileName | String | Full name of the file, including path (URI) |
| | Compression | Dropdown | Snappy / Gzip |
| Output Schema | Compressed FileName (TBD) | String | Record with FileName with full URI |
| | Compressed Content (TBD) | File Stream | |


Queries 

  1. Should this plugin store the compressed file locally and pass on the new compressed file name, or should it read the file, compress the contents, and pass the compressed contents on as a stream? The issue with a stream is that for large files it might not be efficient, or everything might have to be done in memory, which is not ideal.
  2. If we store the files locally, we need some way to clean them up after processing.
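
One possible approach to the cleanup question, shown only as a sketch: track every temp file the plugin creates and delete them when processing ends. The class `TempFileTracker` is hypothetical and not part of CDAP; where `close()` would be called in the plugin lifecycle is left open here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a CDAP API): remembers temp files and deletes them on close().
final class TempFileTracker implements AutoCloseable {
    private final List<Path> tracked = new ArrayList<>();

    // Creates a temp file in the default temp directory and registers it for cleanup.
    Path newTempFile(String prefix, String suffix) {
        try {
            Path p = Files.createTempFile(prefix, suffix);
            tracked.add(p);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Best-effort cleanup: a failed delete is skipped so the remaining files are still removed.
    @Override
    public void close() {
        for (Path p : tracked) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                // A real plugin would log this; ignored in this sketch.
            }
        }
        tracked.clear();
    }
}
```

Used in a try-with-resources block, the tracker removes the staged files even if a later pipeline step throws. This only works if the downstream step, such as the Cloud Storage upload, finishes before `close()` runs.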


File Encrypt Plugin

The plugin supports encryption of files using a PGP public key. The public key can be loaded from a CDAP secret or from an input file location.


Plugin Properties

| Section | Field | Type | Description |
| --- | --- | --- | --- |
| Basic Configuration | FileName | String | Full name of the file, including path (URI) |
| | Location of PGP Public Key | String | |
| | Key Access (userid) | String | |
| | Password | String | |
| Output Schema | Encrypted FileName (TBD) | String | Record with FileName with full URI |
| | Encrypted File Contents (TBD) | | |


File Decrypt Plugin

The plugin supports decryption of files using a PGP private key. The private key can be loaded from a CDAP secret or from an input file location.


Plugin Properties

| Section | Field | Type | Description |
| --- | --- | --- | --- |
| Basic Configuration | FileName | String | Full name of the file, including path (URI) |
| | Location of PGP Private Key | String | |
| | Key Access (userid) | String | |
| | Password | String | |
| Output Schema | Decrypted FileName (TBD) | String | Record with FileName with full URI |
| | Decrypted File Contents (TBD) | | |