
| Section | Field | Type | Description |
|---|---|---|---|
| Basic Configuration | Path | String | Path to the file or directory (text field). This should also support other file sources such as FTP / SFTP. |
| Basic Configuration | Recursive Processing | Boolean | List files recursively. True / False. |
| Output Schema | FileName | String | Record with FileName as a full URI. |
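
The listing behaviour described by Path and Recursive Processing can be sketched roughly as follows. This is a minimal illustration for the local filesystem only, not the plugin implementation: the function name `list_files` and the dictionary record shape are assumptions, and FTP / SFTP sources are not covered.

```python
from pathlib import Path
from typing import Iterator


def list_files(path: str, recursive: bool = False) -> Iterator[dict]:
    """Yield one record per file, with FileName set to the file's full URI."""
    root = Path(path).resolve()
    if root.is_file():
        # A single file path yields exactly one record.
        yield {"FileName": root.as_uri()}
        return
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    for entry in sorted(candidates):
        if entry.is_file():
            yield {"FileName": entry.as_uri()}
```

With Recursive Processing set to False only the files directly under Path are emitted; with True, files in nested directories are emitted as well.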


Queries 

  • If SFTP or FTP needs to be supported, it is not clear how the credential information can be passed to the next step in the process.

FileCompressEncrypt Plugin


| Section | Field | Type | Description |
|---|---|---|---|
| Basic Configuration | Input FileName | String | Full name of the file, including path (URI). |
| Basic Configuration | Compress File | Boolean | True / False |
| Basic Configuration | Compression Algorithm | String (list) | Gzip / Snappy. Applicable only if Compress File is set to true. |
| Basic Configuration | Encrypt File | Boolean | True |
| Basic Configuration | PGP Public Key Path | String | Location of the PGP public key (path to file). |
| Basic Configuration | PGP Public Key Access Userid | String | User ID to access the public key in case security is enabled. |
| Basic Configuration | PGP Public Key Access password | String | Password to access the key file. |
| Basic Configuration | OutFilePath | String | Path to store the output file from sync. The output filename follows the format <InputfileName Suffix>.gz.pgp. The file path URI can refer to a filesystem, HDFS, or GCS (Google Cloud Storage). |
| Basic Configuration | MoveInput | Boolean | True / False. Move the source input file to a different path so that the same file is not processed in the next run of the pipeline. |
| Basic Configuration | MoveFilePath | String | Path to move the input file to after it has been processed successfully. |
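
As a rough sketch of the compression step and the output naming, the following assumes that "<InputfileName Suffix>" means the base name of the input file, with ".gz" and ".pgp" appended when compression and encryption are enabled. The function names `output_name` and `gzip_file` are hypothetical, only Gzip on a local filesystem is shown, and PGP encryption is deliberately left out.

```python
import gzip
import shutil
from pathlib import Path


def output_name(input_name: str, compress: bool, encrypt: bool) -> str:
    """Build the output file name by appending .gz and/or .pgp to the input name."""
    name = Path(input_name).name
    if compress:
        name += ".gz"
    if encrypt:
        name += ".pgp"
    return name


def gzip_file(src: str, out_dir: str) -> Path:
    """Gzip src into out_dir and return the path of the compressed file."""
    dst = Path(out_dir) / output_name(src, compress=True, encrypt=False)
    # Stream the bytes so large inputs are not loaded into memory at once.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Appending suffixes in a fixed order (compress, then encrypt) keeps the name consistent with the order in which the two operations would have to be reversed on the decrypt side.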



Queries 

  1. What is the best approach to track processed files so that they are not processed again? The proposal is to move the input files to a different directory after successful processing, so that they are not picked up in the next run.
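
The move-after-success proposal could look roughly like the sketch below. The helper name `process_then_move` and the `process` callback are assumptions made for illustration, and only a local filesystem is handled.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_then_move(src: str, move_dir: str, process: Callable[[str], None]) -> Path:
    """Run process(src); move src into move_dir only if it did not raise."""
    process(src)  # an exception here leaves the input in place for a retry
    target_dir = Path(move_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(src).name
    shutil.move(src, str(target))
    return target
```

Because the move happens only after `process` returns, a failed run leaves the file where it was, and the next run picks it up again.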



Usecase 2





FileDecompressDecrypt Plugin


| Section | Field | Type | Description |
|---|---|---|---|
| Basic Configuration | FileNamePath | String | Full name of the file including path (URI). Path containing a file name or a directory of files. |
| Basic Configuration | Location of PGP Public Key | String | |
| Basic Configuration | Key Access (userid) | String | |
| Basic Configuration | Password | String | |
| Output Schema | Decrypt FileName (TBD) | String | Record with FileName as a full URI. |
| Output Schema | Decrypt File contents | Content (TBD) | |
| Basic Configuration | Recursive Processing | Boolean | True / False |
| Basic Configuration | DeCompress File | Boolean | True / False |
| Basic Configuration | DeCompression Algorithm | String (list) | Gzip / Snappy. Applicable only if DeCompress File is set to true. |
| Basic Configuration | DeEncrypt File | Boolean | True |
| Basic Configuration | PGP Private Key Path | String | Location of the PGP private key (path to file). |
| Basic Configuration | PGP Private Key Access Userid | String | User ID to access the private key in case security is enabled. |
| Basic Configuration | PGP Private Key Access password | String | Password to access the key file. |
| Basic Configuration | MoveInput | Boolean | True / False. Move the source input file to a different path so that the same file is not processed in the next run of the pipeline. |
| Basic Configuration | MoveFilePath | String | Path to move the input file to after it has been processed successfully. |
| Output Schema | Output | String | Each row read from the file. |
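
A minimal sketch of the decompress-and-read step, assuming the file has already been decrypted, uses Gzip, and is UTF-8 text with one row per line. The function name `read_rows` and the dictionary record shape are hypothetical; PGP decryption and Snappy are not shown.

```python
import gzip
from typing import Iterator


def read_rows(path: str, decompress: bool = True) -> Iterator[dict]:
    """Yield one record per line of the file, under the key 'Output'."""
    # gzip.open and the built-in open both accept text mode with an encoding.
    opener = gzip.open if decompress else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield {"Output": line.rstrip("\r\n")}
```

Each yielded record corresponds to one row of the Output field in the schema above.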