Checklist
Support pluggable Log Saver plugins so that users can provide custom plugins to configure and store log messages, and improve the Log Saver in other areas such as versioning, impersonation, resilience, etc.
The current implementation of LogSaver stores logs under the namespace directory, which makes LogSaver dependent on impersonation. Impersonation could be avoided if log file storage were not namespaced; for example, by using a common root directory outside any CDAP namespace to store log files and using a namespace-version-id to differentiate between namespaces. However, this would prevent users from looking at their log files directly in HDFS, as the files would be owned by "cdap". We need to consider whether to provide options to make the "user" configurable for log files written by log saver plugins.
LogSaver currently does not handle the case where a given entity is deleted and recreated with the same name. This is a common scenario on development clusters, where developers may delete and recreate an entity with the same name. There are a couple of open issues for this: <link>
Users might want to configure LogSaver to collect logs at a different level and write log messages to a different destination. The capability to add a custom LogAppender will allow users to implement custom logic for logs in addition to the existing CDAP LogSaver. For example, a user might want to collect debug logs per application and store them on HDFS at the application-dir/{debug.log} location.
TODO: Look into improving messageTable used for sorting log messages.
Making LogSaver resilient to failure scenarios.
TODO: Add more details
application-dir/{audit.log, metrics.log, debug.log}
As mentioned above, the log processing framework (the framework) is responsible for dispatching log events to individual appenders to process, as well as making CDAP services available for appenders.
Since Logback is a very popular logging framework, we are going to use it as the contract between the log processing framework and individual log processors.
The Log Processing Framework is responsible for the following:

- Creating Appender instances based on a logback configuration file provided to the framework: logback.xml is used by CDAP processes and logback-container.xml is used by containers.
- A LoggerContext class will be implemented and injected into the Appender(s).
- Managing the lifecycle (start and stop) of each Appender.
- For each Appender configured, the framework will create an in-memory Log Processing Pipeline.

For each Appender instance, there is a corresponding Log Processing Pipeline responsible for the following:

- Buffering log events read from Kafka in a SortedSet (or a LongSortedSet from fastutil to avoid extensive object creation) that is ordered by a unique key generated from the log event timestamp and Kafka offset: (log_event_timestamp << 20) + (kafka_offset & 0x0fffffL) (see the sketch below).
- Invoking the Appender.doAppend method on log events in the sorting buffer.
- Calling Flushable.flush(), if the Appender implements the java.io.Flushable interface, before persisting the Kafka offset.
- If the Appender doesn't implement Flushable, persisting the offset periodically.

Each Log Processing Pipeline is single threaded. This provides a simple mechanism for applying back pressure on the rate of reading from Kafka, without the overhead and complications of multi-threaded coordination.
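As an illustration of the sorting key described above, a minimal sketch is shown below; the class and method names are ours, not CDAP's. The low 20 bits of the key carry the Kafka offset, so events sort primarily by timestamp with the offset breaking ties.

```java
/**
 * Sketch (not the actual CDAP code) of the sorting-buffer key: combines the log event
 * timestamp and the Kafka offset into a single long so that events sort primarily by
 * timestamp, with the low 20 bits breaking ties by Kafka offset.
 */
final class LogEventSortKey {
  static long of(long logEventTimestamp, long kafkaOffset) {
    return (logEventTimestamp << 20) + (kafkaOffset & 0x0fffffL);
  }
}
```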
Given that the Logback library is a widely used logging library for SLF4J and is also the logging library used by CDAP, we can simply have log processors implemented through the Logback Appender API and configured through the usual logback.xml configuration. Here are the highlights of this design:

- Appenders are configured through a logback.xml file (not the one used by CDAP nor the one used for containers).
- Standard logback.xml features should be supported (e.g. appender-ref for delegation/wrapping).
- Existing Appender implementations should be usable without any modification, as long as they are properly configured in the runtime environment.
- The Logback Appender already implements the ContextAware interface, through which an Appender gets an instance of the logging context. In order to provide CDAP services to an Appender, we can have a CDAP implementation of this context and let the Appender implementation do the casting when needed.
- An Appender can implement the java.io.Flushable interface; the framework will call the flush method before persisting the transport offset to make sure there is no data loss (see the sketch below).
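For illustration, a minimal sketch of such a pluggable appender is shown below. The class name and the stdout destination are placeholders, not part of CDAP; the appender buffers events and implements java.io.Flushable so the framework can flush buffered events before it persists the Kafka offset.

```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

import java.io.Flushable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of a pluggable appender. It buffers formatted events and implements
 * java.io.Flushable so that the framework can flush buffered events before persisting
 * the Kafka offset, avoiding data loss. The destination (stdout) is a placeholder.
 */
public class BufferedConsoleAppender extends AppenderBase<ILoggingEvent> implements Flushable {

  private final List<String> buffer = new ArrayList<>();

  @Override
  protected void append(ILoggingEvent event) {
    // Buffer instead of writing immediately; a real appender would write to its destination.
    buffer.add(event.getFormattedMessage());
  }

  @Override
  public void flush() throws IOException {
    // Called by the framework before the Kafka offset is persisted.
    for (String message : buffer) {
      System.out.println(message);
    }
    buffer.clear();
  }
}
```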
There will be a log appender implementation for CDAP, replacing the current log-saver. Among its responsibilities, the CDAP log appender implements the Flushable interface to perform hsync for HDFS files.

In order to get rid of the complications of impersonation and namespace removal issues, the way that log files are organized on the file system will be different starting from 4.1.
All the log files will be created under a common base logging directory. There will be sub-directories for namespace, date, app-name, and program-name. The individual files under the directory will be sequence numbered.
Format: /<cdap-base-dir>/logs/<namespace>/<yyyy-mm-dd>/<app-name>/<program-name>/<seq-id>.avro

Flow logs:
/<cdap-base-dir>/logs/default/2017-1-22/HelloWorld/WhoFlow/0.avro
/<cdap-base-dir>/logs/default/2017-1-22/HelloWorld/WhoFlow/1.avro

System service logs:
/<cdap-base-dir>/logs/system/2017-1-22/services/messaging/0.avro
/<cdap-base-dir>/logs/system/2017-1-22/services/messaging/1.avro
When creating a new file, we will list the files in the directory to determine the next sequence number for the file to be created and use that.
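A minimal sketch of this step is shown below; it uses java.nio for brevity, whereas the actual appender would resolve locations through the HDFS LocationFactory. The class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of the "create new file" step: list the existing <seq-id>.avro files in the
 * program's log directory and pick the next sequence number.
 */
final class SequenceIds {
  static int nextSequenceId(Path programLogDir) throws IOException {
    int max = -1;
    try (DirectoryStream<Path> files = Files.newDirectoryStream(programLogDir, "*.avro")) {
      for (Path file : files) {
        String name = file.getFileName().toString();
        int seq = Integer.parseInt(name.substring(0, name.length() - ".avro".length()));
        max = Math.max(max, seq);
      }
    }
    return max + 1;  // 0 for an empty directory
  }
}
```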
The row-key pattern can remain the same in the new framework as in the existing one.
Old row-key format: Rowkey_Prefix(200) + LoggingContext
Example: 200ns1:app1:program1

New row-key format: Rowkey_Prefix(300) + LoggingContext + eventTimeStamp(long) + creationTimestamp(long)
Example: 300:ns1:app1:program1:ts1ts2

Existing column format in log-saver:
Column key: <TimeStamp>
Column value: path to file

Column format changes:
Column key: constant_column_key
Column value: path to file

Example:
Row key: 300ns1:app1:program1:14851785080011485178508101
Column: file (constant)
Value: hdfs://<hostname>:8020/cdap/logs/system/2017-01-23/services/transaction/1.avro
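The sketch below illustrates the new row-key layout; the prefix value, field encoding, and class/method names are assumptions for illustration only, and the actual metadata table may encode these fields differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch of the new row-key layout: prefix, logging context, event timestamp,
 * and file creation timestamp. Prefix value and byte encoding are assumptions.
 */
final class LogMetaRowKey {
  static byte[] of(String loggingContext, long eventTimestamp, long creationTimestamp) {
    byte[] context = loggingContext.getBytes(StandardCharsets.UTF_8);  // e.g. "ns1:app1:program1"
    ByteBuffer key = ByteBuffer.allocate(Integer.BYTES + context.length + 2 * Long.BYTES);
    key.putInt(300);                  // row-key prefix (assumed encoding)
    key.put(context);
    key.putLong(eventTimestamp);      // event timestamp of the log events in the file
    key.putLong(creationTimestamp);   // creation timestamp of the log file
    return key.array();
  }
}
```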
Property | Description | Default |
---|---|---|
log.process.pipeline.config.dir | A local directory on the CDAP Master that is scanned for log processing pipeline configurations | /opt/cdap/master/ext/logging/config |
log.process.pipeline.lib.dir | Semicolon-separated list of local directories on the CDAP Master scanned for additional library jars for log processing | /opt/cdap/master/ext/logging/lib |
The CDAP Log Framework allows users to implement custom Logback appenders. CDAP comes packaged with the CDAP Log Appender, which the CDAP system uses to process logs from both the CDAP system and user namespaces. It also provides RollingLocationLogAppender, an extension of Logback's RollingFileAppender that writes to HDFS.
More information on implementing a custom appender can be found here: <link>.
Once you have the appender packaged, make it available by copying the appender jar to the path denoted by "log.process.pipeline.lib.dir" on your cluster. When the log.saver system container starts up, the jars under this directory will be made available to it.
In the CDAP Log Framework, a logging pipeline is created for every logback.xml file configured at "log.process.pipeline.config.dir".
A Log Pipeline provides isolation from other log pipelines.
As indicated in the diagram above, each pipeline has its own Kafka consumer; this allows pipelines to maintain independent offsets, so a slow processing pipeline won't affect the performance of other logging pipelines.
Each pipeline must have a unique name, as this name is used for persisting and retrieving metadata (Kafka offsets).
Example Logging pipeline configuration used by CDAP system logging - <https://github.com/caskdata/cdap/blob/release/4.1/cdap-watchdog/src/main/resources/cdap-log-pipeline.xml>
Example Custom Logging pipeline configuration using RollingLocationLogAppender - https://github.com/caskdata/cdap/blob/release/4.1/cdap-watchdog/src/test/resources/rolling-appender-logback-test.xml TODO : find a better example for this.
If you would like to create a custom logging pipeline, create a logback.xml file, configure loggers and appenders based on your requirements, and place this logback file at the path identified by "log.process.pipeline.config.dir".
CDAP provides certain common properties for the pipelines that can be configured in cdap-site.xml. They are:
Properties |
---|
log.process.pipeline.buffer.size |
log.process.pipeline.checkpoint.interval.ms |
log.process.pipeline.event.delay.ms |
log.process.pipeline.kafka.fetch.size |
log.process.pipeline.logger.cache.size |
log.process.pipeline.logger.cache.expiration.ms |
log.process.pipeline.auto.buffer.ratio |
Default values for these can be found in cdap-default.xml.
These properties can also be overridden at the pipeline level by providing values for them in the pipeline's logback.xml file.
Users can use any existing Logback appender in their logging pipelines, as well as `RollingLocationLogAppender`, an extension of Logback's RollingFileAppender that writes to HDFS locations. In addition, users can implement their own custom appender and use it in the log framework.
The log framework uses Logback's Appender API, so a user wishing to write a custom appender has to implement Logback's Appender interface in their application.
In addition, access to CDAP system components such as datasets, metrics, and the LocationFactory is made available through the AppenderContext.
Adding a dependency on the cdap-watchdog API will allow you to access AppenderContext in your application. AppenderContext is an extension of Logback's LoggerContext.
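A sketch of obtaining CDAP services from the context is shown below; the AppenderContext package and its accessor methods are assumptions that should be verified against the cdap-watchdog API version you build against.

```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;
import co.cask.cdap.api.logging.AppenderContext;

/**
 * Sketch of how a custom appender can reach CDAP services through AppenderContext.
 * Exact accessor names (e.g. a LocationFactory getter) should be checked against the
 * cdap-watchdog API.
 */
public class HdfsAwareAppender extends AppenderBase<ILoggingEvent> {

  @Override
  public void start() {
    // The framework sets a CDAP AppenderContext as the logger context, so the appender
    // can cast it to obtain services such as the LocationFactory, metrics, or datasets.
    if (getContext() instanceof AppenderContext) {
      AppenderContext cdapContext = (AppenderContext) getContext();
      // e.g. use cdapContext to obtain a LocationFactory for creating files on HDFS
    }
    super.start();
  }

  @Override
  protected void append(ILoggingEvent event) {
    // Write the event to the chosen destination.
  }
}
```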
New Java APIs introduced (both user facing and internal)
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors | |
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
- Close currently open log files for apps, programs, etc. in that namespace. This should be done for both the CDAP log appender and log processors.
- The CDAP log appender should either delete the files from the old namespace or make them unavailable through metadata changes. The storage structure has to change from the existing format to support this.
- What should be done about the files written by log processors?
- Unprocessed Kafka events for the deleted namespace should be dropped. When should we stop dropping the events? If the namespace is recreated, how do we ensure the new logs from the new namespace are not dropped?
- One option is to have a flag in the log saver; when a namespace, app, etc. is deleted, this flag can be set to drop events in that context.
- When the app or namespace is recreated, the flag can be reset to avoid skipping events.
Creation:
Namespace creation will notify the log framework in all log saver instances about the creation of the namespace (through REST API/TMS/Kafka?). The log framework will create a UUID for that namespace and store an entry in the metadata table for it. The UUID will only be understood by the log saver. The log framework will add the UUID into the MDC of the LogEvent so that appenders can distinguish between two different namespace instances (even though their names are the same).
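For illustration, an appender could compare the UUID on each event with the UUID currently recorded for the namespace; the MDC key name used here is a placeholder, as the actual key would be defined by the log framework.

```java
import ch.qos.logback.classic.spi.ILoggingEvent;

/**
 * Sketch of how an appender could use the namespace UUID that the framework is proposed
 * to add to the event MDC. The MDC key ".namespaceUuid" is a placeholder.
 */
final class NamespaceIncarnation {
  static boolean matches(ILoggingEvent event, String currentNamespaceUuid) {
    // Events carrying a different UUID belong to an earlier namespace instance that had
    // the same name (deleted and recreated) and can be skipped or handled separately.
    String eventUuid = event.getMDCPropertyMap().get(".namespaceUuid");
    return currentNamespaceUuid.equals(eventUuid);
  }
}
```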
Deletion:
Deletion will cause the log framework to mark the metadata as deleted, and the framework will be responsible for cleaning up the files marked to be deleted. It can do this while performing log cleanup. The current implementation lists all namespaces and then scans the table using the namespace as the prefix key. If the log framework does not have access to all the namespaces (in the case of custom/impersonated namespaces), another solution is to go over all the files on HDFS, map them to the metadata table, and delete files that are older than the retention duration. What happens when log files are deleted manually in that case? We should have a REST API to repair/make the log saver consistent by exposing an endpoint to check for these inconsistencies and repair them.
Appenders:
When an event is received by an appender, the appender can check (in the metadata) whether the event is for an existing namespace or a deleted namespace, and can either skip it or process it accordingly.
Backwards compatibility:
Can we add an upgrade step and do an atomic rename of all the log files on HDFS? That way we would not need to support the old log file structure.
- Exception while initializing the appender (issues in the jar, classpath, could not initialize)
- Initialized successfully, but exception while processing records
- Exception while writing log events to the destination (flush)
- Exception while saving metadata to the table
Scenario: When one log appender has issues writing to its destination while other appenders are working fine, the appender with the issue will fall behind in processing (lower Kafka offset) compared to the other appenders. When the log framework is restarted, how do we handle processing of log messages?
Pros:
Cons:
Note: This would also require each appender to store metadata about the Kafka offset for its partitions.
What is the impact on authorization, and how does the design take care of this aspect?
System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design takes care of these aspects.
Test ID | Test Description | Expected Results |
---|---|---|