Tracker Audit Metrics
Note: this feature is not planned for the 3.4 release.
The goal of this page is to document the design of the Tracker Audit Metrics.
Use-Cases
As a user of Tracker, I would like to see the total number of audit messages by type/subtype in the past T timeframe.
"Show me the total number of reads in the system in the past 1 hour."
As a user of Tracker, I would like to see the top N datasets/streams by audit message type/subtype activity in the past T timeframe.
"Show me the 5 datasets with the most writes in the past 24 hours."
"Show me the 5 streams with the most metadata_changes in the past 7 days."
As a user of Tracker, I would like to see the top N namespaces with the most type/subtype activity in the past T timeframe.
"Show me the 5 namespaces with the most reads in the past 1 hour."
As a user of Tracker, I would like to see the top N programs reading/writing to a specific dataset in the past T timeframe.
"Show me the top 5 programs writing to dataset1 in the past 1 hour."
Initial High Level Plan
As audit messages arrive from the Kafka broker and are written to the AuditLog table, any message that matches one of the metrics criteria also updates the metrics, which are stored in a separate OLAP Cube table within the same dataset.
In the service layer, expose a new endpoint that lets users query the metrics table and returns the results as JSON.
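The ingest path described above can be sketched in memory as follows. This is only an illustration under stated assumptions: the message field names (`namespace`, `entity_type`, `entity_name`, `program`, `type`) and the `audit_log` / `metrics` stand-ins are hypothetical, and a real implementation would write to the AuditLog table and the Cube instead of Python containers.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the AuditLog table and the metrics cube.
audit_log = []
metrics = Counter()


def handle_audit_message(msg):
    """Store the raw message, then bump one counter per aggregation it belongs to."""
    audit_log.append(msg)

    # Assumption: the measurement name is the lower-cased audit type.
    measurement = msg["type"].lower()

    # One key per aggregation: namespace only, then namespace + entity.
    keys = [
        (msg["namespace"],),
        (msg["namespace"], msg["entity_type"], msg["entity_name"]),
    ]
    # The program-level aggregation only applies when a program is known.
    if msg.get("program"):
        keys.append(
            (msg["namespace"], msg["entity_type"], msg["entity_name"], msg["program"])
        )

    for key in keys:
        metrics[(key, measurement)] += 1
```

Keeping the raw log write and the counter updates in the same handler mirrors the plan of updating metrics as each message is persisted, rather than recomputing them from the log at query time.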
Storing Metrics in AuditLog Dataset
Add an additional Cube table to the AuditLog custom dataset to hold metrics.
The properties of the cube will be as follows:
Resolutions: 1h, 6h, 24h, 1w, 1m, 3m, 6m, 1y
Aggregations:
namespace (examples: default, ns1, ns2)
namespace,entity_type,entity_name (examples: default,stream,stream1 and default,dataset,dataset1)
namespace,entity_type,entity_name,program (examples: default,stream,stream1,program1 and default,dataset,dataset1,program2)
Measurements:
access
read
write
unknown
create
update
truncate
delete
metadata_change
count
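As a rough sketch of how one audit message might map onto the measurements listed above: the sample endpoint data later on this page shows entries where `count`, `access`, and `unknown` carry the same value, which suggests that a single message increments `count`, its type, and (for access messages) its subtype. The function below encodes that reading. It is an assumption, as is treating `read`, `write`, and `unknown` as the subtypes of `access`; the function name `measurements_for` is hypothetical.

```python
# Assumption: read/write/unknown are subtypes of "access"; the rest are top-level types.
ACCESS_SUBTYPES = {"read", "write", "unknown"}
AUDIT_TYPES = {"access", "create", "update", "truncate", "delete", "metadata_change"}


def measurements_for(audit_type, access_subtype=None):
    """Return the measurement names to increment for one audit message."""
    audit_type = audit_type.lower()
    if audit_type not in AUDIT_TYPES:
        raise ValueError("unsupported audit type: " + audit_type)

    # Every message counts toward the overall total and toward its own type.
    names = ["count", audit_type]

    if audit_type == "access":
        # Access messages without an explicit subtype fall back to "unknown".
        subtype = (access_subtype or "unknown").lower()
        if subtype not in ACCESS_SUBTYPES:
            raise ValueError("unsupported access subtype: " + subtype)
        names.append(subtype)

    return names
```

For example, `measurements_for("ACCESS", "READ")` yields `count`, `access`, and `read`, while `measurements_for("METADATA_CHANGE")` yields only `count` and `metadata_change`.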
Queries to OLAP cube
"Show me the total number of reads in the system in the past 1 hour."
{"aggregation":"agg2","resolution":3600,"startTs":now-1h,"endTs":now,"measurements":{"access_reads":"SUM"},"limit":1}
"Show me the 5 datasets with the most writes in the past 24 hours."
{"aggregation":"agg2","resolution":86400,"startTs":now-24h,"endTs":now,"measurements":{"access_writes":"SUM"},"dimensionValues":{"entity_type":"dataset"},"groupByDimensions":["namespace","entity_type","entity_name"],"limit":10000}
The results are then sorted and the top 5 are returned.
"Show me the 5 streams with the most metadata_changes in the past 7 days."
{"aggregation":"agg2","resolution":604800,"startTs":now-7d,"endTs":now,"measurements":{"metadata_changes":"SUM"},"dimensionValues":{"entity_type":"stream"},"groupByDimensions":["namespace","entity_type","entity_name"],"limit":10000}
The results are then sorted and the top 5 are returned.
"Show me the 5 namespaces with the most reads in the past 1 hour."
{"aggregation":"agg2","resolution":3600,"startTs":now-1h,"endTs":now,"measurements":{"access_reads":"SUM"},"groupByDimensions":["namespace"],"limit":10000}
The results are then sorted and the top 5 are returned.
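The query bodies above share one shape, so they could be generated by a small helper. The sketch below is a minimal illustration, assuming `startTs` is the earlier bound of the window and `endTs` the later one, both in epoch seconds. The helper names `build_query` and `top_n`, and the `(group_key, total)` row shape, are hypothetical and not part of any existing API.

```python
import time


def build_query(aggregation, resolution_s, window_s, measurement,
                group_by=None, dimension_values=None, limit=10000, now=None):
    """Build a cube query body covering the last `window_s` seconds."""
    end_ts = int(time.time()) if now is None else int(now)
    query = {
        "aggregation": aggregation,
        "resolution": resolution_s,
        "startTs": end_ts - window_s,  # window start (earlier bound)
        "endTs": end_ts,               # window end (later bound)
        "measurements": {measurement: "SUM"},
        "limit": limit,
    }
    if dimension_values:
        query["dimensionValues"] = dict(dimension_values)
    if group_by:
        query["groupByDimensions"] = list(group_by)
    return query


def top_n(rows, n):
    """Sort (group_key, total) pairs by total, descending, and keep the first n."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

For "the 5 datasets with the most writes in the past 24 hours", one would call `build_query("agg2", 86400, 86400, "access_writes", group_by=["namespace", "entity_type", "entity_name"], dimension_values={"entity_type": "dataset"})` and pass the grouped totals through `top_n(rows, 5)`.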
Endpoints
| Method | Endpoint | Description | Params | Sample Data |
|---|---|---|---|---|
| GET | /auditmetrics/topEntities?limit={limit} | Returns the entities with the most activity, for use in a general chart listing the most active entities in CDAP | | See the sample response below |

Sample response:
[
  {
    "namespace": "default",
    "entityType": "09ed6ccb-fd1a-11e5-a248-0000003b6093",
    "entityName": "AuditMetrics",
    "columnValues": {
      "count": 15,
      "unknown": 15,
      "access": 15
    }
  },
  {
    "namespace": "default",
    "entityType": "b78f346d-fa88-11e5-b588-2ef89310f408",
    "entityName": "AuditLog",
    "columnValues": {
      "count": 12,
      "unknown": 12,
      "access": 12
    }
  },
  {
    "namespace": "default",
    "entityType": "application",
    "entityName": "CDAPToSlack",
    "columnValues": {
      "count": 10,
      "metadata_change": 10
    }
  },
  ...
]
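A client consuming the topEntities response might rank and flatten it as sketched below. This assumes the response is a JSON array shaped like the sample data on this page, with a `count` entry inside `columnValues`; the function name `rank_entities` is hypothetical, and fetching the response over HTTP is left out because the service host and port are not specified here.

```python
import json


def rank_entities(response_text, limit):
    """Parse a topEntities response and return the `limit` busiest entities.

    Each result is a (namespace, entityType, entityName, count) tuple,
    ordered by count in descending order.
    """
    entities = json.loads(response_text)
    ranked = sorted(
        entities,
        key=lambda entity: entity["columnValues"].get("count", 0),
        reverse=True,
    )
    return [
        (
            entity["namespace"],
            entity["entityType"],
            entity["entityName"],
            entity["columnValues"].get("count", 0),
        )
        for entity in ranked[:limit]
    ]
```

Sorting on the client keeps the sketch independent of whether the service already returns entities in ranked order, which this page does not state.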