Tracker Audit Metrics
This design is not targeted for the 3.4 release.
The goal of this page is to document the design of the Tracker Audit Metrics.
Use-Cases
- As a user of Tracker, I would like to see the total number of audit messages by type/subtype in the past T timeframe.
- "Show me the total number of reads in the system in the past 1 hour."
- As a user of Tracker, I would like to see the top N datasets/streams by audit message type/subtype activity in the past T timeframe.
- "Show me the 5 datasets with the most writes in the past 24 hours."
- "Show me the 5 streams with the most metadata_changes in the past 7 days."
- As a user of Tracker, I would like to see the top N namespaces with the most type/subtype activity in the past T timeframe.
- "Show me the 5 namespaces with the most reads in the past 1 hour."
- As a user of Tracker, I would like to see the top N programs reading from or writing to a specific dataset in the past T timeframe.
- "Show me the top 5 programs writing to dataset1 in the past 1 hour."
Initial High Level Plan
- As messages arrive from the Kafka broker and are written to the AuditLog table, update the metrics in a separate OLAP Cube (held in the same dataset) whenever a message matches one of the metrics criteria.
- In the service layer, expose a new endpoint that lets users query the data in the metrics table and returns the results as JSON.
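The first step of this plan can be sketched in a few lines of Python. This is only an in-memory illustration, not the Cube dataset API: the `AuditMessage` fields, the `record` function, and the `Counter` standing in for the cube are hypothetical names. It assumes that every message increments `count`, a measurement named after its type, and (for access messages) a measurement named after its subtype, once for each of the dimension groupings proposed for the cube.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

HOUR = 3600  # finest resolution proposed for the cube, in seconds


@dataclass(frozen=True)
class AuditMessage:
    """Hypothetical shape of an audit message; the field names are assumptions."""
    timestamp: int                        # seconds since the epoch
    namespace: str
    entity_type: str                      # e.g. "dataset" or "stream"
    entity_name: str
    audit_type: str                       # e.g. "access", "create", "metadata_change"
    access_subtype: Optional[str] = None  # e.g. "read", "write", "unknown"
    program: Optional[str] = None


def record(cube: Counter, msg: AuditMessage) -> None:
    """Increment every counter that this message contributes to."""
    # Round the timestamp down to the start of its hourly bucket.
    bucket = msg.timestamp - msg.timestamp % HOUR

    # Assumption: each message bumps "count", its type, and its subtype if any.
    measurements = ["count", msg.audit_type]
    if msg.access_subtype is not None:
        measurements.append(msg.access_subtype)

    # One entry per dimension grouping; the program grouping only applies
    # when the message names a program.
    groupings = [
        (msg.namespace,),
        (msg.namespace, msg.entity_type, msg.entity_name),
    ]
    if msg.program is not None:
        groupings.append(
            (msg.namespace, msg.entity_type, msg.entity_name, msg.program)
        )

    for dimensions in groupings:
        for measurement in measurements:
            cube[(bucket, dimensions, measurement)] += 1
```

A real Cube would also roll the hourly buckets up into the coarser resolutions; the sketch keeps only the finest one to stay short.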
Storing Metrics in AuditLog Dataset
- Add an additional Cube table to the AuditLog custom dataset to hold metrics.
- The properties of the cube will be as follows:
- Resolutions: 1h, 6h, 24h, 1w, 1m, 3m, 6m, 1y
- Aggregations:
- namespace (e.g. default, ns1, ns2)
- namespace,entity_type,entity_name (e.g. default,stream,stream1 or default,dataset,dataset1)
- namespace,entity_type,entity_name,program (e.g. default,stream,stream1,program1 or default,dataset,dataset1,program2)
- Measurements:
- access
- read
- write
- unknown
- create
- update
- truncate
- delete
- metadata_change
- count
- Queries to OLAP cube
- "Show me the total number of reads in the system in the past 1 hour."
{ "aggregation": "agg2", "resolution": 3600, "startTs": now-1h, "endTs": now, "measurements": {"access_reads": "SUM"}, "limit": 1 }
- "Show me the 5 datasets with the most writes in the past 24 hours."
{ "aggregation": "agg2", "resolution": 86400, "startTs": now-24h, "endTs": now, "measurements": {"access_writes": "SUM"},
"dimensionValues" : { "entity_type" : "dataset" },
"groupByDimensions": ["namespace","entity_type","entity_name"],
"limit" : 10000
}
The results are then sorted and the top 5 returned.
- "Show me the 5 streams with the most metadata_changes in the past 7 days."
{ "aggregation": "agg2", "resolution": 604800, "startTs": now-7d, "endTs": now, "measurements": {"metadata_changes": "SUM"},
"dimensionValues" : { "entity_type" : "stream" },
"groupByDimensions": ["namespace","entity_type","entity_name"],
"limit" : 10000
}
The results are then sorted and the top 5 returned.
- "Show me the 5 namespaces with the most reads in the past 1 hour."
{ "aggregation": "agg2", "resolution": 3600, "startTs": now-1h, "endTs": now, "measurements": {"access_reads": "SUM"},
"groupByDimensions": ["namespace"],
"limit" : 10000
}
The results are then sorted and the top 5 returned.
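The query pattern above (a window ending at the current time, followed by a client-side sort) can be sketched in Python as follows. This is a hedged illustration only: `build_query` and `top_n` are hypothetical helpers, the query keys are copied from the examples above, and the shape of the cube results (a mapping from dimension values to a summed measurement) is an assumption, since this page does not specify it.

```python
import time
from typing import Dict, List, Optional, Tuple


def build_query(measurement: str, window_seconds: int, resolution: int,
                group_by: Optional[List[str]] = None,
                dimension_values: Optional[Dict[str, str]] = None,
                aggregation: str = "agg2", limit: int = 10000,
                now: Optional[int] = None) -> dict:
    """Build a cube query covering the last `window_seconds` seconds."""
    end_ts = int(time.time()) if now is None else now
    query = {
        "aggregation": aggregation,
        "resolution": resolution,
        "startTs": end_ts - window_seconds,  # the window starts in the past
        "endTs": end_ts,                     # and ends at the current time
        "measurements": {measurement: "SUM"},
        "limit": limit,
    }
    if dimension_values:
        query["dimensionValues"] = dict(dimension_values)
    if group_by:
        query["groupByDimensions"] = list(group_by)
    return query


def top_n(totals: Dict[tuple, int], n: int) -> List[Tuple[tuple, int]]:
    """Sort summed results in descending order and keep the first n.

    `totals` maps a tuple of dimension values to its summed measurement;
    this result shape is an assumption made for the sketch.
    """
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

Fetching up to 10000 groups and sorting on the client keeps the cube query simple, at the cost of transferring more rows than the caller finally displays.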
Endpoints
| Method | Endpoint | Description | Params | Sample Data |
|---|---|---|---|---|
| GET | /auditmetrics/topEntities?limit={limit} | Returns the entities with the most activity, for use in a general chart listing the entities with the most activity in CDAP | | See the sample response below |

Sample response for GET /auditmetrics/topEntities:
[
{
"namespace": "default",
"entityType": "09ed6ccb-fd1a-11e5-a248-0000003b6093",
"entityName": "AuditMetrics",
"columnValues": {
"count": 15,
"unknown": 15,
"access": 15
}
},
{
"namespace": "default",
"entityType": "b78f346d-fa88-11e5-b588-2ef89310f408",
"entityName": "AuditLog",
"columnValues": {
"count": 12,
"unknown": 12,
"access": 12
}
},
{
"namespace": "default",
"entityType": "application",
"entityName": "CDAPToSlack",
"columnValues": {
"count": 10,
"metadata_change": 10
}
},
...
]
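As a rough illustration of how a client might consume this endpoint, the Python sketch below builds the request URL and reduces a response body to (namespace, entity name, count) rows. The helper names and the base URL are hypothetical; this page does not state the host, port, or path prefix of the service, so the caller must supply them.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode


def top_entities_url(base_url: str, limit: int) -> str:
    """Build the topEntities URL. `base_url` is an assumption supplied by the caller."""
    return f"{base_url.rstrip('/')}/auditmetrics/topEntities?{urlencode({'limit': limit})}"


def summarize(body: str) -> List[Tuple[str, str, int]]:
    """Reduce a JSON response body to (namespace, entityName, count) rows."""
    entities = json.loads(body)
    return [
        # A missing "count" is treated as zero rather than raising an error.
        (e["namespace"], e["entityName"], e["columnValues"].get("count", 0))
        for e in entities
    ]
```

The rows keep the order in which the service returned them, so a chart can plot them directly.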
Created in 2020 by Google Inc.