Overview

The CDAP 4.0 UI is designed to provide operational insights into both CDAP services and other service providers such as YARN, HBase and HDFS. The CDAP platform will need to expose additional APIs to surface this information.

Requirements

The operational APIs should surface the information shown on the Management Screen.

The designs for that screen translate into the following requirements:

  • CDAP Uptime
    • P1: Should indicate the time (number of hours, days?) for which the CDAP Master process has been running. 
    • P2: In an HA environment, it would be nice to indicate the time of the last master failover.
  • CDAP System Services
    • P1: Should indicate the current number of instances.
    • P1: Should have a way to scale services.
    • P1: Should show service logs
    • P2: Node name where container started
    • P2: Container name
    • P2: master.services YARN application name
  • Middle Drawer:
    • CDAP:
      • P1: # of masters, routers, kafka-servers, auth-servers
      • P1: Router requests - # 200s, 404s, 500s
      • P1: # namespaces, artifacts, apps, programs, datasets, streams, views
      • P1: Transaction snapshot summary (invalid, in-progress, committing, committed)
      • P1: Logs/Metrics service lags
      • P2: Last GC pause time
    • HDFS:
      • P1: Space metrics: total, free, used
      • P1: Nodes: total, healthy, decommissioned, decommissionInProgress
      • P1: Blocks: missing, corrupt, under-replicated
    • YARN:
      • P1: Nodes: total, new, running, unhealthy, decommissioned, lost, rebooted
      • P1: Apps: total, submitted, accepted, running, failed, killed, new, new_saving
      • P1: Memory: total, used, free
      • P1: Virtual Cores: total, used, free
      • P1: Queues: total, stopped, running, max_capacity, current_capacity
    • HBase
      • P1: Nodes: total_regionservers, live_regionservers, dead_regionservers, masters
      • P1: No. of namespaces, tables
      • P2: Last major compaction (time + info)
    • Zookeeper: Most of these are from the output of echo mntr | nc localhost 2181
      • P1: Num of alive connections
      • P1: Num of znodes
      • P1: Num of watches
      • P1: Num of ephemeral nodes
      • P1: Data size
      • P1: Open file descriptor count
      • P1: Max file descriptor count
    • Kafka
    • Sentry
      • P1: # of roles
      • P1: # of privileges
      • P1: memory: total, used, available
      • P1: requests per second
      • any more?
    • KMS

  • Component Overview
    • P1: YARN, HDFS, HBase, Zookeeper, Kafka, Hive
    • P1: For each component: version, url, logs_url
    • P2: Distribution info
    • P2: Plus button - to store custom components and version, url, logs_url for each.
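
The Zookeeper items in the Middle Drawer list above map onto keys in the output of the mntr four-letter command. The following is a minimal sketch of how that output could be collected and parsed, not the planned implementation. The output field names (aliveConnections, znodes, and so on) and the helper names are hypothetical; the zk_* keys are the ones mntr reports, and the two file descriptor counts are only reported on Unix platforms.

```python
import socket

# mntr keys mapped to illustrative (hypothetical) output field names.
MNTR_KEYS = {
    "zk_num_alive_connections": "aliveConnections",
    "zk_znode_count": "znodes",
    "zk_watch_count": "watches",
    "zk_ephemerals_count": "ephemeralNodes",
    "zk_approximate_data_size": "dataSize",
    "zk_open_file_descriptor_count": "openFileDescriptors",
    "zk_max_file_descriptor_count": "maxFileDescriptors",
}


def four_letter_command(host, port, command, timeout=5.0):
    """Send a four-letter command such as 'mntr' and return the raw reply.

    Equivalent to: echo mntr | nc <host> <port>
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(raw):
    """Pick the numeric values of interest out of tab-separated mntr output."""
    stats = {}
    for line in raw.splitlines():
        key, sep, value = line.partition("\t")
        value = value.strip()
        if sep and key in MNTR_KEYS and value.isdigit():
            stats[MNTR_KEYS[key]] = int(value)
    return stats
```

Usage would look like parse_mntr(four_letter_command("localhost", 2181, "mntr")). Keys that are absent from the reply are simply omitted from the result.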

Design

Data Sources

Data for these APIs will be sourced using:

  • DistributedFileSystem - For HDFS statistics
  • YarnClient - for YARN statistics and info
  • HBaseAdmin - for HBase statistics and info
  • Configuration and HBaseConfiguration - For HDFS, YARN and HBase info

Versions

  • CDAP - co.cask.cdap.common.utils.ProjectInfo
  • HBase - co.cask.cdap.data2.util.hbase.HBaseVersion
  • YARN - org.apache.hadoop.yarn.util.YarnVersionInfo
  • HDFS - org.apache.hadoop.util.VersionInfo
  • Zookeeper - No client API available. Will have to build a utility around echo stat | nc localhost 2181
  • Hive - org.apache.hive.common.util.HiveVersionInfo
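
Since Zookeeper has no client API for its version, the utility mentioned above would have to parse the text reply of the stat command. A minimal sketch of that parsing step follows, assuming the reply starts with a "Zookeeper version: ..." line; the function name is hypothetical, and the build suffix after the dash is dropped.

```python
import re

# Matches the dotted version number on the first line of `stat` output, e.g.
# "Zookeeper version: 3.4.2-1221870, built on 12/21/2011 20:46 GMT".
_VERSION_LINE = re.compile(
    r"^Zookeeper version:\s*([0-9]+(?:\.[0-9]+)*)", re.MULTILINE
)


def parse_zookeeper_version(stat_output):
    """Return the dotted version from `stat` output, or None if it is absent."""
    match = _VERSION_LINE.search(stat_output)
    return match.group(1) if match else None
```

Returning None rather than raising lets the caller omit the version when the server is not serving requests.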

URL

  • CDAP - $(dashboard.bind.address) + $(dashboard.bind.port)
  • YARN - $(yarn.resourcemanager.webapp.address)
  • HDFS - $(dfs.namenode.http-address)
  • HBase - hbaseAdmin.getClusterStatus().getMaster().toString()
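
As an illustration of the property-based entries above, the sketch below composes URLs from a plain dictionary of configuration values. It assumes the address properties hold host:port values and that plain HTTP is used; the function names and the sample values in the usage note are hypothetical.

```python
def cdap_ui_url(conf, scheme="http"):
    """Combine dashboard.bind.address and dashboard.bind.port into a URL."""
    address = conf["dashboard.bind.address"]
    port = conf["dashboard.bind.port"]
    return "{}://{}:{}".format(scheme, address, port)


def web_url(conf, key, scheme="http"):
    """Turn a host:port-valued property into a URL, or None if it is unset."""
    value = conf.get(key)
    if not value:
        return None
    return "{}://{}".format(scheme, value)
```

For example, web_url(conf, "yarn.resourcemanager.webapp.address") with a value of localhost:8088 yields http://localhost:8088. A bind address such as 0.0.0.0 would need to be replaced with a reachable hostname before the URL is shown to users.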

REST API

The following REST APIs will be exposed from app fabric.

Info

Path

/v3/system/serviceproviders/info

Output

{
  "hdfs": {
    "version": "2.7.0",
    "url": "http://localhost:50070",
    "logs": "http://localhost:50070/logs/"
  },
  "yarn": {
    "version": "2.7.0",
    "url": "http://localhost:8088",
    "logs": "http://localhost:8088/logs/"
  },
  "hbase": {
    "version": "1.0.0",
    "url": "http://localhost:60010",
    "logs": "http://localhost:60010/logs/"
  },
  "hive": {
    "version": "1.2"
  },
  "zookeeper": {
    "version": "3.4.2"
  },
  "kafka": {
    "version": "2.10"
  }
}
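
A client of the info endpoint might consume the response along these lines. This is a sketch only: the base URL is whatever address the CDAP router is reachable at in a given deployment, and the function names are hypothetical. Components that report no url (hive, zookeeper and kafka in the sample above) get None.

```python
import json
from urllib.request import urlopen

INFO_PATH = "/v3/system/serviceproviders/info"


def fetch_info(base_url, timeout=10):
    """GET the info endpoint and return the decoded JSON object."""
    with urlopen(base_url.rstrip("/") + INFO_PATH, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_info(info):
    """Return (component, version, url) rows sorted by component name."""
    rows = []
    for name, details in sorted(info.items()):
        version = details.get("version")
        rows.append((
            name,
            None if version is None else str(version),
            details.get("url"),
        ))
    return rows
```

summarize_info(fetch_info(base_url)) then gives one row per component for the Component Overview.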

Statistics

Path

/v3/system/serviceproviders/statistics

Output

{
  "cdap": {
    "masters": 2,
    "kafka-servers": 2,
    "routers": 1,
    "auth-servers": 1,
    "namespaces": 10,
    "apps": 46,
    "artifacts": 23,
    "datasets": 68,
    "streams": 34,
    "programs": 78
  },
  "hdfs": {
    "space": {
      "total": 3452759234,
      "used": 34525543,
      "available": 3443555345
    },
    "nodes": {
      "total": 40,
      "healthy": 36,
      "decommissioned": 3,
      "decommissionInProgress": 1
    },
    "blocks": {
      "missing": 33,
      "corrupt": 3,
      "underreplicated": 5
    }
  },
  "yarn": {
    "nodes": {
      "total": 35,
      "new": 0,
      "running": 30,
      "unhealthy": 1,
      "decommissioned": 2,
      "lost": 1,
      "rebooted": 1
    },
    "apps": {
      "total": 30,
      "submitted": 2,
      "accepted": 4,
      "running": 20,
      "failed": 1,
      "killed": 3,
      "new": 0,
      "new_saving": 0
    },
    "memory": {
      "total": 8192,
      "used": 7168,
      "available": 1024
    },
    "virtualCores": {
      "total": 36,
      "used": 12,
      "available": 24
    },
    "queues": {
      "total": 10,
      "stopped": 2,
      "running": 8,
      "maxCapacity": 32,
      "currentCapacity": 21
    }
  },
  "hbase": {
    "nodes": {
      "totalRegionServers": 37,
      "liveRegionServers": 34,
      "deadRegionServers": 3,
      "masters": 3
    },
    "tables": 56,
    "namespaces": 43
  }
}
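
The UI will likely want derived figures such as utilization percentages in addition to the raw counts. The sketch below shows one way to compute them from a statistics response with the shape above; the function names and the output keys are hypothetical, and rounding to one decimal place is an arbitrary choice.

```python
def percent_used(section):
    """Return used/total as a percentage, or None when total is not positive."""
    total = section.get("total", 0)
    if total <= 0:
        return None
    return round(100.0 * section.get("used", 0) / total, 1)


def utilization(stats):
    """Derive utilization percentages from a statistics response."""
    return {
        "hdfsSpace": percent_used(stats["hdfs"]["space"]),
        "yarnMemory": percent_used(stats["yarn"]["memory"]),
        "yarnVirtualCores": percent_used(stats["yarn"]["virtualCores"]),
    }
```

With the sample values above, YARN memory comes out at 87.5 (7168 of 8192).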

Sentry

The following output is available by enabling the Sentry web service (ref: http://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_metrics.html) and querying it for metrics (API: http://[sentry-service-host]:51000/metrics?pretty=true).

{
  "version" : "3.0.0",
  "gauges" : {
    "buffers.direct.capacity" : {
      "value" : 57344
    },
    "buffers.direct.count" : {
      "value" : 5
    },
    "buffers.direct.used" : {
      "value" : 57344
    },
    "buffers.mapped.capacity" : {
      "value" : 0
    },
    "buffers.mapped.count" : {
      "value" : 0
    },
    "buffers.mapped.used" : {
      "value" : 0
    },
    "gc.PS-MarkSweep.count" : {
      "value" : 0
    },
    "gc.PS-MarkSweep.time" : {
      "value" : 0
    },
    "gc.PS-Scavenge.count" : {
      "value" : 2
    },
    "gc.PS-Scavenge.time" : {
      "value" : 26
    },
    "memory.heap.committed" : {
      "value" : 1029701632
    },
    "memory.heap.init" : {
      "value" : 1073741824
    },
    "memory.heap.max" : {
      "value" : 1029701632
    },
    "memory.heap.usage" : {
      "value" : 0.17999917863585554
    },
    "memory.heap.used" : {
      "value" : 185345448
    },
    "memory.non-heap.committed" : {
      "value" : 31391744
    },
    "memory.non-heap.init" : {
      "value" : 24576000
    },
    "memory.non-heap.max" : {
      "value" : 136314880
    },
    "memory.non-heap.usage" : {
      "value" : 0.2187954829289363
    },
    "memory.non-heap.used" : {
      "value" : 29825080
    },
    "memory.pools.Code-Cache.usage" : {
      "value" : 0.029324849446614582
    },
    "memory.pools.PS-Eden-Space.usage" : {
      "value" : 0.6523454156767787
    },
    "memory.pools.PS-Old-Gen.usage" : {
      "value" : 1.1440740671897877E-4
    },
    "memory.pools.PS-Perm-Gen.usage" : {
      "value" : 0.32970512204053926
    },
    "memory.pools.PS-Survivor-Space.usage" : {
      "value" : 0.22010480095358456
    },
    "memory.total.committed" : {
      "value" : 1061093376
    },
    "memory.total.init" : {
      "value" : 1098317824
    },
    "memory.total.max" : {
      "value" : 1166016512
    },
    "memory.total.used" : {
      "value" : 215170528
    },
    "org.apache.sentry.provider.db.service.persistent.SentryStore.group_count" : {
      "value" : 3
    },
    "org.apache.sentry.provider.db.service.persistent.SentryStore.privilege_count" : {
      "value" : 0
    },
    "org.apache.sentry.provider.db.service.persistent.SentryStore.role_count" : {
      "value" : 132
    },
    "threads.blocked.count" : {
      "value" : 1
    },
    "threads.count" : {
      "value" : 38
    },
    "threads.daemon.count" : {
      "value" : 27
    },
    "threads.deadlocks" : {
      "value" : [ ]
    },
    "threads.new.count" : {
      "value" : 0
    },
    "threads.runnable.count" : {
      "value" : 6
    },
    "threads.terminated.count" : {
      "value" : 0
    },
    "threads.timed_waiting.count" : {
      "value" : 8
    },
    "threads.waiting.count" : {
      "value" : 23
    }
  },
  "counters" : { },
  "histograms" : { },
  "meters" : { },
  "timers" : {
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.create-role" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.drop-privilege" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.drop-role" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.grant-privilege" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.grant-role" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.list-privileges-by-authorizable" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.list-privileges-by-role" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.list-privileges-for-provider" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.list-roles-by-group" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.rename-privilege" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.revoke-privilege" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    },
    "org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessor.revoke-role" : {
      "count" : 0,
      "max" : 0.0,
      "mean" : 0.0,
      "min" : 0.0,
      "p50" : 0.0,
      "p75" : 0.0,
      "p95" : 0.0,
      "p98" : 0.0,
      "p99" : 0.0,
      "p999" : 0.0,
      "stddev" : 0.0,
      "m15_rate" : 0.0,
      "m1_rate" : 0.0,
      "m5_rate" : 0.0,
      "mean_rate" : 0.0,
      "duration_units" : "seconds",
      "rate_units" : "calls/second"
    }
  }
}
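
Only a few of these values are needed for the Sentry requirements (roles, privileges, memory, requests per second). The sketch below extracts them from a parsed metrics response. Several choices are assumptions rather than settled design: memory.total.max is treated as the total, available is computed as total minus used, and requests per second is approximated by summing mean_rate over all timers. The function name and output keys are hypothetical.

```python
SENTRY_STORE = "org.apache.sentry.provider.db.service.persistent.SentryStore."


def summarize_sentry(metrics):
    """Reduce a parsed Sentry /metrics response to the fields the UI needs."""
    gauges = metrics.get("gauges", {})
    timers = metrics.get("timers", {})

    def gauge(name):
        # Each gauge is an object of the form {"value": ...}.
        return gauges.get(name, {}).get("value")

    total = gauge("memory.total.max")  # assumption: max is the "total"
    used = gauge("memory.total.used")
    available = None if total is None or used is None else total - used
    return {
        "roles": gauge(SENTRY_STORE + "role_count"),
        "privileges": gauge(SENTRY_STORE + "privilege_count"),
        "memory": {"total": total, "used": used, "available": available},
        # assumption: overall request rate is the sum of per-operation rates
        "requestsPerSecond": sum(t.get("mean_rate", 0.0) for t in timers.values()),
    }
```

Missing gauges yield None rather than an error, so a partially populated response still produces a usable summary.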

TODO: CDAP Master Uptime?

Caching

It is not practical to query HBase, YARN and HDFS on every request from the UI. The result of the statistics API will therefore have to be cached, with a configurable timeout. Details TBD.
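
Since the details are still open, the following is only one possible shape for such a cache: a single cached value that is reloaded once it is older than a configurable timeout. The class and parameter names are hypothetical, and the clock is injectable so that expiry can be tested without sleeping.

```python
import threading
import time


class TimedCache:
    """Caches the result of loader() and reloads it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._loaded_at = None  # None means nothing has been loaded yet

    def get(self):
        # The lock ensures concurrent requests trigger at most one reload.
        with self._lock:
            now = self._clock()
            expired = (
                self._loaded_at is None
                or now - self._loaded_at >= self._ttl
            )
            if expired:
                self._value = self._loader()
                self._loaded_at = now
            return self._value
```

Here loader would be the function that gathers statistics from HBase, YARN and HDFS, and ttl_seconds would come from configuration. Holding the lock during the reload keeps the sketch simple, at the cost of blocking other callers while the providers are queried.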
