MapReduce metrics aren't recorded on a secure Hadoop cluster

Description

While other metrics are created and recorded, MapReduce (and workflow) metrics do not seem to be populated, resulting in audi test failures.

Release Notes

None

Activity

Shankar Selvam March 30, 2015 at 6:54 PM

Alex Baranau February 24, 2015 at 7:43 PM

Fixing this will require quite some effort: changing the way we run MR jobs. At the same time, we have to fix it. Keeping it in 3.0, but not sure how to plan for it. It could be at least a week of work based on the above.

Alex Baranau January 14, 2015 at 12:49 AM

Moving out to the 3.0 release.

Alex Baranau December 3, 2014 at 11:38 PM

Moved to 2.7. We want to put this into the "known issues". There seems to be no quick fix that we can squeeze into 2.6.

The summary is:

  • we launch the MapReduce job from a twill container (the MR job runner); because that container executes user code, we want it to run in an isolated environment, as that user

  • that twill container watches for the MR job to complete and reports its stats

  • to report stats, the MR job runner polls the MR AM for the job status

  • in a secure cluster, you need a token to talk to the MR AM

  • only Kerberos-authenticated clients can acquire tokens, and the MR job runner is not one: no twill container is

Another option is to have a service, as part of cdap-master, that reports stats for all MR jobs.
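
The polling flow summarized above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `MrStatsPollingSketch`, `AppMaster`, `JobStats`, `secureAm` and `pollUntilComplete` are hypothetical names invented for illustration, not CDAP, Twill or Hadoop APIs, and the "token" is a plain string rather than a real delegation token. It shows why a runner without a token ends up reporting nothing even though the job itself runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; none of these types exist in CDAP, Twill or Hadoop.
final class MrStatsPollingSketch {

    /** Progress snapshot the runner would turn into metrics (illustrative fields). */
    static final class JobStats {
        final float mapProgress;
        final float reduceProgress;
        final boolean complete;

        JobStats(float mapProgress, float reduceProgress, boolean complete) {
            this.mapProgress = mapProgress;
            this.reduceProgress = reduceProgress;
            this.complete = complete;
        }
    }

    /** Stand-in for the MR AM: returns status only to callers holding a valid token. */
    interface AppMaster {
        JobStats status(String token);
    }

    /** Simulated AM on a secure cluster; replays canned snapshots (at least one required). */
    static AppMaster secureAm(final String expectedToken, final JobStats... snapshots) {
        final int[] next = {0};
        return token -> {
            if (!expectedToken.equals(token)) {
                throw new SecurityException("no valid token for MR AM");
            }
            // Once the canned snapshots run out, keep returning the last one.
            int i = Math.min(next[0]++, snapshots.length - 1);
            return snapshots[i];
        };
    }

    /** Polls the AM up to maxPolls times and returns the stats that could be reported. */
    static List<JobStats> pollUntilComplete(AppMaster am, String token, int maxPolls) {
        List<JobStats> reported = new ArrayList<>();
        for (int i = 0; i < maxPolls; i++) {
            try {
                JobStats stats = am.status(token);
                reported.add(stats);
                if (stats.complete) {
                    break;
                }
            } catch (SecurityException e) {
                // Without a token every poll is rejected, so nothing is ever reported.
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        JobStats half = new JobStats(0.5f, 0f, false);
        JobStats done = new JobStats(1f, 1f, true);

        // Runner that holds the AM token: both snapshots are reported.
        System.out.println(pollUntilComplete(secureAm("t", half, done), "t", 5).size());
        // Runner without a token (the situation in this issue): no stats at all.
        System.out.println(pollUntilComplete(secureAm("t", half, done), null, 5).size());
    }
}
```

In this simulation the token-less runner collects an empty list, which mirrors the symptom reported here: other metrics appear, but the MapReduce ones are never populated.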

Shankar Selvam December 3, 2014 at 8:48 PM

The MapreduceProgramRunner gets a token from the master to communicate with the RM, but when it launches the MapReduce job, it does not have a token for the AM (MR job), so it cannot communicate with it to get the metrics counters. It can talk to the RM, though, and we guess that's how it knows the MR job's status.

Since the MapreduceProgramRunner is the one that starts the AM, and it cannot communicate with the AM (as it does not have a token for it), it cannot get the token for the AM.

A possible fix would be for the master to generate a token for the MapReduce job, identified by name, and send it to the MapreduceProgramRunner so that it can communicate with the AM using it. We need to check the security implications of that, though.

Fixed

Details

Created September 25, 2014 at 4:20 AM
Updated March 31, 2015 at 7:18 PM
Resolved March 31, 2015 at 7:18 PM