Fail early if a pipeline (or MR job) is configured with more memory than allowed by Yarn

Description

If the user requests more mapper/reducer memory than YARN allows per container, the job is guaranteed to fail. In that case, CDAP should fail early, before even trying to start the MR job, or ideally reject the mapper memory configuration even sooner.

As it stands, it takes minutes before the user sees an error message such as:

2019-06-03 15:20:29,108 - INFO [MapReduceRunner-phase-1:i.c.c.e.b.m.ETLMapReduce@204] - Batch Run finished : status = ProgramState{status=FAILED, failureInfo='MAP capability required is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:8192, vCores:2> maxContainerCapability:<memory:6144, vCores:32000>
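
A minimal sketch of the kind of pre-submission check this would need, assuming the requested container memory is known before the job is submitted. The class and method names are illustrative placeholders (not existing CDAP APIs); the configuration keys are the standard YARN ones.

```java
// Hypothetical early validation: compare the requested container memory against
// YARN's maximum allocation before submitting the MapReduce job.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class ContainerMemoryValidator {

  /**
   * Throws before job submission if the requested memory cannot fit in any
   * container the cluster is willing to allocate.
   */
  public static void validate(Configuration conf, int requestedMemoryMb, String role) {
    int maxAllocationMb = conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);

    if (requestedMemoryMb > maxAllocationMb) {
      throw new IllegalArgumentException(String.format(
          "%s memory of %d MB exceeds the maximum container size of %d MB allowed by YARN. "
              + "Reduce the %s memory or increase the cluster's maximum allocation.",
          role, requestedMemoryMb, maxAllocationMb, role));
    }
  }
}
```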

Release Notes

None
Details

Created June 3, 2019 at 10:51 PM
Updated June 24, 2020 at 10:29 PM

Activity

Terence Yim June 3, 2019 at 11:25 PM

It's more than just the spec. We have logic to calculate the resources for each process type (driver, executor, mapper, reducer) based on the spec and runtime arguments.
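
To illustrate the point (this is not the actual CDAP implementation), the effective memory for each process type is the spec default possibly overridden by runtime arguments, so any early check has to run after that resolution. The argument key format below is a hypothetical placeholder.

```java
// Illustrative sketch of per-process-type resource resolution.
import java.util.Map;

enum ProcessType { DRIVER, EXECUTOR, MAPPER, REDUCER }

final class ResourceResolver {

  /**
   * Resolves the memory (in MB) a given process type would actually request,
   * preferring a runtime-argument override over the value in the program spec.
   */
  static int resolveMemoryMb(ProcessType type, int specMemoryMb, Map<String, String> runtimeArgs) {
    // Hypothetical key format, e.g. "task.mapper.system.resources.memory".
    String key = "task." + type.name().toLowerCase() + ".system.resources.memory";
    String override = runtimeArgs.get(key);
    return override == null ? specMemoryMb : Integer.parseInt(override);
  }
}
```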

Albert Shau June 3, 2019 at 10:55 PM

One possible enhancement is to pass the program's resource spec to the provisioner during the createCluster() call. This would allow the provisioner to create an appropriately sized cluster with the max container size set to an acceptable number, or fail if it cannot do so (or if the required memory exceeds some configured maximum). This would let us fail within seconds instead of minutes, and would also reduce the amount of configuration users have to do (they would only need to adjust pipeline memory, not cluster memory).
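
A minimal sketch of that enhancement, with hypothetical types: hand the program's resource requirements to the provisioner so it can size the cluster or fail fast before any job is launched. All names here are illustrative placeholders, not existing CDAP APIs.

```java
// Hypothetical resource spec derived from the pipeline configuration.
final class ProgramResourceSpec {
  final int memoryMb;
  final int virtualCores;

  ProgramResourceSpec(int memoryMb, int virtualCores) {
    this.memoryMb = memoryMb;
    this.virtualCores = virtualCores;
  }
}

interface SizedProvisioner {

  /**
   * Creates a cluster whose maximum container size can satisfy the given
   * resource spec, or fails within seconds if the request exceeds the largest
   * container the provisioner is allowed to create.
   */
  void createCluster(ProgramResourceSpec resources, int maxAllowedContainerMemoryMb);
}

final class SizedProvisionerImpl implements SizedProvisioner {
  @Override
  public void createCluster(ProgramResourceSpec resources, int maxAllowedContainerMemoryMb) {
    if (resources.memoryMb > maxAllowedContainerMemoryMb) {
      // Fail fast, before any cluster is created or any job is submitted.
      throw new IllegalArgumentException(
          "Requested " + resources.memoryMb + " MB per container, but the maximum allowed is "
              + maxAllowedContainerMemoryMb + " MB.");
    }
    // ... size worker nodes so the max container capability covers resources.memoryMb ...
  }
}
```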