By default, Spark will retry a YARN application if the first attempt fails. For our pipelines, this second attempt almost always fails as well, and often in a way that misleads the user. For example, the first attempt may create an output directory on GCS based on the logical start time; the second attempt then fails because the output directory already exists.
It seems like it would make sense to turn app retry off, especially since Spark already retries failed tasks within an application.
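One way to do this (a sketch; the exact mechanism depends on how CDAP launches Spark) is to cap the number of application attempts via Spark's `spark.yarn.maxAppAttempts` setting. The jar name below is a placeholder:

```shell
# Limit the YARN application to a single attempt so a failed driver
# is not retried. A retry would hit the already-created output
# directory and mask the real error. The effective value is also
# capped by the cluster's yarn.resourcemanager.am.max-attempts.
spark-submit \
  --master yarn \
  --conf spark.yarn.maxAppAttempts=1 \
  pipeline.jar
```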
Disabled Spark YARN app retries, since Spark already performs retries at the task level.
PR merged to disable YARN app retries: https://github.com/cdapio/cdap/pull/12805
Spark retries task failures by default. We should not retry failures at the YARN app level, since the retry will almost always fail again, overwrite the original error message, and mislead the user.