Log current YARN memory usage when an application is submitted or accepted
Description
Current job submission via YARN: submitted > accepted > running (if resources are available).
If resources are not available, i.e. YARN available memory is less than YARN pending memory, the application is killed. We do not have a LOG or WARN message to indicate the root cause of this pipeline failure.
1. Better logging of YARN resources:
When the application goes into the submitted or accepted state:
We should log YARN overall available memory/cores plus YARN pending memory.
We should log how much resource we are requesting. [ Multiple jobs may be running on the cluster, and this application might be requesting less resource; this is just for better visibility. ]
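A minimal sketch of what such a log line could look like, assuming the caller has already fetched the cluster-wide numbers from YARN (for example via the YARN client or the ResourceManager metrics endpoint). The class `YarnResourceLog` and its method names are hypothetical and are not part of Twill or YARN:

```java
import java.util.Locale;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; not an existing Twill or YARN class. */
final class YarnResourceLog {
    private static final Logger LOG = Logger.getLogger(YarnResourceLog.class.getName());

    private YarnResourceLog() {}

    /** True if the request fits into what the cluster currently reports as available. */
    static boolean fits(long requestedMb, int requestedVcores,
                        long availableMb, int availableVcores) {
        return requestedMb <= availableMb && requestedVcores <= availableVcores;
    }

    /** Builds one log line with the requested, available, and pending resources. */
    static String describe(String appId, String state,
                           long requestedMb, int requestedVcores,
                           long availableMb, int availableVcores, long pendingMb) {
        return String.format(Locale.ROOT,
            "Application %s is %s: requesting %d MB / %d vcores; "
                + "cluster has %d MB / %d vcores available, %d MB pending",
            appId, state, requestedMb, requestedVcores,
            availableMb, availableVcores, pendingMb);
    }

    /** Logs at INFO when the request fits, at WARNING when it does not. */
    static void logOnStateChange(String appId, String state,
                                 long requestedMb, int requestedVcores,
                                 long availableMb, int availableVcores, long pendingMb) {
        String msg = describe(appId, state, requestedMb, requestedVcores,
            availableMb, availableVcores, pendingMb);
        if (fits(requestedMb, requestedVcores, availableMb, availableVcores)) {
            LOG.info(msg);
        } else {
            LOG.warning(msg + " (request exceeds currently available resources)");
        }
    }
}
```

The controller could call `logOnStateChange` once when the application is reported as submitted and once when it is accepted, so the log shows how the cluster looked at each step.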
If the pipeline fails because of resource unavailability:
[ STARTING:o.a.t.y.YarnTwillController@138] - Yarn application worker.edw.aa_df_test_truncate.DeltaWorker application_1705099731196_0153 is not in running state. Shutting down controller.
We should log that the reason is insufficient resources (if there is an error code from YARN, even better).
The above would require interacting with the YARN client to fetch this information.
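As a rough illustration of the failure case, the sketch below builds a shutdown message that names insufficient resources as the likely cause and appends any diagnostics text YARN provided. It assumes the caller has already obtained the last application state, the diagnostics string, and the memory numbers from YARN. `YarnFailureReason` and `explain` are hypothetical names, and the `State` enum is only a local copy of the state names YARN uses:

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not an existing Twill or YARN class. */
final class YarnFailureReason {
    /** Local copy of YARN application state names, so the sketch has no Hadoop dependency. */
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    private YarnFailureReason() {}

    /**
     * Builds the message to log when the application is found not running.
     * everRan: whether the application was ever observed in RUNNING state.
     * diagnostics: free-text diagnostics from YARN, may be null or empty.
     */
    static String explain(String appId, State lastState, boolean everRan,
                          long requestedMb, long availableMb, String diagnostics) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
            "Yarn application %s is not in running state (last state: %s).",
            appId, lastState));
        // Heuristic (an assumption): never ran and asked for more than was free.
        if (!everRan && requestedMb > availableMb) {
            sb.append(String.format(Locale.ROOT,
                " Likely cause: insufficient resources (requested %d MB, %d MB available).",
                requestedMb, availableMb));
        }
        if (diagnostics != null && !diagnostics.trim().isEmpty()) {
            sb.append(" YARN diagnostics: ").append(diagnostics.trim());
        }
        return sb.toString();
    }
}
```

The resource comparison is only a heuristic: an application can be killed for other reasons, so the message says "likely cause" and still passes through whatever diagnostics YARN reported.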
Release Notes
None