Scheduler is logging excessively when it fails to launch a program

Description

I have a Pipeline triggering another pipeline. When I start the upstream, for some reason, the scheduler is unable to trigger the downstream. When this happens, the SDK logs are filled up with the following exception:

2017-08-21 15:45:28,614 - WARN [constraint-checker-task:c.c.c.s.ConstraintCheckerService$ConstraintCheckerThread@275] - Failed to run program program:default.p2.-SNAPSHOT.workflow.DataPipelineWorkflow in schedule p2.default.p1_v4.default. Skip running this program. org.apache.tephra.TransactionFailureException: Exception raised from TxRunnable.run() co.cask.cdap.data2.transaction.Transactions$4@edfaadb at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:237) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:223) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$5.executeInternal(Transactions.java:295) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$5.execute(Transactions.java:282) ~[na:na] at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:259) ~[na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.runReadyJobs(ConstraintCheckerService.java:264) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.checkJobQueue(ConstraintCheckerService.java:185) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.run(ConstraintCheckerService.java:158) [na:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_72] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72] Caused by: java.lang.NullPointerException: null at java.util.HashMap.putMapEntries(HashMap.java:500) ~[na:1.8.0_72] at java.util.HashMap.<init>(HashMap.java:489) ~[na:1.8.0_72] at co.cask.cdap.api.schedule.ProgramStatusTriggerInfo.<init>(ProgramStatusTriggerInfo.java:61) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger$2.apply(ProgramStatusTrigger.java:85) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger$2.apply(ProgramStatusTrigger.java:76) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger.getTriggerSatisfiedResult(ProgramStatusTrigger.java:124) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger.getTriggerInfosAddArgumentOverrides(ProgramStatusTrigger.java:89) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.ScheduleTaskRunner.getTriggeringScheduleInfo(ScheduleTaskRunner.java:107) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.ScheduleTaskRunner.launch(ScheduleTaskRunner.java:96) ~[na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.runReadyJob(ConstraintCheckerService.java:303) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.access$600(ConstraintCheckerService.java:136) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread$2.call(ConstraintCheckerService.java:267) ~[na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread$2.call(ConstraintCheckerService.java:264) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$4.run(Transactions.java:262) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:235) ~[na:na] ... 12 common frames omitted 2017-08-21 15:45:28,621 - WARN [constraint-checker-task:c.c.c.s.ConstraintCheckerService$ConstraintCheckerThread@275] - Failed to run program program:default.p2.-SNAPSHOT.workflow.DataPipelineWorkflow in schedule p2.default.p1_v4.default. Skip running this program. org.apache.tephra.TransactionFailureException: Exception raised from TxRunnable.run() co.cask.cdap.data2.transaction.Transactions$4@7e045e2 at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:237) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.execute(Transactions.java:223) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$5.executeInternal(Transactions.java:295) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$5.execute(Transactions.java:282) ~[na:na] at co.cask.cdap.data2.transaction.Transactions.execute(Transactions.java:259) ~[na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.runReadyJobs(ConstraintCheckerService.java:264) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.checkJobQueue(ConstraintCheckerService.java:185) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.run(ConstraintCheckerService.java:158) [na:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_72] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72] Caused by: java.lang.NullPointerException: null at java.util.HashMap.putMapEntries(HashMap.java:500) ~[na:1.8.0_72] at java.util.HashMap.<init>(HashMap.java:489) ~[na:1.8.0_72] at co.cask.cdap.api.schedule.ProgramStatusTriggerInfo.<init>(ProgramStatusTriggerInfo.java:61) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger$2.apply(ProgramStatusTrigger.java:85) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger$2.apply(ProgramStatusTrigger.java:76) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger.getTriggerSatisfiedResult(ProgramStatusTrigger.java:124) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.trigger.ProgramStatusTrigger.getTriggerInfosAddArgumentOverrides(ProgramStatusTrigger.java:89) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.ScheduleTaskRunner.getTriggeringScheduleInfo(ScheduleTaskRunner.java:107) ~[na:na] at co.cask.cdap.internal.app.runtime.schedule.ScheduleTaskRunner.launch(ScheduleTaskRunner.java:96) ~[na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.runReadyJob(ConstraintCheckerService.java:303) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread.access$600(ConstraintCheckerService.java:136) [na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread$2.call(ConstraintCheckerService.java:267) ~[na:na] at co.cask.cdap.scheduler.ConstraintCheckerService$ConstraintCheckerThread$2.call(ConstraintCheckerService.java:264) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$4.run(Transactions.java:262) ~[na:na] at co.cask.cdap.data2.transaction.Transactions$CacheBasedTransactional.finishExecute(Transactions.java:235) ~[na:na] ... 12 common frames omitted

.

When this happens, it is continuously emitting these logs as WARN. We should reduce verbosity - log exception stacktrace as DEBUG once every few mins?

Release Notes

None

Activity

Show:

Sreevatsan RamanSeptember 22, 2017 at 6:29 AM

Moving out of 4.3.1. Not reproducible. Let's document a scenario to reproduce this issue then look at this.

Bhooshan MogalAugust 21, 2017 at 11:45 PM

Actually, it seems like when the scheduler fails to trigger a program, it keeps retrying infinitely. In one such case, it created 100s (UI only shows 100) of run records for my downstream pipeline.

Pinned fields
Click on the next to a field label to start pinning.

Details

Assignee

Reporter

Affects versions

Components

Priority

Created August 21, 2017 at 10:48 PM
Updated June 26, 2018 at 10:04 PM

Flag notifications