Issues

Select view

Select search mode

 
50 of 72

Failure during program launch doesn't show up in program logs

Fixed

Description

I ran a program and it failed. The only logs I saw in program logs were DEBUG-level logs about provisioning.
I had to look into the cdap-debug.log file to gain insight about why the program failed.
Below is a relevant snippet from cdap-debug.log:

 

2019-01-31 22:57:07,397 - DEBUG [LocalLogProcessor-cdap:c.c.c.l.m.FileMetaDataWriter@60] - Writing meta data for logging context default:CampaignPipeline_v2:DataPipelineWorkflow: with startTimeMs 1548975427394 sequence Id 1548975427395 and location /opt/cdap/sandbox-5.1.3/data/logs/default/2019-01-31/CampaignPipeline_v2/DataPipelineWorkflow/1548975427395.avro2019-01-31 22:57:07,407 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@106] - Executing PROVISION subtask REQUESTING_CREATE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:57:09,407 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@110] - Completed PROVISION subtask REQUESTING_CREATE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:57:09,417 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@106] - Executing PROVISION subtask POLLING_CREATE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:30,242 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@110] - Completed PROVISION subtask POLLING_CREATE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:30,252 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@106] - Executing PROVISION subtask INITIALIZING for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:32,534 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@110] - Completed PROVISION subtask INITIALIZING for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:32,542 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@125] - Completed PROVISION task for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:33,516 - ERROR [program.status:c.c.c.a.r.AbstractProgramRuntimeService@173] - Exception while trying to run programjava.io.FileNotFoundException: /opt/cdap/sandbox-5.1.0/data/namespaces/system/artifacts/google-cloud/0.11.1-SNAPSHOT.b22caaaa-2cde-4363-afc0-4f30889c41c0.jar (No such file or directory) at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_191] at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_191] at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_191] at org.apache.twill.filesystem.LocalLocation.getInputStream(LocalLocation.java:86) ~[org.apache.twill.twill-common-0.13.0.jar:0.13.0] at co.cask.cdap.common.io.Locations.linkOrCopy(Locations.java:228) ~[na:na] at co.cask.cdap.app.runtime.AbstractProgramRuntimeService.copyArtifact(AbstractProgramRuntimeService.java:305) ~[na:na] at co.cask.cdap.app.runtime.AbstractProgramRuntimeService.createPluginSnapshot(AbstractProgramRuntimeService.java:284) ~[na:na] at co.cask.cdap.app.runtime.AbstractProgramRuntimeService.run(AbstractProgramRuntimeService.java:160) ~[na:na] at co.cask.cdap.internal.app.services.ProgramLifecycleService.startInternal(ProgramLifecycleService.java:525) [na:na] at co.cask.cdap.internal.app.services.ProgramNotificationSubscriberService.lambda$handleClusterEvent$0(ProgramNotificationSubscriberService.java:425) [na:na] at co.cask.cdap.internal.app.services.ProgramNotificationSubscriberService.postProcess(ProgramNotificationSubscriberService.java:155) [na:na] at co.cask.cdap.messaging.subscriber.AbstractMessagingSubscriberService.runTask(AbstractMessagingSubscriberService.java:185) ~[na:na] at co.cask.cdap.common.service.AbstractRetryableScheduledService.runOneIteration(AbstractRetryableScheduledService.java:155) ~[na:na] at com.google.common.util.concurrent.AbstractScheduledService$1$1.run(AbstractScheduledService.java:170) ~[com.google.guava.guava-13.0.1.jar:na] at com.google.common.util.concurrent.AbstractScheduledService$CustomScheduler$ReschedulableCallable.call(AbstractScheduledService.java:355) ~[com.google.guava.guava-13.0.1.jar:na] at com.google.common.util.concurrent.AbstractScheduledService$CustomScheduler$ReschedulableCallable.call(AbstractScheduledService.java:321) ~[com.google.guava.guava-13.0.1.jar:na] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_191] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_191] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_191] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_191] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_191] at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]2019-01-31 22:59:33,548 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@76] - Starting DEPROVISION task for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:33,559 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@106] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:34,113 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@110] - Completed DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 22:59:34,123 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@106] - Executing DEPROVISION subtask POLLING_DELETE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 23:00:00,001 - DEBUG [stream-cleanup:c.c.c.d.s.s.LocalStreamFileJanitorService$1@65] - Execute stream file cleanup.2019-01-31 23:00:00,002 - DEBUG [stream-cleanup:c.c.c.d.s.s.LocalStreamFileJanitorService$1@69] - Completed stream file cleanup.2019-01-31 23:00:00,002 - DEBUG [stream-cleanup:c.c.c.d.s.s.LocalStreamFileJanitorService$1@80] - Schedule stream file cleanup in 299998 ms2019-01-31 23:00:51,054 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@110] - Completed DEPROVISION subtask POLLING_DELETE for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.2019-01-31 23:00:51,064 - DEBUG [provisioning-service-3:c.c.c.i.p.t.ProvisioningTask@125] - Completed DEPROVISION task for program run program_run:default.CampaignPipeline_v2.-SNAPSHOT.workflow.DataPipelineWorkflow.87a0d2d4-25ab-11e9-a159-024265a1ac28.

Release Notes

None

Attachments

1
  • 01 Feb 2019, 02:18 AM
Pinned fields
Click on the next to a field label to start pinning.

Details

Assignee

Reporter

Affects versions

Components

Fix versions

Priority

Created February 1, 2019 at 1:57 AM
Updated December 11, 2020 at 8:04 PM
Resolved December 11, 2020 at 8:04 PM

Activity

Show:

Ali Anwar September 9, 2019 at 9:23 PM

Ok, reopened this for now then, since it is not a duplicate to that other issue.

Albert Shau September 9, 2019 at 8:10 PM

This particular issue seems to be different than CDAP-15757. This was caused by an errors during provisioning being logged as debug instead of warn or error. The other jira caused all logs from spark/mapreduce on the remote cluster to be lost, regardless of the level.

Ali Anwar September 7, 2019 at 1:21 AM

This has the same symptoms as in CDAP-15757, and I believe that it is the resolved by the fix in that bug. Marking as duplicate.
Let me know if you think otherwise.