Pipeline run fails intermittently with SocketTimeoutException when dispatching program to SystemWorker
Description
Release Notes
None
Details
Assignee
Ankit Jain
Reporter
Ankit Jain
Labels
Triaged
No
Size
S
Components
Fix versions
Priority
Created 4 days ago
Updated 4 days ago
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:293)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
    at io.cdap.common.http.HttpRequests.execute(HttpRequests.java:65)
    at io.cdap.cdap.common.internal.remote.RemoteClient.executeNonIdempotent(RemoteClient.java:163)
    at io.cdap.cdap.common.internal.remote.RemoteClient.execute(RemoteClient.java:143)
    at io.cdap.cdap.common.internal.remote.RemoteClient.execute(RemoteClient.java:117)
    at io.cdap.cdap.common.internal.remote.RemoteTaskExecutor.lambda$runTask$2(RemoteTaskExecutor.java:131)
    at io.cdap.cdap.common.service.Retries.callWithRetries(Retries.java:228)
    at io.cdap.cdap.common.internal.remote.RemoteTaskExecutor.runTask(RemoteTaskExecutor.java:120)
    at io.cdap.cdap.internal.app.deploy.RemoteProgramRunDispatcher.dispatchProgram(RemoteProgramRunDispatcher.java:128)
    at io.cdap.cdap.app.runtime.AbstractProgramRuntimeService.lambda$run$0(AbstractProgramRuntimeService.java:151)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
    Suppressed: io.cdap.cdap.api.retry.RetryFailedException: Retry failed. Encountered non retryable exception.
        at io.cdap.cdap.common.service.Retries.callWithRetries(Retries.java:236)
        ... 6 common frames omitted
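Currently, RemoteTaskExecutor retries only on ServiceException or RetryableException: https://github.com/cdapio/cdap/blob/7cc185bfe191cba2c12d00531201481e30b4e79a/cdap-common/src/main/java/io/cdap/cdap/common/internal/remote/RemoteTaskExecutor.java#L67
A SocketTimeoutException raised while connecting is therefore treated as non-retryable, so the dispatch fails without a retry (see "Encountered non retryable exception" in the trace above).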
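One possible direction, sketched below, is to widen the retry condition so that transient connection failures are also retried. This is a standalone illustration and not CDAP code: the class RetrySketch, the RETRYABLE predicate, and the callWithRetries helper are hypothetical names invented for this example, and the actual fix in RemoteTaskExecutor may look different.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the CDAP code base.
class RetrySketch {

  // Treats an exception as retryable if it, or anything in its cause chain,
  // is a connect-level failure (assumption: these are transient).
  static final Predicate<Throwable> RETRYABLE = t -> {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof SocketTimeoutException || c instanceof ConnectException) {
        return true;
      }
    }
    return false;
  };

  // Calls the task up to maxAttempts times, sleeping delayMillis between
  // attempts. Non-retryable exceptions are rethrown immediately.
  static <T> T callWithRetries(Callable<T> task, Predicate<Throwable> retryable,
                               int maxAttempts, long delayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        if (!retryable.test(e)) {
          throw e;
        }
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis);
        }
      }
    }
    // All attempts failed with retryable exceptions; surface the last one.
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated dispatch: times out twice, then succeeds.
    String result = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new SocketTimeoutException("connect timed out");
      }
      return "dispatched";
    }, RETRYABLE, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Note that SocketTimeoutException is also thrown for read timeouts, and the failing call goes through executeNonIdempotent. A connect timeout means the request never reached the SystemWorker, so retrying is safe in that case, but a real fix would need to decide how to handle timeouts that occur after the request has been sent.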