TMS clients don't reuse underlying connections
Description
Release Notes
Reuse network connections for TMS client.
Activity

Ali Anwar June 30, 2017 at 12:43 AM (edited)
This PR reduces the number of connections in TIME_WAIT significantly:
Against 4.2.1: https://github.com/caskdata/cdap/pull/9170
Against 4.1.2: https://github.com/caskdata/cdap/pull/9181

Ali Anwar June 30, 2017 at 12:23 AM
I tried raising that limit to 100 in cdap-env.sh (for the master process) and in cdap-site.xml (for system service containers), but that didn't help.

Terence Yim June 29, 2017 at 11:34 PM
By default, connections are keep-alive, but Java limits how many idle connections it keeps open for reuse per destination; the default is 5 (the http.maxConnections property). We could consider raising that number, but we also need to find out why there are so many connections.
http://docs.oracle.com/javase/7/docs/technotes/guides/net/http-keepalive.html
Fixed
Created June 29, 2017 at 10:26 PM
Updated June 30, 2017 at 10:03 PM
Resolved June 30, 2017 at 10:03 PM
To prove this, ssh to a machine where the messaging service is running.
To see the list of network connections in TIME_WAIT state, run:
I found 1000-2000 such connections.
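The command itself did not survive on this page. As a rough sketch, assuming Linux `netstat -tan`-style output (columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), a per-port count of TIME_WAIT connections could look like this in Python. The function name `count_time_wait_by_port` and its `side` parameter are made up for illustration and are not part of CDAP.

```python
from collections import Counter


def count_time_wait_by_port(netstat_output, side="foreign"):
    """Count TIME_WAIT connections per port in `netstat -tan`-style text.

    `side` selects which address column supplies the port:
    "local" (4th column) or "foreign" (5th column).
    """
    column = {"local": 3, "foreign": 4}[side]
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State.
        # Header lines and other states are skipped.
        if len(fields) < 6 or fields[5] != "TIME_WAIT":
            continue
        # Addresses look like host:port; rpartition also handles
        # IPv6 hosts that contain colons themselves.
        port = fields[column].rpartition(":")[2]
        counts[port] += 1
    return counts
```

Passing it the text printed by `netstat -tan` and calling `.most_common(1)` on the result gives the port with the most TIME_WAIT connections.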
Take the port with the largest number of TIME_WAIT connections and determine the id of the process using it (it will be the last column):
Look up that process id, and you'll see that it belongs to the messaging service:
This means that the clients talking to the messaging service frequently close their connections and create new ones instead of reusing them.
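To make the difference concrete, here is a small self-contained Python sketch. It is not the TMS client and not the change made in the PRs; it only illustrates the general pattern. A throwaway local HTTP/1.1 server records the source port of every request, so a client that reuses one connection shows up with a single port, while a client that reconnects for each request shows up with several. All names (`PortRecordingHandler`, `client_ports`) are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortRecordingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections
    seen_ports = []

    def do_GET(self):
        # Record the client's source port before answering.
        self.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def client_ports(requests, reuse):
    """Send `requests` GETs and return the client ports the server saw."""
    PortRecordingHandler.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        conn = None
        for _ in range(requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()  # client closes first, leaving TIME_WAIT
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            # Drain the body so the same socket can carry the next request.
            conn.getresponse().read()
        if conn is not None:
            conn.close()
        return list(PortRecordingHandler.seen_ports)
    finally:
        server.shutdown()
        server.server_close()
```

With `reuse=True` every request arrives from the same client port. With `reuse=False` each request opens a fresh connection, and each closed one lingers in TIME_WAIT on the client side, which is the pattern described above.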