TMS clients don't reuse underlying connections

Description

To confirm this, SSH into a machine where the messaging service is running.

To see the list of network connections in the TIME_WAIT state, run:

I found 1000-2000 such connections.
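As an alternative to a command-line tool, the TIME_WAIT count can be read directly from the kernel's socket table. This is a minimal sketch, assuming Linux's /proc/net/tcp layout (IPv4 only; /proc/net/tcp6 uses the same columns). It counts rows whose state column is 06 (TIME_WAIT) and tallies them by local port. Depending on which side closed the connection, the interesting port may be the remote one (column f[2]) instead. TimeWaitCounter and its methods are hypothetical names, not part of CDAP.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TimeWaitCounter {
    // In /proc/net/tcp the "st" column is a hex TCP state; 06 is TIME_WAIT.
    static final String TIME_WAIT = "06";

    /** Counts TIME_WAIT sockets per local port, given the lines of /proc/net/tcp. */
    static Map<Integer, Integer> timeWaitByLocalPort(List<String> lines) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row (its first field is "sl"), malformed rows, and other states.
            if (f.length < 4 || !f[0].endsWith(":") || !TIME_WAIT.equals(f[3])) {
                continue;
            }
            // local_address looks like "0100007F:2710": hex IP, colon, hex port.
            String local = f[1];
            int port = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
            counts.merge(port, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/net/tcp"));
        Map<Integer, Integer> counts = timeWaitByLocalPort(lines);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("TIME_WAIT connections: " + total);
        // Print the ports with the most TIME_WAIT sockets first.
        counts.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.println("port " + e.getKey() + ": " + e.getValue()));
    }
}
```

The port at the top of that list is the one to investigate in the next step.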

Take the port that appears in the largest number of these connections and determine the ID of the process that owns it (the PID is in the last column of the output):

Look up that process ID and you'll see that it belongs to the messaging service:
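The port-to-process lookup can also be done by hand through /proc: find the LISTEN row (state 0A) for the port in /proc/net/tcp, take its inode (10th column), then search /proc/&lt;pid&gt;/fd for a link that points to socket:[inode]. This is a minimal sketch, assuming Linux and enough privileges to read other processes' fd directories (usually root). PortOwner and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

public class PortOwner {
    // In /proc/net/tcp the "st" column is a hex TCP state; 0A is LISTEN.
    static final String LISTEN = "0A";

    /** Returns the socket inode of the LISTEN row for a local port, if present. */
    static OptionalLong listenInode(List<String> procNetTcpLines, int port) {
        for (String line : procNetTcpLines) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row, malformed rows, and non-listening sockets.
            if (f.length < 10 || !f[0].endsWith(":") || !LISTEN.equals(f[3])) {
                continue;
            }
            String local = f[1];
            int p = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
            if (p == port) {
                return OptionalLong.of(Long.parseLong(f[9])); // inode is the 10th column
            }
        }
        return OptionalLong.empty();
    }

    /** Finds a pid whose /proc/<pid>/fd directory holds a link to socket:[inode]. */
    static OptionalLong pidOwningInode(long inode) throws IOException {
        String target = "socket:[" + inode + "]";
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path proc : procs) {
                try (DirectoryStream<Path> fds = Files.newDirectoryStream(proc.resolve("fd"))) {
                    for (Path fd : fds) {
                        if (target.equals(readLinkOrEmpty(fd))) {
                            return OptionalLong.of(Long.parseLong(proc.getFileName().toString()));
                        }
                    }
                } catch (IOException | DirectoryIteratorException | SecurityException e) {
                    // The process exited or we may not read its fds; skip it.
                }
            }
        }
        return OptionalLong.empty();
    }

    /** Reads a symlink target, returning "" if the fd vanished or is unreadable. */
    static String readLinkOrEmpty(Path link) {
        try {
            return Files.readSymbolicLink(link).toString();
        } catch (IOException | SecurityException e) {
            return "";
        }
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]);
        List<String> lines = Files.readAllLines(Paths.get("/proc/net/tcp"));
        OptionalLong inode = listenInode(lines, port);
        if (!inode.isPresent()) {
            System.out.println("no IPv4 listener on port " + port);
            return;
        }
        OptionalLong pid = pidOwningInode(inode.getAsLong());
        System.out.println(pid.isPresent()
                ? "port " + port + " is owned by pid " + pid.getAsLong()
                : "owner not found (try running as root)");
    }
}
```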

This means that clients talking to the messaging service frequently close their connections and open new ones instead of reusing them.
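For a client built on java.net.HttpURLConnection, two common causes of this pattern are calling disconnect() after every request and not reading the response body to the end. The HttpURLConnection Javadoc notes that disconnect() may close the underlying socket, and the JDK keep-alive guide asks callers to read and close the response stream (or the error stream on failures). The sketch below illustrates that pattern, assuming the client uses HttpURLConnection. ReusableHttpCall and its methods are hypothetical; this is not the actual TMS client code or the change made in the linked PRs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ReusableHttpCall {
    /** Reads a stream to the end and closes it so the socket can return to the keep-alive cache. */
    static byte[] drainAndClose(InputStream in) throws IOException {
        if (in == null) {
            return new byte[0]; // getErrorStream() returns null when there is no body
        }
        try (InputStream is = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Issues a GET without calling disconnect(), so the JDK may reuse the connection. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        int code = conn.getResponseCode();
        // On error statuses the body is on getErrorStream(); drain it as well.
        InputStream body = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        String text = new String(drainAndClose(body), StandardCharsets.UTF_8);
        // Deliberately no conn.disconnect(): it may close the underlying socket.
        return code + " " + text;
    }

    public static void main(String[] args) throws IOException {
        // Repeated requests to the same host:port are candidates for socket reuse.
        for (int i = 0; i < 3; i++) {
            System.out.println(get(args[0]));
        }
    }
}
```

Run it against any HTTP endpoint (for example `java ReusableHttpCall http://host:port/path`, with a placeholder URL) and compare the TIME_WAIT count with a variant that calls disconnect() after each request.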

Release Notes

Reuse network connections for TMS client.

Activity


Ali Anwar June 30, 2017 at 12:43 AM

This PR reduces the number of connections in TIME_WAIT significantly:
Against 4.2.1: https://github.com/caskdata/cdap/pull/9170
Against 4.1.2: https://github.com/caskdata/cdap/pull/9181

Ali Anwar June 30, 2017 at 12:23 AM

I tried bumping that limit to 100 in cdap-env.sh (for the master process) and in cdap-site.xml (for system service containers), but it didn't help.

Terence Yim June 29, 2017 at 11:34 PM

By default, connections are kept alive, but Java caps how many idle connections it keeps for reuse per destination, and the default cap is 5. We may consider bumping that number up (we also need to find out why there are so many connections).

http://docs.oracle.com/javase/7/docs/technotes/guides/net/http-keepalive.html
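The limit referred to here is the http.maxConnections system property (http.keepAlive, default true, turns keep-alive on or off). Based on our reading of OpenJDK's keep-alive cache, the value is read once and cached, and unset, unparseable, or non-positive values fall back to 5. Because it only caps how many idle connections are kept per destination, it has no effect unless connections are handed back to the cache in the first place. This is a minimal sketch; KeepAliveSettings is a hypothetical name.

```java
public class KeepAliveSettings {
    // Default documented in the JDK keep-alive guide.
    static final int JDK_DEFAULT_MAX_CONNECTIONS = 5;

    /**
     * Returns the per-destination idle keep-alive limit as we assume the JDK computes it:
     * unset, unparseable or non-positive values fall back to 5.
     */
    static int configuredMaxConnections() {
        // Integer.getInteger returns the default when the property is missing or not a number.
        int n = Integer.getInteger("http.maxConnections", JDK_DEFAULT_MAX_CONNECTIONS);
        return n > 0 ? n : JDK_DEFAULT_MAX_CONNECTIONS;
    }

    public static void main(String[] args) {
        // Must happen before the first HTTP request, since the value is cached on first use;
        // passing -Dhttp.maxConnections=100 on the JVM command line avoids that ordering issue.
        if (System.getProperty("http.maxConnections") == null) {
            System.setProperty("http.maxConnections", "100");
        }
        System.out.println("http.keepAlive=" + System.getProperty("http.keepAlive", "true"));
        System.out.println("http.maxConnections=" + configuredMaxConnections());
    }
}
```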

Fixed

Details


Created June 29, 2017 at 10:26 PM
Updated June 30, 2017 at 10:03 PM
Resolved June 30, 2017 at 10:03 PM