Java HTTP Client Connection Pooling and Reuse

Our project uses Apache HttpClient with the pooled PoolingHttpClientConnectionManager. This article looks into how its connection reuse is implemented, and into the reuse mechanism of the JDK's native HttpURLConnection.

1. HttpClient

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.junit.Before;
import org.junit.Test;

public class TestHttpClient {
    private PoolingHttpClientConnectionManager pool;

    @Before
    public void init() {
        pool = new PoolingHttpClientConnectionManager(10, TimeUnit.MILLISECONDS);
    }

    public CloseableHttpClient getClient() {
        return HttpClientBuilder.create().setConnectionManager(pool).build();
    }

    @Test
    public void test() throws Exception {
        CloseableHttpResponse execute = getClient().execute(new HttpGet("http://www.baidu.com"));
        System.out.println(EntityUtils.toString(execute.getEntity(), "utf8"));
        execute = getClient().execute(new HttpGet("http://www.baidu.com"));
        System.out.println(EntityUtils.toString(execute.getEntity(), "utf8"));
    }
}
```

The two HTTP requests in the code above share a single TCP connection. The reuse process works roughly as follows:

  • PoolingHttpClientConnectionManager#requestConnection fetches a connection from the pool keyed by its route.
  • If the pool is empty or has no idle connection for the route, AbstractConnPool#getPoolEntryBlocking tries to create a new one; the maximum number of connections per route is configurable and defaults to 2. If an idle connection exists, it is reused directly.
  • When the caller is done, the connection is returned to the pool.
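Both limits mentioned in the second step can be raised on the manager itself. The following sketch uses hypothetical values (the class name and the numbers are illustrative, not from the original project):

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class PoolConfig {
    public static PoolingHttpClientConnectionManager newPool() {
        // keep pooled connections alive for at most 30 seconds (hypothetical value)
        PoolingHttpClientConnectionManager pool =
                new PoolingHttpClientConnectionManager(30, TimeUnit.SECONDS);
        pool.setMaxTotal(20);          // cap across all routes
        pool.setDefaultMaxPerRoute(5); // raise the per-route default of 2
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(newPool().getDefaultMaxPerRoute());
    }
}
```

setMaxConnPerRoute limits can also be set per individual route via setMaxPerRoute if one host deserves more connections than the rest.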

The sections below walk through how a connection is returned to the pool and how keep-alive is handled.

1.1 HttpClient Connection Release

The release logic in AbstractConnPool#release looks like this:

```java
public void release(final E entry, final boolean reusable) {
    this.lock.lock();
    try {
        if (this.leased.remove(entry)) {
            final RouteSpecificPool<T, C, E> pool = getPool(entry.getRoute());
            pool.free(entry, reusable);
            if (reusable && !this.isShutDown) {
                this.available.addFirst(entry);
            } else {
                entry.close();
            }
            onRelease(entry);
            Future<E> future = pool.nextPending();
            if (future != null) {
                this.pending.remove(future);
            } else {
                future = this.pending.poll();
            }
            if (future != null) {
                this.condition.signalAll();
            }
        }
    } finally {
        this.lock.unlock();
    }
}
```

So which line of the example code above triggers the release? It is EntityUtils#toString. The official tutorial puts it this way:

When working with streaming entities, one can use the EntityUtils#consume(HttpEntity) method to ensure that the entity content has been fully consumed and the underlying stream has been closed.

The full call stack:

```
at org.apache.http.pool.AbstractConnPool.release(AbstractConnPool.java:409)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection(PoolingHttpClientConnectionManager.java:347)
at org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(ConnectionHolder.java:99)
at org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(ConnectionHolder.java:120)
at org.apache.http.impl.execchain.ResponseEntityProxy.releaseConnection(ResponseEntityProxy.java:76)
at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:144)
at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
at java.util.zip.InflaterInputStream.close(InflaterInputStream.java:227)
at java.util.zip.GZIPInputStream.close(GZIPInputStream.java:136)
at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:232)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:270)
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:290)
```

In other words, once the response stream has been fully read, the connection is automatically returned to the pool. If the caller does not want to read the response body, the connection must be released explicitly by closing the content stream:

```java
execute.getEntity().getContent().close();
```
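As the tutorial quote above suggests, EntityUtils#consume does the same thing more safely: it drains and closes the entity's stream, which releases the connection. A minimal sketch of that stream-closing behavior, using an in-memory entity instead of a live response so no network is involved:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.http.entity.BasicHttpEntity;
import org.apache.http.util.EntityUtils;

public class ConsumeDemo {
    public static void main(String[] args) throws Exception {
        final AtomicBoolean closed = new AtomicBoolean(false);
        // a fake response body that records when it is closed
        ByteArrayInputStream body = new ByteArrayInputStream("hello".getBytes()) {
            @Override
            public void close() throws IOException {
                closed.set(true);
                super.close();
            }
        };
        BasicHttpEntity entity = new BasicHttpEntity();
        entity.setContent(body);
        EntityUtils.consume(entity); // closes the underlying stream
        System.out.println(closed.get());
    }
}
```

Against a real response, closing the stream is what fires ResponseEntityProxy#streamClosed in the stack trace above.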

1.2 What Happens When Connections Are Not Released

What happens if connections are never returned? Take the same code, comment out the part that releases the connection, and issue requests in a loop:

```java
while (true) {
    CloseableHttpResponse execute = getClient().execute(new HttpGet("http://www.baidu.com"));
}
```

The code blocks with the following stack:

"main@1" prio=5 tid=0x1 nid=NA waiting
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
- locked <0x727> (a org.apache.http.pool.AbstractConnPool$2)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:304)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:280)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)

Looking at the corresponding code in AbstractConnPool#getPoolEntryBlocking, the thread hangs waiting for a connection to be returned to the pool (the surrounding code is long, so it is omitted here). The default limit is maxPerRoute = 2, so with the default configuration, once 2 or more connections go unreturned, every subsequent request for that route hangs.
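One way to fail fast instead of hanging forever is to set a connection-request timeout: with it in place, an exhausted pool makes execute throw a ConnectionPoolTimeoutException once the wait elapses instead of parking indefinitely. A sketch with a hypothetical 2-second value:

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;

public class FailFastClient {
    // wait at most 2s for a connection from the pool (hypothetical value)
    static final RequestConfig CONFIG = RequestConfig.custom()
            .setConnectionRequestTimeout(2000)
            .build();

    public static CloseableHttpClient build() {
        return HttpClientBuilder.create().setDefaultRequestConfig(CONFIG).build();
    }

    public static void main(String[] args) {
        System.out.println(CONFIG.getConnectionRequestTimeout());
    }
}
```

The timeout converts a silent hang into an explicit exception, which is usually easier to diagnose than a thread stuck in WAITING.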

2. JDK HttpURLConnection

The JDK's native HttpURLConnection also pools connections by default. Here is a minimal example:

```java
private int recursiveDepth = 2; // number of requests to issue

public void testJDKHttp() throws Exception {
    URL url = new URL("http://www.baidu.com");
    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    urlConnection.setRequestMethod("GET");
    InputStream is = urlConnection.getInputStream();
    // sun.misc.IOUtils: read exactly contentLength bytes (or until EOF)
    System.out.println(new String(IOUtils.readNBytesOrEOF(is, urlConnection.getContentLength())));
    if (--recursiveDepth > 0) {
        testJDKHttp();
    }
}
```

With the recursion depth set to 2, i.e. two requests to Baidu, both requests are in fact carried over the same TCP connection.

2.1 JDK HttpURLConnection Connection Release

The release point is similar to HttpClient's: the connection is returned to the pool once the stream has been fully read. The call stack:

```
at sun.net.www.http.KeepAliveCache.put(KeepAliveCache.java:80)
at sun.net.www.http.HttpClient.putInKeepAliveCache(HttpClient.java:438)
at sun.net.www.http.HttpClient.finished(HttpClient.java:395)
at sun.net.www.http.KeepAliveStream.close(KeepAliveStream.java:97)
at sun.net.www.MeteredStream.justRead(MeteredStream.java:93)
at sun.net.www.MeteredStream.read(MeteredStream.java:135)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3454)
at sun.misc.IOUtils.readNBytesOrEOF(IOUtils.java:91)
```

KeepAliveCache is the connection-pool implementation. Unlike HttpClient's lazy-eviction strategy, the JDK starts a timer thread to clean up expired connections. From KeepAliveCache#put:

```java
ThreadGroup var1 = Thread.currentThread().getThreadGroup();

// walk up to the root thread group
for (ThreadGroup var2 = null; (var2 = var1.getParent()) != null; var1 = var2) {
}

KeepAliveCache.this.keepAliveTimer = new Thread(var1, KeepAliveCache.this, "Keep-Alive-Timer");
KeepAliveCache.this.keepAliveTimer.setDaemon(true);
KeepAliveCache.this.keepAliveTimer.setPriority(8);
KeepAliveCache.this.keepAliveTimer.setContextClassLoader((ClassLoader) null);
KeepAliveCache.this.keepAliveTimer.start();
return null;
```

As the code shows, the timer checks once every 5 seconds. The default keep-alive lifetime is 5000 ms, unless the server overrides it via the Keep-Alive response header. So in the typical case, hitting the same URL again within 5 seconds reuses the same connection.
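This behavior has system-property counterparts, read when the HTTP protocol handler is first used: http.keepAlive toggles reuse entirely, and http.maxConnections bounds how many idle connections the cache keeps per destination (default 5). A quick sketch:

```java
public class KeepAliveProps {
    public static void main(String[] args) {
        // must be set before the first HTTP request in the JVM
        System.setProperty("http.keepAlive", "true");   // default: true
        System.setProperty("http.maxConnections", "5"); // idle connections cached per host; default: 5
        System.out.println(System.getProperty("http.maxConnections"));
    }
}
```

Setting http.keepAlive to false disables the KeepAliveCache entirely, which is a quick way to rule pooling in or out when debugging.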

2.2 JDK HttpURLConnection Without Release

HttpURLConnection has no blocking connection limit: if connections are never returned, it simply keeps creating new ones, leaking sockets.

2.3 Releasing a Connection Early

Suppose a request has been issued but, for some special reason, the caller wants to return the connection without fully reading the response stream.
The JDK checks whether the response has been fully read; if not, it simply closes the connection.
HttpClient instead tries to drain the remaining response stream and then returns the connection to the pool. So if you routinely do not need the full response body, issue a HEAD pre-request rather than wasting bandwidth on a GET.
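A minimal sketch of such a HEAD pre-request with HttpURLConnection (the URL is illustrative; openConnection does not touch the network, so nothing is sent until the connection is actually used):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadProbe {
    // prepare a HEAD request: only headers come back, no response body is transferred
    public static HttpURLConnection probe(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD");
        return conn;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe("http://www.example.com/").getRequestMethod());
    }
}
```

Since a HEAD response has no body, there is no stream left to drain, and the connection can go straight back to the keep-alive cache.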