A Study of Connection Pool Reuse in Java HTTP Clients

Our project uses Apache HttpClient with the pooled PoolingHttpClientConnectionManager. This post looks at how that pooling is implemented, and at how the JDK's native HttpURLConnection reuses connections.

1. HttpClient

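import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.junit.Before;
import org.junit.Test;

public class TestHttpClient {
    private PoolingHttpClientConnectionManager pool;

    @Before
    public void init() {
        // The arguments are the connection time-to-live: a pooled connection older than this is not reused.
        pool = new PoolingHttpClientConnectionManager(10, TimeUnit.SECONDS);
    }

    // Every client built here shares the same connection manager, and therefore the same pool.
    public CloseableHttpClient getClient() {
        return HttpClientBuilder.create().setConnectionManager(pool).build();
    }

    @Test
    public void test() throws Exception {
        CloseableHttpResponse execute = getClient().execute(new HttpGet("http://www.baidu.com"));
        // Reading the entity to the end also returns the connection to the pool.
        System.out.println(EntityUtils.toString(execute.getEntity(), "utf8"));
        execute = getClient().execute(new HttpGet("http://www.baidu.com"));
        System.out.println(EntityUtils.toString(execute.getEntity(), "utf8"));
    }
}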

The two HTTP requests in the code above share a single TCP connection. Connection reuse works roughly as follows:

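PoolingHttpClientConnectionManager#requestConnection looks up a connection in the pool for the same route.
If an idle connection is available, it is reused directly. If the pool is empty or has no idle connection, AbstractConnPool#getPoolEntryBlocking tries to create a new one; the maximum number of connections per route is configurable and defaults to 2.
When the request is done, the connection is returned to the pool.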
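
The lease, reuse, and return cycle described above can be sketched with a toy pool. This is only an illustration, not HttpClient's code: ToyRoutePool, Conn, and the route string are made-up names, and the sketch ignores limits, expiry, and blocking.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of a per-route connection pool; not the HttpClient implementation.
class ToyRoutePool {
    // Stand-in for a pooled TCP connection (hypothetical type).
    static final class Conn {
        final String route;
        final int id;

        Conn(String route, int id) {
            this.route = route;
            this.id = id;
        }
    }

    private final Map<String, Deque<Conn>> available = new HashMap<>();
    private int created = 0;

    // Lease: reuse an idle connection for the route if one exists, otherwise create one.
    synchronized Conn lease(String route) {
        Deque<Conn> idle = available.computeIfAbsent(route, r -> new ArrayDeque<>());
        Conn conn = idle.pollFirst();
        return conn != null ? conn : new Conn(route, ++created);
    }

    // Release: a reusable connection goes back to the front of the idle list.
    synchronized void release(Conn conn) {
        available.computeIfAbsent(conn.route, r -> new ArrayDeque<>()).addFirst(conn);
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ToyRoutePool pool = new ToyRoutePool();
        Conn first = pool.lease("http://www.baidu.com:80");
        pool.release(first);
        Conn second = pool.lease("http://www.baidu.com:80");
        System.out.println("same connection reused: " + (first == second));
        System.out.println("connections created: " + pool.createdCount());
    }
}
```

Because the first connection was released before the second lease, the second lease gets the same object back instead of creating a new one.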

The following sections sort out how connections are returned and how they are kept alive.

1.1 Returning connections in HttpClient

The code in AbstractConnPool#release that returns a connection is as follows:

public void release(final E entry, final boolean reusable) {
    this.lock.lock();
    try {
        if (this.leased.remove(entry)) {
            final RouteSpecificPool<T, C, E> pool = getPool(entry.getRoute());
            pool.free(entry, reusable);
            if (reusable && !this.isShutDown) {
                this.available.addFirst(entry);
            } else {
                entry.close();
            }
            onRelease(entry);
            Future<E> future = pool.nextPending();
            if (future != null) {
                this.pending.remove(future);
            } else {
                future = this.pending.poll();
            }
            if (future != null) {
                this.condition.signalAll();
            }
        }
    } finally {
        this.lock.unlock();
    }
}

So which line of the sample code triggers the return of the connection? It is EntityUtils#toString. The official tutorial describes it as follows:

When working with streaming entities, one can use the EntityUtils#consume(HttpEntity) method to ensure that the entity content has been fully consumed and the underlying stream has been closed.

The full call stack is:

at org.apache.http.pool.AbstractConnPool.release(AbstractConnPool.java:409)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.releaseConnection(PoolingHttpClientConnectionManager.java:347)
	at org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(ConnectionHolder.java:99)
	at org.apache.http.impl.execchain.ConnectionHolder.releaseConnection(ConnectionHolder.java:120)
	at org.apache.http.impl.execchain.ResponseEntityProxy.releaseConnection(ResponseEntityProxy.java:76)
	at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:144)
	at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
	at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172)
	at java.util.zip.InflaterInputStream.close(InflaterInputStream.java:227)
	at java.util.zip.GZIPInputStream.close(GZIPInputStream.java:136)
	at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94)
	at org.apache.http.util.EntityUtils.toString(EntityUtils.java:232)
	at org.apache.http.util.EntityUtils.toString(EntityUtils.java:270)
	at org.apache.http.util.EntityUtils.toString(EntityUtils.java:290)

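In other words, the connection is returned to the pool automatically once the HTTP response stream has been read to the end. If the caller does not need the response body, the connection has to be returned explicitly by closing the content stream: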

execute.getEntity().getContent().close();
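
The idea of a stream that hands its connection back on EOF or close can be sketched in plain Java. This is a simplified model in the spirit of the EofSensorInputStream frames in the stack trace, not its actual code; ReleasingInputStream and the onRelease hook are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Toy stream that runs a callback once, when the body hits EOF or is closed.
class ReleasingInputStream extends FilterInputStream {
    private final Runnable onRelease; // hypothetical hook, e.g. "return the connection to the pool"
    private boolean released = false;

    ReleasingInputStream(InputStream in, Runnable onRelease) {
        super(in);
        this.onRelease = onRelease;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == -1) {
            release();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            release();
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            release();
        }
    }

    // Guarded so that EOF followed by close() releases only once.
    private void release() {
        if (!released) {
            released = true;
            onRelease.run();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] releases = {0};
        InputStream body = new ReleasingInputStream(
                new ByteArrayInputStream("hello".getBytes()), () -> releases[0]++);
        while (body.read() != -1) {
            // drain the body
        }
        body.close(); // closing after EOF must not release a second time
        System.out.println("release count: " + releases[0]);
    }
}
```

Either path, reading to EOF or calling close() without reading, ends in the same single release, which is why both EntityUtils#toString and an explicit close() free the connection.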

1.2 Refusing to return connections

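What happens if connections are never returned? Take the code above, comment out the part that returns the connection, and send requests in a loop: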

while (true) {
    CloseableHttpResponse execute = getClient().execute(new HttpGet("http://www.baidu.com"));
}

The code ends up waiting with the following stack:

"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	  at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
	  at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
	  at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
	  - locked <0x727> (a org.apache.http.pool.AbstractConnPool$2)
	  at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
	  at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:304)
	  at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:280)
	  at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
	  at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	  at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	  at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	  at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)

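Looking at the corresponding code in AbstractConnPool#getPoolEntryBlocking (there is too much surrounding code to paste here), the thread is blocked waiting for a connection to be returned to the pool. The default limit is maxPerRoute = 2, so with the default configuration, as soon as 2 connections to the same route are leased and not returned, every subsequent request to that route hangs.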
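
The effect of the per-route limit can be reproduced with a small stand-alone model. This sketch uses a Semaphore as a stand-in for the pool and a timeout so that the demo terminates instead of hanging; RouteLimitDemo is a hypothetical name and the code is not taken from HttpClient.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a per-route limit; a permit stands in for a pooled connection.
class RouteLimitDemo {
    private final Semaphore permits;

    RouteLimitDemo(int maxPerRoute) {
        this.permits = new Semaphore(maxPerRoute);
    }

    // Try to lease a connection, giving up after the timeout instead of waiting forever.
    boolean lease(long timeout, TimeUnit unit) throws InterruptedException {
        return permits.tryAcquire(timeout, unit);
    }

    // Return a leased connection so that a waiting caller can proceed.
    void release() {
        permits.release();
    }

    public static void main(String[] args) throws InterruptedException {
        RouteLimitDemo route = new RouteLimitDemo(2); // same value as the default maxPerRoute
        System.out.println("lease 1: " + route.lease(100, TimeUnit.MILLISECONDS));
        System.out.println("lease 2: " + route.lease(100, TimeUnit.MILLISECONDS));
        // Nothing has been released, so the third lease cannot succeed.
        System.out.println("lease 3: " + route.lease(100, TimeUnit.MILLISECONDS));
        route.release();
        System.out.println("lease 4 after a release: " + route.lease(100, TimeUnit.MILLISECONDS));
    }
}
```

Without a timeout, the third lease would wait indefinitely, which matches the WAITING thread in the dump above.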

2. JDK HttpURLConnection

The JDK's native HttpURLConnection also pools connections by default. Here is a minimal example:

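private int recursiveDepth = 2; // number of requests to send

public void testJDKHttp() throws Exception {
    URL url = new URL("http://www.baidu.com");
    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    urlConnection.setRequestMethod("GET");
    InputStream is = urlConnection.getInputStream();
    // IOUtils here is the JDK-internal sun.misc.IOUtils.
    // Reading the whole body is what lets the connection go back to the keep-alive cache.
    System.out.println(new String(IOUtils.readNBytesOrEOF(is, urlConnection.getContentLength())));
    if (--recursiveDepth > 0) {
        testJDKHttp();
    }
}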

With the recursion depth set to 2, Baidu is requested twice, and both requests are in fact served over the same TCP connection.
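
One way to check the reuse without a packet capture is to run a throwaway local server and record the client-side port of every request: two requests arriving from the same port came over the same TCP connection. The sketch below assumes the JDK's built-in com.sun.net.httpserver.HttpServer is available and that keep-alive is left at its default; KeepAliveReuseDemo and clientPorts are hypothetical names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class KeepAliveReuseDemo {
    // Sends `requests` GETs to a local server and returns the client port the server saw for each.
    static List<Integer> clientPorts(int requests) throws Exception {
        List<Integer> ports = new CopyOnWriteArrayList<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            ports.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (InputStream is = conn.getInputStream()) {
                    byte[] buf = new byte[1024];
                    while (is.read(buf) != -1) {
                        // read to EOF so the connection can go back to the keep-alive cache
                    }
                }
            }
        } finally {
            server.stop(0);
        }
        return ports;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ports = clientPorts(2);
        System.out.println("client ports seen by the server: " + ports);
        System.out.println("same connection: " + ports.get(0).equals(ports.get(1)));
    }
}
```

If the body were not read to the end, the second request would be expected to arrive from a different port.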

2.1 Returning connections in JDK HttpURLConnection

As with HttpClient, the connection is returned to the pool when the stream has been read to the end. The call stack is:

at sun.net.www.http.KeepAliveCache.put(KeepAliveCache.java:80)
at sun.net.www.http.HttpClient.putInKeepAliveCache(HttpClient.java:438)
at sun.net.www.http.HttpClient.finished(HttpClient.java:395)
at sun.net.www.http.KeepAliveStream.close(KeepAliveStream.java:97)
at sun.net.www.MeteredStream.justRead(MeteredStream.java:93)
at sun.net.www.MeteredStream.read(MeteredStream.java:135)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3454)
at sun.misc.IOUtils.readNBytesOrEOF(IOUtils.java:91)

KeepAliveCache is the connection pool implementation. Unlike HttpClient's lazy-delete strategy, the JDK starts a timer thread to clean up expired connections, as seen in KeepAliveCache#put:

ThreadGroup var1 = Thread.currentThread().getThreadGroup();

for(ThreadGroup var2 = null; (var2 = var1.getParent()) != null; var1 = var2) {
}

KeepAliveCache.this.keepAliveTimer = new Thread(var1, KeepAliveCache.this, "Keep-Alive-Timer");
KeepAliveCache.this.keepAliveTimer.setDaemon(true);
KeepAliveCache.this.keepAliveTimer.setPriority(8);
KeepAliveCache.this.keepAliveTimer.setContextClassLoader((ClassLoader)null);
KeepAliveCache.this.keepAliveTimer.start();
return null;

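In the KeepAliveCache source, the timer wakes up every 5 s. The default keep-alive lifetime is 5000 ms, unless the response specifies one through the Keep-Alive header.
So in general, calling the same URL again within 5 seconds reuses the same connection.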
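
The expiry rule can be illustrated with a toy cache in which time is passed in explicitly, so the behavior is easy to follow. This is a simplified sketch, not the JDK's KeepAliveCache: ToyKeepAliveCache and its methods are hypothetical, connections are plain strings, and evictExpired stands for the work the timer thread would do periodically.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy keep-alive cache with a fixed idle lifetime; not the JDK's KeepAliveCache.
class ToyKeepAliveCache {
    static final long LIFETIME_MS = 5000; // default lifetime quoted in the text

    private static final class Entry {
        final String conn;
        final long idleSince;

        Entry(String conn, long idleSince) {
            this.conn = conn;
            this.idleSince = idleSince;
        }
    }

    private final Map<String, Deque<Entry>> idle = new HashMap<>();

    // Park an idle connection under its URL key.
    synchronized void put(String url, String conn, long nowMs) {
        idle.computeIfAbsent(url, k -> new ArrayDeque<>()).push(new Entry(conn, nowMs));
    }

    // Return the most recently parked connection that is still within its lifetime, or null.
    synchronized String get(String url, long nowMs) {
        Deque<Entry> entries = idle.get(url);
        while (entries != null && !entries.isEmpty()) {
            Entry e = entries.pop();
            if (nowMs - e.idleSince <= LIFETIME_MS) {
                return e.conn;
            }
        }
        return null;
    }

    // What a periodic timer would call: drop every entry that has outlived its lifetime.
    synchronized void evictExpired(long nowMs) {
        for (Deque<Entry> entries : idle.values()) {
            entries.removeIf(e -> nowMs - e.idleSince > LIFETIME_MS);
        }
    }

    synchronized int size() {
        int n = 0;
        for (Deque<Entry> entries : idle.values()) {
            n += entries.size();
        }
        return n;
    }

    public static void main(String[] args) {
        ToyKeepAliveCache cache = new ToyKeepAliveCache();
        String url = "http://www.baidu.com";
        cache.put(url, "conn-1", 0);
        System.out.println("after 3 s: " + cache.get(url, 3000));
        cache.put(url, "conn-1", 3000);
        System.out.println("after another 6 s: " + cache.get(url, 9000));
    }
}
```

A request made 3 seconds after the connection went idle gets it back, while one made 6 seconds later finds nothing and would have to open a new connection.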

2.2 Refusing to return connections in JDK HttpURLConnection

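HttpURLConnection has no limit on the number of connections. If connections are never returned, new ones keep being created, which leaks connections.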

2.3 Strategies for returning a connection early

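Suppose a request has been sent but, for some special reason, you want to return the connection before the response stream has been fully read.
The JDK checks whether the response has been read completely; if not, it simply closes the connection.
HttpClient instead tries to read the rest of the response stream and then returns the connection to the pool. So if you do not intend to read the full response, send a HEAD request up front rather than wasting bandwidth on a GET.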
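
The bandwidth cost of the drain-then-return policy can be made visible with a toy stream that counts how many unread bytes it has to pull on close. This is only a model of the policy described above, not HttpClient's code; DrainOnCloseStream and drainedBytes are hypothetical names, and an in-memory array stands in for the network.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Toy stream that drains the unread remainder on close, so the connection could be reused.
class DrainOnCloseStream extends FilterInputStream {
    private long drained = 0;

    DrainOnCloseStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        byte[] buf = new byte[4096];
        int n;
        // In a real client, every byte skipped here would still have crossed the network.
        while ((n = in.read(buf)) != -1) {
            drained += n;
        }
        super.close();
    }

    long drainedBytes() {
        return drained;
    }

    public static void main(String[] args) throws IOException {
        DrainOnCloseStream body = new DrainOnCloseStream(new ByteArrayInputStream(new byte[10_000]));
        byte[] head = new byte[10];
        int read = body.read(head); // the caller only wants the first few bytes
        body.close();
        System.out.println("bytes read by caller: " + read);
        System.out.println("bytes drained on close: " + body.drainedBytes());
    }
}
```

The caller asked for 10 bytes, yet the rest of the body is still consumed on close, which is the waste a HEAD request avoids.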