代码比较多,但是比较简单,一眼就看明白的,so,文字尽量少写了。

因为众所周知的网络原因,gavatar也开始越来越慢,写了一个小东西来解决这个问题,过程中遇到了get_headers这个函数,甚是忧伤,记录下来,以免后来人踩坑。 更新记录,函数稍微改了一下,返回值基本和之前序列化后的结果一致,暂时没考虑支持子项也支持数组等(考虑细节性能,还想把没用的http头砍掉….)

需求很简单:获取图片的head信息。

调试程序的时候发现这个函数的调用很缓慢,即使绑定ip,有时候都能蹦到20多秒。

寻思这个事情还是该加个超时吧,但是看官方文档,给出的导出函数接口如下:

array get_headers ( string $url [, int $format = 0 ] )

你没有看错,这个东西没有超时接口…

上GitHub翻看源码,期望可以用他的底层实现来重新实现一套:

/* {{{ proto array get_headers(string url[, int format])
   fetches all the headers sent by the server in response to a HTTP request */
PHP_FUNCTION(get_headers)
{
	char *url;
	size_t url_len;
	php_stream_context *context;
	php_stream *stream;
	zval *prev_val, *hdr = NULL, *h;
	HashTable *hashT;
	zend_long format = 0;

	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|l", &url, &url_len, &format) == FAILURE) {
		return;
	}

	/** 省略其他一堆... **/
}
/* }}} */

但是很不幸的是zend_parse_parametersZEND_NUM_ARGS也都没有PHP版的导出函数。

于是造轮子开始:

function get_url_headers($url, $timeout = 10)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $data = curl_exec($ch);
    $data = preg_split('/\n/', $data);

    $data = array_filter(array_map(function ($data) {
        $data = trim($data);
        if ($data) {
            $data = preg_split('/:\s/', trim($data), 2);
            $length = count($data);
            switch ($length) {
                case 2:
                    return array($data[0] => $data[1]);
                    break;
                case 1:
                    return $data;
                    break;
                default:
                    break;
            }
        }
    }, $data));

    sort($data);

    foreach ($data as $key => $value) {
        $itemKey = array_keys($value)[0];
        if (is_int($itemKey)) {
            $data[$key] = $value[$itemKey];
        } elseif (is_string($itemKey)) {
            $data[$itemKey] = $value[$itemKey];
            unset ($data[$key]);
        }
    }

    return $data;
}

对比最后结果,原版又是蛮长的等待,不知道校验啥去了(没继续追代码了,有兴趣的童鞋可以去跟下玩):

Array
(
    [0] => HTTP/1.0 302 Found
    [Accept-Ranges] => bytes
    [Cache-Control] => max-age=300
    [Content-Type] => Array
        (
            [0] => text/html; charset=utf-8
            [1] => text/html; charset=utf-8
        )

    [Date] => Array
        (
            [0] => Fri, 12 Dec 2014 15:35:40 GMT
            [1] => Fri, 12 Dec 2014 15:35:43 GMT
        )

    [Expires] => Fri, 12 Dec 2014 15:40:40 GMT
    [Last-Modified] => Wed, 11 Jan 1984 08:00:00 GMT
    [Link] => <http://www.gravatar.com/avatar/[省略...]?s=42&d=http%3A%2F%2F[省略...]&r=G>; rel="canonical"
    [Location] => http://i2.wp.com/[省略...]
    [Server] => Array
        (
            [0] => ECS (oxr/838B)
            [1] => nginx
        )

    [Source-Age] => 85
    [Via] => 1.1 varnish
    [X-Cache] => 302-HIT
    [X-Varnish] => 1470255088 1470006304
    [Content-Length] => 0
    [Connection] => Array
        (
            [0] => close
            [1] => close
        )

    [1] => HTTP/1.1 504 Gateway Timeout
)

轮子版返回(瞬间返回,两者内容略有不同,你仔细看就能发现一些有趣的地方了):

Array
(
    [0] => HTTP/1.1 302 Found
    [Accept-Ranges] => bytes
    [Via] => 1.1 varnish
    [Cache-Control] => max-age=300
    [Server] => ECS (oxr/838B)
    [Content-Type] => text/html; charset=utf-8
    [X-Varnish] => 1470255088 1470006304
    [Date] => Fri, 12 Dec 2014 20:31:02 GMT
    [Location] => http://i2.wp.com/[省略...]
    [Expires] => Fri, 12 Dec 2014 20:36:02 GMT
    [Source-Age] => 85
    [Last-Modified] => Wed, 11 Jan 1984 08:00:00 GMT
    [X-Cache] => 302-HIT
    [Link] => <http://www.gravatar.com/avatar/[省略...]?s=42&d=http%3A%2F%2F[省略...]&r=G>; rel="canonical"
    [Content-Length] => 0
)