Downloading an image with urllib in Python 3 is very slow. What could be the cause?

This topic was created 2859 days ago; the information in it may have since evolved or changed.

I'm a beginner trying to write a web crawler, learning as I go.

I want to download an image from Y 站 (yande.re). The code is:

urllib.request.urlopen('http://xxx.jpg').read()

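The URL itself is reachable. The image is not large, and opening it in a browser takes only a few seconds (caching ruled out). Downloading it in Python, however, takes 30+ seconds. The downloaded data is intact: written out to a file, it displays correctly.

So the question is: what makes downloading a single image this slow?

Is there something else I need to configure?

Full code attached: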

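import datetime
import os
import time
import urllib.parse
import urllib.request

# Create a directory to hold the images scraped today
dir_name = datetime.datetime.now().strftime('%Y%m%d')
os.makedirs(dir_name, exist_ok=True)

# info is built earlier in the script (not shown here).
# The value of info[1] is https://files.yande.re/sample/6718a8caa71a4547a417f41bc9f063bb/yande.re%20385001%20sample%20byakuya_reki%20seifuku.jpg
print('Starting download...')
print(info[1])
start = time.time()
with urllib.request.urlopen(info[1]) as response:
    img = response.read()
print('Download finished. Elapsed: ' + str(int(time.time() - start)) + 's')

# Take the file name from the URL and decode percent-escapes such as %20 (space)
file_name = urllib.parse.unquote(info[1].split('/')[-1])
with open(os.path.join(dir_name, file_name), 'wb') as file:
    file.write(img)
exit(200)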

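Addendum 1 · 2017-02-27 21:52:24 +08:00

After testing, it turns out the site throttles crawlers.
Once I added request headers such as UA (User-Agent), Host, and Referer, everything worked normally. XD Thanks, everyone.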
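
For anyone landing here later, a minimal sketch of what "adding headers" can look like with urllib. The post does not say which header values were used, so the User-Agent string and the Referer below are illustrative assumptions, and build_request / download are hypothetical helper names. urllib fills in the Host header from the URL by itself, so it is not set explicitly here.

```python
import urllib.request

# Illustrative browser-like User-Agent (an assumption, not the value from the post).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0 Safari/537.36"
)


def build_request(url, referer="https://yande.re/"):
    """Return a Request carrying browser-like headers.

    The default referer is an assumption; pass whatever page links to the image.
    """
    headers = {"User-Agent": BROWSER_UA, "Referer": referer}
    return urllib.request.Request(url, headers=headers)


def download(url, path, timeout=30):
    """Fetch url with the headers above and write the body to path."""
    request = build_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
```

In the script above, this would replace the bare urlopen call, e.g. download(info[1], os.path.join(dir_name, file_name)). Whether it removes the slowdown depends on how the site decides what to throttle.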


17 replies • 2017-02-28 23:02:07 +08:00