When crawling with Scrapy, redirects or other causes can change the original start_url. How can I get the original start_url back?
def start_requests(self):
    start_url = 'your_scrapy_start_url'
    yield Request(start_url, self.parse)

def parse(self, response):
    item = YourItem()
    # Placeholder: this should be the start_url of the original request
    item['start_url'] = None
    yield item
1
revotu 2017-06-28 22:39:34 +08:00
Summary of common Scrapy crawler questions: http://www.revotu.com/scrapy-reptile-faq.html
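Use the meta parameter of Request to pass the information along:

def start_requests(self):
    start_url = 'your_scrapy_start_url'
    # Attach the original URL to the request so it survives redirects
    yield Request(start_url, self.parse, meta={'start_url': start_url})

def parse(self, response):
    item = YourItem()
    # Read back the URL stored in start_requests
    item['start_url'] = response.meta['start_url']
    yield item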
2
knightdf 2017-06-29 00:20:59 +08:00
response.request.url
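
One caveat worth checking against your own setup: with Scrapy's built-in RedirectMiddleware enabled, response.request is the request that produced the final response, so after a redirect its url may already be the redirect target. The middleware records the URLs a request passed through in request.meta['redirect_urls'], so the first entry there is the URL originally requested. A minimal sketch under that assumption follows; the helper name original_url is hypothetical and not part of Scrapy, and the stand-in objects only mimic the two attributes the helper reads.

```python
from types import SimpleNamespace


def original_url(response):
    """Return the URL that was first requested, before any redirects.

    Assumes Scrapy's RedirectMiddleware is enabled, which stores the
    chain of redirected URLs under the 'redirect_urls' meta key.
    """
    request = response.request
    redirect_urls = request.meta.get('redirect_urls')
    if redirect_urls:
        # First entry is the URL before the first redirect
        return redirect_urls[0]
    # No redirect happened, so the request URL is the original one
    return request.url


# Stand-in objects for illustration only (not real Scrapy classes)
redirected = SimpleNamespace(
    request=SimpleNamespace(
        url='http://example.com/new',
        meta={'redirect_urls': ['http://example.com/old']},
    )
)
direct = SimpleNamespace(
    request=SimpleNamespace(url='http://example.com/page', meta={})
)
```

Inside a spider callback this would be used as item['start_url'] = original_url(response).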