To make a Python crawler rotate its IP address automatically, you can use the following approaches:
1. Use the `proxies` parameter of the `requests` library:
```python
import requests
from random import choice

# Proxy IP pool (replace with working proxies)
proxy_pool = [
    "http://127.0.0.1:1080",
    "http://127.0.0.1:1081",
]

url_list = ["http://www.example.com"]

# Crawl the target pages, picking a random proxy for each request
for url in url_list:
    proxy = choice(proxy_pool)
    proxies = {"http": proxy, "https": proxy}
    response = requests.get(url, proxies=proxies, timeout=10)
    # Save the page content
    with open("output.html", "w", encoding="utf-8") as f:
        f.write(response.text)
```
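Free or shared proxies fail often, so in practice you usually want to retry a request through a different proxy when one times out. A minimal sketch of that pattern (the pool entries and URL are placeholders):

```python
import requests
from random import choice

proxy_pool = ["http://127.0.0.1:1080", "http://127.0.0.1:1081"]  # placeholder proxies

def fetch_with_rotation(url, max_retries=3):
    """Try a request through random proxies, switching on failure."""
    for _ in range(max_retries):
        proxy = choice(proxy_pool)
        try:
            return requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
        except requests.RequestException:
            continue  # this proxy failed, try another one
    raise RuntimeError(f"All {max_retries} attempts failed for {url}")
```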
2. Use a third-party library such as `Selenium`:
```python
from random import choice
from selenium import webdriver

# Proxy IP pool (replace with working proxies)
proxy_pool = [
    "127.0.0.1:1080",
    "127.0.0.1:1081",
]

# Configure Chrome to route its traffic through a randomly chosen proxy
options = webdriver.ChromeOptions()
options.add_argument(f"--proxy-server=http://{choice(proxy_pool)}")
driver = webdriver.Chrome(options=options)

url_list = ["http://www.example.com"]

# Crawl the target pages
for url in url_list:
    driver.get(url)
    # Save the page source
    with open("output.html", "w", encoding="utf-8") as f:
        f.write(driver.page_source)

driver.quit()
```
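Note that Chrome's `--proxy-server` flag is fixed for the lifetime of the browser, so switching the exit IP means restarting the driver. A sketch of that pattern (pool entries, URLs, and the rotation interval are placeholders):

```python
from random import choice
from selenium import webdriver

proxy_pool = ["127.0.0.1:1080", "127.0.0.1:1081"]  # placeholder proxies

def new_driver():
    """Start a fresh Chrome instance behind a random proxy."""
    options = webdriver.ChromeOptions()
    options.add_argument(f"--proxy-server=http://{choice(proxy_pool)}")
    return webdriver.Chrome(options=options)

url_list = ["http://www.example.com", "http://www.example.org"]

# Restart the browser every few pages to switch the exit IP
for i, url in enumerate(url_list):
    if i % 5 == 0:  # rotate every 5 pages
        if i:
            driver.quit()
        driver = new_driver()
    driver.get(url)

driver.quit()
```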
3. Use a proxy-pool service, such as `Scrapy-Redis` or `Scrapy-Proxy-Pool`. These libraries pick and rotate proxies for you once enabled; the snippet below shows the underlying idea as a hand-rolled Scrapy downloader middleware:
```python
# Hand-rolled equivalent of what a proxy-pool middleware does:
# assign a random proxy to every outgoing request
from random import choice

class CustomProxyMiddleware:
    # Proxy IP pool (replace with working proxies)
    PROXY_POOL = [
        "http://127.0.0.1:1080",
        "http://127.0.0.1:1081",
    ]

    def process_request(self, request, spider):
        # Scrapy routes the request through the proxy set in request.meta
        request.meta["proxy"] = choice(self.PROXY_POOL)
```
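To activate a custom downloader middleware like the one above, register it in your project's `settings.py` (the module path `myproject.middlewares` is a placeholder for your own project layout):

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    # Placeholder path; point this at wherever CustomProxyMiddleware lives
    "myproject.middlewares.CustomProxyMiddleware": 543,
}
```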
4. Fetch proxy IPs from an API:
```python
import requests

# Fetch a list of proxy IPs from a proxy provider's API
# (example endpoint; the response format varies by provider)
response = requests.get('http://www.you代理.net/free/getip.php')
proxies_list = response.json()

# Use one of the returned proxies to send a request
# (assumes each entry looks like 'ip:port')
url = 'http://www.example.com'
proxy = {
    'http': f"http://{proxies_list[0]}",
    'https': f"http://{proxies_list[0]}",
}
response = requests.get(url, proxies=proxy, timeout=10)
print('Response:', response.text)
```
Choose the approach that fits your needs, and make sure your proxy IP pool is of good quality and stays stable.
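Since free proxies die quickly, it also helps to validate the pool periodically and drop dead entries. A minimal health-check sketch (the test URL and pool entries are placeholders):

```python
import requests

def check_proxy(proxy, test_url="http://httpbin.org/ip", timeout=5):
    """Return True if the proxy can fetch the test URL in time."""
    try:
        r = requests.get(
            test_url,
            proxies={"http": proxy, "https": proxy},
            timeout=timeout,
        )
        return r.status_code == 200
    except requests.RequestException:
        return False

proxy_pool = ["http://127.0.0.1:1080", "http://127.0.0.1:1081"]  # placeholders
# Keep only the proxies that still work
proxy_pool = [p for p in proxy_pool if check_proxy(p)]
```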