Having a Python crawler communicate with RabbitMQ usually involves the following steps:
1. Install dependencies
Make sure you have installed `pika`, the Python client library for communicating with RabbitMQ.
```shell
pip install pika
```
2. Configure RabbitMQ
Make sure the RabbitMQ server is running and that you know the host, port, username, and password to connect with.
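If the broker is not set up yet, a typical sequence on the server looks something like the following. These commands use the standard `rabbitmqctl` tool; the `admin`/`admin` user is an assumption chosen to match the credentials used in the code below.

```shell
# Check that the broker is up
rabbitmqctl status

# Create the user and grant it full permissions on the default vhost
rabbitmqctl add_user admin admin
rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
```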
3. Create the producer
The producer sends tasks to a RabbitMQ queue.
```python
import pika

# Connection parameters
username = 'admin'
password = 'admin'
host = '127.0.0.1'
port = 5672
queue_name = 'demo_write.queue'
exchange_name = 'demo.exchange'
routing_key = 'demo'

# Create credentials
credentials = pika.PlainCredentials(username, password)

# Open a connection and get a channel
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host=host, port=port, credentials=credentials))
channel = connection.channel()

# Declare the exchange and queue, and bind them so published
# messages are actually routed into the queue
channel.exchange_declare(exchange=exchange_name, exchange_type='direct')
channel.queue_declare(queue=queue_name)
channel.queue_bind(queue=queue_name, exchange=exchange_name,
                   routing_key=routing_key)

# Publish a message
channel.basic_publish(exchange=exchange_name, routing_key=routing_key,
                      body='Hello World!')
print(" [x] Sent 'Hello World!'")

# Close the connection
channel.close()
connection.close()
```
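Real crawl tasks usually carry more than a bare string. A common convention (an assumption here, not something pika requires) is to serialize the task as JSON before publishing and decode it again in the consumer:

```python
import json

# A hypothetical crawl task: target URL plus metadata
task = {'url': 'http://example.com', 'depth': 1, 'retries': 0}

# Serialize to bytes, suitable for basic_publish(body=...)
body = json.dumps(task).encode('utf-8')

# The consumer reverses the transformation
decoded = json.loads(body.decode('utf-8'))
print(decoded['url'])  # http://example.com
```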
4. Create the consumer

The consumer receives tasks from the RabbitMQ queue and processes them.
```python
import pika

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

# Connection parameters
username = 'admin'
password = 'admin'
host = '127.0.0.1'
port = 5672
queue_name = 'demo_write.queue'

# Create credentials
credentials = pika.PlainCredentials(username, password)

# Open a connection and get a channel
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host=host, port=port, credentials=credentials))
channel = connection.channel()

# Declare the queue
channel.queue_declare(queue=queue_name)

# Consume messages
channel.basic_consume(queue=queue_name, on_message_callback=callback,
                      auto_ack=True)
print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
```
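If the producer publishes JSON task payloads, the callback must decode them before doing any work. A minimal sketch, where the payload fields are illustrative assumptions; the decoding helper can be exercised without a running broker:

```python
import json

def parse_task(body):
    """Decode a JSON-encoded task body into a dict."""
    return json.loads(body.decode('utf-8'))

def callback(ch, method, properties, body):
    task = parse_task(body)
    print(" [x] Crawling %s" % task['url'])

# Exercising the helper directly, without a broker:
print(parse_task(b'{"url": "http://example.com"}'))  # {'url': 'http://example.com'}
```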
5. Run the crawler
Send crawl tasks to the RabbitMQ queue; the consumer takes tasks off the queue and runs the crawler on each one.
```python
import pika
import requests
from bs4 import BeautifulSoup

queue_name = 'demo_write.queue'

# Open a connection and get a channel (same parameters as above)
credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='127.0.0.1', port=5672,
                              credentials=credentials))
channel = connection.channel()
channel.queue_declare(queue=queue_name)

# Producer side: publish a crawl task (a URL) via the default exchange
channel.basic_publish(exchange='', routing_key=queue_name,
                      body='http://example.com')

# Consumer side: take a URL off the queue, fetch and parse the page
def callback(ch, method, properties, body):
    url = body.decode()
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, 'html.parser')
    # Process the scraped data, e.g. print the page title
    print(soup.title)

channel.basic_consume(queue=queue_name, on_message_callback=callback,
                      auto_ack=True)
channel.start_consuming()
```
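What "process the scraped data" means depends on your crawler. As a dependency-free illustration, the standard library's `html.parser` can pull links out of a fetched page (in practice you would use the BeautifulSoup object from the code above):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="http://example.com/a">a</a> <a href="/b">b</a>')
print(extractor.links)  # ['http://example.com/a', '/b']
```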
The examples above show the basic flow of having a Python crawler communicate with RabbitMQ. In a real application you will likely need to adjust and optimize the code for your specific requirements.
