Communicating between a Python crawler and RabbitMQ typically involves the following steps:
1. Install the dependency
Make sure you have installed the `pika` library, the Python client library for communicating with RabbitMQ.

```bash
pip install pika
```
2. Configure RabbitMQ
Make sure the RabbitMQ server is running, and that you know the hostname, port, username, and password for the connection.
3. Create a producer
The producer is responsible for sending tasks to a RabbitMQ queue. Note that the exchange must be declared and bound to the queue, otherwise messages published to it have nowhere to go.

```python
import pika

# Connection parameters
username = 'admin'
password = 'admin'
host = '127.0.0.1'
port = 5672
queue_name = 'demo_write.queue'
exchange_name = 'demo.exchange'
routing_key = 'demo'

# Create credentials
credentials = pika.PlainCredentials(username, password)

# Open a connection and get a channel
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host=host, port=port, credentials=credentials))
channel = connection.channel()

# Declare the exchange and queue, then bind them so that messages
# published with this routing key are routed into the queue
channel.exchange_declare(exchange=exchange_name, exchange_type='direct')
channel.queue_declare(queue=queue_name)
channel.queue_bind(queue=queue_name, exchange=exchange_name, routing_key=routing_key)

# Publish a message
channel.basic_publish(exchange=exchange_name, routing_key=routing_key, body='Hello World!')
print(" [x] Sent 'Hello World!'")

# Close the connection
channel.close()
connection.close()
```
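For a crawler, the message body is usually more than a bare string; encoding each task as JSON keeps it extensible. A minimal sketch (the field names `url` and `retries` are illustrative choices, not anything required by pika):

```python
import json

def encode_task(url, retries=0):
    # Serialize a crawl task to bytes, suitable for basic_publish(body=...)
    return json.dumps({'url': url, 'retries': retries}).encode('utf-8')

def decode_task(body):
    # Inverse operation, for use inside the consumer callback
    return json.loads(body.decode('utf-8'))

body = encode_task('http://example.com')
task = decode_task(body)
print(task['url'])  # http://example.com
```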
4. Create a consumer
The consumer is responsible for receiving tasks from the RabbitMQ queue and processing them.

```python
import pika

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

# Connection parameters
username = 'admin'
password = 'admin'
host = '127.0.0.1'
port = 5672
queue_name = 'demo_write.queue'

# Create credentials
credentials = pika.PlainCredentials(username, password)

# Open a connection and get a channel
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host=host, port=port, credentials=credentials))
channel = connection.channel()

# Declare the queue (idempotent; matches the producer's declaration)
channel.queue_declare(queue=queue_name)

# Consume messages
channel.basic_consume(queue=queue_name, on_message_callback=callback, auto_ack=True)
print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()
```
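`auto_ack=True` acknowledges each message the moment it is delivered, so a task is lost if the consumer crashes mid-processing. A sketch of manual acknowledgement instead (the `process` function is a hypothetical stand-in for real work):

```python
def process(body):
    # Hypothetical task handler; here it just decodes the payload
    return body.decode('utf-8')

def callback(ch, method, properties, body):
    result = process(body)
    print(" [x] Processed %r" % result)
    # Acknowledge only after the work succeeded; an unacked message
    # is redelivered by RabbitMQ if this consumer dies
    ch.basic_ack(delivery_tag=method.delivery_tag)

# Register with auto_ack left at its default (False):
# channel.basic_consume(queue=queue_name, on_message_callback=callback)
```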
5. Run the crawler
Send crawl tasks (URLs) to the RabbitMQ queue with the producer from step 3; the consumer takes each task off the queue and runs the crawl. Here `requests` fetches the page and `BeautifulSoup` parses it inside the consumer callback:

```python
import pika
import requests
from bs4 import BeautifulSoup

def callback(ch, method, properties, body):
    # The message body is the URL to crawl
    url = body.decode('utf-8')
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    # Process the scraped data, e.g. print the page title
    print(" [x] Crawled %s: %s" % (url, soup.title))

credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='127.0.0.1', port=5672, credentials=credentials))
channel = connection.channel()
channel.queue_declare(queue='demo_write.queue')
channel.basic_consume(queue='demo_write.queue', on_message_callback=callback, auto_ack=True)
channel.start_consuming()
```
The examples above show the basic flow of communicating between a Python crawler and RabbitMQ. In a real application you will likely need to adjust and optimize the code for your specific requirements.