In Python, you can track file download or upload progress in several ways:
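1. Use the `tqdm` library:
`tqdm` is a fast, extensible progress-bar library that is easy to add to a loop. Because `iter_content` yields chunks rather than single bytes, advance the bar by the size of each chunk instead of wrapping the iterator directly.

    import requests
    from tqdm import tqdm

    url = 'http://example.com/file.zip'
    response = requests.get(url, stream=True)
    response.raise_for_status()
    # Content-Length may be missing; tqdm then shows a counter without a total.
    file_size = int(response.headers.get('Content-Length', 0)) or None
    filename = url.split('/')[-1]
    with open(filename, 'wb') as f, tqdm(total=file_size, unit='B',
                                         unit_scale=True, unit_divisor=1024) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # advance by bytes written, not by chunk count
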
2. Use the `reporthook` parameter of `urllib.request.urlretrieve`:

    import sys
    from urllib.request import urlretrieve

    def reporthook(count, block_size, total_size):
        # total_size is -1 when the server does not report a length.
        if total_size <= 0:
            sys.stdout.write('\r%d bytes' % (count * block_size))
        else:
            # The last block may be partial, so cap the value at 100.
            percent = min(100, count * block_size * 100 // total_size)
            sys.stdout.write('\r%d%%' % percent)
        sys.stdout.flush()

    url = 'http://example.com/file.zip'
    filename = 'file.zip'
    urlretrieve(url, filename, reporthook=reporthook)
    sys.stdout.write('\rDownload complete, saved as %s\n' % filename)

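3. Use the `logging` module to record progress:

    import logging
    import requests

    # Without a configured handler, INFO messages are not shown.
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger('my_crawler')

    def log_progress(message):
        logger.info(message)

    # Crawler code; `urls` is a placeholder list, replace it with your own pages.
    urls = ['http://example.com/']
    for i, url in enumerate(urls, start=1):
        # Fetch the page
        html = requests.get(url).text
        # Record progress
        log_progress(f'Fetched {i}/{len(urls)} pages ({len(html)} characters)')
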
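Logging every chunk of a large transfer can flood the log. As a minimal sketch, assuming you know the total size in advance, the helper below logs only when progress crosses a fixed percentage step. `make_progress_logger` and `step_percent` are hypothetical names introduced for illustration, not part of the `logging` module.

```python
import logging

logger = logging.getLogger('my_crawler')


def make_progress_logger(total, step_percent=10):
    """Return a callback that logs each time progress crosses a step boundary."""
    state = {'next_mark': step_percent}

    def report(done):
        if total <= 0:
            return  # unknown total, no percentage to report
        percent = min(100, done * 100 // total)
        if percent >= state['next_mark']:
            logger.info('Progress: %d%% (%d/%d)', percent, done, total)
            # Skip any marks jumped over by a large step.
            state['next_mark'] = (percent // step_percent + 1) * step_percent

    return report
```

Call the returned function with the cumulative count (bytes or pages) after each unit of work; it stays silent between step boundaries.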
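4. Write a custom callback function:

    import sys
    import urllib.request

    def progress_callback(block_num, block_size, total_size):
        if total_size <= 0:
            return  # size unknown, nothing to compute a percentage against
        percent = min(100, block_num * block_size * 100 // total_size)
        sys.stdout.write('\r%d%%' % percent)
        sys.stdout.flush()

    # Download code; DATA_URL and name are placeholders for your own URL and target path.
    DATA_URL = 'http://example.com/file.zip'
    name = 'file.zip'
    urllib.request.urlretrieve(DATA_URL, name, reporthook=progress_callback)
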
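The same callback signature can be reused outside `urlretrieve` if you write the read loop yourself. The sketch below is one possible way to do that, assuming the source is any binary file-like object (for example a response opened with `urllib.request.urlopen`). `copy_with_progress` is a hypothetical helper, and in-memory streams stand in for the network and the file on disk.

```python
import io


def copy_with_progress(src, dst, total_size, callback, block_size=8192):
    """Copy src to dst in blocks, calling callback(block_num, block_size, total_size)."""
    block_num = 0
    callback(block_num, block_size, total_size)  # initial call before any data is read
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)
        block_num += 1
        callback(block_num, block_size, total_size)
    return block_num


# Usage with in-memory streams standing in for a network response and a file.
calls = []
source = io.BytesIO(b'x' * 20000)
target = io.BytesIO()
copy_with_progress(
    source, target, 20000,
    lambda n, size, total: calls.append(min(100, n * size * 100 // total)),
)
```

Because the final block is usually shorter than `block_size`, the callback caps the percentage at 100, as in the `reporthook` examples.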
The methods above can help you track file download or upload progress in Python. Choose the one that best fits your needs.
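The examples so far all deal with downloads. For uploads, one common approach is to wrap the file being sent so that every `read()` reports how many bytes have been consumed. This is a minimal sketch under the assumption that your HTTP client reads the request body from a file-like object through `read()`; check that against the client's documentation. `ProgressReader` is a hypothetical class, and the loop at the end only simulates a client reading the body.

```python
import io


class ProgressReader:
    """Wrap a binary file object and report cumulative bytes read."""

    def __init__(self, fileobj, total_size, callback):
        self._fileobj = fileobj
        self._total = total_size
        self._callback = callback
        self._sent = 0

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self._sent += len(data)
        if data:
            # Report cumulative bytes handed to the caller so far.
            self._callback(self._sent, self._total)
        return data


# Simulate a client that reads the upload body in 1000-byte blocks.
progress = []
payload = io.BytesIO(b'a' * 2500)
reader = ProgressReader(
    payload, 2500,
    lambda sent, total: progress.append(sent * 100 // total),
)
while reader.read(1000):
    pass
```

The callback measures bytes read from the file, which is only an approximation of bytes actually delivered to the server.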