1. Use the BeautifulSoup library to extract the `href` attribute of `<a>` tags in the HTML:
```python
from bs4 import BeautifulSoup
import requests

url = 'http://example.com'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
# href=True keeps only <a> tags that actually carry an href attribute
urls = [a['href'] for a in soup.find_all('a', href=True)]
print(urls)
```
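The `href` values collected this way are often relative paths (e.g. `/about`) rather than full URLs. A minimal sketch of resolving them against the page URL with `urllib.parse.urljoin` from the standard library; the list of hrefs here is hypothetical:
```python
from urllib.parse import urljoin

base_url = 'http://example.com'
hrefs = ['/about', 'contact.html', 'https://other.org/page']  # hypothetical values

# urljoin resolves relative paths against the base URL and
# leaves already-absolute URLs untouched
absolute = [urljoin(base_url, h) for h in hrefs]
print(absolute)
# ['http://example.com/about', 'http://example.com/contact.html', 'https://other.org/page']
```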
2. Use a regular expression to match URLs in a string:
```python
import re

text = 'This is a URL: https://example.com'
# Match http:// or https:// followed by a run of non-whitespace characters
urls = re.findall(r'https?://[^\s]+', text)
print(urls)
```
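One caveat: `[^\s]+` keeps matching until whitespace, so in running text like `'See https://example.com.'` the final period becomes part of the match. A minimal sketch of trimming common trailing punctuation; which characters to strip is an assumption you should adjust to your data:
```python
import re

text = 'See https://example.com, or https://example.org.'
raw = re.findall(r'https?://[^\s]+', text)
# Strip characters that usually end a sentence rather than a URL
urls = [u.rstrip('.,;:!?)') for u in raw]
print(urls)  # ['https://example.com', 'https://example.org']
```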
3. Use the Requests library to fetch the HTML response, then extract URLs from it with BeautifulSoup or a regular expression:
```python
import re
import requests

url = 'http://example.com'
response = requests.get(url)
html = response.text
# Apply the regex from method 2 to the fetched HTML; the character class is
# widened so the match stops at quotes and angle brackets in HTML attributes.
# The BeautifulSoup path would be identical to method 1.
urls = re.findall(r'https?://[^\s"\'<>]+', html)
print(urls)
```
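Whichever extraction step follows, the request itself deserves basic error handling. A minimal sketch using two standard Requests features, a `timeout` and `raise_for_status()`:
```python
import requests

url = 'http://example.com'
# timeout prevents the call from hanging indefinitely;
# raise_for_status() raises an HTTPError for 4xx/5xx responses
response = requests.get(url, timeout=10)
response.raise_for_status()
html = response.text
```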
4. Use the XPath support in the `lxml` library to extract URLs:
```python
from lxml import etree
import requests

url = 'http://example.com'
response = requests.get(url)
html = response.text
tree = etree.HTML(html)
# '//@href' selects the href attribute of every element in the document
urls = tree.xpath('//@href')
print(urls)
```
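Note that `//@href` returns the `href` of every element, including `<link>` tags in the page head. Restricting the XPath to `<a>` elements keeps only hyperlinks; a small self-contained demonstration with inline HTML:
```python
from lxml import etree

html = ('<html><head><link href="style.css"></head>'
        '<body><a href="/a">A</a> <a href="/b">B</a></body></html>')
tree = etree.HTML(html)
print(tree.xpath('//a/@href'))  # ['/a', '/b']
print(tree.xpath('//@href'))    # ['style.css', '/a', '/b']
```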
5. Use the `selenium` library to drive a browser and collect all links on the page:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://example.com'
driver = webdriver.Firefox()
driver.get(url)
# find_elements_by_tag_name was removed in Selenium 4;
# find_elements(By.TAG_NAME, ...) is the current API
links = driver.find_elements(By.TAG_NAME, 'a')
for link in links:
    print(link.get_attribute('href'))
driver.quit()
```
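When no visible browser window is needed, Firefox can run headless. A minimal sketch using Selenium 4's options API; the `--headless` flag is passed straight to Firefox:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')  # run Firefox without opening a window
driver = webdriver.Firefox(options=options)
try:
    driver.get('http://example.com')
    for link in driver.find_elements(By.TAG_NAME, 'a'):
        print(link.get_attribute('href'))
finally:
    driver.quit()  # always release the browser process
```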
The methods above cover the common ways to find and extract URLs in Python. Pick the one that fits your situation: BeautifulSoup or lxml for parsing static HTML, regular expressions for plain text, and Selenium when the links are only rendered by JavaScript.