Using rotating proxies is an effective approach to web scraping, especially when you need to access a website frequently or bypass anti-crawler mechanisms. A rotating proxy automatically changes the IP address used for requests, which reduces the risk of being blocked.
Below are examples of using rotating proxies with Python's requests library and with Selenium.
Using the requests library
1. Install necessary libraries:
First, install the requests library (for example, with pip install requests).
2. Configure the rotating proxy:
Get an API key or a proxy list from your rotating proxy provider and configure it in requests; a minimal sketch of such a helper appears after this list.
3. Send requests:
Use the requests library to send HTTP requests routed through the proxy.
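The sample code below imports a get_proxy function from a placeholder module, some_rotating_proxy_service, which stands in for whatever client your provider offers. If your provider simply hands you a list of proxy addresses rather than an API, a minimal helper with the same name might look like this (the module name, function name, and addresses are illustrative, not a real package):
import itertools

# Hypothetical stand-in for some_rotating_proxy_service.get_proxy().
# Replace the addresses below with the ones your provider gives you.
_PROXY_POOL = itertools.cycle([
    '203.0.113.10:8000',
    '203.0.113.11:8000',
    '203.0.113.12:8000',
])

def get_proxy():
    # Return the next proxy address in round-robin order
    return next(_PROXY_POOL)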
Sample code:
import requests
from some_rotating_proxy_service import get_proxy  # Assuming this is the function provided by your rotating proxy service

# Get a new proxy
proxy = get_proxy()

# Set the proxy for HTTP and HTTPS traffic (the exact scheme may vary depending on the proxy service's requirements)
proxies = {
    'http': f'http://{proxy}',
    'https': f'http://{proxy}'
}

# Send a GET request through the proxy
url = 'http://example.com'
try:
    response = requests.get(url, proxies=proxies, timeout=10)
    # Process the response data
    print(response.text)
except requests.exceptions.ProxyError:
    print('Proxy error occurred')
except Exception as e:
    print(f'An error occurred: {e}')
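In practice, rotation usually means picking up a fresh proxy for each request, or at least whenever a request fails. The following sketch shows one way to do that with a simple retry loop; it reuses the same placeholder get_proxy helper, and the target URLs and attempt limit are illustrative assumptions rather than values from any particular provider:
import requests
from some_rotating_proxy_service import get_proxy  # Placeholder helper, as above

urls = ['http://example.com/page1', 'http://example.com/page2']  # Hypothetical targets
MAX_ATTEMPTS = 3

for url in urls:
    for attempt in range(MAX_ATTEMPTS):
        proxy = get_proxy()  # New proxy for every attempt
        proxies = {'http': f'http://{proxy}', 'https': f'http://{proxy}'}
        try:
            response = requests.get(url, proxies=proxies, timeout=10)
            response.raise_for_status()
            print(f'{url}: fetched {len(response.text)} characters via {proxy}')
            break  # Success, move on to the next URL
        except requests.exceptions.RequestException as e:
            print(f'{url}: attempt {attempt + 1} failed via {proxy}: {e}')
    else:
        print(f'{url}: giving up after {MAX_ATTEMPTS} attempts')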
Using Selenium
1. Install necessary libraries and drivers:
Install the Selenium library and the WebDriver for your browser (such as ChromeDriver; recent versions of Selenium can download a matching driver automatically via Selenium Manager).
2. Configure rotating proxies:
As with requests, get the proxy information from your rotating proxy provider and configure it in Selenium.
3. Launch a browser and set the proxy:
Launch a browser using Selenium and set the proxy through the browser options.
Sample code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from some_rotating_proxy_service import get_proxy # Assuming this is the function provided by your rotating proxy service
# Get a new proxy
proxy = get_proxy()
# Set Chrome options to use a proxy
chrome_options = Options()
chrome_options.add_argument(f'--proxy-server=http://{proxy}')
# Launch Chrome browser
driver = webdriver.Chrome(options=chrome_options)
# Visit the website
url = 'http://example.com'
driver.get(url)
# Process the page data
# ... (for example, use driver.page_source to get the page source, or driver.find_element to locate specific elements)
# Close the browser
driver.quit()
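Because Chrome reads the --proxy-server flag only at startup, rotating the proxy with Selenium generally means starting a fresh browser session for each proxy. Here is a minimal sketch of that pattern, again assuming the placeholder get_proxy helper and hypothetical target URLs, with try/finally so the browser is always closed:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from some_rotating_proxy_service import get_proxy  # Placeholder helper, as above

urls = ['http://example.com/page1', 'http://example.com/page2']  # Hypothetical targets

for url in urls:
    proxy = get_proxy()  # Fresh proxy (and browser session) for each page
    chrome_options = Options()
    chrome_options.add_argument(f'--proxy-server=http://{proxy}')
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.get(url)
        print(f'{url}: page title is {driver.title!r} (via {proxy})')
    finally:
        driver.quit()  # Always release the browser, even if loading the page fails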
Things to note
Make sure the rotating proxy service is reliable and offers a large enough IP pool, so that the same addresses are not reused so often that they get blocked.
Plan your scraping tasks properly according to the pricing and usage limits of the rotating proxy service.
When using Selenium, take care to close the browser and release resources (for example, by always calling driver.quit()) to avoid memory leaks or orphaned browser processes.
Comply with the target website's robots.txt file and terms of service to avoid legal disputes; a minimal robots.txt check is sketched below.
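Python's standard library includes urllib.robotparser, which can check whether a given URL may be fetched before you scrape it. A small sketch, where the user agent string and URLs are illustrative placeholders:
from urllib import robotparser

# Hypothetical user agent and target; replace with your own values
USER_AGENT = 'my-scraper'
TARGET_URL = 'http://example.com/some/page'

rp = robotparser.RobotFileParser()
rp.set_url('http://example.com/robots.txt')
rp.read()  # Download and parse the site's robots.txt

if rp.can_fetch(USER_AGENT, TARGET_URL):
    print('Allowed to fetch this URL')
else:
    print('robots.txt disallows fetching this URL')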