Web scraping is the process of extracting data from websites. Python offers several libraries for it; requests (for fetching pages) and Beautiful Soup (for parsing HTML) are popular choices. Here's a basic example that uses these two libraries to extract the titles of articles from a webpage:
First, install the required libraries if you haven't already:
pip install requests beautifulsoup4
Source Code:
import requests
from bs4 import BeautifulSoup

# URL of the webpage to scrape
url = 'https://example.com'  # Replace with the URL of the website you want to scrape

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content of the webpage using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the <h1> tags (assumed to contain article titles)
titles = soup.find_all('h1')

# Extract and print the text of each title
for title in titles:
    print(title.text.strip())
Replace 'https://example.com' with the actual URL of the website you want to scrape. This code fetches the content of the webpage, parses it using Beautiful Soup, finds all the <h1> tags (which might contain article titles), and prints the text of each title.
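The script above assumes the request succeeds and that every heading contains text. As a rough sketch of a slightly more defensive variant, the same steps can be split into two functions so the parsing part can be tried on a plain HTML string without touching the network. The function names extract_titles and fetch_titles, the tag parameter, and the 10-second timeout are illustrative assumptions, not part of the original example.

```python
import requests
from bs4 import BeautifulSoup


def extract_titles(html, tag="h1"):
    """Return the stripped text of every `tag` element in an HTML string.

    `extract_titles` is a hypothetical helper name chosen for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for element in soup.find_all(tag):
        text = element.get_text(strip=True)
        if text:  # skip headings that contain only whitespace
            titles.append(text)
    return titles


def fetch_titles(url, tag="h1", timeout=10):
    """Download `url` and return its titles.

    Raises requests.RequestException on network errors or on
    4xx/5xx responses. The 10-second timeout is an assumed default.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return extract_titles(response.text, tag)
```

Calling fetch_titles('https://example.com') would return a list of strings that can be printed or saved. Passing a different tag, such as 'h2', may be needed on sites that do not put article titles in <h1> elements.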
Remember, when scraping websites, ensure you're following their terms of service and robots.txt guidelines. Some websites may prohibit scraping or have specific rules you need to adhere to while extracting data. Always check a website's terms and conditions before scraping its content.
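The robots.txt check can be partly automated with Python's standard library. The following is a minimal sketch, assuming the rules are already available as text; the helper name is_allowed and the sample rules in SAMPLE_ROBOTS are made up for illustration. It does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only to illustrate the check
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text.

    `is_allowed` is an illustrative helper, not a library function.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # load the rules from a string
    return parser.can_fetch(user_agent, url)
```

For a live site, RobotFileParser can also download the file itself: call set_url() with the address of the site's robots.txt, then read(), and then use can_fetch() in the same way before sending any requests.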