Scrape News Headlines With Python in <10 Lines of Code!

Code_Jedi - Aug 3 '21 - Dev Community

Today I'll show you a way to scrape news headlines in Python in under 10 lines of code!


Let's get started...

First of all, make sure to import these libraries at the beginning of your Python script:



import requests
from bs4 import BeautifulSoup



For this tutorial, I'll be using BBC News as my news source. Use these 2 lines of code to request its front page:



url = 'https://www.bbc.com/news'
response = requests.get(url)


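A request can fail or come back as an error page, so it can help to check the response before parsing it. Here is a minimal sketch of that idea, assuming a 10-second timeout suits you; the `fetch_html` helper name and the timeout value are my own picks for illustration and are not part of the script in this post:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page's HTML as text, or raise if the request fails.

    The helper name and the 10-second default are illustrative assumptions.
    """
    # Without a timeout, requests can wait indefinitely on a stalled server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return response.text
```

If you go this route, `fetch_html(url)` would stand in for `response.text` in the rest of the tutorial.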

Now we're ready to scrape using BeautifulSoup!

Head over to BBC News and inspect a news headline by right-clicking it and pressing "Inspect".
As you'll see, at the time of writing the news headlines are contained within "h3" tags:
[Image: h3 tags]
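Sites change their markup over time, so it may be worth double-checking which heading tag a page uses before relying on it. Here is a rough sketch that counts heading tags with BeautifulSoup. The `sample_html` string is made up for illustration (it is not real BBC markup); with a live page you would pass in the HTML you downloaded instead:

```python
from collections import Counter
from bs4 import BeautifulSoup

# Made-up HTML for illustration; a real page would come from response.text
sample_html = """
<body>
  <h1>Site name</h1>
  <h3>Headline one</h3>
  <h3>Headline two</h3>
  <h2>Section title</h2>
</body>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
# Passing a list to find_all matches any of the listed tag names
heading_counts = Counter(tag.name for tag in soup.find_all(['h1', 'h2', 'h3']))
# The most frequent heading tag is a good candidate for the headlines
print(heading_counts.most_common())
```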


Now add these 4 lines of code to scrape and display all the h3 tags from BBC news:



soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
for x in headlines:
    print(x.text.strip())


  • First, we define "soup" as the parsed HTML of the BBC News webpage.
  • Next, we define "headlines" as a list of all the h3 tags found within the page's body.
  • Finally, we loop through the "headlines" list and print each headline, using ".text" to drop the surrounding HTML tags and ".strip()" to trim leading and trailing whitespace.
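To see what those steps do without hitting the network, here is a small sketch that runs the same calls on a hand-written snippet. The `sample_html` string and the `cleaned` list are made up for illustration and are not real BBC markup:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a news page (not real BBC markup)
sample_html = """
<html><body>
  <h3>  First headline </h3>
  <p>Some other text</p>
  <h3><a href="/story">Second headline</a></h3>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
headlines = soup.find('body').find_all('h3')

# .text drops the tags (including the nested <a>); .strip() trims whitespace
cleaned = [x.text.strip() for x in headlines]
print(cleaned)  # ['First headline', 'Second headline']
```

Notice that the paragraph is skipped because it isn't an h3, and the link inside the second headline is reduced to just its text.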

Full code



import requests
from bs4 import BeautifulSoup

url = 'https://www.bbc.com/news'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
for x in headlines:
    print(x.text.strip())



Now if you run your script, your output should look something like this:
[Image: h3 results]


Byeeeee👋
