
Web scraping with Python and BeautifulSoup

BeautifulSoup is a Python library that makes web scraping very easy. You can install it with ‘pip’: just type ‘pip install bs4’. If you prefer, you can install it inside a virtual environment.

Let’s start by creating a Python file called ‘crawler.py’. By the way, we are going to be using Python 3, because it’s awesome and that is what I want to use.

In the Python file, we are going to import BeautifulSoup along with ‘urlopen’ and ‘Request’, which we are going to use for web scraping.
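The original import lines are not shown here, so the following is only a sketch of what the top of ‘crawler.py’ could look like. It assumes ‘urlopen’ and ‘Request’ come from the standard library module ‘urllib.request’ and that BeautifulSoup was installed as described above.

```python
# crawler.py
# Request and urlopen ship with Python 3 in the standard library.
from urllib.request import Request, urlopen

# BeautifulSoup lives in the third-party bs4 package (pip install bs4).
from bs4 import BeautifulSoup
```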

We are going to use Request (strictly speaking a class, not a function) to set the ‘User-Agent’ header, because some websites won’t allow you to crawl them with Python’s default user agent (‘Python-urllib/3.x’).
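Here is a minimal sketch of setting that header. The URL and the User-Agent string are placeholders made up for illustration, not values from the article; use whatever suits the site you are crawling.

```python
from urllib.request import Request

# Hypothetical target URL and User-Agent string, chosen only for illustration.
URL = "https://example.com/"
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; crawler.py)"}

# Building the Request does not touch the network yet;
# it only records the URL and the headers to send later.
request = Request(URL, headers=HEADERS)

# Request stores header names capitalized as "User-agent",
# so that is the spelling to use when reading the header back.
print(request.get_header("User-agent"))
```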

We use the urlopen function to open the URL and the read method to read the HTML of the page. Then you can use a load of methods provided by BeautifulSoup to get the data you need from the page. To see all of them, you can just type “dir(BeautifulSoup)” in the Python console and it will give you a list of the methods you can use.
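The fetching and parsing steps could be sketched roughly as below. The helper names ‘fetch_html’ and ‘make_soup’, the default User-Agent string, and the choice of the built-in ‘html.parser’ are my assumptions for this example, not something the article prescribes.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, user_agent="Mozilla/5.0 (compatible; crawler.py)"):
    """Download a page and return its raw HTML as bytes (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": user_agent})
    # urlopen performs the HTTP request; read() returns the response body.
    with urlopen(request) as response:
        return response.read()


def make_soup(html):
    """Parse HTML (bytes or str) using Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


# Typical use (needs network access, so it is left commented out):
# soup = make_soup(fetch_html("https://example.com/"))
# print(soup.title.string)
```

Keeping the download and the parsing in separate functions makes it easy to try the parsing on a saved HTML string without hitting the website every time.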

BeautifulSoup’s ‘find_all’ method, called with ‘h2’, will give you a list of all the h2 tags in the webpage. To get a tag with a specific class, pass the class name through the ‘class_’ argument.
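As a small sketch of both lookups, the example below parses an inline HTML string instead of a live page. The sample markup and the class name ‘headline’ are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">First story</h2>
  <h2>Second story</h2>
  <p class="headline">Not a heading</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every h2 tag on the page, in document order.
all_h2 = soup.find_all("h2")

# Only the h2 tags that carry the class "headline".
# The argument is spelled class_ because class is a reserved word in Python.
headline_h2 = soup.find_all("h2", class_="headline")

print([tag.get_text() for tag in all_h2])       # ['First story', 'Second story']
print([tag.get_text() for tag in headline_h2])  # ['First story']
```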

Thank you very much for reading. Feel free to comment; this is my first article, and I welcome contributions and suggestions.
