Web Scraping Using Python
What is Web Scraping?
Web scraping is an automated way to extract large amounts of data from websites. It collects the unstructured data on a web page and stores it in a structured form.
How can you extract data from a website?
Follow these steps to extract data from a website.
- Find the URL that you want to scrape
- Inspect the page
- Find the data you want to extract
- Write the code
- Run the code and extract the data
- Store the data in the required format
So, first you need to find the URL you want to scrape, inspect the page, and identify both the data you want to extract and the HTML elements (divs) that contain it.
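The steps above can be tried on a small inline HTML snippet before pointing the code at a live site. This is only a minimal sketch: the markup, the class names (`card`, `title`, `price`), and the `extract_products` helper are made up for illustration, and a real page will use different ones, which is why the inspection step matters.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="card"><div class="title">Laptop A</div><div class="price">45,000</div></div>
<div class="card"><div class="title">Laptop B</div><div class="price">52,500</div></div>
<div class="card"><div class="title">Laptop C</div></div>
"""


def extract_products(html):
    """Return one dict per card that has both a title and a price."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="card"):
        title = card.find("div", class_="title")
        price = card.find("div", class_="price")
        if title is None or price is None:
            continue  # incomplete card, skip it
        rows.append({"name": title.text, "price": price.text})
    return rows


rows = extract_products(SAMPLE_HTML)
print(rows)
```

The third card has no price, so it is skipped rather than producing a half-filled row. Keeping each product as a single record avoids parallel lists drifting out of step.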
I used the following libraries for data scraping.
- Selenium to drive the Chrome browser through ChromeDriver
- Beautiful Soup for data scraping
- Pandas for data manipulation
For this exercise, I scraped data from the Flipkart website.
The following is the code that I wrote and ran to scrape the data.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import pandas as pd

# Selenium 4 takes the ChromeDriver path through a Service object.
driver = webdriver.Chrome(service=Service("C:/Users/HP/Downloads/chromedriver"))

products = []
prices = []
ratings = []
specifications = []

# Load the listing page, keep the rendered HTML, then close the browser.
driver.get("https://www.flipkart.com/laptops/pr?sid=6bo%2Cb5g&marketplace=FLIPKART&p%5B%5D=facets.price_range.from%3D40000&p%5B%5D=facets.price_range.to%3DMax")
content = driver.page_source
driver.quit()

soup = BeautifulSoup(content, "html.parser")

# Each product card is a div with these classes.
for element in soup.find_all('div', attrs={'class': '_1AtVbE col-12-12'}):
    name = element.find('div', attrs={'class': '_4rR01T'})
    price = element.find('div', attrs={'class': '_30jeq3 _1_WHN1'})
    rating = element.find('div', attrs={'class': '_3LWZlK'})
    specification = element.find('div', attrs={'class': 'fMghEO'})
    # Skip cards missing any field so the four lists stay the same length.
    if any(field is None for field in (name, price, rating, specification)):
        continue
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
    specifications.append(specification.text)

df = pd.DataFrame({'ProductName': products, 'Price': prices, 'Rating': ratings, 'Specifications': specifications})
df.to_csv('products.csv', index=False, encoding='utf-8')
After running this code, I get results like those below.
The data is now stored in a structured form: a CSV file that we can load and work with efficiently.
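As a rough illustration of working with the stored data, the sketch below loads rows shaped like the CSV and turns the price text into numbers with pandas. The sample rows, the price format (a currency symbol with thousands separators), and the `PriceValue` column name are assumptions made for the example, not actual scraped output. An in-memory buffer stands in for the file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical rows using a subset of the CSV's columns.
SAMPLE_CSV = io.StringIO(
    "ProductName,Price,Rating\n"
    'Laptop A,"₹45,990",4.3\n'
    'Laptop B,"₹62,490",4.6\n'
)

df = pd.read_csv(SAMPLE_CSV)

# Strip everything except digits, then convert the result to integers.
df["PriceValue"] = df["Price"].str.replace(r"[^\d]", "", regex=True).astype(int)

# Example query: products rated 4.5 or higher.
top_rated = df[df["Rating"] >= 4.5]
print(top_rated[["ProductName", "PriceValue"]])
```

With the real file, `pd.read_csv('products.csv')` would replace the in-memory buffer.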