TAAFT

Christopher j Self

@christopherjsel Tasks: 34
🛠️ 1 tool 🙏 78 karma
Enthusiast
Joined: July 2024

Christopher j Self's tools

  • Ultimate AI
    Generate flawless code in any language with pinpoint accuracy.
    Released 1y ago
    100% Free
Sure! Below is a basic Python script that demonstrates how to scrape data from a website and store it in a CSV file, using the `requests` library for fetching web pages and `BeautifulSoup` from the `bs4` library for parsing HTML. Additionally, we'll use the `csv` module to create and write to a CSV file.

### Prerequisites

You'll need to install the following libraries if you haven't done so already:

```bash
pip install requests beautifulsoup4
```

### Example Script

Here's a simple example that scrapes quotes from a sample website, "http://quotes.toscrape.com". This website is designed for practicing web scraping.

```python
import requests
from bs4 import BeautifulSoup
import csv

# URL of the website to scrape
url = 'http://quotes.toscrape.com'

# Send a GET request to the website
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the response content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Open a CSV file for writing
    with open('quotes.csv', mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        # Write the header row
        writer.writerow(['Quote', 'Author'])

        # Find all quote elements
        quotes = soup.find_all('div', class_='quote')

        # Loop through each quote and extract the text and author
        for quote in quotes:
            text = quote.find('span', class_='text').get_text()
            author = quote.find('small', class_='author').get_text()
            # Write the quote and author to the CSV file
            writer.writerow([text, author])

    print("Quotes have been written to quotes.csv")
else:
    print("Failed to retrieve the web page. Status code:", response.status_code)
```

### Explanation

1. **Requests**: We use the `requests` library to send an HTTP GET request to the specified URL.
2. **BeautifulSoup**: This library parses the HTML content, allowing us to navigate and search the parse tree.
3. **CSV**: The `csv` module enables us to easily write data to a CSV file.

### How to Run the Script

1. Copy the above script into a Python file (e.g., `scraper.py`).
2. Make sure you have the `requests` and `beautifulsoup4` libraries installed.
3. Run the script using Python in your terminal or command prompt:

```bash
python scraper.py
```

### Important Notes

- **Respecting robots.txt**: Before scraping any website, always check its `robots.txt` file to see whether scraping is allowed.
- **Politeness**: To avoid overloading the server, add delays between requests if scraping multiple pages.
- **Data handling**: Ensure that any data you collect complies with relevant laws and ethical standards.

### Further Reading

- [Requests Documentation](https://docs.python-requests.org/en/master/)
- [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Python CSV Documentation](https://docs.python.org/3/library/csv.html)

This script is a straightforward introduction to web scraping in Python. Depending on your needs, you can expand it by adding features like pagination, error handling, and data cleaning. Happy scraping! 🚀
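The robots.txt note can be made concrete with the standard library's `urllib.robotparser`. This is a minimal offline sketch that parses an inline, made-up rules snippet rather than the site's real policy; in practice you would point the parser at the live `/robots.txt` with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Inline robots.txt rules for illustration only; for a real site you would use
# rp.set_url('http://quotes.toscrape.com/robots.txt') followed by rp.read().
rules = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(user_agent, url) reports whether the rules permit the request
print(rp.can_fetch('*', 'http://quotes.toscrape.com/'))           # allowed
print(rp.can_fetch('*', 'http://quotes.toscrape.com/private/x'))  # disallowed
```

Checking `can_fetch` once per URL before calling `requests.get` is enough to keep a small crawler compliant.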
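The pagination and politeness points can be sketched as follows. To stay runnable without network access, this uses an inline HTML fragment shaped like quotes.toscrape.com's pager (the `li.next > a` selector is an assumption about that site's markup); comments mark where `requests.get` and `time.sleep` would go in a real multi-page crawl.

```python
from bs4 import BeautifulSoup

# Fragment standing in for a fetched page; a real crawl would obtain this
# from requests.get(base_url + path).text inside the loop.
html = '''
<div class="quote"><span class="text">"A sample quote."</span>
  <small class="author">Sample Author</small></div>
<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# Same extraction as the main script
for quote in soup.find_all('div', class_='quote'):
    print(quote.find('span', class_='text').get_text(),
          '-', quote.find('small', class_='author').get_text())

# The pager's "Next" link holds the relative URL of the following page.
# A full crawl loops until select_one returns None, calling time.sleep
# between requests so the server is not overloaded.
next_link = soup.select_one('li.next > a')
print('next page:', next_link['href'] if next_link else None)
```

When the last page is reached the pager has no `li.next` element, `select_one` returns `None`, and the loop terminates naturally.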