In this tutorial, we’ll walk through the process of creating a Django web application for scraping data from a given URL. We’ll use Python, Django, and BeautifulSoup to build a simple yet effective web scraper. The application will allow users to input a URL, scrape data from the provided link, and display information such as the title, paragraphs, and extracted URLs.
Prerequisites
Before we begin, ensure that you have the following installed:
- Python (https://www.python.org/)
- Django (https://www.djangoproject.com/)
- BeautifulSoup (https://www.crummy.com/software/BeautifulSoup/)
Installation
- Install Python: If you don’t have Python installed, download and install it from
- Install Django: Install Django using pip, the Python package manager.
pip install django
- Install BeautifulSoup: Install BeautifulSoup for HTML parsing.
pip install beautifulsoup4
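- Install Requests: The scraper view shown later fetches pages with the requests library, so install it as well.
pip install requests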
Project Setup
Let’s start by setting up our Django project and app:
- Create a Django Project:
django-admin startproject url_scraper
cd url_scraper
- Create a Django App:
python manage.py startapp scraper_app
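With the app created, register it in the project settings so Django can discover the app's templates. Below is a minimal excerpt of url_scraper/settings.py with the standard defaults, where only the 'scraper_app' entry is new:
# url_scraper/settings.py (excerpt)
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'scraper_app',  # register our app so its templates folder is picked up
]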
Writing the Code
scraper_app/forms.py
In the forms.py file inside the scraper_app folder, we define a simple form for user input.
# scraper_app/forms.py
from django import forms


class ScrapeForm(forms.Form):
    # Single URL input; URLField validates the format before the view runs
    link = forms.URLField(label='Enter URL', required=True)
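Because link is a URLField, Django validates the input before the view ever fetches anything. As a quick optional check, you can exercise the form from python manage.py shell; the URLs below are just placeholders:
# Run inside `python manage.py shell` to see URLField validation at work
from scraper_app.forms import ScrapeForm

bad = ScrapeForm({'link': 'not-a-url'})
print(bad.is_valid())      # False: malformed input is rejected
print(bad.errors['link'])  # contains Django's "Enter a valid URL." message

good = ScrapeForm({'link': 'https://example.com'})
print(good.is_valid())     # True: the view receives a cleaned URL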
scraper_app/views.py
In the views.py file, we implement the logic for scraping data from the provided URL.
# scraper_app/views.py
from django.shortcuts import render
from .forms import ScrapeForm
from bs4 import BeautifulSoup
import requests
def scrape_data(request):
    if request.method == 'POST':
        form = ScrapeForm(request.POST)
        if form.is_valid():
            link = form.cleaned_data['link']
            # verify=False disables SSL certificate checks (fine for a quick demo,
            # remove it in production); timeout keeps the view from hanging
            response = requests.get(link, verify=False, timeout=10)
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'html.parser')
                # Extract every hyperlink from the HTML using BeautifulSoup
                urls = [a['href'] for a in soup.find_all('a', href=True)]
                # Remove empty and None URLs
                urls = [url for url in urls if url]
                scraped_data = {
                    'title': soup.title.text if soup.title else 'No title found',
                    'paragraphs': [p.text for p in soup.find_all('p')],
                    'urls': urls,
                    'url_count': len(urls),
                }
                return render(request, 'scraped_data.html', {'data': scraped_data, 'form': form})
            else:
                return render(request, 'scraped_data.html', {'error': f'Error: {response.status_code}', 'form': form})
    else:
        form = ScrapeForm()
    return render(request, 'scrape_form.html', {'form': form})
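Note that requests.get raises an exception, rather than returning a status code, when the host is unreachable or the request times out. If you want the page to fail gracefully in that case, one possible refinement (not required for the tutorial) is to wrap the call and reuse the template's error message:
# Optional sketch: guard the request against network errors
try:
    response = requests.get(link, verify=False, timeout=10)
except requests.exceptions.RequestException as exc:
    return render(request, 'scraped_data.html', {'error': f'Error: {exc}', 'form': form})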
scraper_app/templates/scrape_form.html
Inside scraper_app, create a templates folder if it does not already exist, then add a file named scrape_form.html for the input form.
<!-- scraper_app/templates/scrape_form.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Django URL Data Scraper</title>
</head>
<body>
<h1>Django URL Data Scraper</h1>
<form method="post" action="{% url 'scrape_data' %}">
{% csrf_token %}
{{ form }}
<button type="submit">Scrape Data</button>
</form>
</body>
</html>
scraper_app/templates/scraped_data.html
Create another file named scraped_data.html in the same templates folder for displaying the scraped data.
<!-- scraper_app/templates/scraped_data.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Scraped Data</title>
<!-- Adding a playful and vibrant style -->
<style>
body {
font-family: 'Comic Sans MS', cursive, sans-serif;
background-color: #ffe6e6;
color: #333;
margin: 20px;
text-align: center;
}
header {
background-color: #ff4500;
padding: 10px;
margin-bottom: 20px;
}
h1 {
color: #fff;
margin-bottom: 20px;
}
h2 {
color: #1e90ff;
border-bottom: 2px solid #1e90ff;
padding-bottom: 10px;
margin-top: 20px;
}
ul {
list-style-type: none;
padding: 0;
}
li {
margin-bottom: 10px;
font-size: 18px;
}
a {
color: #4caf50;
text-decoration: none;
font-weight: bold;
}
form {
margin-top: 20px;
display: flex;
justify-content: center;
}
input {
padding: 10px;
font-size: 16px;
}
button {
padding: 10px 20px;
background-color: #ff4500;
color: #fff;
border: none;
cursor: pointer;
border-radius: 5px;
font-size: 16px;
}
</style>
</head>
<body>
<header>
<h1>{{ data.title }}</h1>
<form method="post" action="{% url 'scrape_data' %}">
{% csrf_token %}
{{ form }}
<button type="submit">Scrape Again</button>
</form>
</header>
{% if error %}
<p style="color: #ff4500; font-weight: bold;">{{ error }}</p>
{% endif %}
{% if data.paragraphs %}
<h2>Paragraphs:</h2>
<ul>
{% for paragraph in data.paragraphs %}
<li>{{ paragraph }}</li>
{% endfor %}
</ul>
{% endif %}
{% if data.urls %}
<h2>Extracted URLs ({{ data.url_count }} found):</h2>
<ul>
{% for url in data.urls %}
<li><a href="{{ url }}" target="_blank">{{ url }}</a></li>
{% endfor %}
</ul>
{% endif %}
</body>
</html>
url_scraper/urls.py
Finally, configure the project’s URLs in the urls.py file.
# url_scraper/urls.py
from django.contrib import admin
from django.urls import path
from scraper_app.views import scrape_data
urlpatterns = [
    path('admin/', admin.site.urls),
    path('', scrape_data, name='scrape_data'),
]
Running the Application
- Run the development server:
python manage.py runserver
- Access the application at http://127.0.0.1:8000/.
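Note: on its first start the development server may warn about unapplied migrations for Django's built-in apps (admin, auth, sessions). The scraper itself does not use the database, but you can clear the warning by applying them first:
python manage.py migrate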
This tutorial walked you through building a Django URL Data Scraper, providing a foundation for your own web scraping projects. Customize and expand upon this project based on your specific requirements. Happy coding!