I’d suggest making a separate module which does the scraping:
scrape.py

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

def get_deaths():
    page_url = 'https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html'
    print('Hitting URL:', page_url)
    uClient = uReq(page_url)
    page_soup = soup(uClient.read(), 'html.parser')
    uClient.close()
    containers = page_soup.find_all('div', {'class': 'callout'})
    deaths = containers[0].span.text
    return deaths
if __name__ == '__main__':
    deaths = get_deaths()
    print(deaths)
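Since the page layout is outside your control, you may want the parsing step to fail gracefully instead of raising an `IndexError` or `AttributeError` when the markup changes. A minimal sketch of that idea, with the parsing pulled into its own function so it can be exercised against a static HTML string (`extract_deaths` and the sample markup are illustrative, not part of the code above):

```python
from bs4 import BeautifulSoup

SAMPLE = '<div class="callout"><span>123</span></div>'

def extract_deaths(html):
    # Defensive version of the parsing step: return None rather than
    # raising if the expected div/span structure is missing.
    page_soup = BeautifulSoup(html, 'html.parser')
    containers = page_soup.find_all('div', {'class': 'callout'})
    if not containers or containers[0].span is None:
        return None
    return containers[0].span.text

print(extract_deaths(SAMPLE))                  # 123
print(extract_deaths('<p>layout changed</p>')) # None
```

`get_deaths` could then call `extract_deaths(uClient.read())` and decide what to do when it gets `None` back.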
You can now run this manually with python scrape.py, or import the function into another module, such as your Flask app. Of course, this only pulls a single figure, but if you did want to render it in a table you could do:
app.py
app.py
from flask import Flask, render_template
from scrape import get_deaths

app = Flask(__name__)

@app.route('/')
def index():
    # Create a list of column headers
    headers = ['Deaths']
    # Now the data: a list where each item is a list containing the row values
    objects = [[get_deaths()]]
    return render_template('index.html', headers=headers, objects=objects)
Which can then be rendered with the following template:
templates/index.html
{% if objects %}
<table id='result' class='display'>
  <thead>
    <tr>
      {% for header in headers %}
      <th>{{ header }}</th>
      {% endfor %}
    </tr>
  </thead>
  <tbody>
    {% for object in objects %}
    <tr>
      {% for item in object %}
      <td>{{ item }}</td>
      {% endfor %}
    </tr>
    {% endfor %}
  </tbody>
</table>
{% endif %}
Notice that this avoids hard-coding column names into the template file, provided you pass the headers list yourself from Python.
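Because the template is driven entirely by headers and objects, extending it to more columns is just a matter of building those two lists. A small hypothetical helper showing the shape the template expects (the 'Cases' column and the figures here are placeholder values, not something the CDC page is guaranteed to provide):

```python
def build_table(figures):
    """Turn a dict of {column name: value} into the headers/objects
    structure the index.html template expects."""
    headers = list(figures)
    objects = [[figures[h] for h in headers]]
    return headers, objects

# Placeholder figures; in practice each value would come from a scraper.
headers, objects = build_table({'Cases': '1000', 'Deaths': '50'})
print(headers)  # ['Cases', 'Deaths']
print(objects)  # [['1000', '50']]
```

The view function would then pass both results straight to render_template as before.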
Run this with flask run and connect to http://localhost:5000 in your browser.
You may also wish to have a look at my repo search-generic-tables, which includes the JS library DataTables in the template to add things like search and pagination on the fly.
Be aware that the above code re-scrapes the site every time someone hits the page. There are a few ways around this. One option is to use flask_caching: run pip install flask_caching, then decorate the index view function to define a timeout.
# ...
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(config={'CACHE_TYPE': 'simple'})
cache.init_app(app)

@app.route('/')
@cache.cached(timeout=50)
def index():
    # Rest of code.
Now the rendered page will be cached for 50 seconds, so the CDC site is scraped at most once in that window no matter how many visitors hit the route.
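To see why this helps, here is a rough stdlib-only sketch of what a time-based cache decorator does (this is an illustration of the idea, not flask_caching's actual implementation):

```python
import time

def cached(timeout):
    """Remember the wrapped function's result for `timeout` seconds
    and only recompute after it expires."""
    def decorator(fn):
        state = {'value': None, 'expires': 0.0}
        def wrapper():
            now = time.monotonic()
            if now >= state['expires']:
                state['value'] = fn()
                state['expires'] = now + timeout
            return state['value']
        return wrapper
    return decorator

calls = []

@cached(timeout=50)
def slow_scrape():
    calls.append(1)  # stands in for the expensive network request
    return 'Deaths: 123'

slow_scrape()
slow_scrape()
print(len(calls))  # 1 -- the second call was served from the cache
```

flask_caching layers the same idea onto the request/response cycle (and supports backends like Redis via CACHE_TYPE), but the trade-off is identical: visitors may see a figure up to timeout seconds stale.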