Solution 1 :

Try this:

from pprint import pprint

import bs4
import requests

resp = requests.get("https://www.visitmonmouth.com/page.aspx?Id=5017")
resp.raise_for_status()  # fail loudly on any HTTP error; unlike assert, this still runs under python -O
soup = bs4.BeautifulSoup(resp.content, 'html.parser')

dates = []
links = []
for tag in soup.find('div', {'id': 'content'}).find_all('li'):
    # Take the first space-separated token of the item's first child, minus any colon.
    date = str(tag.contents[0]).strip().replace(':', '').split(' ')[0]
    if date.count('/') == 2:  # crude date check; a regexp would be stricter
        a = tag.find('a')
        if a is not None:
            href = a.attrs['href'].strip()
            if href.startswith('http'):  # keep absolute links only; a regexp would also work
                print(date, href)
                dates.append(date)
                links.append(href)
my_dict = dict(zip(dates, links))
pprint(my_dict, indent=2)

The output will look like:

7/18/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2962
7/17/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2960
7/16/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2959
7/15/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2958
7/14/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2956
7/13/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2955
7/12/20 https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2954

And the data will be available in a dictionary called my_dict, which maps each date to its link.
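If you want to replace the two crude checks with regular expressions, here is a minimal sketch. It assumes dates look like 7/18/20 at the start of the item text and that you only want absolute http(s) links. The names DATE_RE, URL_RE and links_by_date are made up for illustration, and the sample list is hand-written (the last two entries are hypothetical) so the snippet runs without fetching the page.

```python
import re

# Assumed formats: a date such as 7/18/20 at the start of the text,
# and an absolute link starting with http:// or https://.
DATE_RE = re.compile(r"^\s*(\d{1,2}/\d{1,2}/\d{2,4})\b")
URL_RE = re.compile(r"^https?://", re.IGNORECASE)


def links_by_date(items):
    """Map each date to its link, skipping pairs that fail either check.

    `items` is an iterable of (text, href) pairs, for example the leading
    text and the first link of each <li> on the page.
    """
    result = {}
    for text, href in items:
        date_match = DATE_RE.match(text)
        if date_match is None or href is None:
            continue
        href = href.strip()
        if URL_RE.match(href):
            result[date_match.group(1)] = href
    return result


sample = [
    ("7/18/20: ", "https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2962"),
    ("7/17/20: ", " https://www.co.monmouth.nj.us/PressDetail.aspx?ID=2960 "),
    ("Archive", "https://example.com/archive"),  # no date, skipped
    ("7/16/20: ", "/relative/link"),             # not an absolute link, skipped
]
by_date = links_by_date(sample)
print(by_date.get("7/18/20"))  # look up the link for one date
```

Looking a date up with .get() returns None when the date is missing, so you can check for that before opening the URL with your sub-site scraper.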

Problem :

I am new to Python. I have a page that lists links to other pages. I need to get the href based on the date in the span style tag, then open that URL so I can grab data from it. I already have the scraper for the sub-site.

How do you read the site, find the date, then pull the HTML as a dictionary? I can get the date in one line and the HTML list in the other.

import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "https://www.visitmonmouth.com/page.aspx?Id=5017"
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

tags = soup('a')
title = soup.title
print(title)
# Get all HTML links.
for daily in tags:
    print(daily.get('href',None))
    c_date = soup.find_all(string=re.compile('7/18/20:'))
print(c_date)

Comments

Comment posted by orphyux

Huge thanks! This is very helpful. I will look into the regex as you recommended.

Comment posted by Philippe Remy

I’ve updated my answer with the dictionary. I am happy you found it useful!
