Solution 1:

You can store the base URL of the website in a variable, and once you get the href from the link, join the two to build the next URL.

import requests
from bs4 import BeautifulSoup

base_url = "http://www.harness.org.au"

webpage_response = requests.get('http://www.harness.org.au/racing/results/?activeTab=tab')

soup = BeautifulSoup(webpage_response.content, "html.parser")

# only finding one track
# soup.table to find all links for days racing
harness_table = soup.table
# scrapes an href that is an incomplete URL that I'm trying to reach
for link in soup.select(".meetingText > a"):
    webpage = requests.get(base_url + link["href"])

    new_soup = BeautifulSoup(webpage.content, "html.parser")

    # work through table to get links to tracks
    print(new_soup)
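If you prefer not to concatenate strings by hand, `urllib.parse.urljoin` from the standard library resolves relative hrefs against a base URL and also copes with absolute hrefs and missing slashes. Here is a minimal sketch of the same idea, run against a hard-coded HTML snippet that mimics the structure of the results page (the markup and `mc=` codes are illustrative assumptions, not the live page):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Minimal HTML mimicking the results page structure (assumed markup).
html = """
<table>
  <tr><td class="meetingText"><a href="/racing/fields/race-fields/?mc=BA040320">Meeting A</a></td></tr>
  <tr><td class="meetingText"><a href="/racing/fields/race-fields/?mc=BH040320">Meeting B</a></td></tr>
</table>
"""

base_url = "http://www.harness.org.au"
soup = BeautifulSoup(html, "html.parser")

# urljoin is safer than plain string concatenation: it handles hrefs that
# are already absolute, and base URLs with or without a trailing slash.
links = [urljoin(base_url, a["href"]) for a in soup.select(".meetingText > a")]
print(links)
# ['http://www.harness.org.au/racing/fields/race-fields/?mc=BA040320',
#  'http://www.harness.org.au/racing/fields/race-fields/?mc=BH040320']
```

Each URL in `links` can then be fetched with `requests.get` exactly as in the loop above.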

Solution 2:

Try this solution, which uses the simplified_scrapy library; you may find it convenient.

from simplified_scrapy import SimplifiedDoc,req
url = 'http://www.harness.org.au/racing/results/?activeTab=tab'

html = req.get(url)
doc = SimplifiedDoc(html)
links = [doc.absoluteUrl(url,ele.a['href']) for ele in doc.selects('td.meetingText')]
print(links)

Result:

['http://www.harness.org.au/racing/fields/race-fields/?mc=BA040320', 'http://www.harness.org.au/racing/fields/race-fields/?mc=BH040320', 'http://www.harness.org.au/racing/fields/race-fields/?mc=RE040320']

Problem:

I’m trying to loop over each href and get the full URL. I’ve managed to extract the href, but I need the complete URL to follow the link. This is my code at the minute:

import requests
from bs4 import BeautifulSoup

webpage_response = requests.get('http://www.harness.org.au/racing/results/?activeTab=tab')
webpage = webpage_response.content
soup = BeautifulSoup(webpage, "html.parser")

# only finding one track
# soup.table to find all links for days racing
harness_table = soup.table
# scrapes an href that is an incomplete URL that I'm trying to reach
for link in soup.select(".meetingText > a"):
    link.insert(0, "http://www.harness.org.au")

    webpage = requests.get(link)
    new_soup = BeautifulSoup(webpage.content, "html.parser")

    # work through table to get links to tracks
    print(new_soup)

Comments

Comment posted by Brad Langtry

Perfect! didn’t even think of this solution! Thank you
