Solution 1:

I had a space before and after the links in my dataframe column :/. That was it; the code works fine. Just an oversight on my part. Thanks, all.
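
For anyone hitting the same thing, here's a minimal sketch of the fixed loop (assuming a pandas dataframe df with a 'links' column and the req_headers dict from the question below; the None guard and the list-based assignment are my additions, not part of the original code):

import requests
from bs4 import BeautifulSoup

# Strip the stray whitespace around each link -- this was the actual bug
df['links'] = df['links'].str.strip()

zestimates = []
for link in df['links']:
    soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
    header = soup.select_one('h4:contains("Home value")')
    # Guard in case a page has no "Home value" section
    zestimates.append(header.find_next('p').get_text(strip=True) if header else None)

# Assign once, one value per row; assigning inside the loop would
# overwrite the whole column on every iteration
df['zestimate'] = zestimates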

Problem:

This shouldn't be too hard, although I can't figure it out; I'm betting I'm making a dumb mistake.

Here's the code that works on an individual link and returns the Zestimate (the req_headers variable prevents Zillow from throwing a captcha):

import requests
from bs4 import BeautifulSoup

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

link = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/'
test_soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
results = test_soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
print(results)

Here's the code I'm trying to get to work; it should fetch the Zestimate for each link and add it to a new dataframe column, but I get AttributeError: 'NoneType' object has no attribute 'find_next'. (Also, imagine I have a dataframe column of different Zillow house links.)

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

for link in df['links']:
    test_soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
    results = test_soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
    df['zestimate'] = results

Any help is appreciated.

Comments

Comment posted by Jack Fleeting

Are you sure the links sent to BS are valid? Try printing them out from the dataframe first.

Comment posted by max

@JackFleeting Yep, good idea, but it is printing out the links from within the dataframe, and those links are valid. You can click them from within Jupyter and they work.

Comment posted by Jack Fleeting

In that case, it probably means that (at least) one of the urls' soup doesn't have an h4 containing "Home value".

Comment posted by HArdRe537

From your error it looks like test_soup.select_one('h4:contains("Home value")') is returning None. So first check whether the result exists; if so, then go on to the next operation.
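
In code, the guard that comment describes might look like this (a sketch reusing the question's variable names):

header = test_soup.select_one('h4:contains("Home value")')
if header is not None:
    results = header.find_next('p').get_text(strip=True)
else:
    results = None  # the page had no "Home value" heading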
