Solution 1:

Use the selenium package (installable with pip install selenium); you will also need to download chromedriver.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

URL = 'https://www.bvca.co.uk/Member-Directory'

# run Chrome headless; chromedriver.exe is assumed to be in the working directory
BrowserOptions = Options()
BrowserOptions.add_argument("--headless")
Browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=BrowserOptions)
Browser.get(URL)

# busy-wait until the dynamically loaded company names appear in the DOM
# (find_elements_by_class_name returns an empty list until they are rendered)
while True:
    if Browser.find_elements_by_class_name('companyName'):
        break

# grab the rendered HTML and parse it with BeautifulSoup
html_source_code = Browser.execute_script("return document.body.innerHTML;")
soup = BeautifulSoup(html_source_code, 'html.parser')

x = [r.text for r in soup.find_all('h5', class_='companyName')]
print(x)

>>> ['01 Ventures', '01 Ventures', '17Capital LLP', '17Capital LLP', '1818 Venture Capital', ..., 'Zouk Capital LLP', 'Zouk Capital LLP']
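
Each company shows up twice in the output above, so if you only need unique names you can deduplicate while preserving order, for example:

companies = list(dict.fromkeys(x))

The member type (e.g. GENERAL PARTNER) should be extractable with the same find_all pattern once you locate its tag and class in the rendered HTML.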

The while loop waits until the company names have loaded before the HTML source is saved.

The output was too large to put into the answer, so I could only show some of it.
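
If you prefer not to busy-wait, Selenium's explicit waits achieve the same thing with a timeout; a minimal sketch using the same Browser object and class name as above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 30 seconds for the first company name to appear in the DOM
WebDriverWait(Browser, 30).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'companyName'))
)
html_source_code = Browser.execute_script("return document.body.innerHTML;")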

Problem:

I’m trying to get a list of company names (e.g. 01 Ventures) and types (e.g. GENERAL PARTNER) from this website: https://www.bvca.co.uk/Member-Directory. I’m using the code below:

import requests
from bs4 import BeautifulSoup
URL = 'https://www.bvca.co.uk/Member-Directory'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

table = soup.find('table', attrs={'id':'searchresults'})
table_body = table.find('tbody')
rows = table_body.find_all('tr')

print(rows)

And I got an empty list.

Comments

Comment posted by QHarr

The page is loaded dynamically. Use the network tab to see where the data is really coming from (an additional XHR request), or use selenium.
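
If the directory data does come from such an XHR call, it can often be fetched directly with requests once the request URL is copied from the browser's network tab; a rough sketch, where the endpoint URL is a placeholder you would have to replace with the real one from DevTools:

import requests

# placeholder URL: copy the real XHR endpoint from the network tab
XHR_URL = 'https://www.bvca.co.uk/REPLACE-WITH-REAL-ENDPOINT'

response = requests.get(XHR_URL)
response.raise_for_status()
data = response.json()  # inspect this to see where company names and types live
print(data)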

Comment posted by coderoftheday

You may have to use selenium to get the source code.
