Solution 1 :

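BeautifulSoup parses the HTML before the page has finished loading. The _ngcontent elements are most likely added by JavaScript in the browser, so they are not in the HTML you fetched.
Use the Selenium library with ChromeDriver to load the page fully, then hand the rendered source to BeautifulSoup.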
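A minimal sketch of that approach, not the answerer's original code. It assumes Selenium 4 and a local Chrome install; the function names, the timeout, the headless flag, and the example URL are illustrative assumptions. Only the 'detail-view-content' class comes from the question.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css, timeout=15):
    """Load url in headless Chrome and return the HTML after JS has rendered.

    wait_css is a CSS selector for an element that only exists once the
    page has rendered (an assumption you must adapt to the target site).
    """
    # Imported here so find_detail_content() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the dynamically injected element is in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def find_detail_content(html):
    """Same lookup as in the question, applied to the rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", class_="detail-view-content")


# Usage (hypothetical URL, not from the original post):
# html = fetch_rendered_html("https://example.com/title/123", "div.detail-view-content")
# content = find_detail_content(html)
```

The explicit wait is the important part: reading driver.page_source immediately after driver.get() can still return the page before the JavaScript has filled it in.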

Problem :

I am new to both scraping and coding. So far I have been able to scrape data with Beautiful Soup using the code below:

from bs4 import BeautifulSoup
# sub_page: HTML of the page being scraped
sub_soup = BeautifulSoup(sub_page, 'html.parser')
content = sub_soup.find('div', class_='detail-view-content')

This works correctly when the tag and class look like this:

<div class="masthead-card masthead-hover">

But it fails when the tag carries an _ngcontent attribute:

<span _ngcontent-ixr-c5="" class="btn-trailer-text">
<div _ngcontent-wak-c4="" class="col-md-6">

A screenshot of the _ngcontent page I am trying to scrape:
[Image: screenshot of the page's _ngcontent markup]

Everything I have tried returns an empty result or None. What am I missing?


Comment posted by ggorlen

How are you accessing the page? Likely this content is dynamically injected with JS. You’ll probably have to hit the API by hand or use a webdriver.
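One way to check this diagnosis is sketched below. It suggests that BeautifulSoup matches elements carrying _ngcontent attributes without trouble when they are present in the HTML, so a None result points to the element being absent from the fetched page. The helper name and the "static shell" HTML are assumptions made up for illustration.

```python
from bs4 import BeautifulSoup


def has_element(html, tag, css_class):
    """Return True if the HTML string contains <tag class="css_class">."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(tag, class_=css_class) is not None


# Markup as it appears in the browser after Angular has rendered it.
rendered = '<span _ngcontent-ixr-c5="" class="btn-trailer-text">Watch trailer</span>'

# What a plain HTTP fetch of a JS-rendered page often returns (hypothetical).
static_shell = (
    "<html><body><app-root></app-root>"
    '<script src="main.js"></script></body></html>'
)

print(has_element(rendered, "span", "btn-trailer-text"))      # True
print(has_element(static_shell, "span", "btn-trailer-text"))  # False
```

Running the same check on the HTML actually downloaded (sub_page in the question) shows whether the element ever reached BeautifulSoup.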

Comment posted by Saurabh

@ggorlen I am using their site.xml and got the URL from it. After that I use BS4 to scrape it.

Comment posted by Saurabh

@ggorlen You are correct, the HTML tags change every time I refresh. Now that's another problem.

Comment posted by…

Does this solve your problem :

Comment posted by QHarr

html or .xml? I think you mean html. Try
