Solution 1:

You can create the soup and, as the documentation explains, navigate up and down the tree by selecting tags. In this case I selected the td tag, and because it contains multiple strings I use the get_text() method.

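    from bs4 import BeautifulSoup

    # The question's cell: an image followed by the text we want.
    html = '''<tr>
      <td><img src='imgsrc' alt='*'>
        usefultext
      </td>
    </tr>'''

    soup = BeautifulSoup(html, 'html.parser')
    useful_text = soup.td.get_text()  # text of the first <td>; the <img> adds nothing

    print(repr(useful_text))    # '\n    usefultext\n  '
    print(useful_text.strip())  # usefultext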
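To see why get_text() is used here rather than .string, here is a minimal sketch. It assumes a td holding an img tag followed by text (illustrative markup, not taken verbatim from the question). Because the td has more than one child, .string returns None, while get_text() joins every string below the tag and simply skips the img, which has no text of its own.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption): a cell with an image followed by text.
html = "<table><tr><td><img src='imgsrc' alt='*'>usefultext</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
td = soup.td

# .string is only defined when the tag has exactly one child; this td has two.
print(td.string)      # None

# get_text() concatenates all text nodes below the tag; <img> contributes nothing.
print(td.get_text())  # usefultext
```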

If you have multiple td tags, you will want to iterate over the tag's next_siblings generator.
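A minimal sketch of the multiple-cell case, assuming a hypothetical row with two td cells. It starts from the first td, walks its next_siblings while skipping the whitespace strings between cells, and collects each cell's text. find_all("td") reaches the same cells more directly and is shown for comparison.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical row with two cells; the whitespace between them becomes text nodes.
html = """<table><tr>
  <td><img src='imgsrc' alt='*'>usefultext</td>
  <td>moretext</td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")

first = soup.td
# Keep only sibling <td> tags, ignoring the whitespace strings in between.
cells = [first] + [sib for sib in first.next_siblings
                   if isinstance(sib, Tag) and sib.name == "td"]
texts = [cell.get_text(strip=True) for cell in cells]
print(texts)  # ['usefultext', 'moretext']

# Same result without walking siblings by hand.
all_texts = [td.get_text(strip=True) for td in soup.find_all("td")]
print(all_texts)  # ['usefultext', 'moretext']
```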

I highly recommend reading the documentation and playing around.

Problem:

I'm using BeautifulSoup for some web scraping and want to know the best way to filter out the img tags from the table entries I scrape, so that the td.text attribute in this code fragment returns only usefultext:

    <td><img src='imgsrc' alt='*'>usefultext</td>
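If the aim is to remove the img tags from the parsed tree itself (for example, before re-serializing the HTML) rather than just reading the text, decompose() deletes each one. This is a small sketch assuming the cell looks like the fragment above; note that .text alone already ignores the img, because an img tag holds no text.

```python
from bs4 import BeautifulSoup

# Assumed markup for the scraped cell.
html = "<table><tr><td><img src='imgsrc' alt='*'>usefultext</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
td = soup.td

# Remove every <img> inside the cell from the tree.
for img in td.find_all("img"):
    img.decompose()

print(td)       # <td>usefultext</td>
print(td.text)  # usefultext
```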


Comment posted by doc

Have you seen the

Comment posted by Andrej Kesely

when you do

