Solution 1 :

Create the soup (here called useful_text). Once it exists you can, as the documentation explains, move up and down the tree by selecting tags. In this case I select the td tag, and because it can contain multiple strings I use the get_text() method to join them.

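from bs4 import BeautifulSoup

html = '''<tr>
  <td>
    usefultext
    <img src='imgsrc' alt='*'>
  </td>
</tr>'''

# Name the parser explicitly so the result does not depend on which parsers are installed.
useful_text = BeautifulSoup(html, 'html.parser')
# The td holds more than one string, so get_text() joins them into one.
useful_text.td.get_text()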
[out]:

'\n    usefultext\n    \n'
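
The question asks for only usefultext, so the surrounding whitespace probably needs to go as well. A minimal sketch, assuming the built-in html.parser and the same fragment as in the question; the variable names soup and clean_text are only illustrative. Passing strip=True to get_text() strips each text fragment and drops the ones that end up empty. The img tag contributes no text of its own, so nothing else has to be filtered.

```python
from bs4 import BeautifulSoup

html = '''<tr>
  <td>
    usefultext
    <img src='imgsrc' alt='*'>
  </td>
</tr>'''

soup = BeautifulSoup(html, 'html.parser')

# strip=True trims every string inside the td and skips whitespace-only ones.
clean_text = soup.td.get_text(strip=True)
print(clean_text)
```

If the cell held several separate strings, get_text(' ', strip=True) would join them with a single space instead of running them together.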

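If you have multiple td tags, you can step from one to the next with next_siblings. Note that it is an attribute that yields each following sibling, including the whitespace strings between tags, rather than a function you call.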
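
A rough sketch of the multiple-td case, assuming a made-up three-cell row (the cell contents are hypothetical and not from the question). It shows two options: find_all('td'), which is often the simpler route, and next_siblings, where the whitespace strings between tags have to be skipped by keeping only Tag objects.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical row with several cells, some of which contain an img tag.
html = '''<tr>
  <td>first <img src='imgsrc' alt='*'></td>
  <td>second</td>
  <td>third <img src='imgsrc' alt='*'></td>
</tr>'''

soup = BeautifulSoup(html, 'html.parser')

# Option 1: collect every td in document order.
cells = [td.get_text(strip=True) for td in soup.find_all('td')]

# Option 2: start at the first td and walk its following siblings.
# Siblings include the newline strings between tags, so keep only Tag objects.
first_td = soup.td
later_cells = [
    sibling.get_text(strip=True)
    for sibling in first_td.next_siblings
    if isinstance(sibling, Tag)
]

print(cells)
print(later_cells)
```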

I highly recommend reading the documentation and playing around.

Problem :

I'm using BeautifulSoup to do some web scraping and want to know the best way to filter out the img tags from any table entries I scrape, so that reading the td.text attribute in this code fragment would return only usefultext:

<tr>
  <td>
    usefultext
    <img src='imgsrc' alt='*'>
  </td>
</tr>

Comments

Comment posted by doc

Have you seen the

Comment posted by Andrej Kesely

when you do
