Solution 1 :

You can use the select() function to find tags by CSS selector.

tds = container.select('div > table > tbody > tr > td')
# or just select('td'), since there's no other td tag

print(tds[1].text)

select() returns a list of all tags that match the selector. The cell you want is the second one, so take index 1 and read its text.
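Here is a self-contained sketch of this approach, run against a trimmed copy of the HTML from the question. It assumes the real page has the same structure. It uses Python's built-in 'html.parser', so lxml is not required. get_text(strip=True) replaces .text to drop surrounding whitespace, and the variable html is only a stand-in for the downloaded page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (assumption: the real page
# uses the same table structure).
html = """
<div class="table-responsive">
  <table class="table">
    <thead>
      <tr><th>Start Date</th><th>Duration</th><th>Stipend</th>
          <th>Posted On</th><th>Apply By</th></tr>
    </thead>
    <tbody>
      <tr>
        <td><div id="start-date-first">Immediately</div></td>
        <td>1 Month</td>
        <td class="stipend_container_table_cell"> <i class="fa fa-inr"></i>
        1500 /month
        </td>
        <td>26 May'20</td>
        <td>23 Jun'20</td>
      </tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchor the selector on the table-responsive wrapper so td cells from
# other tables on a real page are not picked up.
tds = soup.select("div.table-responsive > table > tbody > tr > td")

# Index 1 is the second cell, i.e. the Duration column.
duration = tds[1].get_text(strip=True)
print(duration)
```

Scoping the selector to div.table-responsive is a judgment call. On a page with several tables, a bare select('td') would mix their cells together and shift the index.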

Solution 2 :

Try this:

from bs4 import BeautifulSoup
import requests

url = "yourUrlHere"

pageRaw = requests.get(url).text
soup = BeautifulSoup(pageRaw, 'lxml')
print(soup.table)

My code uses the lxml parser. To install it, run pip install lxml, or just switch to another parser (for example, Python's built-in 'html.parser') in this part of the code:

soup = BeautifulSoup(pageRaw, 'lxml')

This code prints the first <table> element on the page, markup included.
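The following sketch goes one step further than printing the table. It pairs each header with the cell below it, so Duration can be looked up by name instead of by position. It assumes the first <table> on the live page is the one from the question. page_raw is a hypothetical stand-in for requests.get(url).text. If lxml is not installed, BeautifulSoup raises FeatureNotFound, and the sketch falls back to 'html.parser'.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for requests.get(url).text; assumes the live page's
# first <table> looks like the one in the question.
page_raw = """
<table class="table">
  <thead>
    <tr><th>Start Date</th><th>Duration</th><th>Stipend</th>
        <th>Posted On</th><th>Apply By</th></tr>
  </thead>
  <tbody>
    <tr>
      <td><div id="start-date-first">Immediately</div></td>
      <td>1 Month</td>
      <td class="stipend_container_table_cell"> <i class="fa fa-inr"></i>
      1500 /month
      </td>
      <td>26 May'20</td>
      <td>23 Jun'20</td>
    </tr>
  </tbody>
</table>
"""

try:
    soup = BeautifulSoup(page_raw, "lxml")
except FeatureNotFound:
    # lxml is not installed; use the parser that ships with Python.
    soup = BeautifulSoup(page_raw, "html.parser")

table = soup.table  # first <table> in the document

# Header names from <thead>, cell texts from the first row of <tbody>.
# get_text(" ", strip=True) drops the whitespace around "1500 /month".
headers = [th.get_text(strip=True) for th in table.thead.find_all("th")]
cells = [td.get_text(" ", strip=True) for td in table.tbody.tr.find_all("td")]

row = dict(zip(headers, cells))
print(row["Duration"])
```

Looking cells up by header name keeps working if the site reorders its columns, which a hard-coded position would not.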

Take care

Problem :

I am doing web scraping for a DS project using BeautifulSoup, but I am unable to extract the Duration from the “tbody” tag of the table with class “table”.
Here is the HTML code:

<div class="table-responsive">
    <table class="table">
        <thead>
            <tr>
                <th>Start Date</th>
                <th>Duration</th>
                <th>Stipend</th>
                <th>Posted On</th>
                <th>Apply By</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>
                    <div id="start-date-first">Immediately</div>
                </td>
                <td>1 Month</td>
                <td class="stipend_container_table_cell"> <i class="fa fa-inr"></i>
                1500 /month
                </td>
                <td>26 May'20</td>
                <td>23 Jun'20</td>
            </tr>
        </tbody>
    </table>
</div>

Note: to extract the ‘Immediately’ text, I use the following code:

x = container.find("div", {"class": "table-responsive"})
x.table.tbody.tr.td.div.text
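The same dotted navigation can reach Duration. The td holding ‘Immediately’ is the first cell, and Duration is the next td sibling. This is a minimal sketch that assumes container is an already-parsed element containing the table. Here, a BeautifulSoup object built from a trimmed copy of the question's HTML stands in for it. The sketch uses find_next_sibling("td") rather than .next_sibling, because .next_sibling would return the whitespace text node between the two tags.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML; parsing it gives a stand-in for the
# `container` element used in the question (an assumption).
html = """
<div class="table-responsive">
  <table class="table">
    <tbody>
      <tr>
        <td>
          <div id="start-date-first">Immediately</div>
        </td>
        <td>1 Month</td>
        <td>26 May'20</td>
      </tr>
    </tbody>
  </table>
</div>
"""
container = BeautifulSoup(html, "html.parser")

x = container.find("div", {"class": "table-responsive"})

# Same chain as in the question: the first <td> holds the start date.
first_td = x.table.tbody.tr.td
start_date = first_td.div.text

# The next <td> sibling is the Duration cell; find_next_sibling("td") skips
# the whitespace text between the tags.
duration = first_td.find_next_sibling("td").get_text(strip=True)
print(start_date, "|", duration)
```

Equivalently, x.table.tbody.tr.find_all("td")[1].text picks the second cell by position.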
