One way would be to use CSS selectors and the select function:
actual_items = soup.select('div.content > div.items > div.item.watching')
for item in actual_items:
    print(item.prettify())
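To try the selector without logging in, here is a minimal sketch run against a made-up HTML fragment. The markup and the names sample_html and titles are assumptions for illustration, not the site's real page. The point is that select returns one Tag per matching div, so each item can be handled on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the watchlist page (not the real markup)
sample_html = """
<div class="content">
  <div class="items">
    <div class="item watching"><a class="link" href="/watch/a">Show A</a></div>
    <div class="item watching"><a class="link" href="/watch/b">Show B</a></div>
    <div class="item planned"><a class="link" href="/watch/c">Show C</a></div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "div.item.watching" matches divs carrying both classes; ">" means direct child
actual_items = soup.select("div.content > div.items > div.item.watching")

titles = []
for item in actual_items:
    # Each item is its own Tag, so it can be searched or prettified separately
    titles.append(item.select_one("a.link").get_text(strip=True))

print(titles)
```

The "planned" item is skipped because it lacks the watching class.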
I am not a Beautiful Soup expert, but I had a similar problem where using find_all and then building a smaller variable helped me visualize the information.
import pandas as pd

# "item watching" is two space-separated classes, not "item_watching"
class_items = soup.find_all("div", class_="item watching")
# Collect the rows in a list first; DataFrame.append was removed in pandas 2.0
rows = [{'Actual Items': x.text.strip()} for x in class_items]
df = pd.DataFrame(rows)
I want to scrape info from each of the "item watching" in the "items" class. I’m stuck because when I try find it only finds the HTML for the first “item watching”, but I don’t want to use find_all because it gives a massive blob that I can’t prettify and it would make it more difficult to cycle through the information.
soup = BeautifulSoup(res.text, "html.parser") # SOUP
class_items = soup.find("div", attrs={"data-name":"watching"}).find("div", class_="items") # Narrowed Down
actual_items = class_items.find("div", class_="item watching") # Was thinking [x] so I can cycle?
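For comparison, here is a small sketch of how find and find_all differ on a structure like this one. The fragment and the names sample_html, all_items, and episodes are made up for illustration. find_all does not return one blob: it returns a list-like collection of Tag objects, so indexing with [x] and looping both work, and each element can be prettified individually.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the structure described in the question
sample_html = """
<div data-name="watching">
  <div class="items">
    <div class="item watching"><span class="current">7</span></div>
    <div class="item watching"><span class="current">3</span></div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
class_items = soup.find("div", attrs={"data-name": "watching"}).find("div", class_="items")

# find returns only the first match (or None if nothing matches)
first_item = class_items.find("div", class_="item watching")

# find_all returns every match, so indexing and looping are possible
all_items = class_items.find_all("div", class_="item watching")
second_item = all_items[1]

episodes = [
    item.find("span", class_="current").get_text(strip=True)
    for item in all_items
]
print(episodes)
```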
The whole shebang:
import requests
from bs4 import BeautifulSoup
payload = {"username":"?????", "password":"?????"}
url = "https://9anime.to/user/watchlist"
loginurl = "https://9anime.to/user/ajax/login"
with requests.Session() as s:
    res = s.post(loginurl, data=payload)
    res = s.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    class_items = soup.find("div", attrs={"data-name":"watching"}).find("div", class_="items")
    actual_items = class_items.find_next("div", class_="item watching")
    print(actual_items.prettify())
site url: https://9anime.to/
login url: https://9anime.to/user/ajax/login
Expected output for each “item watching” (Similar format for each):
<div class="item watching">
<a class="thumb" href="/watch/kaguya-sama-love-is-war-season-2.omkj?ep=7">
<img alt="Kaguya-sama: Love is War Season 2" src="https://static.akacdn.ru/files/images/2019/10/f53e6536aa7b3b95e6fe4c6d7b8e1a9b.jpg"/>
</a>
<a class="link" data-jtitle="Kaguya-sama wa Kokurasetai?: Tensai-tachi no Renai Zunousen" data-tip="/ajax/film/tooltip/omkj?v=5dab1c5b" href="/watch/kaguya-sama-love-is-war-season-2.omkj?ep=7">
Kaguya-sama: Love is War Season 2
</a>
<span class="current">
7
</span>
<div class="info">
<span class="state old tip" data-id="omkj" data-unwatched="Unwatched" data-value="0" data-watched="Watched" title="Click to change">
Watched
</span>
<span class="status">
7/12
</span>
<span class="dropdown userbookmark" data-id="omkj">
<i class="icon icon-pencil-square" data-toggle="dropdown">
</i>
<ul class="dropdown-menu bookmark choices pull-right" data-id="omkj">
<li data-value="watching">
<a>
<i class="fa fa-eye">
</i>
Watching
</a>
</li>
<li data-value="watched">
<a>
<i class="fa fa-check">
</i>
Completed
</a>
</li>
<li data-value="onhold">
<a>
<i class="fa fa-hand-grab-o">
</i>
On-Hold
</a>
</li>
<li data-value="dropped">
<a>
<i class="fa fa-eye-slash">
</i>
Drop
</a>
</li>
<li data-value="planned">
<a>
<i class="fa fa-bookmark">
</i>
Plan to watch
</a>
</li>
<li class="divider" role="separator">
</li>
<li data-value="remove">
<a>
<i class="fa fa-remove">
</i>
Remove entry
</a>
</li>
</ul>
</span>
</div>
<div class="clearfix">
</div>
</div>
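Once each item is available as its own Tag, the individual fields can be pulled out of it. The sketch below works on an abbreviated copy of the expected output above (the dropdown menu is omitted). The record layout and the names item_html and record are assumptions chosen for illustration, not something the site or Beautiful Soup prescribes.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the expected output (dropdown menu omitted)
item_html = """
<div class="item watching">
  <a class="link" href="/watch/kaguya-sama-love-is-war-season-2.omkj?ep=7">
    Kaguya-sama: Love is War Season 2
  </a>
  <span class="current">7</span>
  <div class="info">
    <span class="state old tip" data-id="omkj">Watched</span>
    <span class="status">7/12</span>
  </div>
</div>
"""

item = BeautifulSoup(item_html, "html.parser").select_one("div.item.watching")
link = item.select_one("a.link")

# Hypothetical record layout; keep whichever fields are actually needed
record = {
    "title": link.get_text(strip=True),
    "href": link["href"],
    "current_episode": item.select_one("span.current").get_text(strip=True),
    "status": item.select_one("span.status").get_text(strip=True),
    "data_id": item.select_one("span.state")["data-id"],
}
print(record)
```

Running the same extraction inside a loop over all matched items would give one such dictionary per show.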
Can you share the URL and expected output?
You probably want to use
@AndrejKesely updated my post for you.
@EvanSchwartzentruber as I don’t have login to that site, I’m just guessing:
I’ve looked into CSS selectors and this looks great but I was wondering if there was any way to store each
If I understand you correctly, the
Ahh yes, you’re correct! This is exactly what I wanted, thank you!