You can use BeautifulSoup4 (bs4), a Python library for pulling data out of HTML and XML files, in combination with regular expressions (RegEx). In this case I used Python's re library for the RegEx part.
Something like this is what you want (source):

In that example, soup.find_all(class_=re.compile("itle")) returns every element whose class attribute contains "itle", such as class="title", in the example HTML document.

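For your case the RegEx can simply be "arrowTo": soup.find_all(class_=re.compile("arrowTo")). A trailing * as in "arrowTo*" is not needed, because the pattern is searched for anywhere in each class name, and "o*" would only make the final "o" optional.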
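To illustrate the difference between the two patterns, here is a small sketch using only the re module. The sample string is the class attribute from the question; "arrowTarget" is a made-up class name used purely for contrast.

```python
import re

# Class attribute copied from the question's StrongBuy example.
sample = "arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo arrowStrongBuyShudder-3xsGK8k5"

# "arrowTo*" means "arrowT" followed by zero or more "o" characters,
# so it also matches names that merely start with "arrowT".
loose = re.compile("arrowTo*")
# The plain pattern requires the full "arrowTo" substring.
plain = re.compile("arrowTo")

print(bool(loose.search(sample)))         # True
print(bool(plain.search(sample)))         # True
print(bool(loose.search("arrowTarget")))  # True (hypothetical class name)
print(bool(plain.search("arrowTarget")))  # False
```

Both patterns find the arrow on the pages in question, so the plain "arrowTo" is the safer choice.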
Your final code should look something like:
import re
from bs4 import BeautifulSoup
# I think result was the HTML document you fetched with the requests library
# the first argument to BeautifulSoup is that HTML document
soup = BeautifulSoup(result, 'html.parser')
myArrowToList = soup.find_all(class_=re.compile("arrowTo"))
If you wanted only "arrowToStrongBuy", just use that as the regex passed to the find_all function:
soup.find_all(class_=re.compile("arrowToStrongBuy"))
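If you would rather store whichever label the arrow points to in a variable than test for each label separately, something along these lines might work. It is only a sketch: rating_from_classes and ARROW_RE are hypothetical names, and it assumes every arrow class looks like "arrowTo<Label>-<hash>", as in the three examples from the question. The input is a sequence of class names, which is what bs4 gives you for tag["class"].

```python
import re

# Assumption: the label sits between "arrowTo" and a hyphenated hash suffix,
# e.g. "arrowToStrongBuy-1ydGKDOo" -> "StrongBuy".
ARROW_RE = re.compile(r"^arrowTo([A-Za-z]+)-")

def rating_from_classes(classes):
    """Return the gauge label encoded in an element's class names, or None."""
    for name in classes:
        match = ARROW_RE.match(name)
        if match:
            return match.group(1)
    return None

# Class names taken from the question's StrongBuy example.
classes = ["arrow-F-uE7IX8", "arrowToStrongBuy-1ydGKDOo", "arrowStrongBuyShudder-3xsGK8k5"]
print(rating_from_classes(classes))  # StrongBuy
```

With the find_all result from above, you could call it on the first match's class list, provided the list is not empty.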
I’m trying to scrape data in a widget using python and the requests-html library.
The value I want is in a gauge with an arrow pointing to one of five possible results.
The labels on the gauge are the same on every page of the website. The problem is that I cannot use a CSS selector on the gauge labels to extract the text; I need the value of the arrow itself, since it points to a label. The arrow has no text, so if I use a CSS selector I get None as a response.
Each arrow has a unique class name.
StrongBuy:
https://www.tradingview.com/symbols/NASDAQ-MDB/
<div class="arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo arrowStrongBuyShudder-3xsGK8k5">
Buy:
https://www.tradingview.com/symbols/NYSE-XOM/
<div class="arrow-F-uE7IX8 arrowToBuy-1R7d8UMJ arrowBuyShudder-3GMCnG5u">
StrongSell:
https://www.tradingview.com/symbols/NASDAQ-IDEX/
<div class="arrow-F-uE7IX8 arrowToStrongSell-3UWimXJs arrowStrongSellShudder-2UJhm0_C">
What can I do to ensure I get the correct value? I’m not sure how to check whether the selector contains arrowTo{foo} and store the result in a variable.
import pyppdf.patch_pyppeteer
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_page():
    code = 'NASDAQ-MDB'
    r = await asession.get(f'https://www.tradingview.com/symbols/{code}/')
    await r.html.arender(wait=3)
    return r

results = asession.run(get_page)
for result in results:
    arrow_class_placeholder = "//div[contains(@class,'arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo')]//div[1]"
    arrow_class_name = result.html.xpath(arrow_class_placeholder,first=True)
    if arrow_class_name == "//div[contains(@class,'arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo')]//div[1]":
        print('StrongBuy')
    else:
        print('not strong buy')
I’ve just realised that the XPaths aren’t being found in the response results, so requests-html isn’t rendering the page properly.