Solution 1:

You seem to be using the wrong XPath. Looking at the page source, the value you want is stored in the content attribute of meta tags whose itemprop attribute is "ratingValue".

Here is a working, reproducible example using the URL from your question:

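library(rvest)

read_html(url) %>%  # url as defined in the question
  html_nodes(xpath = "//meta[contains(@itemprop, 'ratingValue')]") %>%
  html_attr("content") %>%  # the rating lives in the content attribute
  unique()                  # drop duplicates if the tag appears more than once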
#> [1] "4.0"
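The question also asks how to return NA when a listing has no rating. A minimal sketch, assuming rvest 1.0.0 or later (for html_element()) and that every listing stores its rating in a meta tag with itemprop "ratingValue" as described above: html_element() returns only the first match and yields a "missing" node when nothing matches, and html_attr() turns a missing node into NA. The helper name get_rating() is hypothetical, and the inline HTML strings merely stand in for real listing pages.

```r
library(rvest)

# Hypothetical helper: first ratingValue on a parsed page as a number, or NA if absent
get_rating <- function(page) {
  node <- html_element(page, xpath = "//meta[contains(@itemprop, 'ratingValue')]")
  # html_attr() returns NA_character_ for a missing node; as.numeric() keeps it NA
  as.numeric(html_attr(node, "content"))
}

# Stand-in pages (assumption: real listings embed the rating the same way)
with_rating    <- read_html('<html><head><meta itemprop="ratingValue" content="4.0"></head><body></body></html>')
without_rating <- read_html('<html><head></head><body><p>No rating</p></body></html>')

get_rating(with_rating)
#> [1] 4
get_rating(without_rating)
#> [1] NA

# For several listings, vapply() gives one number (or NA) per page
ratings <- vapply(list(with_rating, without_rating), get_rating, numeric(1))
```

With real listings you would pass read_html(url) for each URL instead of the inline strings; using html_element() rather than html_nodes() is what makes a missing rating come back as a single NA instead of character(0).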

Problem:

I have never worked with HTML or CSS, but I know R, so I have looked at several web-scraping methods for R online and here on Stack Overflow. I keep having trouble extracting company ratings from a job listing page: I get character(0), whereas the company in the example URL has a 4.0 rating.

Here is my attempt:

library(rvest)
library(tidyverse)
library(xml2)

#example URL
url<- "https://www.indeed.com/viewjob?jk=a25a91736b1f7042&tk=1e3q54n49heai800&from=serp&vjs=3&advn=8876452989351355&adid=95236293&sjdu=TDSJNe66qIM3gcXFOG94m--bPylNW2vvO3WAHEKN7JhCAD1FQ-2FXD1gQyElsLNkg6gfXO2CD3rQYOYjO9iXITyFdYOp8tCECkHuDmf3Og8qdMmciGFIv2ahigETjLmuY8uXdLjnQTg4__yOXqHJkA"

page <- read_html(url)

page %>%
  rvest::html_nodes("span") %>%
  rvest::html_nodes(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "ratingsContent", " " ))]') %>%
  rvest::html_text()

# Output is
# character(0)
# whereas it should have been 4.0!

Could anyone show me how to get the rating, and how to return NA if the company has no rating? Thank you!

Comments

Comment posted by WolfgangBagdanow

I used a Chrome extension to get the XPath; it was pretty accurate for other elements, so I don't know what went wrong here. But thank you, Allan. I really appreciate your answer!
