This is not going to work because the entire page is generated with JavaScript. The body of the document contains just a single script tag. Open the page source or look at the raw response instead of looking only at the rendered DOM in the web inspector/developer tools.
Nokogiri is just an HTML parser, not a browser, so it does not run JavaScript. While you could use a headless browser such as PhantomJS, you might want to look for an API that provides the data you want instead. A web scraper is usually the wrong answer to any question.
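As a rough way to check the raw response before reaching for a parser, something like the following could be used. It is only a sketch: `visible_text` is a hypothetical helper name, the sample markup is made up for illustration, and stripping tags with regular expressions is a crude heuristic rather than real HTML parsing.

```ruby
# Hypothetical helper: returns the text left in a raw HTML string once
# <script> elements (and their contents) and all other tags are removed.
# An empty result suggests the page is built entirely by JavaScript.
def visible_text(html)
  html
    .gsub(%r{<script\b.*?</script>}mi, '') # drop script elements with their contents
    .gsub(/<[^>]+>/, ' ')                  # drop the remaining tags
    .gsub(/\s+/, ' ')                      # collapse whitespace
    .strip
end

# Made-up sample resembling a page whose body is a single script tag.
js_only_page = <<~HTML
  <html><body>
    <script>document.write('<a href="chromedriver_mac64.zip">mac</a>');</script>
  </body></html>
HTML

static_page = '<html><body><p>hello</p></body></html>'

puts visible_text(js_only_page).inspect # => ""
puts visible_text(static_page).inspect  # => "hello"
```

If the first case is what the raw response looks like, no HTML parser on its own will find the links.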
Solution 2:
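I found a more interesting solution, for example (page here is the Capybara session's page):
    require 'open-uri'
    # href of the first link whose text contains "mac"
    link = Nokogiri::HTML(page.source).at('a:contains("mac")').values.join('')
    chromedriver_storage = 'https://chromedriver.storage.googleapis.com/'
    # binwrite keeps the zip intact and closes the file afterwards
    File.binwrite('filename.zip', URI.parse(chromedriver_storage + link).read)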
contains("mac") can be changed to contains("linux") or contains("win"). It does not matter which one, so pick whichever operating system you need.
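Instead of editing the selector by hand, the platform could be picked from the Ruby runtime. This is only a sketch: `chromedriver_zip_for` is a hypothetical helper, and the archive names for Linux and Windows are assumed to follow the same pattern as chromedriver_mac64.zip, so check them against the actual listing.

```ruby
require 'rbconfig'

# Hypothetical helper: maps a host_os string to an archive name.
# The linux64/win32 names are assumptions modelled on chromedriver_mac64.zip.
def chromedriver_zip_for(host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /darwin|mac os/i           then 'chromedriver_mac64.zip'
  when /linux/i                   then 'chromedriver_linux64.zip'
  when /mswin|msys|mingw|cygwin/i then 'chromedriver_win32.zip'
  else raise ArgumentError, "unsupported platform: #{host_os}"
  end
end

puts chromedriver_zip_for('darwin19') # => chromedriver_mac64.zip
puts chromedriver_zip_for('mingw32')  # => chromedriver_win32.zip
```

Note that "darwin" itself contains "win", which is why the Windows branch matches mswin/mingw and not a bare "win".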
The second solution is to parse the page chromedriver.chromium.org and get information about all versions. If the version on the site is more recent than mine, I substitute the new version number into the download link:
    require 'open-uri'
    chromedriver_storage = 'https://chromedriver.storage.googleapis.com/'
    chromedriver = '79.0.3945.36/' # obtained with Capybara, trimmed down to just the version
    zip = 'chromedriver_mac64.zip'
    link = chromedriver_storage + chromedriver + zip
    File.binwrite('filename.zip', URI.parse(link).read)
It turns out that the parser, running in headless mode, can be added to a crontab task to keep the version up to date with the current browser.
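The "is the site's version newer than mine" check could be sketched like this. The helper names `update_needed?` and `download_link` are hypothetical, and using `Gem::Version` for the comparison is my own choice here, not something from the answer. It compares dotted version numbers numerically, which plain string comparison would get wrong.

```ruby
require 'rubygems' # Gem::Version (normally loaded by default)

CHROMEDRIVER_STORAGE = 'https://chromedriver.storage.googleapis.com/'

# Hypothetical helper: true when the version on the site is newer than the installed one.
def update_needed?(installed, latest)
  Gem::Version.new(latest) > Gem::Version.new(installed)
end

# Hypothetical helper: builds the download link from the pieces used in the answer.
def download_link(version, zip = 'chromedriver_mac64.zip')
  "#{CHROMEDRIVER_STORAGE}#{version}/#{zip}"
end

puts update_needed?('78.0.3904.105', '79.0.3945.36') # => true
puts update_needed?('79.0.3945.36', '79.0.3945.36')  # => false
puts download_link('79.0.3945.36')
# => https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_mac64.zip
```

A cron job would then only download when `update_needed?` returns true.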
#(Document:0x3fcdda1b988c {
  name = "document",
  children = [
    #(DTD:0x3fcdda1b5b24 { name = "html" }),
    #(Element:0x3fcdda1b46fc {
      name = "html",
      children = [
        #(Element:0x3fcdda1b0804 {
          name = "body",
          children = [
            #(Element:0x3fcdda1ac920 {
              name = "p",
              children = [ #(Text "https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/")]
              })]
          })]
      })]
  })
puts links.to_html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/</p></body></html>
=> nil
Comments
Comment posted by MCVE
When asking for help debugging, we need the minimal code and input data that demonstrate the problem and your required output. Anything beyond the minimum wastes our time helping you, which wastes your time. See “
Comment posted by the Tin Man
Use
Comment posted by Vitalii
@the Tin Man And what is unclear in the question? A simple question: why can't Nokogiri parse the name of this page? The simple answer is that Nokogiri doesn't parse the page if it needs JS. That's all I needed to figure out what the problem is. I have already found a few options for not depending on chromedriver and being able to download the updated version using Ruby, without curl. Why do I need curl if I can automate a script that compares the driver version on the system with the latest version on the site and replaces it? I don't understand your displeasure…
Comment posted by the Tin Man
Before you write any code, you should use one of those tools to look at the page to determine what it’s doing, or, at a minimum, turn off JavaScript in the browser and see what page elements do
Comment posted by the Tin Man
Also, when asking about a problem like this with web scraping, we need the minimal code and input data to test the problem
Comment posted by Vitalii
What can you recommend for parsing JS in Ruby?
Comment posted by max
It's not a matter of parsing JS. You need an actual browser that has a DOM and runs JavaScript. You can automate browsers with Capybara. But as I said in the answer, this is probably a stupid idea, as you can probably get the data through an API that will give you JSON instead. Web scraping is really fragile and this will just break over time.
Comment posted by max
I don't see how that would change anything unless you're actually opening the page in your browser and copying the rendered HTML after the JS runs. Give it up. It ain't gonna work.