Yes, you can harvest (“scrape”) data from web pages like that. Here’s a crude roadmap.
Normally you’d get the page — retrieve from the web server a string with the HTML of that web page — using a tool like LWP::UserAgent or Mojo::UserAgent, and then parse that HTML to extract the data of interest, using a library like Mojo::DOM or HTML::TreeBuilder. There are many posts around here on using these tools (and yet other ones), and there is a rounded-out example with Mojo::DOM in a Perl.com article.
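For orientation, here is a minimal sketch of that fetch-then-parse flow with Mojo::UserAgent and Mojo::DOM; the URL and the selectors are placeholders, not from the page in question.

use strict;
use warnings;
use feature 'say';
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# Fetch the page; result() throws on connection errors
my $res = $ua->get('https://example.com')->result;
die $res->message if $res->is_error;

# The response body is parsed into a DOM we can query with CSS selectors
my $dom = $res->dom;
say $dom->at('title')->text;              # the <title> element
say $_->text for $dom->find('h1')->each;  # every <h1> element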
If that web page uses JavaScript for displaying data of interest to you then that’s a different game. It means that the HTML downloaded from the server to your browser also contains JavaScript code — programs — which can run right in the browser. They get triggered when you click on (or hover, etc) elements of a page and rework the page without having to go back to the server.
This is a very (over-)simplified explanation, but the point is that a library needs to understand that JavaScript in order to hand you the final state of the page for parsing; otherwise you only get the HTML as it last came from the server. The main libraries linked above don’t know any JavaScript; they just talk to the server over HTTP and hand you what it returns.
For a tool that understands JavaScript I’d recommend Selenium, meant for testing webpages but perfectly suitable for this job as well. One way to use it in Perl is with Selenium::Chrome (or Selenium::Firefox), both built on Selenium::Remote::Driver.
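A minimal sketch of that approach, assuming a chromedriver binary is installed on your machine; the URL is again a placeholder.

use strict;
use warnings;
use feature 'say';
use Selenium::Chrome;

# Starts a chromedriver process behind the scenes
my $driver = Selenium::Chrome->new;

$driver->get('https://example.com');  # the browser loads the page and runs its JavaScript
say $driver->get_page_source;         # the HTML after scripts have run
$driver->shutdown_binary;             # stop the chromedriver process

For the OGIMET page in question, though, plain HTTP turns out to be enough, as in the full example below.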
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;
use Mojo::DOM;

my $ua = LWP::UserAgent->new();

my $url = 'https://www.ogimet.com/display_sond.php?' .
    'lang=en&lugar=63741&tipo=ALL&ord=DIR&nil=SI&fmt=html' .
    '&ano=2021&mes=09&day=02&hora=19&anof=2021&mesf=09&dayf=03&horaf=19&send=send';

my $res = $ua->get( $url );
if (!$res->is_success) {
    die $res->status_line;
}
my $html = $res->decoded_content;   # body with charset decoding applied

my $dom = Mojo::DOM->new($html);

# Grab the text of every <table>; on this page the second table holds
# the station header and the third one holds the sounding data
my @tables_raw_txt = $dom->find('table')->map('all_text')->each;

say $tables_raw_txt[1];
say "--------------- TABLE DATA --------------\n";
say $tables_raw_txt[2];
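If the whole-table text is too coarse, Mojo::DOM can also walk individual rows and cells via CSS selectors. A sketch of that idea, continuing from the $dom above (which table index holds the data is an assumption to check against the real page):

# Print each row of the third table, cells separated by a pipe
for my $row ( $dom->find('table')->[2]->find('tr')->each ) {
    say $row->find('td, th')->map('all_text')->join(' | ');
}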
The OGIMET website also supports a plain-text (TXT) output format, so extracting the desired data is then a matter of a regex. If the Mojo::DOM module is not available, the data can be extracted this way.
The script below uses a hash %params representing the location and dates specified in the OP’s question. By adding the Getopt::Long module, each parameter could be tuned individually on the command line; a sketch of that follows the script.
Note: the script accepts one optional argument, the station id.
use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $station = shift;

my $url = 'https://www.ogimet.com/display_sond.php?';

my %params = (
    lang  => 'en',     # language
    lugar => 63741,    # station id
    tipo  => 'ALL',    # report type
    ord   => 'DIR',    # sort order
    nil   => 'SI',     # include null reports - SI (Yes)
    fmt   => 'txt',    # output format
    ano   => 2021,     # start year
    mes   => 9,        # start month
    day   => 2,        # start day
    hora  => 19,       # start hour
    anof  => 2021,     # end year
    mesf  => 9,        # end month
    dayf  => 3,        # end day
    horaf => 19,       # end hour
    send  => 'send'
);

$params{lugar} = $station if defined $station;

# Assemble the query string from the parameter hash
my @params;
while( my($k,$v) = each %params ) {
    push @params, "$k=$v";
}
$url .= join('&', @params);

my $ua  = LWP::UserAgent->new();
my $res = $ua->get( $url );
if (!$res->is_success) {
    die $res->status_line;
}

# Take the first <pre> block, assumed to be the only one on the page
my($data) = $res->content =~ m!<pre>(.*?)</pre>!gs;

say $data;
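The Getopt::Long idea mentioned above could look like this; the option names are my own choice, and only a few of the parameters are shown.

use strict;
use warnings;
use Getopt::Long;

# Defaults as in the script above; command-line options override them
my %params = ( lugar => 63741, ano => 2021, mes => 9, day => 2 );

GetOptions(
    'station=i' => \$params{lugar},
    'year=i'    => \$params{ano},
    'month=i'   => \$params{mes},
    'day=i'     => \$params{day},
) or die "Usage: $0 [--station N] [--year N] [--month N] [--day N]\n";

The script would then be invoked with, e.g., --station 63741 --day 3 on the command line.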
There is the OGIMET webpage for weather data; I believe this data is free to use. The webpage provides a script for requesting surface observations data, and I am able to use that data for my area. In addition, I would like to access upper-air observation data. No script has been provided for accessing this data, but it can be accessed manually, e.g. for one station, with a link of the kind used in the scripts above.
This gives me text content as in the attached screenshot.
I am wondering whether it is possible to copy that text, once I arrive at this page, using e.g. a Perl script. Unfortunately I do not have any minimal working example which I could try.
Comments
Comment posted by Ronaldo Ferreira de Lima
Contact the webmaster and talk about your purposes. In any case, you will need to learn about web scraping / web spiders, and about best practices and ethics.
Comment
See also the article Web Scraping in Perl using Mojo::DOM
Comment posted by brian d foy
There’s also my book, Mojo Web Clients
Comment posted by Zilore Mumba
@Hægland Sorry for my belated reply. This works perfectly, exactly what I was looking for. Thanks for your effort and time, and for sharing.
Comment posted by brian d foy
Constructs such as
Comment posted by Polar Bear
@briandfoy – I assume that there will be only one <pre> block in the returned content
Comment posted by brian d foy
Why assume when you can do it the right way the first time?
Comment posted by Zilore Mumba
This code works perfectly, @Polar Bear. Thank you very much; I have upvoted it. Once more, my apologies for the very late reaction and acknowledgement.