Solution 1 :

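HTTP status code 511 (Network Authentication Required) means the client has to authenticate before it is given access. Here the server is telling you that you need to authenticate to access more data.

The company has probably set a data limit to block unauthorized scraping. Do read their terms and conditions of use.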
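
If scraping is permitted, one way to cope with an intermittent 511 is to check the status code before parsing and to wait longer after each blocked attempt. The following is only a minimal sketch under assumptions: `fetch` is a hypothetical callable that returns a `(status_code, body_text)` pair (for example a thin wrapper around `requests.get`), and the retry count and delays are illustrative values, not limits published by the site.

```python
import json
import time


def fetch_json_with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns HTTP 200, backing off on HTTP 511.

    fetch is a hypothetical callable returning (status_code, body_text).
    Returns the parsed JSON, or None if every attempt was answered with 511.
    """
    for attempt in range(retries):
        status, body = fetch()
        if status == 200:
            return json.loads(body)
        if status == 511:
            # Blocked: wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
            continue
        raise RuntimeError("unexpected HTTP status {0}".format(status))
    return None
```

With `requests`, `fetch` could be something like `lambda: (lambda r: (r.status_code, r.text))(requests.get(url))`. Whether waiting actually lifts the block depends on how the server enforces its limit.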

Problem :

I’m trying to make a program that opens Firefox using Selenium, captures the HAR files using BrowserMobProxy, and extracts from them the link that leads to a JSON page. The program reads the HAR files and scrapes the JSON data every 5 seconds. The problem is that sometimes, when I try to scrape it, I get a 511 error:

<!DOCTYPE html><html><head><title>Apache Tomcat/8.0.32 (Ubuntu) - Error report</title><style type="text/css">H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}.line {height: 1px; background-color: #525D76; border: none;}</style> </head><body><h1>HTTP Status 511 - something went wrong with your request (2)</h1><div class="line"></div><p><b>type</b> Status report</p><p><b>message</b> <u>something went wrong with your request (2)</u></p><p><b>description</b> <u>The client needs to authenticate to gain network access.</u></p><hr class="line"><h3>Apache Tomcat/8.0.32 (Ubuntu)</h3></body></html>

Notice the 511 error code and its description: "The client needs to authenticate to gain network access."

At other times the request succeeds and returns the expected dictionary:

{"alerts":[{"country":"IL","nThumbsUp":2,"city"... 

Why is that?

It may be important to say that the JSON page expires after about 1-3 seconds, but I measured the time until the program fetches the data and it’s about 0.0000385 seconds, so that doesn’t seem to be the problem.
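
A note on that measurement: `timeit.timeit()` called with no arguments does not read a clock. It runs the statement `pass` one million times and returns how long that took, so subtracting two such values says nothing about the fetch. If the figure above was obtained that way, it would explain a number far smaller than any network round trip. A minimal sketch of measuring elapsed time with `time.perf_counter()` follows; the helper name `timed` is made up for illustration.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()      # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `response, seconds = timed(requests.get, url)` would give the wall-clock duration of one request, which can then be compared with the 1-3 second expiry.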

I have had two theories so far. The first is that, because the program scrapes the data every x seconds, the connection gets cut, but I would expect that to throw a large error. The second, now debunked, was that it was simply a rate-limiting problem, so I used time.sleep() to pause for 3 seconds, still without success.

It would be a great help if you could come up with a way to fix it, or point me to an error in my code.

The code (it’s a little messy right now, as I haven’t had a chance to clean it up yet):

import os
import json
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from browsermobproxy import Server
import schedule
import requests
import timeit
from fake_useragent import UserAgent

i = 1

print("""Waze Police Scraper

Waze Police Scraper will open the Mozilla Firefox browser on Waze's live map website.
It'll scrape all the police locations from your preferred location, including police traps that are voted on by Waze's users.
Next to every police location it'll show the number of upvotes and downvotes, and from that the probability of the user's report being true.

Instructions:


""")


def personalised_info():
    auto_or_manual = input("Do you want the software to scrape the Waze map manually or automatically? ")

    sec = input("How often, in seconds, should the scraper fetch the data? The maximum is 30 seconds and the minimum is 5 seconds. If you enter nothing, the recommended value of 5 seconds is used. ")
    return auto_or_manual, sec

def start_server():
    global server, proxy, driver
    # Raw string: otherwise "\U" in a Windows path is parsed as a unicode escape
    server = Server(r"C:\Users\Yahav\Downloads\browsermob-proxy-2.1.4-bin\browsermob-proxy-2.1.4\bin\browsermob-proxy")

    useragent = UserAgent()
    useragent.update()

    server.start()
    proxy = server.create_proxy()

    #proxy.wait_for_traffic_to_stop(6000, 9000)


    profile = webdriver.FirefoxProfile()
    profile.set_proxy(proxy.selenium_proxy())
    profile.set_preference("general.useragent.override", useragent.random)
    driver = webdriver.Firefox(executable_path=r"C:\Users\Yahav\Downloads\geckodriver-v0.26.0-win64\geckodriver.exe", firefox_profile=profile)
    # Navigate to the application home page
    driver.get("https://www.waze.com/livemap?utm_source=waze_website&utm_campaign=waze_website")

urls = []
t = 1
data_parsed = {}
inner_nested_data_parsed = {}
data_list = []

def get_data(urls, t, data_parsed, inner_nested_data_parsed):
    start = timeit.default_timer()  # Start timing (timeit.timeit() would benchmark `pass` instead)
    global i
    # Tag the HAR (network logs) with a name
    har = proxy.new_har("waze_{0}".format(i))

    # Finding the URL requests where the data is stored in JSON format
    har = str(har)
    str_1 = "https://www.waze.com/il-rtserver/web/TGeoRSS?"
    str_2 = "&types=alerts%2Ctraffic%2Cusers"

    indx_1 = har.find(str_1)
    indx_2 = har.find(str_2)

    url = har[indx_1:indx_2]

    url = url + str_2

    urls.append(url)

    print(urls)

    for d in urls:
        if d == str_2:
            data = {}  # To overcome the 'referenced before assignment' error
            print("what")
        if d != str_2:
            data_request = requests.get(url)
            data = data_request.text  # Raw response body; this is a str, not a dict
            end = timeit.default_timer()  # Stop timing
            data_list.append(data)
            print(type(data), url)
            print(end - start)  # Time taken to get the data, in seconds

    if url == "&types=alerts%2Ctraffic%2Cusers":  # If the user is not moving, then 'url' will be equal to this string
        print("Move your cursor to your preferred location.")
    else:
        if type(data) is dict:
            for x in range(len(data["alerts"])):
                if (data["alerts"][x]["type"]) == "POLICE":
                    inner_nested_data_parsed["type"] = (data["alerts"][x]["type"])
                    if data["alerts"][x]["subtype"] != "":  # Not working for some reason
                        inner_nested_data_parsed["subtype"] = (data["alerts"][x]["subtype"])
                    else:
                        pass  # No subtype to record
                    inner_nested_data_parsed["country"] = (data["alerts"][x]["country"])
                    inner_nested_data_parsed["nThumbsUp"] = (data["alerts"][x]["nThumbsUp"])
                    inner_nested_data_parsed["confidence"] = (data["alerts"][x]["confidence"])
                    inner_nested_data_parsed["reliability"] = (data["alerts"][x]["reliability"])
                    inner_nested_data_parsed["speed"] = (data["alerts"][x]["speed"])
                    inner_nested_data_parsed["location_x"] = (data["alerts"][x]["location"]["x"])
                    inner_nested_data_parsed["location_y"] = (data["alerts"][x]["location"]["y"])

                    data_parsed[t] = inner_nested_data_parsed

                    t += 1
                    inner_nested_data_parsed = {}  # resets the dictionary so the elements in the list "alerts" won't be added to the same value of "t" in the dictionary "data_parsed"
                else:
                    continue
        else:
            print("Unexpected data type:", type(data))

    print(data)
    """ # Logs to file
    path_log_file = "demofile3.txt"
    if os.path.exists(path_log_file):  #Checks if file exists
        f = open(path_log_file, "w")
        print(data)
        f.write(str(data))
        f.flush()
        f.close()

    else:
        f = open(path_log_file, "x")
        f = open(path_log_file, "w")
        f.write(str(data))
        f.flush()
        f.close()
    """

    server.stop()
    # close the browser window
    #driver.quit()
    i += 1
    return i

print(data_parsed)

auto_or_manual, sec = personalised_info()

if auto_or_manual == "A":
    if not sec:  # Default
        sec = 10
        print(True, 4)
        start_server()
        schedule.every(sec).seconds.do(get_data, urls, t, data_parsed, inner_nested_data_parsed)
    elif sec.isdigit():  # If input is digit (elif: an int default has no .isdigit())
        print(True, 4)
        start_server()
        schedule.every(int(sec)).seconds.do(get_data, urls, t, data_parsed, inner_nested_data_parsed)
    else:
        print("Please enter a valid number.")
        personalised_info()

else:
    print(None)
    #Manual
#proxy.new_har("waze")


#driver.get("about:preferences#privacy")

while True:  # User defined 
    schedule.run_pending()
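
A side note on the parsing step in the code above: the `.text` attribute of a `requests` response is always a string, so a check like `type(data) is dict` never passes, even when the request succeeds. The sketch below shows one possible way to turn a response body into a list of police alerts while tolerating the HTML error page. The function name `extract_police_alerts` is hypothetical, and the field names are taken from the code above rather than from any documented Waze API.

```python
import json


def extract_police_alerts(body):
    """Parse a response body and return a list of POLICE alerts.

    Returns an empty list when the body is not JSON (e.g. the 511 HTML page).
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    if not isinstance(payload, dict):
        return []

    police = []
    for alert in payload.get("alerts", []):
        if alert.get("type") != "POLICE":
            continue
        location = alert.get("location", {})
        police.append({
            "type": alert["type"],
            "subtype": alert.get("subtype", ""),
            "country": alert.get("country"),
            "nThumbsUp": alert.get("nThumbsUp"),
            "confidence": alert.get("confidence"),
            "reliability": alert.get("reliability"),
            "speed": alert.get("speed"),
            "location_x": location.get("x"),
            "location_y": location.get("y"),
        })
    return police
```

Building a fresh dictionary per alert also avoids having to reset a shared dictionary between iterations.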
