Solution 1 :

You don’t have your scraping logic inside the for-loop, so you will get information only from the last value of x (your Line Id).
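The effect can be seen in isolation with a minimal sketch. Here `fetch` is a hypothetical stand-in for the request-and-parse step (it makes no network call), and the names `after_loop` and `inside_loop` are only for illustration:

```python
def fetch(line_id):
    # Hypothetical stand-in for the request + parse step; no network involved.
    return 'page for line {}'.format(line_id)

# Work placed after the loop runs once, using only the last value of x.
for line_id in range(1, 4):
    x = str(line_id)
after_loop = [fetch(x)]

# Work placed inside the loop body runs once per value.
inside_loop = []
for line_id in range(1, 4):
    inside_loop.append(fetch(str(line_id)))

print(after_loop)   # ['page for line 3']
print(inside_loop)  # ['page for line 1', 'page for line 2', 'page for line 3']
```

Only the indentation differs between the two variants, which is why this bug is easy to miss.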

This script collects the stop names and IDs from pages 1 and 2:

import requests
from bs4 import BeautifulSoup

url = '{}'

data = {}
for line_no in range(1, 3):
    soup = BeautifulSoup(requests.get(url.format(line_no)).content, 'html.parser')

    for val in soup.select('input[name="stop-naptan-id"][value]'):
        data.setdefault(line_no, []).append((val.find_next('a').contents[0].strip(), val['value']))

from pprint import pprint
pprint(data)

Prints:

{1: [('New Oxford Street', '490000235Z'),
     ('Museum Street', '490010131WB'),
     ('Kingsway / Holborn Station', '490000112M'),
     ('Aldwych / The Royal Courts Of Justice', '490019704K'),
     ('Aldwych / Somerset House', '490003193S'),
     ('Waterloo Bridge / South Bank', '490014271N'),
     ('Waterloo Station / Waterloo Road', '490000254E'),
     ('The Old Vic', '490013485S1'),
     ("St George's Circus", '490012693S2'),

... and so on.
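If the goal is to save the values in order, one possible follow-up is to flatten the dictionary into rows and write them as CSV. This is only a sketch: the helper `to_rows`, the column names, and the in-memory buffer are assumptions for illustration, and the sample holds just the first two stops from the output above.

```python
import csv
import io

# Sample shaped like the output above (first two stops of line 1 only).
data = {1: [('New Oxford Street', '490000235Z'),
            ('Museum Street', '490010131WB')]}

def to_rows(data):
    # Flatten {line_no: [(stop_name, naptan_id), ...]} into ordered rows.
    rows = []
    for line_no in sorted(data):
        for stop_name, naptan_id in data[line_no]:
            rows.append((line_no, stop_name, naptan_id))
    return rows

rows = to_rows(data)

# In-memory buffer for the demo; a real file opened with newline='' works the same way.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['line', 'stop', 'naptan_id'])  # hypothetical column names
writer.writerows(rows)
print(buf.getvalue())
```

Stops keep the order in which they were appended for each line, and sorting the keys keeps the lines themselves in numeric order.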

Problem :

#! /usr/bin/python
import urllib
import pandas as pd
import source as source
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

for line_id in range(1, 3):
    line_id = line_id + 1
    x = str(line_id)

req = Request('' + x, headers={'User-Agent': 'Mozilla/5.0'})
web_page = urlopen(req).read()
soup = BeautifulSoup(web_page, 'html.parser')
search_data = soup.find('input', {'class': 'stopNaptanId'}).get('value')

I only see the last record. I want to save all the values, in order.


Comment posted by

the best thing to do is read the docs. It tells that