Solution 1 :

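This script prints the district name, neighborhood number and neighborhood name for every neighborhood of Madrid. Rows that start a district carry an extra merged (rowspan) cell, so the script remembers the most recent district name and reuses it for the rows that follow: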

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_neighborhoods_of_Madrid"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Keep only the table rows that contain at least one cell
rows = iter(soup.select('.wikitable tr:has(td, th)'))
next(rows)  # skip the header row

district_name = ''
for tr in rows:
    cells = tr.select('td, th')
    # A row that does not have exactly three cells starts a new district:
    # its first cell is the merged (rowspan) district cell
    if len(cells) != 3:
        district_name = cells[0].get_text(strip=True)
    # The last three cells are the number, the name and a column that is ignored
    number, name, _ = cells[-3:]
    number = number.get_text(strip=True)
    name = name.get_text(strip=True)
    print('{:<30}{:<5}{}'.format(district_name, number, name))

Prints:

Centro(1)                     11   Palacio
Centro(1)                     12   Embajadores
Centro(1)                     13   Cortes
Centro(1)                     14   Justicia
Centro(1)                     15   Universidad
Centro(1)                     16   Sol
Arganzuela(2)                 21   Imperial
Arganzuela(2)                 22   Acacias
Arganzuela(2)                 23   Chopera
Arganzuela(2)                 24   Legazpi
Arganzuela(2)                 25   Delicias
Arganzuela(2)                 26   Palos de Moguer
Arganzuela(2)                 27   Atocha
Retiro(3)                     31   Pacífico
Retiro(3)                     32   Adelfas
Retiro(3)                     33   Estrella
Retiro(3)                     34   Ibiza
Retiro(3)                     35   Jerónimos
Retiro(3)                     36   Niño Jesús
Salamanca(4)                  41   Recoletos
Salamanca(4)                  42   Goya
Salamanca(4)                  43   Fuente del Berro
Salamanca(4)                  44   La Guindalera
Salamanca(4)                  45   Lista
Salamanca(4)                  46   Castellana
Chamartín(5)                  51   El Viso
Chamartín(5)                  52   Prosperidad
Chamartín(5)                  53   Ciudad Jardín
Chamartín(5)                  54   Hispanoamérica
Chamartín(5)                  55   Nueva España
Chamartín(5)                  56   Castilla
Tetuán(6)                     61   Bellas Vistas
Tetuán(6)                     62   Cuatro Caminos
Tetuán(6)                     63   Castillejos
Tetuán(6)                     64   Almenara
Tetuán(6)                     65   Valdeacederas
Tetuán(6)                     66   Berruguete
Chamberí(7)                   71   Gaztambide
Chamberí(7)                   72   Arapiles
Chamberí(7)                   73   Trafalgar
Chamberí(7)                   74   Almagro
Chamberí(7)                   75   Ríos Rosas
Chamberí(7)                   76   Vallehermoso
Fuencarral-El Pardo(8)        81   El Pardo
Fuencarral-El Pardo(8)        82   Fuentelarreina
Fuencarral-El Pardo(8)        83   Peñagrande
Fuencarral-El Pardo(8)        84   Pilar
Fuencarral-El Pardo(8)        85   La Paz
Fuencarral-El Pardo(8)        86   Valverde
Fuencarral-El Pardo(8)        87   Mirasierra
Fuencarral-El Pardo(8)        88   El Goloso
Moncloa-Aravaca(9)            91   Casa de Campo
Moncloa-Aravaca(9)            92   Argüelles
Moncloa-Aravaca(9)            93   Ciudad Universitaria
Moncloa-Aravaca(9)            94   Valdezarza
Moncloa-Aravaca(9)            95   Valdemarín
Moncloa-Aravaca(9)            96   El Plantío
Moncloa-Aravaca(9)            97   Aravaca
Latina(10)                    101  Los Cármenes
Latina(10)                    102  Puerta del Ángel
Latina(10)                    103  Lucero
Latina(10)                    104  Aluche
Latina(10)                    105  Campamento
Latina(10)                    106  Cuatro Vientos
Latina(10)                    107  Las Águilas
Carabanchel(11)               111  Comillas
Carabanchel(11)               112  Opañel
Carabanchel(11)               113  San Isidro
Carabanchel(11)               114  Vista Alegre
Carabanchel(11)               115  Puerta Bonita
Carabanchel(11)               116  Buenavista
Carabanchel(11)               117  Abrantes
Usera(12)                     121  Orcasitas
Usera(12)                     122  Orcasur
Usera(12)                     123  San Fermín
Usera(12)                     124  Almendrales
Usera(12)                     125  Moscardó
Usera(12)                     126  Zofío
Usera(12)                     127  Pradolongo
Puente de Vallecas(13)        131  Entrevías
Puente de Vallecas(13)        132  San Diego
Puente de Vallecas(13)        133  Palomeras Bajas
Puente de Vallecas(13)        134  Palomeras Sureste
Puente de Vallecas(13)        135  Portazgo
Puente de Vallecas(13)        136  Numancia
Moratalaz(14)                 141  Pavones
Moratalaz(14)                 142  Horcajo
Moratalaz(14)                 143  Marroquina
Moratalaz(14)                 144  Media Legua
Moratalaz(14)                 145  Fontarrón
Moratalaz(14)                 146  Vinateros
Ciudad Lineal(15)             151  Ventas
Ciudad Lineal(15)             152  Pueblo Nuevo
Ciudad Lineal(15)             153  Quintana
Ciudad Lineal(15)             154  Concepción
Ciudad Lineal(15)             155  San Pascual
Ciudad Lineal(15)             156  San Juan Bautista
Ciudad Lineal(15)             157  Colina
Ciudad Lineal(15)             158  Atalaya
Ciudad Lineal(15)             159  Costillares
Hortaleza(16)                 161  Palomas
Hortaleza(16)                 162  Piovera
Hortaleza(16)                 163  Canillas
Hortaleza(16)                 164  Pinar del Rey
Hortaleza(16)                 165  Apóstol Santiago
Hortaleza(16)                 166  Valdefuentes
Villaverde(17)                171  Villaverde Alto
Villaverde(17)                172  San Cristóbal
Villaverde(17)                173  Butarque
Villaverde(17)                174  Los Rosales
Villaverde(17)                175  Los Ángeles
Villa de Vallecas(18)         181  Casco Histórico de Vallecas
Villa de Vallecas(18)         182  Santa Eugenia
Vicálvaro(19)                 191  Casco Histórico de Vicálvaro
Vicálvaro(19)                 192  Ambroz
San Blas-Canillejas(20)       201  Simancas
San Blas-Canillejas(20)       202  Hellín
San Blas-Canillejas(20)       203  Amposta
San Blas-Canillejas(20)       204  Arcos
San Blas-Canillejas(20)       205  Rosas
San Blas-Canillejas(20)       206  Rejas
San Blas-Canillejas(20)       207  Canillejas
San Blas-Canillejas(20)       208  Salvador
Barajas(21)                   211  Alameda de Osuna
Barajas(21)                   212  Aeropuerto
Barajas(21)                   213  Casco Histórico de Barajas
Barajas(21)                   214  Timón
Barajas(21)                   215  Corralejos
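
The script above relies on the cell count of each row, which fits this particular table. For tables where merged cells can appear in any column, a more general idea is to remember each rowspan cell and copy it into the rows it covers. The following is only a minimal sketch of that idea: the function name fill_rowspans and the sample rows are made up for illustration, it ignores colspan, and it assumes a well-formed table.

```python
def fill_rowspans(rows):
    """Expand merged cells so that every row lists all of its columns.

    `rows` is a list of rows; each row is a list of (text, rowspan) pairs
    in the order the cells appear in the HTML. With BeautifulSoup, a row
    could be built as:
        [(c.get_text(strip=True), int(c.get('rowspan', 1)))
         for c in tr.select('td, th')]
    """
    pending = {}  # column index -> (text, number of rows still to fill)
    grid = []
    for row in rows:
        cells = iter(row)
        out = []
        col = 0
        while True:
            if col in pending:
                # This column is covered by a merged cell from an earlier row
                text, left = pending[col]
                out.append(text)
                if left == 1:
                    del pending[col]
                else:
                    pending[col] = (text, left - 1)
            else:
                cell = next(cells, None)
                if cell is None:
                    break  # no more cells in this row
                text, span = cell
                out.append(text)
                if span > 1:
                    # Remember the cell for the following span - 1 rows
                    pending[col] = (text, span - 1)
            col += 1
        grid.append(out)
    return grid


# Illustrative sample data, not scraped from the page
sample = [
    [("Centro(1)", 2), ("11", 1), ("Palacio", 1)],
    [("12", 1), ("Embajadores", 1)],
    [("Arganzuela(2)", 1), ("21", 1), ("Imperial", 1)],
]
for line in fill_rowspans(sample):
    print(line)
```

Because the function works on plain (text, rowspan) pairs, it can be tried without any network access and then fed with cells taken from a real table.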

Problem :

My goal is to scrape a Wikipedia table and join it with coordinates from GeoHack. The code I found in "How to scrape data from different Wikipedia pages?" works only for the first row in my case. I assume this happens because of the merged cells (td rowspan) that the Wiki table contains.
I am new to coding and a bit stuck on this issue, so any help would be appreciated.

import requests
from bs4 import BeautifulSoup as BS
import re

def parse_district(url):
    r = requests.get(url)

    soup = BS(r.text, 'html.parser')

    link = soup.find('a', {'href': re.compile('//tools.wmflabs.org/.*')})

    item = link['href'].split('params=')[1].split('type:')[0].replace('_', ' ').strip()
    #print(item)

    items = link.find_all('span', {'class':('latitude', 'longitude')})

    #print('>>>', [item] + [i.text for i in items][:3] )

    return [item] + [i.text for i in items]

def main():
    url = 'https://en.wikipedia.org/wiki/List_of_neighborhoods_of_Madrid'

    r = requests.get(url)

    soup = BS(r.text, 'html.parser')

    table = soup.find_all('table', {'class': 'wikitable'})
    for row in table[0].find_all('tr'):
        items = row.find_all('td')
        if items:
            row = [i.text.strip() for i in items]

            link = 'https://en.wikipedia.org' + items[0].a['href']
            data = parse_district(link)

            row += data
            print(row)
main() 
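
For the coordinate part, parse_district above keeps the raw text of the GeoHack params value. If decimal coordinates are wanted for the join, that value could be converted along the following lines. This is a hedged sketch, not the asker's or the answerer's code: the helper names are made up, the sample link is illustrative, and it assumes the params value has the usual degrees[_minutes[_seconds]]_N/S_..._E/W_ layout.

```python
from urllib.parse import parse_qs, urlsplit


def geohack_params(href):
    """Return the raw `params` value from a GeoHack link."""
    return parse_qs(urlsplit(href).query)['params'][0]


def parse_geohack_params(params):
    """Convert a GeoHack `params` value to (latitude, longitude) in degrees."""
    coords = []
    nums = []
    for part in params.split('_'):
        if part in ('N', 'S', 'E', 'W'):
            # nums holds degrees, then optional minutes, then optional seconds
            value = sum(n / 60 ** i for i, n in enumerate(nums))
            if part in ('S', 'W'):
                value = -value  # southern and western hemispheres are negative
            coords.append(value)
            nums = []
            if len(coords) == 2:
                break  # ignore trailing fields such as type:... or region:...
        else:
            nums.append(float(part))
    if len(coords) != 2:
        raise ValueError('could not find two coordinates in %r' % params)
    return tuple(coords)


# Illustrative link, assumed to follow the usual GeoHack layout
href = ('//tools.wmflabs.org/geohack/geohack.php'
        '?pagename=Example&params=40_30_0_N_3_15_0_W_type:city')
lat, lon = parse_geohack_params(geohack_params(href))
print(lat, lon)
```

The resulting pair can then be appended to the table row in the same way the question's code appends the output of parse_district.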

Comments

Comment posted by tools.wmflabs.org/geohack

thanks! It works perfectly for scraping the table from the Wiki page. Maybe you know how to join it with

Comment posted by tools.wmflabs.org/geohack

@Anna The link

Comment posted by Anna

I honestly do not know how it works exactly, but I think this code

Comment posted by zamarov

This general question gets an extremely specific answer, thus polluting search results. You’re not answering “How to scrape data with merged cells?”, you’re answering “How to scrape data with merged cells on this very specific page, and in a way nobody else will find useful also while polluting the Google search results with this crap?”. I’m also downvoting the question because the asker gleefully went along with this travesty of an answer.
