MakeoverMonday: Mapping a Global City Network — 5,470 nodes, 10,596 ties
MakeoverMonday
Python
Finance
Reframing the 5,470-city, 10,596-link graph (Wikidata QIDs) into a map to surface geographic hubs
Author
chokotto
Published
May 11, 2026
Overview
This week’s dataset ships as two tidy files: a city register with coordinates and a link list connecting city QIDs. Community takes on this challenge often reach for a force-directed “hairball” to show how places relate. I took a map-first angle instead: put every city where it actually sits on Earth, then let the relationships ride on top as size and color. That way the structure reads through a geographic lens, not just network physics.
Why the change? A node-link diagram is great for topology, but it hides where those ties live. With lng/lat, country, and continent in hand, we can make adjacency visible while still surfacing connectivity. That supports questions like “which regions are overrepresented?” and “where are the big hubs?” without losing a sense of place.
A quick look at the schema shows a globally scoped graph:
- 5,470 rows in cities.csv; 10,596 rows in links.csv
- Longitude spans -172 to 178 and latitude -54.8 to 72.8 (median 13.3°E, 44.9°N)
- countrycd has 7 missing values; continent is fully populated
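Numbers like these come out of a handful of pandas calls. Here is a minimal sketch, assuming the column names used elsewhere in this post (`lng`, `lat`, `countrycd`, `continent`). The three-row frame is made-up stand-in data so the snippet runs without the real files, and `summarize_schema` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def summarize_schema(cities: pd.DataFrame, links: pd.DataFrame) -> dict:
    """Collect row counts, coordinate ranges, medians and missing-value counts."""
    return {
        "n_cities": len(cities),
        "n_links": len(links),
        "lng_range": (cities["lng"].min(), cities["lng"].max()),
        "lat_range": (cities["lat"].min(), cities["lat"].max()),
        "lng_median": cities["lng"].median(),
        "lat_median": cities["lat"].median(),
        "missing_countrycd": int(cities["countrycd"].isna().sum()),
        "missing_continent": int(cities["continent"].isna().sum()),
    }


# Made-up stand-in rows (NOT from the real dataset), just to exercise the function.
demo_cities = pd.DataFrame({
    "id": ["Q1", "Q2", "Q3"],
    "lng": [-10.0, 20.0, 140.0],
    "lat": [-30.0, 45.0, 60.0],
    "countrycd": ["AA", None, "CC"],
    "continent": ["Africa", "Europe", "Asia"],
})
demo_links = pd.DataFrame({"source": ["Q1", "Q2"], "target": ["Q2", "Q3"]})

summary = summarize_schema(demo_cities, demo_links)
print(summary)
```

On the real files, the same function would take `pd.read_csv('data/cities.csv')` and `pd.read_csv('data/links.csv')` as inputs.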
The post walks through a compact, map-centered redesign and what it reveals about coverage and connectivity. See the data on data.world for context.
The original visualization for this challenge was a node-link network: cities (QIDs) as circular nodes connected by straight edges, typically laid out with a force-directed algorithm. It aimed to communicate which places act as hubs by using node size/labels and the density of connections, with no geographic axes—just x/y from the layout to emphasize topology over location.
Dataset
Code
import sys
from pathlib import Path

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# Import posts/_mm_layout.py (the execution cwd is either the post subfolder
# or the quarto project root)
_p = Path.cwd()
if (_p / "_mm_layout.py").exists():
    _posts = _p
elif (_p.parent / "_mm_layout.py").exists():
    _posts = _p.parent
elif (_p / "posts" / "_mm_layout.py").exists():
    _posts = _p / "posts"
else:
    _posts = _p
sys.path.insert(0, str(_posts))
from _mm_layout import apply_mm_layout
Code
# Load this week's data
# df = pd.read_csv("data.csv")
My Makeover
What I Changed
I shifted from a force-directed network to a geo-anchored view so location informs the reading. With explicit lng/lat, each city lands on a map; connectivity is encoded in size and color rather than in where the algorithm chooses to place nodes. This keeps the spatial story intact while still elevating structure.
Concretely, I:
- Chose a map with points for cities, sizing nodes by link count (degree) and coloring by continent to balance topology with geography.
- Swapped arbitrary force x/y for actual lng/lat and used a consistent projection; adjusted size with a mild nonlinear scale to control outliers.
- Framed the analysis around coverage and hubs by region (e.g., completeness across continents, missing country codes) rather than listing individual high-degree cities.
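To see why the size scale counts as mild, it helps to tabulate it. The sketch below uses the same formula as the plotting code in this post, `sqrt(degree) * 3 + 4`. The function name `marker_size` and the sample degrees are illustrative choices, not values taken from the dataset.

```python
import math


def marker_size(degree: int, scale: float = 3.0, base: float = 4.0) -> float:
    """Square-root size scale: size = sqrt(degree) * scale + base."""
    return math.sqrt(degree) * scale + base


# Illustrative degrees only; the real distribution comes from links.csv.
for d in (0, 1, 4, 25, 100):
    print(d, marker_size(d))
```

Under this scale a hypothetical 100-link hub gets a marker size of 34 against 7 for a single-link city, so it is under five times larger rather than a hundred times. The base of 4 keeps zero-degree cities visible.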
This combination preserves the graph’s relationships but avoids the “hairball” effect, surfaces regional patterns at a glance, and makes data quality (like the 7 missing countrycd values) immediately visible.
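The region-level framing boils down to one grouped summary. A hedged sketch, assuming a `cities` table that already carries a `degree` column (as built in the visualization code) alongside `continent` and `countrycd`. The rows below are invented for illustration, and `coverage_by_continent` is a hypothetical helper.

```python
import pandas as pd


def coverage_by_continent(cities: pd.DataFrame) -> pd.DataFrame:
    """Per-continent city count, total and max degree, and missing country codes."""
    flagged = cities.assign(missing_cc=cities["countrycd"].isna())
    return (
        flagged.groupby("continent")
        .agg(
            n_cities=("id", "count"),
            total_degree=("degree", "sum"),
            max_degree=("degree", "max"),
            missing_countrycd=("missing_cc", "sum"),
        )
        .sort_values("n_cities", ascending=False)
    )


# Invented rows (NOT real data) to show the shape of the result.
demo = pd.DataFrame({
    "id": ["Q1", "Q2", "Q3", "Q4"],
    "continent": ["Asia", "Asia", "Europe", "Asia"],
    "countrycd": ["JP", None, "FR", "JP"],
    "degree": [3, 0, 5, 1],
})

table = coverage_by_continent(demo)
print(table)
```

Sorting by `n_cities` puts the most heavily represented continent first, which is one way to check for overrepresentation before reading anything into the map.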
Visualization
Code
# Note: PAL, THEME, add_source and assert_no_title_overlap are not defined in the
# code shown in this post; presumably they come from the author's _mm_layout helpers.
df_cities = pd.read_csv('data/cities.csv')
df_links = pd.read_csv('data/links.csv')

# Compute degree (link count) per city by counting occurrences in source and target
edges = pd.concat([df_links['source'], df_links['target']], ignore_index=True).rename('id').to_frame()
deg = edges['id'].value_counts().rename('degree').reset_index().rename(columns={'index': 'id'})

# Join degree back to city table; missing degrees mean 0
cities = df_cities.merge(deg, on='id', how='left')
cities['degree'] = cities['degree'].fillna(0).astype(int)

# Drop rows missing coordinates
cities = cities.dropna(subset=['lng', 'lat'])

# Prepare continents (treat any missing as 'Unknown')
cities['continent'] = cities['continent'].fillna('Unknown')
continents = sorted(cities['continent'].unique().tolist())

fig = go.Figure()
for i, cont in enumerate(continents):
    sub = cities[cities['continent'] == cont]
    # mild nonlinear size scale so high-degree hubs stand out without overwhelming
    sizes = (sub['degree'].astype(float).pow(0.5) * 3) + 4
    hover = sub['name'] + ' (' + sub['id'] + ')<br>Links: ' + sub['degree'].astype(str)
    fig.add_trace(go.Scattergeo(
        lon=sub['lng'],
        lat=sub['lat'],
        text=hover,
        hoverinfo='text',
        mode='markers',
        name=cont,
        marker=dict(
            size=sizes,
            color=PAL[i % len(PAL)],
            opacity=0.8,
            line=dict(width=0.2, color='rgba(0,0,0,0.2)')
        )
    ))

fig.update_layout(
    **THEME,
    xaxis=dict(title='Longitude', tickformat=',.2f'),
    yaxis=dict(title='Latitude', tickformat=',.2f'),
    geo=dict(
        scope='world',
        projection_type='natural earth',
        showcountries=True,
        showcoastlines=True,
        landcolor='rgba(240,240,240,0.9)'
    ),
    legend=dict(traceorder='normal')
)
apply_mm_layout(
    fig,
    'Cities positioned by real coordinates, sized by link count',
    subtitle='Each point is a city; marker size = number of connections (degree). Colors show continent.',
    legend_position='top',
    n_legend_items=len(continents),
)
add_source(fig)
assert_no_title_overlap(fig)
fig.show()

# Static export is best-effort; skip silently if the image backend is unavailable
try:
    fig.write_image('chart-1.png', width=1200, height=520, scale=2)
except Exception:
    pass
Key Takeaways
Global spread is clear in the numeric bounds: longitude runs from -172 to 178, and latitude from -54.8 to 72.8, with medians at 13.3°E and 44.9°N as shown in the stats.
Scale matters: there are 5,470 cities connected by 10,596 links, enough for a worldwide network while staying sparse enough that many nodes will show few ties.
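The sparsity claim can be checked with back-of-the-envelope arithmetic. Each link touches two cities, so mean degree is 2 × links / cities. The sketch assumes every link endpoint appears in the city register and that links are undirected with no self-loops, none of which the source states explicitly.

```python
n_cities = 5_470
n_links = 10_596

# Each link contributes one unit of degree to each of its two endpoints.
mean_degree = 2 * n_links / n_cities

# Share of all possible unordered city pairs that are actually linked.
possible_pairs = n_cities * (n_cities - 1) // 2
density = n_links / possible_pairs

print(round(mean_degree, 2))
print(f"{density:.5f}")
```

Under those assumptions the average city has a little under four links, and only about 0.07% of possible city pairs are connected. That fits the reading that most points on the map will be small.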
The first five rows of the data (the head sample) are all Japanese cities (Fukaya, Ishigaki, Tsuru, Koga, and Kakegawa), tagged Asia with countrycd JP, a reminder to check regional balance beyond the first look.
This post is part of the MakeoverMonday weekly data visualization project.
Disclaimer
This analysis is for educational and practice purposes only. Data visualizations and interpretations are based on the provided dataset and may not represent complete or current information.