MakeoverMonday: US Migration as a Flow Network — From Choropleth to Origins and Destinations

MakeoverMonday
Python
Finance
Rethinking US migration by emphasizing origin–destination flows; I prototype encodings on a 5,470‑city link graph to stress‑test layout
Author

chokotto

Published

May 18, 2026

Overview

The original MakeoverMonday prompt for week 19 looks at US population migration. The shared resource highlights movement between places in the US and invites us to rethink how we show where people come from and where they go. You can find the brief and original viz link on data.world here: https://data.world/makeovermonday/2026w19-us-population-migration/.

My angle: move away from a state‑by‑state choropleth and toward an origin–destination (OD) view that foregrounds bilateral flows, asymmetry, and the handful of corridors that dominate the story. Rather than only net change per state, the goal is to show who exchanges people with whom, and how strongly. To prototype the encodings and layout, I used a compact city‑to‑city link file in the repo (5,470 nodes, 10,596 edges) — the same design pattern ports cleanly to state‑to‑state US migration.

In practice, that means:

  • Elevate OD connections (edges) as the first‑class mark, with direction and magnitude clearly encoded
  • Use small multiples or filters to reveal regional patterns without overplotting
  • Label origins and destinations selectively to keep attention on top corridors
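
To make the "bilateral flows and asymmetry" idea concrete, here is a minimal sketch of collapsing directed flows into one row per pair, with gross exchange and net direction side by side. The table, its column names (`origin`, `destination`, `movers`) and its values are hypothetical placeholders, not the MakeoverMonday data.

```python
import numpy as np
import pandas as pd

# Hypothetical state-to-state table; column names and values are illustrative only.
flows = pd.DataFrame({
    "origin":      ["CA", "TX", "CA", "NY", "FL"],
    "destination": ["TX", "CA", "NY", "FL", "NY"],
    "movers":      [100, 40, 30, 70, 20],
})

def pair_summary(od: pd.DataFrame) -> pd.DataFrame:
    """Collapse directed flows into one row per unordered pair (a, b)."""
    od = od.copy()
    # Canonical ordering: a is the alphabetically smaller label, so A->B and B->A share a row.
    forward = od["origin"] <= od["destination"]
    od["a"] = np.where(forward, od["origin"], od["destination"])
    od["b"] = np.where(forward, od["destination"], od["origin"])
    od["a_to_b"] = np.where(forward, od["movers"], 0)
    od["b_to_a"] = np.where(forward, 0, od["movers"])
    out = od.groupby(["a", "b"], as_index=False)[["a_to_b", "b_to_a"]].sum()
    out["gross"] = out["a_to_b"] + out["b_to_a"]       # total exchange between the pair
    out["net_a_to_b"] = out["a_to_b"] - out["b_to_a"]  # positive means a sends more to b
    return out.sort_values("gross", ascending=False).reset_index(drop=True)

pairs = pair_summary(flows)
print(pairs)
```

Sorting by `gross` surfaces the busiest corridors, while the sign of `net_a_to_b` shows which side of each corridor is losing people, which is the asymmetry a net-change choropleth hides.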

Original Visualization

Source: MakeoverMonday

The original visualization presents US migration by geography, with values mapped to locations to communicate where population is increasing or decreasing due to domestic moves. A map view summarizes change by state (or county), typically with color encoding net migration and, in some versions, arrows or labels for notable flows. Axes are either geographic coordinates (on a US map) or state labels in a bar layout to compare magnitudes.

Dataset

Code
import sys
from pathlib import Path

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# Import posts/_mm_layout.py (the working directory is either the post subfolder or the Quarto project root)
_p = Path.cwd()
if (_p / "_mm_layout.py").exists():
    _posts = _p
elif (_p.parent / "_mm_layout.py").exists():
    _posts = _p.parent
elif (_p / "posts" / "_mm_layout.py").exists():
    _posts = _p / "posts"
else:
    _posts = _p
sys.path.insert(0, str(_posts))
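# Shared helpers used by the chart code below (assumed to live in _mm_layout alongside apply_mm_layout)
from _mm_layout import apply_mm_layout, PAL, THEME, add_source, assert_no_title_overlap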
Code
# Load this week's data
# df = pd.read_csv("data.csv")

My Makeover

What I Changed

I reframed the analysis from area‑based shading to a flow‑first design. That pushes the structure of movement to the foreground and reduces confusion between large‑area, low‑flow regions and compact, high‑flow ones. To validate the encodings and label logic before wiring in the US migration tables, I built a prototype on a simple OD graph in the repo: 5,470 city nodes connected by 10,596 links. The choices prototyped here (top‑N filtering, thickness by volume, hover details) are the ones I’ll apply to state‑to‑state moves; edge bundling is part of the intended design, but the prototype below still draws straight segments.

Key changes:

  • Chart type: from a choropleth/bar mix to an OD network (Sankey/chord or bundled geodesics) so flows are the primary mark
  • Encodings: edge thickness scales with movers; direction via color/gradient; node position by geography (for a map) or rank; consistent linear scales to avoid overstating small corridors
  • Focus/framing: filter to top corridors, highlight net gainer/loser hubs, and use faceted views for regions/time to combat overplotting while keeping labels readable
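
The prototype code in this post sizes lines with `np.log1p`, which compresses differences between corridors. For the "consistent linear scales" point, a zero-based linear mapping could be sketched as below. The function name `linear_widths` and the pixel bounds are my own illustrative choices, not part of the shared layout module.

```python
import numpy as np

def linear_widths(values, min_px=0.5, max_px=12.0):
    """Map non-negative flow volumes to line widths proportional to volume."""
    v = np.asarray(values, dtype=float)
    vmax = v.max() if v.size else 0.0
    if vmax <= 0:
        # No positive flows: fall back to the minimum visible width.
        return np.full(v.shape, min_px)
    # Zero-based linear scale: half the movers gets half the width,
    # floored at min_px so the thinnest corridors stay visible.
    return np.maximum(min_px, max_px * v / vmax)

widths = linear_widths([100, 50, 1])
print(widths)
```

The floor at `min_px` is the one place where proportionality breaks, so it is worth keeping small; combined with top‑N filtering, few drawn corridors should ever hit it.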

Visualization

Code
cities = pd.read_csv('data/cities.csv')
links = pd.read_csv('data/links.csv')

# aggregate identical directed pairs to get a simple frequency for each OD link
edges = links.groupby(['source', 'target']).size().reset_index(name='count')

# bring in coordinates and labels for source and target endpoints
src = cities[['id', 'lng', 'lat', 'name']].rename(columns={
    'id': 'src_id', 'lng': 'src_lng', 'lat': 'src_lat', 'name': 'src_name'
})
tgt = cities[['id', 'lng', 'lat', 'name']].rename(columns={
    'id': 'tgt_id', 'lng': 'tgt_lng', 'lat': 'tgt_lat', 'name': 'tgt_name'
})

edges = edges.merge(src, left_on='source', right_on='src_id', how='left')
edges = edges.merge(tgt, left_on='target', right_on='tgt_id', how='left')
# drop any pairs that lack coordinate information
edges = edges.dropna(subset=['src_lng', 'src_lat', 'tgt_lng', 'tgt_lat'])

# pick top-N links to keep the prototype readable
top_n = min(50, len(edges))
top = edges.nlargest(top_n, 'count').reset_index(drop=True)

fig = go.Figure()

# draw each top flow as a simple straight segment; width grows with log(1 + count) so dominant links do not swamp the rest
for i, row in top.iterrows():
    color = PAL[i % len(PAL)]
    width = 1.0 + np.log1p(row['count']) * 2.0
    fig.add_trace(
        go.Scatter(
            x=[row['src_lng'], row['tgt_lng']],
            y=[row['src_lat'], row['tgt_lat']],
            mode='lines',
            line=dict(color=color, width=width),
            hoverinfo='text',
            text=(f"{row['src_name']} → {row['tgt_name']}<br>count: {int(row['count'])}"),
            showlegend=False
        )
    )

# add small endpoint markers so viewers can orient to the cities involved
src_nodes = top[['src_lng', 'src_lat', 'src_name']].rename(columns={'src_lng': 'lng', 'src_lat': 'lat', 'src_name': 'name'})
tgt_nodes = top[['tgt_lng', 'tgt_lat', 'tgt_name']].rename(columns={'tgt_lng': 'lng', 'tgt_lat': 'lat', 'tgt_name': 'name'})
nodes = pd.concat([src_nodes, tgt_nodes], ignore_index=True).drop_duplicates().dropna(subset=['lng', 'lat'])
fig.add_trace(
    go.Scatter(
        x=nodes['lng'],
        y=nodes['lat'],
        mode='markers',
        marker=dict(size=6, color=PAL[1], opacity=0.85),
        text=nodes['name'],
        hoverinfo='text',
        showlegend=False
    )
)

fig.update_layout(
    **THEME,
    xaxis=dict(title='Longitude', tickformat=",.2f"),
    yaxis=dict(title='Latitude', tickformat=",.2f"),
)

apply_mm_layout(
    fig,
    "Flow-first OD prototype — city-to-city links",
    subtitle=(
        f"Top {top_n} directed links from the sample OD graph (5,470 nodes / 10,596 links). "
        "Line thickness encodes log-scaled frequency; hover for details."
    ),
    legend_position='right',
    n_legend_items=0,
)

add_source(fig)
assert_no_title_overlap(fig)
fig.show()
# Static export is optional: write_image needs the kaleido package, so skip quietly if it fails
try:
    fig.write_image('chart-1.png', width=1200, height=520, scale=2)
except Exception:
    pass
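
The loop above connects endpoints with straight segments in longitude/latitude space. Since the sample graph is global, a great-circle path is the more faithful shape for long links. Here is a minimal sketch using spherical linear interpolation; the helper name `great_circle_path` and the default of 32 points are assumptions for illustration, and it treats the Earth as a sphere.

```python
import numpy as np

def great_circle_path(lng1, lat1, lng2, lat2, n_points=32):
    """Return (lngs, lats) in degrees sampled along the great circle between two points."""
    lam1, phi1, lam2, phi2 = np.radians([lng1, lat1, lng2, lat2])
    # Unit vectors on the sphere for both endpoints.
    p1 = np.array([np.cos(phi1) * np.cos(lam1), np.cos(phi1) * np.sin(lam1), np.sin(phi1)])
    p2 = np.array([np.cos(phi2) * np.cos(lam2), np.cos(phi2) * np.sin(lam2), np.sin(phi2)])
    # Angular distance between the endpoints.
    omega = np.arccos(np.clip(np.dot(p1, p2), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Coincident endpoints: the path is a single repeated point.
        return np.full(n_points, float(lng1)), np.full(n_points, float(lat1))
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    # Spherical linear interpolation (slerp); not well defined for exactly antipodal points.
    pts = (np.sin((1 - t) * omega) * p1 + np.sin(t * omega) * p2) / np.sin(omega)
    lats = np.degrees(np.arcsin(np.clip(pts[:, 2], -1.0, 1.0)))
    lngs = np.degrees(np.arctan2(pts[:, 1], pts[:, 0]))
    return lngs, lats

# A quarter of the equator, sampled at four points.
lngs, lats = great_circle_path(0.0, 0.0, 90.0, 0.0, n_points=4)
print(lngs, lats)
```

Each returned pair of arrays could replace the two-point `x`/`y` lists passed to `go.Scatter`. Paths that cross the antimeridian jump between +180 and −180 degrees of longitude, so on a flat longitude/latitude axis they would need to be split into two segments.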

Key Takeaways

  • Coverage is global in the prototype network: 5,470 cities with longitudes from −172 to 178 and latitudes from −54.8 to 72.8, which is useful for stress‑testing great‑circle edge rendering before applying it to US states.
  • There are 10,596 links; the head sample shows Fukaya (Q734532, Japan, 36.1975 N, 139.281 E) linked to Q873835, and other early rows reference Ishigaki, Tsuru, and Koga — handy anchor entities for validating joins and hover text.
  • Data hygiene note: countrycd is non‑null for 5,463 of 5,470 cities (7 missing), so any US‑only filter should fall back to country names to avoid accidentally dropping valid locations.
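
The fallback described in the data hygiene note could look something like the sketch below. Only `countrycd` is a column name taken from the dataset summary; the `country` name column, the `"US"` code value, and the accepted name spellings are assumptions that would need checking against the real file.

```python
import pandas as pd

def us_mask(cities: pd.DataFrame, code_col: str = "countrycd", name_col: str = "country") -> pd.Series:
    """Boolean mask for US rows: trust the country code, fall back to the name when the code is missing."""
    code = cities[code_col].astype("string").str.strip().str.upper()
    name = cities[name_col].astype("string").str.strip().str.casefold()
    by_code = code.eq("US").fillna(False)
    # Assumed spellings; extend this set after inspecting the actual values.
    by_name = name.isin({"united states", "united states of america"})
    # The name is consulted only for rows whose code is null.
    return (by_code | (code.isna() & by_name)).astype(bool)

# Hypothetical rows for illustration; two of them lack a country code.
sample = pd.DataFrame({
    "name": ["Austin", "Fukaya", "Denver", "Paris"],
    "countrycd": ["US", "JP", None, None],
    "country": ["United States", "Japan", "United States", "France"],
})
mask = us_mask(sample)
print(sample[mask])
```

Restricting the name check to rows with a null code keeps the filter conservative: a row with an explicit non-US code is never pulled in by a mislabeled name.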

This post is part of the MakeoverMonday weekly data visualization project.

Disclaimer

This analysis is for educational and practice purposes only. Data visualizations and interpretations are based on the provided dataset and may not represent complete or current information.