Berlin V1-V5 replication

End-to-end reproduction of the paper’s five model variants (V1-V5) on the synthetic 96-zone Berlin scenario, covering both the baseline and the East-West Express shock, at seed 42.

Prerequisites

git clone https://github.com/MASE-eLab/agent-urban-planning.git
cd agent-urban-planning
pip install -e ".[llm,plot,berlin]"

The bundled Berlin Ortsteile NPZ files at data/berlin/ortsteile/ are required. They ship in the git repo (~15 MB; see data/README.md for size and license attribution to Ahlfeldt et al. 2015) but not in the PyPI sdist, so installing from PyPI alone is not enough: you need the git clone.
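Before starting a multi-hour run, it can help to confirm that the data directory is actually populated. The following is a minimal sketch, not part of the repository: the helper name find_bundled_npz is hypothetical, and it assumes only that the files sit directly under data/berlin/ortsteile/ with an .npz extension, as described above.

```python
from pathlib import Path


def find_bundled_npz(repo_root="."):
    """Return the sorted .npz files under data/berlin/ortsteile/.

    Hypothetical helper: returns an empty list when the directory is
    missing, which is what a PyPI-only install would look like.
    """
    data_dir = Path(repo_root) / "data" / "berlin" / "ortsteile"
    if not data_dir.is_dir():
        return []
    return sorted(data_dir.glob("*.npz"))


files = find_bundled_npz(".")
if files:
    print(f"Found {len(files)} NPZ file(s) in data/berlin/ortsteile/")
else:
    print("No NPZ files found; install from a git clone of the repo.")
```

Run it from the root of the cloned repository. An empty result suggests the package was installed without the git-only data.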

Reproducibility tiers

Tier	Variant	Wall-clock	LLM credits?
Tier 3a	V1 (Baseline-softmax)	~3 hr	No
Tier 3b	V2 (Baseline-ABM argmax)	~3 hr	No
Tier 3c	V3 (Normal-ABM argmax)	~3 hr	No
Tier 3d	V4 (Hybrid-ABM)	~3 hr	Yes (~$5)
Tier 4	V5 (LLM-ABM, paper headline)	~10 hr live / ~5 min cache-replay	Yes (~$30-50 live) / No (cache replay)

A --no-llm mode on run_v5_score_all.py replays the bundled cached LLM responses at data/berlin/llm_cache_v5/ so Tier 4 can be reproduced without any LLM credits.

Run V1 (no LLM, simplest case)

python examples/02_berlin_replication/run_v1_softmax.py

Produces:

output/berlin_v1_softmax/per_zone.csv (baseline)
output/berlin_v1_softmax_shock_east_west/per_zone.csv (post-shock)

Run V5 (paper headline, cache-replay mode)

python examples/02_berlin_replication/run_v5_score_all.py --no-llm

Replays cached LLM responses; ~5 min wall-clock. Produces:

output/berlin_v5_score_all/per_zone.csv

Run V5 (live LLM mode)

python examples/02_berlin_replication/run_v5_score_all.py --llm-provider codex-cli

Live mode requires the codex CLI to be authenticated (codex login). Estimated cost: $30-50 in API credits. Wall-clock: ~10 hr.

Compare and plot

Each variant’s output/{variant}/per_zone.csv and output/{variant}_shock_east_west/per_zone.csv contain the per-zone simulated and observed values used for cross-variant analysis. The paper’s headline moments table and choropleths are reproduced in the figures/ directory of this repo (comparison_moments.csv, berlin_dlogQ.png, berlin_dlogW.png) and are also shown in the README. Custom analyses can be built directly on the per_zone.csv outputs using pandas and matplotlib.
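As a starting point for a custom analysis, the sketch below computes the per-zone change between a baseline file and its post-shock counterpart. It uses only the standard-library csv module so that it runs without extra dependencies; the same join is a one-liner in pandas. The function names are hypothetical, and the column names are assumptions: the schema of per_zone.csv is not documented here, so check the file header and pass the real names for key_col and value_col.

```python
import csv


def load_column(path, key_col, value_col):
    """Read one numeric column from a CSV, keyed by a zone identifier column."""
    with open(path, newline="") as fh:
        return {row[key_col]: float(row[value_col]) for row in csv.DictReader(fh)}


def shock_deltas(baseline_csv, shock_csv, key_col, value_col):
    """Return {zone: shock value - baseline value} for zones present in both files.

    key_col and value_col are caller-supplied because the per_zone.csv
    column names are not specified in this page.
    """
    base = load_column(baseline_csv, key_col, value_col)
    shock = load_column(shock_csv, key_col, value_col)
    return {zone: shock[zone] - base[zone] for zone in base if zone in shock}
```

For V1, the two inputs would be output/berlin_v1_softmax/per_zone.csv and output/berlin_v1_softmax_shock_east_west/per_zone.csv.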

Numerical reproducibility

V1, V2, and V3 are deterministic (closed-form or seeded-stochastic). Reproducing V4 and V5 depends on all of the following:

Same LLM provider (codex-cli recommended for V5)
Same prompt cache (bundled at data/berlin/llm_cache_v5/)
Same seed (42)

Cross-variant numerical equivalence to the dev repo’s outputs is documented to within 1e-3 tolerance in the Berlin V1-V5 reproducibility page.
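A tolerance check of this kind can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes the 1e-3 figure is an absolute tolerance applied value by value, which this page does not specify. The inputs are plain mappings from zone identifier to value, however you choose to load them.

```python
import math


def within_tolerance(reference, candidate, tol=1e-3):
    """Return True if both mappings share the same keys and every pair of
    values differs by at most tol (absolute difference, an assumption)."""
    if reference.keys() != candidate.keys():
        return False
    return all(
        math.isclose(reference[k], candidate[k], rel_tol=0.0, abs_tol=tol)
        for k in reference
    )
```

Setting rel_tol=0.0 makes the comparison purely absolute; if the documented tolerance turns out to be relative, swap the two arguments accordingly.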

Troubleshooting

“Bundled Berlin data missing” error: The package was installed from PyPI rather than from a git clone. The data files ship only in the git repo. Clone the repo and install from the clone, or download the data separately.

LLM provider not configured: For V4 and V5 live runs, verify that codex --version works (or that your chosen provider is authenticated). Use --no-llm on V5 to skip live LLM calls.

Numerical divergence from paper: Verify your seed is 42 (the paper default) and you’re using the bundled LLM cache for V5. Live LLM runs at different seeds will not be bit-identical to the paper.

Next steps

API reference: full API documentation.
Berlin V1-V5 reproducibility: reproducibility tier definitions.