Model State

Last updated: 2026-05-03 (Streamlit scenario explorer landed)

Built

Historical baseline

  • Status: complete
  • Run command: uv run historical
  • Main memo: docs/historical_findings.md
  • Main output table: outputs/tables/historical_trend_estimates.csv
  • Processed dataset: data/processed/historical_models.{csv,parquet}
  • Headline (Rule A 2018+): training compute 5.97×/yr (R²=0.84, n=113); training cost 4.89×/yr (R²=0.72, n=74); cost per FLOP 0.76×/yr (~24%/yr decline); fit mechanics sketched after this list
  • Charts (8): all under outputs/charts/historical_*.png
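
In miniature, the ×/yr headlines come from an ordinary log-linear fit: regress log10(FLOP) on year and exponentiate the slope. A minimal sketch, assuming hypothetical column names (publication_year, training_flop) rather than the processed dataset's actual schema:

import numpy as np
import pandas as pd

df = pd.read_csv("data/processed/historical_models.csv")
# The real pipeline first applies a frontier rule (e.g. Rule A 2018+) to filter rows.
x = df["publication_year"].to_numpy(dtype=float)          # hypothetical column name
y = np.log10(df["training_flop"].to_numpy(dtype=float))   # hypothetical column name

slope, intercept = np.polyfit(x, y, 1)   # OLS fit in log10 space
growth_per_year = 10 ** slope            # e.g. ~5.97x/yr for training compute
r_squared = np.corrcoef(x, y)[0, 1] ** 2
print(f"{growth_per_year:.2f}x/yr, R^2 = {r_squared:.2f}")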

Supply capacity model

  • Status: complete (sourced inputs, sensitivity, cost variants)
  • Run command: uv run supply
  • Main memo: docs/supply_findings.md
  • Main output table: outputs/tables/supply_fundamental_inputs_by_year.csv
  • Processed dataset: data/processed/supply_fundamental_inputs.csv
  • Assumptions: data/assumptions/supply_input_assumptions.yaml (15 parameters × 4 scenarios)
  • Scenarios: scenarios/supply_*.yaml (base / capex_rich / chip_bottleneck / power_datacenter_bottleneck)
  • Headline (base case): 45.7%/yr CAGR 2024→2040; 1.65e+31 FLOP/yr by 2040; capex binds 2024–2036, chip binds 2037–2040 (constraint selection sketched after this list)
  • Charts (9): all under outputs/charts/supply_*.png
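
The binding-constraint headline falls out of a per-year min() over constraint-specific ceilings. A minimal sketch with made-up ceiling values; the real engine derives these from data/assumptions/supply_input_assumptions.yaml:

ceilings_by_year = {  # hypothetical FLOP/yr ceilings per constraint
    2030: {"capex": 3.2e29, "chip": 5.1e29, "power_datacenter": 6.0e29},
    2040: {"capex": 2.4e31, "chip": 1.65e31, "power_datacenter": 3.0e31},
}
for year, ceilings in ceilings_by_year.items():
    binding = min(ceilings, key=ceilings.get)  # the smallest ceiling binds
    print(f"{year}: {ceilings[binding]:.2e} FLOP/yr usable, {binding} binds")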

Allocation layer

  • Status: complete
  • Run command: uv run allocation (requires the supply outputs; run uv run supply first)
  • Main memo: docs/allocation_findings.md
  • Main output table: outputs/tables/allocation_largest_frontier_run.csv
  • Processed dataset: data/processed/allocation_compute_by_bucket.csv
  • Assumptions: data/assumptions/allocation_input_assumptions.yaml (9 parameters × 4 scenarios)
  • Scenarios: scenarios/allocation_*.yaml (base / inference_heavy / training_race / rnd_acceleration)
  • Combined cross-product: 4 supply × 4 allocation = 16 combined scenarios
  • Headline (base × base): largest frontier run grows 27.6%/yr 2024→2040 (1.39e+27 → 6.93e+28 FLOP). Frontier-run share of total compute falls from 3.5% to 0.4%.
  • Range across scenarios: 14.1%/yr (chip_bottleneck × inference_heavy) to 48.1%/yr (capex_rich × training_race) CAGR; ~50× spread in absolute 2040 FLOP. The scenario grid and CAGR arithmetic are sketched after this list.
  • Charts (6): all under outputs/charts/allocation_*.png
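
The 16-scenario cross-product and the CAGR arithmetic behind the headline, in miniature (scenario names match the scenarios/ files; the endpoints are the base × base figures quoted above):

from itertools import product

supply = ["base", "capex_rich", "chip_bottleneck", "power_datacenter_bottleneck"]
allocation = ["base", "inference_heavy", "training_race", "rnd_acceleration"]
combined = list(product(supply, allocation))
assert len(combined) == 16

def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

# ~27.7%/yr from the rounded endpoints (headline: 27.6%/yr from unrounded values)
print(f"{cagr(1.39e27, 6.93e28, 2040 - 2024):.1%}/yr")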

Review layer (DuckDB + Excel workbook)

  • Status: complete
  • Run commands: uv run database (DuckDB, ~5 MB) and uv run workbook (Excel, ~110 KB)
  • Main guide: docs/review_workbook_guide.md
  • Outputs:
    • outputs/database/ai_economy.duckdb — 14 tables + 6 SQL views
    • outputs/database/database_manifest.json — schema version + git commit + row counts
    • outputs/workbooks/ai_economy_model_review.xlsx — 11 sheets (README, Model Flow, Scenario Matrix, Historical Baseline, Supply Capacity, Allocation Buckets, Largest Frontier Run, Phase 4 Handoff, Assumptions, Sources & Confidence, Output Inventory)
    • outputs/runs/latest_run_manifest.json — run metadata + pass/fail counts
  • Validation: uv run validate-outputs walks the outputs tree and verifies every promised artifact exists and is non-empty (53 checks; current state 53/53 pass).
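
In miniature, that validation pass reduces to an existence-and-size check over a manifest of promised paths (the list below is an illustrative subset of the 53):

from pathlib import Path

expected = [
    "outputs/tables/historical_trend_estimates.csv",
    "outputs/tables/supply_fundamental_inputs_by_year.csv",
    "outputs/database/ai_economy.duckdb",
]
failures = [p for p in expected
            if not Path(p).is_file() or Path(p).stat().st_size == 0]
print(f"{len(expected) - len(failures)}/{len(expected)} checks pass")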

Scenario explorer (Streamlit)

  • Status: complete
  • Run command: uv run demo (wraps streamlit run app/streamlit_app.py)
  • Main guide: docs/streamlit_demo_guide.md
  • Pages:
    1. Model Overview — built/next/future status + headline numbers
    2. Scenario Matrix — 16 combined scenarios with slow/base/fast tags
    3. Supply Capacity — 4 charts + tables, year-range slider
    4. Allocation Layer — bucket stacked-area + share trajectories
    5. Largest Frontier Run — headline forward output, optional historical overlay
    6. Effective-Compute Handoff — slow/base/fast envelope for downstream consumers
    7. Assumptions — source/confidence audit, share assumptions by year
    8. Source Provenance — aggregate audit + full table
    9. Run Manifest — when each artifact was last regenerated
  • Data source: DuckDB review database (with CSV fallback). All loaders cached via @st.cache_data.
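
A minimal sketch of that loader pattern; the table name queried here is an assumption, and the real loaders may go through views instead:

import duckdb
import pandas as pd
import streamlit as st

@st.cache_data
def load_frontier_run() -> pd.DataFrame:
    try:  # DuckDB first...
        con = duckdb.connect("outputs/database/ai_economy.duckdb", read_only=True)
        return con.execute("SELECT * FROM allocation_largest_frontier_run").df()
    except Exception:  # ...CSV fallback
        return pd.read_csv("outputs/tables/allocation_largest_frontier_run.csv")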

Current run commands

uv sync                    # one-time setup (installs deps, registers entry points)
uv run historical          # rebuild historical-baseline deliverables
uv run supply              # rebuild supply-capacity deliverables
uv run allocation          # rebuild allocation deliverables (requires supply)
uv run database            # build the DuckDB review database
uv run workbook            # build the Excel review workbook
uv run demo                # launch the Streamlit scenario explorer
uv run validate-outputs    # confirm artifacts present + non-empty
uv run pytest              # run the test suite (32 tests)

All three model pipelines (historical, supply, allocation) are idempotent: re-running them overwrites the existing artifacts in outputs/charts/ and outputs/tables/. The allocation pipeline reads outputs/tables/supply_fundamental_inputs_by_year.csv and raises a clear error if you haven't run uv run supply first.
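
The guard amounts to something like the following sketch (the actual message in the pipeline code may differ):

from pathlib import Path

SUPPLY_TABLE = Path("outputs/tables/supply_fundamental_inputs_by_year.csv")

def require_supply_outputs() -> None:
    if not SUPPLY_TABLE.is_file():
        raise FileNotFoundError(
            f"{SUPPLY_TABLE} not found; run `uv run supply` before `uv run allocation`."
        )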

Current main outputs

Tables (in outputs/tables/):

File                                       What it is
historical_trend_estimates.csv             All historical log-linear trend fits (45 rows: compute, cost, cost-per-FLOP × 4 cost variants × 9 frontier rules)
historical_hardware_summary.csv            Hardware-type usage by year for frontier-flagged historical models
supply_fundamental_inputs_by_year.csv      Annual scenario projections (4 scenarios × 17 years × ~25 columns)
supply_scenario_summary.csv                Pivot-table summary at milestone years
supply_binding_constraints.csv             Years-by-binding-constraint counts per scenario
supply_capex_requirements.csv              Capex required vs capex available, per scenario per year
supply_sensitivity_analysis.csv            One-parameter sensitivity perturbations of the base scenario
allocation_compute_by_bucket.csv           Year-by-combined-scenario allocation across the 6 buckets
allocation_largest_frontier_run.csv        The headline largest_frontier_run_flop per year per combined scenario
allocation_scenario_summary.csv            Per-combined-scenario milestone summary (2024 / 2030 / 2040) + 16-year CAGRs
allocation_vs_historical_trend.csv         Year-by-year gap_ratio between allocation projections and the historical Rule A 2018+ extrapolation
allocation_share_assumptions_by_year.csv   Interpolated allocation parameters by year (audit trail)

Charts (in outputs/charts/):

  • 8 historical_*.png (compute / cost / cost-per-FLOP / by-org / residuals / hardware-timeline)
  • 9 supply_*.png (accelerator stock / theoretical / usable compute / power constraint / capex required / binding-constraint heatmap / cost variants / sensitivity bands / supply-vs-historical)
  • 6 allocation_*.png (compute by bucket / largest frontier run / vs historical / training-vs-inference share / frontier-run share of total / 4×4 scenario grid)

For per-file interpretation see output_guide.md.

Not yet built

Effective compute

  • Purpose: convert raw frontier training-run FLOP into algorithmically adjusted effective compute, accounting for architectural and post-training efficiency gains.
  • Depends on: allocation layer's largest_frontier_run_flop_by_year output (now available).
  • Reason this is next: all upstream layers feeding it are now built; the historical-vs-projection 7-OOM gap (visible in outputs/charts/allocation_vs_historical_training_compute.png) is the obvious phenomenon for this layer to address by adjusting raw FLOP for algorithmic-efficiency gains.

Capability mapping

  • Purpose: map effective compute into task horizons / benchmark performance / automation levels.
  • Depends on: effective-compute layer.

Probabilistic projections

  • Purpose: combine all upstream layers into Monte-Carlo-style projections with confidence bands rather than scenario point estimates.
  • Depends on: capability mapping (or earlier layers, depending on what's being projected).

Economy feedback

  • Purpose: revenue / reinvestment loops feeding back into supply-side capex assumptions, closing the macro loop.
  • Depends on: all of the above.

Next up is the effective-compute layer. Now that allocation has shipped, the largest_frontier_run_flop_by_year output is available. The effective-compute layer should consume it and adjust upward for algorithmic-efficiency gains (Epoch's published estimate is ~3×/yr for language-model training, with a 1.5×–10× range depending on sub-field). The resulting effective_compute_flop_by_year becomes the input to the capability-mapping layer that follows.
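
A minimal sketch of that adjustment under a constant compounding rate; the function name, signature, and defaults are proposals, not existing code:

def effective_compute_flop(raw_flop: float, year: int,
                           base_year: int = 2024,
                           efficiency_per_year: float = 3.0) -> float:
    # Compound the algorithmic-efficiency multiplier on top of raw FLOP.
    return raw_flop * efficiency_per_year ** (year - base_year)

# e.g. the base x base 2040 frontier run, adjusted at 3x/yr from 2024:
print(f"{effective_compute_flop(6.93e28, 2040):.2e}")  # ~2.98e+36 effective FLOP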

Recommended layout, following the established conventions:

  • pipelines/effective_compute.py (entry point: uv run effective_compute)
  • model/effective_compute_engine.py
  • data/assumptions/effective_compute_input_assumptions.yaml
  • scenarios/effective_compute_*.yaml