The Build Log

Every change to the model, in order.

From the first commit in September 2025 through today's live routing. Each entry is dated, documented, and tied to a real bankroll figure. Nothing retroactive. Nothing hidden.

Project snapshot · As of May 24, 2026
Graded picks: 2,785 (since Feb 11, 2026)
Sports: 4 (MLB · NHL · NBA · NCAAM)
Routing version: v2.1 (live since May 7)
Versions shipped: 3 (after Conviction prototype: v1 → v2 → v2.1)

How the system evolved.

Top to bottom — earliest first. Solid dots mark milestones with their own performance record. Outlined dots mark infrastructure, corrections, or upcoming work.

Sep 25, 2025
Day 1
Origin·First commit

The project starts

A personal data-modeling project, before it had a name.

Initial repo setup. The thesis: ESPN shows hundreds of games a day; a sophisticated model might uncover bets the market hasn't priced correctly. Started as a personal tool first — no public surface, no Best Bets page, just a developer poking at sports data.

What was committed

The first commits (Sep 25, 2025)

  • Repo skeleton — .gitignore, outputs tree, templates
  • Development tools + CI scaffold
  • Automation system foundation with Mac support
  • Enhanced prediction system with score predictions and market variance

Why this is on the timeline

Trust starts with showing the work. The system didn't appear in February 2026 when the public Best Bets page launched; it had been running and learning for nearly five months before then. The early commits aren't pretty, but they're the actual start.

Oct 14, 2025
First production sport
NCAAF + NFL·Dual-sport deploy

NFL & NCAAF go live in production

Two sports, one stack. Real games getting predicted on a deployed server.

First production deployment. NCAAF EPA training pipeline with CFBD integration. NFL phases 4–6 implemented two days later (Oct 16). Pinnacle integration on the same day. The system was now predicting real games on a real schedule — but with no public-facing tracking yet.

October 2025 milestones

  • Phase 3.7 baseline model deployment (Oct 13–14)
  • NCAAF EPA pipeline with CFBD integration (Oct 14)
  • Dual-sport NFL + NCAAF deployment (Oct 14)
  • NFL Phases 4–6 + Pinnacle integration (Oct 16)
  • NCAAF margin model + production automation (Oct 21–23)

Nov 14, 2025
Off-season build
NCAAM live·Warehouse layer begins

NCAAM joins. Warehouse infrastructure starts.

From two sports to three, plus the data layer that everything else would depend on.

NCAAM model joins the production lineup mid-November. Behind the scenes, the warehouse schema and pick-storage layer get built (Nov 23: "Data quality improvements and warehouse infrastructure"). This is the layer the public Track Record would later be built on top of.

Dec 29, 2025
Multi-sport prep
NHL + NBA + MLB·Scaffolded

NHL, NBA, MLB scripts staged for production

Off-season expansion. Six sports in the pipeline by year-end.

All three remaining major sports get their scripts and pipelines built in a single commit on Dec 29, 2025 — the foundation for the daily multi-sport coverage that would launch in January. CLV tracking, autonomous reliability infrastructure, and bottleneck fixes round out the December push.

Jan 15, 2026
Major model upgrade
NCAAM·XGBoost upgrade

NCAAM XGBoost replaces Ridge regression

First major model architecture change. Haslametrics joins the input mix.

The NCAAM model swaps Ridge regression for XGBoost — a structural improvement, not a tuning change. Haslametrics ratings are blended at 0.5 weight. The result: 393/718 ATS (54.7%) and 406/538 O/U (75.5%, +48.5% ROI) across the full season.

Why XGBoost

Ridge regression couldn't capture nonlinearity in conference-level matchups. XGBoost handles feature interactions natively. The Haslametrics blend at 0.5 weight imports external ratings as a regularization signal against the model's own estimates.
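
One plausible reading of "blended at 0.5 weight" is a convex combination of the model's own rating and the external rating. The sketch below assumes that reading; the function name and the sample ratings are hypothetical and are not taken from the production code.

```python
def blend_rating(model_rating: float, external_rating: float, weight: float = 0.5) -> float:
    """Convex blend: `weight` on the external rating, (1 - weight) on the model's own."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1.0 - weight) * model_rating + weight * external_rating


# Hypothetical ratings: the model says +8.0, the external source says +4.0.
blended = blend_rating(8.0, 4.0)  # halfway between the two at weight 0.5
```

At weight 0.5 the external rating pulls the model's estimate halfway toward it, which is the sense in which it can act like a regularizer.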

2025-26 NCAAM season result

ATS: 54.7%
ROI ATS: +4.9%
O/U: 75.5%
ROI O/U: +48.5%

Jan 18, 2026
Self-correcting layer
Edge calibration·Self-correcting layer

NCAAM edge calibration system built

Raw model edges shrunk and capped against 808 games of historical performance.

First systematic answer to "do our claimed edges actually predict our actual win rate?" Raw edges of +7 only hit 46.6% historically, so the calibration system reduces them — preventing overconfident betting on edges the data doesn't support. Refined Jan 20–22 (home court bias disabled, ATS calibration improved, simplified v2 added).
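
As a rough illustration of "shrunk and capped," the sketch below scales a raw edge toward zero and then limits its magnitude. The shrink factor and cap are hypothetical placeholders; the real system derives its adjustments from the 808-game history, and its actual form is not documented here.

```python
import math


def calibrate_edge(raw_edge: float, shrink: float, cap: float) -> float:
    """Shrink a raw edge toward zero, then cap its magnitude. The sign is preserved."""
    shrunk = raw_edge * shrink
    return math.copysign(min(abs(shrunk), cap), shrunk)


# Hypothetical parameters, chosen only for illustration.
SHRINK = 0.5
CAP = 8.0

big_edge = calibrate_edge(20.0, SHRINK, CAP)    # shrinks to 10.0, then the cap applies
small_edge = calibrate_edge(-6.0, SHRINK, CAP)  # shrinks to -3.0, under the cap
```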

Jan 19, 2026
First full-season result
NCAAF·2025 season closes

NCAAF 2025: 62.0% ATS, +20.2% ROI over 703 picks

National Championship pick: Indiana −7.5 over Miami. Hit. The system had earned the right to be tested in public.

The NCAAF model finished its first full season as the best-performing system in the lineup. 436 ATS wins on 703 picks, +20.2% ROI. The O/U side hit 106.7%. The national championship prediction (Indiana −7.5, UNDER 48.5) was the model's final pick of the season — and it landed.

The 2025 season in numbers

Full season performance

ATS Record: 436-267
ATS Win %: 62.0%
ROI: +20.2%
Graded: 777

What it proved

That the architecture (XGBoost on SP+ ratings, Elo, tempo features, and conference-aware inputs) could sustain an edge across a 5-month season, including bowl variance.

Jan 23, 2026
Different sport, same rigor
UFC·Built, not yet published

UFC prediction system built

First non-team-sport architecture: fighter-level predictions, consensus odds, post-event stat updates. Infrastructure complete; not in the public Best Bets lineup as of May 25, 2026.

A structural departure from the team-sport models. UFC predictions work at the fighter level, pull from multiple odds sources to build consensus prices (Phase 4 deploy on Jan 27), and run a post-event update job to ingest fighter stats after each card. The infrastructure shipped; the picks themselves have not been published publicly yet. Why it's on the timeline anyway: it proved the architecture transferred to a fundamentally different sport.

January 2026 UFC milestones

What shipped (Jan 23–27)

  • Critical prediction system bug fixes + initial documentation
  • Added to unified dashboard
  • Database ingestion with predicted home/away score fields
  • Calibration, expanded data sources, test coverage
  • Post-event fighter stats update script
  • Phase 4: Multiple odds sources with consensus calculation
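
A consensus price across several books can be built in many ways. The sketch below shows one common approach, assuming American odds as input: convert each quote to an implied probability, average per fighter, then normalize so the two sides sum to 1 (removing the bookmaker margin). The function names and the sample quotes are hypothetical; the production consensus calculation may differ.

```python
from statistics import mean


def implied_prob(american_odds: int) -> float:
    """Implied probability of an American-odds price (bookmaker margin still included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def consensus_probs(quotes: list[tuple[int, int]]) -> tuple[float, float]:
    """quotes holds one (fighter_a_odds, fighter_b_odds) pair per book.

    Returns margin-free consensus probabilities for fighter A and fighter B.
    """
    prob_a = mean(implied_prob(a) for a, _ in quotes)
    prob_b = mean(implied_prob(b) for _, b in quotes)
    total = prob_a + prob_b
    return prob_a / total, prob_b / total


# Hypothetical quotes from two books.
p_a, p_b = consensus_probs([(-150, 130), (-140, 120)])
```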

Why it's on the timeline

Proves the architecture wasn't team-sport-specific. The same discipline — predict before the event, grade against actual results, track calibration — extended cleanly to a fundamentally different sport.

Jan 25, 2026
Daily production
NHL·● Live daily

NHL goes daily — 3× per day predictions

Margin MAE 1.96 goals. Starting goalie tracking online.

NHL active in production with three daily prediction cycles. The model's margin error is well-calibrated for hockey's tight scoring (MAE 1.96 goals on a typical 5–6 goal total). Starting goalie tracking added on the same date — a critical feature for hockey since goalies swing win probabilities more than any other position.

Jan 27, 2026
Daily production
NBA·● Live daily

NBA goes daily — 3× per day predictions

Win accuracy 66.8%. Margin MAE 11.1 pts.

NBA joins NHL in daily production two days later. Three prediction cycles per day (morning, mid-afternoon, evening). Win-prediction accuracy of 66.8% on the training set.

Feb 11, 2026
The public pivot
Best Bets·Public tracking begins

Best Bets tab launches — every pick now graded publicly

The day the system stopped being a personal tool and became a public track record.

"Today's Best Bets" launches as a tab with performance tracking. From this day forward, every pick is published before games start and graded against actual results. This is the dividing line between "personal modeling project" and "public, auditable system" — and where today's all-time numbers begin counting.

Why this date matters

What changed structurally

  • Pick storage in PostgreSQL warehouse — every prediction timestamped before games lock
  • Grading pipeline — automatic comparison to actual scores
  • Performance tracker UI — visible record, not just a personal spreadsheet

The integrity claim begins here

"Published before games start, graded against actual results" — that's the spine of the trust claim. It became enforceable on Feb 11, 2026, when the warehouse started recording picks with timestamps.
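
The enforceable part of that claim reduces to a timestamp comparison. A minimal sketch, assuming each stored pick carries a timezone-aware publication time and the game has a known start time (both field names are hypothetical):

```python
from datetime import datetime, timezone


def published_before_lock(published_at: datetime, game_start: datetime) -> bool:
    """True only when the pick was recorded strictly before the game started."""
    return published_at < game_start


# Hypothetical example: pick stored at 15:00 UTC, game starts at 19:00 UTC.
game_start = datetime(2026, 2, 11, 19, 0, tzinfo=timezone.utc)
pick_time = datetime(2026, 2, 11, 15, 0, tzinfo=timezone.utc)
is_valid = published_before_lock(pick_time, game_start)
```

Using timezone-aware datetimes on both sides avoids silent errors when the server and the league schedule use different zones.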

Feb 14, 2026
A bad day, documented
Incident·Calibration bypassed in error

Calibration system destruction — 4 hours, 3 losing bets, reverted

An AI assistant assumed the calibration was a bug and forced raw edges. It wasn't a bug. The change was reverted the same day, and documenting it is part of how trust gets built.

An AI assistant noticed the database's edges didn't match raw formula output and "fixed" them by forcing recalculation across the board. The mismatch wasn't a bug — it was the calibration system from Jan 18 doing exactly what it was designed to do, shrinking raw edges against 808 games of historical reality. By 7:15 PM the user had bet on three uncalibrated picks. By 8:30 the day was 1-3 against the model's normal 4-0. By 10:30 the changes were reverted and the incident written up.

The full timeline

What went wrong

  • 6:22 PM: Assistant flagged "wrong" edges in the database. Assumed bug.
  • 6:30 PM: Force-recalculated 132 ATS edges. Didn't investigate the discrepancy first.
  • 6:45 PM: Same for O/U edges — Pacific OVER edge went from calibrated +6.7 to forced +24.8.
  • 7:00 PM: Modified sync_odds_to_predictions.py to always recalculate. Committed as a "fix."
  • 7:15 PM: User bet on 3 picks the calibration would have either dropped (Pacific +7 → PASS) or sized differently.
  • 8:30 PM: 1-3 day. User: "This is exactly what i was worried about happening and we let it happen."
  • 9:00 PM: User asked to see what the old system would have recommended. The calibration system was discovered, still on disk, still working — just bypassed.
  • 10:30 PM: Changes reverted (commit ac5dd40). Incident report written. Estimated avoidable loss: ~$93.

What it actually proved

The calibration discipline works — and is dangerous to bypass without understanding it. The same discipline, applied to MLB ML three months later, caught the 10× formula error. The lesson got carved into CLAUDE.md in a section titled NEVER BYPASS.

Why this is on the timeline

Most picks sites would delete this and never speak of it. Publishing it is the trust claim made real. Every entry on this page passes the same test — the calibration corrections, the v2 bugs, the conviction-era losses. If it happened, it's here.

Mar 6, 2026
Engine v1.1
Wave 1 Recalibration·engine.py v1.1

Wave 1 — tiers removed, edge floors set

First major engine cut based on 1,017 picks of evidence.

After the freeze period (Feb 17 – Mar 5) graded 1,017 picks, Wave 1 acted on the findings. Removed LEAN + HIDDEN_GEMS tiers (both harmful). Set edge floor at 5.0 and conviction floor at 0.55 for NCAAM/NBA (NHL exempt). Introduced league-specific thresholds (NHL TOP_PLAY 0.45, NBA ATS TOP_PLAY 0.70). Added ENGINE_VERSION tracking — every pick now records which engine version emitted it.
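
The Wave 1 floors can be pictured as a simple gate. This is a sketch under assumptions: the 5.0 and 0.55 values and the NCAAM/NBA scope come from the entry above, while the inclusive comparison and the pass-through for every other league are guesses, and the function name is hypothetical.

```python
EDGE_FLOOR = 5.0
CONVICTION_FLOOR = 0.55
FLOOR_LEAGUES = {"NCAAM", "NBA"}  # NHL is exempt per the Wave 1 notes


def clears_floors(league: str, edge: float, conviction: float) -> bool:
    """Return True when a pick meets both floors, or when its league is not gated."""
    if league not in FLOOR_LEAGUES:
        # Assumption: leagues outside the gated set pass through unchanged.
        return True
    # Assumption: a value exactly at the floor counts as clearing it.
    return edge >= EDGE_FLOOR and conviction >= CONVICTION_FLOOR
```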

Mar 12, 2026
The $1k test begins
Bankroll System·Live

The $1,000 bankroll system goes live

"If I took $1,000 of real money and only bet what the model wanted to bet, would it work?"

The first version of the bankroll system. A tier-driven formula (TOP_PLAY ATS at $50, SOLID NHL at $25, etc.) decided what was bankroll-worthy. A daily cron logged each evening's close into a permanent ledger. Picks had been published since Feb 11; this is when the system started testing them with stakes.

Mar 21, 2026
Tournament-aware
March Madness 2026·Tournament-aware engine

Tournament awareness shipped to NCAAM engine

Cross-conference matchups were tanking ATS — the engine had no idea. Now it does.

After the first 9 days of the bracket showed NCAA Tournament ATS TOP_PLAYs at 22% (2-7), the model got tournament-aware. It now adds caution notes on cross-conference NCAA Tournament ATS picks and a value flag on tournament O/U at 5+ edge (the historical sweet spot, a 66.7% hit rate). Tournament games are now visually distinct from regular-season picks.

~Mar 25, 2026
Silent until May 24
Data gap · Daily ledger stopped writing · Discovered May 24, 2026

The daily ledger went silent

We didn't notice for two months. The audit for this page is what found it.

Around March 25, 2026, we replaced the daily bankroll email with a consolidated "Best Bets Daily Digest." The old email script had a quiet side-effect: every morning, it wrote one row to bankroll_daily_log summarizing the previous day's bankroll close. The new digest doesn't write that row.

Picks kept getting published and graded the whole time. Only the once-a-day "this is where the bankroll closed" record stopped. It surfaced two months later during the audit that built this page — when we tried to verify what the bankroll did on May 5 and found no row.

What this means + what we're doing about it

What we still have

  • Every individual pick from Mar 25 onward — all stored, all graded
  • 13 days of frozen daily ledger from before the gap (Mar 12 – Mar 24)
  • Picks can be re-aggregated under any routing rules at any time

What we lost

  • The frozen "this is what the bankroll showed at end-of-day" record for every day Mar 25 onward
  • So daily totals on this site are now recomputed live from the picks each page load — under whatever routing logic is current
  • Same picks, slightly different bet sizing applied retroactively when our routing rules change. Today's numbers are honest; the history can shift slightly

What we're doing about it

  • The "How we measure" section at the bottom of this page explains the gap to visitors
  • The snapshot architecture in Phase 2 will store routing decisions per-pick, so the next gap can't happen silently

Apr 8, 2026
Era retires
Conviction Era · Feb 11 – Apr 8, 2026 · Prototype

The Conviction Era — the prototype

The original methodology. Where the system learned what to track.

A single conviction score drove both selection and sizing. The bankroll system itself didn't exist until March 12 — it was built to answer one question: if we staked $1,000 on only the picks the model most believed in, would it work? Across all 1,594 picks the model hit 52.9%; the bankroll-tracked tail (Mar 12 – Apr 8) went 99-103-3 (49.0%) — the losing stretch that triggered the routing v1 cut.

Record (fact): 834-744-16P
Picks: 1,594
Win %: 52.9%
Bankroll tail: −$937

Bankroll tracking began Mar 12, partway through the era. The −$937 covers the tracked tail only: $1,000 → $62.70, Mar 12 – Apr 8.

Apr 9, 2026
First version cut
Routing v1 · Apr 9 – May 4, 2026 · Retired

Routing v1 — the split

Selection logic separated from sizing. The first true version cut.

The engine proposed picks; a new routing layer decided what to publish and at what size. The model picked well across the stretch — 379-300-13P, 55.8% — but the bankroll lost ground, and that gap between good picks and a shrinking balance is what drove the move to v2. The split was structural groundwork, not a profit story yet.

Record (fact): 379-300-13P
Win %: 55.8%
Window: 26 days
Bankroll P/L: −$458*

* Under v1's own sizing as it ran at the time. Other sizing rules put this between roughly −$460 and −$270; all show a loss. Based on 692 graded picks. See "How we measure."

Why it was retired

NBA O/U at 34.8% on bankroll-routed picks. The 5.0-point edge floor wasn't protecting — picks clearing the floor were still losing badly. The routing config had structural issues that v2 would address.

May 5, 2026
Retired in 2 days
Routing v2 · May 5 – May 6, 2026 · Retired · 2 days

Routing v2 — caught and corrected in 48 hours

A day-one audit found three structural bugs. All fixed by v2.1.

Removed NBA from the bankroll, added MLB at test sizing. A same-day audit of v2's results flagged three structural issues — edge-sign discarded, NHL oversized, SQL gaps across endpoints. Rather than wait out the variance, it was cut and corrected within 48 hours. Retired for how it was built, not for a two-day score.

Record (fact): 31-31-4P
Picks: 66
Window: 2 days
Why retired: Structural, not results

Two days is too short for a meaningful P/L. The record is exact; the reason for retirement was structural — three bugs caught in a day-one audit and fixed in v2.1.

What went wrong

The three v2 bugs

  • Edge sign discarded. Routing used abs(edge), so a pick with negative model edge was treated identically to one with positive edge.
  • NHL ATS at 0.75u was oversized. A 44-pick sample basis was too small; one bad day (Min Wild −$75 on May 5) erased a full week of MLB grind.
  • SQL omissions on 5 consumer endpoints. The routing-classification SQL didn't pass needed fields, so +1.5 dog ATS picks silently fell through to BET.
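
The first bug is easy to reproduce in miniature. The sketch below contrasts the abs(edge) pattern with a sign-aware check; the function names and the 5.0 threshold are hypothetical stand-ins, not the actual v2 routing code.

```python
EDGE_FLOOR = 5.0  # hypothetical threshold, for illustration only


def route_with_abs(edge: float) -> str:
    # Bug pattern: abs() throws away the sign, so -6.0 is treated like +6.0.
    return "BET" if abs(edge) >= EDGE_FLOOR else "PASS"


def route_sign_aware(edge: float) -> str:
    # Only a positive edge at or above the floor qualifies.
    return "BET" if edge >= EDGE_FLOOR else "PASS"


# A pick the model rates at -6.0 slips through the abs() version.
buggy = route_with_abs(-6.0)
fixed = route_sign_aware(-6.0)
```

Both functions agree on positive edges, which is why this kind of bug can pass a casual check and only shows up when negative-edge picks are inspected.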

Variance band — documented before deploy

v2 expected variance (from routing-versioning.md)

At $343/day exposure, 7-day 5th-percentile worst case was −$225, 14-day was −$190. Bankroll dropping into the $800s was within modeled variance, not system failure.

May 7 – now
Live now
Routing v2.1 · May 7 – present · Live now

Routing v2.1 — corrections

A correction cut, not a rewrite. The version generating today's picks.

v2.1 corrected each of v2's three structural issues. Because it's the version running right now, its numbers are measured live and match the Bankroll page exactly — same computation, no reconstruction.

Record: 93-73-7P
Win %: 56.0%
ROI: +13.4%
Live Bankroll: $1,269

Net +$269.36 across the combined v2 + v2.1 window (May 5 onward), measured live. Matches the Bankroll page — same live computation.

What got fixed + counterfactual

What got fixed

  • Edge sign respected (no more abs(edge))
  • NHL ATS sized 0.75u → 0.5u
  • SQL fixes verified across every consumer endpoint — not just the originally-reported one. Multi-endpoint regression test added.
  • Python service restart procedure documented in CLAUDE.md (deploy-day lesson)

Counterfactual — what if v2.1 had been live during v2's days?

May 5 production data

Live v2 emitted 21 BET picks → −$41.36 P/L. v2.1's logic applied to the same slate would have emitted 13 BET picks → +$16.20 P/L. Delta: +$57.56 over a single day.

Variance band reminder

The current run is going the right direction — but a 14-day 5th-percentile drawdown of −$190 remains within modeled expectation. One bad two-week stretch doesn't break the routing thesis.

May 15, 2026
Math correction
Calibration·MLB Moneyline

MLB Moneyline calibration corrected

A 10× math error in the win-probability formula. Diagnosed, fixed, and published the same day.

Not a routing cut. A model correction. The formula converting predicted margin into win probability was overconfident by approximately 10× on MLB moneyline picks — reporting 75% confidence on picks that historically won 53% of the time. Caught by routine calibration audit, refit against 187 historical picks, deployed and documented the same day.

The math + impact

What & impact

  • What: 10× overconfidence in MLB ML win-probability formula
  • How we caught it: calibration audit comparing predicted probabilities to actual win rates
  • Fix: refit logistic scale parameter against 187 historical Routing Era picks (Apr 9 – May 14)
  • Calibration error: dropped from 0.27 to 0.03 (≈10× improvement)
  • Visible impact: MLB ML Value picks dropped ~80% in daily volume (5–8/day → 1–5/day)
  • NHL / NBA: same formula, samples too small (n=23, n=4) — will refit at 75 picks each
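
To make the "refit logistic scale parameter" step concrete, here is a minimal sketch. It assumes the formula has the form p = 1 / (1 + exp(-margin / scale)) and refits the scale by picking the candidate with the lowest log loss on graded picks. The exact production formula, the fitting method, and the sample data are all assumptions.

```python
import math


def win_prob(margin: float, scale: float) -> float:
    """Logistic map from predicted margin to win probability (assumed form)."""
    return 1.0 / (1.0 + math.exp(-margin / scale))


def log_loss(scale: float, margins: list[float], outcomes: list[int]) -> float:
    """Mean negative log-likelihood of the graded outcomes (1 = win, 0 = loss)."""
    eps = 1e-12
    total = 0.0
    for margin, won in zip(margins, outcomes):
        p = min(max(win_prob(margin, scale), eps), 1.0 - eps)
        total -= won * math.log(p) + (1 - won) * math.log(1.0 - p)
    return total / len(margins)


def refit_scale(margins: list[float], outcomes: list[int], candidates: list[float]) -> float:
    """Return the candidate scale with the lowest log loss."""
    return min(candidates, key=lambda s: log_loss(s, margins, outcomes))


# Synthetic data: 100 picks with a predicted margin of 1.0, of which 53 won.
margins = [1.0] * 100
outcomes = [1] * 53 + [0] * 47
best = refit_scale(margins, outcomes, [0.9, 8.3, 50.0])
```

In this synthetic setup a scale of 0.9 reports about 75% confidence on picks that win 53% of the time, while a much larger scale brings the reported probability close to the observed rate. That mirrors the shape of the correction described above, not its actual numbers.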

Post-fix performance (6-day window)

May 16–21 MLB ML BET picks

15 graded picks, 10-5 (66.7%). Edges range 2–17 percentage points, down from 41–65 pre-fix. Volume in expected range. The fix is doing exactly what it was supposed to do.

Phase 2
Next · gates v3
Phase 2·Up next

Routing snapshot architecture

Ships before any further version cut. Non-negotiable.

Today, historical pages can re-route dynamically — meaning an old page might show a routing decision that isn't what was actually published when those picks were live. That contradicts "nothing retroactive." Snapshot freezes each day's routing as it was actually shown. Ships before v3.
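
One way to picture the snapshot idea is a write-once record per pick. The sketch below is hypothetical throughout (class names, fields, and the in-memory store are illustrative, and Phase 2 has not shipped): once a routing decision is recorded for a pick, later routing versions cannot overwrite it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RoutingSnapshot:
    """Immutable record of the routing decision as it was shown."""
    pick_id: int
    routing_version: str
    decision: str  # e.g. "BET" or "PASS"
    stake: float
    shown_on: date


class SnapshotStore:
    """Write-once store: the first snapshot recorded for a pick is the one kept."""

    def __init__(self) -> None:
        self._rows: dict[int, RoutingSnapshot] = {}

    def record(self, snap: RoutingSnapshot) -> RoutingSnapshot:
        # setdefault keeps any existing row and ignores the new one.
        return self._rows.setdefault(snap.pick_id, snap)

    def get(self, pick_id: int) -> RoutingSnapshot:
        return self._rows[pick_id]
```

A historical page would then read from the store instead of re-running the current routing rules, so what it shows matches what was published.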

Phase 2
Research · likely v3
Phase 2·Research

Conviction formula fix → likely v3

The Conviction-Era bug, finally addressed. Research-shaped — possibly weeks.

This work fixes the conviction-formula bug carried since the Conviction Era. It is analysis-heavy and may take weeks, and it likely produces the v3 routing cut. It is sequenced after the snapshot work so v3 lands on a non-rewriting architecture. It also gates NBA reintroduction.

Phase 3+
Deferred
Phase 3+·Deferred

Growth & monetization

Deliberately last. The model earns the audience before it asks for anything.

Live pricing, line-movement tracking, parlay constructor, distribution (Twitter, newsletter, Discord). Monetization sits here too — and waits on legal. The ordering is the point: the public track record has to be undeniable before any of this gets built on top of it.

How we measure

Each pick's win or loss is graded against the real game result and never recomputed — that's why every record above is exact and marked as a fact.
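
The win percentages on this page are consistent with excluding pushes from the denominator. A small sketch for checking any record string against its published percentage (the helper name is hypothetical):

```python
def win_pct(record: str) -> float:
    """Win % from a 'W-L' or 'W-L-nP' record, pushes excluded, rounded to 0.1."""
    parts = record.rstrip("P").split("-")
    wins, losses = int(parts[0]), int(parts[1])
    return round(100 * wins / (wins + losses), 1)


conviction_era = win_pct("834-744-16P")
routing_v1 = win_pct("379-300-13P")
routing_v2_1 = win_pct("93-73-7P")
```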

Dollar profit/loss is different: it depends on which picks counted as bankroll bets and at what size, and that sizing logic changed across versions. For retired versions we show the P/L under that version's own sizing at the time, and footnote the spread other rules would give. The live version (v2.1) shows exact dollars because it's measured in real time under the routing running right now.

Every version ran for real.

Every pick on this timeline was published before games and graded after. The parts that worked, the parts that didn't, the gaps we found later — all of it is here.