Evolution | ParlayGenIQ

Sep 25, 2025

Day 1

Origin·First commit

Sep 25, 2025 · day 1

The project starts

A personal data-modeling project, before it had a name.

Initial repo setup. The thesis: ESPN shows hundreds of games a day; a sophisticated model might uncover bets the market hasn't priced correctly. Started as a personal tool first — no public surface, no Best Bets page, just a developer poking at sports data.

What was committed

The first commits (Sep 25, 2025)

Repo skeleton — .gitignore, outputs tree, templates
Development tools + CI scaffold
Automation system foundation with Mac support
Enhanced prediction system with score predictions and market variance

Why this is on the timeline

Trust starts with showing the work. The system didn't appear in February 2026 when the public Best Bets page launched; it had been running and learning for nearly five months before then. The early commits aren't pretty, but they're the actual start.

Oct 14, 2025

First production sport

NCAAF + NFL·Dual-sport deploy

Oct 14, 2025 · first production sport

NFL & NCAAF go live in production

Two sports, one stack. Real games getting predicted on a deployed server.

First production deployment. NCAAF EPA training pipeline with CFBD integration. NFL phases 4–6 were implemented two days later (Oct 16), along with the Pinnacle integration. The system was now predicting real games on a real schedule, but with no public-facing tracking yet.

Read what shipped

October 2025 milestones

Phase 3.7 baseline model deployment (Oct 13–14)
NCAAF EPA pipeline with CFBD integration (Oct 14)
Dual-sport NFL + NCAAF deployment (Oct 14)
NFL Phases 4–6 + Pinnacle integration (Oct 16)
NCAAF margin model + production automation (Oct 21–23)

Nov 14, 2025

Off-season build

NCAAM live·Warehouse layer begins

Nov 14, 2025 · off-season build

NCAAM joins. Warehouse infrastructure starts.

From two sports to three, plus the data layer that everything else would depend on.

NCAAM model joins the production lineup mid-November. Behind the scenes, the warehouse schema and pick-storage layer get built (Nov 23: "Data quality improvements and warehouse infrastructure"). This is the layer the public Track Record would later be built on top of.

Dec 29, 2025

Multi-sport prep

NHL + NBA + MLB·Scaffolded

Dec 29, 2025 · multi-sport prep

NHL, NBA, MLB scripts staged for production

Off-season expansion. Six sports in the pipeline by year-end.

All three remaining major sports get their scripts and pipelines built in a single commit on Dec 29, 2025 — the foundation for the daily multi-sport coverage that would launch in January. CLV tracking, autonomous reliability infrastructure, and bottleneck fixes round out the December push.

Jan 15, 2026

Major model upgrade

NCAAM·XGBoost upgrade

Jan 15, 2026 · major model upgrade

NCAAM XGBoost replaces Ridge regression

First major model architecture change. Haslametrics joins the input mix.

The NCAAM model swaps Ridge regression for XGBoost — a structural improvement, not a tuning change. Haslametrics ratings are blended at 0.5 weight. The result: 393/718 ATS (54.7%) and 406/538 O/U (75.5%, +48.5% ROI) across the full season.

Read what changed

Why XGBoost

Ridge regression couldn't capture nonlinearity in conference-level matchups. XGBoost handles feature interactions natively. The Haslametrics blend at 0.5 weight imports external ratings as a regularization signal against the model's own estimates.
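As a rough illustration of the 0.5-weight blend described above, here is a minimal sketch. It assumes the blend is a plain weighted average of the model's own margin estimate and a margin implied by the external rating. The function name and both inputs are hypothetical; the source only states the weight.

```python
def blend_margin(model_margin: float, external_margin: float, weight: float = 0.5) -> float:
    """Weighted average of two margin estimates.

    `weight` is the share given to the external rating (0.5 per the text above).
    Everything else here is an illustrative assumption.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1.0 - weight) * model_margin + weight * external_margin


# Hypothetical numbers: the model says home by 8.0, the external rating implies home by 4.0.
print(blend_margin(8.0, 4.0))  # 6.0
```

At weight 0.5 the external rating pulls every estimate halfway toward itself, which is one way to read the "regularization signal" framing.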

2025-26 NCAAM season result

ATS: 54.7%
ROI ATS: +4.9%
O/U: 75.5%
ROI O/U: +48.5%

Jan 18, 2026

Self-correcting layer

Edge calibration·Self-correcting layer

Jan 18, 2026 · self-correcting layer

NCAAM edge calibration system built

Raw model edges shrunk and capped against 808 games of historical performance.

First systematic answer to "do our claimed edges actually predict our actual win rate?" Raw edges of +7 only hit 46.6% historically, so the calibration system reduces them — preventing overconfident betting on edges the data doesn't support. Refined Jan 20–22 (home court bias disabled, ATS calibration improved, simplified v2 added).
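The two ideas in this entry are checking claimed edges against realized win rates, then shrinking and capping them. A rough sketch follows. It is an illustration only, not the project's calibration code: `bucket_win_rates`, `calibrate_edge`, the bucket width, the shrink factor and the cap are all hypothetical.

```python
from collections import defaultdict


def bucket_win_rates(history, width=2.0):
    """Group (raw_edge, won) pairs into edge buckets and return the win rate per bucket.

    The bucket width is an assumed value chosen for illustration.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket floor -> [wins, games]
    for raw_edge, won in history:
        floor = (raw_edge // width) * width
        totals[floor][0] += int(won)
        totals[floor][1] += 1
    return {floor: wins / games for floor, (wins, games) in sorted(totals.items())}


def calibrate_edge(raw_edge, shrink=0.5, cap=7.0):
    """Shrink a raw edge toward zero, then cap its magnitude (sign preserved).

    `shrink` and `cap` are hypothetical; in practice they would be fit to history.
    """
    shrunk = raw_edge * shrink
    return max(-cap, min(cap, shrunk))


# Tiny made-up history: edges near +7 split 1-1, edges near +3 went 2-0.
history = [(7.0, False), (7.5, True), (3.0, True), (3.5, True)]
print(bucket_win_rates(history))  # {2.0: 1.0, 6.0: 0.5}
print(calibrate_edge(24.8))       # 7.0 (capped)
```

If a bucket of large raw edges wins less often than a bucket of small ones, that is the signal to shrink harder or cap lower.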

Jan 19, 2026

First full-season result

NCAAF·2025 season closes

Jan 19, 2026 · first full-season result

NCAAF 2025: 62.0% ATS, +20.2% ROI over 703 picks

National Championship pick: Indiana −7.5 over Miami. Hit. The system had earned the right to be tested in public.

The NCAAF model finished its first full season as the best-performing system in the lineup. 436 ATS wins on 703 picks, +20.2% ROI. The O/U side hit 106.7%. The national championship prediction (Indiana −7.5, UNDER 48.5) was the model's final pick of the season — and it landed.

The 2025 season in numbers

Full season performance

ATS Record: 436-267
ATS Win %: 62.0%
ROI: +20.2%
Graded: 777

What it proved

The architecture (XGBoost on SP+ ratings, Elo, tempo features, conference-aware) could sustain an edge across a 5-month season, including bowl variance.

Jan 23, 2026

Different sport, same rigor

UFC·Built, not yet published

Jan 23, 2026 · different sport, same rigor

UFC prediction system built

First non-team-sport architecture: fighter-level predictions, consensus odds, post-event stat updates. Infrastructure complete; not in the public Best Bets lineup as of May 25, 2026.

A structural departure from the team-sport models. UFC predictions work at the fighter level, pull from multiple odds sources to build consensus prices (Phase 4 deploy on Jan 27), and run a post-event update job to ingest fighter stats after each card. The infrastructure shipped; the picks themselves have not been published publicly yet. Why it's on the timeline anyway: it proved the architecture transferred to a fundamentally different sport.

January 2026 UFC milestones

What shipped (Jan 23–27)

Critical prediction system bug fixes + initial documentation
Added to unified dashboard
Database ingestion with predicted home/away score fields
Calibration, expanded data sources, test coverage
Post-event fighter stats update script
Phase 4: Multiple odds sources with consensus calculation

Why it's on the timeline

Proves the architecture wasn't team-sport-specific. The same discipline — predict before the event, grade against actual results, track calibration — extended cleanly to a fundamentally different sport.

Jan 25, 2026

Daily production

NHL·Live daily

Jan 25, 2026 · daily production

NHL goes daily — 3× per day predictions

Margin MAE 1.96 goals. Starting goalie tracking online.

NHL active in production with three daily prediction cycles. The model's margin error is well-calibrated for hockey's tight scoring (MAE 1.96 goals on a typical 5–6 goal total). Starting goalie tracking added on the same date — a critical feature for hockey since goalies swing win probabilities more than any other position.

Jan 27, 2026

Daily production

NBA·Live daily

Jan 27, 2026 · daily production

NBA goes daily — 3× per day predictions

Win accuracy 66.8%. Margin MAE 11.1 pts.

NBA joins NHL in daily production two days later. Three prediction cycles per day (morning, mid-afternoon, evening). Win-prediction accuracy of 66.8% on the training set.

Feb 11, 2026

The public pivot

Best Bets·Public tracking begins

Feb 11, 2026 · the public pivot

Best Bets tab launches — every pick now graded publicly

The day the system stopped being a personal tool and became a public track record.

"Today's Best Bets" launches as a tab with performance tracking. From this day forward, every pick is published before games start and graded against actual results. This is the dividing line between "personal modeling project" and "public, auditable system" — and where today's all-time numbers begin counting.

Why this date matters

What changed structurally

Pick storage in PostgreSQL warehouse — every prediction timestamped before games lock
Grading pipeline — automatic comparison to actual scores
Performance tracker UI — visible record, not just a personal spreadsheet
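A minimal sketch of the two checks this list implies: a pick is only valid if its timestamp precedes the game start, and grading is a mechanical comparison against the final score. The `Pick` fields, function names and example scores are hypothetical and are not taken from the project's warehouse schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Pick:
    team: str               # side taken (hypothetical field names throughout)
    spread: float           # line for that side, e.g. -7.5
    published_at: datetime  # when the pick was stored
    game_start: datetime    # when the game locks


def published_before_start(pick: Pick) -> bool:
    """True only if the pick was recorded strictly before the game started."""
    return pick.published_at < pick.game_start


def grade_ats(pick: Pick, team_score: int, opponent_score: int) -> str:
    """Grade an against-the-spread pick as WIN, LOSS or PUSH."""
    adjusted = team_score + pick.spread - opponent_score
    if adjusted > 0:
        return "WIN"
    if adjusted < 0:
        return "LOSS"
    return "PUSH"


# Made-up example: a -7.5 favorite that wins by 10 covers the spread.
pick = Pick(
    team="HOME",
    spread=-7.5,
    published_at=datetime(2026, 2, 11, 17, 0, tzinfo=timezone.utc),
    game_start=datetime(2026, 2, 11, 23, 0, tzinfo=timezone.utc),
)
print(published_before_start(pick), grade_ats(pick, 30, 20))  # True WIN
```

The frozen dataclass reflects the intent that a stored pick is not edited after the fact.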

The integrity claim begins here

"Published before games start, graded against actual results" — that's the spine of the trust claim. It became enforceable on Feb 11, 2026, when the warehouse started recording picks with timestamps.

Feb 14, 2026

A bad day, documented

Incident·Calibration bypassed in error

Feb 14, 2026 · a bad day, documented

Calibration system destruction — 4 hours, 3 losing bets, reverted

An AI assistant assumed the calibration was a bug and forced raw edges. It wasn't a bug. The change was reverted the same day and written up, which is how trust gets built.

An AI assistant noticed the database's edges didn't match raw formula output and "fixed" them by forcing recalculation across the board. The mismatch wasn't a bug. It was the calibration system from Jan 18 doing exactly what it was designed to do: shrinking raw edges against 808 games of historical reality. By 7:15 PM the user had bet on three uncalibrated picks. By 8:30 PM the day was 1-3 against the model's normal 4-0. By 10:30 PM the changes were reverted and the incident written up.

The full timeline

What went wrong

6:22 PM: Assistant flagged "wrong" edges in the database. Assumed bug.
6:30 PM: Force-recalculated 132 ATS edges. Didn't investigate the discrepancy first.
6:45 PM: Same for O/U edges — Pacific OVER edge went from calibrated +6.7 to forced +24.8.
7:00 PM: Modified sync_odds_to_predictions.py to always recalculate. Committed as a "fix."
7:15 PM: User bet on 3 picks the calibration would have either dropped (Pacific +7 → PASS) or sized differently.
8:30 PM: 1-3 day. User: "This is exactly what i was worried about happening and we let it happen."
9:00 PM: User asked to see what the old system would have recommended. The calibration system was discovered, still on disk, still working — just bypassed.
10:30 PM: Changes reverted (commit ac5dd40). Incident report written. Estimated avoidable loss: ~$93.

What it actually proved

The calibration discipline works — and is dangerous to bypass without understanding it. The same discipline, applied to MLB ML three months later, caught the 10× formula error. The lesson got carved into CLAUDE.md in a section titled NEVER BYPASS.

Why this is on the timeline

Most picks sites would delete this and never speak of it. Publishing it is the trust claim made real. Every entry on this page passes the same test — the calibration corrections, the v2 bugs, the conviction-era losses. If it happened, it's here.

Mar 6, 2026

Engine v1.1

Wave 1 Recalibration·engine.py v1.1

Mar 6, 2026 · engine v1.1

Wave 1 — tiers removed, edge floors set

First major engine cut based on 1,017 picks of evidence.

After the freeze period (Feb 17 – Mar 5) graded 1,017 picks, Wave 1 acted on the findings. Removed LEAN + HIDDEN_GEMS tiers (both harmful). Set edge floor at 5.0 and conviction floor at 0.55 for NCAAM/NBA (NHL exempt). Introduced league-specific thresholds (NHL TOP_PLAY 0.45, NBA ATS TOP_PLAY 0.70). Added ENGINE_VERSION tracking — every pick now records which engine version emitted it.
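The Wave 1 rules can be sketched as a small filter. This is a hedged illustration, not `engine.py` itself. The numeric values come from the paragraph above, but the function name, the assumption that TOP_PLAY thresholds are compared against conviction, the use of SOLID as the fallback tier, and the version string format are all hypothetical.

```python
ENGINE_VERSION = "1.1"             # recorded on every emitted pick (format assumed)
EDGE_FLOOR = 5.0                   # from the text
CONVICTION_FLOOR = 0.55            # from the text
FLOOR_LEAGUES = {"NCAAM", "NBA"}   # NHL is exempt from both floors

# League-specific TOP_PLAY thresholds from the text; "*" means any market (assumed).
TOP_PLAY_THRESHOLDS = {("NHL", "*"): 0.45, ("NBA", "ATS"): 0.70}


def classify(league, market, edge, conviction):
    """Return a tier record for a pick, or None if the floors filter it out."""
    if league in FLOOR_LEAGUES and (edge < EDGE_FLOOR or conviction < CONVICTION_FLOOR):
        return None
    threshold = TOP_PLAY_THRESHOLDS.get(
        (league, market), TOP_PLAY_THRESHOLDS.get((league, "*"))
    )
    is_top = threshold is not None and conviction >= threshold
    return {"tier": "TOP_PLAY" if is_top else "SOLID", "engine_version": ENGINE_VERSION}


print(classify("NBA", "ATS", 4.0, 0.90))  # None (below the edge floor)
print(classify("NBA", "ATS", 6.0, 0.72))  # {'tier': 'TOP_PLAY', 'engine_version': '1.1'}
```

Stamping the version on each returned record is what lets later audits attribute a pick to the engine that emitted it.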

Mar 12, 2026

The $1k test begins

Bankroll System·Live

Mar 12, 2026 · the $1k test begins

The $1,000 bankroll system goes live

"If I took $1,000 of real money and only bet what the model wanted to bet, would it work?"

The first version of the bankroll system. A tier-driven formula (TOP_PLAY ATS at $50, SOLID NHL at $25, etc.) decided what was bankroll-worthy. A daily cron logged each evening's close into a permanent ledger. Picks had been published since Feb 11; this is when the system started testing them with stakes.

Mar 21, 2026

Tournament-aware

March Madness 2026·Tournament-aware engine

Mar 21, 2026 · tournament-aware

Tournament awareness shipped to NCAAM engine

Cross-conference matchups were tanking ATS — the engine had no idea. Now it does.

Nine days into the bracket, NCAA Tournament ATS TOP_PLAYs were hitting just 22% (2-7), so the model got tournament-aware. Caution notes now appear on cross-conference NCAA Tournament ATS picks. Tournament O/U picks get a value flag at 5+ edge (the historical sweet spot, with a 66.7% hit rate). Tournament games are now visually distinct from regular-season picks.
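The caution note and value flag described above amount to two conditional annotations. A minimal sketch, with hypothetical flag names and a hypothetical function signature:

```python
def tournament_flags(market, edge, is_tournament, same_conference):
    """Return annotation flags for a pick; an empty list for regular-season games."""
    flags = []
    if not is_tournament:
        return flags
    if market == "ATS" and not same_conference:
        flags.append("CAUTION_CROSS_CONFERENCE")  # flag name is illustrative
    if market == "O/U" and edge >= 5.0:            # 5+ edge threshold from the text
        flags.append("VALUE_TOURNAMENT_TOTAL")     # flag name is illustrative
    return flags


print(tournament_flags("ATS", 6.0, True, False))  # ['CAUTION_CROSS_CONFERENCE']
print(tournament_flags("O/U", 5.5, True, False))  # ['VALUE_TOURNAMENT_TOTAL']
```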

~Mar 25, 2026

Silent until May 24

Data gap · Daily ledger stopped writing · Discovered May 24, 2026

~Mar 25, 2026 · silent until May 24

The daily ledger went silent

We didn't notice for two months. The audit for this page is what found it.

Around March 25, 2026, we replaced the daily bankroll email with a consolidated "Best Bets Daily Digest." The old email script had a quiet side-effect: every morning, it wrote one row to bankroll_daily_log summarizing the previous day's bankroll close. The new digest doesn't write that row.

Picks kept getting published and graded the whole time. Only the once-a-day "this is where the bankroll closed" record stopped. It surfaced two months later during the audit that built this page — when we tried to verify what the bankroll did on May 5 and found no row.
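A gap like this is cheap to detect once someone looks for it. The sketch below assumes the ledger can be read as a list of logged dates. The function name is hypothetical, and reading rows out of `bankroll_daily_log` is left out.

```python
from datetime import date, timedelta


def missing_ledger_days(logged_days, start, end):
    """Return every date in [start, end] that has no ledger row."""
    logged = set(logged_days)
    missing = []
    day = start
    while day <= end:
        if day not in logged:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Rows exist for Mar 12 through Mar 24; check through Mar 27.
logged = [date(2026, 3, 12) + timedelta(days=i) for i in range(13)]
print(missing_ledger_days(logged, date(2026, 3, 12), date(2026, 3, 27)))
# [datetime.date(2026, 3, 25), datetime.date(2026, 3, 26), datetime.date(2026, 3, 27)]
```

Run daily, a check along these lines would have flagged the first missing row the next morning instead of two months later.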

What this means + what we're doing about it

What we still have

Every individual pick from Mar 25 onward — all stored, all graded
13 days of frozen daily ledger from before the gap (Mar 12 – Mar 24)
Picks can be re-aggregated under any routing rules at any time

What we lost

The frozen "this is what the bankroll showed at end-of-day" record for every day Mar 25 onward
Daily totals on this site are therefore recomputed live from the picks on each page load, under whatever routing logic is current
Same picks, slightly different bet sizing applied retroactively when our routing rules change. Today's numbers are honest; the history can shift slightly

What we're doing about it

The "How we measure" section at the bottom of this page explains the gap to visitors
The snapshot architecture in Phase 2 will store routing decisions per-pick — so the next gap can't happen silently

Apr 8, 2026

Era retires

Conviction Era · Feb 11 – Apr 8, 2026 · Prototype

Feb 11 – Apr 8, 2026 · prototype, retired

The Conviction Era — the prototype

The original methodology. Where the system learned what to track.

A single conviction score drove both selection and sizing. The bankroll system itself didn't exist until March 12 — it was built to answer one question: if we staked $1,000 on only the picks the model most believed in, would it work? Across all 1,594 picks the model hit 52.9%; the bankroll-tracked tail (Mar 12 – Apr 8) went 99-103-3 (49.0%) — the losing stretch that triggered the routing v1 cut.

Record (fact): 834-744-16P
Picks: 1,594
Win %: 52.9%
Bankroll tail: −$937

Bankroll tracking began Mar 12, partway through the era. The −$937 covers the tracked tail only: $1,000 → $62.70, Mar 12 – Apr 8.
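The win percentages on this page are consistent with counting wins over wins plus losses, with pushes excluded from the denominator. A small sketch of that arithmetic follows. The helper names are hypothetical, and the push-exclusion rule is inferred from the published numbers rather than stated by the source.

```python
import re


def parse_record(record):
    """Parse 'W-L' or 'W-L-P' (pushes optionally suffixed with 'P') into integers."""
    match = re.fullmatch(r"(\d+)-(\d+)(?:-(\d+)P?)?", record)
    if match is None:
        raise ValueError(f"unrecognized record: {record!r}")
    wins, losses = int(match[1]), int(match[2])
    pushes = int(match[3] or 0)
    return wins, losses, pushes


def win_pct(record):
    """Win percentage with pushes excluded, rounded to one decimal place."""
    wins, losses, _ = parse_record(record)
    return round(100 * wins / (wins + losses), 1)


print(sum(parse_record("834-744-16P")))  # 1594
print(win_pct("834-744-16P"))            # 52.9
print(win_pct("99-103-3"))               # 49.0
```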

Apr 9, 2026

First version cut

Routing v1 · Apr 9 – May 4, 2026 · Retired

Apr 9 – May 4, 2026 · retired

Routing v1 — the split

Selection logic separated from sizing. The first true version cut.

The engine proposed picks; a new routing layer decided what to publish and at what size. The model picked well across the stretch — 379-300-13P, 55.8% — but the bankroll lost ground, and that gap between good picks and a shrinking balance is what drove the move to v2. The split was structural groundwork, not a profit story yet.

Record (fact): 379-300-13P
Win %: 55.8%
Window: 26 days
Bankroll P/L: −$458*

* Under v1's own live-then sizing. Different sizing rules put this between roughly −$460 and −$270; all show a loss. 692 graded picks. See "How we measure."

Why it was retired

NBA O/U at 34.8% on bankroll-routed picks. The 5.0-point edge floor wasn't protecting — picks clearing the floor were still losing badly. The routing config had structural issues that v2 would address.

May 5, 2026

Retired in 2 days

Routing v2 · May 5 – May 6, 2026 · Retired · 2 days

May 5 – May 6, 2026 · retired in 2 days

Routing v2 — caught and corrected in 48 hours

A day-one audit found three structural bugs. All fixed by v2.1.

Removed NBA from the bankroll, added MLB at test sizing. A same-day audit of v2's results flagged three structural issues: edge sign discarded, NHL oversized, SQL gaps across endpoints. Rather than wait out the variance, we cut and corrected it within 48 hours. Retired for how it was built, not for a two-day score.

Record (fact): 31-31-4P
Picks: 66
Window: 2 days
Why retired: Structural, not results

Two days is too short for a meaningful P/L. The record is exact; the reason for retirement was structural — three bugs caught in a day-one audit and fixed in v2.1.

What went wrong

The three v2 bugs

Edge sign discarded. Routing used abs(edge), so a pick with negative model edge was treated identically to one with positive edge.
NHL ATS at 0.75u was oversized. A 44-pick sample basis was too small; one bad day (Min Wild −$75 on May 5) erased a full week of MLB grind.
SQL omissions on 5 consumer endpoints. The routing-classification SQL didn't pass needed fields, so +1.5 dog ATS picks silently fell through to BET.
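The first bug in the list is easy to reproduce in isolation. The sketch below contrasts a routing check that uses `abs(edge)` with one that respects the sign. The function names and the 5.0 floor used here are illustrative assumptions, not the project's routing code.

```python
def route_abs_edge(edge, floor=5.0):
    """Buggy variant: the sign is discarded, so a strongly negative edge still routes to BET."""
    return "BET" if abs(edge) >= floor else "PASS"


def route_signed_edge(edge, floor=5.0):
    """Corrected variant: only a positive edge at or above the floor routes to BET."""
    return "BET" if edge >= floor else "PASS"


for edge in (6.0, -6.0):
    print(edge, route_abs_edge(edge), route_signed_edge(edge))
# 6.0 BET BET
# -6.0 BET PASS
```

The two variants agree on every positive edge, which is why a bug of this shape can pass a casual check.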

Variance band — documented before deploy

v2 expected variance (from routing-versioning.md)

At $343/day exposure, 7-day 5th-percentile worst case was −$225, 14-day was −$190. Bankroll dropping into the $800s was within modeled variance, not system failure.
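One common way to produce a variance band like this is a Monte Carlo simulation of flat-stake bets. The sketch below shows that technique under stated assumptions; it is not the method documented in routing-versioning.md and is not expected to reproduce the figures above. The bet count, stake, win probability and payout are all hypothetical, chosen only so that 7 bets at $49 matches the $343/day exposure.

```python
import random


def simulate_pnl(days, bets_per_day, stake, win_prob, payout, trials=2000, seed=0):
    """Simulate total P/L over a window of independent flat-stake bets; returns sorted results."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        pnl = 0.0
        for _ in range(days * bets_per_day):
            if rng.random() < win_prob:
                pnl += stake * payout  # profit on a win
            else:
                pnl -= stake           # full stake lost
        results.append(pnl)
    return sorted(results)


def percentile(sorted_values, pct):
    """Nearest-rank style percentile on an already sorted list."""
    index = int(round(pct / 100 * (len(sorted_values) - 1)))
    return sorted_values[index]


# Hypothetical inputs: 7 bets/day at $49, 55% win rate, -110 odds (payout 100/110).
week = simulate_pnl(days=7, bets_per_day=7, stake=49.0, win_prob=0.55, payout=100 / 110)
print(round(percentile(week, 5), 2), round(percentile(week, 50), 2))
```

The 5th percentile is the figure to write down before deploying, so a drawdown inside the band is not mistaken for a broken system.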

May 7 – now

Live now

Routing v2.1 · May 7 – present · Live now

May 7 – present · live now

Routing v2.1 — corrections

A correction cut, not a rewrite. The version generating today's picks.

v2.1 corrected each of v2's three structural issues. Because it's the version running right now, its numbers are measured live and match the Bankroll page exactly — same computation, no reconstruction.

Record: 93-73-7P
Win %: 56.0%
ROI: +13.4%
Live Bankroll: $1,269

Net +$269.36 across the combined v2 + v2.1 window (May 5 onward), measured live. Matches the Bankroll page — same live computation.

What got fixed + counterfactual

What got fixed

Edge sign respected (no more abs(edge))
NHL ATS sized 0.75u → 0.5u
SQL fixes verified across every consumer endpoint — not just the originally-reported one. Multi-endpoint regression test added.
Python service restart procedure documented in CLAUDE.md (deploy-day lesson)

Counterfactual — what if v2.1 had been live during v2's days?

May 5 production data

Live v2 emitted 21 BET picks → −$41.36 P/L. v2.1's logic applied to the same slate would have emitted 13 BET picks → +$16.20 P/L. Delta: +$57.56 over a single day.

Variance band reminder

The current run is going the right direction — but a 14-day 5th-percentile drawdown of −$190 remains within modeled expectation. One bad two-week stretch doesn't break the routing thesis.

May 15, 2026

Math correction

Calibration·MLB Moneyline

May 15, 2026 · math correction

MLB Moneyline calibration corrected

A 10× math error in the win-probability formula. Diagnosed, fixed, and published the same day.

Not a routing cut. A model correction. The formula converting predicted margin into win probability was overconfident by approximately 10× on MLB moneyline picks — reporting 75% confidence on picks that historically won 53% of the time. Caught by routine calibration audit, refit against 187 historical picks, deployed and documented the same day.

The math + impact

What & impact

What: 10× overconfidence in MLB ML win-probability formula
How we caught it: calibration audit comparing predicted probabilities to actual win rates
Fix: refit logistic scale parameter against 187 historical Routing Era picks (Apr 9 – May 14)
Calibration error: dropped from 0.27 to 0.03 (≈10× improvement)
Visible impact: MLB ML Value picks dropped ~80% in daily volume (5–8/day → 1–5/day)
NHL / NBA: same formula, samples too small (n=23, n=4) — will refit at 75 picks each
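The shape of the fix, refitting the scale of a logistic margin-to-probability curve against graded picks, can be sketched as below. This is an illustration under assumptions. The source does not give the exact formula, the loss function, or the fitting method, so the logistic form, the log-loss objective, the grid search, and the synthetic data are all hypothetical.

```python
import math


def win_prob(margin, scale):
    """Logistic map from predicted margin to win probability; a larger scale is less confident."""
    return 1.0 / (1.0 + math.exp(-margin / scale))


def log_loss(scale, picks):
    """Mean negative log-likelihood of (margin, won) outcomes under a given scale."""
    eps = 1e-12
    total = 0.0
    for margin, won in picks:
        p = min(max(win_prob(margin, scale), eps), 1.0 - eps)
        total -= math.log(p) if won else math.log(1.0 - p)
    return total / len(picks)


def fit_scale(picks, candidates):
    """Pick the candidate scale with the lowest log loss (simple grid search)."""
    return min(candidates, key=lambda scale: log_loss(scale, picks))


def calibration_gap(scale, picks):
    """Absolute gap between mean predicted probability and the actual win rate."""
    mean_pred = sum(win_prob(margin, scale) for margin, _ in picks) / len(picks)
    win_rate = sum(1 for _, won in picks if won) / len(picks)
    return abs(mean_pred - win_rate)


# Synthetic data: 100 picks with a predicted margin of 1.0, of which 53 won.
picks = [(1.0, True)] * 53 + [(1.0, False)] * 47
best = fit_scale(picks, [0.9, 2.0, 4.0, 8.0, 16.0])
print(best, round(win_prob(1.0, 0.9), 2), round(win_prob(1.0, best), 2))  # 8.0 0.75 0.53
```

On this synthetic data, a scale that is too small reports about 75% on picks that win 53% of the time, and the refit scale closes most of that gap.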

Post-fix performance (6-day window)

May 16–21 MLB ML BET picks

15 graded picks, 10-5 (66.7%). Edges range 2–17 percentage points, down from 41–65 pre-fix. Volume in expected range. The fix is doing exactly what it was supposed to do.

Phase 2

Next · gates v3

Phase 2·Up next

Phase 2 · up next

Routing snapshot architecture

Ships before any further version cut. Non-negotiable.

Today, historical pages can re-route dynamically, meaning an old page might show a routing decision that isn't what was actually published when those picks were live. That contradicts "nothing retroactive." The snapshot freezes each day's routing as it was actually shown. Ships before v3.
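The core property of a snapshot is that a routing decision is written once and never recomputed. A minimal in-memory sketch of that property follows. The class, its field names and the example values are hypothetical, and a real version would persist to the warehouse rather than a dict.

```python
class RoutingSnapshots:
    """Write-once store of the routing decision shown for each pick."""

    def __init__(self):
        self._by_pick = {}

    def record(self, pick_id, decision, stake, routing_version):
        """Freeze the decision for a pick; a second write for the same pick is an error."""
        if pick_id in self._by_pick:
            raise ValueError(f"routing for pick {pick_id!r} is already frozen")
        self._by_pick[pick_id] = {
            "decision": decision,
            "stake": stake,
            "routing_version": routing_version,
        }

    def get(self, pick_id):
        """Return a copy of the frozen decision so callers cannot mutate history."""
        return dict(self._by_pick[pick_id])


snapshots = RoutingSnapshots()
snapshots.record("pick-001", "BET", 25.0, "v2.1")
print(snapshots.get("pick-001"))
# {'decision': 'BET', 'stake': 25.0, 'routing_version': 'v2.1'}
```

Historical pages would then read from the snapshot instead of re-running current routing rules over old picks.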

Phase 2

Research · likely v3

Phase 2·Research

Phase 2 · research

Conviction formula fix → likely v3

The Conviction-Era bug, finally addressed. Research-shaped — possibly weeks.

The conviction-formula bug has been carried since the Conviction Era. The fix is analysis-heavy and may take weeks, and it likely produces the v3 routing cut. It is sequenced after the snapshot work so v3 lands on a non-rewriting architecture. It also gates NBA reintroduction.

Phase 3+

Deferred

Phase 3+·Deferred

Phase 3+ · deferred

Growth & monetization

Deliberately last. The model earns the audience before it asks for anything.

Live pricing, line-movement tracking, parlay constructor, distribution (Twitter, newsletter, Discord). Monetization sits here too — and waits on legal. The ordering is the point: the public track record has to be undeniable before any of this gets built on top of it.

How the system evolved.

Every version ran for real.