
rbpilgrim/cricket_variance_simulation



Does Batting Variance Matter in Test Cricket?

A Monte Carlo simulation study exploring whether "boom-or-bust" or consistent batsmen are more valuable in Test cricket.

Research Question

If two batsmen have the same batting average, but one is high-variance (more ducks AND more centuries) and one is consistent (steady scores), which is more valuable to a Test cricket team?

Key Findings

| Strategy | Win Rate | Draw Rate | Loss Rate |
| --- | --- | --- | --- |
| All Consistent | 53.8% | ~8% | 38.2% |
| High Variance Openers Only | ~52% | ~8% | ~40% |
| All High Variance | 44.8% | ~7% | 48.2% |

Bottom line: Consistency wins. High-variance batting creates collapse risk that isn't compensated by occasional big scores.


How the Simulation Works

Ball-by-Ball Engine

Every delivery is simulated individually. The base scoring distribution (before normalization):

| Outcome | Base Weight |
| --- | --- |
| Dot ball | 0.650 |
| Single | 0.220 |
| Two | 0.070 |
| Four | 0.040 |
| Six | 0.020 |
| Wicket | Dynamic |

The wicket probability changes based on balls faced - this is the core innovation.

After calculating the dynamic wicket probability, all outcomes are normalized to sum to 100%. This means when wicket probability is higher (early in an innings for high-variance batsmen), scoring shot probabilities are slightly reduced proportionally.

The Variance Model

Both batsman types produce identical expected runs per dismissal (~38 runs for openers/middle order):

LOW VARIANCE (Consistent):
  p(wicket) = 1.68% constant per ball

  Result: Steady scores, few ducks, few centuries

HIGH VARIANCE (Boom-or-Bust):
  p(wicket) = 0.75% + (3.4% - 0.75%) * e^(-0.025 * balls_faced)

  At ball 0:   3.4% (very risky - often out early)
  At ball 50:  1.5% (close to average risk)
  At ball 100: 1.0% (safe when "set")

  Result: More ducks AND more centuries

Both are calibrated using survival analysis to have the same expected batting average.
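A minimal sketch of the survival calculation, using only the per-ball wicket probabilities quoted above. The function names and the 5000-ball horizon are illustrative assumptions, not the repository's actual API. For the consistent batsman the expected number of balls per dismissal is simply 1 / 0.0168, about 59.5; the loop computes the same quantity for the boom-or-bust curve so the two can be compared.

```python
import math

LOW_VARIANCE_P = 0.0168  # constant per-ball wicket probability (from the text)
HV_FLOOR, HV_START, HV_DECAY = 0.0075, 0.034, 0.025  # high-variance parameters (from the text)


def wicket_probability(balls_faced: int, high_variance: bool) -> float:
    """Per-ball dismissal probability, before any pitch multiplier."""
    if not high_variance:
        return LOW_VARIANCE_P
    return HV_FLOOR + (HV_START - HV_FLOOR) * math.exp(-HV_DECAY * balls_faced)


def expected_balls_faced(high_variance: bool, horizon: int = 5000) -> float:
    """Sum the survival curve: E[balls] = sum over n of P(not out before ball n)."""
    survival, total = 1.0, 0.0
    for ball in range(horizon):
        total += survival
        survival *= 1.0 - wicket_probability(ball, high_variance)
    return total


for label, hv in (("consistent", False), ("boom-or-bust", True)):
    print(f"{label}: {expected_balls_faced(hv):.1f} balls per dismissal")
```

Multiplying the expected balls faced by the expected runs per ball gives the implied batting average, which is the quantity the calibration is meant to equalize.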

Example normalization:

  • High-variance batsman at ball 0: wicket = 3.4%, so scoring outcomes scaled to 96.6%
  • High-variance batsman at ball 100: wicket = 1.0%, so scoring outcomes scaled to 99.0%
  • Low-variance batsman (any ball): wicket = 1.68%, so scoring outcomes scaled to 98.3%
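The normalization in these examples can be sketched as follows. This is an illustration under the assumption that the scoring weights are rescaled to fill the probability left over after the wicket probability is fixed, which is what the percentages above imply; `outcome_distribution` and `simulate_ball` are hypothetical names, not functions from the repository.

```python
import random

# Base scoring weights from the table in "Ball-by-Ball Engine"
BASE_WEIGHTS = {"dot": 0.650, "single": 0.220, "two": 0.070, "four": 0.040, "six": 0.020}
RUNS = {"dot": 0, "single": 1, "two": 2, "four": 4, "six": 6, "wicket": 0}


def outcome_distribution(p_wicket: float) -> dict:
    """Scale the scoring outcomes to (1 - p_wicket) so everything sums to 1."""
    scale = (1.0 - p_wicket) / sum(BASE_WEIGHTS.values())
    dist = {name: weight * scale for name, weight in BASE_WEIGHTS.items()}
    dist["wicket"] = p_wicket
    return dist


def simulate_ball(p_wicket: float, rng: random.Random):
    """Draw one delivery; returns (outcome name, runs scored)."""
    dist = outcome_distribution(p_wicket)
    outcome = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
    return outcome, RUNS[outcome]


rng = random.Random(42)
print(outcome_distribution(0.034))  # high-variance batsman at ball 0
print(simulate_ball(0.034, rng))
```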

Match Structure

A full Test match simulation includes:

  1. Four innings - Team 1 bats, Team 2 bats, Team 1 bats again, Team 2 chases

  2. Weather variation - 60% good (400-450 overs), 25% moderate (320-400), 15% poor (250-320)

  3. Pitch deterioration - Later innings are harder to bat:

    | Innings | Wicket Multiplier | Effect |
    | --- | --- | --- |
    | 1st | 1.00x | Fresh pitch |
    | 2nd | 1.10x | Slightly worn |
    | 3rd | 1.22x | Worn |
    | 4th | 1.45x | Difficult chase |

  4. Declarations - Teams can declare when:

    • 1st innings: 350+ runs with 6+ wickets down, or 450+ runs
    • 3rd innings: Lead of 250+ with 120+ overs remaining
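
The match conditions listed above could be expressed roughly as below. This is a sketch under several assumptions that the text does not confirm: overs are drawn uniformly within each weather band, the pitch multiplier is applied directly to the per-ball wicket probability, and "1st innings" means the first innings of the match. All names are hypothetical.

```python
import random

# (probability, minimum overs, maximum overs) for good, moderate, and poor weather
WEATHER_BANDS = [(0.60, 400, 450), (0.25, 320, 400), (0.15, 250, 320)]
PITCH_MULTIPLIER = {1: 1.00, 2: 1.10, 3: 1.22, 4: 1.45}


def sample_match_overs(rng: random.Random) -> int:
    """Pick a weather band by its probability, then a match length inside it."""
    weights = [band[0] for band in WEATHER_BANDS]
    _, low, high = rng.choices(WEATHER_BANDS, weights=weights, k=1)[0]
    return rng.randint(low, high)  # assumption: uniform within the band


def adjusted_wicket_probability(base_p: float, innings: int) -> float:
    """Scale the batsman's wicket probability by the pitch multiplier."""
    return min(1.0, base_p * PITCH_MULTIPLIER[innings])


def should_declare(innings: int, runs: int, wickets: int,
                   lead: int, overs_remaining: int) -> bool:
    """Declaration rules as listed in the text."""
    if innings == 1:
        return runs >= 450 or (runs >= 350 and wickets >= 6)
    if innings == 3:
        return lead >= 250 and overs_remaining >= 120
    return False


rng = random.Random(7)
print(sample_match_overs(rng))
print(adjusted_wicket_probability(0.0168, innings=4))
print(should_declare(3, runs=210, wickets=4, lead=260, overs_remaining=130))
```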

How Matches End

| Outcome | Condition |
| --- | --- |
| Team 1 wins | Team 2 all out in 4th innings before reaching target |
| Team 2 wins | Team 2 reaches target in 4th innings |
| Draw | Time runs out (max overs reached) before result |

Draw rate is calibrated to ~8-10%, matching modern Test cricket.


Team Composition

Each team has 11 players with different base averages:

| Position | Players | Expected Average | Role |
| --- | --- | --- | --- |
| Openers | 2 | ~38 runs | Face new ball |
| Middle Order | 4 | ~38 runs | Main run scorers |
| All-rounders | 2 | ~25 runs | Balance bat/bowl |
| Tail | 3 | ~13 runs | Primarily bowlers |

Team Strategies Tested

We tested 25 different variance allocation strategies:

Uniform strategies:

  • All Very Low - Everyone consistent (decay=0.0)
  • All High - Everyone boom-or-bust (decay=0.85)

Opener-focused:

  • Both Openers High - Aggressive openers, consistent rest
  • Aggro + Anchor Open - One aggressive, one steady opener

Position-based:

  • Solid Middle - Consistent middle order, aggressive ends
  • Swinging Tail - Aggressive tail-enders
  • Top Heavy Gradient - Variance decreases down the order

Mixed:

  • Alternating Pair - Alternate high/low through lineup
  • Explosive Start - Top 3 aggressive, rest consistent
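
One way to picture a variance allocation strategy is as the set of batting positions that play boom-or-bust. The sketch below is a deliberate simplification: the strategies above use a continuous decay setting (0.0 to 0.85), whereas this example uses a plain high/low flag, and `Batter`, `build_lineup`, and `STRATEGIES` are hypothetical names rather than the repository's own.

```python
from dataclasses import dataclass

# (role, number of players, expected average) from the Team Composition table
ROLE_PLAN = [("opener", 2, 38), ("middle", 4, 38), ("all_rounder", 2, 25), ("tail", 3, 13)]


@dataclass(frozen=True)
class Batter:
    position: int          # batting position, 1 to 11
    role: str
    target_average: float
    high_variance: bool    # simplification of the continuous decay setting


def build_lineup(high_variance_positions: set) -> list:
    """Build an 11-player lineup, flagging the given positions as high-variance."""
    lineup, position = [], 1
    for role, count, average in ROLE_PLAN:
        for _ in range(count):
            lineup.append(Batter(position, role, average, position in high_variance_positions))
            position += 1
    return lineup


# A few of the strategies named above, as sets of high-variance positions
STRATEGIES = {
    "All Very Low": set(),
    "All High": set(range(1, 12)),
    "Both Openers High": {1, 2},
    "Aggro + Anchor Open": {1},
    "Explosive Start": {1, 2, 3},
    "Swinging Tail": {9, 10, 11},
}

for name, positions in STRATEGIES.items():
    flags = "".join("H" if b.high_variance else "L" for b in build_lineup(positions))
    print(f"{name:20s} {flags}")
```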

Experimental Results

Round-Robin Tournament (25 strategies, 50 matches each)

| Rank | Strategy | Win Rate | Key Insight |
| --- | --- | --- | --- |
| 1 | All Very Low | 51.7% | Consistency is optimal |
| 2 | Explosive Start | 50.5% | Top 3 aggressive works |
| 3 | Mixed Middle | 50.3% | Alternating in middle |
| ... | ... | ... | ... |
| 21 | All High | 45.0% | Too much variance hurts |
| 25 | Both Openers Low | 43.4% | Worst strategy |

Head-to-Head Statistical Tests (300 matches each)

| Matchup | Result | Z-score | Significant? |
| --- | --- | --- | --- |
| All Very Low vs Both Openers High | 142-143 | 0.12 | NO (tied) |
| All Very Low vs Explosive Start | 161-120 | 4.73 | YES |
| All Very Low vs All High | 157-132 | 2.89 | YES |
| Both Openers High vs All High | 154-125 | 3.35 | YES |
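
The exact test statistic is not spelled out here. As an inference rather than a confirmed description of `analyze_matchups.py`, the reported z-scores are reproduced by dividing the win difference by the standard deviation of a fair-coin win count over all 300 matches, sqrt(0.25 * 300). The sketch below applies that formula to the four results in the table; the 1.96 cutoff (two-sided 5% level) is also an assumption.

```python
import math


def z_score(wins_a: int, wins_b: int, matches: int = 300) -> float:
    """Win difference over the std. dev. of a Binomial(matches, 0.5) win count."""
    return abs(wins_a - wins_b) / math.sqrt(0.25 * matches)


results = [
    ("All Very Low vs Both Openers High", 142, 143),
    ("All Very Low vs Explosive Start", 161, 120),
    ("All Very Low vs All High", 157, 132),
    ("Both Openers High vs All High", 154, 125),
]

for matchup, wins_a, wins_b in results:
    z = z_score(wins_a, wins_b)
    print(f"{matchup}: z = {z:.2f}, significant = {z > 1.96}")
```

Because draws are counted in the 300 matches, a sign test restricted to decisive matches would give somewhat different values.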

Collapse Analysis

High-variance teams collapse more than twice as often as consistent teams:

| Team Type | Collapse Rate (<150 all out) |
| --- | --- |
| All Low Variance | ~7% |
| All High Variance | ~18% |

The centuries from high-variance batsmen don't compensate for the collapses.


Why Does Consistency Win?

  1. Collapse Risk: High variance means more early dismissals. When multiple batsmen fail early in the same innings, you get a collapse that's very hard to recover from.

  2. Test Cricket is Long: Unlike T20/ODI, there's no "required rate" pressure in most situations. Steady accumulation works.

  3. Compounding Effect: One collapse in a 4-innings match can lose you the game, even if other innings went well.

  4. "Getting Set" Comes Too Late: By the time a high-variance batsman becomes safe (50+ balls), a consistent batsman has scored similar runs with less risk.

But High Variance Openers Are "Free"

Interestingly, making just the two openers high-variance (with a consistent middle and lower order) is statistically indistinguishable from an all-consistent team: the head-to-head result was 142-143 over 300 matches. This is because:

  • Openers face the new ball anyway (inherently risky)
  • If they survive, they score big; if not, consistent middle order rescues
  • The consistent tail prevents collapses

Quick Start

```bash
# Clone the repository
git clone https://github.com/rbpilgrim/cricket_variance_simulation.git
cd cricket_variance_simulation

# Run the main experiment
python team_composition_experiment.py

# Run head-to-head analysis
python analyze_matchups.py

# Or open the Jupyter notebook
jupyter notebook cricket_variance_analysis.ipynb
```



Project Structure

| File | Description |
| --- | --- |
| ball_by_ball_simulation.py | Core simulation engine - ball outcomes, innings, matches |
| team_composition_experiment.py | 25-strategy round-robin tournament |
| analyze_matchups.py | Head-to-head statistical analysis |
| analyze_match_details.py | Detailed match statistics and collapse analysis |
| variance_exchange_rate.py | Quantifies variance value in runs |
| cricket_variance_analysis.ipynb | Self-contained Colab notebook |
| RESEARCH_NOTE.md | Full research writeup with methodology |

Limitations

  • Simplified bowling: All bowlers treated equally (no swing, spin, pace variation)
  • No player matchups: Real cricket has bowler-batsman specific interactions
  • Fixed conditions: Same pitch model for all matches (no spinning tracks, green seamers)
  • Basic declarations: Real captains use more nuanced decision-making
  • No follow-on: The follow-on rule is not implemented

Future Work

  • Model different pitch types (spinning, seaming, flat)
  • Add bowler skill variation
  • Implement follow-on rule
  • Model batting partnerships (not just individuals)
  • Analyze specific match situations (chasing 300+ in 4th innings)

License

MIT


Built with Claude Code
