Vela-Engineering/VCBench-Starter-Kit

VCBench-Starter-Kit

A minimal toolkit for evaluating LLMs on venture capital founder success prediction using the VCBench dataset.

Objective of the Task

Participants will develop ML/LLM-based systems to predict founder success in venture capital. The task requires analyzing anonymized founder profiles and determining whether their startup will achieve major success.

Success is defined as the founder's most recent startup achieving an IPO or acquisition above $500M, or raising $500M or more. Startups that raised $100K–$4M at founding but failed to reach a major outcome within eight years are marked unsuccessful.

Participants must design effective prompts and utilize the anonymized prose format to make binary predictions ("Yes"/"No"), optimizing for F0.5 primarily and precision secondarily.

Dataset

VCBench Dataset Paper: https://arxiv.org/abs/2509.14448

The dataset contains 9,000 anonymized founder–startup profiles with a 9% success rate. Profiles include founder background (education, jobs, prior IPO/acquisitions) and startup details (industry, outcomes). Data was collected from LinkedIn and Crunchbase and restricted to information available before the company was founded, simulating real-world early-stage prediction.

Training Dataset: public training set of 4,500 founders (available in this repository as vcbench_final_public.csv)

Testing Dataset: private test set of 4,500 founders (3 folds), held by Vela Research for leaderboard evaluation

Data Format

  • Anonymized prose (natural-language summaries optimized for direct LLM use)
  • Structured JSON (for feature-based ML models)

Evaluation Metrics: precision, recall, and F0.5, each averaged across the test folds. F0.5 is the primary metric and precision is the secondary metric. Private labels are never released, to prevent data leakage into LLM training corpora.
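To make the metric concrete, here is a minimal sketch of how precision, recall, and F0.5 could be computed from "Yes"/"No" labels and averaged over folds. It is not the repository's evaluation.py: the function names are hypothetical, "Yes" is assumed to be the positive class, and fold averaging is assumed to be a plain mean of per-fold scores.

```python
from typing import Iterable, Sequence, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, F0.5)


def precision_recall_f05(y_true: Sequence[str], y_pred: Sequence[str]) -> Metrics:
    """Score "Yes"/"No" predictions, treating "Yes" as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == "Yes" and p == "Yes")
    fp = sum(1 for t, p in pairs if t == "No" and p == "Yes")
    fn = sum(1 for t, p in pairs if t == "Yes" and p == "No")

    # Define the score as 0.0 when a denominator is empty (assumption).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 0.5,
    # which weights precision more heavily than recall.
    beta_sq = 0.5 ** 2
    denom = beta_sq * precision + recall
    f05 = (1 + beta_sq) * precision * recall / denom if denom else 0.0
    return precision, recall, f05


def average_over_folds(folds: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> Metrics:
    """Average each metric over (y_true, y_pred) pairs, one pair per fold."""
    scores = [precision_recall_f05(t, p) for t, p in folds]
    if not scores:
        raise ValueError("at least one fold is required")
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )
```

As a reference point, under this definition a model that answers "Yes" for every founder at a 9% success rate has precision 0.09 and recall 1.0, giving F0.5 of roughly 0.11, so higher scores depend mainly on raising precision.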

Additional Requirements

  • To protect the private test set, models may not access the internet except to call LLM APIs.

What Participants Should Submit

  • Complete code for running LLM predictions
  • Prediction results on the training set demonstrating model performance
  • An evaluation script that processes results and computes metrics
  • Clear documentation of prompt engineering and methodology

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Set up your OpenAI API key:

    cp .env.sample .env
    # Edit .env and add your OPENAI_API_KEY
  3. Run the sample test:

    python openai_testing_sample.py

Key Files

Core Implementation

  • openai_testing_sample.py - Main script that runs LLM predictions on founder data using multiprocessing
  • evaluation.py - Evaluates model predictions and calculates metrics (precision, accuracy, recall, F0.5)

Data

  • vcbench_final_public_sample100.csv - Sample dataset (100 founders) for testing
  • vcbench_final_public.csv - Full public training set (4,500 founders)

LLM Integration

  • llms/ - LLM provider implementations
    • llms/__init__.py - Main interface with get_llm_provider() function
    • llms/openai/_openai.py - Minimal OpenAI provider implementation

Configuration

  • core/config.py - Settings management using pydantic-settings
  • .env - Environment variables (API keys, etc.)
  • requirements.txt - Python dependencies

Output

  • vanilla_llm_testing_results/ - Generated prediction results (gitignored)

Usage

The toolkit sends each founder description to the LLM and asks for a success prediction. The prompt is built around the following:

  • Success definition: Companies with >$500M funding or exit/IPO >$500M
  • Input: Anonymized founder LinkedIn/Crunchbase profiles
  • Output: JSON with prediction ("Yes"/"No") and reasoning

Results are saved as CSV files with predictions that can be evaluated using evaluation.py.
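The JSON output described above still has to be turned into a "Yes"/"No" label before it can be written to CSV. The following is a minimal sketch of one way to do that, not the code in openai_testing_sample.py: the function name and the JSON keys "prediction" and "reasoning" are assumptions, as is the choice to fall back to "No" when a reply cannot be parsed.

```python
import json
from typing import Tuple


def parse_prediction(raw: str) -> Tuple[str, str]:
    """Return (label, reasoning) from an LLM reply that should contain a JSON object.

    The keys "prediction" and "reasoning" are assumed names. Anything that
    cannot be parsed, or is not an explicit "Yes", is mapped to "No".
    """
    # Models sometimes wrap JSON in prose or code fences, so take the
    # outermost {...} span instead of parsing the whole reply.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return "No", ""

    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return "No", ""
    if not isinstance(data, dict):
        return "No", ""

    label = str(data.get("prediction", "")).strip().lower()
    reasoning = str(data.get("reasoning", ""))
    return ("Yes" if label == "yes" else "No"), reasoning
```

Defaulting to "No" is a deliberate choice for this sketch: because F0.5 weights precision over recall, treating an unparseable reply as a positive prediction would risk adding false positives.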
