Welcome to the SQL Query Writer Agent Competition Organized by MindBridge AI! This competition challenges you to build an intelligent agent that can translate natural language questions into SQL queries for a relational database.
- Competition Overview
- Objectives
- LLM Access Options
- Getting Started
- Understanding Python Environments
- Project Structure
- Submission Requirements
- How Your Code Will Be Evaluated
- Database Schema
- Tips and Resources
- FAQ
Your task is to create an AI-powered agent that can:
- Accept natural language questions from users (e.g., "What are the top 5 best-selling products?")
- Understand the database schema and structure
- Generate valid SQL queries that answer the user's question
For this competition, you must use open source LLMs through Ollama. You have two options for accessing these models:
- Carleton University LLM Server (Recommended) - Access powerful models without local hardware requirements
- Local Ollama Installation - Run models on your own machine
See the LLM Access Options section for detailed setup instructions.
Build a SQL Query Writer Agent that accurately translates natural language questions into executable SQL queries for the provided bike store database.
Your submission will be evaluated on:
- Accuracy: Does the generated SQL correctly answer the user's question?
- Robustness: Can your agent handle various types of questions (aggregations, joins, filters, etc.)?
- Error Handling: Does your agent gracefully handle invalid or ambiguous questions?
- Code Quality: Is your code well-organized, documented, and maintainable?
- Innovation: Creative approaches and novel solutions are encouraged!
You have two options for accessing open source LLMs via Ollama. Choose the one that works best for you.
Carleton's Research Computing Services provides access to a powerful LLM server with multiple open source models. This is the recommended option as it doesn't require local GPU resources.
- Go to https://carleton.ca/rcs/llm-access/
- Fill out the access request form with:
- Your Carleton University email (required)
- List of models you need
- You will receive API endpoint details and authentication credentials
Once you have access, configure your environment:
# Set environment variables (add to .env file)
OLLAMA_HOST=<provided-server-url>
# Additional auth variables as provided by RCSThe Carleton server uses Ollama-compatible REST API endpoints, so you can use the standard ollama Python package:
import ollama
# Configure client to use Carleton server
client = ollama.Client(host='<carleton-server-url>')
# Generate a response
response = client.chat(
model='llama3.3',
messages=[{'role': 'user', 'content': 'Write a SQL query to count all customers'}]
)
print(response['message']['content'])If you prefer to run models locally or want to develop offline, you can install Ollama on your own machine.
- macOS: Apple Silicon (M1/M2/M3) recommended, Intel supported
- Linux: NVIDIA GPU recommended for larger models
- Windows: WSL2 with NVIDIA GPU support
macOS/Linux:
curl -fsSL https://ollama.com/install.sh | shWindows: Download from https://ollama.com/download
After installation, pull the models you want to use:
# Pull a model (downloads it locally)
ollama pull llama3.2
# List available models
ollama list
# Run a model interactively (to test)
ollama run llama3.2Note: You are encouraged to explore different open source models available on Ollama's model library. Different models have different strengths - experiment to find what works best for SQL generation tasks!
Before starting, ensure you have:
- Python installed on your system (we recommend Python 3.11+)
- Git for version control
- A Kaggle account (for downloading the dataset via kagglehub)
- LLM Access - Either:
- Access to Carleton's LLM server (request at carleton.ca/rcs/llm-access), OR
- Local Ollama installation (see LLM Access Options)
-
Clone the repository to your local machine:
git clone https://github.com/araskay/carleton_competition.git cd carleton_competition -
Create a virtual environment (see Understanding Python Environments below)
-
Install dependencies:
pip install -r requirements.txt
-
Set up Kaggle credentials (required for downloading the dataset):
- Go to Kaggle and sign in
- Navigate to your account settings
- Create a new API token (this downloads
kaggle.json) - Place
kaggle.jsonin~/.kaggle/(Linux/Mac) orC:\Users\<username>\.kaggle\(Windows)
-
Run the template to verify everything works:
python main.py
If you're new to Python development, this section explains virtual environments and package management.
A virtual environment is an isolated Python installation that keeps your project's dependencies separate from other projects. This prevents conflicts between different projects that might need different versions of the same package.
- Reproducibility: Others can recreate your exact environment
- Isolation: Project dependencies don't conflict with each other
- Clean Development: Easy to start fresh if something goes wrong
There are several tools for managing Python environments. Here are the most common:
# Create a virtual environment
python -m venv myenv
# Activate it (Linux/Mac)
source myenv/bin/activate
# Activate it (Windows)
myenv\Scripts\activate
# Install packages
pip install -r requirements.txt
# Deactivate when done
deactivate# Create environment with specific Python version
conda create -n myenv python=3.11
# Activate it
conda activate myenv
# Install packages
pip install -r requirements.txt
# Deactivate when done
conda deactivate# Install specific Python version
pyenv install 3.11.9
# Create virtual environment
pyenv virtualenv 3.11.9 myenv
# Activate it
pyenv activate myenv
# Install packages
pip install -r requirements.txtThe requirements.txt file lists all Python packages your project needs. Each line specifies a package and its version:
package_name==1.2.3
The == ensures the exact version is installed, which is crucial for reproducibility.
-
Install the package in your virtual environment:
pip install package_name
-
Check the installed version:
pip show package_name
-
Add to requirements.txt with the exact version:
package_name==X.Y.Z
You can generate a requirements.txt from your current environment:
pip freeze > requirements.txtWarning: This includes ALL packages in your environment. It's often better to manually list only the packages you directly use, as indirect dependencies will be installed automatically.
The runtime.txt file specifies the exact Python version your project requires. The format is:
python-X.Y.Z
For example:
python-3.11.9
This ensures evaluators use the same Python version you developed with.
Your submission should follow this structure:
your_submission/
├── runtime.txt # Python version (REQUIRED)
├── requirements.txt # Package dependencies (REQUIRED)
├── agent.py # Your QueryWriter implementation (REQUIRED)
├── main.py # Interactive testing interface (PROVIDED)
├── db/
│ ├── __init__.py
│ └── bike_store.py # Database loader (PROVIDED)
└── src/ # Additional source code (optional)
├── __init__.py
└── ...
| File | Description |
|---|---|
runtime.txt |
Contains the Python version (e.g., python-3.11.9) |
requirements.txt |
Lists all package dependencies with exact versions |
agent.py |
Contains your QueryWriter class implementation |
| File | Description |
|---|---|
main.py |
Interactive interface for testing your agent |
db/bike_store.py |
Downloads and loads the bike store dataset into DuckDB |
Create a runtime.txt file with your exact Python version:
python-3.11.9
To find your Python version:
python --version
# Output: Python 3.11.9List ALL packages your project needs with pinned versions:
# Core dependencies (already included)
duckdb==1.1.3
sqlalchemy==2.0.36
kagglehub==0.3.4
ollama==0.4.7
# Add your additional dependencies below, for example:
# langchain==0.3.7
# langchain-ollama==0.2.3
Important:
- Use
==for exact versions, not>=or~= - Include ALL packages you import (directly or indirectly)
- Test your requirements.txt in a fresh environment before submitting
Your agent.py must contain a QueryWriter class with the following interface:
class QueryWriter:
def __init__(self, db_path: str = 'bike_store.db'):
"""
Initialize the QueryWriter.
Args:
db_path (str): Path to the DuckDB database file.
"""
# Your initialization code here
pass
def generate_query(self, prompt: str) -> str:
"""
Generate a SQL query from a natural language prompt.
Args:
prompt (str): The natural language question.
Example: "What are the top 5 most expensive products?"
Returns:
str: A valid SQL query that answers the question.
Example: "SELECT product_name, list_price FROM products ORDER BY list_price DESC LIMIT 5"
"""
# Your implementation here
passImportant Requirements:
- The
generate_querymethod must accept a natural language prompt and return a SQL query string - Return ONLY the SQL query, no explanations or markdown formatting
- The returned query must be valid SQL that can be executed against the bike store database
- Keep your code clean and well-documented
- Use meaningful variable and function names
- Handle errors gracefully
- You may add additional helper methods or classes to
agent.py - You may create additional files in a
src/directory if needed
Your submission will be set up using the following process:
# Read Python version from runtime.txt
PYTHON_VERSION=$(cut -d- -f2 runtime.txt)
# Create virtual environment (using pyenv)
pyenv install -s $PYTHON_VERSION
pyenv virtualenv $PYTHON_VERSION submission-env
pyenv activate submission-env
# Install dependencies
pip install -r requirements.txtYour QueryWriter class will be evaluated against a hidden evaluation dataset containing natural language prompts and expected SQL queries. The evaluation will:
- Import your
QueryWriterclass fromagent.py - Initialize it with the bike store database
- Call
generate_query()with various natural language prompts - Compare the results of executing your generated queries against expected results
# Example of how your agent will be evaluated
from agent import QueryWriter
agent = QueryWriter(db_path='bike_store.db')
# For each prompt in the evaluation dataset:
generated_sql = agent.generate_query("What are the top 5 most expensive products?")
# Execute and compare results...Note: The evaluation dataset is not available to students. Design your agent to handle a variety of question types.
-
Test in a clean environment: Before submitting, create a fresh virtual environment and verify your code runs with only the packages in
requirements.txt -
Pin all versions: Unpinned dependencies might install different versions, breaking your code
-
Don't rely on global packages: If it's not in
requirements.txt, assume it won't be available -
Test your QueryWriter class: Make sure your
generate_querymethod works correctly with various types of questions -
Use
main.pyfor testing: Runpython main.pyto interactively test your agent during development
The bike store database is sourced from the Bike Store Sample Database on Kaggle. It contains the following tables:
| Table | Description |
|---|---|
brands |
Bicycle brands |
categories |
Product categories |
customers |
Customer information |
order_items |
Individual items in orders |
orders |
Customer orders |
products |
Products available for sale |
staffs |
Store staff members |
stocks |
Inventory levels by store |
stores |
Store locations |
Run the database loader to see the full schema:
python -c "from db.bike_store import BikeStoreDb, get_schema_info; BikeStoreDb(); print(get_schema_info())"Or use the helper function in your code:
from db.bike_store import get_schema_info
schema = get_schema_info()
for table, columns in schema.items():
print(f"\nTable: {table}")
for col in columns:
print(f" - {col['name']}: {col['type']}")- Start Simple: Get a basic prompt working before adding complexity
- Test Incrementally: Test your agent with various types of questions
- Provide Schema Context: LLMs need to know the database structure to write correct SQL
- Handle Errors: Not every question can be answered - handle these gracefully
- Validate SQL: Consider validating generated SQL before execution
- "How many customers are there?"
- "What are the top 5 most expensive products?"
- "Show me all orders from 2018"
- "Which store has the most inventory?"
- "What is the total revenue by brand?"
- "List all staff members and their stores"
Example Implementations:
- Query Writer Agent Example - A reference implementation of a SQL query writer agent
- Agentic Design Patterns - Overview of common agentic design patterns
Documentation:
- DuckDB Documentation
- SQLAlchemy Documentation
- Ollama Documentation
- Ollama Python Library
- LangChain SQL Agent
- LangChain Ollama Integration
- Carleton LLM Access
A: You must use open source models via Ollama. You can access these through the Carleton University LLM server or by running Ollama locally. See LLM Access Options for details.
A: No. This competition requires the use of open source models through Ollama. This ensures a level playing field and gives you experience with self-hosted LLM solutions.
A: Use the Carleton University LLM server! It provides access to large models (up to 235B parameters) without requiring local hardware. Request access at carleton.ca/rcs/llm-access.
A: Use environment variables for the Ollama host and model name. This allows evaluators to easily point your code at their infrastructure. See the "Making Your Code Flexible" section under LLM Access Options.
A: Yes! You can organize your code however you like. Just ensure agent.py contains your QueryWriter class and all imports work correctly.
A: Ensure you have valid Kaggle credentials set up. See the Getting Started section.
A: No. All submissions must use the provided bike store database in DuckDB to ensure fair evaluation.
A: Run python --version in your terminal.
A: Common issues:
- Missing packages in
requirements.txt - Unpinned package versions
- Relying on packages installed globally but not listed
- Hardcoded file paths that only work on your machine
If you have questions about the competition:
- Check this guide and the FAQ first
- Review the provided template code
- Post your question in the Discord channel
- Contact Dr. Aras Kayvan at aras.kayvan@mindbridge.ai
Good luck and happy coding!
Organized by MindBridge AI
This competition is for educational purposes only and is non-commercial in nature.