This guide provides step-by-step instructions to run the Python vs Cython performance benchmark.

Prerequisites:

- Python 3.8 or higher
- A C compiler such as GCC (for building the Cython extension)
- pip package manager

```bash
git clone https://github.com/nasirus/bench_python_c.git
cd bench_python_c
pip install -r requirements.txt
```

The `pip install` command installs:
- pandas (DataFrame operations)
- pyarrow (Parquet file I/O)
- numpy (Numerical computations)
- Cython (C extension compilation)

```bash
python setup.py build_ext --inplace
```

This compiles the Cython code (`cython_impl.pyx`) into a C extension module that Python can import.

```bash
python benchmark.py
```

This will:
- Run both the Python and Cython implementations 5 times each
- Generate a 1-million-row DataFrame in each iteration
- Write each DataFrame to a Parquet file with Snappy compression
- Calculate and display performance statistics
- Clean up generated files automatically
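
The timing loop described above can be sketched with the standard library's `time.perf_counter` and `statistics` modules. This is a minimal illustration under stated assumptions, not the contents of `benchmark.py`: the `run_benchmark` helper and the stand-in `generate`/`write` callables are hypothetical. In the real benchmark, `generate` would build the DataFrame and `write` would call something like `df.to_parquet(path, compression="snappy")` and then delete the file. Timing the two phases separately is what allows generation and writing to be reported as separate figures.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_benchmark(generate: Callable[[], object],
                  write: Callable[[object], None],
                  iterations: int = 5) -> Dict[str, float]:
    """Time `generate` and `write` separately over several iterations (hypothetical helper)."""
    gen_times: List[float] = []
    write_times: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        data = generate()
        gen_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        write(data)
        write_times.append(time.perf_counter() - start)

    # Total time per iteration is generation + writing
    totals = [g + w for g, w in zip(gen_times, write_times)]
    return {
        "avg_generation": statistics.mean(gen_times),
        "avg_writing": statistics.mean(write_times),
        "avg_total": statistics.mean(totals),
        "min_total": min(totals),
        "max_total": max(totals),
    }


if __name__ == "__main__":
    import os
    import tempfile

    def generate() -> list:
        # Stand-in workload; the real benchmark builds a 1M-row DataFrame here
        return [i * 0.5 for i in range(100_000)]

    def write(data: list) -> None:
        # Stand-in for df.to_parquet(...); the file is removed right after writing
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(map(str, data)))
        os.remove(path)

    print(run_benchmark(generate, write, iterations=5))
```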

```bash
python test_validation.py
```

This verifies that:
- Both implementations produce identical DataFrames
- Parquet files are written and read correctly
- Data integrity is maintained

Sample output from `python benchmark.py` (exact timings depend on your machine):

```text
================================================================================
BENCHMARK RESULTS
================================================================================
Pure Python Implementation:
Average Generation Time: 0.9574s
Average Writing Time: 0.1696s
Average Total Time: 1.1270s
Min Total Time: 1.1170s
Max Total Time: 1.1508s
Cython Implementation:
Average Generation Time: 0.1371s
Average Writing Time: 0.1562s
Average Total Time: 0.2934s
Min Total Time: 0.2904s
Max Total Time: 0.2956s
Performance Comparison:
Generation Speedup: 6.98x
Writing Speedup: 1.09x
Total Speedup: 3.84x
Cython is 284.1% faster than Pure Python
================================================================================
```

The reported metrics are:

- Generation Time: Time to create the 1M-row DataFrame
- Writing Time: Time to write the DataFrame to a Parquet file
- Total Time: Generation time plus writing time
- Speedup: Python time / Cython time
- Percentage Improvement: (Speedup - 1) × 100%
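
Plugging the sample averages into these definitions reproduces the reported comparison. This is a small sketch; `compare` is a hypothetical helper, and `benchmark.py` may compute the figures differently. Note that the total speedup is the ratio of the average total times, not the mean of the two per-phase speedups (which would be about 4.03x). Run as-is, the last line printed is `Total Speedup: 3.84x (284.1% faster)`.

```python
from typing import Tuple


def compare(python_time: float, cython_time: float) -> Tuple[float, float]:
    """Return (speedup, percentage improvement) for a Python vs Cython timing."""
    speedup = python_time / cython_time   # Python time / Cython time
    improvement = (speedup - 1) * 100     # (Speedup - 1) x 100%
    return speedup, improvement


# Average times in seconds, taken from the sample output
for label, py_time, cy_time in [
    ("Generation", 0.9574, 0.1371),
    ("Writing", 0.1696, 0.1562),
    ("Total", 1.1270, 0.2934),
]:
    speedup, improvement = compare(py_time, cy_time)
    print(f"{label} Speedup: {speedup:.2f}x ({improvement:.1f}% faster)")
```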

| File | Purpose |
|---|---|
| `benchmark.py` | Main benchmark runner with statistics |
| `python_impl.py` | Pure Python implementation |
| `cython_impl.pyx` | Cython-optimized implementation |
| `setup.py` | Build configuration for Cython |
| `test_validation.py` | Correctness validation tests |
| `requirements.txt` | Python package dependencies |
| `README.md` | Complete documentation |

You can modify the benchmark parameters by editing `benchmark.py`:

```python
# Change number of iterations
num_iterations = 5  # Default: 5

# Change DataFrame size
num_rows = 1_000_000  # Default: 1 million
```

If you encounter build errors, ensure you have:
- A C compiler installed (GCC on Linux, Xcode on macOS, MSVC on Windows)
- Python development headers (`python3-dev` on Ubuntu)
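
A quick way to check both requirements from Python itself is sketched below. It is a rough diagnostic that assumes the usual compiler executable names (`gcc`, `clang`, `cl`); the `check_build_environment` helper is hypothetical. On Windows, `cl` is often not on `PATH` outside a Visual Studio developer prompt even when MSVC is installed, because setuptools locates it on its own, so a missing entry there is not conclusive.

```python
import os
import shutil
import sysconfig


def check_build_environment() -> dict:
    """Report whether Python.h and a C compiler can be found (hypothetical helper)."""
    include_dir = sysconfig.get_paths()["include"]
    python_h = os.path.isfile(os.path.join(include_dir, "Python.h"))
    # shutil.which returns the full path, or None if the program is not on PATH
    compilers = {name: shutil.which(name) for name in ("gcc", "clang", "cl")}
    return {
        "include_dir": include_dir,
        "python_h_found": python_h,
        "compilers_on_path": {k: v for k, v in compilers.items() if v},
    }


if __name__ == "__main__":
    report = check_build_environment()
    print(f"Python.h in {report['include_dir']}: {report['python_h_found']}")
    print("Compilers on PATH:", report["compilers_on_path"] or "none found")
```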

If you get import errors, make sure that:

- You've built the Cython extension: `python setup.py build_ext --inplace`
- You're running from the project root directory
- All dependencies are installed: `pip install -r requirements.txt`

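
To see which import is failing, a small check like the following can help. It is a sketch that assumes the compiled module is importable as `cython_impl` (matching `cython_impl.pyx`) and that the script is saved in and run from the project root, so that module can be found; the `missing_modules` helper is hypothetical. `importlib.util.find_spec` locates a top-level module without importing it and returns `None` when it cannot be found.

```python
import importlib.util
from typing import Iterable, List

# Import names: the Cython package imports as "Cython"; "cython_impl" is the
# extension produced by `python setup.py build_ext --inplace`
REQUIRED_MODULES = ["pandas", "pyarrow", "numpy", "Cython", "cython_impl"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All modules found" if not missing else "Missing: " + ", ".join(missing))
```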
Benchmark results may vary based on:
- CPU speed and architecture
- Available RAM
- Disk I/O speed
- System load
- Python version
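
Because of this variation, it helps to record the machine details alongside any numbers you share. A minimal sketch using only the standard library; the `environment_summary` helper is hypothetical, and RAM size and disk speed are left out because the standard library does not expose them portably.

```python
import os
import platform


def environment_summary() -> dict:
    """Collect system details that commonly affect benchmark timings (hypothetical helper)."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "os": f"{platform.system()} {platform.release()}",
        "machine": platform.machine(),
        # platform.processor() may return an empty string on some systems
        "processor": platform.processor() or "unknown",
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```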

Ideas for further experiments:

- Experiment with different DataFrame sizes
- Add more data types and transformations
- Profile specific bottlenecks
- Compare with other optimization approaches (Numba, PyPy)
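
For profiling specific bottlenecks, the standard library's `cProfile` and `pstats` modules are one starting point. The `profile_call` helper and the `build_rows` stand-in workload below are hypothetical. By default a compiled Cython function shows up as a single entry with no detail about what happens inside it; seeing inside requires building the extension with Cython's `profile=True` directive.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_call(func: Callable[..., Any], *args: Any, top: int = 10,
                 **kwargs: Any) -> Tuple[Any, str]:
    """Run func under cProfile; return its result and a report of the top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    # Sort by cumulative time so the most expensive call chains appear first
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    def build_rows(n: int) -> list:
        # Stand-in workload; swap in a call to the pure Python implementation
        return [{"id": i, "value": i * 0.5} for i in range(n)]

    rows, report = profile_call(build_rows, 200_000)
    print(report)
```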

For issues or questions, please open an issue on GitHub.