This repository contains the four programming projects for the CSC4005 Parallel Programming course. Each project focuses on a different aspect of parallel programming, progressing from basic embarrassingly parallel problems to advanced GPU acceleration techniques.
This introductory project focuses on embarrassingly parallel problems using image processing as a practical example. Learn to implement parallel solutions using various programming languages and paradigms.
Key Concepts:
- Embarrassingly parallel problems
- Image processing algorithms
- Multiple parallel programming languages
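To make "embarrassingly parallel" concrete, here is a minimal Python sketch, not the project's reference solution. It converts an RGB image to grayscale by splitting the rows into blocks that never read each other's data. The names `to_gray`, `gray_rows`, and `parallel_gray` are hypothetical, and the luma weights are the common BT.601 values, chosen here as an assumption. CPython threads will not give a real speedup for this CPU-bound loop; the point is only the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def to_gray(pixel):
    """Convert one (r, g, b) pixel to a single luma value (BT.601 weights)."""
    r, g, b = pixel
    return int(0.299 * r + 0.587 * g + 0.114 * b)


def gray_rows(rows):
    """Convert a block of rows; reads and writes nothing outside the block."""
    return [[to_gray(p) for p in row] for row in rows]


def parallel_gray(image, workers=4):
    """Split the image into row blocks and convert each block independently."""
    block = max(1, (len(image) + workers - 1) // workers)
    blocks = [image[i:i + block] for i in range(0, len(image), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so rows stay in place.
        results = list(pool.map(gray_rows, blocks))
    return [row for part in results for row in part]
```

Because no block depends on another, the same row-block split carries over to processes, OpenMP loops, or GPU thread blocks without any synchronization beyond the final gather.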
This project focuses on optimizing dense matrix multiplication, a fundamental operation in AI and scientific computing. Systematically improve performance through multiple optimization techniques.
Key Concepts:
- Memory locality optimization
- SIMD (Single Instruction, Multiple Data)
- Thread-level parallelism
- Process-level parallelism
- Performance profiling and analysis
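As an illustration of the memory locality idea, the sketch below shows loop tiling (blocking) for matrix multiplication in plain Python. It is one common locality technique and not necessarily the sequence of optimizations the project prescribes. The function name `matmul_tiled` and the `tile` parameter are hypothetical. Python lists do not reproduce the cache behavior of contiguous C arrays, so read this for the loop structure only.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply an n x k matrix by a k x m matrix using tiled loops."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops stay
    # inside one tile, so the same small blocks of a, b, and c are reused.
    for i0 in range(0, n, tile):
        for k0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik = a[i][kk]  # reused across the whole j loop
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c
```

In a compiled implementation the tile size would typically be chosen so that the working blocks fit in cache, and the innermost `j` loop is the natural candidate for SIMD.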
This project explores parallel implementations of classical sorting and searching algorithms, which are harder to parallelize because of the data dependencies between threads.
Key Concepts:
- Parallel Merge Sort with parallel merging (CPU - OpenMP)
- Parallel Quick Sort with parallel partitioning (CPU - OpenMP)
- Parallel Radix Sort (GPU - OpenACC)
- Parallel Multi-Data Binary Searching (CPU & GPU)
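One way to see how merging can be made parallel, and how it connects to batched binary search, is the rank-based merge sketched below. This is an assumption about a suitable technique, not the method the project requires. Each element finds its output position with a single binary search into the other array, so every iteration is independent and could run on its own thread. The name `merge_by_rank` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def merge_by_rank(a, b):
    """Merge two sorted lists by computing each element's final index."""
    out = [None] * (len(a) + len(b))
    # a[i] lands after the i earlier elements of a and after every element
    # of b that is strictly smaller (ties place elements of a first).
    for i, x in enumerate(a):
        out[i + bisect_left(b, x)] = x
    # b[j] lands after the j earlier elements of b and after every element
    # of a that is smaller or equal, so no two elements share a slot.
    for j, y in enumerate(b):
        out[j + bisect_right(a, y)] = y
    return out
```

Both loops write to disjoint positions, which is what removes the sequential dependency of the classic two-pointer merge. The cost is extra work: O(n log n) comparisons instead of O(n).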
The final project implements FlashAttention, a high-performance attention mechanism used in large language models. Gain experience with modern GPU programming frameworks.
Key Concepts:
- CUDA and Triton programming
- Softmax implementation
- FlashAttention v1 algorithm
- Sparse matrix optimization
- Modern LLM acceleration techniques
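The softmax item can be illustrated with the streaming ("online") formulation that FlashAttention builds on: the input is read block by block while a running maximum and a rescaled running sum are maintained. The sketch below is plain Python for clarity, not CUDA or Triton, and the name `softmax_online` and the `block` parameter are hypothetical. It assumes all scores are finite.

```python
import math


def softmax_online(scores, block=2):
    """Numerically stable softmax computed from a blockwise pass."""
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(scores), block):
        chunk = scores[start:start + block]
        new_max = max(running_max, max(chunk))
        # Rescale the sum accumulated so far to the new maximum, then add
        # this block's contribution relative to the same maximum.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(s - new_max) for s in chunk)
        running_max = new_max
    # Final normalization using the global maximum and sum.
    return [math.exp(s - running_max) / running_sum for s in scores]
```

The rescaling step is what lets a kernel process attention scores tile by tile without first materializing the full row, which is the memory saving FlashAttention relies on.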