STTT153/csc4005-parallel-programming-projects

CSC4005 Parallel Programming Lab

This repository contains the four programming projects for the CSC4005 Parallel Programming course. Each project focuses on different aspects of parallel programming, from basic embarrassingly parallel problems to advanced GPU acceleration techniques.

Project Overview

Project 1: Embarrassingly Parallel Programming

This introductory project uses image processing as a practical example of an embarrassingly parallel problem, one whose work splits into independent pieces that need no communication. You will implement the same parallel solution in several programming languages and paradigms.

Key Concepts:

  • Embarrassingly parallel problems
  • Image processing algorithms
  • Multiple parallel programming languages
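
To make "embarrassingly parallel" concrete, here is a minimal sketch in Python, which is not necessarily one of the languages the course uses. It assumes a grayscale conversion as the image filter and represents the image as nested lists of (R, G, B) tuples; the names `to_gray`, `gray_row`, and `gray_image` are hypothetical. Each row is converted independently, so rows can be handed to workers with no synchronization. Threads in CPython will not speed up this pure-Python loop, so the sketch only shows the decomposition, not a performance result.

```python
from concurrent.futures import ThreadPoolExecutor


def to_gray(pixel):
    """Convert one (R, G, B) pixel to a luma value using BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_row(row):
    """Convert one image row; it reads and writes no shared state."""
    return [to_gray(p) for p in row]


def gray_image(image, workers=4):
    """Map rows onto a worker pool; pool.map preserves row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gray_row, image))


# Tiny 2x2 test image: red, green / blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = gray_image(image)
print(gray)  # [[76, 150], [29, 255]]
```

Because no row depends on another, the same decomposition carries over to processes, SIMD lanes, or GPU threads with only the mapping mechanism changing.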

Project 2: Efficient Dense Matrix Multiplication

This project focuses on optimizing dense matrix multiplication, a fundamental operation in AI and scientific computing. You will improve performance step by step, applying one optimization technique at a time.

Key Concepts:

  • Memory locality optimization
  • SIMD (Single Instruction, Multiple Data)
  • Thread-level parallelism
  • Process-level parallelism
  • Performance profiling and analysis
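
As an illustration of the memory locality item above, the following sketch contrasts a naive triple loop with a tiled (blocked) version. It is written in Python only to show the loop structure; the function names and the `tile` parameter are hypothetical, and the tile size would normally be tuned to the cache of the target machine. Pure Python will not show the cache benefit, so treat this as a description of the access pattern rather than a benchmark.

```python
def matmul_naive(a, b):
    """Reference i-j-p loop: walks down a column of b in the inner loop."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Blocked loop: works on tile x tile sub-blocks so operands are reused."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]      # reused across the whole j loop
                        row_b = b[p]        # traversed contiguously
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

The inner i-p-j order reads rows of `b` and `c` sequentially, which is the access pattern that SIMD and threading optimizations typically build on.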

Project 3: Parallel Sorting/Searching Algorithms

This project explores parallel implementations of classical sorting and searching algorithms. These are harder to parallelize than the earlier projects because threads depend on one another's results.

Key Concepts:

  • Parallel Merge Sort with parallel merging (CPU - OpenMP)
  • Parallel Quick Sort with parallel partitioning (CPU - OpenMP)
  • Parallel Radix Sort (GPU - OpenACC)
  • Parallel Multi-Data Binary Searching (CPU & GPU)
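
Of the items above, multi-data binary search is the simplest to sketch, because each query is independent of the others. The example below is a rough Python illustration, not the OpenMP or OpenACC code the project calls for; `find`, `search_chunk`, and `multi_search` are hypothetical names, and the convention of returning -1 for a missing key is an assumption.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def find(sorted_data, key):
    """Return the index of key in sorted_data, or -1 if it is absent."""
    i = bisect_left(sorted_data, key)
    return i if i < len(sorted_data) and sorted_data[i] == key else -1


def search_chunk(sorted_data, keys):
    """Search a batch of keys; the data is only read, never modified."""
    return [find(sorted_data, k) for k in keys]


def multi_search(sorted_data, keys, workers=4):
    """Split the queries into contiguous chunks, one per worker."""
    if not keys:
        return []
    chunk = -(-len(keys) // workers)  # ceiling division
    chunks = [keys[i:i + chunk] for i in range(0, len(keys), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: search_chunk(sorted_data, c), chunks)
        # Flatten in order so results line up with the input keys.
        return [idx for part in parts for idx in part]


data = [1, 3, 5, 7, 9, 11]
queries = [7, 2, 1, 11, 4]
print(multi_search(data, queries))  # [3, -1, 0, 5, -1]
```

The sorting algorithms in this project do not decompose this cleanly: merging and partitioning need coordination between threads, which is where most of the difficulty lies.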

Project 4: Parallel Programming with FlashAttention

The final project implements FlashAttention, a high-performance attention mechanism used in large language models. You will gain experience with modern GPU programming frameworks.

Key Concepts:

  • CUDA and Triton programming
  • Softmax implementation
  • FlashAttention v1 algorithm
  • Sparse matrix optimization
  • Modern LLM acceleration techniques
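
The softmax and FlashAttention items above are linked by the "online softmax" idea: the normalizer can be accumulated block by block, keeping only a running maximum and a running sum, so the full score vector never has to be held at once. The sketch below shows that recurrence in plain Python under stated assumptions; it is not the CUDA or Triton code the project requires, and the names `online_softmax_stats`, `online_softmax`, and the `block` parameter are hypothetical.

```python
import math


def softmax(xs):
    """Reference softmax: needs the global max before any exponentials."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax_stats(xs, block=2):
    """Stream over xs in blocks, keeping a running max and rescaled sum."""
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(xs), block):
        chunk = xs[start:start + block]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new max, then add this block's terms.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + sum(math.exp(x - new_max) for x in chunk))
        running_max = new_max
    return running_max, running_sum


def online_softmax(xs, block=2):
    """Normalize with the streamed statistics."""
    m, s = online_softmax_stats(xs, block)
    return [math.exp(x - m) / s for x in xs]


scores = [1.0, 3.0, 0.5, 2.0, 4.0]
streamed = online_softmax(scores)
reference = softmax(scores)
print(all(math.isclose(p, q) for p, q in zip(streamed, reference)))  # True
```

In a tiled attention kernel the same rescaling factor is also applied to the partially accumulated output, which is what lets each tile of keys and values be processed once while staying in fast on-chip memory.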

About

This repo contains the projects for CSC4005, the parallel programming course at CUHKSZ.
