N3Twork-nc/Distributed_systems_project

Big Data Processing

Subject: Distributed Computing Systems (NT533)
Project duration: 24/03/2024 - 22/05/2024

Project members

  1. Nguyen Cao Thi
  2. Nguyen Tra Bao Ngan
  3. Le Huynh Anh Thu
  4. Thai Nhat Thu

Abstract

In this project, we develop a distributed computing system for big data processing and analysis. The architecture relies on distributed computing technologies to handle large data volumes with high performance and scalability.

System architecture

[Image: system architecture diagram]

Technologies

  1. Infrastructure: Kubernetes
  2. Collecting & Processing: Apache Kafka & Apache Spark
  3. Distributed database: Cassandra
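
The three roles listed above (collecting, processing, storing) can be illustrated with a minimal, standard-library-only sketch. It is not the project's code and does not call Kafka, Spark, or Cassandra: the queue, the aggregation, and the dictionary are in-memory stand-ins for those roles, and the `(key, value)` event shape and all function names are assumptions made for the example.

```python
from collections import defaultdict, deque


def collect(events):
    """Stand-in for the collecting role (Kafka): buffer events in arrival order."""
    return deque(events)


def process(buffer):
    """Stand-in for the processing role (Spark): aggregate count and sum per key."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    while buffer:
        key, value = buffer.popleft()  # hypothetical (key, value) event shape
        totals[key]["count"] += 1
        totals[key]["sum"] += value
    return totals


def store(totals, table):
    """Stand-in for the storage role (Cassandra): upsert one row per key."""
    for key, agg in totals.items():
        table[key] = {"count": agg["count"], "mean": agg["sum"] / agg["count"]}
    return table


def run_pipeline(events):
    """Chain the three stages: collect, then process, then store."""
    return store(process(collect(events)), {})
```

Calling `run_pipeline([("a", 1.0), ("b", 4.0), ("a", 3.0)])` returns one row per key holding that key's event count and mean value. In a real deployment each stage would run as a separate service on Kubernetes rather than as a function call in one process.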
