Subject: Distributed Computing Systems (NT533)
Project duration: 24/03/2024 - 22/05/2024
Team members:
- Nguyen Cao Thi
- Nguyen Tra Bao Ngan
- Le Huynh Anh Thu
- Thai Nhat Thu
In this project, we develop a distributed computing system for large-scale data processing and analysis. The architecture is designed to handle big data, using distributed computing technologies to achieve high performance and scalability.
Technology stack:
- Infrastructure: Kubernetes
- Data collection and processing: Apache Kafka and Apache Spark
- Distributed database: Apache Cassandra
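
The data flow implied by this stack (collect, process, store) can be illustrated with a minimal in-memory sketch. It is a toy model under stated assumptions, not the project's implementation: it uses only the Python standard library and does not call any Kafka, Spark, or Cassandra client API. The function names, the `sensor_id` field, and `NUM_PARTITIONS` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 3  # hypothetical partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition from a stable hash of the key (stand-in for a message partitioner)."""
    return crc32(key.encode("utf-8")) % num_partitions


def collect(events, num_partitions: int = NUM_PARTITIONS):
    """Collection stage: append each (key, value) event to its partition's log."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def process(partitions):
    """Processing stage: sum values per key inside each partition, then merge the partial sums."""
    partials = []
    for log in partitions:
        sums = defaultdict(float)
        for key, value in log:
            sums[key] += value
        partials.append(sums)

    totals = defaultdict(float)
    for sums in partials:
        for key, subtotal in sums.items():
            totals[key] += subtotal
    return dict(totals)


def store(totals, table=None):
    """Storage stage: upsert one row per key (stand-in for a table with a partition key)."""
    table = {} if table is None else table
    for key, total in totals.items():
        table[key] = {"sensor_id": key, "total": total}
    return table


def run_pipeline(events):
    """Chain the three stages: collect -> process -> store."""
    return store(process(collect(events)))
```

For example, `run_pipeline([("s1", 1.0), ("s2", 2.5), ("s1", 3.0)])` returns a table with one row per key, where each row holds the summed value for that key. In a real deployment, each stage would be replaced by the corresponding component running on Kubernetes.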
