This course describes the key technology trends that enable cloud computing, and the services and applications built on them. The course covers advanced topics in data-intensive computing, including:
- Distributed file systems, e.g., HDFS
- NoSQL databases, e.g., BigTable, Cassandra, Neo4j
- Big Data execution engines, e.g., MapReduce, Spark, Spark SQL
- Scalable messaging systems, e.g., Kafka
- Stream processing, e.g., Spark Streaming
- Graph processing, e.g., GraphLab, GraphX
- Resource management, e.g., Mesos, YARN, Borg
- Data lakes, e.g., Delta Lake, Lakehouse
After passing the course, students should be able to (according to Bloom's taxonomy):
- ILO1: Understand the main concepts of data-intensive computation platforms.
- ILO2: Apply the acquired knowledge to store and process massive data.
- ILO3: Analyze the technical merits of data-intensive computation platforms.
