CS5052: Data-Intensive Systems
This module is offered in 2020-21.
CS5052 is an advanced, research-focused module that presents the programming paradigms, algorithmic techniques and design principles for large-scale distributed systems, such as those used by companies like Google, Amazon and Facebook. It has a strong systems research flavour, drawing on areas such as operating systems, databases, distributed systems and networking.
Please note the format of the course. The first few weeks of lectures will be given by the lecturer and will introduce the basics of distributed systems and distributed algorithms. The remainder of the course will be run as a series of student-led seminars based on the latest research in the field: students will read, present and discuss several papers drawn from a range of topic areas. In addition, there will be at least one programming assignment.
Aims:
- To present the programming paradigms, algorithmic techniques and design principles for large-scale distributed systems.
- To teach students how to engineer and work with systems which need to process big data.
Topics:
- Distributed systems architecture and design
- Replication and fault tolerance
- Storage systems
- Coordination algorithms, e.g. Paxos
- Scheduling algorithms
- Cloud computing and virtualisation
- Programming models for big data, e.g. MapReduce and Spark
- Stream processing
- Decentralised systems, e.g. Chord
- Incentive-based systems, e.g. BitTorrent
- Social computing, e.g. crowdsourcing techniques
This module has no compulsory elements beyond those common to all modules (a mark of at least 4 in each assessment component).